Getting a median value in MySQL 8.0+
MySQL 8.0 has a new Window functions feature that allows doing aggregation per row.
If you are interested in details you are more than welcome to read an official documentation.
As for me I'm just going to show you how to get a median value from a set using that feature in a plain MySQL query.
Let's say we have a table with people's age and we want to get median value for the age.
CREATE TABLE `source` (
`id` integer PRIMARY KEY AUTO_INCREMENT,
`name` VARCHAR(200),
`age` integer
);
INSERT INTO source (`name`, `age`) VALUES
('Pete', 25),
('Steve', 7),
('John', 43),
('Alice', 28),
('Vera', 50)
;
First, we need to introduce a ranking and add totals to every row:
SELECT
`age` AS `age`,
count(1) over () AS `total_rows`,
row_number() over (ORDER BY `age` ASC) AS `ranking`
FROM `source`;
What do we have as a result here? A brand new table that contains the following columns:
Great. But where is our median?
Well, according to the definition to get the median you need to get the middle element of this table in case total_rows is odd or the arithmetic mean of the two middle values in case total_rows is even.
Let's make it a condition:
(
/* The first case: total_rows is even
* the result is arithmetic mean of the age
* for the rows with ranking
* equals to total_rows/2 and total_rows/2+1
*/
`total_rows` % 2 = 0
AND
(
`ranking` = round(`total_rows` / 2.0, 0)
OR
`ranking` = round(`total_rows` / 2.0, 0)+1
)
) OR (
/* The second case: total_rows is odd
* the result is the age from the row
* with ranking equals to total_rows/2
* it's a single value but we will use AVG anyway ;)
*/
total_rows % 2 > 0 AND ranking = round(t.total_rows / 2.0, 0)
)
So we are ready to make a complete query:
SELECT
AVG(`age`) AS `median_age`
FROM
(
SELECT
`age` AS `age`,
count(1) over () AS `total_rows`,
row_number() over (order by `age` asc) AS `ranking`
FROM `source`
) t
WHERE
(
`total_rows` % 2 = 0
AND
(
`ranking` = round(t.total_rows / 2.0, 0)
OR
`ranking` = round(t.total_rows / 2.0, 0)+1 )
) OR (
`total_rows` % 2 > 0
AND
`ranking` = round(t.total_rows / 2.0, 0)
);
That's it.
You're welcome to the SQLFiddle to try it on your own: