Getting a median value in MySQL 8.0+

MySQL 8.0 has a new Window functions feature that allows doing aggregation per row.

If you are interested in details you are more than welcome to read an official documentation.

https://dev.mysql.com/doc/refman/8.0/en/window-functions.html


As for me I'm just going to show you how to get a median value from a set using that feature in a plain MySQL query.

Let's say we have a table with people's age and we want to get median value for the age.

CREATE TABLE `source` (
  `id` integer PRIMARY KEY AUTO_INCREMENT,
  `name` VARCHAR(200),
  `age` integer
);

INSERT INTO source (`name`, `age`) VALUES
  ('Pete', 25),
  ('Steve', 7),
  ('John', 43),
  ('Alice', 28),
  ('Vera', 50)
;        

First, we need to introduce a ranking and add totals to every row:

SELECT 
`age` AS `age`,
count(1) over () AS `total_rows`, 
row_number() over (ORDER BY `age` ASC) AS `ranking`
FROM `source`;        

What do we have as a result here? A brand new table that contains the following columns:

  • `age` - exactly the same age that we have in the `source` table
  • `total_rows` - number of rows in the `source` table
  • `ranking` - an index for the age in the row as if the source table were sorted by age. So the lowest age has a ranking of 1, and the highest age has a ranking of total_rows


Great. But where is our median?

Well, according to the definition to get the median you need to get the middle element of this table in case total_rows is odd or the arithmetic mean of the two middle values in case total_rows is even.

Let's make it a condition:

(
/* The first case: total_rows is even 
 * the result is arithmetic mean of the age 
 * for the rows with ranking
 * equals to total_rows/2 and total_rows/2+1 
 */ 
  `total_rows` % 2 = 0
  AND
  (
    `ranking` = round(`total_rows` / 2.0, 0)
       OR 
    `ranking` = round(`total_rows` / 2.0, 0)+1
  )
) OR (
/* The second case: total_rows is odd 
 * the result is the age from the row 
 * with ranking equals to total_rows/2
 * it's a single value but we will use AVG anyway ;)
 */ 
  total_rows % 2 > 0 AND ranking = round(t.total_rows / 2.0, 0)
)        


So we are ready to make a complete query:

SELECT
AVG(`age`) AS `median_age`
FROM
(
  SELECT 
    `age` AS `age`,
    count(1) over () AS `total_rows`, 
    row_number() over (order by `age` asc) AS `ranking`
  FROM `source`
) t

WHERE
(
  `total_rows` % 2 = 0
  AND 
  (
    `ranking` = round(t.total_rows / 2.0, 0)
    OR
    `ranking` = round(t.total_rows / 2.0, 0)+1 )
) OR (
  `total_rows` % 2 > 0 
  AND
  `ranking` = round(t.total_rows / 2.0, 0)
);        

That's it.

You're welcome to the SQLFiddle to try it on your own:

https://www.db-fiddle.com/f/tx5rvnKmYyHgnR5qGf5aup/0

要查看或添加评论,请登录

Viktar Dubiniuk的更多文章

社区洞察

其他会员也浏览了