### Is there a better way to calculate the median (not average)

Suppose I have the following table definition:

```sql
CREATE TABLE x (i serial primary key, value integer not null);
```

I want to calculate the MEDIAN of `value` (not the AVG). The median is the value that divides the set into two halves containing the same number of elements. If the number of elements is even, the median is the average of the largest value in the lower half and the smallest value in the upper half. (See Wikipedia for more details.)
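Written as a formula, with $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$ denoting the $n$ values sorted in ascending order, the definition above reads:

$$
\operatorname{median} =
\begin{cases}
x_{((n+1)/2)} & \text{if } n \text{ is odd} \\
\tfrac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right) & \text{if } n \text{ is even}
\end{cases}
$$

For example, $\{1, 2, 100\}$ has median $2$, and $\{1, 2, 3, 100\}$ has median $(2 + 3)/2 = 2.5$.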

Here is how I currently calculate the MEDIAN, but I suspect there must be a better way:

```sql
SELECT AVG(values_around_median) AS median
FROM (
  SELECT
    DISTINCT(CASE WHEN FIRST_VALUE(above) OVER w2 THEN MIN(value) OVER w3 ELSE MAX(value) OVER w2 END)
      AS values_around_median
  FROM (
    SELECT LAST_VALUE(value) OVER w AS value,
           SUM(COUNT(*)) OVER w > (SELECT count(*)/2 FROM x) AS above
    FROM x
    GROUP BY value
    WINDOW w AS (ORDER BY value)
    ORDER BY value
  ) AS find_if_values_are_above_or_below_median
  WINDOW w2 AS (PARTITION BY above ORDER BY value DESC),
         w3 AS (PARTITION BY above ORDER BY value ASC)
) AS find_values_around_median
```

Any ideas?

## 7 solutions

### #1

Indeed there IS an easier way. In Postgres you can define your own aggregate functions. I posted functions to do median as well as mode and range to the PostgreSQL snippets library a while back.

http://wiki.postgresql.org/wiki/Aggregate_Median
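A minimal sketch of the custom-aggregate idea, assuming PostgreSQL 9.2 or later (it relies on named parameters in SQL functions). This is not the wiki's code: `median_final` and `median_sketch` are hypothetical names chosen here. The aggregate collects the input into an array with the built-in `array_append`, and the final function numbers the non-NULL values in sorted order and averages the middle one or two.

```sql
-- Hypothetical final function: picks the middle value(s) of the collected array.
CREATE OR REPLACE FUNCTION median_final(vals numeric[])
RETURNS numeric
LANGUAGE sql IMMUTABLE
AS $$
  -- Number the non-NULL values in ascending order, then average the
  -- middle one (odd count) or the middle two (even count).
  WITH s AS (
    SELECT v,
           row_number() OVER (ORDER BY v) AS rn,
           count(*) OVER () AS ct
    FROM unnest(vals) AS v
    WHERE v IS NOT NULL
  )
  SELECT avg(v)
  FROM s
  WHERE rn IN ((ct + 1) / 2, (ct + 2) / 2);
$$;

-- Hypothetical aggregate: the state is an array grown by one element per row.
CREATE AGGREGATE median_sketch(numeric) (
  SFUNC = array_append,
  STYPE = numeric[],
  FINALFUNC = median_final,
  INITCOND = '{}'
);
```

Against the question's table this would be used as `SELECT median_sketch(value) FROM x;`. Collecting every value into an array is simple but holds the whole group in memory; for large inputs the built-in ordered-set aggregates in PostgreSQL 9.4+ are likely the better choice.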

### #2

Yes: since PostgreSQL 9.4 you can use the inverse distribution function `PERCENTILE_CONT()`, an ordered-set aggregate function that is also specified in the SQL standard.

```sql
WITH t(value) AS (
  SELECT 1   UNION ALL
  SELECT 2   UNION ALL
  SELECT 100
)
SELECT
  percentile_cont(0.5) WITHIN GROUP (ORDER BY value)
FROM
  t;
```
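Applied to the table from the question, the same function returns the interpolated median directly. In this sketch the table is recreated as a temporary table and the four sample rows are made up for illustration; note that `percentile_cont` returns `double precision`.

```sql
-- The question's table, recreated as a temporary table with sample rows.
CREATE TEMP TABLE x (i serial PRIMARY KEY, value integer NOT NULL);
INSERT INTO x (value) VALUES (1), (2), (3), (100);

-- Even number of rows: interpolates between the two middle values, (2 + 3) / 2.
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY value) AS median
FROM x;
```

Like other aggregates, `percentile_cont` ignores NULL inputs, so no extra filtering is needed if the column is nullable.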

### #3

A simpler query for that:

```sql
WITH y AS (
   SELECT value, row_number() OVER (ORDER BY value) AS rn
   FROM   x
   WHERE  value IS NOT NULL
   )
, c AS (SELECT count(*) AS ct FROM y)
SELECT CASE WHEN c.ct%2 = 0 THEN
          round((SELECT avg(value) FROM y WHERE y.rn IN (c.ct/2, c.ct/2+1)), 3)
       ELSE
          (SELECT value FROM y WHERE y.rn = (c.ct+1)/2)
       END AS median
FROM   c;
```

#### Major points

• Ignores NULL values.
• The core feature is the `row_number()` window function, which has been available since PostgreSQL 8.4.
• For an odd number of rows, the final SELECT returns the single middle row; for an even number, it returns the `avg()` of the two middle rows, rounded to 3 decimal places. The result type is numeric.

Testing shows that this version is 4x faster than the query in the question and, unlike that query, yields correct results. Test data:

```sql
CREATE TEMP TABLE x (value int);
INSERT INTO x SELECT generate_series(1,10000);
INSERT INTO x VALUES (NULL),(NULL),(NULL),(3);
```
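As a cross-check on that test data, the built-in `percentile_cont()` from PostgreSQL 9.4+ (a different technique from this answer's `row_number()` query) should agree with the query above. The data has 10001 non-NULL values (1 to 10000 plus an extra 3), so the median is the 5001st sorted value, which is 5000. Here the same data is built inline in a CTE so the check runs on its own.

```sql
-- Same rows as the test table: 1..10000, three NULLs and one extra 3.
WITH x(value) AS (
  SELECT g FROM generate_series(1, 10000) AS g
  UNION ALL
  SELECT v FROM (VALUES (NULL::int), (NULL), (NULL), (3)) AS t(v)
)
-- NULLs are ignored; with an odd count no interpolation happens.
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY value) AS median
FROM x;
```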

### #4

For Googlers: there is also the quantile extension at http://pgxn.org/dist/quantile. Once it is installed, the median can be calculated in one line.

### #5

Simple SQL using only built-in PostgreSQL functions:

```sql
select
  case count(*)%2
    when 1 then (array_agg(num order by num))[count(*)/2+1]
    else ((array_agg(num order by num))[count(*)/2]::double precision
          + (array_agg(num order by num))[count(*)/2+1])/2
  end as median
from unnest(array[5,17,83,27,28]) num;
```

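If the input may contain NULLs, filter them out (for example with `WHERE num IS NOT NULL`); otherwise both `count(*)` and `array_agg()` include them and the middle position is wrong.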
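A sketch of the NULL-skipping variant, using the same `array_agg()` technique on made-up sample data that contains one NULL. Because the `WHERE` clause runs before aggregation, the count and the sorted array both see only the four non-NULL values.

```sql
SELECT CASE count(*) % 2
         -- odd count: the single middle element
         WHEN 1 THEN (array_agg(num ORDER BY num))[count(*) / 2 + 1]
         -- even count: average of the two middle elements
         ELSE ((array_agg(num ORDER BY num))[count(*) / 2]::double precision
               + (array_agg(num ORDER BY num))[count(*) / 2 + 1]) / 2
       END AS median
FROM unnest(ARRAY[5, 17, NULL, 83, 27]) AS num
WHERE num IS NOT NULL;
```

The sorted non-NULL values are 5, 17, 27, 83, so the result is (17 + 27) / 2 = 22.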

### #6

```sql
CREATE TABLE array_table (id integer, values integer[]);

INSERT INTO array_table VALUES (1, '{1,2,3}');
INSERT INTO array_table VALUES (2, '{4,5,6,7}');

-- Note: this assumes every array is already sorted in ascending order.
select id, values, cardinality(values) as array_length,
       (case when cardinality(values)%2=0 and cardinality(values)>1
             then (values[(cardinality(values)/2)] + values[((cardinality(values)/2)+1)])/2::float
             else values[(cardinality(values)+1)/2]::float
        end) as median
from array_table;
```

Alternatively, wrap the logic in a function and reuse it in other queries:

```sql
CREATE OR REPLACE FUNCTION median(a integer[])
RETURNS float AS $median$
DECLARE
  s   integer[];
  abc float;
BEGIN
  -- Sort a copy of the input first: the median is defined on sorted values.
  s := ARRAY(SELECT e FROM unnest(a) AS e ORDER BY e);
  SELECT (case when cardinality(s)%2=0 and cardinality(s)>1 then
            (s[(cardinality(s)/2)] + s[((cardinality(s)/2)+1)])/2::float
          else s[(cardinality(s)+1)/2]::float end) into abc;
  RETURN abc;
END;
$median$
LANGUAGE plpgsql;

select id, values, median(values) from array_table;
```
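If the arrays might not be sorted, one alternative on PostgreSQL 9.4+ (a swapped-in technique, not this answer's) is to unnest each array and apply `percentile_cont()` in a scalar subquery. The inline rows and the names `t` and `arr` are hypothetical, made up for illustration.

```sql
-- One median per row, computed from an unsorted integer array.
SELECT id,
       (SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY e)
        FROM unnest(arr) AS e) AS median
FROM (VALUES (1, '{3,1,2}'::int[]),
             (2, '{7,4,6,5}'::int[])) AS t(id, arr);
```

This should return 2 for `id` 1 and 5.5 for `id` 2, since `percentile_cont` sorts the unnested elements itself.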

### #7

Use the function below to find the nth percentile:

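```sql
CREATE or REPLACE FUNCTION nth_percentil(anyarray, int)
RETURNS anyelement AS
$$
  -- Nearest-rank percentile: element number ceil(p/100 * n), at least 1.
  -- Expects a sorted array, as produced by the ORDER BY in the usage query.
  SELECT $1[greatest(1, ceil($2 / 100.0 * array_length($1, 1))::int)];
$$
LANGUAGE SQL IMMUTABLE STRICT;
```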

In your case, that is the 50th percentile.

Use the query below to get the median:

```sql
SELECT nth_percentil(ARRAY(SELECT Field_name FROM table_name ORDER BY 1), 50);
```

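This returns the 50th percentile, which is essentially the median; for an even number of values, however, it returns one of the two middle values rather than their average.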
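On PostgreSQL 9.4+, the built-in ordered-set aggregates also cover arbitrary percentiles without a helper function (a different technique from the function above). `percentile_cont` accepts an array of fractions and returns an array of interpolated results, while `percentile_disc` returns an actual input value. A sketch with made-up data:

```sql
-- Quartiles (interpolated) and a discrete median in one pass over the data.
SELECT percentile_cont(ARRAY[0.25, 0.5, 0.75]::float8[])
         WITHIN GROUP (ORDER BY v) AS quartiles,
       percentile_disc(0.5) WITHIN GROUP (ORDER BY v) AS median_disc
FROM unnest(ARRAY[5, 1, 4, 2, 3]) AS v;
```

For the values 1 to 5 this should give `quartiles = {2,3,4}` and `median_disc = 3`. Neither aggregate requires the input to be pre-sorted into an array.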