箱ひげ図の中央値、下位四分位、上位四分位を計算するために、row_number を使用しようとしています。ただし、私のrow_numberソートは同点のためオフになっています。
サンプルデータは次のとおりです。
CREATE TABLE EStats
(
PersonID VARCHAR(30) NOT NULL,
Grade VARCHAR(25) NOT NULL,
CourseDate Date NOT NULL
);
INSERT INTO EStats
(
PersonID, Grade, CourseDate
)
VALUES
('100', '91', '2010-03-01'),
('101', '96', '2010-03-01'),
('102', '88', '2010-03-01'),
('103', '92', '2010-03-01'),
('104', '81', '2010-03-01'),
('105', '85', '2010-03-01'),
('106', '91', '2010-03-01'),
('107', '89', '2010-03-01'),
('108', '99', '2010-03-01'),
('109', '88', '2010-03-01'),
('110', '81', '2011-03-02'),
('111', '77', '2011-03-02'),
('112', '88', '2011-03-02'),
('113', '76', '2011-03-02'),
('114', '69', '2011-03-02'),
('115', '70', '2011-03-02'),
('116', '75', '2011-03-02'),
('117', '88', '2011-03-02'),
('118', '76', '2011-03-02'),
('119', '95', '2012-03-01'),
('120', '96', '2012-03-01'),
('121', '90', '2012-03-01'),
('122', '80', '2012-03-01'),
('123', '85', '2012-03-01'),
('124', '94', '2012-03-01'),
('125', '89', '2012-03-01'),
('126', '97', '2012-03-01'),
('127', '94', '2012-03-01'),
('128', '72', '2012-03-01'),
('129', '88', '2012-03-01'),
('130', '91', '2012-03-01')
これは、ソートが機能していないことを示す私の内部クエリの 1 つです。
SELECT
CourseDate,
Grade,
ROW_NUMBER() OVER (
PARTITION BY LEFT(CourseDate, 4)
ORDER BY Grade ASC) AS RowAsc,
ROW_NUMBER() OVER (
PARTITION BY LEFT(CourseDate, 4)
ORDER BY Grade DESC) AS RowDesc
FROM EStats
CourseDate 2010-03-01 の場合、RowAsc は次のことを行うことに注意してください。
10
9
8
6
7
5
3
4
2
1
ただし、偶数の数値が存在する場合に中央値を計算できるように、すべての行に連続した数値が必要です。(ランクとdense_rankは、それらが残す「ギャップ」のために機能しません)。
実は以下が全貌。繰り返しますが、blox プロット チャートの中央値、下位四分位数、上位四分位数、最小値、および最大値を計算しようとしています。どんな助けでも本当に感謝しています!
WITH Q3 AS
(
SELECT
CourseDate,
AVG(CAST(Grade AS Numeric)) AS Median
FROM
(
SELECT
CourseDate,
Grade,
ROW_NUMBER() OVER (
PARTITION BY LEFT(CourseDate, 4)
ORDER BY Grade ASC) AS RowAsc,
ROW_NUMBER() OVER (
PARTITION BY LEFT(CourseDate, 4)
ORDER BY Grade DESC) AS RowDesc
FROM EStats
)x
WHERE
RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
GROUP BY CourseDate
--ORDER BY CourseDate
),
Q2 AS
(
SELECT
x.CourseDate,
AVG(CAST(Grade AS Numeric)) AS LowerQuartile
FROM
(
SELECT
Estats.CourseDate,
Estats.Grade,
ROW_NUMBER() OVER (
PARTITION BY LEFT(EStats.CourseDate, 4)
ORDER BY Grade ASC) AS RowAsc,
ROW_NUMBER() OVER (
PARTITION BY LEFT(Estats.CourseDate, 4)
ORDER BY Grade DESC) AS RowDesc
FROM EStats JOIN Q3 on EStats.CourseDate = Q3.CourseDate
WHERE EStats.Grade < Q3.Median
)x
WHERE
RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
GROUP BY x.CourseDate
),
Q4 AS
(
SELECT
x.CourseDate,
AVG(CAST(Grade AS Numeric)) AS UpperQuartile
FROM
(
SELECT
Estats.CourseDate,
Estats.Grade,
ROW_NUMBER() OVER (
PARTITION BY LEFT(EStats.CourseDate, 4)
ORDER BY Grade ASC) AS RowAsc,
ROW_NUMBER() OVER (
PARTITION BY LEFT(Estats.CourseDate, 4)
ORDER BY Grade DESC) AS RowDesc
FROM EStats JOIN Q3 on EStats.CourseDate = Q3.CourseDate
WHERE EStats.Grade > Q3.Median
)x
WHERE
RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
GROUP BY x.CourseDate
)
SELECT Q3.CourseDate, Q3.Median AS Median, Q2.LowerQuartile, Q4.UpperQuartile, MIN(EStats.Grade) AS Min, MAX(EStats.Grade) AS Max
FROM Q3
JOIN Q2 ON Q3.CourseDate = Q2.CourseDate
JOIN Q4 ON Q3.CourseDate = Q4.CourseDate
JOIN EStats ON Q3.CourseDate = EStats.CourseDate
GROUP BY Q3.CourseDate, Q3.Median, Q2.LowerQuartile, Q4.UpperQuartile
ORDER BY Q3.CourseDate