mysql - Calculating overlap in MySQL

Question

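I'm trying to find out which classes overlap the most. The data is stored in MySQL, and each student has a completely separate row in the database for every class they take (I didn't set it up that way, and I can't change it). I've pasted a simplified version of the table below. In reality there are about 20 different courses.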

CREATE TABLE classes
(`student_id` int, `class` varchar(13));
INSERT INTO classes
(`student_id`, `class`)
VALUES
(55421, 'algebra'),
(27494, 'algebra'),
(64934, 'algebra'),
(65364, 'algebra'),
(21102, 'algebra'),
(90734, 'algebra'),
(20103, 'algebra'),
(57450, 'gym'),
(76411, 'gym'),
(24918, 'gym'),
(65364, 'gym'),
(55421, 'gym'),
(89607, 'world_history'),
(54522, 'world_history'),
(49581, 'world_history'),
(84155, 'world_history'),
(55421, 'world_history'),
(57450, 'world_history');

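Ultimately I would like to use Circos (background here), but I would be happy with any way to understand, and show people, where the overlap is greatest and where it is smallest. This is just off the top of my head, but I was thinking of an output table with one row and one column per course, with the number of overlapping students listed where two different classes intersect. Where a course intersects with itself, the cell could show the number of people who do not overlap with any other class.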

score 1 · Accepted Answer

You can do this by generating a result that represents links: src -> dst = nb, where nb is the number of students on that link.

1) Get the matrix

select c1.class src_class, c2.class dst_class
from (select distinct class from classes) c1
join (select distinct class from classes) c2
order by src_class, dst_class

You don't need "select distinct class" to generate the matrix; you could select the classes directly and use GROUP BY. Step 2, however, needs those unique values.
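As a rough illustration of the GROUP BY alternative mentioned above, the sketch below builds the same class-by-class matrix by cross-joining the raw table with itself and collapsing the duplicates. It assumes only the `classes` table from the question; the aliases `c1` and `c2` are kept from step 1.

```sql
-- Sketch: every (src, dst) pair of classes, without "select distinct".
-- The cross join pairs every row with every row, and GROUP BY then
-- collapses the repeated pairs to one row per pair of class names.
select c1.class src_class, c2.class dst_class
from classes c1
cross join classes c2
group by c1.class, c2.class
order by src_class, dst_class;
```

This form scans the full table on both sides, so on a large table the distinct subqueries from step 1 are likely to be cheaper.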

Result:

src_class      dst_class
-----------------------------
algebra        algebra
algebra        gym
algebra        world_history
gym            algebra
gym            gym
gym            world_history
world_history  algebra
world_history  gym
world_history  world_history

2) Join the list of students that match both source and destination

select c1.class src_class, c2.class dst_class, count(v.student_id) overlap
from (select distinct class from classes) c1
join (select distinct class from classes) c2
left join classes v on
(
    v.class = c1.class
    and v.student_id in (select student_id from classes
                         where class = c2.class)
)
group by src_class, dst_class
order by src_class, dst_class

Using the distinct values (from step 1) means every pair of classes is returned, even when there is no link between them (the count is 0 in that case).

Result:

src_class      dst_class      overlap
-------------------------------------
algebra        algebra           7
algebra        gym               2
algebra        world_history     1
gym            algebra           2
gym            gym               5
gym            world_history     2
world_history  algebra           1
world_history  gym               2
world_history  world_history     6
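For comparison, the same counts can be sketched as a plain self-join on `student_id`. This is not the query from the answer: it assumes each (student_id, class) pair appears at most once in `classes`, and unlike the left join version it returns no row at all for a pair of classes with zero shared students.

```sql
-- Sketch: count students shared by each pair of classes via a self-join.
-- Assumes (student_id, class) is unique; duplicate rows would inflate counts.
-- Pairs of classes with no student in common are simply absent from the result.
select a.class src_class, b.class dst_class, count(*) overlap
from classes a
join classes b on b.student_id = a.student_id
group by a.class, b.class
order by src_class, dst_class;
```

On the sample data every pair of classes shares at least one student, so this should return the same nine rows as the result above.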

3) Do a different computation when the classes are equal

select c1.class src_class, c2.class dst_class, count(v.student_id) overlap
from (select distinct class from classes) c1
join (select distinct class from classes) c2
left join classes v on
(
    v.class = c1.class and
    (
        -- When the classes are equal:
        -- students present only in that class
        (c1.class = c2.class
         and 1 = (select count(*) from classes
                  where student_id = v.student_id))
    or
        -- When the classes are different:
        -- students present in both classes
        (c1.class != c2.class
         and v.student_id in (select student_id from classes
                              where class = c2.class))
    )
)
group by src_class, dst_class
order by src_class, dst_class

Result:

src_class      dst_class      overlap
-------------------------------------
algebra        algebra           5
algebra        gym               2
algebra        world_history     1
gym            algebra           2
gym            gym               2
gym            world_history     2
world_history  algebra           1
world_history  gym               2
world_history  world_history     4
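The question asked for a table with one row and one column per course. One possible way to get there, sketched below, is to wrap the step 3 query in a derived table and pivot it with conditional aggregation. The derived-table alias `links` is a name chosen for this sketch, and the three class names are hard-coded from the sample data; with the real set of about 20 courses, one `sum(case ...)` line per course would be needed.

```sql
-- Sketch: pivot the src/dst/overlap links into a class-by-class matrix.
-- "links" is a hypothetical alias; the column list is hard-coded from the sample data.
select links.src_class,
       sum(case when links.dst_class = 'algebra'       then links.overlap else 0 end) as algebra,
       sum(case when links.dst_class = 'gym'           then links.overlap else 0 end) as gym,
       sum(case when links.dst_class = 'world_history' then links.overlap else 0 end) as world_history
from (
    -- Step 3 query: the diagonal counts students who take only that class
    select c1.class src_class, c2.class dst_class, count(v.student_id) overlap
    from (select distinct class from classes) c1
    cross join (select distinct class from classes) c2
    left join classes v on
    (
        v.class = c1.class and
        (
            (c1.class = c2.class
             and 1 = (select count(*) from classes
                      where student_id = v.student_id))
        or
            (c1.class != c2.class
             and v.student_id in (select student_id from classes
                                  where class = c2.class))
        )
    )
    group by c1.class, c2.class
) links
group by links.src_class
order by links.src_class;
```

With the sample data this should give three rows (algebra: 5, 2, 1; gym: 2, 2, 2; world_history: 1, 2, 4), which are the step 3 counts laid out as a matrix.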
