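I have a football pool website. Each week, my friends pick the winner of every game. I want to compare each player's picks against every other player's and list how similar they are. I found this page, which helped me calculate the similarity for a given week: Compare group of tags to find similarity/score with PHP/MySQL. Kudos to Ivar Bonsaksen; his solution worked great!
What I want to do now is show each player's cumulative similarity over the past several weeks.
I query three tables: Profiles (spprofiles), Games (sp6games), and Picks (sp6picks). I use another table, Teams (sp6teams), to get the team names, but it isn't relevant here.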
Profiles (spprofiles)
+-----------+-------------+
| profileID | profilename |
+-----------+-------------+
| 52 | My Team A |
| 53 | Some Team B |
+-----------+-------------+
Games (sp6games)
+--------+--------+---------+------+
| gameID | weekID | visitor | home |
+--------+--------+---------+------+
| 1 | 2 | 9 | 21 |
| 2 | 2 | 14 | 6 |
| 17 | 3 | 6 | 9 |
| 18 | 3 | 30 | 21 |
+--------+--------+---------+------+
Picks (sp6picks)
+-----------+--------+------+
| profileID | gameID | pick |
+-----------+--------+------+
| 52 | 1 | 21 |
| 52 | 2 | 6 |
| 52 | 17 | 12 |
| 52 | 18 | 21 |
| 53 | 1 | 9 |
| 53 | 2 | 6 |
| 53 | 17 | 9 |
| 53 | 18 | 21 |
+-----------+--------+------+
The query for the current week looks like this:
$weekID = 3; //the current weekID
$profile = 52; //the current ProfileID
SELECT
    targetProfiles.profileID AS targetID,
    sourceProfiles.profileID AS sourceID,
    COUNT(targetProfiles.profileID)
    /
    (((SELECT COUNT(*) FROM sp6Picks LEFT JOIN sp6Games USING (gameID) WHERE profileID = sourceProfiles.profileID AND weekID = $weekID)
      +
      (SELECT COUNT(*) FROM sp6Picks LEFT JOIN sp6Games USING (gameID) WHERE profileID = targetProfiles.profileID AND weekID = $weekID))/2)
    AS similarity
FROM
    spProfiles AS sourceProfiles
LEFT JOIN
    (SELECT sp6Picks.* FROM sp6Picks LEFT JOIN sp6Games USING (gameID) WHERE weekID = $weekID) AS sourcePicks
    ON (sourcePicks.profileID = sourceProfiles.profileID)
INNER JOIN
    (SELECT sp6Picks.* FROM sp6Picks LEFT JOIN sp6Games USING (gameID) WHERE weekID = $weekID) AS targetPicks
    ON (sourcePicks.pick = targetPicks.pick AND sourcePicks.profileID != targetPicks.profileID)
LEFT JOIN
    spProfiles AS targetProfiles
    ON (targetPicks.profileID = targetProfiles.profileID)
WHERE sourceProfiles.profileID = $profile
GROUP BY targetID
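To make the formula in that query concrete, here is a minimal plain-Python sketch of the same per-week calculation: the number of shared picks divided by the average number of picks the two players made that week. The in-memory dictionaries (`GAME_WEEK`, `PICKS`) and the function names are hypothetical stand-ins for the tables above, not part of the site's code. The SQL matches on the picked team alone; this sketch matches on game and team together, which gives the same answer within one week assuming each team plays at most once per week.

```python
from typing import Dict

# Hypothetical in-memory copy of sp6games: gameID -> weekID
GAME_WEEK: Dict[int, int] = {1: 2, 2: 2, 17: 3, 18: 3}

# Hypothetical in-memory copy of sp6picks: profileID -> {gameID: picked teamID}
PICKS: Dict[int, Dict[int, int]] = {
    52: {1: 21, 2: 6, 17: 12, 18: 21},
    53: {1: 9, 2: 6, 17: 9, 18: 21},
}


def picks_for_week(profile_id: int, week_id: int) -> Dict[int, int]:
    """Return one player's picks restricted to a single week."""
    return {
        game: pick
        for game, pick in PICKS[profile_id].items()
        if GAME_WEEK[game] == week_id
    }


def week_similarity(source_id: int, target_id: int, week_id: int) -> float:
    """Shared picks divided by the average number of picks both players made."""
    source = picks_for_week(source_id, week_id)
    target = picks_for_week(target_id, week_id)
    # A pick counts as shared only when both players chose the same team in the same game.
    matches = sum(1 for game, pick in source.items() if target.get(game) == pick)
    average_picks = (len(source) + len(target)) / 2
    return matches / average_picks if average_picks else 0.0


for week in (2, 3):
    print(week, week_similarity(52, 53, week))
```

With the sample data, both `week_similarity(52, 53, 2)` and `week_similarity(52, 53, 3)` return 0.5, which agrees with the per-week results below.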
Running this query for each week gives the following results:
$weekID = 2;
+----------+----------+------------+
| targetID | sourceID | similarity |
+----------+----------+------------+
| 53 | 52 | 0.5000 |
+----------+----------+------------+
$weekID = 3;
+----------+----------+------------+
| targetID | sourceID | similarity |
+----------+----------+------------+
| 53 | 52 | 0.5000 |
+----------+----------+------------+
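For reference, here is the arithmetic behind those numbers, reading the query's formula as shared picks divided by the average number of picks per player. In week 2 the only shared pick is game 2 (both chose team 6); in week 3 it is game 18 (both chose team 21). The last line assumes a cumulative score should count each shared pick once per game:

```latex
\text{similarity} = \frac{\text{shared picks}}{\left(n_{\text{source}} + n_{\text{target}}\right)/2}

\text{Week 2: } \frac{1}{(2 + 2)/2} = 0.5 \qquad \text{Week 3: } \frac{1}{(2 + 2)/2} = 0.5

\text{Weeks 2 and 3 combined: } \frac{1 + 1}{(4 + 4)/2} = 0.5
```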
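The cumulative query I've written so far looks like this (I've also tried a few other variations). Basically, I changed the WHERE clauses to include earlier weeks (weekID <= $weekID) and added the Games table to the main FROM clause (LEFT JOIN sp6games ON (targetPicks.gameID = sp6games.gameID)).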
$weekID = 3; //the current weekID
$profile = 52; //the current ProfileID
SELECT
    targetProfiles.profileID AS targetID,
    sourceProfiles.profileID AS sourceID,
    COUNT(targetProfiles.profileID)
    /
    (((SELECT COUNT(*) FROM sp6Picks LEFT JOIN sp6Games USING (gameID) WHERE profileID = sourceProfiles.profileID AND weekID <= $weekID)
      +
      (SELECT COUNT(*) FROM sp6Picks LEFT JOIN sp6Games USING (gameID) WHERE profileID = targetProfiles.profileID AND weekID <= $weekID))/2)
    AS similarity
FROM
    spProfiles AS sourceProfiles
LEFT JOIN
    (SELECT sp6Picks.* FROM sp6Picks LEFT JOIN sp6Games USING (gameID) WHERE weekID <= $weekID) AS sourcePicks
    ON (sourcePicks.profileID = sourceProfiles.profileID)
INNER JOIN
    (SELECT sp6Picks.* FROM sp6Picks LEFT JOIN sp6Games USING (gameID) WHERE weekID <= $weekID) AS targetPicks
    ON (sourcePicks.pick = targetPicks.pick AND sourcePicks.profileID != targetPicks.profileID)
LEFT JOIN
    spProfiles AS targetProfiles
    ON (targetPicks.profileID = targetProfiles.profileID)
LEFT JOIN sp6games ON (targetPicks.gameID = sp6games.gameID)
WHERE sourceProfiles.profileID = $profile
GROUP BY targetID, weekID
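One way to see where an extra match can come from is to load the sample rows into SQLite and count the source/target pairs that the INNER JOIN produces across weeks 2 and 3. This is a minimal sketch, assuming SQLite evaluates these simple joins the same way MySQL does; it leaves out the correlated denominator and the extra sp6games join and groups by target player only. `MATCH_QUERY` and `count_matches` are illustrative names. Because team 21 is the home team in both game 1 and game 18, joining on `pick` alone pairs player 52's game 1 pick with player 53's game 18 pick; adding a gameID condition removes that cross-game pair.

```python
import sqlite3

# Sample rows copied from the tables in the question.
SCHEMA = """
CREATE TABLE sp6games (gameID INTEGER PRIMARY KEY, weekID INTEGER, visitor INTEGER, home INTEGER);
CREATE TABLE sp6picks (profileID INTEGER, gameID INTEGER, pick INTEGER);
INSERT INTO sp6games VALUES (1, 2, 9, 21), (2, 2, 14, 6), (17, 3, 6, 9), (18, 3, 30, 21);
INSERT INTO sp6picks VALUES
  (52, 1, 21), (52, 2, 6), (52, 17, 12), (52, 18, 21),
  (53, 1, 9),  (53, 2, 6), (53, 17, 9),  (53, 18, 21);
"""

# Same join shape as the cumulative query; {extra} optionally adds a gameID condition.
MATCH_QUERY = """
SELECT t.profileID AS targetID, COUNT(*) AS matches
FROM (SELECT p.* FROM sp6picks p JOIN sp6games g USING (gameID) WHERE g.weekID <= :week) AS s
JOIN (SELECT p.* FROM sp6picks p JOIN sp6games g USING (gameID) WHERE g.weekID <= :week) AS t
  ON s.pick = t.pick AND s.profileID != t.profileID {extra}
WHERE s.profileID = :profile
GROUP BY t.profileID
"""


def count_matches(conn: sqlite3.Connection, join_on_game: bool) -> list:
    """Count joined pick pairs for source player 52 over weeks <= 3."""
    extra = "AND s.gameID = t.gameID" if join_on_game else ""
    query = MATCH_QUERY.format(extra=extra)
    return conn.execute(query, {"week": 3, "profile": 52}).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print("pick only:      ", count_matches(conn, join_on_game=False))
print("pick and gameID:", count_matches(conn, join_on_game=True))
```

Joining on `pick` alone yields 3 pairs for player 53, and joining on `pick` and `gameID` yields 2. Against the cumulative denominator of (4 + 4) / 2 = 4 picks, those counts give 0.75 and 0.5 respectively.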
The combined result should be 0.5000, but I get this instead:
$weekID = 3;
+----------+----------+------------+
| targetID | sourceID | similarity |
+----------+----------+------------+
| 53 | 52 | 0.7500 |
+----------+----------+------------+
The problem is that COUNT(targetProfiles.profileID) doesn't add up correctly across weeks, which throws off the similarity value. The query also doesn't seem very efficient for large datasets.
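On the efficiency point, one common alternative to correlated subqueries in the SELECT list is to compute each player's pick count once in a derived table and self-join the picks on both gameID and pick. The sketch below shows that shape in SQLite through Python, assuming the same tables and that "cumulative" means every week up to and including the current one. `CUMULATIVE_QUERY` and `cumulative_similarity` are illustrative names, and the query has not been checked against MySQL; note that `WITH` (common table expressions) requires MySQL 8.0 or later.

```python
import sqlite3

# Sample rows copied from the tables in the question.
SCHEMA = """
CREATE TABLE sp6games (gameID INTEGER PRIMARY KEY, weekID INTEGER, visitor INTEGER, home INTEGER);
CREATE TABLE sp6picks (profileID INTEGER, gameID INTEGER, pick INTEGER);
INSERT INTO sp6games VALUES (1, 2, 9, 21), (2, 2, 14, 6), (17, 3, 6, 9), (18, 3, 30, 21);
INSERT INTO sp6picks VALUES
  (52, 1, 21), (52, 2, 6), (52, 17, 12), (52, 18, 21),
  (53, 1, 9),  (53, 2, 6), (53, 17, 9),  (53, 18, 21);
"""

CUMULATIVE_QUERY = """
WITH picks AS (
    -- Every pick made up to and including the current week
    SELECT p.profileID, p.gameID, p.pick
    FROM sp6picks AS p
    JOIN sp6games AS g ON g.gameID = p.gameID
    WHERE g.weekID <= :week
),
pick_counts AS (
    -- Pick count per player over the same window, computed once
    SELECT profileID, COUNT(*) AS n
    FROM picks
    GROUP BY profileID
)
SELECT t.profileID AS targetID,
       s.profileID AS sourceID,
       COUNT(*) / ((sc.n + tc.n) / 2.0) AS similarity
FROM picks AS s
JOIN picks AS t
  ON t.gameID = s.gameID AND t.pick = s.pick AND t.profileID != s.profileID
JOIN pick_counts AS sc ON sc.profileID = s.profileID
JOIN pick_counts AS tc ON tc.profileID = t.profileID
WHERE s.profileID = :profile
GROUP BY t.profileID, s.profileID, sc.n, tc.n
"""


def cumulative_similarity(conn: sqlite3.Connection, week_id: int, profile_id: int) -> list:
    """Return (targetID, sourceID, similarity) rows for one source player."""
    params = {"week": week_id, "profile": profile_id}
    return conn.execute(CUMULATIVE_QUERY, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(cumulative_similarity(conn, week_id=3, profile_id=52))
```

On the sample data this returns a single row (53, 52, 0.5) for both week 2 and week 3. As in the original query, players who share no picks with the source player do not appear, because the self-join is an inner join.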
Thanks for taking the time to read this.