mysql - Get the first record from a joined table for each foreign key, without duplicate primary keys

Question

I have the following table structure.

Tags:
Tag_ID | Name
1      | Tag1
2      | Tag2
3      | Tag3
4      | Tag4
5      | Tag5
6      | Tag6

Posts:
Post_ID | Title | Body
1       | Post1 | Post1
2       | Post2 | Post2
3       | Post3 | Post3
4       | Post4 | Post4
5       | Post5 | Post5
6       | Post6 | Post6
7       | Post7 | Post7
8       | Post8 | Post8
9       | Post9 | Post9
10      | Post10| Post10

TagsPosts:
Tag_ID | Post_ID
1      | 1
1      | 2
1      | 3
1      | 4
1      | 5
1      | 10
1      | 1
2      | 1
2      | 2
2      | 6
2      | 7
3      | 4
3      | 8
3      | 9
4      | 7
5      | 1
5      | 2
5      | 3
5      | 4
5      | 5
5      | 6
5      | 7
6      | 2
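
For anyone who wants to experiment with the queries, the sample tables above could be loaded with a script along these lines. The column types and lengths are assumptions (the question does not state them), and TagsPosts is deliberately left without a primary key because the sample data lists the pair (1, 1) twice.

```sql
-- Sketch of the sample schema; INT / VARCHAR(50) are assumed types.
CREATE TABLE Tags (
    Tag_ID INT PRIMARY KEY,
    Name   VARCHAR(50) NOT NULL
);

CREATE TABLE Posts (
    Post_ID INT PRIMARY KEY,
    Title   VARCHAR(50) NOT NULL,
    Body    VARCHAR(50) NOT NULL
);

-- No primary key on the pair: the sample data repeats (1, 1).
CREATE TABLE TagsPosts (
    Tag_ID  INT NOT NULL,
    Post_ID INT NOT NULL,
    FOREIGN KEY (Tag_ID)  REFERENCES Tags (Tag_ID),
    FOREIGN KEY (Post_ID) REFERENCES Posts (Post_ID)
);

INSERT INTO Tags (Tag_ID, Name) VALUES
    (1, 'Tag1'), (2, 'Tag2'), (3, 'Tag3'), (4, 'Tag4'), (5, 'Tag5'), (6, 'Tag6');

INSERT INTO Posts (Post_ID, Title, Body) VALUES
    (1, 'Post1', 'Post1'), (2, 'Post2', 'Post2'), (3, 'Post3', 'Post3'),
    (4, 'Post4', 'Post4'), (5, 'Post5', 'Post5'), (6, 'Post6', 'Post6'),
    (7, 'Post7', 'Post7'), (8, 'Post8', 'Post8'), (9, 'Post9', 'Post9'),
    (10, 'Post10', 'Post10');

INSERT INTO TagsPosts (Tag_ID, Post_ID) VALUES
    (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 10), (1, 1),
    (2, 1), (2, 2), (2, 6), (2, 7),
    (3, 4), (3, 8), (3, 9),
    (4, 7),
    (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6), (5, 7),
    (6, 2);
```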

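What I need returned from the query is the top 3 Posts for the most common Tag, plus the top 1 Post for each of the remaining Tags in TagsPosts, without any duplicate Post.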

Desired Output:
Tag_ID | Post_ID
5      | 1
5      | 2
5      | 3
1      | 10
2      | 6
3      | 9
4      | 7

So far I have been able to identify the top 3 Posts for the most commonly used Tag.

SELECT Top(3) t.Tag_ID, p.Post_ID FROM Tags as t
INNER JOIN TagsPosts as tp ON t.Tag_ID = tp.Tag_ID
INNER JOIN Posts as p ON tp.Post_ID = p.Post_ID
WHERE t.Tag_ID IN (
    SELECT TOP(1) Tag_ID FROM TagsPosts GROUP BY Tag_ID ORDER BY COUNT(Tag_ID) DESC)

Result:
Tag_ID | Post_ID
5      | 1
5      | 2
5      | 3
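
A possible MySQL version of this first query is sketched below. MySQL has no TOP, so LIMIT is used instead, and since MySQL (as far as I know) still rejects LIMIT directly inside an IN (...) subquery, the most common tag is joined in as a derived table. Two choices here are assumptions rather than anything stated in the question: "top 3" is taken to mean the three lowest Post_ID values, and tags are counted by distinct posts, because the sample data lists (1, 1) twice and a plain row count would tie Tag 1 with Tag 5 at 7 rows each.

```sql
-- Sketch for MySQL; the alias top_tag is made up for illustration.
SELECT DISTINCT tp.Tag_ID, tp.Post_ID
FROM TagsPosts AS tp
INNER JOIN (
    -- The single most common tag, counted by distinct posts.
    SELECT Tag_ID
    FROM TagsPosts
    GROUP BY Tag_ID
    ORDER BY COUNT(DISTINCT Post_ID) DESC, Tag_ID
    LIMIT 1
) AS top_tag ON top_tag.Tag_ID = tp.Tag_ID
ORDER BY tp.Post_ID   -- assumed meaning of "top": lowest Post_ID first
LIMIT 3;
```

On the sample data this should pick Tag 5 and return the same three rows as the T-SQL result above.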

I have also identified the top 1 Post for each of the remaining Tags in use.

SELECT t.Tag_ID, p.Post_ID FROM Tags as t
INNER JOIN (
    SELECT t.Tag_ID, Max(p.Post_ID) as Post_ID FROM Tags as t
    INNER JOIN TagsPosts as tp ON t.Tag_ID = tp.Tag_ID
    INNER JOIN Posts as p ON tp.Post_ID = p.Post_ID
    WHERE t.Tag_ID NOT IN (
        SELECT TOP(1) Tag_ID FROM TagsPosts GROUP BY Tag_ID ORDER BY COUNT(Tag_ID) DESC)
    AND p.Post_ID NOT IN (
        SELECT Top(3) p.Post_ID FROM Tags as t
        INNER JOIN TagsPosts as tp ON t.Tag_ID = tp.Tag_ID
        INNER JOIN Posts as p ON tp.Post_ID = p.Post_ID
        WHERE t.Tag_ID IN (
            SELECT TOP(1) Tag_ID FROM TagsPosts GROUP BY Tag_ID ORDER BY COUNT(Tag_ID) DESC))
    GROUP BY t.Tag_ID) as s ON t.Tag_ID = s.Tag_ID
INNER JOIN Posts as p ON s.Post_ID = p.Post_ID

Result:
Tag_ID | Post_ID
1      | 10
2      | 7
3      | 9
4      | 7

This is almost there, but as you can see, it returns duplicate Posts.

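FYI, because I am not familiar with MySQL, I am using SQL Server 2008 Express for testing, but I have been asked to come up with a SQL query that can be applied to a MySQL database. I figured that if I could get the basic query working in T-SQL, converting it to the SQL used by MySQL would be fairly straightforward.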

score 0 · Accepted Answer

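Use a windowed function, put it in a CTE, and then reference it in a predicate, like so (using a simplified version of your data that runs as-is from SSMS). You listed SQL Server but not a version. I believe windowed functions run on SQL Server 2005 and later, but I'm not certain.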

declare @Tag table ( tagid int identity, name varchar(8));

insert into @Tag values ('Tag1'),('Tag2'),('Tag3'),('Tag4'),('Tag5'),('Tag6');

declare @Posts table (postid int identity, tagid int, postbody varchar(32));

insert into @Posts values (1,'Blah'),(1, 'Blahblah'),(2, 'Blahblah'),(3, 'Blahbodyblah'),(4, 'Blahblahblah'),(4, 'Blahbodyblah'),(4, 'Blah'),(5, 'Blah'),(5, 'Blahblah'),(6, 'Blahblah');

-- use a CTE
with a as 
    (
    select 
        p.postbody
    ,   count(t.tagid) as TimesTagged
        /* You said you want posts returned based on how often they occur.  I number the rows by
        the COUNT of tagids, descending (greatest first), starting from one.  If ties are possible and
        you want tied rows treated alike, consider DENSE_RANK.  You would have to insert more values so
        that third place becomes a tie to see how Rank, Dense_Rank, and Row_number differ.  Each has its
        purpose, but you should know what you want before choosing one.
        */
    ,   row_number() over(order by count(t.tagid) desc) as PositionOfCountsTaggedByGreatestOrderFirst
    ,   Rank() over(order by count(t.tagid) desc) as PositionOfCountsTaggedByGreatestOrderFirst_Ranking
    ,   Dense_Rank() over(order by count(t.tagid) desc) as PositionOfCountsTaggedByGreatestOrderFirst_DenseRanking
    from @Tag t 
        join @Posts p on t.tagid = p.tagid
    group by p.postbody
    )
select *
from a
-- I only use Row_Number; you can filter on one of the other ranking columns above if you wish.
where PositionOfCountsTaggedByGreatestOrderFirst <= 3
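
To make the tie behaviour mentioned in the comment above concrete, here is a small stand-alone sketch with made-up counts (the tags A to D and their values are purely illustrative, not taken from the question). It is written for MySQL 8.0 or later and should, I believe, also run on SQL Server.

```sql
-- Hypothetical data: B and C tie for second place.
WITH counts AS (
    SELECT 'A' AS tag, 3 AS times_tagged UNION ALL
    SELECT 'B', 2 UNION ALL
    SELECT 'C', 2 UNION ALL
    SELECT 'D', 1
)
SELECT tag,
       times_tagged,
       ROW_NUMBER() OVER (ORDER BY times_tagged DESC) AS row_num,   -- always unique
       RANK()       OVER (ORDER BY times_tagged DESC) AS rnk,       -- ties share a rank, then a gap
       DENSE_RANK() OVER (ORDER BY times_tagged DESC) AS dense_rnk  -- ties share a rank, no gap
FROM counts;
```

A gets 1 in all three columns. B and C get row numbers 2 and 3 (in either order) but share rank 2 and dense rank 2. D gets row number 4, rank 4, and dense rank 3. So a filter of <= 3 keeps three rows with ROW_NUMBER or RANK here, but all four with DENSE_RANK.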


/*
You said you only want the top three counts.
Windowed functions are better than TOP, IMHO, because you can express 'in' lists, medians, and other
selections explicitly instead of repeating nested selects.  The only downside is that you cannot put
a predicate on a windowed function directly.  You must compute it first and then apply the predicate
in a nested select, a CTE (as shown), a table variable, a temp table, etc.
*/
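
Applied to the question's own Tags/Posts/TagsPosts tables, the same pattern (compute a row number in a CTE, then filter on it outside) could look like the sketch below for MySQL 8.0 or later. The tie-breaking rule is my own heuristic, not something from the question or the answer: for each remaining tag, prefer the post that the fewest other remaining tags could also claim, then the highest Post_ID. All CTE and column names (links, top_tag, top_posts, candidates, ranked, tags_wanting_post, rn) are made up for illustration.

```sql
WITH links AS (              -- de-duplicated tag/post pairs
    SELECT DISTINCT Tag_ID, Post_ID FROM TagsPosts
),
top_tag AS (                 -- the single most common tag
    SELECT Tag_ID
    FROM links
    GROUP BY Tag_ID
    ORDER BY COUNT(*) DESC, Tag_ID
    LIMIT 1
),
top_posts AS (               -- its first three posts by Post_ID (assumed meaning of "top 3")
    SELECT l.Tag_ID, l.Post_ID
    FROM links AS l
    INNER JOIN top_tag AS tt ON tt.Tag_ID = l.Tag_ID
    ORDER BY l.Post_ID
    LIMIT 3
),
candidates AS (              -- other tags, minus posts already used above
    SELECT l.Tag_ID,
           l.Post_ID,
           COUNT(*) OVER (PARTITION BY l.Post_ID) AS tags_wanting_post
    FROM links AS l
    WHERE NOT EXISTS (SELECT 1 FROM top_tag   AS tt WHERE tt.Tag_ID  = l.Tag_ID)
      AND NOT EXISTS (SELECT 1 FROM top_posts AS tp WHERE tp.Post_ID = l.Post_ID)
),
ranked AS (                  -- per tag: least-contested post first, then highest Post_ID
    SELECT Tag_ID,
           Post_ID,
           ROW_NUMBER() OVER (PARTITION BY Tag_ID
                              ORDER BY tags_wanting_post, Post_ID DESC) AS rn
    FROM candidates
)
SELECT Tag_ID, Post_ID FROM top_posts
UNION ALL
SELECT Tag_ID, Post_ID FROM ranked WHERE rn = 1;   -- predicate applied outside the CTE
```

On the sample data this should give the seven pairs from the desired output (row order aside): Tag 2 gets Post 6 rather than Post 7 because Post 7 is also the only candidate for Tag 4, and Tag 6 drops out because its only post is already used by Tag 5. This is only a heuristic: with other data, two tags can still end up with the same post, so a guaranteed duplicate-free result would need a real assignment step, for example in application code.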
