python - Iterate and compare the first item with all the items in a dictionary

Question

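Please help, I can't seem to find a way to do this. I'm working on a web science project, and this is my third project using Python.

I need to compare the first item in a dictionary with all the other items in the same dictionary, but those other items are dictionaries themselves.

For example, I have a dictionary with the following values: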

{'25': {'Return of the Jedi (1983)': 5.0},
 '42': {'Batman (1989)': 3.0, 'E.T. the Extra-Terrestrial (1982)': 5.0},
 '8': {'Return of the Jedi (1983)': 5.0},
 '542': {'Alice in Wonderland (1951)': 3.0, 'Blade Runner (1982)': 4.0},
 '7': {'Alice in Wonderland (1951)': 3.0, 'Blade Runner (1982)': 4.0}}

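So in this case I need to check whether keys '25' and '42' contain the same movie, 'Return of the Jedi', whether '25' and '8' have the same movie, and so on. If they do, I need to know how many movies overlap.

This is just an example of the dictionary; the whole dictionary contains 1000 keys, and the sub-dictionaries are also much bigger.

I've tried iterating, comparing dictionaries, making copies, merging, and joining, but I can't seem to figure out how to do this.

Please help!

The problem is that I still can't compare both sub-dictionaries, because I need to find keys that have at least 2 of the same movies overall.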

score 2 · Accepted Answer

You can use collections.Counter:

>>> dic={'25': {'Return of the Jedi (1983)': 5.0}, '42': {'Batman (1989)': 3.0, 'E.T. the Extra-Terrestrial (1982)': 5.0}, '8': {'Return of the Jedi (1983)': 5.0 }}
>>> from collections import Counter
>>> c=Counter(movie  for v in dic.values() for movie in v)

>>> [k for k,v in c.items() if v>1] #returns the name of movies repeated more than once
['Return of the Jedi (1983)']
>>> c
Counter({'Return of the Jedi (1983)': 2,
         'Batman (1989)': 1,
         'E.T. the Extra-Terrestrial (1982)': 1})

To get the keys related to each movie, you can use collections.defaultdict:

>>> from collections import defaultdict
>>> movie_keys=defaultdict(list)
>>> for k,v in dic.items():
...     for movie in v:
...         movie_keys[movie].append(k)
...
>>> movie_keys
defaultdict(<type 'list'>, {'Batman (1989)': ['42'], 'Return of the Jedi (1983)': ['25', '8'], 'E.T. the Extra-Terrestrial (1982)': ['42']})
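The question also asks which keys share at least two movies. Building on the movie_keys mapping above, a minimal sketch could count shared titles per pair of keys. The names pair_counts, min_shared and matches are illustrative and not part of the original answer, and the data is the five-key example from the question:

```python
from collections import Counter, defaultdict
from itertools import combinations

# Five-key example dictionary from the question.
dic = {
    '25': {'Return of the Jedi (1983)': 5.0},
    '42': {'Batman (1989)': 3.0, 'E.T. the Extra-Terrestrial (1982)': 5.0},
    '8': {'Return of the Jedi (1983)': 5.0},
    '542': {'Alice in Wonderland (1951)': 3.0, 'Blade Runner (1982)': 4.0},
    '7': {'Alice in Wonderland (1951)': 3.0, 'Blade Runner (1982)': 4.0},
}

# Invert the data: movie title -> list of keys that rated it.
movie_keys = defaultdict(list)
for k, v in dic.items():
    for movie in v:
        movie_keys[movie].append(k)

# Every pair of keys listed under the same title shares that title,
# so count one shared movie per pair per title.
pair_counts = Counter()
for keys in movie_keys.values():
    # sorted() gives each pair a single canonical ordering.
    for pair in combinations(sorted(keys), 2):
        pair_counts[pair] += 1

# Keep only the pairs that share at least min_shared movies (assumed threshold).
min_shared = 2
matches = {pair: n for pair, n in pair_counts.items() if n >= min_shared}
print(matches)
```

This only visits pairs that actually share a title, which may help with 1000 keys, although a title rated by very many keys still generates many pairs.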

score 0 · Accepted Answer

Dictionaries don't really have a "first" item, but you can find all the keys that contain a particular movie like this:

movies = {}
for k in data:
    for movie in data[k]:
        movies.setdefault(movie, []).append(k)

The resulting movies dictionary will look like this:

{'Return of the Jedi (1983)': ['25', '8'], 'Batman (1989)': ['42'], ...}
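If you do want to compare one particular key against all the others, as the question describes, a hedged sketch is to intersect the key views of the sub-dictionaries. The helper name shared_movies is hypothetical. Note that since Python 3.7 dictionaries preserve insertion order, so next(iter(data)) returns the first key inserted; on older versions the order is arbitrary.

```python
# Five-key example dictionary from the question.
data = {
    '25': {'Return of the Jedi (1983)': 5.0},
    '42': {'Batman (1989)': 3.0, 'E.T. the Extra-Terrestrial (1982)': 5.0},
    '8': {'Return of the Jedi (1983)': 5.0},
    '542': {'Alice in Wonderland (1951)': 3.0, 'Blade Runner (1982)': 4.0},
    '7': {'Alice in Wonderland (1951)': 3.0, 'Blade Runner (1982)': 4.0},
}


def shared_movies(data, target):
    """Map every other key to the set of titles it shares with `target`."""
    target_titles = data[target].keys()
    # dict key views support set operations such as & (intersection).
    return {
        other: target_titles & ratings.keys()
        for other, ratings in data.items()
        if other != target
    }


first = next(iter(data))  # first inserted key on Python 3.7+
overlaps = shared_movies(data, first)

# Number of overlapping movies per key.
counts = {other: len(titles) for other, titles in overlaps.items()}
print(counts)
```

Filtering counts for values of at least 2 would then give the keys that share two or more movies with the chosen key.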

score 0 · Accepted Answer

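My answer only returns a dictionary containing 'title',['offender1',...] pairs for movies that have been seen more than once, i.e. 'E.T. the Extra-Terrestrial (1982)' will not be reported, while 'Return of the Jedi (1983)' will. This can be changed by simply returning overlaps in the solution instead of the result of the dictionary comprehension.

Where d is: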

d = {'25': {'Return of the Jedi (1983)': 5.0},
     '42': {'Batman (1989)': 3.0, 'E.T. the Extra-Terrestrial (1982)': 5.0},
     '8': {'Return of the Jedi (1983)': 5.0 },
     '542': {'Alice in Wonderland (1951)': 3.0, 'Blade Runner (1982)': 4.0},
     '7': {'Alice in Wonderland (1951)': 3.0,'Blade Runner (1982)': 4.0}
     }

The following:

from collections import defaultdict
import itertools
def findOverlaps(d):
    overlaps = defaultdict(list)
    for (parentKey,children) in d.items(): #children is the dictionary containing movie_title,rating pairs
        for childKey in children.keys(): #we're only interested in the titles not the ratings, hence keys() not items()
            overlaps[childKey].append(parentKey) #add the parent 'id' where the movie_title came from
    return dict(((overlap,offenders) for (overlap,offenders) in overlaps.items() if len(offenders) > 1)) #return a dictionary, only if the movie title had more than one 'id' associated with it
print(findOverlaps(d))

Produces:

>>> 
{'Blade Runner (1982)': ['7', '542'], 'Return of the Jedi (1983)': ['25', '8'], 'Alice in Wonderland (1951)': ['7', '542']}

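Reasoning behind the code:

Each entry in d represents id : { movie_title1: rating, movie_title2: rating }. Now suppose movie_title1 occurs in the values associated with two or more separate id keys. We would like to store

- the movie_title of each movie that was seen twice or more.
- the id keys associated with the values in which the movie was seen.

Therefore we would like a resulting dictionary like this:

{ movie_title1: ['id1', 'id2'], movie_title2: ['id2', 'id5'] }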
