python - Merging dictionaries by using the keys of one to scan the values of another

Question

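I need help merging two dictionaries by using the keys of one dictionary to check the values of the other. If the check returns true, I want to add that dictionary's own values to the other dictionary (updating it, but without overwriting existing values).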

Code (apologies, this is my first-ever custom script):

import re

otuid2clusteridlist = dict()
finallist = otuid2clusteridlist
clusterid2denoiseidlist = dict()

# First block; finallist (the same dict) is what all other blocks get appended into.
with open('cluster_97.ucm', 'r') as ucm:
    for line in ucm:
        lineArray = re.split(r'\s+', line)
        otuid = lineArray[0]
        clusterid = lineArray[3]
        if otuid in otuid2clusteridlist:
            otuid2clusteridlist[otuid].append(clusterid)
        else:
            otuid2clusteridlist[otuid] = list()
            otuid2clusteridlist[otuid].append(clusterid)

# Second block; this higher tier needs to expand the previous block's dict.
with open('denoise.ucm_test', 'r') as ucm:
    for line in ucm:
        lineArray = re.split(r'\s+', line)
        clusterid = lineArray[4]
        denoiseid = lineArray[3]
        if clusterid in clusterid2denoiseidlist:
            clusterid2denoiseidlist[clusterid].append(denoiseid)
        else:
            clusterid2denoiseidlist[clusterid] = list()
            clusterid2denoiseidlist[clusterid].append(denoiseid)

# Print for testing (will be converted to write out to a file later)
for key in finallist:
    print("OTU:", key, "has", len(finallist[key]), "sequence(s) which", "=", finallist[key])
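Both loops above follow the same "group column X under the key in column Y" pattern. A minimal sketch of that pattern using collections.defaultdict, assuming whitespace-separated columns as in the sample data further down; the helper name group_columns is hypothetical and not part of the original script:

```python
from collections import defaultdict


def group_columns(lines, key_col, value_col):
    """Group the values in value_col under the key found in key_col.

    Hypothetical helper: assumes whitespace-separated columns and
    silently skips blank lines.
    """
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()  # split on any whitespace, no trailing ''
        if not fields:
            continue  # ignore blank lines
        groups[fields[key_col]].append(fields[value_col])
    return dict(groups)


block1_lines = [
    "5 376 * DEX1.h_14175 DEX1.h_14175",
    "6 294 * GG13_45705 GG13_45705",
]
otuid2clusteridlist = group_columns(block1_lines, key_col=0, value_col=3)
print(otuid2clusteridlist)  # {'5': ['DEX1.h_14175'], '6': ['GG13_45705']}
```

With a real file, the same call would be group_columns(f, 0, 3) inside a with open('cluster_97.ucm') as f: block, since a file object iterates line by line.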

Block 1 returns:

OTU: 3 has 3 sequence(s) which = ['5PLAS.R2.h_35336', 'GG13_52054', 'GG13_798']
OTU: 5 has 1 sequence(s) which = ['DEX1.h_14175']
OTU: 4 has 1 sequence(s) which = ['PLAS.h_34150']
OTU: 7 has 1 sequence(s) which = ['DEX12.13.h_545']
OTU: 6 has 1 sequence(s) which = ['GG13_45705']

Block 2 returns:

OTU: GG13_45705 has 4 sequence(s) which = ['GG13_45705', 'GG13_6312', 'GG13_32148', 'GG13_35246']

So the goal is to add block 2's output to block 1. I'd like the result to look like this:

...
 OTU: 6 has 4 sequence(s) which = ['GG13_45705', 'GG13_6312', 'GG13_32148', 'GG13_35246']

I tried dict.update, but because the key does not exist in block 1, it just adds block 2's contents to block 1 as a separate entry.
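That behavior is expected: dict.update matches on keys only, so a key that exists only in block 2 is added as a new entry instead of being merged into OTU 6's list. A small illustration using the sample ids:

```python
# Block 1: OTU id -> list of cluster ids
block1 = {"6": ["GG13_45705"]}
# Block 2: cluster id -> list of denoise ids
block2 = {"GG13_45705": ["GG13_45705", "GG13_6312"]}

block1.update(block2)

# update() matched on keys only: "GG13_45705" is not a key of block1,
# so it was added as a new key and OTU "6" was left unchanged.
print(block1)
# {'6': ['GG13_45705'], 'GG13_45705': ['GG13_45705', 'GG13_6312']}
```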

I think my problem is more complicated: block 2 needs to look up its key among block 1's values and then append its values to that list.
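One way to express "look up block 2's key among block 1's values" is to build a reverse index from each cluster id to the OTU whose list contains it, then extend that OTU's list. A minimal sketch on the sample data, assuming each cluster id belongs to at most one OTU; merge_blocks is a hypothetical name:

```python
def merge_blocks(otu_to_clusters, cluster_to_denoise):
    """Append block 2 values to the block 1 list that contains the block 2 key.

    Assumes each cluster id appears under at most one OTU. Mutates and
    returns otu_to_clusters; existing values are never overwritten.
    """
    # Reverse index: cluster id -> OTU id whose list contains it
    cluster_to_otu = {}
    for otu, clusters in otu_to_clusters.items():
        for cluster in clusters:
            cluster_to_otu[cluster] = otu

    for cluster, denoise_ids in cluster_to_denoise.items():
        otu = cluster_to_otu.get(cluster)
        if otu is None:
            continue  # cluster id never appears in block 1; skip it
        target = otu_to_clusters[otu]
        for denoise_id in denoise_ids:
            if denoise_id not in target:  # avoid duplicate entries
                target.append(denoise_id)
    return otu_to_clusters


block1 = {"5": ["DEX1.h_14175"], "6": ["GG13_45705"]}
block2 = {"GG13_45705": ["GG13_45705", "GG13_6312", "GG13_32148", "GG13_35246"]}
merged = merge_blocks(block1, block2)
print(merged["6"])  # ['GG13_45705', 'GG13_6312', 'GG13_32148', 'GG13_35246']
```

The reverse index is built once up front, so each block 2 lookup is a single dictionary access rather than a scan over every block 1 list.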

I've been trying for loops and .append (similar to the code I've already written), but I don't know enough Python overall to solve this.

Any ideas?

Addendum:

A subset of the data:

cluster_97.ucm (the file for block 1):

5 376 * DEX1.h_14175 DEX1.h_14175
6 294 * GG13_45705 GG13_45705
0 447 98.7 DEX22.h_37221 DEX29.h_4583
1 367 98.9 DEX14.15.h_35477 DEX27.h_779
1 443 98.4 DEX27.h_3794 DEX27.h_779
0 478 97.9 DEX22.h_7519 DEX29.h_4583

denoise.ucm_test (the file for block 2):

11 294 * GG13_45705 GG13_45705
11 278 99.6 GG13_6312 GG13_45705
11 285 99.6 GG13_32148 GG13_45705
11 275 99.6 GG13_35246 GG13_45705

I chose these subsets because line 2 of file 1 is the entry that file 2 updates.

That's in case anyone wants to give it a try.

score 0 · Accepted Answer

Updated to reflect the value matching...

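I think the solution to your problem lies in the fact that lists are mutable in Python, and a variable holding a mutable value is just a reference to it. That means you can use a second dictionary that maps each value to the list it belongs to.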
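The key point is that two dictionaries can hold references to the same list object, so appending through one is visible through the other. A tiny demonstration with the sample ids:

```python
otu_to_clusters = {"6": ["GG13_45705"]}

# Store a reference (not a copy) to OTU 6's list under its cluster id
known_clusters = {"GG13_45705": otu_to_clusters["6"]}

# Appending through known_clusters mutates the shared list...
known_clusters["GG13_45705"].append("GG13_6312")

# ...so the change is visible through otu_to_clusters as well
print(otu_to_clusters["6"])  # ['GG13_45705', 'GG13_6312']
print(known_clusters["GG13_45705"] is otu_to_clusters["6"])  # True
```

If a copy were stored instead (for example list(otu_to_clusters["6"])), the append would not reach otu_to_clusters, which is why the answer stores the list itself.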

import re

otuid2clusteridlist = dict()
finallist = otuid2clusteridlist
clusterid2denoiseidlist = dict()
known_clusters = dict()

# First block; finallist (the same dict) is what all other blocks get appended into.
with open('cluster_97.ucm', 'r') as ucm:
    for line in ucm:
        lineArray = re.split(r'\s+', line)
        otuid = lineArray[0]
        clusterid = lineArray[3]
        if otuid in otuid2clusteridlist:
            otuid2clusteridlist[otuid].append(clusterid)
        else:
            otuid2clusteridlist[otuid] = list()
            otuid2clusteridlist[otuid].append(clusterid)

        # Remember the clusters: map each cluster id to the OTU's list itself
        known_clusters[clusterid] = otuid2clusteridlist[otuid]

# Second block; this higher tier needs to expand the previous block's dict.
with open('denoise.ucm_test', 'r') as ucm:
    for line in ucm:
        lineArray = re.split(r'\s+', line)
        clusterid = lineArray[4]
        denoiseid = lineArray[3]
        if clusterid in clusterid2denoiseidlist:
            clusterid2denoiseidlist[clusterid].append(denoiseid)
        else:
            clusterid2denoiseidlist[clusterid] = list()
            clusterid2denoiseidlist[clusterid].append(denoiseid)

        # Match the cluster and update the shared list as needed
        matched_cluster = known_clusters.setdefault(clusterid, [])
        if denoiseid not in matched_cluster:
            matched_cluster.append(denoiseid)

# Print for testing (will be converted to write out to a file later)
for key in finallist:
    print("OTU:", key, "has", len(finallist[key]), "sequence(s) which", "=", finallist[key])

I wasn't sure whether you still needed clusterid2denoiseidlist, so I kept it and added a new known_clusters dictionary to hold the mapping from values to lists.

I'm not sure this covers every edge case of your real problem, but it produces the desired output for the test input you provided.
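To check that claim without the real files, here is a sketch that runs the same known_clusters idea on the question's sample rows, held in io.StringIO objects in place of cluster_97.ucm and denoise.ucm_test. Unlike the answer's setdefault call, this version skips cluster ids that never appear in block 1 rather than creating an orphan list for them; that is an assumption about the desired behavior:

```python
import io
from collections import defaultdict

# In-memory stand-ins for the two files (rows copied from the question)
cluster_file = io.StringIO(
    "5 376 * DEX1.h_14175 DEX1.h_14175\n"
    "6 294 * GG13_45705 GG13_45705\n"
    "0 447 98.7 DEX22.h_37221 DEX29.h_4583\n"
    "1 367 98.9 DEX14.15.h_35477 DEX27.h_779\n"
    "1 443 98.4 DEX27.h_3794 DEX27.h_779\n"
    "0 478 97.9 DEX22.h_7519 DEX29.h_4583\n"
)
denoise_file = io.StringIO(
    "11 294 * GG13_45705 GG13_45705\n"
    "11 278 99.6 GG13_6312 GG13_45705\n"
    "11 285 99.6 GG13_32148 GG13_45705\n"
    "11 275 99.6 GG13_35246 GG13_45705\n"
)

finallist = defaultdict(list)  # OTU id -> cluster ids (block 1)
known_clusters = {}            # cluster id -> the same list object

# Block 1: column 0 is the OTU id, column 3 the cluster id
for line in cluster_file:
    fields = line.split()
    if not fields:
        continue
    otuid, clusterid = fields[0], fields[3]
    finallist[otuid].append(clusterid)
    known_clusters[clusterid] = finallist[otuid]

# Block 2: column 4 is the cluster id, column 3 the denoise id
for line in denoise_file:
    fields = line.split()
    if not fields:
        continue
    clusterid, denoiseid = fields[4], fields[3]
    matched = known_clusters.get(clusterid)  # None if not seen in block 1
    if matched is not None and denoiseid not in matched:
        matched.append(denoiseid)

for key in finallist:
    print("OTU:", key, "has", len(finallist[key]), "sequence(s) which", "=", finallist[key])
```

With this input, OTU 6 ends up with the four GG13 ids shown in the question's desired output, while the other OTUs keep only their block 1 values.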
