python - ネストされた辞書に次の条件を追加する

Question

リストの使用から辞書の使用に移行しました。しかし、何度も試行錯誤した結果、コードに少し混乱したので、ここに質問を投稿することにしました。

次のようなファイルがあります: [ファイル名 x50]:

10403   D   chr1    38
10616   D   chr1    22
14465   I   chr1    17
15171   I   chr1    17
16586   I   chr1    17
17237   I   chr1    17
26352   I   chr1    17
26752   I   chr1    17
27669   I   chr1    17
29097   I   chr1    23
29171   I   chr1    17
29902   D   chr1    11
32023   I   chr1    65

およびファイル（分子）：

     16 13829325    I   chrY    10
     13 49157314    I   chr4    10
      8 13867496    I   chrY    16
      7 49102622    I   chr4    10
      7 224204410   I   chr1    10
      7 10800122    I   chr21   15
      6 49157651    I   chr4    10
      6 33866712    I   chr16   10
      6 224204343   I   chr1    10
      6 13869965    I   chrY    10
      6 13838848    I   chrY    15
      6 13829325    I   chrY    15
      5 89876424    I   chr2    10
      5 61758798    I   chr7    23
      5 58977653    I   chrY    10
      5 49157099    I   chr4    10
      5 38806712    D   chr10   10
      5 29829794    I   chr20   10

私のコードは次のようになります。

def round_down(num):
    return num - (num%100000)


x50 = open('path','r')
moleculo = open('path','r')
out = open('path','w')


x50_region = {}
ct=0
for region in x50:
    #print region

    (pos,types,chrm,leng) = region.strip().split()

    start = int(pos)-20
    end = int(pos)+20
    rounded_start=round_down(start)

    if not (chrm in x50_region):
        x50_region[chrm]={}
    if not (types in x50_region[chrm]):
        x50_region[chrm][types]={}
    if not (rounded_start in x50_region[chrm][types]):
        x50_region[chrm][types][rounded_start]= []

    x50_region[chrm][types][rounded_start].append({'start':start,'end':end})


c=0
for moleculo_line in moleculo:
    if (c % 1000 == 0):print "c ",c
    c=c+1
    moleculo_data = moleculo_line.strip().split()
    moleculo_chrom = moleculo_data[3]
    moleculo_types = moleculo_data[2]
    moleculo_len = int(moleculo_data[4])
    moleculo_start_pos = int(moleculo_data[1])
    moleculo_end_pos = int(moleculo_data[1]) + int(moleculo_data[4])
    rounded_moleculo_start_position = round_down(moleculo_start_pos)
    rounded_moleculo_end_position = round_down(moleculo_end_pos)
    moleculo_len = int(moleculo_data[4])

    if moleculo_chrom in x50_region and rounded_moleculo_start_position in x50_region[moleculo_chrom][moleculo_types]:
        for region in x50_region[moleculo_chrom][moleculo_types][rounded_moleculo_start_position]:
            #print i,moleculo_len
            if region['start'] <= moleculo_start_pos and moleculo_start_pos <= region['end']:
                out.write(str(moleculo_line))
out.close()

アイデアは、ファイルをフィルタリングすることです。

ここで、別の条件を追加したいと思いました。私は、moleculo_len という変数を持っています。2 番目のファイルには、私のイベントの長さが含まれています。x50ファイルでこれに対応する値はlengと呼ばれます (pos,types,chrm,leng) = region.strip().split()

フィルタリングに別のステップを追加して、moleculo_len が長さの値の >= 50% にあるときに行が書き込まれるようにするにはどうすればよいですか?

質問が明確に述べられていることを願っています

python - ネストされた辞書に次の条件を追加する

0 に答える 0

Related

Reference