
I need to work with a huge text file that contains data as blocks separated by blank lines. It looks like this:

>3D_helix;140
protein_name:AChR pore alpha subunit (Torpedo marmorata)
file_name:ACh_pore_alpha.txt
entry_date:3july03
refman_number:21022
endnote_number:
author:Miyazawa,A., Fujiyoshi,Y., Unwin,N.(2003) [Structure and gating mechanism of the acetylcholine receptor pore] {Nature, 423, 949-955}
remarks:Sequence is from PDB, chain A. There is additional 24 AA as signal sequence in Swiss-Prot.  TMhelices=4.
pir_number:
Swiss_Prot_entry:ACHA_TORMA
Swiss_Prot_number:P02711
Swiss_Prot_gene:CHRNA1
Swiss_Prot_name:Acetylcholine receptor subunit alpha
PDB_title:Acetylcholine Receptor Protein, alpha Chain
PDB_Identifier:1OED
N_terminal:in
number_tmsegs:4
tm_segments:A.211,237;B.243,271;C.275,300;D.403,436
sequence:SEHETRLVANLLENYNKVIRPVEHHTHFVDITVGLQLIQLINVDEVNQIVETNVRLRQQWIDVRLRWNPADYGGIKKIRLPSDDVWLPDLVLYNNADGDFAIVHMTKLLLDYTGKIMWTPPAIFKSYCEIIVTHFPFDQQNCTMKLGIWTYDGTKVSISPESDRPDLSTFMESGEWVMKDYRGWKHWVYYTCCPDTPYLDITYHFIMQRIPLYFVVNVIIPCLLFSFLTVLVFYLPTDSGEKMTLSISVLLSLTVFLLVIVELIPSTSSAVPLIGKYMLFTMIFVISSIIVTVVVINTHHRSPSTHTMPQWVRKIFINTIPNVMFFSTMKRASKEKQENKIFADDIDISDISGKQVTGEVIFQTPLIKNPDVKSAIEGVKYIAEHMKSDEESSNAAEEWKYVAMVIDHILLCVFMLICIIGTVSVFAGRLIELSQEG*

>1D_helix;141
protein_name:AChR pore beta subunit (Torpedo marmorata)
file_name:ACh_pore_beta.txt
entry_date:3july03
refman_number:21022
endnote_number:
author:Miyazawa,A., Fujiyoshi,Y., Unwin,N.(2003) [Structure and gating mechanism of the acetylcholine receptor pore] {Nature, 423, 949-955}
remarks:Sequence is from PDB, chain B. There is additional 24 AA as signal sequence in Swiss-Prot. TMhelices=4.
pir_number:
Swiss_Prot_entry:Q6S3I0_TORMA
Swiss_Prot_number:Q6S3I0
Swiss_Prot_gene:none
Swiss_Prot_name:Acetylcholine receptor beta subunit
PDB_title:Acetylcholine Receptor Protein, beta Chain
PDB_Identifier:1OED
N_terminal:in
number_tmsegs:4
tm_segments:A.224,241;B.249,274;C.290,306;D.438,462
sequence:SVMEDTLLSVLFENYNPKVRPSQTVGDKVTVRVGLTLTSLLILNEKNEEMTTSVFLNLAWTDYRLQWDPAAYEGIKDLSIPSDDVWQPDIVLMNNNDGSFEITLHVNVLVQHTGAVSWHPSAIYRSSCTIKVMYFPFDWQNCTMVFKSYTYDTSEVILQHALDAKGEREVKEIMINQDAFTENGQWSIEHKPSRKNWRSDDPSYEDVTFYLIIQRKPLFYIVYTIVPCILISILAILVFYLPPDAGEKMSLSISALLALTVFLLLLADKVPETSLSVPIIISYLMFIMILVAFSVILSVVVLNLHHRSPNTHTMPNWIRQIFIETLPPFLWIQRPVTTPSPDSKPTIISRANDEYFIRKPAGDFVCPVDNARVAVQPERLFSEMKWHLNGLTQPVTLPQDLKEAVEAIKYIAEQLESASEFDDLKKDWQYVAMVADRLFLYIFITMCSIGTFSIFLDASHNVPPDNPFA*

>3D_other;143
protein_name:AChR pore delta subunit (Torpedo marmorata)
file_name:ACh_pore_delta.txt
entry_date:4dec03
refman_number:21022
endnote_number:
author:Miyazawa,A., Fujiyoshi,Y., Unwin,N.(2003) [Structure and gating mechanism of the acetylcholine receptor pore] {Nature, 423, 949-955}
remarks:Sequence is from PDB, chain C. Sequence in PDB has first 21 AA removed relative to Swiss-Prot. TMhelices=4.
pir_number:
Swiss_Prot_entry:Q6S3H8_TORMA
Swiss_Prot_number:Q6S3H8
Swiss_Prot_gene:none
Swiss_Prot_name:Acetylcholine receptor delta subunit
PDB_title:Acetylcholine Receptor Protein, delta Chain
PDB_Identifier:1OED
N_terminal:in
number_tmsegs:4
tm_segments:A.226,253;B.257,285;C.289,316;D.452,483
sequence:VNEEERLINDLLIVNKYNKHVRPVKHNNEVVNIALSLTLSNLISLKETDETLTTNVWMDHAWYDHRLTWNASEYSDISILRLRPELIWIPDIVLQNNNDGQYNVAYFCNVLVRPNGYVTWLPPAIFRSSCPINVLYFPFDWQNCSLKFTALNYNANEISMDLMTDTIDGKDYPIEWIIIDPEAFTENGEWEIIHKPAKKNIYGDKFPNGTNYQDVTFYLIIRRKPLFYVINFITPCVLISFLAALAFYLPAESGEKMSTAICVLLAQAVFLLLTSQRLPETALAVPLIGKYLMFIMSLVTGVVVNCGIVLNFHFRTPSTHVLSTRVKQIFLEKLPRILHMSRVDEIEQPDWQNDLKLRRSSSVGYISKAQEYFNIKSRSELMFEKQSERHGLVPRVTPRIGFGNNNENIAASDQLHDEIKSGIDSTNYIVKQIKEKNAYDEEVGNWNLVGQTIDRLSMFIITPVMVLGTIFIFVMGNFNRPPAKPFEGDPFDYSSDHPRCA

Each block starts with one of the three headers shown, and the number of lines per block varies. I want to split the file into three parts (or three separate files) as follows:

part 1 contains all blocks starting with >3D_helix
part 2 contains all blocks starting with >1D_helix
part 3 contains all blocks starting with >3D_other
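A minimal sketch of that split kept in memory, assuming blocks are separated by one or more blank lines and that the tag is the text between `>` and the first `;` of each header. It uses `itertools.groupby` to pull out runs of non-blank lines and collects each block under its tag; `split_blocks` is a hypothetical name. Because every block is held in memory, this suits files that fit in RAM; for a truly huge file, a line-by-line writer is the safer choice.

```python
import itertools
from collections import defaultdict


def split_blocks(lines):
    """Group blank-line-separated blocks by the tag in their header line."""
    groups = defaultdict(list)
    # groupby yields alternating runs of non-blank (True) and blank (False) lines
    for is_text, run in itertools.groupby(lines, key=lambda line: bool(line.strip())):
        if not is_text:
            continue  # skip the blank separator lines
        block = list(run)
        header = block[0]
        if header.startswith('>'):
            tag = header[1:].split(';', 1)[0]  # e.g. '3D_helix'
            groups[tag].append(''.join(block))
    return dict(groups)
```

For example, `split_blocks(open('input.txt'))` would return a dict with the keys `'3D_helix'`, `'1D_helix'` and `'3D_other'`, each mapped to a list of block strings.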

I tried the following:

import sys

prot_file = open(sys.argv[1], "r")
flag = False
for line in prot_file:
    if line.startswith(">3D_other"):
        flag == True   # bug: '==' compares and discards the result; '=' would assign
    if flag == True:
        print line

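However, it prints only the first line, i.e. the 3D_helix one. Most of the tips I found online split a list into blocks based on the block size (i.e. the size is known to be fixed at some number, say 13). In my case the size is not known, so I can't use them. I need an efficient, Pythonic way to split the file as described.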
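A minimal corrected sketch of the flag idea from the attempt above, assuming every block header starts with `>`. The flag is assigned with `=` at each header, so it switches on for a matching block and off again when a block with a different tag begins. `select_blocks` is a hypothetical helper name, and `sys.stdout.write` is used so the sketch runs on Python 2 and 3.

```python
import sys


def select_blocks(lines, prefix):
    """Yield every line of each block whose header starts with prefix."""
    flag = False
    for line in lines:
        if line.startswith('>'):
            # a header starts a new block: flag is on only for matching headers
            flag = line.startswith(prefix)
        if flag:
            yield line


if __name__ == '__main__' and len(sys.argv) > 1:
    with open(sys.argv[1]) as prot_file:
        for line in select_blocks(prot_file, '>3D_other'):
            sys.stdout.write(line)
```

The blank line that follows a matching block is also yielded, which keeps the selected blocks visually separated in the output.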


This is the solution I came up with:

#!/usr/bin/env python

INPUT_FILE = 'input.txt'
OUT_3D_HELIX = 'out_3dhelix.txt'
OUT_1D_HELIX = 'out_1dhelix.txt'
OUT_3D_OTHER = 'out_3dother.txt'

f_input = open(INPUT_FILE, 'r')
out_3dhelix = open(OUT_3D_HELIX, 'w')
out_1dhelix = open(OUT_1D_HELIX, 'w')
out_3dother = open(OUT_3D_OTHER, 'w')

dest_file = None
starting = True

try:
    for line in f_input:
        if starting:
            ## We are at a block start
            if line.startswith('>3D_helix;'):
                dest_file = out_3dhelix
            elif line.startswith('>1D_helix;'):
                dest_file = out_1dhelix
            elif line.startswith('>3D_other;'):
                dest_file = out_3dother
            else:
                continue   # Invalid line -- not a block beginning
            starting = False

        if not line.strip():  # Line is blank -- block end
            starting = True
            dest_file = None
            continue

        if dest_file is not None:  # It should never be None at this point
            dest_file.write(line)

finally:
    ## Close files...
    f_input.close()
    out_3dhelix.close()
    out_1dhelix.close()
    out_3dother.close()

Basically, I read the whole file line by line, detecting each "block start" so I can switch the destination file that the following lines are written to.
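The same block-start detection can be sketched with the files managed by `contextlib.ExitStack` (Python 3.3+), so every file is closed even on error without a long `finally` clause, and with a dict mapping each header prefix to its output so adding another block type is a one-line change. `split_file` and the `outputs` mapping are hypothetical names.

```python
import contextlib


def split_file(input_path, outputs):
    """Copy each blank-line-separated block to the file mapped to its header prefix."""
    with contextlib.ExitStack() as stack:
        f_input = stack.enter_context(open(input_path))
        handles = {prefix: stack.enter_context(open(name, 'w'))
                   for prefix, name in outputs.items()}
        dest = None
        for line in f_input:
            if not line.strip():      # blank line: the current block has ended
                dest = None
                continue
            if dest is None:          # looking for a block start
                for prefix, handle in handles.items():
                    if line.startswith(prefix):
                        dest = handle
                        break
                else:
                    continue          # not a recognized header: skip the line
            dest.write(line)
```

For example, `split_file('input.txt', {'>3D_helix;': 'out_3dhelix.txt', '>1D_helix;': 'out_1dhelix.txt', '>3D_other;': 'out_3dother.txt'})`. Like the loop above, it drops the blank separator lines; write a `'\n'` when a block ends if the output should keep them.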

answered 2013-03-19T01:28:51.493