python - How to make multiple changes to multiple strings in a file and write the result to a new file

Question

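I am new to Python programming and have a fasta file that I want to parse for use with a specific piece of software. The file contains two kinds of lines: 1) a sequence identifier and a taxonomy separated by a space, where the species name at the end of the taxonomy may itself contain spaces; 2) the DNA sequence (see the example below):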

>123876987 Bacteria;test;test;test test test
ATCTGCTGCATGCATGCATCGACTGCATGAC
>239847239 Bacteria;test;test;test1 test1 test1
ACTGACTGCTAGTACGATCGCTGCTGCATGACTGAC

After a lot of struggle and some help, I managed to parse the fasta file into a taxonomy file that contains only the sequence ID and the taxonomy:

123876987 Bacteria;test;test;test test test
239847239 Bacteria;test;test;test1 test1 test1

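However, the software I use requires the taxonomy file to be formatted in a particular way. For the taxonomy file I need to: 1) remove the '>' from each fasta header, 2) separate the identifier and the taxonomy in each sequence header with a tab (i.e. replace the first space in the string with a tab), and 3) replace all spaces in the taxonomy string with '_' and end the taxonomy with a semicolon (see the example below):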

123876987    Bacteria;test;test;test_test_test;
239847239    Bacteria;test;test;test1_test1_test1;

I am trying to do this by modifying my working script:

with open("test.fasta", "r") as fasta, open("test.tax", "w") as tax:
    while True:
        SequenceHeader= fasta.readline()
        Sequence= fasta.readline()
        if SequenceHeader == '':
            break
        tax.write(SequenceHeader.replace('>', ''))

which I changed to the following:

with open("test.fasta", "r") as fasta, open("clean_corrected.tax", "w") as tax:
    while True:
        SequenceHeader= fasta.readline()
        Sequence= fasta.readline()      
        old = {'>',' '}
        new = {'','_'}
        CorrectedHeader = SequenceHeader.replace('old','new')
        if SequenceHeader == '':
            break
        tax.write(CorrectedHeader)

But this does not work at all. Does anyone know how I can do this?

Thank you very much for your help!

score 2 · Accepted Answer

The following should work:

with open("test.fasta", "r") as fasta, open("test.tax", "w") as tax:
    for line in fasta:
        if line.startswith('>'):
            line = line[1:]                   # remove the '>' from start of line
            line = line.replace(' ', '\t', 1) # replace first space with a tab
            line = line.replace(' ', '_')     # replace remaining spaces with '_'
            line = line.strip() + ';\n'       # add ';' to the end
            tax.write(line)                   # write to the output file
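
As for why the attempt in the question changes nothing: str.replace takes literal strings, so replace('old','new') searches for the three characters "old" rather than using the variables old and new, and sets have no order that could pair '>' with '' in any case. The sketch below shows this and wraps the same steps as above in a small function so they can be checked on a single header. The name format_header is only an illustration, and it assumes every header contains at least one space between the identifier and the taxonomy.

```python
def format_header(header: str) -> str:
    """Turn one FASTA header line into a taxonomy line (hypothetical helper)."""
    # Drop the leading '>' and the trailing newline.
    text = header.lstrip('>').strip()
    # Split at the first space only: identifier on the left, taxonomy on the right.
    identifier, _, taxonomy = text.partition(' ')
    # Remaining spaces belong to the taxonomy and become underscores.
    taxonomy = taxonomy.replace(' ', '_')
    # Join with a tab and terminate the taxonomy with a semicolon.
    return identifier + '\t' + taxonomy + ';'


header = ">123876987 Bacteria;test;test;test test test\n"

# The quoted 'old' is a literal string that never occurs in the header,
# so the call from the question returns the header unchanged.
assert header.replace('old', 'new') == header

print(format_header(header))
```

Inside the loop, tax.write(format_header(line) + '\n') would then replace the four line = ... statements.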
