python - How do I upload multiple sequences to BLAST using Biopython?

Question

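I'm trying to run a BLASTN search for multiple sequences from a single FASTA file. I can easily query a single sequence from the file, but I'm struggling to query all of the sequences in one file. Because these are relatively short reads, I'd rather not split the file into individual sequences and query each one separately.

This is what I've tried so far: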

from Bio import SeqIO
from Bio.Blast import NCBIWWW

f_iterator = SeqIO.parse("file.fasta", "fasta")
f_record = next(f_iterator)  # first record only; .next() is Python 2 syntax
result_handle = NCBIWWW.qblast("blastn", "nt", f_record)
save_result = open("blast_result.xml", "w")
save_result.write(result_handle.read())
save_result.close()
result_handle.close()

Does anyone have any ideas?

score 1 · Accepted Answer

If your file is already in FASTA format, you can simply use open/read. This comes straight from the Biopython cookbook:

http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc92

fasta_string = open("m_cold.fasta").read()

I run simple scripts like this all the time:

from Bio.Blast import NCBIWWW

fasta_string = open("file.fasta").read()

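# Submit every sequence in the file as a single query
result_handle = NCBIWWW.qblast("blastn", "nt", fasta_string)

# Save the raw XML response for later parsing
save_file = open("out.xml", "w")
save_file.write(result_handle.read())
save_file.close()
result_handle.close()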

If that still doesn't work, check that your FASTA file is correctly formatted. A converter is available here:

https://www.hiv.lanl.gov/content/sequence/FORMAT_CONVERSION/form.html
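Before submitting, a quick local check can catch the most common formatting problems. The sketch below is not part of Biopython: `count_fasta_records` is a hypothetical helper that only verifies that the text starts with a `>` header and that every header is followed by at least one sequence line, then reports how many records it found. It does not validate the nucleotide characters themselves.

```python
def count_fasta_records(fasta_text):
    """Return the number of FASTA records, or raise ValueError if malformed.

    Hypothetical helper for illustration; it checks structure only.
    """
    # Ignore blank lines and surrounding whitespace
    lines = [line.strip() for line in fasta_text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("FASTA text is empty")
    if not lines[0].startswith(">"):
        raise ValueError("FASTA text must start with a '>' header line")

    count = 0
    previous_was_header = False
    for line in lines:
        if line.startswith(">"):
            if previous_was_header:
                raise ValueError("header without sequence before: " + line)
            count += 1
            previous_was_header = True
        else:
            previous_was_header = False
    if previous_was_header:
        raise ValueError("last header has no sequence")
    return count


example = ">read1\nACGTACGT\n>read2\nTTGACA\nGGCC\n"
print(count_fasta_records(example))  # prints 2
```

You could call it on `fasta_string` right after reading the file, so that a malformed file fails locally instead of after a round trip to NCBI.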

score 0 · Accepted Answer

Can't you simply pass the entire contents of the multi-sequence FASTA file (read directly from the file) instead of a single record?

    from Bio.Blast import NCBIWWW

    with open("file.fasta", "r") as fasta_file:
        sequences = fasta_file.read()  # the with block closes the file

    result_handle = NCBIWWW.qblast("blastn", "nt", sequences)
    save_result = open("blast_result.xml", "w")
    save_result.write(result_handle.read())
    save_result.close()
    result_handle.close()
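To confirm that every sequence in the file was actually searched, you can count the queries in the saved XML. For full parsing, Biopython's `Bio.Blast.NCBIXML.parse` is the usual route. The sketch below is only a quick check with the standard library, and it assumes the result uses the classic BLAST XML layout in which each query appears as one `Iteration` element with an `Iteration_query-def` title and its hits under `Iteration_hits`. The helper name `hits_per_query` and the tiny in-memory sample are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def hits_per_query(source):
    """Return a list of (query title, number of hits) from BLAST XML.

    `source` is a file name or a binary file object. Assumes the classic
    BLAST XML layout with one <Iteration> element per query.
    """
    root = ET.parse(source).getroot()
    summary = []
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def", default="(unnamed query)")
        n_hits = len(iteration.findall("Iteration_hits/Hit"))
        summary.append((query, n_hits))
    return summary


# Tiny hand-written sample standing in for a real result file
sample = (
    b"<BlastOutput><BlastOutput_iterations>"
    b"<Iteration><Iteration_query-def>read1</Iteration_query-def>"
    b"<Iteration_hits><Hit/><Hit/></Iteration_hits></Iteration>"
    b"<Iteration><Iteration_query-def>read2</Iteration_query-def>"
    b"<Iteration_hits></Iteration_hits></Iteration>"
    b"</BlastOutput_iterations></BlastOutput>"
)
print(hits_per_query(io.BytesIO(sample)))  # prints [('read1', 2), ('read2', 0)]
```

With a real run you would pass the saved file name instead, for example `hits_per_query("blast_result.xml")`, and compare the number of entries with the number of sequences in the FASTA file.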
