python - Can't split a large fasta into multiple files and name them by GI number

Question

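I have to start by saying that I am new to both Python and Biopython. I'm trying to split a large .fasta file (with multiple entries) into single files, each containing one entry. I found most of the following code on the Biopython wiki/Cookbook site and modified it only slightly. My problem is that this generator names the files "1.fasta", "2.fasta", etc. I need them named by some identifier, such as the GI number.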

def batch_iterator(iterator, batch_size) :
    """Returns lists of length batch_size.

    This can be used on any iterator, for example to batch up
    SeqRecord objects from Bio.SeqIO.parse(...), or to batch
    Alignment objects from Bio.AlignIO.parse(...), or simply
    lines from a file handle.

    This is a generator function, and it returns lists of the
    entries from the supplied iterator.  Each list will have
    batch_size entries, although the final list may be shorter.
    """
    entry = True #Make sure we loop once
    while entry :
        batch = []
        while len(batch) < batch_size :
            try :
                entry = next(iterator)
            except StopIteration :
                entry = None
            if entry is None :
                #End of file
                break
            batch.append(entry)
        if batch :
            yield batch

from Bio import SeqIO
infile = input('Which .fasta file would you like to open? ')
record_iter = SeqIO.parse(open(infile), "fasta")
for i, batch in enumerate(batch_iterator(record_iter, 1)) :
    outfile = "c:\python32\myfiles\%i.fasta" % (i+1)
    handle = open(outfile, "w")
    count = SeqIO.write(batch, handle, "fasta")
    handle.close()

When I try to replace:

outfile = "c:\python32\myfiles\%i.fasta" % (i+1)

with:

outfile = "c:\python32\myfiles\%s.fasta" % (record_iter.id)

so that the file name resembles seq_record.id from SeqIO, I get the following error:

Traceback (most recent call last):
  File "C:\Python32\myscripts\generator.py", line 33, in <module>
    outfile = "c:\python32\myfiles\%s.fasta" % (record_iter.id)
AttributeError: 'generator' object has no attribute 'id'

I understand that the generator has no attribute 'id', but is there some way to work around this? Is this script too complex for what I'm trying to do?!? Thanks, Charles

score 2 · Accepted Answer

Since you only need one record at a time, you can drop the batch_iterator wrapper and the enumerate:

for seq_record in record_iter:

Then what you want is the id property of each record, not of the iterator as a whole:

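for seq_record in record_iter:
    # Raw string, so the backslashes in the Windows path are kept literally
    outfile = r"c:\python32\myfiles\{0}.fasta".format(seq_record.id)
    # The with block closes the file even if writing fails
    with open(outfile, "w") as handle:
        count = SeqIO.write(seq_record, handle, "fasta")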
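One caveat if the ids really are GI-style: a sketch of how the file name could be built from the GI number alone. This assumes the legacy NCBI header layout "gi|<number>|<db>|<accession>|" (the question does not show the actual headers), and the helper names gi_filename and split_fasta are made up for illustration. The "|" character is not allowed in Windows file names, so using the whole id as the name would likely fail there. It also assumes a Biopython version in which SeqIO.parse and SeqIO.write accept file names as well as handles.

```python
import os
import re


def gi_filename(record_id):
    """Return a file name for a record, based on its GI number if present.

    Assumes ids of the form "gi|<number>|<db>|<accession>|" (hypothetical
    for this file). Any other id is used whole, with characters that are
    unsafe in file names replaced by underscores.
    """
    match = re.match(r"gi\|(\d+)", record_id)
    if match:
        return match.group(1) + ".fasta"
    # Characters such as "|" are not allowed in Windows file names
    return re.sub(r'[\\/:*?"<>|]', "_", record_id) + ".fasta"


def split_fasta(infile, outdir):
    """Write every record of infile to its own FASTA file in outdir."""
    # Imported here so gi_filename can be tried without Biopython installed
    from Bio import SeqIO

    for seq_record in SeqIO.parse(infile, "fasta"):
        outfile = os.path.join(outdir, gi_filename(seq_record.id))
        SeqIO.write(seq_record, outfile, "fasta")
```

With this, the loop body reduces to a call such as split_fasta(infile, r"c:\python32\myfiles"). Note that two records sharing a GI number would overwrite each other's file.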

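FYI, the generator error is caused by trying to get the id property from the record_iter object. record_iter is not a single record but a set of records, held as a Python generator. A generator is something like a list in progress: the whole file does not have to be read into memory at once, which makes it more efficient to use. More on generators: "What can you use Python generator functions for?" and http://docs.python.org/tutorial/classes.html#generators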
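To make the difference concrete, here is a small stand-in that does not need Biopython. The Record class and the parse function are hypothetical substitutes for SeqRecord and SeqIO.parse, used only to show that the generator itself has no id while each item it yields does.

```python
class Record:
    """Hypothetical stand-in for a Biopython SeqRecord."""

    def __init__(self, record_id):
        self.id = record_id


def parse(ids):
    """Hypothetical stand-in for SeqIO.parse: yields records one by one."""
    for record_id in ids:
        yield Record(record_id)


record_iter = parse(["a", "b", "c"])

# The generator object itself carries no id attribute
has_id = hasattr(record_iter, "id")

# Each record is only produced when it is asked for
first = next(record_iter)

# Iterating continues from where the generator left off
remaining = [record.id for record in record_iter]
```

Here has_id is False, first.id is "a", and remaining holds the two ids that had not yet been consumed.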
