python - Concatenating multiple .fasta files

Question

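I am trying to concatenate hundreds of .fasta files into a single large fasta file containing all of the sequences. I have not found a specific way to do this in the forums. I did come across this code at http://zientzilaria.heroku.com/blog/2007/10/29/merging-single-or-multiple-sequence-fasta-files, which I have adapted slightly.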

Fasta.py contains the following code:

class fasta:
    def __init__(self, name, sequence):
        self.name = name
        self.sequence = sequence

def read_fasta(file):
    items = []
    index = 0
    for line in file:
        if line.startswith(">"):
           if index >= 1:
               items.append(aninstance)
           index+=1
           name = line[:-1]
           seq = ''
           aninstance = fasta(name, seq)
        else:
           seq += line[:-1]
           aninstance = fasta(name, seq)

    items.append(aninstance)
    return items

And here is the script, adapted to concatenate the .fasta files:

import sys
import glob
import fasta

#obtain directory containing single fasta files for query
filepattern = input('Filename pattern to match: ')

#obtain output directory
outfile = input('Filename of output file: ')

#create new output file
output = open(outfile, 'w')

#initialize lists
names = []
seqs = []

#glob.glob returns a list of files that match the pattern
for file in glob.glob(filepattern):

    print ("file: " + file)

    #we read the contents and an instance of the class is returned
    contents = fasta.read_fasta(open(file).readlines())

    #a file can contain more than one sequence so we read them in a loop
    for item in contents:
        names.append(item.name)
        seqs.append(item.sequence)

#we print the output
for i in range(len(names)):
    output.write(names[i] + '\n' + seqs[i] + '\n\n')

output.close()
print("done")

It can read the fasta files, but the newly created output file contains no sequences. The error I get comes from fasta.py, which is beyond my ability to fix:

Traceback (most recent call last):
  File "C:\Python32\myfiles\test\3\Fasta_Concatenate.py", line 28, in <module>
    contents = fasta.read_fasta(open(file).readlines())
  File "C:\Python32\lib\fasta.py", line 18, in read_fasta
    seq += line[:-1]
UnboundLocalError: local variable 'seq' referenced before assignment

Any advice? Thanks!

score 8 · Accepted Answer

I think using python for this job is overkill. On the command line, a simple way to concatenate one or more fasta files with a .fasta or .fa extension is:

cat *.fa* > newfile.txt
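If no Unix shell is at hand, or the merge has to happen inside a larger Python script, roughly the same raw concatenation can be sketched with the standard library. The function name cat_files is made up for this illustration; it copies bytes as-is and does not parse the FASTA records.

```python
import glob
import os
import shutil


def cat_files(pattern, out_path):
    """Copy the raw bytes of every file matching pattern into out_path.

    cat_files is a hypothetical helper name, not part of any library.
    Returns the list of input paths in the order they were written.
    """
    out_abs = os.path.abspath(out_path)
    # Build the input list first and drop the output file itself,
    # in case the output name also matches the pattern.
    paths = sorted(p for p in glob.glob(pattern)
                   if os.path.abspath(p) != out_abs)
    with open(out_path, 'wb') as out:
        for path in paths:
            with open(path, 'rb') as src:
                shutil.copyfileobj(src, out)  # streams in chunks
    return paths
```

Like cat, this does nothing about a file whose last line lacks a trailing newline: the next file's header would then be glued onto the end of that line.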

score 1 · Accepted Answer

The problem is in fasta.py, here:

else:
       seq += line[:-1]
       aninstance = fasta(name, seq)

Try initializing seq earlier, at the top of read_fasta(file).

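Edit: a more detailed explanation

The first time read_fasta is called, the first line of the file does not start with >, so the else branch runs and tries to append that line to seq, a string that has not been initialized (or even declared) yet. In other words, you are appending a string (the first line) to a variable that has no value. The error in the stack trace describes exactly this: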

UnboundLocalError: local variable 'seq' referenced before assignment
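A minimal sketch of that fix, assuming the same overall structure as the question's read_fasta. The class name FastaRecord is made up here, and two details go beyond the original code and are assumptions: lines that appear before the first header are skipped rather than kept, and line endings are removed with rstrip instead of line[:-1], so a final line without a newline does not lose its last character.

```python
class FastaRecord:
    """One FASTA record: header line plus the joined sequence (hypothetical name)."""

    def __init__(self, name, sequence):
        self.name = name
        self.sequence = sequence


def read_fasta(lines):
    """Parse an iterable of lines into a list of FastaRecord objects."""
    items = []
    name = None  # no header seen yet
    seq = ''     # initialized up front, so += never hits an unbound name
    for line in lines:
        line = line.rstrip('\r\n')
        if line.startswith('>'):
            # A new header closes the previous record, if there was one
            if name is not None:
                items.append(FastaRecord(name, seq))
            name = line
            seq = ''
        elif name is not None:
            seq += line
        # else: text before the first header is ignored (assumption)
    if name is not None:
        items.append(FastaRecord(name, seq))
    return items
```

With this version a file that starts with a blank line, or an empty file, yields a list of records (possibly empty) instead of raising UnboundLocalError.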

score 1 · Accepted Answer

On Windows, from the command prompt (note: the folder must contain only the files you want to merge):

copy *.fasta final.fasta

Enjoy.

score 1 · Accepted Answer

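I'm not a Python programmer, but the question's code appears to be trying to condense the data for each sequence onto a single line and to separate the sequences with a blank line.

That is, sequence data spread over several lines would become: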

  >seq1
  0000000011111111

  >seq2
  2222222233333333

If this is actually what is needed, the cat-based solution above will not work. Otherwise, cat is the simplest and most effective solution.
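If that single-line layout is the goal, one way to get it is sketched below. This is not the question's code: concatenate_fasta and write_record are names made up for this example, and skipping text that precedes the first header is an assumption. It writes each record as soon as it is complete instead of collecting everything in lists first.

```python
import glob
import os


def write_record(out, name, chunks):
    """Write one record: header, sequence on a single line, blank line."""
    out.write(name + '\n' + ''.join(chunks) + '\n\n')


def concatenate_fasta(pattern, out_path):
    """Merge all files matching pattern into out_path; return the record count.

    Both function names are hypothetical, chosen for this sketch.
    """
    out_abs = os.path.abspath(out_path)
    # List the inputs before creating the output, and never read the output
    paths = sorted(p for p in glob.glob(pattern)
                   if os.path.abspath(p) != out_abs)
    count = 0
    with open(out_path, 'w') as out:
        for path in paths:
            name, chunks = None, []
            with open(path) as handle:
                for raw in handle:
                    line = raw.strip()
                    if line.startswith('>'):
                        if name is not None:
                            write_record(out, name, chunks)
                            count += 1
                        name, chunks = line, []
                    elif name is not None and line:
                        chunks.append(line)
            # Flush the last record of this file
            if name is not None:
                write_record(out, name, chunks)
                count += 1
    return count
```

A call such as concatenate_fasta('*.fasta', 'all.fasta') would then produce the layout shown above, with the files taken in sorted name order.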
