python - How can I use a coordinate system to get specific characters from a file without counting newlines or the first line?

Question

I have a coordinate system that points to positions within a large file.

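The first line of the file has a variable length (though it always starts with a ">" character). From there on, every line is 50 characters long, followed by a newline. This can go on for millions of lines.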

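For example, I want to find the characters between 1,000,000 and 1,000,050 (entered as 1000000-1000050) and write them to a string. How do I seek to that position in the file? I tried f.seek(1000000), but ran into the problem of the first line's length. Even if I add the length of the first line to 1000000 in the call to f.seek, an extra character (the newline) still gets counted every 50 characters.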

The numbers will rarely be as clean as 1000000-1000050.

score 1 · Accepted Answer

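line_length = 50
char_n = 1000000  # zero-based index of the first character wanted
count = 50

# Binary mode, so seek() offsets are plain byte counts (assumes "\n" line endings)
with open('f.txt', 'rb') as f:
    f.readline()  # skip the variable-length first line
    start = f.tell()
    # One extra byte (the newline) for every full line before the target
    f.seek(start + (char_n // line_length) * (line_length + 1) + char_n % line_length)
    # Read enough to cover the newlines inside the span, then drop them
    raw = f.read(count + count // line_length + 1)
    print(raw.replace(b'\n', b'')[:count].decode('ascii'))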
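
The seek arithmetic above can also be wrapped in a small reusable function. This is only a sketch under a few assumptions: the file uses single-byte "\n" line endings, the sequence is plain ASCII, and the names `data_offset` and `read_span` are made up for illustration. Each full 50-character line before the target adds one newline byte, so character 1000000 sits 1020000 bytes after the end of the first line.

```python
def data_offset(char_n, line_length=50):
    """Byte offset of zero-based character char_n, counted from the start of line 2."""
    full_lines, within_line = divmod(char_n, line_length)
    # Every full line before the target contributes line_length bytes plus one newline.
    return full_lines * (line_length + 1) + within_line


def read_span(path, char_n, count, line_length=50):
    """Return count sequence characters starting at zero-based index char_n."""
    with open(path, "rb") as f:  # binary mode: seek() offsets are byte counts
        f.readline()             # skip the variable-length first line
        f.seek(f.tell() + data_offset(char_n, line_length))
        # A span of count characters crosses at most count // line_length + 1 newlines.
        raw = f.read(count + count // line_length + 1)
    # Drop the embedded newlines and trim to the requested length.
    return raw.replace(b"\n", b"")[:count].decode("ascii")
```

With this, the example from the question would be `read_span('f.txt', 1000000, 50)`. Stripping the newlines after reading matters whenever the span does not start exactly at the beginning of a line, which is the usual case.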

score 0 · Accepted Answer

This is what I ended up using. It seems to work in the small trials I ran.

# Read the exon coordinates from the user, in the form name:start-end
coords = input("Please enter the coordinates of the Exon you would like to use\n")

# The part of coords before ":" is the chromosome (and, therefore, the filename)
chr_index = coords[:coords.index(":")] + ".fa"

# Get the starting coordinate as an integer
coordStart = int(coords[coords.index(":")+1:coords.index("-")])

# Get the ending coordinate as an integer
coordEnd = int(coords[coords.index("-")+1:])

# Open the file in binary mode so that seek() offsets are byte counts
with open(chr_index, "rb") as f:
    # Length of the first line, including its newline
    lenFirstLine = len(f.readline())

    # Move to the start of the exon: one extra byte per full 50-character line skipped
    f.seek(lenFirstLine + coordStart + coordStart // 50)

    # Read the exon plus the newlines that fall inside it, then strip them out
    exonLen = coordEnd - coordStart
    exon = f.read(exonLen + exonLen // 50 + 1).replace(b"\n", b"")[:exonLen].decode("ascii")
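
The coordinate parsing above can be made a little more defensive. The following is a minimal sketch, not part of the original answer: `parse_coords` is a hypothetical helper name, the `name:start-end` input format is taken from the code above, and accepting thousands separators such as "1,000,000" is an added assumption.

```python
def parse_coords(coords):
    """Split 'name:start-end' into (filename, start, end) with integer coordinates."""
    name, colon, span = coords.strip().partition(":")
    start_text, dash, end_text = span.partition("-")
    if not (name and colon and dash):
        raise ValueError("expected input like name:start-end, got %r" % coords)
    # Assumption: drop thousands separators so "1,000,000" works as well as "1000000".
    start = int(start_text.replace(",", ""))
    end = int(end_text.replace(",", ""))
    if start < 0 or end < start:
        raise ValueError("coordinates must satisfy 0 <= start <= end")
    return name + ".fa", start, end
```

Using `str.partition` avoids repeated `index` calls, and malformed input fails with a `ValueError` before any file is opened.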
