python - Reading lines and characters from a file

Question

I have an input file like this:

some data...
some data...
some data...
...
some data...
<binary size="2358" width="32" height="24">
data of size 2358 bytes
</binary>
some data...
some data...

The binary size value, 2358 here, can differ from file to file. I want to extract those 2358 bytes of data (the count is variable) and write them to another file.

I wrote the following code to do this, but it gives me an error. The problem is that I cannot extract the 2358 bytes of binary data and write them to another file.

c = responseFile.read(1)
ValueError: Mixing iteration and read methods would lose data

The code is:

import re

outputFile = open('output', 'w')    
inputFile = open('input.txt', 'r')
fileSize=0
width=0
height=0

for line in inputFile:
    if "<binary size" in line:
        x = re.findall('\w+', line)
        fileSize = int(x[2])
        width = int(x[4])
        height = int(x[6])
        break

print x
# Here the file will point to the start location of 2358 bytes.
for i in range(0,fileSize,1):
    c = inputFile.read(1)
    outputFile.write(c)


outputFile.close()
inputFile.close()

My final answer to my question:

#!/usr/local/bin/python

import os
inputFile = open('input', 'r')
outputFile = open('output', 'w')

flag = False

for line in inputFile:
    if line.startswith("<binary size"):
        print 'Start of Data'
        flag = True
    elif line.startswith("</binary>"):
        flag = False
        print 'End of Data'
    elif flag:
        outputFile.write(line)  # newline is kept here; the last one is truncated below

inputFile.close()
outputFile.close()

# I have to delete the last extra new line character from the output.
size = os.path.getsize('output')
outputFile = open('output', 'ab')
outputFile.truncate(size-1)
outputFile.close()
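
The reopen-and-truncate step can be avoided by buffering the payload lines and dropping only the final newline before writing. Here is a minimal sketch for Python 3, assuming the payload sits on its own lines between the two tags. `extract_payload` is a hypothetical helper, not part of the code above, and it takes any iterable of bytes lines so it can be tried without a file:

```python
def extract_payload(lines):
    """Return the bytes between the <binary ...> and </binary> lines.

    `lines` is any iterable of bytes lines, e.g. a file opened with 'rb'.
    Newlines inside the payload are kept; only the one right before
    </binary> is dropped.
    """
    chunks = []
    inside = False
    for line in lines:
        if line.startswith(b"<binary size"):
            inside = True
        elif line.startswith(b"</binary>"):
            inside = False
        elif inside:
            chunks.append(line)
    payload = b"".join(chunks)
    # Drop the single newline that separates the payload from </binary>.
    if payload.endswith(b"\n"):
        payload = payload[:-1]
    return payload


# Hypothetical sample input with a 7-byte payload.
sample = [
    b"some data...\n",
    b'<binary size="7" width="32" height="24">\n',
    b"abc\n",
    b"def\n",
    b"</binary>\n",
    b"some data...\n",
]
print(extract_payload(sample))  # b'abc\ndef'
```

Like the loop above, this still ends early if a payload line happens to start with `</binary>`, so reading exactly `size` bytes is the safer route for arbitrary binary data.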

score 3 · Accepted Answer

How about a different approach? In pseudocode:

for each line in input file:
    if line starts with binary tag: set output flag to True
    else if line starts with binary-termination tag: set output flag to False
    else if output flag is True: copy line to the output file

And in actual code:

outputFile = open('./output', 'w')    
inputFile = open('./input.txt', 'r')

flag = False

for line in inputFile:

    if line.startswith("<binary size"):
        flag = True
    elif line.startswith("</binary>"):
        flag = False
    elif flag:
        outputFile.write(line[:-1])  # drops each line's trailing newline, including any inside the payload


outputFile.close()
inputFile.close()

score 2 · Accepted Answer

Try changing your first loop to this:

while True:
    line = inputFile.readline()
    if not line:  # readline() returns '' at end of file
        break
    # continue the loop as it was

This removes the iteration and leaves only the read methods, so the problem goes away.
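
Putting the two steps together, here is a minimal sketch for Python 3 that finds the header with `readline()` and then reads the payload with a single `read(size)`. The name `read_binary_block` is hypothetical, the stream is assumed to be opened in binary mode, and an `io.BytesIO` stands in for the real input file:

```python
import io
import re


def read_binary_block(stream):
    """Scan a binary stream for the <binary ...> header and return its payload."""
    while True:
        line = stream.readline()
        if not line:  # empty bytes object means end of file
            raise ValueError("no <binary> header found")
        if line.startswith(b"<binary"):
            break
    match = re.search(rb'size="(\d+)"', line)
    if match is None:
        raise ValueError("header has no size attribute")
    size = int(match.group(1))
    # readline() left the stream positioned right after the header line,
    # so the next read() starts at the first payload byte.
    return stream.read(size)


# Hypothetical in-memory input with a 5-byte payload that contains a newline.
demo = io.BytesIO(
    b"some data...\n"
    b'<binary size="5" width="32" height="24">\n'
    b"ab\ncd\n"
    b"</binary>\n"
)
print(read_binary_block(demo))  # b'ab\ncd'
```

Because only `readline()` and `read()` are used, there is no mixing of iteration and read methods, and newlines inside the payload come through unchanged.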

score 1 · Accepted Answer

Consider this approach:

import re

line = '<binary size="2358" width="32" height="24">'

m = re.search(r'size="(\d+)"', line)  # raw string so \d is not treated as a string escape

print(m.group(1))  # 2358

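This isn't a drop-in replacement for your code, but it uses the regular expression differently.

It relies on Python's regex group capturing, which is much better than your string-splitting approach.

For instance, consider what happens if the order of the attributes changes: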

<binary width="32" size="2358" height="24">
instead of
<binary size="2358" width="32" height="24">

Would your code still work? Mine would. :-)

EDIT: To answer your question:

If you want to read n bytes of data from the start of a file, you can do this:
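
To make the point about attribute order concrete, here is a small Python 3 sketch that captures every numeric attribute into a dict, so no positional index is needed. `parse_attributes` is a hypothetical helper name, and it assumes all attribute values are unsigned integers in double quotes:

```python
import re


def parse_attributes(tag_line):
    """Return the numeric attributes of a tag line as a dict of ints."""
    pairs = re.findall(r'(\w+)="(\d+)"', tag_line)
    return {name: int(value) for name, value in pairs}


first = parse_attributes('<binary size="2358" width="32" height="24">')
second = parse_attributes('<binary width="32" size="2358" height="24">')
print(first["size"], second["size"])  # 2358 2358
print(first == second)                # True
```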

data = ifile.read(n)  # named 'data' to avoid shadowing the built-in 'bytes'

Note that you may get fewer than n bytes if the input file is not long enough.
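
If a short read should count as an error, the length can be checked explicitly. This is a minimal Python 3 sketch, with `read_exactly` as a hypothetical helper and an `io.BytesIO` standing in for a file opened in binary mode:

```python
import io


def read_exactly(stream, n):
    """Read n bytes from a binary stream, or raise if fewer are available."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("expected %d bytes, got %d" % (n, len(data)))
    return data


short = io.BytesIO(b"abc")
print(len(short.read(10)))  # 3, read() itself does not complain

try:
    read_exactly(io.BytesIO(b"abc"), 10)
except EOFError as exc:
    print(exc)  # expected 10 bytes, got 3
```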

If you want to start at some other byte rather than the "0th" one, call seek() first, like this:

ifile.seek(9)
data = ifile.read(5)  # named 'data' to avoid shadowing the built-in 'bytes'

This gives you the bytes at offsets 9 through 13 inclusive, that is, the 10th through the 14th bytes.
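
The offsets are easy to check on an in-memory stream. This is a small Python 3 sketch; the 20-byte sample data is made up so that each byte shows its own position:

```python
import io

# Offsets 0-9 hold the digits '0'-'9'; offsets 10-19 hold 'A'-'J'.
ifile = io.BytesIO(b"0123456789ABCDEFGHIJ")
ifile.seek(9)          # move to offset 9, the 10th byte
chunk = ifile.read(5)  # read offsets 9, 10, 11, 12, 13
print(chunk)           # b'9ABCD'
print(ifile.tell())    # 14, the offset of the next unread byte
```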
