python - Read a binary file backwards using Python

Question

I'm trying to read a file backwards (from the last byte to the first). The example below does this, but I'd like to ask the community: is there a more elegant solution to my problem?

import os, binascii

CHUNK = 10 #read file by blocks (big size)
src_file_path = 'd:\\src\\python\\test\\main.zip' 
src_file_size = os.path.getsize(src_file_path)
src_file = open(src_file_path, 'rb') #open in binary mode
while src_file_size > 0:
    #read file from last byte to first :)
    if src_file_size > CHUNK:
        src_file.seek(src_file_size - CHUNK)
        byte_list = src_file.read(CHUNK)
    else:
        src_file.seek(0)
        byte_list = src_file.read(src_file_size)
    s = binascii.hexlify(byte_list) #convert '\xFB' -> 'FB'
    byte_list = [(chr(s[i]) + chr(s[i+1])) for i in range(0, len(s), 2)] #split, note below
    print(byte_list[::-1]) #output reverse list
    src_file_size = src_file_size - CHUNK
src_file.close() #close file

UPD: I'd like to hear expert opinions. As a Python beginner, what should I watch out for? Are there any potential flaws in this code?

Thanks in advance.

I'm using Python 3.3.1. Note: the byte-wise split is taken from here!
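In Python 3, iterating over a bytes object yields integers, and reversed() works on bytes directly, so the hexlify-then-pair step in the loop can be replaced by formatting each byte. A minimal sketch; the helper name reversed_hex is hypothetical:

```python
def reversed_hex(chunk):
    """Return the bytes of `chunk` as two-digit hex strings, last byte first."""
    # Iterating over bytes yields ints in Python 3, so each one can be formatted directly.
    return ['{:02x}'.format(b) for b in reversed(chunk)]

print(reversed_hex(b'\x01\xfb\x10'))  # ['10', 'fb', '01']
```

Inside the question's loop, print(reversed_hex(byte_list)) would then replace the hexlify, split and [::-1] lines.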

score 2 · Accepted Answer

This expands on tim-hoffman's excellent answer using mmap here. (Sorry for posting this as an answer instead of a comment, but I don't have enough stack-fu to comment yet.)

import mmap

# Reverses a binary file byte-wise (mm[::-1] builds the whole reversed copy in memory)
with open("out.bin", "wb") as w:
    with open("in.bin", "rb") as f:
        # read-only access; otherwise you get an access-denied error or need to open with "r+b"
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            w.write(mm[::-1])
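One edge case worth knowing: mmap refuses to map an empty file (it raises ValueError), so the snippet fails on a zero-byte in.bin. A minimal sketch of the same idea wrapped in a function that checks the size first; the name reverse_file is hypothetical, and like the original it still holds the entire reversed copy in memory:

```python
import mmap
import os

def reverse_file(src_path, dst_path):
    """Write the bytes of src_path to dst_path in reverse order."""
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        # mmap cannot map an empty file, so an empty source just yields an empty output.
        if os.fstat(src.fileno()).st_size == 0:
            return
        with mmap.mmap(src.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # A slice with step -1 returns the whole file reversed, as bytes.
            dst.write(mm[::-1])
```

For example, reverse_file('in.bin', 'out.bin') turns a file containing b'0987654321\n' into one containing b'\n1234567890'.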

score 1 · Accepted Answer

Another approach is to use mmap.

http://docs.python.org/2/library/mmap.html

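In this example, the text file contains "0987654321\n". (The session uses Python 2 syntax, i.e. the print statement; in Python 3, slicing an mmap returns bytes.)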

>>> import mmap
>>> f = open("x.txt","r+b")
>>> mm = mmap.mmap(f.fileno(), 0)
>>> mm[0:]
'0987654321\n'
>>> 
>>> for i in range(len(mm),0,-1):
...     if i == 1:
...          print i,repr(mm[0:1])
...     else:
...          print i,repr(mm[i-1:i-2:-1])
... 
11 '\n'
10 '1'
9 '2'
8 '3'
7 '4'
6 '5'
5 '6'
4 '7'
3 '8'
2 '9'
1 '0'
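For Python 3, which the question uses, the same byte-by-byte walk looks slightly different: indexing an mmap returns an int, while slicing returns bytes, and a plain range with step -1 avoids the special case for the first byte. A minimal sketch; the generator name bytes_backwards is hypothetical, and it does not handle empty files (mmap raises ValueError for those):

```python
import mmap

def bytes_backwards(path):
    """Yield each byte of the file at `path` as a length-1 bytes object, last byte first."""
    with open(path, 'rb') as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for i in range(len(mm) - 1, -1, -1):
            # mm[i] would be an int in Python 3; a one-byte slice keeps it as bytes.
            yield mm[i:i + 1]
```

Usage: for b in bytes_backwards('x.txt'): print(b). The file and the mapping are closed once the generator is exhausted.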

You can then change the chunk size using range and slicing. Let's step backwards in chunks of 3:

>>> for i in range(len(mm)-1,-1,-3):
...   if i < 3:
...      print i,repr(mm[i::-1])
...   else:
...      print i,repr(mm[i:i-3:-1])
... 
10 '\n12'
7 '345'
4 '678'
1 '90'
>>>

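The big advantage is that you don't need a separate reversing step: slicing with a negative step already returns the bytes in reverse order.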
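In Python 3, the chunked version can avoid special-casing the start of the file by slicing forwards with clamped bounds and then reversing each slice. This sidesteps the trap where a stop index of -1 in mm[i:i-3:-1] means "end of the sequence" rather than "before index 0". A minimal sketch; the generator name chunks_backwards is hypothetical, and empty files are not handled:

```python
import mmap

def chunks_backwards(path, size):
    """Yield the file's bytes in chunks of at most `size` bytes, starting from the
    end of the file, with the bytes inside each chunk reversed."""
    with open(path, 'rb') as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for end in range(len(mm), 0, -size):
            start = max(end - size, 0)  # the chunk at the file start may be shorter
            yield mm[start:end][::-1]
```

For the "0987654321\n" file and size 3, this yields b'\n12', b'345', b'678' and b'90', so each chunk, including the shorter one at the start, comes out reversed, and joining all chunks gives the whole file reversed.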

score 1 · Accepted Answer

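From the question, I see a few things that can be improved in your code. First, the while loop is rarely used in Python, because there is almost always a better way to express the same thing with a for loop or with some built-in functions.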

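I guess the code is purely for training purposes. Otherwise, I would first ask what the real goal is (because once the problem is known, the better solution may be very different from the first idea).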

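The goal here is to get the positions for seek. You know the size, you know the chunk size, and you want to go in reverse. Python has a built-in for exactly this purpose, named range. It is mostly used with a single argument; however, range(start, stop, step) is the full form. A range can be iterated in a for loop, or you can use its values to build a list (though the latter is often not needed). The positions for seek can be generated like this: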

chunk = 10
sz = 235

lst = list(range(sz - chunk, 0, -chunk))
print(lst)

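That is: start at position sz - chunk, use a negative step to get each next value, and stop at zero (zero itself is not generated). Here list() iterates over all the values and builds a list of them. However, you can iterate over the generated values directly: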

pos = sz  # used when the file is not larger than one chunk (range() is then empty)
for pos in range(sz - chunk, 0, -chunk):
    print('seek({}) and read({})'.format(pos, chunk))

if pos > 0:
    print('seek({}) and read({})'.format(0, pos))

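The last generated position is always positive, because range never yields its stop value 0. The final if therefore handles the remaining part at the start of the file, which is at most chunk bytes long. Putting the code above together prints: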

c:\tmp\_Python\wikicsm\so16443185>py a.py
[225, 215, 205, 195, 185, 175, 165, 155, 145, 135, 125, 115, 105, 95,
85, 75, 65, 55, 45, 35, 25, 15, 5]
seek(225) and read(10)
seek(215) and read(10)
seek(205) and read(10)
seek(195) and read(10)
seek(185) and read(10)
seek(175) and read(10)
seek(165) and read(10)
seek(155) and read(10)
seek(145) and read(10)
seek(135) and read(10)
seek(125) and read(10)
seek(115) and read(10)
seek(105) and read(10)
seek(95) and read(10)
seek(85) and read(10)
seek(75) and read(10)
seek(65) and read(10)
seek(55) and read(10)
seek(45) and read(10)
seek(35) and read(10)
seek(25) and read(10)
seek(15) and read(10)
seek(5) and read(10)
seek(0) and read(5)
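The same position logic can be packaged as a generator of (seek position, read size) pairs, which makes it easy to check edge cases without touching a file. A minimal sketch; the name backward_reads is hypothetical. Initializing pos to size covers files no larger than one chunk, where range is empty and pos would otherwise never be assigned:

```python
def backward_reads(size, chunk):
    """Yield (seek position, read size) pairs covering a file of `size` bytes,
    from its end to its start, `chunk` bytes at a time."""
    pos = size  # kept as-is if range() below is empty (size <= chunk)
    for pos in range(size - chunk, 0, -chunk):
        yield pos, chunk
    # range() stops before 0, so the first 1..chunk bytes of the file are left over.
    if pos > 0:
        yield 0, pos

print(list(backward_reads(25, 10)))  # [(15, 10), (5, 10), (0, 5)]
```

A zero-byte file produces no pairs at all, and the read sizes always add up to the file size.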

Personally, I would replace the prints with a call to a function that takes the file object, pos, and the chunk size. Here it is with a faked body that produces the same prints:

#!python3
import os

def processChunk(f, pos, chunk_size):
    print('faked f: seek({}) and read({})'.format(pos, chunk_size))


fname = 'a.txt'
sz = os.path.getsize(fname)     # not checking existence for simplicity
chunk = 16

with open(fname, 'rb') as f:
    pos = sz  # used when the file is not larger than one chunk (range() is then empty)
    for pos in range(sz - chunk, 0, -chunk):
        processChunk(f, pos, chunk)

    if pos > 0:
        processChunk(f, 0, pos)

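The with construct is another good thing to learn. (Warning: it has nothing in common with Pascal's with.) It closes the file object automatically when the block ends, even if an exception is raised inside it. Notice that the code below the with is more readable and will not need to change in the future; only processChunk will be developed further: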

import binascii

def processChunk(f, pos, chunk_size):
    f.seek(pos)
    s = binascii.hexlify(f.read(chunk_size))
    print(s)
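To see the with behaviour described above in action, a small self-contained demonstration can write and read a temporary file and check the file objects' closed attribute afterwards. A minimal sketch; the function name demo_with_closes_file and the temporary file name are hypothetical:

```python
import os
import tempfile

def demo_with_closes_file():
    """Show that a file opened in a with statement is closed when the block
    exits, both normally and after an exception."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'a.bin')
        with open(path, 'wb') as f:
            f.write(b'0987654321\n')
        closed_normally = f.closed  # the block ended normally

        try:
            with open(path, 'rb') as g:
                raise RuntimeError('simulated failure while reading')
        except RuntimeError:
            pass
        closed_after_error = g.closed  # closed before the exception reached the except
    return closed_normally, closed_after_error

print(demo_with_closes_file())  # (True, True)
```

Without with, the second file would stay open until garbage collection, which is exactly the kind of leak the question's explicit close() call misses when an exception occurs.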

Or you can change it a bit so that the result is a reversed hex dump (complete code, tested on my computer):

#!python3

import binascii
import os

def processChunk(f, pos, chunk_size):
    f.seek(pos)
    b = f.read(chunk_size)
    b1 = b[:8]                  # first 8 bytes
    b2 = b[8:]                  # the rest
    s1 = ' '.join('{:02x}'.format(x) for x in b1)
    s2 = ' '.join('{:02x}'.format(x) for x in b2)
    print('{:08x}:'.format(pos), s1, '|', s2)


fname = 'a.txt'
sz = os.path.getsize(fname)     # not checking existence for simplicity
chunk = 16

with open(fname, 'rb') as f:

    for pos in range(sz - chunk, 0, -chunk):
        processChunk(f, pos, chunk)

    if pos > 0:
        processChunk(f, 0, pos)

If a.txt is a copy of the last code, it produces the following:

c:\tmp\_Python\wikicsm\so16443185>py d.py
00000274: 75 6e 6b 28 66 2c 20 30 | 2c 20 70 6f 73 29 0d 0a
00000264: 20 20 20 20 20 20 20 70 | 72 6f 63 65 73 73 43 68
00000254: 20 20 69 66 20 70 6f 73 | 20 3e 20 30 3a 0d 0a 20
00000244: 6f 73 2c 20 63 68 75 6e | 6b 29 0d 0a 0d 0a 20 20
00000234: 72 6f 63 65 73 73 43 68 | 75 6e 6b 28 66 2c 20 70
00000224: 75 6e 6b 29 3a 0d 0a 20 | 20 20 20 20 20 20 20 70
00000214: 20 2d 20 63 68 75 6e 6b | 2c 20 30 2c 20 2d 63 68
00000204: 20 70 6f 73 20 69 6e 20 | 72 61 6e 67 65 28 73 7a
000001f4: 61 73 20 66 3a 0d 0a 0d | 0a 20 20 20 20 66 6f 72
000001e4: 65 6e 28 66 6e 61 6d 65 | 2c 20 27 72 62 27 29 20
000001d4: 20 3d 20 31 36 0d 0a 0d | 0a 77 69 74 68 20 6f 70
000001c4: 69 6d 70 6c 69 63 69 74 | 79 0d 0a 63 68 75 6e 6b
000001b4: 20 65 78 69 73 74 65 6e | 63 65 20 66 6f 72 20 73
000001a4: 20 20 23 20 6e 6f 74 20 | 63 68 65 63 6b 69 6e 67
00000194: 65 74 73 69 7a 65 28 66 | 6e 61 6d 65 29 20 20 20
00000184: 0d 0a 73 7a 20 3d 20 6f | 73 2e 70 61 74 68 2e 67
00000174: 0a 66 6e 61 6d 65 20 3d | 20 27 61 2e 74 78 74 27
00000164: 31 2c 20 27 7c 27 2c 20 | 73 32 29 0d 0a 0d 0a 0d
00000154: 27 2e 66 6f 72 6d 61 74 | 28 70 6f 73 29 2c 20 73
00000144: 20 20 70 72 69 6e 74 28 | 27 7b 3a 30 38 78 7d 3a
00000134: 66 6f 72 20 78 20 69 6e | 20 62 32 29 0d 0a 20 20
00000124: 30 32 78 7d 27 2e 66 6f | 72 6d 61 74 28 78 29 20
00000114: 32 20 3d 20 27 20 27 2e | 6a 6f 69 6e 28 27 7b 3a
00000104: 20 78 20 69 6e 20 62 31 | 29 0d 0a 20 20 20 20 73
000000f4: 7d 27 2e 66 6f 72 6d 61 | 74 28 78 29 20 66 6f 72
000000e4: 20 27 20 27 2e 6a 6f 69 | 6e 28 27 7b 3a 30 32 78
000000d4: 65 20 72 65 73 74 0d 0a | 20 20 20 20 73 31 20 3d
000000c4: 20 20 20 20 20 20 20 20 | 20 20 20 20 23 20 74 68
000000b4: 62 32 20 3d 20 62 5b 38 | 3a 5d 20 20 20 20 20 20
000000a4: 73 74 20 38 20 62 79 74 | 65 73 0d 0a 20 20 20 20
00000094: 20 20 20 20 20 20 20 20 | 20 20 20 23 20 66 69 72
00000084: 31 20 3d 20 62 5b 3a 38 | 5d 20 20 20 20 20 20 20
00000074: 75 6e 6b 5f 73 69 7a 65 | 29 0d 0a 20 20 20 20 62
00000064: 20 20 20 62 20 3d 20 66 | 2e 72 65 61 64 28 63 68
00000054: 20 20 66 2e 73 65 65 6b | 28 70 6f 73 29 0d 0a 20
00000044: 63 68 75 6e 6b 5f 73 69 | 7a 65 29 3a 0d 0a 20 20
00000034: 73 73 43 68 75 6e 6b 28 | 66 2c 20 70 6f 73 2c 20
00000024: 20 6f 73 0d 0a 0d 0a 64 | 65 66 20 70 72 6f 63 65
00000014: 62 69 6e 61 73 63 69 69 | 0d 0a 69 6d 70 6f 72 74
00000004: 74 68 6f 6e 33 0d 0a 0d | 0a 69 6d 70 6f 72 74 20
00000000: 23 21 70 79 |
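The last script still assumes the file is larger than one chunk: for a file of 16 bytes or fewer, range is empty, pos is never assigned, and the final if raises NameError. A sketch of the same hex dump that avoids this by initializing pos to the file size. It also returns the lines instead of printing them, so it can be exercised on an in-memory io.BytesIO object. The names hex_line and reversed_hex_dump are hypothetical:

```python
import io

def hex_line(pos, data):
    """Format up to 16 bytes in the same layout as processChunk above."""
    s1 = ' '.join('{:02x}'.format(x) for x in data[:8])   # first 8 bytes
    s2 = ' '.join('{:02x}'.format(x) for x in data[8:])   # the rest
    return '{:08x}: {} | {}'.format(pos, s1, s2)

def reversed_hex_dump(f, size, chunk=16):
    """Return hex-dump lines for the binary file object f, last chunk first."""
    lines = []
    pos = size  # used as-is when the file is no larger than one chunk
    for pos in range(size - chunk, 0, -chunk):
        f.seek(pos)
        lines.append(hex_line(pos, f.read(chunk)))
    if pos > 0:
        f.seek(0)
        lines.append(hex_line(0, f.read(pos)))
    return lines

for line in reversed_hex_dump(io.BytesIO(bytes(range(20))), 20):
    print(line)
```

With a real file, pass the object from open(fname, 'rb') together with os.path.getsize(fname).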

Instead of src_file_path = 'd:\\src\\python\\test\\main.zip', you can use forward slashes, which Windows also accepts: src_file_path = 'd:/src/python/test/main.zip'. Or you can use a raw string like src_file_path = r'd:\src\python\test\main.zip'. The last form is used when you need to avoid doubling backslashes, which is often the case when writing regular expressions.
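A quick check that the three spellings name the same path. The comparison uses pathlib.PureWindowsPath, which works on any operating system but requires Python 3.4 or later, so it is not available on the asker's 3.3.1:

```python
from pathlib import PureWindowsPath

escaped = 'd:\\src\\python\\test\\main.zip'  # backslashes doubled
raw = r'd:\src\python\test\main.zip'         # raw string: backslashes kept literally
slashes = 'd:/src/python/test/main.zip'      # forward slashes

# The escaped and raw spellings produce exactly the same string.
print(escaped == raw)  # True
# Windows path handling treats / and \ as equivalent separators.
print(PureWindowsPath(slashes) == PureWindowsPath(raw))  # True
print(str(PureWindowsPath(slashes)))  # d:\src\python\test\main.zip
```

One caveat with raw strings: they cannot end with a single backslash, so r'd:\' is a syntax error.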
