php - Return index position from file contents

Question

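I am trying to return an index position from a WAV file.

If the contents of the needle are found in the haystack, I need to return the index position of the needle within the haystack.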

haystack = open("haystack.wav",'r').read()
needle = open("needle.wav",'r').read()

print(haystack.index(needle[:46]));

I get an error:

Traceback (most recent call last):
  File "test.py", line 1, in <module>
    haystack = open("haystack.wav",'r').read()
  File "C:\Python33\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 5: character maps to <undefined>

Doing this in PHP works:

$needle = file_get_contents("needle.wav", false, null, 46);
$haystack = file_get_contents("haystack.wav");
echo strpos($haystack,$needle);

score 3 · Accepted Answer

In Python 3, reading the file as binary with 'rb' returns a bytes object. You can then use bytes.index:

haystack = open("haystack.wav", 'rb').read()
needle = open("needle.wav", 'rb').read()

print(haystack.index(needle[:46]))

Example:

>>> b'hello world'.index(b'world')
6
>>> b'hello world'.index(b'goodbye')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found
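
Note that the PHP snippet in the question skips the first 46 bytes of the needle (presumably a header), whereas `needle[:46]` keeps only the first 46 bytes. If the goal is to mirror the PHP behaviour, including a "not found" result instead of an exception, `bytes.find` may be more convenient because it returns -1 when nothing matches. A minimal sketch, where `find_in_file` and its `skip` parameter are hypothetical names and the 46-byte offset is simply taken from the question rather than a property of every WAV file:

```python
def find_in_file(haystack_path, needle_path, skip=46):
    """Return the offset of the needle's payload in the haystack, or -1.

    `skip` is the number of leading needle bytes to ignore. The default of 46
    mirrors the offset passed to file_get_contents in the question and is an
    assumption about the header size.
    """
    with open(haystack_path, "rb") as f:
        haystack = f.read()
    with open(needle_path, "rb") as f:
        needle = f.read()[skip:]
    # bytes.find returns -1 when the subsequence is absent,
    # whereas bytes.index would raise ValueError
    return haystack.find(needle)
```

Called as `find_in_file("haystack.wav", "needle.wav")`, this returns an integer offset or -1.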

score 0 · Accepted Answer

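This was a bit of a mess because of the way Python swaps bytes for ints depending on how you access them in the object. Here is a bit about that. I tested this by writing an mp3 file twice into a new file. One observation is that if your needle contains metadata, you need to strip the metadata before comparing it against the longer file. In my case the needle contained "Lame#...". If you tried to match that entire mp3 against the longer one, it would never match.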

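def findneedle(bin1, bin2):
  # bin1: path to the needle file, bin2: path to the haystack file
  with open(bin2,'rb') as haystack:
    with open(bin1,'rb') as needle:
      n = needle.read()
      h = []
      EOF = None
      # Read the haystack in 1000-byte chunks until the needle shows up
      while EOF != b'':
        EOF = haystack.read(1000)
        h.append(EOF)
        if (n in b''.join(h)):
          # Drop the last chunk and rewind by its actual length
          # (the final chunk of the file may be shorter than 1000 bytes)
          h = h[:-1]
          haystack.seek(haystack.tell() - len(EOF))
          # Re-read one byte at a time to find where the match ends
          while EOF != b'':
            EOF = haystack.read(1)
            h.append(EOF)
            if (n in b''.join(h)):
              return haystack.tell() - len(n)
  # Falls through and returns None if the needle is never found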

index = findneedle('a.mp3','b.mp3')
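
The function above re-joins every chunk read so far on each iteration, which gets slow for large files. As an alternative sketch of the same chunked idea (not the answer author's method), one can keep only the last `len(needle) - 1` bytes of the previous window so that matches straddling a chunk boundary are still found. The name `find_needle_chunked` and the `chunk_size` default are assumptions chosen for illustration:

```python
def find_needle_chunked(needle_path, haystack_path, chunk_size=64 * 1024):
    """Return the byte offset of the needle file's contents in the haystack file, or -1."""
    with open(needle_path, "rb") as f:
        needle = f.read()
    if not needle:
        return 0  # an empty needle trivially matches at offset 0

    overlap = len(needle) - 1  # bytes to carry over between windows
    with open(haystack_path, "rb") as f:
        tail = b""  # carried-over bytes from the previous window
        base = 0    # file offset of the first byte of `tail`
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return -1  # reached end of file without a match
            window = tail + chunk
            pos = window.find(needle)
            if pos != -1:
                return base + pos
            # Keep only the bytes that could still begin a match
            start = max(0, len(window) - overlap)
            tail = window[start:]
            base += start
```

Memory use stays bounded by roughly `chunk_size + len(needle)` bytes for the haystack side, at the cost of still reading the whole needle into memory.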

score -1 · Accepted Answer

haystack = open("haystack.wav",'rb').read() should be enough. However, I have never tried reading .wav files in PHP, so I don't know whether Python and PHP share the same binary encoding structure.

>>> a = open("A24.wav", "rb").read()
>>> a[:100]
'RIFF\xf4\xe9\x01\x00WAVEfmt \x10\x00\x00\x00\x01\x00\x01\x00D\xac\x00\x00\x88X\x01\x00\x02\x00\x10\x00data\xd0\xe9\x01\x00\xff\xff\x01\x00\xff\xff\x01\x00\xff\xff\x01\x00\xff\xff\x01\x00\xff\xff\x01\x00\xfe\xff\x04\x00\xfc\xff\x04\x00\xfc\xff\x02\x00\x00\x00\xfe\xff\x04\x00\xfb\xff\x05\x00\xfc\xff\x02\x00\xff\xff\x00\x00\x01\x00\xfe\xff\x04\x00'
>>>

If you want to find the index of the string in 'haystack' that matches the string in 'needle', you can search for it with a regular expression:

import re

haystack = open("haystack.wav", "rb").read()
needle = open("needle.wav", "rb").read()

# Escape the needle so that bytes such as '.' or '(' are matched literally
regex = re.compile(re.escape(needle[:46]))
match = regex.search(haystack)

if match:
    print(match.start())
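
One caveat with the regular-expression approach: raw file bytes can contain regex metacharacters such as `.` or `(`, so the needle generally has to be passed through `re.escape` before it is used as a pattern. A small sketch with made-up byte strings (the helper name `regex_index` is hypothetical):

```python
import re

def regex_index(haystack, needle):
    """Return the offset of the literal bytes `needle` in `haystack`, or -1."""
    # re.escape makes every byte of the needle match only itself
    match = re.search(re.escape(needle), haystack)
    return match.start() if match else -1

haystack = b"abc a.c"
needle = b"a.c"

# Unescaped, '.' matches any byte, so the pattern hits b"abc" at offset 0
print(re.search(needle, haystack).start())
# Escaped, only the literal bytes b"a.c" match, at offset 4
print(regex_index(haystack, needle))
# For a plain literal search, bytes.find gives the same offset without a regex
print(haystack.find(needle))
```

For a purely literal search, `bytes.find` or `bytes.index` is likely the simpler choice; the regex route mainly pays off when an actual pattern is needed.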
