python - Python: Comparing similarly named files (recursively)

Question

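With the help of these fine people, I was able to put together the following code. It reads two files (i.e., SA1.WRD and SA1.PHN), merges them, and compares the result against a sublist of words culled from a dictionary: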

import sys
import os
import re
import itertools

#generator function to merge sound and word files
def takeuntil(iterable, stop):
    for x in iterable:
        yield x
        if x[1] == stop:
            break

#open a dictionary file and create subset of words
class_defintion = re.compile('([1-2] [lnr] t en|[1-2] t en)')
with open('TIMITDIC.TXT') as w_list:
    entries = (line.split(' ', 1) for line in w_list)
    comp_set = [ x[0] for x in entries if class_defintion.search(x[1]) ]

#open word and sound files
total_words = 0
with open(sys.argv[1]) as unsplit_words, open(sys.argv[2]) as unsplit_sounds:
    sounds = (line.split() for line in unsplit_sounds)
    words = (line.split() for line in unsplit_words)
    output = [
    (word, " ".join(sound for _, _, sound in
        takeuntil(sounds, stop)))
    for start, stop, word in words
]
for x in output:
    total_words += 1

#extract words from above into list of words in dictionary set
glottal_environments = [ x for x in output if x[0] in comp_set ]
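
To make the merge step concrete, here is a minimal sketch that runs the same takeuntil logic on in-memory rows instead of files. The rows are made-up values in "start stop label" form, not real TIMIT data, and the names word_rows and sound_rows are only for illustration.

```python
def takeuntil(iterable, stop):
    # yield rows until one whose end time equals the word's end time
    for x in iterable:
        yield x
        if x[1] == stop:
            break

# Made-up rows in "start stop label" form (illustrative, not real TIMIT data)
word_rows = [('0', '10', 'she'), ('10', '30', 'had')]
sound_rows = [('0', '4', 'sh'), ('4', '10', 'iy'),
              ('10', '15', 'hv'), ('15', '22', 'ae'), ('22', '30', 'd')]

# A single shared iterator, so each word resumes where the previous one stopped
sounds = iter(sound_rows)
merged = [
    (word, " ".join(sound for _, _, sound in takeuntil(sounds, stop)))
    for start, stop, word in word_rows
]
print(merged)  # [('she', 'sh iy'), ('had', 'hv ae d')]
```

The key point is that sounds is consumed only once: each word picks up the sound rows that end at or before its own stop time, which is why the two files must belong to the same utterance.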

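I am now trying to change the later parts of the script (after "#open a dictionary file") so that it runs on a large directory with several subdirectories. Each subdirectory contains a .txt file, a .wav file, a .wrd file, and a .phn file. I only want to open the .wrd and .phn files, and I want to open them two at a time, only when the base file names match (i.e., SA1.WRD and SA1.PHN, not SA1.WRD and SI997.PHN).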

My immediate guess was to do something like this:

for root, dir, files in os.walk(sys.argv[1]):
    words = [f for f in files if f.endswith('.WRD')]
    phones = [f for f in files if f.endswith('.PHN')]
    phones.sort()
    words.sort()
    files = zip(words, phones)

which returns: [('SA1.WRD', 'SA1.PHN'), ('SA2.WRD', 'SA2.PHN'), ('SI997.WRD', 'SI997.PHN')]

My first question is whether I am on the right track. If so, my second question is how I can treat each item in these tuples as a file name to read.

Thanks for any help.

EDIT:

I figured I could put the block of code inside a for loop:

for f in files:
    #OPEN THE WORD AND PHONE FILES, COMPARE THEM (TAKE A WORD COUNT)
    total_words = 0
    with open(f[0]) as unsplit_words, open(f[1]) as unsplit_sounds:

        ...

However, this results in an IOError, probably because each item in each tuple is wrapped in single quotes.
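
The single quotes are only how Python prints strings, so they are unlikely to be the cause. A likelier explanation is that os.walk yields bare file names, and open() resolves them against the current working directory rather than the directory being walked. A minimal sketch of the usual remedy, assuming Python 3 (list_wrd_paths is a hypothetical helper, and the demo builds a throwaway directory tree):

```python
import os
import tempfile

def list_wrd_paths(top):
    """Return the full path of every .WRD file under top (hypothetical helper)."""
    paths = []
    for root, dirs, files in os.walk(top):
        for name in files:
            if name.endswith('.WRD'):
                # name is a bare file name such as 'SA1.WRD';
                # joining it with root gives a path that open() can find
                paths.append(os.path.join(root, name))
    return sorted(paths)

# Demo on a temporary directory tree
with tempfile.TemporaryDirectory() as top:
    sub = os.path.join(top, 'test1')
    os.mkdir(sub)
    open(os.path.join(sub, 'SA1.WRD'), 'w').close()

    found = list_wrd_paths(top)
    # Opening the joined path works from any working directory
    with open(found[0]) as fh:
        contents = fh.read()
```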

UPDATE: I changed the original script to include os.path.join(root, f), as shown below. The script now walks every file in the directory tree, but it only processes the last two files found. The output of print files is:

[]
[('test/test1/SI997.WRD', 'test/test1/SI997.PHN')]
[('test/test2/SI997.WRD', 'test/test2/SI997.PHN')]

score 1 · Accepted Answer

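I tested the various parts that touch the file system, but it is easier for you to check against your actual files to confirm that it works with your data.

Edited to allow path names to be included: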

import sys
import os
import os.path
import re
import itertools

#generator function to merge sound and word files
def takeuntil(iterable, stop):
    for x in iterable:
        yield x
        if x[1] == stop:
            break

def process_words_and_sounds(word_file, sound_file):
    #open word and sound files
    total_words = 0
    with open(word_file) as unsplit_words, open(sound_file) as unsplit_sounds:
        sounds = (line.split() for line in unsplit_sounds)
        words = (line.split() for line in unsplit_words)
        output = [
            (word, " ".join(sound for _, _, sound in
                            takeuntil(sounds, stop)))
            for start, stop, word in words
            ]
        for x in output:
            total_words += 1
    return total_words, output

output = []
total_words = 0
for root, dirs, filenames in os.walk(sys.argv[1]):
    words = sorted(os.path.join(root, f) for f in filenames if f.endswith('.WRD'))
    phones = sorted(os.path.join(root, f) for f in filenames if f.endswith('.PHN'))
    # process the pairs inside the walk loop; if this loop ran after the walk,
    # only the pairs from the last directory visited would be handled
    for word_file, sound_file in zip(words, phones):
        word_count, output_subset = process_words_and_sounds(word_file, sound_file)
        total_words += word_count
        output.extend(output_subset)

#open a dictionary file and create subset of words
class_defintion = re.compile('([1-2] [lnr] t en|[1-2] t en)')
with open('TIMITDIC.TXT') as w_list:
    entries = (line.split(' ', 1) for line in w_list)
    comp_set = [ x[0] for x in entries if class_defintion.search(x[1]) ]

#extract words from above into list of words in dictionary set
glottal_environments = [ x for x in output if x[0] in comp_set ]
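
Sorting and zipping the two lists pairs the files correctly only when every directory holds a .PHN for each .WRD. If one file is missing, the pairs shift out of step. As a sketch of a stricter alternative, the files can be grouped by base name so that only true matches are paired. The helper iter_matched_pairs is hypothetical and not part of the script above, and it assumes upper-case extensions as in the question.

```python
import os

def iter_matched_pairs(top):
    """Yield (wrd_path, phn_path) for files in the same directory that
    share a base name. Hypothetical helper, not from the original script."""
    for root, dirs, files in os.walk(top):
        by_base = {}
        for name in files:
            base, ext = os.path.splitext(name)
            if ext in ('.WRD', '.PHN'):
                # group full paths by base name, keyed by extension
                by_base.setdefault(base, {})[ext] = os.path.join(root, name)
        for base in sorted(by_base):
            found = by_base[base]
            # skip any base name that lacks one of the two files
            if '.WRD' in found and '.PHN' in found:
                yield found['.WRD'], found['.PHN']
```

Each yielded pair could then be passed to process_words_and_sounds(word_file, sound_file) in place of the zip over the sorted lists.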
