python - What is the best way to determine whether an image is part of a sequence?

Question

I have an image file, and I want to use Python to check whether it is part of an image sequence.

For example, I start with this file:

/projects/image_0001.jpg

and I want to check whether the file is part of a sequence such as:

/projects/image_0001.jpg
/projects/image_0002.jpg
/projects/image_0003.jpg
...

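If I can determine whether the filename is part of a sequence, that is, whether the filename contains a sequence number, then checking whether an image sequence exists seems easy.

My first approach was to ask the user to put #### in the file path where the frame number goes, and to enter the start and end frame numbers that replace the hashes, but this is obviously not user-friendly. Is there a way to check for a run of digits in a string, using regular expressions or something similar?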

score 3 · Accepted Answer

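I assume the problem is less about knowing specific information about the filename itself and more about being able to tell sequenced files apart on disk.

In that case, what you are looking for is something smart enough to take a list like this: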

/path/to/file_1.png
/path/to/file_2.png
/path/to/file_3.png
...
/path/to/file_10.png
/path/to/image_1.png
/path/to/image_2.png
...
/path/to/image_10.png

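and return a result like this: there are two file sequences, /path/to/file_#.png and /path/to/image_#.png. A second pass then checks whether each of the other files meets that requirement.

You also need to know whether to support gaps (does the sequence have to be contiguous?).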

/path/to/file_1.png
/path/to/file_2.png
/path/to/file_3.png
/path/to/file_5.png
/path/to/file_6.png
/path/to/file_7.png

Is this one sequence (/path/to/file_#.png) or two sequences (/path/to/file_1-3.png and /path/to/file_5-7.png)?
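
If gaps should split a sequence, the frame numbers can be grouped into contiguous runs. This is a minimal sketch, assuming the frame numbers have already been extracted as integers; `contiguous_runs` is a hypothetical helper name and is not part of the code further down.

```python
def contiguous_runs(frames):
    """Group frame numbers into (first, last) runs that contain no gaps."""
    runs = []
    for frame in sorted(set(frames)):
        if runs and frame == runs[-1][1] + 1:
            # this frame extends the current run by exactly one
            runs[-1] = (runs[-1][0], frame)
        else:
            # first frame, or a gap was found: start a new run
            runs.append((frame, frame))
    return runs


print(contiguous_runs([1, 2, 3, 5, 6, 7]))  # [(1, 3), (5, 7)]
```

Treating the list as one sequence instead simply means skipping this step and keeping every frame under the same pattern.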

Also, how do you handle files in a sequence whose names themselves contain numbers?

/path/to/file2_1.png
/path/to/file2_2.png
/path/to/file2_3.png

etc.

With that in mind, this is how I would do it:

    import functools
    import os.path
    import re

    import projex.sorting

    def find_sequences(filenames):
        """
        Parse a list of filenames into a dictionary of sequences.  Filenames not
        part of a sequence are returned in the None key

        :param      filenames | [<str>, ..]

        :return     {<str> sequence: [<str> filename, ..], ..}
        """
        local_filenames   = filenames[:]
        sequence_patterns = {}
        sequences         = {None: []}

        # sort the files (by natural order) so we always generate a pattern
        # based on the first potential file in a sequence; the original call
        # passed projex.sorting.natural as a cmp-style function, so it is
        # wrapped with cmp_to_key to work on Python 3
        local_filenames.sort(key=functools.cmp_to_key(projex.sorting.natural))

        # create the expression to determine if a sequence is possible
        # we are going to assume that it's always going to be the
        # last set of digits that makes a sequence, i.e.
        #
        #    test2_1.png
        #    test2_2.png
        #
        # test2 will be treated as part of the name
        #
        #    test1.png
        #    test2.png
        #
        # whereas here the 1 and 2 are part of the sequence
        #
        # more advanced expressions would be needed to support
        #
        #    test_01_2.png
        #    test_02_2.png
        #    test_03_2.png
        #
        # the prefix is matched lazily so the whole trailing run of digits
        # (e.g. 0001) is captured rather than only its last digit
        pattern_expr = re.compile(r'^(.*?)(\d+)(\D*)$')

        # process the sorted files for sequences
        for filename in local_filenames:
            # first, check to see if this filename matches a sequence
            found = False
            for key, pattern in sequence_patterns.items():
                if not pattern.match(filename):
                    continue

                sequences[key].append(filename)
                found = True
                break

            # if we've already been matched, then continue on
            if found:
                continue

            # next, see if this filename should start a new sequence
            basename      = os.path.basename(filename)
            pattern_match = pattern_expr.match(basename)
            if pattern_match:
                # keep the directory part so the pattern matches full paths
                dirpart = filename[:len(filename) - len(basename)]
                prefix  = dirpart + pattern_match.group(1)
                suffix  = pattern_match.group(3)
                key     = '%s#%s' % (prefix, suffix)

                # create a new pattern based on the filename, escaping
                # literal characters such as the '.' in the extension
                sequence_pattern = re.compile(
                    r'^%s\d+%s$' % (re.escape(prefix), re.escape(suffix)))

                sequence_patterns[key] = sequence_pattern
                sequences[key] = [filename]
                continue

            # otherwise, add it to the list of non-sequences
            sequences[None].append(filename)

        # now that we have grouped everything, we'll merge back filenames
        # that were potential sequences, but only contain a single file to the
        # non-sequential list (iterate over a copy, since we pop as we go)
        for key, names in list(sequences.items()):
            if key is None or len(names) > 1:
                continue

            sequences.pop(key)
            sequences[None] += names

        return sequences

And a usage example:

    >>> test = ['test1.png', 'test2.png', 'test3.png', 'test4.png', 'test2_1.png', 'test2_2.png', 'test2_3.png', 'test2_4.png']
    >>> results = find_sequences(test)
    >>> list(results.keys())
    [None, 'test#.png', 'test2_#.png']
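
To tie this back to the original question, one could also start from a single path and look for siblings on disk that differ only in the trailing number. The following is a minimal, self-contained sketch rather than part of the answer's code: `sequence_siblings` and `is_part_of_sequence` are hypothetical names, and it assumes the frame number is the last run of digits in the basename.

```python
import os
import re


def sequence_siblings(path):
    """Return the paths in path's directory that share its name pattern,
    ordered by frame number."""
    dirname, basename = os.path.split(path)
    match = re.match(r'^(.*?)(\d+)(\D*)$', basename)
    if not match:
        return []  # no digits in the name, so it cannot be a frame
    prefix, _, suffix = match.groups()
    pattern = re.compile(
        r'^%s(\d+)%s$' % (re.escape(prefix), re.escape(suffix)))
    found = []
    for name in os.listdir(dirname or '.'):
        m = pattern.match(name)
        if m:
            found.append((int(m.group(1)), os.path.join(dirname, name)))
    return [p for _, p in sorted(found)]


def is_part_of_sequence(path):
    # assumption: "part of a sequence" means at least two matching files
    return len(sequence_siblings(path)) > 1
```

Whether two files are enough to call something a sequence, and whether gaps are allowed, are policy decisions this sketch leaves open.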

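One call in there touches on a separate topic, natural sorting. I used the natural sort function from the projex library. It is open source, so if you want to use it or look at it, it is here: http://dev.projexsoftware.com/projects/projex

That topic is covered elsewhere on the forum, though, so I just used the library's function.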
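
If you would rather not depend on projex, a natural sort key can be sketched with the standard library alone. This is a stand-in written for illustration, not the projex implementation; `natural_key` is a name introduced here.

```python
import re


def natural_key(text):
    """Split text into alternating non-digit and digit chunks so that
    numbers compare by value instead of character by character."""
    # re.split with a capturing group keeps the digit runs in the result:
    # even indexes hold text, odd indexes hold digits
    chunks = re.split(r'(\d+)', text)
    return [int(c) if i % 2 else c for i, c in enumerate(chunks)]


names = ['file_10.png', 'file_2.png', 'file_1.png']
print(sorted(names, key=natural_key))
# ['file_1.png', 'file_2.png', 'file_10.png']
```

Because the chunks always alternate between text and digits, integers are only ever compared with integers, which avoids a TypeError on Python 3.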

score 2 · Accepted Answer

Using Python's re module to check whether a string contains a run of digits is relatively simple. You can do something like this:

    mo = re.findall(r'\d+', filename)

This returns a list of all the runs of digits in filename. If:

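there is exactly one result (that is, the filename contains only one run of digits), and
a subsequent filename also has a single run of digits of the same length, and
the second run of digits is 1 greater than the previous one

...then they are probably part of a sequence.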
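
Those conditions could be checked roughly as follows. This is only a sketch: `is_next_in_sequence` is a hypothetical helper, and it additionally assumes that the text around the digits must be identical in both names, which the conditions above do not state explicitly.

```python
import re


def is_next_in_sequence(first, second):
    """Return True if `second` looks like the frame right after `first`."""
    digits_a = re.findall(r'\d+', first)
    digits_b = re.findall(r'\d+', second)
    # each filename must contain exactly one run of digits
    if len(digits_a) != 1 or len(digits_b) != 1:
        return False
    a, b = digits_a[0], digits_b[0]
    # same (zero-padded) width, and the number goes up by exactly one
    if len(a) != len(b) or int(b) != int(a) + 1:
        return False
    # assumption: the non-digit parts (prefix and extension) must match
    return re.split(r'\d+', first) == re.split(r'\d+', second)


print(is_next_in_sequence('image_0001.jpg', 'image_0002.jpg'))  # True
print(is_next_in_sequence('image_0001.jpg', 'image_0003.jpg'))  # False
```

Note that the same-length condition rejects unpadded names that cross a digit boundary, such as file_9.png followed by file_10.png, so it fits zero-padded frame numbers best.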
