python - How to list an image sequence in an efficient way? Numerical sequence comparison in Python

Question

I have a directory of 9 images:

image_0001, image_0002, image_0003
image_0010, image_0011
image_0011-1, image_0011-2, image_0011-3
image_9999

I would like to be able to list them in an efficient way, like this (4 entries for 9 images):

(image_000[1-3], image_00[10-11], image_0011-[1-3], image_9999)

Is there a way in Python to return a directory of images in a short/clear way (without listing every file)?

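So, possibly something like this:

List all images, sort them numerically, and build a list (counting each image in sequence from the start). When an image is missing, start a new list, and continue until the original file list is exhausted. That should leave me with several lists, each containing an unbroken sequence.

I'm trying to make a list of numbers easy to read and describe. If I had a sequence of 1000 consecutive files, it could be clearly listed as file[0001-1000] rather than file['0001','0002','0003' etc...].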

Edit1 (based on suggestions): Given a flattened list, how would you derive the glob patterns?

Edit2: I'm trying to break the problem down into smaller pieces. Here is an example of part of the solution: data1 works, data2 returns 0010 as 8 (and 0100 as 64), and data3 (the real data) doesn't work.

# Find runs of consecutive numbers using groupby.  The key to the solution
# is differencing with a range so that consecutive numbers all appear in
# same group.
from operator import itemgetter
from itertools import *

data1=[01,02,03,10,11,100,9999]
data2=[0001,0002,0003,0010,0011,0100,9999]
data3=['image_0001','image_0002','image_0003','image_0010','image_0011','image_0011-2','image_0011-3','image_0100','image_9999']

list1 = []
for k, g in groupby(enumerate(data1), lambda (i,x):i-x):
    list1.append(map(itemgetter(1), g))
print 'data1'
print list1

list2 = []
for k, g in groupby(enumerate(data2), lambda (i,x):i-x):
    list2.append(map(itemgetter(1), g))
print '\ndata2'
print list2

Returns:

data1
[[1, 2, 3], [10, 11], [100], [9999]]

data2
[[1, 2, 3], [8, 9], [64], [9999]]
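
The 8, 9 and 64 in the data2 output come from Python 2 reading integer literals with a leading zero as octal (0010 is 8, 0100 is 64); Python 3 rejects such literals outright. A minimal sketch, assuming the numbers are kept as zero-padded strings and only converted with int() for the comparison. It is written in Python 3 syntax, and `consecutive_runs` is a hypothetical helper name, not part of the code above:

```python
from itertools import groupby


def consecutive_runs(padded):
    """Group zero-padded digit strings into runs of consecutive values."""
    runs = []
    # index - value stays constant within a run of consecutive numbers.
    for _, group in groupby(enumerate(padded),
                            lambda pair: pair[0] - int(pair[1])):
        runs.append([text for _, text in group])
    return runs


data2 = ['0001', '0002', '0003', '0010', '0011', '0100', '9999']
print(consecutive_runs(data2))
# [['0001', '0002', '0003'], ['0010', '0011'], ['0100'], ['9999']]
```

Keeping the strings also preserves the leading zeroes, which are needed later if the ranges are to be printed in the original file-name format.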

score 6 · Accepted Answer

Here is a working implementation of what you want to achieve, using the code you added as a starting point:

#!/usr/bin/env python

import itertools
import re

# This algorithm only works if DATA is sorted.
DATA = ["image_0001", "image_0002", "image_0003",
        "image_0010", "image_0011",
        "image_0011-1", "image_0011-2", "image_0011-3",
        "image_0100", "image_9999"]

def extract_number(name):
    # Match the last number in the name and return it as a string,
    # including leading zeroes (that's important for formatting below).
    return re.findall(r"\d+$", name)[0]

def collapse_group(group):
    if len(group) == 1:
        return group[0][1]  # Unique names collapse to themselves.
    first = extract_number(group[0][1])  # Fetch range
    last = extract_number(group[-1][1])  # of this group.
    # Cheap way to compute the string length of the upper bound,
    # discarding leading zeroes.
    length = len(str(int(last)))
    # Now we have the length of the variable part of the names,
    # the rest is only formatting.
    return "%s[%s-%s]" % (group[0][1][:-length],
        first[-length:], last[-length:])

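# Index minus number is constant within a run of consecutive names.
# The key takes the (index, name) pair as one argument, which works
# in both Python 2 and Python 3.
groups = [collapse_group(tuple(group))
    for key, group in itertools.groupby(enumerate(DATA),
        lambda pair: pair[0] - int(extract_number(pair[1])))]

print(groups)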

This returns ['image_000[1-3]', 'image_00[10-11]', 'image_0011-[1-3]', 'image_0100', 'image_9999'], which is what you want.
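
The same idea can be wrapped in a function so that it applies to any list of names. This is only a sketch in Python 3 syntax, assuming every name ends in a number and that a plain string sort puts the names in sequence order (true for zero-padded names like the ones above). `collapse_names` is a hypothetical name introduced here, and the helpers are repeated so the block runs on its own:

```python
import itertools
import re


def extract_number(name):
    # Trailing digits of the name, as a string (leading zeroes kept).
    return re.findall(r"\d+$", name)[0]


def collapse_group(names):
    if len(names) == 1:
        return names[0]  # A lone name collapses to itself.
    first = extract_number(names[0])
    last = extract_number(names[-1])
    # Width of the part that actually varies within the group.
    length = len(str(int(last)))
    return "%s[%s-%s]" % (names[0][:-length], first[-length:], last[-length:])


def collapse_names(names):
    ordered = sorted(names)  # Assumption: string order equals sequence order.
    grouped = itertools.groupby(
        enumerate(ordered),
        lambda pair: pair[0] - int(extract_number(pair[1])))
    return [collapse_group([name for _, name in group])
            for _, group in grouped]


print(collapse_names(["image_0003", "image_0001", "image_0002", "image_9999"]))
# ['image_000[1-3]', 'image_9999']
```

Sorting inside the function removes the requirement that the caller pass sorted data, at the cost of assuming that string order is the order you want.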

History: As @Mark Ransom pointed out below, I initially answered the question backwards. For history's sake, my original answer was:

You're looking for glob. Try:

import glob
images = glob.glob("image_[0-9]*")

Or, using your example:

images = [glob.glob(pattern) for pattern in ("image_000[1-3]*",
    "image_00[10-11]*", "image_0011-[1-3]*", "image_9999*")]
images = [image for seq in images for image in seq]  # flatten the list
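
One thing worth keeping in mind when reading bracket expressions like these as glob patterns: glob matching follows fnmatch rules, where a bracket matches a single character from a set, not a numeric range. A small sketch using the standard fnmatch module to check what "image_00[10-11]*" accepts (the name list is made up for illustration):

```python
import fnmatch

names = ["image_0001", "image_0010", "image_0011", "image_0020"]
pattern = "image_00[10-11]*"

# [10-11] is a character set: '1', the range '0'-'1', and '1' again,
# so it matches exactly one character that is either '0' or '1'.
matches = [name for name in names if fnmatch.fnmatchcase(name, pattern)]
print(matches)
# ['image_0001', 'image_0010', 'image_0011']
```

So the bracket notation from the question is a compact display format rather than something that can always be passed back to glob unchanged.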

score 3 · Accepted Answer

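Okay, I found your question to be a fascinating puzzle. I have left the "compressing" of the numeric ranges up to you (marked as a TODO), because there are different ways to accomplish it depending on your formatting preference and on whether you want the minimum number of elements or the minimum string description length.

This solution uses a simple regular expression (digit strings) to classify each string into two kinds of sections: static and variable. Once the data is classified, groupby collects the static data into longest matching groups to achieve the summary effect. Integer index sentinels are mixed into the key (in matchGrouper) so that the varying parts can be re-selected from all elements (in unpack).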

import re
import glob
from itertools import groupby
from operator import itemgetter

def classifyGroups(iterable, reObj=re.compile(r'\d+')):
    """Yields successive match lists, where each item in the list is either
    static text content, or a list of matching values.

     * `iterable` is a list of strings, such as glob('images/*')
     * `reObj` is a compiled regular expression that describes the
            variable section of the iterable you want to match and classify
    """
    def classify(text, pos=0):
        """Use a regular expression object to split the text into match and non-match sections"""
        r = []
        for m in reObj.finditer(text, pos):
            m0 = m.start()
            r.append((False, text[pos:m0]))
            pos = m.end()
            r.append((True, text[m0:pos]))
        r.append((False, text[pos:]))
        return r

    def matchGrouper(each):
        """Returns index of matches or original text for non-matches"""
        return [(i if t else v) for i,(t,v) in enumerate(each)]

    def unpack(k,matches):
        """If the key is an integer, unpack the value array from matches"""
        if isinstance(k, int):
            k = [m[k][1] for m in matches]
        return k

    # classify each item into matches
    matchLists = (classify(t) for t in iterable)

    # group the matches by their static content
    for key, matches in groupby(matchLists, matchGrouper):
        matches = list(matches)
        # Yield a list of content matches.  Each entry is either text
        # from static content, or a list of matches
        yield [unpack(k, matches) for k in key]

Finally, we add enough logic to pretty-print the output, and run the example.

def makeResultPretty(res):
    """Formats data somewhat like the question"""
    r = []
    for e in res:
        if isinstance(e, list):
            # TODO: collapse and simplify ranges as desired here
            if len(set(e))<=1:
                # it's a list of the same element
                e = e[0]
            else: 
                # prettify the list
                e = '['+' '.join(e)+']'
        r.append(e)
    return ''.join(r)

fnList = sorted(glob.glob('images/*'))
re_digits = re.compile(r'\d+')
for res in classifyGroups(fnList, re_digits):
    print(makeResultPretty(res))

My directory of images was created from your example. For testing, you can replace fnList with the following list:

fnList = [
 'images/image_0001.jpg',
 'images/image_0002.jpg',
 'images/image_0003.jpg',
 'images/image_0010.jpg',
 'images/image_0011-1.jpg',
 'images/image_0011-2.jpg',
 'images/image_0011-3.jpg',
 'images/image_0011.jpg',
 'images/image_9999.jpg']

When run against this directory, my output looks like:

StackOverflow/3926936% python classify.py
images/image_[0001 0002 0003 0010].jpg
images/image_0011-[1 2 3].jpg
images/image_[0011 9999].jpg
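
One possible way to fill in the TODO, offered as a sketch rather than as the method intended above: collapse runs of consecutive numbers into first-last pairs. `collapseRanges` is a hypothetical helper (Python 3 syntax) that assumes the list holds digit strings in ascending order:

```python
def collapseRanges(values):
    """Collapse ascending digit strings into 'first-last' runs."""
    if not values:
        return []
    parts = []
    start = prev = values[0]
    for v in values[1:]:
        if int(v) == int(prev) + 1:
            prev = v  # Still inside the current run.
            continue
        # Run ended: emit a single value or a first-last pair.
        parts.append(start if start == prev else start + '-' + prev)
        start = prev = v
    parts.append(start if start == prev else start + '-' + prev)
    return parts


print('[' + ' '.join(collapseRanges(['0001', '0002', '0003', '0010'])) + ']')
# [0001-0003 0010]
```

In makeResultPretty, this could replace ' '.join(e) with ' '.join(collapseRanges(e)), which would turn the first output line into images/image_[0001-0003 0010].jpg. This favours readability; a different choice would be needed if the goal is the shortest possible description.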

score 2 · Accepted Answer

def ranges(sorted_list):
    first = None
    for x in sorted_list:
        if first is None:
            first = last = x
        elif x == increment(last):
            last = x
        else:
            yield first, last
            first = last = x
    if first is not None:
        yield first, last

The increment function is left as an exercise for the reader.

Edit: Here is an example of how it would be used with integers instead of strings as input.

def increment(x): return x+1

list(ranges([1,2,3,4,6,7,8,10]))
[(1, 4), (6, 8), (10, 10)]

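For each contiguous range in the input, you get a pair indicating the start and end of the range. If an element isn't part of a range, the start and end values are identical.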
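
For zero-padded strings such as the image numbers in the question, one possible increment is sketched below (Python 3 syntax). It assumes the items are fixed-width digit strings. `increment_padded` is a name chosen here for illustration, and ranges is repeated with an extra `increment` parameter so that the block runs on its own:

```python
def increment_padded(text):
    # Add one and re-pad to the original width, e.g. '0009' -> '0010'.
    return str(int(text) + 1).zfill(len(text))


def ranges(sorted_list, increment):
    first = None
    for x in sorted_list:
        if first is None:
            first = last = x
        elif x == increment(last):
            last = x  # x continues the current run.
        else:
            yield first, last
            first = last = x
    if first is not None:
        yield first, last


numbers = ['0001', '0002', '0003', '0010', '0011', '9999']
print(list(ranges(numbers, increment_padded)))
# [('0001', '0003'), ('0010', '0011'), ('9999', '9999')]
```

Passing increment as an argument keeps ranges independent of the item type, so the same generator works with the integer version shown earlier.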
