python - Python-膨大な数の組み合わせから特定の基準を満たすサブリストを作成する

Question

長い間読んでいて、初めて自分が取り組んでいることに対する答えを見つけることができませんでした。

それぞれ6文字の長さの93個の文字列のリストがあります。それらの93個の文字列から、セット内の他の文字列と比較して特定の基準を満たす20個のセットを特定したいと思います。itertools.combinationsはすべての可能な組み合わせを提供しますが、すべての条件を確認する価値があるわけではありません。

たとえば、list[0]とlist[1]を一緒に使用できないために、[list [0]、list [1]など]が失敗した場合、他の18個の文字列が何であっても、セットは毎回失敗します。そしてそれは無駄なチェックのトンです。

現在、これは20個のネストされたforループで動作していますが、より良い/より高速な方法が必要なようです。

for n1 in bclist:
    building = [n1]
    n2bclist = [bc for bc in bclist if bc not in building]
    for n2 in n2bclist:              #this is the start of what gets repeated 19 times
        building.append(n2)
        if test_function(building): #does set fail? (counter intuitive, True when fail, False when pass)
            building.remove(n2)
            continue
        n3bclist = [bc for bc in bclist if bc not in building]
        #insert the additional 19 for loops, with n3 in n3, n4 in n4, etc
        building.remove(n2)

20番目のforループには、20個のセットが存在する場合でも警告するprintステートメントがあります。forステートメントを使用すると、少なくとも1つの加算が失敗したときにセットを早期にスキップできますが、より大きな組み合わせが失敗したときの記憶はありません。

たとえば失敗するので、どちらのパス[list[0], list[1]]にスキップします。[list[0], [list[2]]次は[list[0], list[2], list[1]]、0と1が再び一緒になり、通過する場合と通過しない場合に移動するため、失敗し[list[0], list[2], list[3]]ます。私の懸念は、最終的には次のこともテストすることです。

[list[0], list[3], list[2]]
[list[2], list[0], list[3]]
[list[2], list[3], list[0]]
[list[3], list[0], list[2]]
[list[3], list[2], list[0]]

これらの組み合わせはすべて、前の組み合わせと同じ結果になります。基本的に、私はitertools.combinationsの悪魔を交換して、値の順序を気にしないときに値の順序を要因として扱うforループの悪魔に失敗する初期の値のために、失敗することがわかっているセットのすべての組み合わせをテストします。どちらの方法でも、コードが完了するまでにかかる時間が大幅に長くなります。

悪魔を取り除く方法についてのアイデアは大歓迎です。

score 1 · Accepted Answer

現在の方法を使用しますが、インデックスも追跡して、内部ループですでにチェックした要素をスキップできるようにします。

bcenum = list(enumerate(bclist))
for i1, n1 in bcenum:
    building = [n1]
    for i2, n2 in bcenum[i1+1:]:              #this is the start of what gets repeated 19 times
        building.append(n2)
        if test_function(building): #does set fail? (counter intuitive, True when fail, False when pass)
            building.remove(n2)
            continue
        for i3, n3 in bcenum[i2+1:]:
            # more nested loops
        building.remove(n2)

score 1 · Accepted Answer

def gen(l, n, test, prefix=()):
  if n == 0:
    yield prefix
  else:
    for i, el in enumerate(l):
      if not test(prefix + (el,)):
        for sub in gen(l[i+1:], n - 1, test, prefix + (el,)):
          yield sub

def test(l):
  return sum(l) % 3 == 0 # just a random example for testing

print list(gen(range(5), 3, test))

これにより、カーディナリティのサブセットが次のように選択さnれlますtest(subset) == False。

不必要な作業を避けようとします。ただし、93から20の要素を選択する1e20の方法があることを考えると、全体的なアプローチを再考する必要があるかもしれません。

score 0 · Accepted Answer

問題の2つの側面を利用できます。

順序は関係ありません
の場合test_function(L)はTrue、test_functionのサブリストLもTrue

list[0]---の代わりにインデックス0-92を処理することで、物事を少し単純化することもできます。リストの内容が何であるかを気にするのはそのlist[92]範囲内だけです。test_function

次のコードは、最初に実行可能なペアを見つけ、次に4つのセット、8つのセット、および16のセットを見つけることによってそれを行います。最後に、16と4のすべての実行可能な組み合わせを見つけて、20のリストを取得します。ただし、8のセットが100,000を超えていたため、それでも遅すぎて諦めました。おそらくあなたは同じ線に沿って何かをすることができますが、それをスピードアップしますitertoolsが、おそらく十分ではありません。

target = range(5, 25)
def test_function(L):
    for i in L:
        if not i in target:
            return True
def possible_combos(A, B):
    """
    Find all possible pairings of a list within A and a list within B
    """
    R = []
    for i in A:
        for j in B:
            if i[-1] < j[0] and not test_function(i + j):
                R.append(i + j)
    return R
def possible_doubles(A):
    """
    Find all possible pairings of two lists within A
    """
    R = []
    for n, i in enumerate(A):
        for j in A[n + 1:]:
            if i[-1] < j[0] and not test_function(i + j):
                R.append(i + j)
    return R
# First, find all pairs that are okay
L = range(92) 
pairs = []
for i in L:
    for j in L[i + 1:]:
        if not test_function([i, j]):
            pairs.append([i, j])

# Then, all pairs of pairs
quads = possible_doubles(pairs)
print "fours", len(quads), quads[0]
# Then all sets of eight, and sixteen
eights = possible_doubles(quads)
print "eights", len(eights), eights[0]
sixteens = possible_doubles(eights)
print "sixteens", len(sixteens), sixteens[0]

# Finally check all possible combinations of a sixteen plus a four
possible_solutions = possible_combos(sixteens, fours)
print len(possible_solutions), possible_solutions[0]

編集：私ははるかに良い解決策を見つけました。まず、に準拠する範囲（0〜92）内の値のすべてのペアを特定しtest_function、ペアを順番に保持します。おそらく、最初のペアの最初の値は解の最初の値である必要があり、最後のペアの2番目の値は解の最後の値である必要があります（ただし、この仮定は当てはまりtest_functionますか？そうでない場合安全な仮定find_paths では、開始と終了のすべての可能な値に対して繰り返す必要があります）。次に、最初から最後の値までのパスを見つけます。これは、長さが20値で、に準拠していtest_functionます。

def test_function(S):
    for i in S:
        if not i in target:
            return True
    return False

def find_paths(p, f):
    """ Find paths from end of p to f, check they are the right length,
        and check they conform to test_function
    """
    successful = []
    if p[-1] in pairs_dict:
        for n in pairs_dict[p[-1]]:
            p2 = p + [n]
            if (n == f and len(p2) == target_length and
                not test_function(p2)):
                successful.append(p2)
            else:
                successful += find_paths(p2, f)
    return successful

list_length = 93              # this is the number of possible elements
target = [i * 2 for i in range(5, 25)] 
    # ^ this is the unknown target list we're aiming for...
target_length = len(target)   # ... we only know its length
L = range(list_length - 1)
pairs = []
for i in L:
    for j in L[i + 1:]:
        if not test_function([i, j]):
            pairs.append([i, j])
firsts = [a for a, b in pairs]
nexts = [[b for a, b in pairs if a == f] for f in firsts]
pairs_dict = dict(zip(firsts, nexts))
print "Found solution(s):", find_paths([pairs[0][0]], pairs[-1][1])

score 0 · Accepted Answer

itertools.combinations注文の問題を解決するため、ソリューションの基礎を築く必要があります。短絡フィルタリングは比較的簡単に解決できます。

再帰的ソリューション

combinations作品の実装方法を簡単に確認しましょう。最も簡単な方法は、ネストされたforループのアプローチを採用し、それを再帰的なスタイルに変換することです。

def combinations(iterable, r):
    pool = tuple(iterable)
    for i in range(0, len(pool)):
        for j in range(i + 1, len(pool)):
            ...
                yield (i, j, ...)

再帰形式に変換：

def combinations(iterable, r):
    pool = tuple(iterable)
    def inner(start, k, acc):
        if k == r:
            yield acc
        else:
            for i in range(start, len(pool)):
                for t in inner(i + 1, k + 1, acc + (pool[i], )):
                    yield t
    return inner(0, 0, ())

フィルタの適用が簡単になりました。

def combinations_filterfalse(predicate, iterable, r):
    pool = tuple(iterable)
    def inner(start, k, acc):
        if predicate(acc):
            return
        elif k == r:
            yield acc
        else:
            for i in range(start, len(pool)):
                for t in inner(i + 1, k + 1, acc + (pool[i], )):
                    yield t
    return inner(0, 0, ())

これを確認しましょう：

>>> list(combinations_filterfalse(lambda t: sum(t) % 2 == 1, range(5), 2))
[(0, 2), (0, 4), (2, 4)]

反復ソリューション

ドキュメントitertools.combinationsにリストされているの実際の実装は、反復ループを使用します。

def combinations(iterable, r):
    pool = tuple(iterable)
    n = len(pool)
    if r > n:
        return
    indices = range(r)
    yield tuple(pool[i] for i in indices)
    while True:
        for i in reversed(range(r)):
            if indices[i] != i + n - r:
                break
        else:
            return
        indices[i] += 1
        for j in range(i+1, r):
            indices[j] = indices[j-1] + 1
        yield tuple(pool[i] for i in indices)

述語にエレガントに合わせるには、ループを少し並べ替える必要があります。

def combinations_filterfalse(predicate, iterable, r):
    pool = tuple(iterable)
    n = len(pool)
    if r > n or predicate(()):
        return
    elif r == 0:
        yield ()
        return
    indices, i = range(r), 0
    while True:
        while indices[i] + r <= i + n:
            t = tuple(pool[k] for k in indices[:i+1])
            if predicate(t):
                indices[i] += 1
            elif len(t) == r:
                yield t
                indices[i] += 1
            else:
                indices[i+1] = indices[i] + 1
                i += 1
        if i == 0:
            return
        i -= 1
        indices[i] += 1

もう一度確認します。

>>> list(combinations_filterfalse(lambda t: sum(t) % 2 == 1, range(5), 2))
[(0, 2), (0, 4), (2, 4)]
>>> list(combinations_filterfalse(lambda t: t == (1, 4), range(5), 2))
[(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
>>> list(combinations_filterfalse(lambda t: t[-1] == 3, range(5), 2))
[(0, 1), (0, 2), (0, 4), (1, 2), (1, 4), (2, 4)]
>>> list(combinations_filterfalse(lambda t: False, range(5), 2))
[(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
>>> list(combinations_filterfalse(lambda t: False, range(5), 0))
[()]

比較

再帰的ソリューションは単純であるだけでなく、より高速であることがわかります。

In [33]: timeit list(combinations_filterfalse_rec(lambda t: False, range(20), 5))
10 loops, best of 3: 24.6 ms per loop

In [34]: timeit list(combinations_filterfalse_it(lambda t: False, range(20), 5))
10 loops, best of 3: 76.6 ms per loop

python - Python-膨大な数の組み合わせから特定の基準を満たすサブリストを作成する

4 に答える 4

再帰的ソリューション

反復ソリューション

比較

Related

Reference