python - Pythonで2つのリストの2つの要素をランダムにマージし、結果のエントリがすべて一意であることを確認する方法は?

Question

要素の数が等しい 2 つのリスト A と B がありますが、各リストの要素は必ずしも異なるとは限りません。

A と B の要素をランダムに組み合わせて新しいリストを作成したいと思います (ランダムな組み合わせが重要です)。

ただし、結果のリスト内の各ペアが一意であることも確認する必要があります。

これまでのところ、私は次のように問題に取り組んできました。これは小さなリストでは機能しますが、多くの組み合わせを持つ大きなリストには明らかに適していません。

from random import shuffle

# Create a list of actors and events for testing
events = ['P1','P1','P1','P2','P2','P2','P3','P3','P3','P4','P5','P6','P7','P7']
actors = ['IE','IE','ID','ID','IA','IA','IA','IC','IB','IF','IG','IH','IH','IA']

# Randomize the elements of each list
shuffle(events)
shuffle(actors)

# Merge the two lists into a new list of pairs
edgelist = zip(events,actors)

# If the new list of pairs has all unique elements, then it is a good solution, otherwise try again at random
x = set(edgelist)

if len(edgelist) == len(x):
  break
else:
  while True:
    shuffle(events)
    shuffle(actors)
    edgelist = zip(events,actors)
    x = set(edgelist)
    if len(edgelist) == len(x):
      break

# Display the solution
print 'Solution obtained: '
for item in edgelist:
  print item

より大きな入力リストにスケーリングする変更または代替アプローチを提案できる人はいますか?

役立つ返信をありがとう。

アップデート

これは、当初考えられていたよりも難しい問題であることがわかりました。私は今、解決策を持っていると思います。信じられないほどうまくスケーリングできない場合がありますが、小規模または中規模のリストでは問題なく機能します。開始する前に解決策が可能かどうかを確認するため、入力リストの分布に関する仮定は必要ありません。結果のリストの度数分布が元のリストと一致することを示すために、数行のコードも含めました。

# Randomize the elements
shuffle(events)

# Make sure a solution is possible
combinations = len(set(events))*len(set(actors))
assert combinations >= len(events) and combinations >= len(actors) and len(events) == len(actors), 'No soluton possible!'

# Merge the two lists into a new list of pairs (this will contain duplicates)
edgelist = zip(events,actors)

# Search for duplicates
counts = collections.Counter(edgelist)
duplicates = [i for i in counts if counts[i] > 1]
duplicate_count = len(duplicates)

while duplicate_count > 0:

  # Get a single duplicate to address
  duplicate = duplicates[0]

  # Find the position of the duplicate in the in edgelist
  duplicate_pos = edgelist.index(duplicate)

  # Search for a replacement
  swap = choice(edgelist)
  swap_pos = edgelist.index(swap)

  if (swap[0],duplicate[1]) not in edgelist: 
    edgelist[duplicate_pos] = (swap[0],duplicate[1])
    edgelist[swap_pos] = (duplicate[0],swap[1])

  # Update duplicate count
  counts = collections.Counter(edgelist)
  duplicates = [i for i in counts if counts[i] > 1]
  duplicate_count = len(duplicates)


# Verify resulting edgelist and frequency distributions

print 'Event Frequencies: '
print sorted([y for (x,y) in list(collections.Counter(events).items())], reverse=True)

print 'Edgelist Event Frequencies: '        
print sorted([y for (x,y) in list(collections.Counter([x for (x,y) in edgelist]).items())], reverse=True)

print 'Actor Frequencies: '        
print sorted([y for (x,y) in list(collections.Counter(actors).items())], reverse=True)

print 'Edgelist Actor Frequencies: '        
print sorted([y for (x,y) in list(collections.Counter([y for (x,y) in edgelist]).items())], reverse=True)

assert len(set(edgelist)) == len(events) == len(actors)

score 1 · Accepted Answer

そうですね、両方のリストをシャッフルする必要はありません。ペアリングは「よりランダム」にはなりません。

更新: 元のソリューションとは異なるソリューションを投稿しました。これは再帰的であり、常に有効な回答を返すことが保証されています。それが不可能な場合は None を返します。

from random import shuffle

def merge(events, actors, index=None):
    """Returns list of unique pairs of events and actors, none of which may on index"""
    if len(events) == 0:
        return []
    if index is None:
        index = set()

    merged = None
    tried_pairs = set()
    for event in events:
        for i, actor in enumerate(actors):
            pair = (event, actor)
            if pair not in index and pair not in tried_pairs:
                new_index = index.union([pair])
                rest = merge(events[1:], actors[:i]+actors[i+1:], new_index)

                if rest is not None:
                    # Found! Done.
                    merged = [pair] + rest
                    break
                else:
                    tried_pairs.add(pair)
        if merged is not None:
            break

    return merged


if __name__ == '__main__':
    _events = ['P1','P1','P1','P2','P2','P2','P3','P3','P3','P4','P5','P6','P7','P7']
    _actors = ['IE','IE','ID','ID','IA','IA','IA','IC','IB','IF','IG','IH','IH','IA']
    shuffle(_events)
    edgelist = merge(_events, _actors)

    if edgelist is not None:
        print 'Solution obtained: '
        for item in edgelist:
            print item
    else:
        print 'No possible solution with these events and actors.'

注：「インデックス」変数は、OPのソリューションでのエッジリストのチェックに似ています。「tried_pairs」変数は、特定の再帰ステップごとの最適化にすぎず、同じペアを何度も再試行することを回避します (たとえば、アクターに複数の連続した同一のアイテムがある場合)。

score 0 · Accepted Answer

これはうまくいくようですが、どれだけうまくスケーリングできるかは完全にはわかりません...

# Create tuples of lists of actors and events for testing
testcases = [
    (['P1','P1','P1','P2','P2','P2','P3','P3','P3','P4','P5','P6','P7','P7'],
     ['IE','IE','ID','ID','IA','IA','IA','IC','IB','IF','IG','IH','IH','IA']),

    (['P1','P1','P1','P1','P1','P1','P1','P2','P2','P2','P2','P2','P2','P2'],
     ['IE','IE','IE','IE','IE','IE','IE','ID','ID','ID','ID','ID','ID','ID']),

    (['P1','P2','P3','P4','P5','P6','P7','P8','P9','PA','PB','PC','PD','PE'],
     ['IE','IE','IE','IE','IE','IE','IE','ID','ID','ID','ID','ID','ID','ID']),
]

events, actors = testcases[0]

import random

def random_choice(items):
    """ yield list items in random order """
    items = items[:]  # preserve input argument
    random.shuffle(items)
    while items:
        yield items.pop()

pairs = set()
for event in random_choice(events):
    for index in random_choice(range(len(actors))):
        pair = (event, actors[index])
        if pair not in pairs:  # unique?
            pairs.add(pair)
            actors.pop(index)
            break

# Display the solution (in sorted order)
print 'Solution obtained has %d unique pairs: ' % len(pairs)
for item in sorted(list(pairs)):
    print item

リストからランダムな要素を選択し、それを削除するにはどうすればよいですか?という質問に対する両方の回答からアイデアを使用しました。.

score -1 · Accepted Answer

# Create a list of actors and events for testing
events = ['P1','P1','P1','P2','P2','P2','P3','P3','P3','P4','P5','P6','P7','P7']
actors = ['IE','IE','ID','ID','IA','IA','IA','IC','IB','IF','IG','IH','IH','IA']

# Randomize the elements
shuffle(events)

# Merge the two lists into a new list of pairs
edgelist = zip(events,actors)

# remove duplicates
x = set(edgelist)

# If there were not enough unique elements : add new ones as needed.
while len(x)<len(edgelist):
    x.add((choice(events), choice(actor)))

# Display the solution
print 'Solution obtained: '
for item in x:
  print item

python - Pythonで2つのリストの2つの要素をランダムにマージし、結果のエントリがすべて一意であることを確認する方法は?

3 に答える 3

Related

Reference