python - Finding the most similar numbers across multiple lists in Python

Question

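In Python, I have three lists of floating-point numbers (angles) in the range 0-360, and the lists are not the same length. I need to find the triplet (one number from each list) in which the numbers are closest together. (Since this is real-world data, it is highly unlikely that any of the numbers will be identical.) I was thinking of using a simple lowest-standard-deviation method to measure agreement, but I'm not sure of a good way to implement it. I could loop through each list with nested for loops, comparing the standard deviation of every possible combination and keeping the indices of the best-matching triplet in a temporary variable, but I was wondering whether anyone has a better or more elegant way to do something like this. Thanks!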

score 6 · Accepted Answer

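I wouldn't be surprised if there is an established algorithm for doing this, and if there is, you should use it. But I don't know of one, so I'm going to speculate a little.

If I had to do it, the first thing I would try is simply looping through all possible combinations of the numbers and seeing how long it takes. If the data set is small enough, it's not worth inventing a clever algorithm. To demonstrate the setup, I'll include the sample code: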

# setup
import itertools
from statistics import variance  # one possible agreement measure

def distance(nplet):
    '''Takes a pair or triplet (an "n-plet") as a list, and returns its distance.
    A smaller return value means better agreement.'''
    # your choice of implementation here. Example:
    return variance(nplet)

# algorithm
def brute_force(*lists):
    # try every combination of one value per list and keep the best-agreeing one
    return min(itertools.product(*lists), key=distance)
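
As a rough usage sketch of the brute-force idea, the snippet below runs it on three short lists and times the call. The sample angles are made up for illustration, and `statistics.variance` is assumed as the agreement measure; any other measure could be passed as the key.

```python
import itertools
import time
from statistics import variance

def brute_force(*lists):
    # try every combination of one value per list and keep the tightest one
    return min(itertools.product(*lists), key=variance)

# made-up sample data: three lists of angles with different lengths
list_a = [12.5, 97.1, 203.4, 310.0]
list_b = [45.0, 99.8, 250.2]
list_c = [5.3, 101.2, 180.0, 275.5, 340.9]

start = time.perf_counter()
best = brute_force(list_a, list_b, list_c)
elapsed = time.perf_counter() - start

print(best)                      # (97.1, 99.8, 101.2)
print(f"took {elapsed:.6f} s")   # timing depends on the machine
```

The number of combinations is the product of the three list lengths, so timing a run on lists of realistic size is a quick way to decide whether anything cleverer is needed.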

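For a large data set, I would do something like this: first, create one triplet for each number in the first list, with its first entry set to that number. Then go through this list of partially filled triplets and, for each one, pick the number from the second list that is closest to the number from the first list and set it as the second member of the triplet. Then go through the list of triplets again and, for each one, pick the number from the third list that is closest to the first two numbers (as measured by your agreement metric). Finally, take the best of the bunch. This sample code demonstrates how you could try to keep the runtime linear in the length of the lists, assuming the lists are already sorted.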

def item_selection(listA, listB, listC):
    # listA and listB must be sorted ascending: iB only ever moves forward
    # make the list of partially-filled triplets
    triplets = [[a] for a in listA]
    iT = 0
    iB = 0
    while iT < len(triplets):
        # advance iB to the first value in listB that is >= triplets[iT][0]
        while iB < len(listB) and listB[iB] < triplets[iT][0]:
            iB += 1
        if iB == 0:
            triplets[iT].append(listB[0])
        elif iB == len(listB):
            triplets[iT].append(listB[-1])
        else:
            # look at the values in listB just below and just above triplets[iT][0]
            # and add the closer one as the second member of the triplet
            dist_lower = distance([triplets[iT][0], listB[iB - 1]])
            dist_upper = distance([triplets[iT][0], listB[iB]])
            if dist_lower < dist_upper:
                triplets[iT].append(listB[iB - 1])
            elif dist_lower > dist_upper:
                triplets[iT].append(listB[iB])
            else:
                # if they are equidistant, add both: finish this entry with the
                # lower value and insert a new entry that carries the upper value
                triplets[iT].append(listB[iB - 1])
                iT += 1
                triplets[iT:iT] = [[triplets[iT - 1][0], listB[iB]]]
        iT += 1
    # then another loop while iT < len(triplets) to add in the numbers from listC
    # (not written out here; until it is, each entry holds only two values)
    return min(triplets, key=distance)
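
The sample above leaves the listC pass as a comment. One way the whole greedy procedure might look is sketched below. This is not the answer's own code: it swaps the hand-rolled index sweep for the standard-library `bisect` module (so each lookup is O(log N) rather than amortized O(1)), assumes `statistics.variance` as the agreement measure, assumes `list_b` and `list_c` are sorted ascending, and keeps only one candidate on ties. The names `closest_in_sorted` and `greedy_triplet` are made up for illustration.

```python
import bisect
from statistics import variance

def closest_in_sorted(sorted_values, target):
    """Return the element of sorted_values nearest to target (hypothetical helper)."""
    i = bisect.bisect_left(sorted_values, target)
    # the nearest value sits either just below or just above the insertion point
    candidates = sorted_values[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda v: abs(v - target))

def greedy_triplet(list_a, list_b, list_c):
    """Greedy selection; list_b and list_c must be sorted in ascending order."""
    triplets = []
    for a in list_a:
        # second member: the value in list_b closest to a
        b = closest_in_sorted(list_b, a)
        # third member: with variance as the measure, the best value is the
        # one nearest to the mean of the first two
        c = closest_in_sorted(list_c, (a + b) / 2)
        triplets.append((a, b, c))
    # take the best of the bunch
    return min(triplets, key=variance)

# made-up sample data
list_a = [12.5, 97.1, 203.4, 310.0]
list_b = [45.0, 99.8, 250.2]
list_c = [5.3, 101.2, 180.0, 275.5, 340.9]
result = greedy_triplet(list_a, list_b, list_c)
print(result)  # (97.1, 99.8, 101.2)
```

Picking the third value by its distance to the mean of the first two is specific to variance; a different agreement measure would need its own rule for that step.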

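The catch is that I can imagine situations where this wouldn't actually find the best triplet, for instance if a number from the first list is close to one from the second list but not at all close to anything in the third list. So something you could try is to run this algorithm for all 6 possible orderings of the lists. I can't think of a specific situation where that would fail to find the best triplet, but one might still exist. In any case, the algorithm will still be O(N) if you use a clever implementation, assuming the lists are sorted.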

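import itertools

def symmetrized_item_selection(listA, listB, listC):
    best_results = []
    for ordering in itertools.permutations([listA, listB, listC]):
        # item_selection returns a single triplet, so append it whole;
        # its members follow this ordering, not necessarily (A, B, C)
        best_results.append(item_selection(*ordering))
    return min(best_results, key=distance)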
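
A small made-up example of the failure case described above, as a sketch. It assumes `statistics.variance` as the measure and uses a plain scan (`min` over each list) instead of the linear sweep, purely to keep the example short. `greedy` and `symmetrized` are illustrative names, not the functions from this answer.

```python
import itertools
from statistics import variance

def greedy(lists):
    """For each value in the first list, add the best-agreeing value from each later list."""
    results = []
    for first in lists[0]:
        nplet = [first]
        for other in lists[1:]:
            # pick the value that agrees best with what has been chosen so far
            nplet.append(min(other, key=lambda x: variance(nplet + [x])))
        results.append(tuple(nplet))
    return min(results, key=variance)

def symmetrized(lists):
    """Run the greedy pass for every ordering of the lists and keep the best result."""
    return min((greedy(order) for order in itertools.permutations(lists)), key=variance)

list_a, list_b, list_c = [10.0], [11.0, 20.0], [25.0]

one_order = greedy([list_a, list_b, list_c])   # 10 grabs 11 first, then 25 is forced
all_orders = symmetrized([list_a, list_b, list_c])
exact = min(itertools.product(list_a, list_b, list_c), key=variance)

print(sorted(one_order))   # [10.0, 11.0, 25.0]
print(sorted(all_orders))  # [10.0, 20.0, 25.0]
print(sorted(exact))       # [10.0, 20.0, 25.0]
```

Here the single ordering (A, B, C) commits to 11 too early, while an ordering that brings in the third list sooner ends up with 20. That is only one example; it does not show that trying all six orderings is always enough.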

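Another option would be to compute all possible pairs of numbers between list 1 and list 2, between list 1 and list 3, and between list 2 and list 3. Then sort all of those pairs together, from closest to farthest. Starting with the closest pair, go through the list pair by pair, and any time you encounter a pair that shares a number with one you've already seen, merge them into a triplet. For a suitable measure of agreement, once you find your first triplet, that gives you a maximum pair distance that you need to iterate up to; once you reach it, you just choose the closest triplet of the ones you've found. I think that should consistently find the best possible triplet, but it will be O(N^2 log N) because the list of pairs has to be sorted.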

import collections
import itertools

def pair_sorting(listA, listB, listC):
    # make all possible pairs of values from two lists
    # each pair has the structure ((number, origin_list),(number, origin_list))
    # so we know which lists the numbers came from
    all_pairs = []
    all_pairs += [((nA,0), (nB,1)) for (nA,nB) in itertools.product(listA,listB)]
    all_pairs += [((nA,0), (nC,2)) for (nA,nC) in itertools.product(listA,listC)]
    all_pairs += [((nB,1), (nC,2)) for (nB,nC) in itertools.product(listB,listC)]
    # distance() takes a single list, so wrap the two numbers
    all_pairs.sort(key=lambda p: distance([p[0][0], p[1][0]]))
    # make a dict to track which (number, origin_list)s we've already seen
    pairs_by_number_and_list = collections.defaultdict(list)
    min_distance = float('inf')
    min_triplet = None
    # start with the closest pair
    for pair in all_pairs:
        # for the first value of the current pair, see if we've seen that particular
        # (number, origin_list) combination before
        for pair2 in pairs_by_number_and_list[pair[0]]:
            # skip merges that would take two numbers from the same list
            if {pair[1][1], pair2[0][1], pair2[1][1]} != {0, 1, 2}:
                continue
            # the current pair shares its first value with another pair, so put
            # the 3 unique values together to make a triplet
            # (members are not necessarily in list order A, B, C)
            this_triplet = (pair[1][0], pair2[0][0], pair2[1][0])
            # check if the triplet agrees more than the previous best triplet
            this_distance = distance(this_triplet)
            if this_distance < min_distance:
                min_triplet = this_triplet
                min_distance = this_distance
        # do the same thing but checking the second element of the current pair
        for pair2 in pairs_by_number_and_list[pair[1]]:
            if {pair[0][1], pair2[0][1], pair2[1][1]} != {0, 1, 2}:
                continue
            this_triplet = (pair[0][0], pair2[0][0], pair2[1][0])
            this_distance = distance(this_triplet)
            if this_distance < min_distance:
                min_triplet = this_triplet
                min_distance = this_distance
        # finally, add the current pair to the list of pairs we've seen
        pairs_by_number_and_list[pair[0]].append(pair)
        pairs_by_number_and_list[pair[1]].append(pair)
    return min_triplet
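
The early-stopping idea described before this code sample ("a maximum pair distance that you need to iterate up to") is not part of the sample itself. A sketch of one way it might work follows, under the assumption that agreement is measured by the range (max minus min) of the values rather than by variance. With that measure, a triplet is never tighter than its second-closest pair, so the scan can stop as soon as the current pair distance reaches the best range found so far. The function name and the choice of measure are illustrative assumptions, not the answer's.

```python
import collections
import itertools

def pair_sorting_early_stop(list_a, list_b, list_c):
    """Closest triplet by range (max - min), scanning pairs from closest to farthest."""
    # tag every number with the index of the list it came from
    tagged = [[(x, origin) for x in values]
              for origin, values in enumerate((list_a, list_b, list_c))]
    # all cross-list pairs, sorted by the distance between the two numbers
    all_pairs = [pair
                 for first, second in itertools.combinations(tagged, 2)
                 for pair in itertools.product(first, second)]
    all_pairs.sort(key=lambda p: abs(p[0][0] - p[1][0]))

    seen = collections.defaultdict(list)  # (number, origin) -> partners seen so far
    best_range, best_triplet = float("inf"), None
    for left, right in all_pairs:
        # any triplet not yet formed has a range of at least this pair distance
        if abs(left[0] - right[0]) >= best_range:
            break
        for shared, other in ((left, right), (right, left)):
            for partner in seen[shared]:
                if partner[1] == other[1]:
                    continue  # two numbers from the same list: not a valid triplet
                members = sorted((shared, other, partner), key=lambda m: m[1])
                values = tuple(number for number, _ in members)
                spread = max(values) - min(values)
                if spread < best_range:
                    best_range, best_triplet = spread, values
        seen[left].append(right)
        seen[right].append(left)
    return best_triplet

# made-up sample data
list_a = [12.5, 97.1, 203.4, 310.0]
list_b = [45.0, 99.8, 250.2]
list_c = [5.3, 101.2, 180.0, 275.5, 340.9]
print(pair_sorting_early_stop(list_a, list_b, list_c))  # (97.1, 99.8, 101.2)
```

On this sample the loop stops after the third pair. The sort over all pairs still dominates the cost, so the early stop only trims the scan that follows it.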

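Note: all the code samples in this answer are written out a little more explicitly than you would do in practice, to help you understand how they work. When doing it for real, you would use more list comprehensions and the like.

NB2. No guarantees that the code works :-P but it should get the rough idea across.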
