performance - 長いリストで発生するペアを検出するためのアルゴリズム

Question

N個の（必ずしも明確である必要はない）数の長いシーケンスが与えられた場合、

{1, 50, 3, 99, 1, 2, 100, 99, 4, 100, 4, 100} (could be very long)

そして、M個の順序対の小さなセット、

(1, 2)

(2, 1)

(1, 3)    

(99, 50)

(99, 100)

順序対がリストのどこかにあるかどうかを検出したいと思います（それらは分離できますが、順序は重要です）。たとえば、上記のカウントは次のようになります。

(1, 2): 2 (each 1 pairs with the later 2)

(2, 1): 0 (no 1's come after the 2)

(1, 3): 1 (only one of the 1's come before the 3)

(99, 50): 0 (no 99's come before the 50)

(99, 100): 5 (3 times for the first 99 and 2 times for the second)

順序対のすべての数値がリストに表示されることが保証されていると仮定すると、これらのカウントを単純なO（N * M）時間（順序対ごとにブルートフォース検索によって達成される）よりも速く抽出するアルゴリズムはありますか？

副次的な質問として、カウントではなくブール値の発生のみに制限する場合、高速なアルゴリズムがあるでしょうか。あれは：

(1, 2): yes

(2, 1): no

(1, 3): yes

(99, 50): no

(99, 100): yes

どんな助けでもいただければ幸いです。

score 5 · Accepted Answer

2つのハッシュを保持します。1つはそれらが発生する最小の位置へのマッピング番号であり、もう1つはそれらが発生する最大の位置へのマッピング番号です。順序対（a、b）は、least [a] <greatest [b]の場合（および両方のハッシュキーが存在する場合）に順番に表示されます。前処理時間は線形であり、スペース使用量は線形であり、クエリ時間は一定です（ハッシュの複雑さに関する標準的な仮定の下で）。

カウントバージョンに関しては、私が考えることができる最善の方法は、各番号をソートされた順序で発生する位置に1つのハッシュマッピングを維持することです。ペアをクエリするには、位置リストを「マージ」し、これまでのa要素の数とペアの発生数を追跡します。次のb要素を選択した場合は、ペアの数をa要素の数だけ増やします。次の要素を選択した場合は、要素の数を増やしてください。（a == bの場合、戻り長は2を選択します。）

score 1 · Accepted Answer

これがO（n）ソリューションです...

unordered_map<int, unordered_set<int>> pairs = ...;

void process(int n)
{
    // keep list of pairs that have their first seen, indexed by second...
    static unordered_map<int, vector<pair<int,int>>> live;

    // if next item is in live list, we have found some pairs...
    for (auto found_pair : live[n])
        process(found_pair);

    // add pairs to live list that have a first of the current item
    for (auto pair : pairs[n])
        for (auto second : pair.second)
           live.insert(second, make_pair(pair.first, second));
}

score 1 · Accepted Answer

アクティブなペアのリストを保持し、番号のリストをループすることができます。ペアの最初の番号を見つけるたびに、そのペアをアクティブリストにコピーします。アクティブリストでペアの2番目の番号を見つけるたびに、そのペアのカウントを増やします。

C＃の例：

public class Pair {

  public int First { get; private set; }
  public int Second { get; private set; }
  public int Count { get; set; }

  public Pair(int first, int second) {
    First = first;
    Second = second;
    Count = 0;
  }

}

int[] values = {1, 50, 3, 99, 1, 2, 100, 99, 4, 100, 4, 100};

List<Pair> pairs = new List<Pair>();
pairs.Add(new Pair(1, 2));
pairs.Add(new Pair(2, 1));
pairs.Add(new Pair(1, 3));
pairs.Add(new Pair(99, 50));
pairs.Add(new Pair(99, 100));

List<Pair> active = new List<Pair>();

foreach (int value in values) {
  foreach (Pair p in active) {
    if (p.Second == value) {
      p.Count++;
    }
  }
  foreach (Pair p in pairs) {
    if (p.First == value) {
      active.Add(p);
    }
  }
}

foreach (Pair p in pairs) {
  Console.WriteLine("({0},{1}) : Count: {2}", p.First, p.Second, p.Count);
}

出力：

(1,2) : Count: 2
(2,1) : Count: 0
(1,3) : Count: 1
(99,50) : Count: 0
(99,100) : Count: 5

改善の考え：

Dictionary<int, List<Pair>>ペアのリストにはを使用できます。
ペアにアクティブカウントを入れることができるので、アクティブリストに別の参照を追加する代わりに、アクティブカウントを増やします。

score -2 · Accepted Answer

すべての数が異なると仮定します。力ずくの解決策が唯一の解決策だと思いませんか。

performance - 長いリストで発生するペアを検出するためのアルゴリズム

4 に答える 4

Related

Reference