c++ - nth_element はどのように実装されていますか?

Question

nth_elementStackOverflow およびその他の場所には、 O(n)であり、通常は Introselect で実装されているという多くの主張があります: http://en.cppreference.com/w/cpp/algorithm/nth_element

これをどのように達成できるか知りたいです。ウィキペディアの Introselect の説明を見たところ、さらに混乱しました。アルゴリズムは QSort と Median-of-Medians をどのように切り替えることができますか?

ここで Introsort の論文を見つけました: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.14.5196&rep=rep1&type=pdfしかし、それは言う:

この論文では、ソートの問題に集中し、後のセクションで簡単に選択の問題に戻ります。

nth_elementがどのように実装されているかを理解するために STL 自体を読み込もうとしましたが、すぐに理解できなくなります。

Introselect の実装方法の疑似コードを教えてもらえますか? もちろん、STL 以外の実際の C++ コードでも構いません :)

score 22 · Accepted Answer

std::nth_element免責事項:が標準ライブラリでどのように実装されているかはわかりません。

クイックソートがどのように機能するかを知っていれば、このアルゴリズムに必要なことを行うように簡単に変更できます。クイックソートの基本的な考え方は、各ステップで、ピボットより小さいすべての要素が左側のサブ配列にあり、ピボット以上のすべての要素が右側のサブ配列にあるように、配列を 2 つの部分に分割することです。 . (三項クイックソートとして知られるクイックソートの修正により、すべての要素がピボットに等しい 3 番目のサブ配列が作成されます。次に、右のサブ配列には、ピボットよりも厳密に大きいエントリのみが含まれます。) 次に、左と右のサブを再帰的にソートすることでクイックソートを続行します。 -配列。

両方のサブ配列に再帰するのではなく、 n番目の要素のみを所定の位置に移動したい場合は、左または右のサブ配列に降りる必要があるかどうかをすべてのステップで知ることができます。(ソートされた配列のn番目の要素にはインデックスnがあるため、インデックスを比較する必要があるため、これを知っています。)したがって、クイックソートが最悪の場合の縮退に悩まされない限り、各ステップで残りの配列のサイズを約半分にします。 . (もう一方のサブ配列をもう一度見ることはありません。) したがって、平均して、各ステップで次の長さの配列を扱っています。

Θ( N )
Θ( N /2)
Θ( N /4)
…</li>

各ステップは、処理する配列の長さに比例します。(一度ループして、ピボットとの比較に応じて、各要素がどのサブ配列に入る必要があるかを決定します。)

Θ(log( N )) ステップの後、最終的にシングルトン配列に到達し、完了することがわかります。N (1 + 1/2 + 1/4 + …)を合計すると、 2 Nになります。または、平均的なケースでは、ピボットが常に正確に中央値になることを期待できないため、およそ Θ( N ) になります。

score 16 · Accepted Answer

You asked two questions, the titular one

How is nth_element Implemented?

Which you already answered:

There are a lot of claims on StackOverflow and elsewhere that nth_element is O(n) and that it is typically implemented with Introselect.

Which I also can confirm from looking at my stdlib implementation. (More on this later.)

And the one where you don't understand the answer:

How can an algorithm switch between QSort and Median-of-Medians?

Lets have a look at pseudo code that I extracted from my stdlib:

nth_element(first, nth, last)
{ 
  if (first == last || nth == last)
    return;

  introselect(first, nth, last, log2(last - first) * 2);
}

introselect(first, nth, last, depth_limit)
{
  while (last - first > 3)
  {
      if (depth_limit == 0)
      {
          // [NOTE by editor] This should be median-of-medians instead.
          // [NOTE by editor] See Azmisov's comment below
          heap_select(first, nth + 1, last);
          // Place the nth largest element in its final position.
          iter_swap(first, nth);
          return;
      }
      --depth_limit;
      cut = unguarded_partition_pivot(first, last);
      if (cut <= nth)
        first = cut;
      else
        last = cut;
  }
  insertion_sort(first, last);
}

Without getting into details about the referenced functions heap_select and unguarded_partition_pivot we can clearly see, that nth_element gives introselect 2 * log2(size) subdivision steps (twice as much as needed by quickselect in the best case) until heap_select kicks in and solves the problem for good.

score 10 · Accepted Answer

STL (バージョン 3.3 だと思います)のコードは次のとおりです。

template <class _RandomAccessIter, class _Tp>
void __nth_element(_RandomAccessIter __first, _RandomAccessIter __nth,
                   _RandomAccessIter __last, _Tp*) {
  while (__last - __first > 3) {
    _RandomAccessIter __cut =
      __unguarded_partition(__first, __last,
                            _Tp(__median(*__first,
                                         *(__first + (__last - __first)/2),
                                         *(__last - 1))));
    if (__cut <= __nth)
      __first = __cut;
    else 
      __last = __cut;
  }
  __insertion_sort(__first, __last);
}

それを少し単純化しましょう：

template <class Iter, class T>
void nth_element(Iter first, Iter nth, Iter last) {
  while (last - first > 3) {
    Iter cut =
      unguarded_partition(first, last,
                          T(median(*first,
                                   *(first + (last - first)/2),
                                   *(last - 1))));
    if (cut <= nth)
      first = cut;
    else 
      last = cut;
  }
  insertion_sort(first, last);
}

ここで行ったのは、ユーザーが合法的にマクロとして定義できるものからコードを保護するためだけに、二重のアンダースコアと _Uppercase を削除することでした。また、テンプレート型の推定にのみ役立つはずの最後のパラメーターを削除し、簡潔にするためにイテレーター型の名前を変更しました。

ご覧のとおり、残りの範囲に残る要素が 4 つ未満になるまで繰り返し範囲を分割し、その後単純に並べ替えます。

では、なぜ O(n) なのか? まず、最大 3 つの要素があるため、最大 3 つの要素の最終的な並べ替えは O(1) です。さて、残っているのは繰り返しの分割です。パーティション化自体は O(n) です。ただし、ここでは、すべてのステップで、次のステップで触れる必要がある要素の数が半分になるため、O(n) + O(n/2) + O(n/4) + O(n/8) になります。合計すると O(2n) 未満になります。O(2n) = O(n) であるため、平均して線形の複雑さがあります。

c++ - nth_element はどのように実装されていますか?

3 に答える 3

Related

Reference