c - Cで非常に大きな配列をソートする方法

Question

long long私は C で400 万のオーダーでソートしたいと考えています。通常malloc()、配列として使用して呼び出すバッファだけですが、 qsort()400 万 * 8 バイトは連続したメモリの 1 つの巨大なチャンクです。

これを行う最も簡単な方法は何ですか? これについては、純粋な速度よりも使いやすさを評価します。私はライブラリを使用したくないので、結果は Windows と Linux の両方で適度なネットブックで実行する必要があります。

score 11 · Accepted Answer

バッファを割り当ててを呼び出すだけqsortです。最近では、控えめなネットブックでも 32MB はそれほど大きくありません。

本当に分割する必要がある場合: 小さいチャンクをソートし、それらをファイルに書き込み、それらをマージします (マージでは、マージされる各要素に対して 1 つの線形パスが必要です)。しかし、本当に、しないでください。並べるだけ。

(Knuth の第 2 巻には、「外部ソート」と呼ばれるソートアンドマージアプローチについての良い議論があります。Knuth がそれを書いていたとき、外部データは磁気テープにあったはずですが、原理はあまりよくありません。ディスクの場合は異なります: I/O を可能な限りシーケンシャルにする必要があります. SSD ではトレードオフが少し異なります.)

score 1 · Accepted Answer

1

32MB? それは大きすぎません....クイックソートでうまくいくはずです。

于 2011-04-07T21:49:22.520 に答える

score 0 · Accepted Answer

Your best option would be to prevent having the data unordered if possible. Like it has been mentioned, you'd be better of reading the data from disk (or network or whatever the source) directly into a selforganizing container (a tree, perhaps std::set will do).

That way, you'll never have to sort through the lot, or have to worry about memory management. If you know the required capacity of the container, you might squeeze out additional performance by using std::vector(initialcapacity) or call vector::reserve up front.

You'd then best be advised to use std::make_heap to heapify any existing elements, and then add element by element using push_heap (see also pop_heap). This essentially is the same paradigm as the self-ordering set but

duplicates are ok
the storage is 'optimized' as a flat array (which is perfect for e.g. shared memory maps or memory mapped files)

(Oh, minor detail, note that sort_heap on the heap takes at most N log N comparisons, where N is the number of elements)

Let me know if you think this is an interesting approach. I'd really need a bit more info on the use case

c - Cで非常に大きな配列をソートする方法

3 に答える 3

Related

Reference