sorting - Sorting many arrays simultaneously using CUDA Thrust

Question

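I need to sort 20+ arrays that are already on the GPU, each of the same length, by the same keys. I can't use sort_by_key() directly, because the keys get sorted as well (which makes them useless for sorting the next array). Here is what I tried instead: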

thrust::device_vector<int>  indices(N); 
thrust::sequence(indices.begin(),indices.end());
thrust::sort_by_key(keys.begin(),keys.end(),indices.begin());

thrust::gather(indices.begin(),indices.end(),a_01,a_01);
thrust::gather(indices.begin(),indices.end(),a_02,a_02);
...
thrust::gather(indices.begin(),indices.end(),a_20,a_20);

This does not seem to work, because gather() expects the output array to be different from the input array. That is, this works:

thrust::gather(indices.begin(),indices.end(),a_01,o_01);
...
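
To make the aliasing problem concrete, here is a minimal host-side sketch in plain C++ (not Thrust). The helper names gather_copy and gather_aliased are hypothetical and exist only for this illustration. The loop runs sequentially, so the wrong result is deterministic here; on the GPU the reads and writes race, so the outcome of an aliased gather is presumably not even reproducible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Out-of-place gather: out[i] = in[map[i]]. Computes the same mapping as
// thrust::gather, but sequentially on the host. `in` and `out` are distinct.
std::vector<int> gather_copy(const std::vector<int>& map, const std::vector<int>& in) {
    std::vector<int> out(map.size());
    for (std::size_t i = 0; i < map.size(); ++i) {
        assert(map[i] >= 0 && static_cast<std::size_t>(map[i]) < in.size());
        out[i] = in[map[i]];
    }
    return out;
}

// Aliased gather: reads and writes the same buffer, so later iterations may
// read elements that an earlier iteration has already overwritten.
void gather_aliased(const std::vector<int>& map, std::vector<int>& data) {
    for (std::size_t i = 0; i < map.size(); ++i) {
        data[i] = data[map[i]];
    }
}
```

With map = {2, 0, 1} and data = {10, 20, 30}, the out-of-place version yields {30, 10, 20}, while the aliased version overwrites data[0] before it is read again and ends up with {30, 30, 30}.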

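However, I would prefer not to allocate 20+ extra arrays for this task. I know there is a solution using thrust::tuple, thrust::zip_iterator and thrust::sort_by_key(), similar to the one here. However, I can only combine up to 10 arrays in a tuple, so I would have to duplicate the key vector again. How would you tackle this task?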

score 4 · Accepted Answer

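I think the classic way to sort multiple arrays is the so-called back-to-back approach, which uses thrust::stable_sort_by_key twice. You need to create a key vector such that elements within the same array have the same key. For example: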

Elements: 10.5 4.3 -2.3 0. 55. 24. 66.
Keys:      0    0    0  1   1   1   1

In this case we have two arrays: the first has 3 elements and the second has 4 elements.

First, you need to call thrust::stable_sort_by_key using the matrix values as the keys, like this:

thrust::stable_sort_by_key(d_matrix.begin(),
                           d_matrix.end(),
                           d_keys.begin(),
                           thrust::less<float>());

After that, you have

Elements: -2.3 0 4.3 10.5 24. 55. 66.
Keys:       0  1  0    0   1   1   1

That is, the array elements are ordered while the keys are not. You then need a second call to thrust::stable_sort_by_key

thrust::stable_sort_by_key(d_keys.begin(),
                           d_keys.end(),
                           d_matrix.begin(),
                           thrust::less<int>());

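so that the sort is now performed according to the keys. Because the sort is stable, elements with the same key keep the ascending order established by the first call. After that step, you have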

Elements: -2.3 4.3 10.5 0 24. 55. 66.
Keys:       0   0   0   1  1   1   1

which is the final desired result.
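
The two passes above can also be traced on the host. The following is a minimal sketch in plain C++, not the Thrust code itself: stable_sort_by_key_host is a hypothetical stand-in for thrust::stable_sort_by_key built on std::stable_sort, and sort_segments is an assumed name for the back-to-back wrapper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical host-side stand-in for thrust::stable_sort_by_key:
// stably sorts `keys` ascending and applies the same permutation to `values`.
template <typename K, typename V>
void stable_sort_by_key_host(std::vector<K>& keys, std::vector<V>& values) {
    assert(keys.size() == values.size());
    std::vector<std::size_t> perm(keys.size());
    std::iota(perm.begin(), perm.end(), std::size_t{0});
    std::stable_sort(perm.begin(), perm.end(),
                     [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
    std::vector<K> sorted_keys(keys.size());
    std::vector<V> sorted_values(values.size());
    for (std::size_t i = 0; i < perm.size(); ++i) {
        sorted_keys[i] = keys[perm[i]];
        sorted_values[i] = values[perm[i]];
    }
    keys.swap(sorted_keys);
    values.swap(sorted_values);
}

// Back-to-back approach: sort by value first, then stably by segment id.
void sort_segments(std::vector<float>& elements, std::vector<int>& segment_ids) {
    stable_sort_by_key_host(elements, segment_ids);  // pass 1: values act as keys
    stable_sort_by_key_host(segment_ids, elements);  // pass 2: segment ids act as keys
}
```

Calling sort_segments on the Elements and Keys vectors from the example reproduces the final layout shown above: each segment is sorted internally and the segments stay in their original order.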

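Below is a full working example that considers the following problem: sort each row of a matrix individually. This is the particular case in which all the arrays have the same length, but the approach also works with arrays of different lengths.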

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/sort.h>
#include <thrust/transform.h>
#include <thrust/functional.h>
#include <thrust/random.h>
#include <thrust/sequence.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/constant_iterator.h>

#include <stdio.h>
#include <iostream>

/*****************************************************************/
/* CONVERT LINEAR INDEX TO ROW INDEX - KEPT FOR REFERENCE,       */
/* NOT USED BY THE BACK-TO-BACK CODE BELOW                       */
/*****************************************************************/
template <typename T>
struct linear_index_to_row_index {

    T Ncols; // --- Number of columns

    __host__ __device__ linear_index_to_row_index(T Ncols) : Ncols(Ncols) {}

    // --- Integer division maps a linear index to the index of its row
    __host__ __device__ T operator()(T i) const { return i / Ncols; }
};

/********/
/* MAIN */
/********/
int main()
{
    const int Nrows = 5;     // --- Number of rows
    const int Ncols = 8;     // --- Number of columns

    // --- Random uniform integer distribution between 10 and 99
    thrust::default_random_engine rng;
    thrust::uniform_int_distribution<int> dist(10, 99);

    // --- Matrix allocation and initialization
    thrust::device_vector<float> d_matrix(Nrows * Ncols);
    for (size_t i = 0; i < d_matrix.size(); i++) d_matrix[i] = (float)dist(rng);

    // --- Print result
    printf("Original matrix\n");
    for(int i = 0; i < Nrows; i++) {
        std::cout << "[ ";
        for(int j = 0; j < Ncols; j++)
            std::cout << d_matrix[i * Ncols + j] << " ";
        std::cout << "]\n";
    }

    /*************************/
    /* BACK-TO-BACK APPROACH */
    /*************************/
    // --- Keys are integer row indices, matching thrust::less<int> in the second sort
    thrust::device_vector<int> d_keys(Nrows * Ncols);

    // --- Generate row indices
    thrust::transform(thrust::make_counting_iterator(0),
                      thrust::make_counting_iterator(Nrows*Ncols),
                      thrust::make_constant_iterator(Ncols),
                      d_keys.begin(),
                      thrust::divides<int>());

    // --- Back-to-back approach
    thrust::stable_sort_by_key(d_matrix.begin(),
                               d_matrix.end(),
                               d_keys.begin(),
                               thrust::less<float>());

    thrust::stable_sort_by_key(d_keys.begin(),
                               d_keys.end(),
                               d_matrix.begin(),
                               thrust::less<int>());

    // --- Print result
    printf("\n\nSorted matrix\n");
    for(int i = 0; i < Nrows; i++) {
        std::cout << "[ ";
        for(int j = 0; j < Ncols; j++)
            std::cout << d_matrix[i * Ncols + j] << " ";
        std::cout << "]\n";
    }

    return 0;
}
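
The full example derives the keys as i / Ncols, which only works when every array has the same length. For arrays of different lengths, the keys could instead be built from the per-array lengths. The following host-side C++ sketch shows one way to do that; make_segment_ids is a hypothetical helper, not part of Thrust, and the resulting vector would still have to be copied to the device before use.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Builds one segment id per element from a list of segment lengths,
// e.g. lengths {3, 4} -> ids {0, 0, 0, 1, 1, 1, 1}.
std::vector<int> make_segment_ids(const std::vector<std::size_t>& lengths) {
    // Segment ids are stored as int, so the segment count must fit in an int.
    assert(lengths.size() <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
    std::vector<int> ids;
    for (std::size_t s = 0; s < lengths.size(); ++s) {
        // Append lengths[s] copies of the segment index s.
        ids.insert(ids.end(), lengths[s], static_cast<int>(s));
    }
    return ids;
}
```

For lengths {3, 4} this produces the key vector used in the small example at the top of this answer.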

score 1 · Accepted Answer

If you are OK with manipulating pointers instead of device_vector objects, you actually only need to allocate one extra array:

thrust::device_vector<int>  indices(N); 
thrust::sequence(indices.begin(),indices.end());
thrust::sort_by_key(keys.begin(),keys.end(),indices.begin());

thrust::device_vector<int> temp(N);
thrust::device_vector<int> *sorted = &temp;
thrust::device_vector<int> *pa_01 = &a_01;
thrust::device_vector<int> *pa_02 = &a_02;
...
thrust::device_vector<int> *pa_20 = &a_20;

thrust::gather(indices.begin(), indices.end(), pa_01->begin(), sorted->begin());
pa_01 = sorted; sorted = &a_01;
thrust::gather(indices.begin(), indices.end(), pa_02->begin(), sorted->begin());
pa_02 = sorted; sorted = &a_02;
...
thrust::gather(indices.begin(), indices.end(), pa_20->begin(), sorted->begin());
pa_20 = sorted; sorted = &a_20;

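Or something like that should work anyway. You would need to fix it so that the temp device vector is not automatically deallocated when it goes out of scope. Instead of using automatic device_vectors, I suggest allocating CUDA device pointers with cudaMalloc and wrapping them with device_ptr.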
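
The rotation of a single scratch buffer can be sketched on the host in plain C++. Note that this is a variation on the snippet above rather than a transcription of it: instead of rebinding pointers, it swaps storage between the scratch buffer and each array after the gather. gather_all is a hypothetical name, and the sketch assumes that swapping two vectors exchanges their buffers without copying elements, as std::vector::swap does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reorders every array in `arrays` by `map` (out[i] = in[map[i]]) using one
// scratch buffer. After each gather the array and the scratch buffer swap
// storage, so the array's old storage becomes the scratch for the next array.
void gather_all(const std::vector<int>& map,
                const std::vector<std::vector<int>*>& arrays) {
    std::vector<int> scratch(map.size());
    for (std::vector<int>* a : arrays) {
        assert(a != nullptr && a->size() == map.size());
        for (std::size_t i = 0; i < map.size(); ++i) {
            scratch[i] = (*a)[map[i]];
        }
        a->swap(scratch);  // exchanges buffers in O(1), no element copy
    }
}
```

Because each array ends up holding its own reordered data, the caller keeps using the original variables, and only one extra buffer of length N is alive at any time.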
