c++ - ViennaCL: Matrix-vector product fails

Question

I am trying to compute a simple matrix-vector product with OpenCL, using the ViennaCL library.

Here is my main:

#include <cstdlib>
#include <vector>

#include "viennacl/scalar.hpp"
#include "viennacl/vector.hpp"
#include "viennacl/matrix.hpp"
#include "viennacl/linalg/prod.hpp"
#include "viennacl/matrix_proxy.hpp"
#include "viennacl/linalg/lu.hpp"

int main()
{
    viennacl::ocl::set_context_device_type(0, viennacl::ocl::gpu_tag());
    std::vector<viennacl::ocl::device> devices = viennacl::ocl::current_context().devices();
    viennacl::ocl::current_context().switch_device(devices[0]);

    int Nx=10;
    int Ny=10;

    //GPU vectors
    viennacl::matrix<float> vcl_A(Nx,Ny);
    viennacl::vector<float> vcl_b(Ny);
    viennacl::vector<float> vcl_c(Nx);

    //CPU vectors
    std::vector<float> stl_A(Nx*Ny);
    std::vector<float> stl_b(Ny);
    std::vector<float> stl_c(Nx);


    //filling CPU vectors

    for (unsigned int i = 0; i < Nx; ++i)
        for (unsigned int j = 0; j < Ny; ++j)
            stl_A[i*Ny + j] = (float) (rand()%100);

    for (unsigned int i = 0; i < stl_b.size(); ++i)
        stl_b[i] = (float) (rand()%100);


    //copying input data to GPU

    viennacl::fast_copy(&(stl_A[0]),
        &(stl_A[0]) + stl_A.size(),
        vcl_A);

    viennacl::fast_copy(stl_b, vcl_b);


    //launching product c = A*b

    vcl_c = viennacl::linalg::prod(vcl_A, vcl_b);


    //copying output data back to CPU

    viennacl::copy(vcl_c, stl_c);

    viennacl::backend::finish();
}

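After that, the first coefficient of my stl_c vector is computed correctly, but the other 9 coefficients are 0. When I increase the dimensions, I get several correct coefficients at the beginning of the vector, but many zeros for all the remaining coefficients.

I suspect that some of my copies are done the wrong way, but the problem may also come from the product operation itself (a local/global size issue, although I assume ViennaCL takes care of all that).

Any idea what I am doing wrong? Any help or advice would be appreciated.

(I am running the code with VS 2012; my GPU is an NVIDIA GeForce GTX 670.)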

score 2 · Accepted Answer

1. The problem:

The documentation for viennacl::matrix in the page manual-types-matrix states:

The internal memory buffer of a matrix<> is by default padded with zeros so that the internal matrix size is a multiple of e.g. a power of two. When using fast_copy() on a matrix, the padded zeros need to be taken into account correctly. Query internal_size1() and internal_size2() to do so.

This means that the rows of the viennacl::matrix are not stored back to back, unlike those in the flat std::vector you use to represent the matrix: each row in the internal buffer is followed by padding. Therefore, this line does not do what you expect:

viennacl::fast_copy(&(stl_A[0]), &(stl_A[0]) + stl_A.size(), vcl_A);
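
The effect of that raw copy can be reproduced on the host without ViennaCL. The following is a minimal sketch, not ViennaCL code: the function names are made up for illustration, and the padded size of 128 used in the note below is an assumption chosen because it reproduces the symptom from the question. The real values should always be queried with internal_size1() and internal_size2().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper mimicking the raw copy from the question: the tightly
// packed host data is written to the start of a zero-filled padded buffer,
// ignoring the row stride.
std::vector<float> raw_copy_into_padded(const std::vector<float>& packed,
                                        std::size_t padded_rows,
                                        std::size_t stride)
{
    std::vector<float> buf(padded_rows * stride, 0.0f);
    assert(packed.size() <= buf.size());
    std::copy(packed.begin(), packed.end(), buf.begin());
    return buf;
}

// Hypothetical row-major matrix-vector product that reads row i starting at
// offset i * stride, which is how a padded buffer has to be addressed.
std::vector<float> matvec_padded(const std::vector<float>& buf,
                                 std::size_t rows, std::size_t cols,
                                 std::size_t stride,
                                 const std::vector<float>& b)
{
    std::vector<float> c(rows, 0.0f);
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            c[i] += buf[i * stride + j] * b[j];
    return c;
}
```

With a 10 x 10 matrix and an assumed stride of 128, all 100 packed values land inside the first padded row. Row 0 therefore reads the right data, while rows 1 to 9 read only padding zeros, which matches the result described in the question.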

2. The solution:

So, how to properly copy a host matrix to a ViennaCL matrix?

One possibility is to use a std::vector<std::vector<float>> to represent the host matrix, with one inner vector per row, and then use viennacl::copy instead of viennacl::fast_copy. The padding of elements is then taken care of for you.

std::vector<std::vector<float>> stl_A(Nx);   // one inner vector per row of vcl_A(Nx, Ny)

for (unsigned int i = 0; i < Nx; ++i) {      // rows
    stl_A[i].resize(Ny);

    for (unsigned int j = 0; j < Ny; ++j)    // columns
        stl_A[i][j] = (float)(rand() % 100);
}

viennacl::copy(stl_A, vcl_A);

Another possibility, as suggested in the documentation, is to match the internal layout of the viennacl::matrix in your host buffer: allocate internal_size() elements and use internal_size2() as the row stride when calculating element offsets, while the loops still run only over the logical Nx rows and Ny columns.

std::vector<float> stl_A(vcl_A.internal_size());   // zero-initialized, padding included

for (unsigned int i = 0; i < Nx; ++i)              // rows of vcl_A(Nx, Ny)
    for (unsigned int j = 0; j < Ny; ++j)          // columns
        stl_A[i*vcl_A.internal_size2() + j] = (float)(rand() % 100);

viennacl::fast_copy(&(stl_A[0]), &(stl_A[0]) + stl_A.size(), vcl_A);
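
If the host data already exists as a tightly packed row-major array, it can be repacked into the padded layout before calling fast_copy. Below is a minimal sketch of such a step; pack_row_major is a hypothetical helper, not part of ViennaCL, and its padded_rows and stride arguments are assumed to come from internal_size1() and internal_size2().

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: copies a tightly packed row-major matrix
// (rows x cols) into a zero-filled buffer whose rows start `stride`
// elements apart. Entries outside the logical matrix stay zero.
std::vector<float> pack_row_major(const std::vector<float>& packed,
                                  std::size_t rows, std::size_t cols,
                                  std::size_t padded_rows, std::size_t stride)
{
    std::vector<float> buf(padded_rows * stride, 0.0f);
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            buf[i * stride + j] = packed[i * cols + j];
    return buf;
}
```

The result of pack_row_major(stl_A, Nx, Ny, vcl_A.internal_size1(), vcl_A.internal_size2()) would then be handed to viennacl::fast_copy in the same way as the buffer above.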

3. The note:

Both code examples provided above are for row-major matrices, which is the default layout. For column-major matrices, swap the loops and use internal_size1() as the stride instead.
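
The difference between the two layouts comes down to how the offset of element (i, j) is computed. The sketch below spells this out with two hypothetical helper functions (they are not part of ViennaCL), assuming the stride is internal_size2() for a row-major matrix and internal_size1() for a viennacl::matrix<float, viennacl::column_major>.

```cpp
#include <cstddef>

// Offset of element (i, j) in a padded row-major buffer.
// `stride` is assumed to be the value of internal_size2().
std::size_t offset_row_major(std::size_t i, std::size_t j, std::size_t stride)
{
    return i * stride + j;
}

// Offset of element (i, j) in a padded column-major buffer.
// `stride` is assumed to be the value of internal_size1().
std::size_t offset_col_major(std::size_t i, std::size_t j, std::size_t stride)
{
    return j * stride + i;
}
```

In the column-major case the outer loop would run over the columns j and the inner loop over the rows i, so that consecutive writes touch neighbouring memory locations.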
