c++ - C++11 tuple performance

Question

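I'm trying to make my code more generic by using std::tuple in many places, including cases where it holds a single element: for example, tuple<double> instead of double. Before committing to that, I decided to check the performance of this particular case.

Here is a simple performance benchmark: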

#include <tuple>
#include <iostream>

using std::cout;
using std::endl;
using std::get;
using std::tuple;

int main(void)
{

#ifdef TUPLE
    using double_t = std::tuple<double>;
#else
    using double_t = double;
#endif

    constexpr int count = 1e9;
    auto array = new double_t[count];

    long long sum = 0;
    for (int idx = 0; idx < count; ++idx) {
#ifdef TUPLE
        sum += get<0>(array[idx]);
#else
        sum += array[idx];
#endif
    }
    delete[] array;
    cout << sum << endl; // just "external" side effect for variable sum.
}

Results:

$ g++ -DTUPLE -O2 -std=c++11 test.cpp && time ./a.out
0  

real    0m3.347s
user    0m2.839s
sys     0m0.485s

$ g++  -O2 -std=c++11 test.cpp && time ./a.out
0  

real    0m2.963s
user    0m2.424s
sys     0m0.519s

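I thought tuple was a strictly statically compiled template, so every get<> call should reduce to an ordinary variable access. The memory allocation size is also the same in both tests. So why is there a difference in execution time?

EDIT: The problem was in the initialization of the tuple<> objects. To make the test more accurate, one line has to be changed: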

     constexpr int count = 1e9;
-    auto array = new double_t[count];
+    auto array = new double_t[count]();

     long long sum = 0;

After that change, both variants give similar results:

$ g++ -DTUPLE -g -O2 -std=c++11 test.cpp && (for i in $(seq 3); do time ./a.out; done) 2>&1 | grep real
real    0m3.342s
real    0m3.339s
real    0m3.343s

$ g++ -g -O2 -std=c++11 test.cpp && (for i in $(seq 3); do time ./a.out; done) 2>&1 | grep real
real    0m3.349s
real    0m3.339s
real    0m3.334s

score 14 · Accepted Answer

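The tuple's default constructor value-initializes its elements (so every element is 0), whereas plain doubles are left uninitialized by default-initialization.

In the generated assembly, the following initialization loop is present only when tuples are used. Otherwise the two versions are equivalent.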

.L2:
    movq    $0, (%rdx)
    addq    $8, %rdx
    cmpq    %rcx, %rdx
    jne .L2
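
The difference described above can be checked with a small sketch. The helper names below (sum_default_tuples, sum_zeroed_doubles, check_both_zero) are hypothetical and not part of the original benchmark. It deliberately avoids reading from a plain `new double[n]`, since those elements are indeterminate and reading them is undefined behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <tuple>

// Hypothetical helper: sums an array of default-constructed tuple<double>.
// std::tuple's default constructor value-initializes its elements, so the
// array is zero-filled even without a trailing () on the new-expression.
double sum_default_tuples(std::size_t n) {
    std::unique_ptr<std::tuple<double>[]> a(new std::tuple<double>[n]);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += std::get<0>(a[i]);
    }
    return sum;
}

// Hypothetical helper: sums an array of doubles created with `new double[n]()`.
// The trailing () requests value-initialization, which zeroes every element.
// Without it the elements would be indeterminate.
double sum_zeroed_doubles(std::size_t n) {
    std::unique_ptr<double[]> a(new double[n]());
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i];
    }
    return sum;
}

// Hypothetical check: both arrays are zero-filled, so both sums are 0.
void check_both_zero(std::size_t n) {
    assert(sum_default_tuples(n) == 0.0);
    assert(sum_zeroed_doubles(n) == 0.0);
}
```

Calling check_both_zero(8) from main should pass both assertions. In both helpers the allocation is followed by a zeroing pass, which is why the edited benchmark in the question shows matching timings once the plain double array is also value-initialized.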
