performance - Benchmark problems for testing concurrency

Question

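One of the projects I'm currently working on requires me to look into the performance (amongst other things) of different concurrency-enabled programming languages.

At the moment I'm comparing Stackless Python and C++ with PThreads, so the focus is on these two languages, but other languages will probably be tested later. Of course the comparison must be as representative and accurate as possible, so my first thought was to look for some standard concurrent/multi-threaded benchmark problems. Unfortunately, I couldn't find any decent or standard tests, problems, or benchmarks.

So my question is: do you have a suggestion for a good, easy, or quick problem for testing the performance of a programming language (and for exposing its strong and weak points in the process)?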

score 3 · Accepted Answer

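Surely you should be testing hardware and compilers rather than a language for concurrency performance?

I would be looking at a language from the point of view of how easy and productive it is in terms of concurrency, and how much it "insulates" the programmer from making locking mistakes.

EDIT: From past experience as a researcher designing parallel algorithms, I think you will find that in most cases concurrent performance depends largely on how an algorithm is parallelised and how it targets the underlying hardware.

Also, benchmarks are notoriously unequal, and this is even more true in a parallel environment. For instance, a benchmark that "crunches" very large matrices would suit a vector pipeline processor, whereas a parallel sort might be better suited to more general-purpose multi-core CPUs.

These might be useful:

Parallel Benchmarks

NAS Parallel Benchmarks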

score 1 · Accepted Answer

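Well, there are a few classics, but different tests emphasise different features. Some distributed systems may be more robust, have more efficient message passing, and so on. Higher message overhead can hurt scalability, since the usual way to scale up to more machines is to send a larger number of small messages. Some classic problems you can try are a distributed Sieve of Eratosthenes or a poorly implemented Fibonacci sequence calculator (i.e. to calculate the 8th number in the series, spin off one machine for the 7th and another for the 6th). Pretty much any divide-and-conquer algorithm can be done concurrently. You could also do a concurrent implementation of Conway's Game of Life or of heat transfer.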
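As a rough illustration of the sieve idea, here is a minimal sketch that uses threads on one machine rather than a truly distributed system. The names `basePrimes` and `countPrimes` and the slicing scheme are assumptions made for this example, and it assumes C++11 or later and `threadCount >= 1`. Each thread marks composites only inside its own slice, so no locking is needed.

```cpp
#include <algorithm>
#include <thread>
#include <vector>

// Plain sequential sieve: all primes up to and including limit.
std::vector<int> basePrimes(int limit) {
    std::vector<int> primes;
    if (limit < 2) return primes;
    std::vector<char> composite(limit + 1, 0);
    for (int p = 2; p <= limit; ++p) {
        if (composite[p]) continue;
        primes.push_back(p);
        for (long long m = (long long)p * p; m <= limit; m += p) composite[m] = 1;
    }
    return primes;
}

// Counts the primes <= n. The range above sqrt(n) is cut into contiguous
// slices, and each thread crosses out multiples only within its own slice.
int countPrimes(int n, int threadCount) {
    if (n < 2) return 0;
    int root = 1;  // becomes floor(sqrt(n))
    while ((long long)(root + 1) * (root + 1) <= n) ++root;
    const std::vector<int> small = basePrimes(root);

    std::vector<char> composite(n + 1, 0);
    const int first = root + 1;
    const int total = n - root;  // how many numbers lie in [first, n]
    const int chunk = (total + threadCount - 1) / threadCount;

    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t) {
        const long long lo = first + (long long)t * chunk;
        const long long hi = std::min<long long>(lo + chunk - 1, n);
        if (lo > hi) break;  // more threads than work
        workers.emplace_back([&composite, &small, lo, hi] {
            for (int p : small) {
                // first multiple of p that is >= lo
                for (long long m = ((lo + p - 1) / p) * p; m <= hi; m += p) composite[m] = 1;
            }
        });
    }
    for (std::thread& w : workers) w.join();

    int count = static_cast<int>(small.size());
    for (int v = first; v <= n; ++v) {
        if (!composite[v]) ++count;
    }
    return count;
}
```

Calling `countPrimes(n, k)` for a fixed `n` and a growing `k` gives a simple scaling test. With GCC on Linux this typically needs the `-pthread` flag.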

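The easiest one to implement quickly is the poorly implemented Fibonacci calculator, though it places too much emphasis on creating threads and not enough on communication between those threads.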
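A minimal sketch of such a deliberately wasteful Fibonacci calculator, using threads in place of machines, might look like this. The names `fibSequential` and `fibThreaded` and the `spawnDepth` limit are assumptions for illustration; the limit is only there to keep the number of threads bounded. It assumes C++11 or later.

```cpp
#include <thread>

// Ordinary recursive Fibonacci, used once the spawn budget is exhausted.
long fibSequential(int n) {
    return n < 2 ? n : fibSequential(n - 1) + fibSequential(n - 2);
}

// Spawns a new thread for fib(n-1) at every level until spawnDepth reaches
// zero, and computes fib(n-2) on the calling thread. Most of the run time
// goes into creating and joining threads, which is the point of the test.
long fibThreaded(int n, int spawnDepth) {
    if (n < 2) return n;
    if (spawnDepth <= 0) return fibSequential(n);

    long left = 0;
    std::thread worker([&left, n, spawnDepth] {
        left = fibThreaded(n - 1, spawnDepth - 1);
    });
    long right = fibThreaded(n - 2, spawnDepth - 1);
    worker.join();  // left is only read after the worker has finished
    return left + right;
}
```

Raising `spawnDepth` shifts the cost further towards thread creation, which makes the overhead of each language's threading construct easier to see.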

score 0 · Accepted Answer

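Surely you should be testing hardware and compilers rather than a language for concurrency performance?

No, hardware and compilers are irrelevant for my testing purposes. I'm just looking for some good problems that can test how well code written in one language can compete against code written in another. What I'm really testing is the constructs that each language offers for concurrent programming. One of the criteria is performance (measured in time).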

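Some of the other test criteria I'm looking at are:

How easy is it to write correct code? As we all know, concurrent programming is harder than writing single-threaded programs.
What technique is used for concurrent programming: event-driven, actor-based, message passing, ...
How much code must the programmer write, and how much is done for him automatically? This can also be tested with the given benchmark problems.
What is the level of abstraction, and how much overhead is involved in translating it to machine code?

So actually, I'm not looking at performance as the only and best parameter (that would indeed send me to the hardware and the compilers instead of the language itself). I'm looking at which language is best suited to which kind of problem, what its weaknesses and strengths are, and so on...

Bear in mind that this is just a small project, so the tests have to be kept small as well. (Rigorous testing of everything is therefore not feasible.)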
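For the "performance measured in time" criterion, a small wall-clock timing helper is usually enough for tests of this size. The following is only a sketch; the name `secondsTaken` is made up for this example, and it assumes C++11 or later for `std::chrono`.

```cpp
#include <chrono>

// Runs work() once and returns the elapsed wall-clock time in seconds.
// steady_clock is used because it is not affected by system clock changes.
template <typename Work>
double secondsTaken(Work work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

For example, `secondsTaken([] { /* run the benchmark here */ })` returns the duration of one run. Repeating the measurement a few times and comparing the results helps to smooth out scheduling noise.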

score 0 · Accepted Answer

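Below is the code I hacked together to test the multi-threaded performance of pthreads. I haven't cleaned it up and I haven't optimised anything, so the code is a bit raw.

The code that saves the calculated Mandelbrot set as a bitmap is not mine; you can find it here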

#include <cstdlib> //for atoi
#include <iostream>
#include <iomanip> //for setw and setfill
#include <vector>

#include "bitmap_Image.h" //for saving the mandelbrot as a bmp

#include <pthread.h>

pthread_mutex_t mutexCounter = PTHREAD_MUTEX_INITIALIZER; //statically initialised mutex guarding sharedCounter
int sharedCounter(0);
int percent(0);

int horizPixels(0);
int vertPixels(0);
int maxiter(0);

//doesn't need to be locked
std::vector<std::vector<int> > result; //create 2 dimensional vector

void *DoThread(void *null) {
    double curX,curY,xSquare,ySquare,x,y;
    int i, intx, inty, counter;
    counter = 0;

    do {
        counter++;
        pthread_mutex_lock (&mutexCounter); //lock
            intx = sharedCounter / vertPixels; //integer division already truncates, so the former "+ 0.5" had no effect
            inty = sharedCounter % vertPixels;
            sharedCounter++;
        pthread_mutex_unlock (&mutexCounter); //unlock

        //exit thread when finished
        if (intx >= horizPixels) {
            std::cout << "exited thread - I did " << counter << " calculations" << std::endl;
            pthread_exit((void*) 0);
        }

        //set x and y to the correct value now -> in the range like singlethread
        x = (3.0 / horizPixels) * (intx - (horizPixels / 1.5));
        y = (3.0 / vertPixels) * (inty - (vertPixels / 2));

        curX = x + x*x - y*y;
        curY = y + x*y + x*y;
        ySquare = curY*curY;
        xSquare = curX*curX;

        for (i=0; i<maxiter && ySquare + xSquare < 4;i++){
          ySquare = curY*curY;
          xSquare = curX*curX;
          curY = y + curX*curY + curX*curY;
          curX = x - ySquare + xSquare;
        }
        result[intx][inty] = i;
     } while (true);
}

int DoSingleThread(const double x, const double y) {
    double curX,curY,xSquare,ySquare;
    int i;

    curX = x + x*x - y*y;
    curY = y + x*y + x*y;
    ySquare = curY*curY;
    xSquare = curX*curX;

    for (i=0; i<maxiter && ySquare + xSquare < 4;i++){
      ySquare = curY*curY;
      xSquare = curX*curX;
      curY = y + curX*curY + curX*curY;
      curX = x - ySquare + xSquare;
    }
    return i;

}

void SingleThreaded(std::vector<std::vector<int> >&  result) {
    for(int x = horizPixels - 1; x != -1; x--) {
        for(int y = vertPixels - 1; y != -1; y--) {
            //3.0 -> so we always have -1.5 -> 1.5 as the window; (x - (horizPixels / 2) will go from -horizPixels/2 to +horizPixels/2
            result[x][y] = DoSingleThread((3.0 / horizPixels) * (x - (horizPixels / 1.5)),(3.0 / vertPixels) * (y - (vertPixels / 2)));
        }
    }
}

void MultiThreaded(int threadCount, std::vector<std::vector<int> >&  result) {
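    /* Initialize the thread attribute and explicitly mark the threads joinable */
    std::vector<pthread_t> thread(threadCount); //std::vector instead of a non-standard variable-length array
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);

    //the main thread counts as one of the threadCount threads, so create threadCount - 1 workers
    for (int i = 0; i < threadCount - 1; i++) {
        pthread_create(&thread[i], &attr, DoThread, NULL);
    }
    pthread_attr_destroy(&attr); //the attribute object is no longer needed once the threads exist
    std::cout << "all threads created" << std::endl;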

    for(int i = 0; i < threadCount - 1; i++) {
        pthread_join(thread[i], NULL);
    }
    std::cout << "all threads joined" << std::endl;
}

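int main(int argc, char* argv[]) {
    //all four arguments are required
    if (argc < 5) {
        std::cerr << "usage: " << argv[0] << " <width> <height> <maxiter> <threads>" << std::endl;
        return 1;
    }

    //first arg = length along horizontal axis
    horizPixels = atoi(argv[1]);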

    //second arg = length along vertical axis
    vertPixels = atoi(argv[2]);

    //third arg = iterations
    maxiter = atoi(argv[3]);

    //fourth arg = threads
    int threadCount = atoi(argv[4]);

    result = std::vector<std::vector<int> >(horizPixels, std::vector<int>(vertPixels,21)); // init 2-dimensional vector
    if (threadCount <= 1) {
        SingleThreaded(result);
    } else {
        MultiThreaded(threadCount, result);
    }


    //TODO: remove these lines
    bitmapImage image(horizPixels, vertPixels);
    for(int y = 0; y < vertPixels; y++) {
      for(int x = 0; x < horizPixels; x++) {
            image.setPixelRGB(x,y,16777216*result[x][y]/maxiter % 256, 65536*result[x][y]/maxiter % 256, 256*result[x][y]/maxiter % 256);
            //std::cout << std::setw(2) << std::setfill('0') << std::hex << result[x][y] << " ";
        }
        std::cout << std::endl;
    }

    //"~" is expanded by the shell, not by file APIs, so save into the working directory instead
    image.saveToBitmapFile("test.bmp",32);
}

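You can get good results by running the program with the following arguments:

mandelbrot 5120 3840 256 3

That way you will get an image that is 5 * 1024 wide and 5 * 768 high, with 256 colours (alas, you will only get 1 or 2) and 3 threads (1 main thread that does no work except creating the worker threads, and 2 worker threads).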
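The mutex-protected `sharedCounter` in the code above hands out one pixel index at a time. A common alternative for this kind of work distribution, which is not what the code above does, is an atomic counter. The sketch below swaps pthreads for `std::thread` and `std::atomic` (C++11 or later); `distributeWork` is a hypothetical name, and the per-item work is reduced to a counter so the hand-out logic stays visible.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hands out item indices 0 .. totalItems-1 to workerCount threads through an
// atomic counter. Returns how many items each worker ended up processing.
std::vector<int> distributeWork(int totalItems, int workerCount) {
    std::atomic<int> next(0);
    std::vector<int> processed(workerCount, 0);
    std::vector<std::thread> workers;

    for (int w = 0; w < workerCount; ++w) {
        workers.emplace_back([&next, &processed, totalItems, w] {
            for (;;) {
                // fetch_add returns the old value, so every index is claimed exactly once
                const int item = next.fetch_add(1);
                if (item >= totalItems) break;
                ++processed[w];  // stand-in for the real work; each worker writes only its own slot
            }
        });
    }
    for (std::thread& t : workers) t.join();
    return processed;
}
```

The per-worker counts play the same role as the "I did N calculations" message printed by each thread above: they show how evenly the work was spread.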

score 0 · Accepted Answer

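I've decided to use the Mandelbrot set (the escape time algorithm, to be more precise) to benchmark the different languages.
It suits me quite well, because the original algorithm is easy to implement and creating a multi-threaded variant from it is not much work.

Below is the code I currently have. It is still a single-threaded variant, but I'll update it as soon as I'm satisfied with the result.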

#include <cstdlib> //for atoi
#include <iostream>
#include <iomanip> //for setw and setfill
#include <vector>


int DoThread(const double x, const double y, int maxiter) {
    double curX,curY,xSquare,ySquare;
    int i;

    curX = x + x*x - y*y;
    curY = y + x*y + x*y;
    ySquare = curY*curY;
    xSquare = curX*curX;

    for (i=0; i<maxiter && ySquare + xSquare < 4;i++) {
      ySquare = curY*curY;
      xSquare = curX*curX;
      curY = y + curX*curY + curX*curY;
      curX = x - ySquare + xSquare;
    }
    return i;
}

void SingleThreaded(int horizPixels, int vertPixels, int maxiter, std::vector<std::vector<int> >&  result) {
    for(int x = horizPixels; x > 0; x--) {
        for(int y = vertPixels; y > 0; y--) {
            //3.0 -> so we always have -1.5 -> 1.5 as the window; (x - (horizPixels / 2) will go from -horizPixels/2 to +horizPixels/2
            result[x-1][y-1] = DoThread((3.0 / horizPixels) * (x - (horizPixels / 2)),(3.0 / vertPixels) * (y - (vertPixels / 2)),maxiter);
        }
    }
}

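int main(int argc, char* argv[]) {
    //all four arguments are required
    if (argc < 5) {
        std::cerr << "usage: " << argv[0] << " <width> <height> <maxiter> <threads>" << std::endl;
        return 1;
    }

    //first arg = length along horizontal axis
    int horizPixels = atoi(argv[1]);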

    //second arg = length along vertical axis
    int vertPixels = atoi(argv[2]);

    //third arg = iterations
    int maxiter = atoi(argv[3]);

    //fourth arg = threads
    int threadCount = atoi(argv[4]);

    std::vector<std::vector<int> > result(horizPixels, std::vector<int>(vertPixels,0)); //create and init 2-dimensional vector
    SingleThreaded(horizPixels, vertPixels, maxiter, result);

    //TODO: remove these lines
    for(int y = 0; y < vertPixels; y++) {
      for(int x = 0; x < horizPixels; x++) {
            std::cout << std::setw(2) << std::setfill('0') << std::hex << result[x][y] << " ";
        }
        std::cout << std::endl;
    }
}

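I've tested it with gcc under Linux, but I'm sure it works with other compilers and operating systems as well. To get it to work you have to enter some command-line arguments like so:

mandelbrot 106 500 255 1

The first argument is the width (x-axis)
The second argument is the height (y-axis)
The third argument is the maximum number of iterations (the number of colours)
The fourth argument is the number of threads (currently not used)

At my screen resolution, the above example gives me a nice ASCII-art representation of a Mandelbrot set. But try it for yourself with different arguments (the first one is the most important, as it sets the width).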
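One way the multi-threaded variant could be derived is to give each thread every `threadCount`-th column. This is only a sketch under assumptions, not the code used in this thread: `escapeTime` and `mandelbrot` are made-up names, the escape-time loop is the textbook form starting from z = 0 (so iteration counts can differ slightly from the code above), the mapping to the complex plane is chosen for illustration, and C++11 or later is assumed.

```cpp
#include <thread>
#include <vector>

// Textbook escape-time iteration: number of steps until |z| >= 2, capped at maxiter.
int escapeTime(double cx, double cy, int maxiter) {
    double zx = 0.0, zy = 0.0;
    int i = 0;
    while (i < maxiter && zx * zx + zy * zy < 4.0) {
        const double next = zx * zx - zy * zy + cx;
        zy = 2.0 * zx * zy + cy;
        zx = next;
        ++i;
    }
    return i;
}

// Thread t computes columns t, t + threadCount, t + 2 * threadCount, ...
// Each column is its own inner vector, so no two threads write the same memory.
std::vector<std::vector<int> > mandelbrot(int width, int height, int maxiter, int threadCount) {
    std::vector<std::vector<int> > result(width, std::vector<int>(height, 0));
    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t) {
        workers.emplace_back([&result, width, height, maxiter, threadCount, t] {
            for (int x = t; x < width; x += threadCount) {
                for (int y = 0; y < height; ++y) {
                    // map the pixel to roughly [-2, 1) x [-1.5, 1.5)
                    result[x][y] = escapeTime(3.0 * x / width - 2.0,
                                              3.0 * y / height - 1.5, maxiter);
                }
            }
        });
    }
    for (std::thread& w : workers) w.join();
    return result;
}
```

Interleaving the columns spreads the expensive pixels inside the set across the threads, whereas contiguous blocks would tend to leave some threads with much more work than others.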

score -1 · Accepted Answer

Since the benchmarks game moved to a quad-core machine in September 2008, many programs in different programming languages have been rewritten to exploit quad-core; for example, the first 10 Mandelbrot programs.
