
As I explained here, I am inverting a matrix via Cholesky factorization in a distributed environment. My code works fine, but in order to test that the distributed project produces correct results, I had to compare them against the serial version. The results are not exactly the same!

For example, the last five cells of the result matrix are:

serial gives:
-250207683.634793 -1353198687.861288 2816966067.598196 -144344843844.616425 323890119928.788757
distributed gives:
-250207683.634692 -1353198687.861386 2816966067.598891 -144344843844.617096 323890119928.788757

I posted about it in the Intel forum, but the answer I got was that I will get the same results across every execution of the distributed version. What they do not seem able to answer (in another thread) is this:

How can one get the same results between a serial and a distributed execution? Is this possible? This would let me rule out arithmetic errors.

I tried setting mkl_cbwr_set(MKL_CBWR_AVX); and using mkl_malloc() to align memory, but nothing changed. I do get identical results, but only in the case where I spawn one process for the distributed version (which makes it almost serial).
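For reference, MKL's Conditional Numerical Reproducibility mode can also be requested via the MKL_CBWR environment variable instead of (or in addition to) calling mkl_cbwr_set() in code. Note that, as far as I know, CNR only guarantees identical results between runs that use the same code path and the same process/thread configuration; it does not promise that a distributed run matches a serial one, which is consistent with what I observed:

```shell
# Pin MKL to the AVX code path before launching the program.
# This is the environment-variable equivalent of mkl_cbwr_set(MKL_CBWR_AVX).
export MKL_CBWR=AVX
echo "MKL_CBWR=$MKL_CBWR"
```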

Distributed routines I am calling: pdpotrf() and pdpotri()

Serial routines I am calling: dpotrf() and dpotri()


2 Answers


As the other answer mentions, getting exactly the same results between serial and distributed execution is not guaranteed. One common technique with HPC/distributed workloads is to validate the solution. There are a number of techniques for this, from calculating the relative error to more complex validation schemes, like the one used by HPL (the High-Performance LINPACK benchmark). Below is a simple C++ function that calculates the relative error. As @HighPerformanceMark notes in his post, the analysis of this sort of numerical error is incredibly complex; this is a very simple method, and there is a lot of information available online about the topic.

#include <iostream>
#include <cmath>

// Relative error of an approximation x against a reference value a.
double calc_error(double a, double x)
{
  return std::abs(x - a) / std::abs(a);
}

int main(void)
{
  // Last five cells of the result matrix: serial (sans) and distributed (pans).
  double sans[] = {-250207683.634793, -1353198687.861288, 2816966067.598196, -144344843844.616425, 323890119928.788757};
  double pans[] = {-250207683.634692, -1353198687.861386, 2816966067.598891, -144344843844.617096, 323890119928.788757};
  double err[5];
  std::cout << "Serial Answer,Distributed Answer, Error" << std::endl;
  for (int it = 0; it < 5; it++) {
    err[it] = calc_error(sans[it], pans[it]);
    std::cout << sans[it] << "," << pans[it] << "," << err[it] << "\n";
  }
  return 0;
}

Which produces this output:

Serial Answer,Distributed Answer, Error
-2.50208e+08,-2.50208e+08,4.03665e-13
-1.3532e+09,-1.3532e+09,7.24136e-14
2.81697e+09,2.81697e+09,2.46631e-13
-1.44345e+11,-1.44345e+11,4.65127e-15
3.2389e+11,3.2389e+11,0

As you can see, the error in every case is on the order of 10^-13 or less, and in one case it is zero. Depending on the problem you are trying to solve, error of this order of magnitude could be considered acceptable. Hopefully this helps to illustrate one way of validating a distributed solution against a serial one, or at least gives one way to measure how far apart the parallel and serial algorithms are.

When validating answers for big problems and parallel algorithms, it can also be valuable to perform several runs of the parallel algorithm and save the result of each run. You can then check whether the result and/or error varies from run to run, or whether it settles over time.

Showing that a parallel algorithm produces error within acceptable thresholds over 1000 runs (just an example; the more data the better for this sort of thing) for various problem sizes is one way to assess the validity of a result.

In the past, when I performed benchmark testing, I noticed wildly varying behavior for the first several runs, before the servers had "warmed up". At the time I never bothered to check whether the error in the result stabilized over time the same way performance did, but it would be interesting to see.

Answered 2015-08-18T15:39:34.597