c++ - Performance penalty for large C++ DLLs with autogenerated C code

Question

I am working on a piece of software that needs to call a family of optimisation solvers. Each solver is an auto-generated piece of C code, with thousands of lines of code. I am using 200 of these solvers, differing only in the size of the optimisation problem to be solved.

All in all, these auto-generated solvers come to about 180MB of C code, which I compile as C++ using the extern "C"{ /*200 solvers' headers*/ } syntax, in Visual Studio 2008. Compiling all of this is very slow (with the "maximum speed /O2" optimisation flag, it takes about 8 hours). For this reason I thought it would be a good idea to compile the solvers into a single DLL, which I can then call from a separate piece of software (which would have a reasonable compile time, and allow me to abstract away all this extern "C" stuff from higher-level code). The compiled DLL is then about 37MB.

The problem is that when executing one of these solvers using the DLL, execution requires about 30ms. If I compile only that single solver into a DLL, and call it from the same program, execution is about 100x faster (<1ms). Why is this? Can I get around it?

The DLL looks as below. Each solver uses the same structures (i.e. they have the same member variables), but they have different names, hence all the type casting.

extern "C"{
#include "../Generated/include/optim_001.h"
#include "../Generated/include/optim_002.h"
/*etc.*/
#include "../Generated/include/optim_200.h"
}

namespace InterceptionTrajectorySolver
{

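__declspec(dllexport) InterceptionTrajectoryExitFlag SolveIntercept(unsigned numSteps, InputParams params, double* optimSoln, OutputInfo* infoOut)
{
  int exitFlag = -1; // placeholder value returned when numSteps has no matching solver

  // infoOut is already a pointer, so it is cast directly; casting &infoOut
  // (an OutputInfo**) would make the solver write into the stack slot that
  // holds the pointer, and beyond it.
  switch(numSteps)
  {
  case   1:
    exitFlag = optim_001_solve((optim_001_params*) &params, (optim_001_output*) optimSoln, (optim_001_info*) infoOut);
    break;
  case   2:
    exitFlag = optim_002_solve((optim_002_params*) &params, (optim_002_output*) optimSoln, (optim_002_info*) infoOut);
    break;
  /*
    ...
    etc.
    ...
  */
  case   200:
    exitFlag = optim_200_solve((optim_200_params*) &params, (optim_200_output*) optimSoln, (optim_200_info*) infoOut);
    break;
  default:
    break; // unsupported problem size: exitFlag keeps its placeholder value
  }

  return static_cast<InterceptionTrajectoryExitFlag>(exitFlag);
}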

};

score 1 · Accepted Answer

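I'm not sure whether the code is being inlined into each case of your example. If the functions are inline functions and you are putting everything inside one function, it will be very slow, because of how the code is laid out in virtual memory: the CPU has to make a lot of jumps when the code runs. If not everything is inlined, then perhaps these suggestions will help.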

Your solution could be improved in one of these ways...

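A) 1) Split the project into 200 separate DLLs, then build them with a .bat file or similar. 2) Create an exported function called "MyEntryPoint" in each DLL, and load the libraries as needed using dynamic linking. This is the equivalent of a busy music program with lots of small DLL plugins loaded. Use GetProcAddress to get a function pointer to the EntryPoint.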
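A minimal sketch of option A, assuming a C++11 compiler (VS2008 itself lacks `nullptr` and `std::snprintf`), one library per problem size, and a `MyEntryPoint` export taking three untyped pointers. The file names `optim_NNN.dll` / `liboptim_NNN.so`, the entry-point signature and `GetSolverEntryPoint` are hypothetical; `LoadLibraryA`/`GetProcAddress` are the real Win32 calls, and the POSIX `dlopen`/`dlsym` branch is only there so the sketch also builds outside Windows.

```cpp
#include <array>
#include <cassert>
#include <cstdio>

#ifdef _WIN32
#include <windows.h>
#else
#include <dlfcn.h>  // POSIX dlopen/dlsym, used only so the sketch builds off Windows
#endif

// Assumed signature of each library's exported entry point; the answer names
// the export "MyEntryPoint" but does not give its parameters.
typedef int (*EntryPointFn)(void* params, void* output, void* info);

// Returns MyEntryPoint from the library for the given problem size, loading
// that library on first use. Returns nullptr if the library or the export
// cannot be found. Not thread-safe; libraries are kept loaded for the life
// of the process.
EntryPointFn GetSolverEntryPoint(unsigned numSteps)
{
    assert(numSteps >= 1 && numSteps <= 200 && "no solver for this problem size");

    // One cached pointer per problem size, so each library is loaded at most
    // once on success (a failed lookup is retried on the next call).
    static std::array<EntryPointFn, 201> cache{};
    if (cache[numSteps] != nullptr)
        return cache[numSteps];

    char name[64];
#ifdef _WIN32
    std::snprintf(name, sizeof name, "optim_%03u.dll", numSteps);
    HMODULE lib = LoadLibraryA(name);
    if (lib == nullptr)
        return nullptr;
    cache[numSteps] = reinterpret_cast<EntryPointFn>(GetProcAddress(lib, "MyEntryPoint"));
#else
    std::snprintf(name, sizeof name, "./liboptim_%03u.so", numSteps);
    void* lib = dlopen(name, RTLD_NOW | RTLD_LOCAL);
    if (lib == nullptr)
        return nullptr;
    cache[numSteps] = reinterpret_cast<EntryPointFn>(dlsym(lib, "MyEntryPoint"));
#endif
    return cache[numSteps];
}
```

Because each library is loaded only when its problem size is first requested, the process never maps all 200 solvers at once; the trade-off is a one-off load cost on the first call for each size.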

Or...

B) Build each solver as a separate .lib file. Each one then compiles very quickly, and you can link them all together. Create an array of function pointers to all the functions, and call them through a lookup instead.
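A minimal sketch of the lookup table in option B. It assumes every solver is reached through a wrapper with one common signature; `SolveFn`, `solve_001`..`solve_003`, `kSolvers` and `SolveByTable` are hypothetical names, and the stubs stand in for the real `optim_NNN_solve` calls. The return value `-1` for an unknown size is an arbitrary placeholder.

```cpp
#include <array>
#include <cassert>

// Uniform signature used by the lookup table. The generated functions take
// their own optim_NNN_* pointer types, so in a real build each table entry
// would be a small wrapper that casts and forwards (calling a function
// through a mismatched function-pointer type is undefined behaviour).
typedef int (*SolveFn)(void* params, void* output, void* info);

// Hypothetical stand-ins for three per-solver wrappers; each just returns a
// recognisable value so the dispatch can be checked.
int solve_001(void*, void*, void*) { return 1; }
int solve_002(void*, void*, void*) { return 2; }
int solve_003(void*, void*, void*) { return 3; }

// Slot i holds the solver for numSteps == i; slot 0 is unused.
const std::array<SolveFn, 4> kSolvers = {{ nullptr, solve_001, solve_002, solve_003 }};

// Replaces the switch with a bounds-checked table lookup.
// Returns -1 (placeholder) when no solver exists for numSteps.
int SolveByTable(unsigned numSteps, void* params, void* output, void* info)
{
    if (numSteps >= kSolvers.size() || kSolvers[numSteps] == nullptr)
        return -1;
    return kSolvers[numSteps](params, output, info);
}
```

With 200 solvers the table simply grows to 201 entries; the dispatch stays a single indexed call.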

result = SolveInterceptWhichStep;

Combining all the libraries into one big library shouldn't take anything like 8 hours. If it takes that long, you're doing something very wrong.

And...

Try putting the code in separate, actual .cpp files. Perhaps that particular compiler does a better job when they are all in different units. And once each unit has been compiled, it stays compiled as long as you don't change it.

score 0 · Accepted Answer

I assume you're generating code to improve run-time performance and correctness. I do the same.

I'd suggest trying this technique to find out what the run-time performance problem is.

If you're seeing a 100:1 performance difference, then every time you interrupt the program and look at its state, there's a 99% chance you will see what the problem is.
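The arithmetic behind the 99% figure, as a short worked example. It assumes the slowdown is extra work on top of the fast path and that each pause lands at a uniformly random moment; $f$ is the fraction of run time spent in the extra work and $n$ the number of independent pauses.

```latex
\begin{aligned}
f &= \frac{T_{\text{slow}} - T_{\text{fast}}}{T_{\text{slow}}}
   = 1 - \frac{T_{\text{fast}}}{T_{\text{slow}}}
   = 1 - \frac{1}{100} = 0.99 \\
P(\text{all } n \text{ pauses miss the extra work}) &= (1 - f)^{n} = 0.01^{n},
   \qquad n = 2 \;\Rightarrow\; 10^{-4}
\end{aligned}
```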

As far as build time goes, modularising it makes sense. That shouldn't have much effect on run time, unless it means you're doing crazy I/O.

score 0 · Accepted Answer

Time several calls to the optimiser and average them, because there may be a large setup overhead before the first call.

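Next, also check what effect that 200-branch conditional statement (the switch) has on performance! In a test project, try calling just one solver, with all of them linked in the DLL, and remove the switch for the test. Do you still see slow performance?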
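A minimal timing harness for the two suggestions above, using `std::chrono` (C++11, so newer than VS2008). `Timing` and `TimeCalls` are hypothetical names. The first call is reported separately from the average of the following calls, so one-off setup cost does not get mixed into the steady-state figure.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Timings in milliseconds. The first call is reported separately because it
// may include one-off setup costs that the later calls do not pay.
struct Timing {
    double firstCallMs;
    double averageMs;  // mean of the `repetitions` calls after the first
};

// Calls `call` once, then `repetitions` more times, and times both phases.
template <typename F>
Timing TimeCalls(F&& call, std::size_t repetitions)
{
    typedef std::chrono::steady_clock Clock;
    typedef std::chrono::duration<double, std::milli> Ms;

    const Clock::time_point t0 = Clock::now();
    call();  // first ("cold") call
    const Clock::time_point t1 = Clock::now();
    for (std::size_t i = 0; i < repetitions; ++i)
        call();  // steady-state calls
    const Clock::time_point t2 = Clock::now();

    Timing result;
    result.firstCallMs = Ms(t1 - t0).count();
    result.averageMs = repetitions > 0 ? Ms(t2 - t1).count() / repetitions : 0.0;
    return result;
}
```

To compare dispatch paths, time one lambda that calls a single `optim_NNN_solve` directly and another that calls `SolveIntercept` with the same inputs, e.g. `TimeCalls([&] { SolveIntercept(1, params, soln, &info); }, 1000)`, where `params`, `soln` and `info` are the caller's own variables.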
