
I'm looking into quadratic programming and have been surveying various libraries. I've looked at several Eigen variants of QuadProg++ (from the KDE forums, Benjamin Stephens, and StackOverflow posts). As a test, I forked wingsit's Eigen variant, available on GitHub, and implemented compile-time-sized versions of the problem via templates in order to measure performance.

I found that the templated (compile-time-sized) case performs worse than the dynamically sized (MatrixXd / VectorXd) case from wingsit's code. I know this is not an easy question, but does anyone have an idea why this might be?
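For context, here is a minimal sketch of the kind of comparison being described; the function names are purely illustrative and are not wingsit's or QuadProg++'s actual interface:

    #include <Eigen/Dense>

    // Compile-time-sized path: N is a template parameter, so all matrix and
    // vector dimensions are known to the compiler (hypothetical helper, for
    // illustration only).
    template <int N>
    double evaluate_objective_fixed(const Eigen::Matrix<double, N, N>& G,
                                    const Eigen::Matrix<double, N, 1>& g0,
                                    const Eigen::Matrix<double, N, 1>& x)
    {
        return 0.5 * x.dot(G * x) + g0.dot(x);
    }

    // Dynamic-sized path: dimensions are only known at run time.
    double evaluate_objective_dynamic(const Eigen::MatrixXd& G,
                                      const Eigen::VectorXd& g0,
                                      const Eigen::VectorXd& x)
    {
        return 0.5 * x.dot(G * x) + g0.dot(x);
    }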

Note: I need to increase the problem size / iteration count; I will post those numbers when I can.

Edit: I'm using GCC 4.6.3 on Ubuntu 12.04. These are the flags I'm using (modified from wingsit's code):

CFLAGS  = -O4 -Wall -msse2 -fopenmp      # option for obj
LFLAGS  = -O4 -Wall -msse2 -fopenmp      # option for exe (-lefence ...)

1 Answer


The benefits of static-sized code generally diminish as the sizes grow. The typical benefits of static-sized code include (but are not limited to) the following:

  • Stack-based allocations, which are faster than heap allocations. However, at large sizes, stack-based allocations are no longer feasible (stack overflow), nor are they beneficial from the point of view of pre-fetching and locality of reference.
  • Loop unrolling: when the compiler sees a small static-sized loop, it can simply unroll it, and possibly use SSE instructions. This doesn't apply at larger sizes. (Both points are illustrated in the sketch after this list.)
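As a rough illustration of both points (a sketch only, not a benchmark): a fixed-size Eigen vector is a plain stack object whose operations can be fully unrolled, while a dynamic vector owns heap storage and is processed with a run-time loop.

    #include <Eigen/Dense>

    double fixed_dot(const Eigen::Vector4d& a, const Eigen::Vector4d& b)
    {
        // Vector4d lives on the stack; the compiler knows the size is 4,
        // so this dot product can be unrolled (and vectorized with SSE).
        return a.dot(b);
    }

    double dynamic_dot(const Eigen::VectorXd& a, const Eigen::VectorXd& b)
    {
        // VectorXd owns heap-allocated storage; the loop trip count is only
        // known at run time, so this compiles to a general-purpose loop.
        return a.dot(b);
    }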

In other words, for small sizes (up to maybe N = 12 or so), static-sized code can be much faster than the equivalent dynamically-sized code, as long as the compiler is fairly aggressive about inlining and loop unrolling. But when the sizes are larger, there is no point.

Additionally, static-sized code has a number of drawbacks:

  • No efficient move semantics / swapping / copy-on-write strategies, since such code is usually implemented with static arrays (in order to get the benefits mentioned above), which cannot simply be swapped (as in, swapping an internal pointer).
  • Larger executables, with the relevant functions spread out across them. Say you have one function template to multiply two matrices; a new compiled function (inlined or not) is then generated for each size combination. If some algorithm does a lot of matrix multiplications, each multiplication has to jump to (or execute inline) the specialization for that particular size combination. You end up with a lot more code that needs to be cached, cache misses become far more frequent, and that can completely destroy the performance of your algorithm. (A sketch of this follows the list.)
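A sketch of how that code growth happens: every distinct size combination instantiates its own copy of a templated routine. The function below is illustrative only, not taken from any particular library.

    #include <Eigen/Dense>

    // One compiled body is generated per (R, K, C) combination that the
    // program actually uses.
    template <int R, int K, int C>
    Eigen::Matrix<double, R, C> multiply(const Eigen::Matrix<double, R, K>& a,
                                         const Eigen::Matrix<double, K, C>& b)
    {
        return a * b;
    }

    void instantiate_examples()
    {
        Eigen::Matrix<double, 3, 3> a3, b3;
        Eigen::Matrix<double, 4, 4> a4, b4;
        a3.setRandom(); b3.setRandom();
        a4.setRandom(); b4.setRandom();

        // Two separate specializations end up in the binary, even though the
        // algorithm is identical; with many sizes this adds up quickly.
        multiply<3, 3, 3>(a3, b3);
        multiply<4, 4, 4>(a4, b4);
    }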

So the lesson to draw from this is to use static-sized vectors and matrices only for small things, like 2- to 6-dimensional vectors or matrices. Beyond that, it's preferable to go with dynamically-sized code (or perhaps try static-sized code, but verify that it actually performs better, because that is not guaranteed). I would therefore advise you to reconsider your idea of using static-sized code for larger problems.

answered 2012-11-15T17:51:52.227