c++ - C++ での参照速度

Question

私はプロジェクトに取り組んでおり、実行時間の大幅な低下の原因を見つけようとしており、ロジックから最適化することができた単一のメソッドに絞り込みました。問題は、私の解決策には、コードの別のセクションの実行が非常に遅くなる参照の使用が含まれていることです...私が答えたい質問は、マップが参照である場合とは対照的に、内部ループの評価に時間がかかる理由ですローカル変数？

最適化前の古い方法は次のとおりです。

// old method: create an empty map, populate it
// and then assign it back to the path object later
map<int,float> screenline_usage; 

for (int i=0; i<numCandidates; ++i)
{
  // timing starts here. 
  map<int, float>& my_screenline_usage =
    path->get_combined_screenline_usage(legnr, stop_id);
  map<int, float>::iterator it = my_screenline_usage.begin(); 
  for (; it != my_screenline_usage.end(); ++it) 
    screenline_usage[it->first] += usage * it->second; 
  // timing ends here, this block evaluated 4 million times for overall execution time of ~12 seconds
}

// This function call is evaluated 400k times for an overall execution time of ~126 seconds
path->set_zone_screenline_usage(access_mode, zone_id, screenline_usage); 

// TOTAL EXECUTION TIME: 138 seconds.

最適化後の新しい方法:

// new method: get a reference to internal path mapping and populate it
map<int, float>& screenline_usage =
  path->get_zone_screenline_usage(access_mode, zone_id);
screenline_usage.clear();

for (int i=0; i<numCandidates; ++i)
{
  // timing starts here
  map<int, float>& my_screenline_usage =
    path->get_combined_screenline_usage(legnr, stop_id);
  map<int, float>::iterator it = my_screenline_usage.begin(); 
  for (; it != my_screenline_usage.end(); ++it) 
    screenline_usage[it->first] += usage * it->second; 
  // timing ends here, this block evaluated 4 million times for overall execution time of ~76 seconds
}

// New method... no need to assign back to path object (0 seconds execution :)
// TOTAL EXECUTION TIME: 76 seconds (62 second time saving) ... but should be able to do it in just 12 seconds if the use of reference didn't add so much time :(

そのコードから呼び出される関連するサブルーチンは次のとおりです。

// This is the really slow routine, due to the copy assignment used. 
void set_zone_screenline_usage(int access_mode, int zone_id,
  map<int,float>& screenline_usage)
{
  m_container[access_mode][zone_id] = screenline_usage; 
}

map<int,float>& get_zone_screenline_usage(int access_mode, int zone_id)
{
  return m_container[access_mode][zone_id]; 
}

注: タイミング情報は、上記のコードが約 40 万回評価される 1 回の実行に関するものです。タイミングは、RDTSC タイムスタンプカウンターにアクセスするために作成したいくつかのクラスを使用して行われます (はい、TSC はタイムスタンプカウンターを意味します)。numCandidates の平均値は 10 で、screenline_usage マップに配置される要素の平均数は 25 です。

更新: まず、ここに関わってくれたすべての人に感謝します。最終的に、これは c++ 参照とはまったく関係がなく、キャッシュの一貫性に関係していると思います。上記の最適化されたコードを vector& とメンバー変数マップとして実装されたハッシュ関数に置き換えました

// newest method: get a reference to internal path mapping (as vector) and populate it 
// map<int,int> m_linkNum_to_SlNum declared in header and populated in constructor. 
vector<float>& screenline_usage = path->get_zone_screenline_usage(access_mode, zone_id);

for (int i=0; i<numCandidates; ++i)
{
  // timing starts here
  map<int, float>& my_screenline_usage =
    path->get_combined_screenline_usage(legnr, stop_id);
  map<int, float>::iterator it = my_screenline_usage.begin(); 
  for (; it != my_screenline_usage.end(); ++it) 
    screenline_usage[m_linkNum_to_SlNum[it->first]] += usage * it->second; 
  // timing ends here, this block evaluated 4 million times for overall execution time of ~9 seconds
}

// Newest method... again no need to assign back to path object (0 seconds execution :)
// TOTAL EXECUTION TIME: just 9 seconds (129 second time saving) ... this is even better than using a locally constructed map which took 12 seconds in the inner loop :)

ここでは、ベクトルがローカルではなく連続したメモリブロックであり、ハッシュ関数 (m_linkNum_to_SlNum) がローカルメンバー変数であることを考えると、このアプローチはキャッシュに収まるコード/データにつながるように思えます。データを得るためにメインメモリにアクセスする必要がないため、速度が大幅に向上します。これらの調査結果を踏まえた他の結論は大歓迎です。

score 4 · Accepted Answer

おそらく、C++ コンパイラはローカルマップの一部のコードをインライン化できますが、マップが参照の場合はインライン化できません。

score 3 · Accepted Answer

あなたの質問は不明確です。値で渡すよりも参照でマップを渡す方が速いのはなぜですか？もしそうなら、答えは簡単です。値でマップを返すことは、マップ全体をコピーすることを意味し、大きなマップをコピーすることは非常にコストのかかる操作です。

一方、質問が次の場合：新しいマップを作成するよりも、既存のマップへの参照を取得してデータを入力する方が速いのはなぜですか。

 screenline_usage[it->first] += usage * it->second;

キー[it->first]はpath->get_zone_screenline_usage内のマップにすでに存在するため、これは単なる更新操作であり、メモリ割り当ては必要ありません。ただし、screenline_usageが空のマップである場合、新しい[it-> first]にアクセスするたびに、最初にヒープから新しいマップノードを割り当てる必要があり、コストがかかります。

score 2 · Accepted Answer

私があなたの質問を正しく理解していれば、参照によって取得する既存のヒープに割り当てられた map<> に挿入するよりも、ローカルのスタックに割り当てられた map<> に挿入する方がはるかに高速であることを意味します。

パフォーマンスに影響を与える可能性があるいくつかの可能性があります。ただし、C++ 参照のパフォーマンスと関係があるとは思えません。

最初の可能性はscreenline_usage、参照に変更することで、既存のオブジェクトへのポインターを「本質的に」取得していることです。オブジェクトの実際のインスタンスはではない可能性がありmap<int,float>、マップから継承するものによって可能性があります。たとえば、カスタムコンパレータ関数が定義されたマップである可能性があります。または、スレッド同期ロジックを持つサブクラス。への呼び出しから取得している型がわからないためm_container[access_mode][zone_id]、挿入時にうまく機能しないサブクラスを取得している可能性があります。(ちなみに、デバッガーで返される型を確認できます。) 一方、空の型を作成するmap<int,float>ことで、型がサブクラスではなく実際の map<> であることが保証されます。

実際の map<> インスタンスを取得していると仮定しましょう。

2 番目の可能性は、使用している STL の特定のフレーバーで、map::clear()関数が連想インデックスを維持するために使用される内部データ構造を効率的に消去しないことです。stl::map<> は複雑な内部ハッシュとバケット構造を使用して、効率的な挿入および取得機能を提供します。ただし、clear() を呼び出したときに何が起こるかは不明です。screenline_usage.clear()そのため、空のマップから開始した場合よりも、挿入するマップの効率が低下する可能性があります。

最後に、私が間違っていると仮定しましょう。それはこれらの可能性のどちらでもありません。違いが参照変数を使用した結果であるかどうかを判断する簡単な方法があります。次のように、ヒープに新しいマップを割り当てて、コード内の参照に割り当てることができます。

map<int, float>& screenline_usage = new map<int,float>();

これは、既存のマップと新しいマップへの挿入に違いがあるかどうか、または実際に screenline_usage がパフォーマンスに影響を与えている参照であるという事実があるかどうかを判断するのに役立ちます。ところで、このヒープに割り当てられたマップを解放することを忘れないでください。そうしないと、メモリリークが発生します。

score 0 · Accepted Answer

私が理解しようとしている問題は、マップが参照であるときに内部ループの評価に時間がかかるのはなぜですか?

時間がかかっているのはget_zone_screenline_usage: の呼び出しであり、時間がかかる理由は、特定の要素がまだ存在していm_containerないため、返される前に作成して挿入する必要があるためだと思います。

score 0 · Accepted Answer

私の更新によると、これは c++ 参照の問題ではなく、キャッシュの一貫性の問題である可能性が最も高いと思います。

c++ - C++ での参照速度

6 に答える 6

Related

Reference