8

I need to improve the throughput of the system.

The usual cycle of optimization has been done and we have already achieved 1.5X better throughput.

I am now beginning to wonder if I can utilize the cachegrind output to improve the system's throughput.

Can somebody point me to how to begin on this?

My understanding is that the most frequently used data should be kept small enough to stay in the L1 cache, and the next most frequently used set of data should fit in L2.

Am I heading in the right direction?

4

4 Answers

6

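It is true that cachegrind's output by itself does not tell you much about how to go about optimizing your code. You need to know how to interpret it, and what you say about fitting data into L1 and L2 is indeed the right direction.

To fully understand how memory access patterns affect performance, I recommend reading the excellent paper "What Every Programmer Should Know About Memory" by Ulrich Drepper, the maintainer of GNU libc.

answered 2009-11-12T19:20:04.583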
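To make the point about access patterns concrete, here is a minimal sketch (not taken from the paper) of two functions that compute the same sum over a square matrix stored as a flat array. The names `sum_row_major` and `sum_col_major` and the flat `n * n` layout are assumptions made up for illustration.

```c
#include <stddef.h>

/* Hypothetical example: m points to n * n ints stored row by row. */

/* Walks memory sequentially, so consecutive reads share cache lines. */
long sum_row_major(const int *m, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            total += m[i * n + j];
    return total;
}

/* Same result, but each read is n * sizeof(int) bytes away from the
   previous one, so for large n nearly every read touches a new line. */
long sum_col_major(const int *m, size_t n)
{
    long total = 0;
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            total += m[i * n + j];
    return total;
}
```

If you call both from a small test program and run it under `valgrind --tool=cachegrind` (recent Valgrind releases may also need `--cache-sim=yes`), the column walk should show noticeably more D1 read misses once the matrix no longer fits in cache. The exact numbers depend on the cache sizes being simulated.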
3

If you're having trouble parsing the cachegrind output, look into KCacheGrind (it should be available in your distro of choice). I use it and find it quite helpful.

answered 2009-11-12T17:59:15.247
2

According to the Cachegrind documentation, what cachegrind reports is the number of cache misses for a given part of your code. You need to know how caches work on the architecture you are targeting so that you know how to fix the code. In practice this means making the data smaller or changing the access pattern of some data so that it is still in the cache when it is needed again. However, you need to understand your program's data and how it is accessed before you can act on the information. As it says in the manual:

In short, Cachegrind can tell you where some of the bottlenecks in your code are, but it can't tell you how to fix them. You have to work that out for yourself. But at least you have the information!
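As one illustration of "making data smaller", here is a sketch of splitting hot fields from cold ones. The `record_fat` struct, its field names, and the two counting functions are hypothetical and not from the Cachegrind manual. The idea is that a loop which only reads `key` should not have to pull the rarely used bytes through the cache as well.

```c
#include <stddef.h>

/* Hypothetical original layout: the scan below reads only `key`,
   yet every element also carries 60 bytes of rarely used data. */
struct record_fat {
    int  key;
    char description[60];   /* cold: not touched by the scan */
};

/* Scan over the fat layout: only a few keys fit in each cache line. */
size_t count_matches_fat(const struct record_fat *r, size_t n, int key)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (r[i].key == key)
            hits++;
    return hits;
}

/* Scan over a dense array holding only the hot field. The cold data
   would live in a parallel array at the same index (not shown). */
size_t count_matches_split(const int *keys, size_t n, int key)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (keys[i] == key)
            hits++;
    return hits;
}
```

Both functions return the same count. Whether the split pays off depends on how often the cold fields are really needed, which is exactly the kind of knowledge about your own data that the manual says you have to bring yourself.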

answered 2009-11-12T17:57:29.257
2

1.5x is a nice speedup. It means you found something that took 33% of the time and got rid of it. I bet you can do more, even before you get down to low-level issues like the data memory cache. This is an example of how. Basically, you could have additional performance problems (and opportunities for speedup) that were not large before, say 25%. Well, with the 1.5x speedup, that 25% is now 37.5% of the remaining run time, so it is "worth more" than it was. Often such a problem takes the form of some mid-stack function call that is requesting work which, once you know how much it costs, you may decide isn't completely necessary. Since kcachegrind does not really pinpoint these, you may not realize it is a problem.

answered 2009-11-19T14:18:01.410
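The arithmetic behind those percentages, written out. Here f is the fraction of the original run time that was removed, S is the resulting speedup, and p is the share of a second problem. The symbols are introduced only for this illustration, and the last line assumes the second problem could be removed entirely.

```latex
\[
S = \frac{1}{1 - f}, \qquad f = \tfrac{1}{3} \;\Rightarrow\; S = 1.5
\]
\[
p' = \frac{p}{1 - f} = \frac{0.25}{2/3} = 0.375
\]
\[
S_{\mathrm{total}} = 1.5 \times \frac{1}{1 - 0.375} = 1.5 \times 1.6 = 2.4
\]
```

So a problem that looked like a modest 25% before the first fix would, if removed completely, take the overall speedup from 1.5x to 2.4x.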