19

I am running into the following problem with my code. I have been using Valgrind and gperftools to perform heap checking and heap profiling, to verify that I free all the memory I allocate. The output of these tools looks fine and it seems I am not losing any memory. However, looking at top and ps confuses me, because their output basically does not match what I observe with valgrind and gperftools.

Here are the numbers:

  • top reports: RES 150M
  • Valgrind (Massif) reports: 23M peak usage
  • gperftools heap profiler reports: 22.7M peak usage

My question now is: where does the difference come from? I tried tracking stack usage with Valgrind as well, but without success.

More details:

  • The process basically loads data from mysql via the C API into an in-memory storage.
  • Running a leak check and breaking right after the load is finished shows 144 bytes definitely lost and 10M reachable, which fits the amount currently allocated.
  • The library does no complex IPC; it starts a few threads, but only one of them does the actual work.
  • No other complex system libraries are loaded.
  • The PSS size from /proc/pid/smaps matches the RES size reported by top and ps.

Any ideas where this difference in reported memory consumption comes from? How can I make sure that the program behaves correctly? And any ideas how I could investigate this further?


3 Answers

20

Finally I was able to solve the problem and will happily share my findings. In general, the best tool to evaluate the memory consumption of a program is, from my perspective, the Massif tool from Valgrind. It lets you profile heap consumption and gives you a detailed analysis.

To profile the heap of your application, run valgrind --tool=massif prog. This gives you basic access to all the information about the typical memory allocation functions like malloc and friends. However, to dig deeper I activated the option --pages-as-heap=yes, which additionally reports information about the underlying system calls. To give an example, here is something from my profiling session:

 67  1,284,382,720      978,575,360      978,575,360             0            0
100.00% (978,575,360B) (page allocation syscalls) mmap/mremap/brk, --alloc-fns, etc.
->87.28% (854,118,400B) 0x8282419: mmap (syscall-template.S:82)
| ->84.80% (829,849,600B) 0x821DF7D: _int_malloc (malloc.c:3226)
| | ->84.36% (825,507,840B) 0x821E49F: _int_memalign (malloc.c:5492)
| | | ->84.36% (825,507,840B) 0x8220591: memalign (malloc.c:3880)
| | |   ->84.36% (825,507,840B) 0x82217A7: posix_memalign (malloc.c:6315)
| | |     ->83.37% (815,792,128B) 0x4C74F9B: std::_Rb_tree_node<std::pair<std::string const, unsigned int> >* std::_Rb_tree<std::string, std::pair<std::string const, unsigned int>, std::_Select1st<std::pair<std::string const, unsigned int> >, std::less<std::string>, StrategizedAllocator<std::pair<std::string const, unsigned int>, MemalignStrategy<4096> > >::_M_create_node<std::pair<std::string, unsigned int> >(std::pair<std::string, unsigned int>&&) (MemalignStrategy.h:13)
| | |     | ->83.37% (815,792,128B) 0x4C7529F: OrderIndifferentDictionary<std::string, MemalignStrategy<4096>, StrategizedAllocator>::addValue(std::string) (stl_tree.h:961)
| | |     |   ->83.37% (815,792,128B) 0x5458DC9: var_to_string(char***, unsigned long, unsigned long, AbstractTable*) (AbstractTable.h:341)
| | |     |     ->83.37% (815,792,128B) 0x545A466: MySQLInput::load(std::shared_ptr<AbstractTable>, std::vector<std::vector<ColumnMetadata*, std::allocator<ColumnMetadata*> >*, std::allocator<std::vector<ColumnMetadata*, std::allocator<ColumnMetadata*> >*> > const*, Loader::params const&) (MySQLLoader.cpp:161)
| | |     |       ->83.37% (815,792,128B) 0x54628F2: Loader::load(Loader::params const&) (Loader.cpp:133)
| | |     |         ->83.37% (815,792,128B) 0x4F6B487: MySQLTableLoad::executePlanOperation() (MySQLTableLoad.cpp:60)
| | |     |           ->83.37% (815,792,128B) 0x4F8F8F1: _PlanOperation::execute_throws() (PlanOperation.cpp:221)
| | |     |             ->83.37% (815,792,128B) 0x4F92B08: _PlanOperation::execute() (PlanOperation.cpp:262)
| | |     |               ->83.37% (815,792,128B) 0x4F92F00: _PlanOperation::operator()() (PlanOperation.cpp:204)
| | |     |                 ->83.37% (815,792,128B) 0x656F9B0: TaskQueue::executeTask() (TaskQueue.cpp:88)
| | |     |                   ->83.37% (815,792,128B) 0x7A70AD6: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
| | |     |                     ->83.37% (815,792,128B) 0x6BAEEFA: start_thread (pthread_create.c:304)
| | |     |                       ->83.37% (815,792,128B) 0x8285F4B: clone (clone.S:112)
| | |     |                         
| | |     ->00.99% (9,715,712B) in 1+ places, all below ms_print's threshold (01.00%)
| | |     
| | ->00.44% (4,341,760B) in 1+ places, all below ms_print's threshold (01.00%)

As you can see, ~85% of my memory allocation comes from a single branch, and the question is now why the memory consumption is so high when the original heap profiling showed normal consumption. If you look at the example you will see why. For allocation I used posix_memalign to make sure allocations happen at useful boundaries. This allocator was then passed down from the outer class to the inner member variables (a map in this case) so that they use it for their heap allocations.

However, the boundary I chose was too large: 4096 in my case. This means you allocate, say, 4 bytes with posix_memalign, but the system allocates a full page for you to align it correctly. If you now allocate many small values, you end up with lots of unused memory. This memory is not reported by normal heap profiling tools, since you only ever asked for a fraction of it, but the system allocation routines allocate the rest and hide it.
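
To make the effect tangible, here is a small self-contained sketch. It is my own illustration, not code from the project above, and the sizes and counts are made up: it performs many small posix_memalign allocations on a 4096-byte boundary and compares the bytes actually requested with how much the process grows, read from /proc/self/statm.

// Illustrative sketch: many small allocations forced onto a 4096-byte
// boundary. A heap profiler counts only the requested bytes, while the
// process itself grows by roughly a page per allocation.
#include <cstdio>
#include <cstdlib>
#include <unistd.h>

static long vm_size_kb() {
    // The first field of /proc/self/statm is the total program size in pages.
    long pages = 0;
    FILE* f = std::fopen("/proc/self/statm", "r");
    if (f) {
        if (std::fscanf(f, "%ld", &pages) != 1) pages = 0;
        std::fclose(f);
    }
    return pages * (sysconf(_SC_PAGESIZE) / 1024);
}

int main() {
    const std::size_t kAllocs = 100000; // many small values, like the tree nodes above
    const std::size_t kSize   = 48;     // roughly one small _Rb_tree_node
    const std::size_t kAlign  = 4096;   // the boundary I had chosen; try 16 instead

    long before = vm_size_kb();
    for (std::size_t i = 0; i < kAllocs; ++i) {
        void* p = nullptr;
        if (posix_memalign(&p, kAlign, kSize) != 0) return 1;
    }
    std::printf("requested by the program: %zu kB\n", kAllocs * kSize / 1024);
    std::printf("process grew by:          %ld kB\n", vm_size_kb() - before);
    return 0;
}

Changing kAlign from 4096 to 16 brings the two numbers close together again, which is essentially the fix described below.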

To solve this problem, I switched to a smaller boundary and thus could drastically reduce the memory overhead.
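
The MemalignStrategy / StrategizedAllocator classes from the trace are not shown in this answer; the following is only a hypothetical allocator of the same general shape, to show where the boundary parameter sits and what "switching to a smaller boundary" amounts to.

// Hypothetical sketch only -- not the actual classes from the trace above.
#include <cstdlib>
#include <map>
#include <new>
#include <string>

template <typename T, std::size_t Alignment>
struct MemalignAllocatorSketch {
    using value_type = T;

    template <typename U>
    struct rebind { using other = MemalignAllocatorSketch<U, Alignment>; };

    MemalignAllocatorSketch() = default;
    template <typename U>
    MemalignAllocatorSketch(const MemalignAllocatorSketch<U, Alignment>&) {}

    T* allocate(std::size_t n) {
        void* p = nullptr;
        // Every node of the container ends up on its own Alignment boundary.
        if (posix_memalign(&p, Alignment, n * sizeof(T)) != 0)
            throw std::bad_alloc();
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

template <typename T, typename U, std::size_t A>
bool operator==(const MemalignAllocatorSketch<T, A>&,
                const MemalignAllocatorSketch<U, A>&) { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const MemalignAllocatorSketch<T, A>&,
                const MemalignAllocatorSketch<U, A>&) { return false; }

// 4096 wastes almost a full page per ~48-byte tree node; 16 does not.
using WastefulDict = std::map<std::string, unsigned, std::less<std::string>,
    MemalignAllocatorSketch<std::pair<const std::string, unsigned>, 4096>>;
using CompactDict  = std::map<std::string, unsigned, std::less<std::string>,
    MemalignAllocatorSketch<std::pair<const std::string, unsigned>, 16>>;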

As a conclusion to the hours I spent in front of Massif & Co., I can only recommend using this tool for deep profiling, since it gives you a very good understanding of what is happening and makes tracking down errors easy. The situation with posix_memalign is different: there are cases where it is really necessary, but for most cases you will be just fine with a normal malloc.

answered 2012-11-22T12:29:17.157
2

According to this article, ps/top report how much memory your program would use if it were the only program running. Assuming that your program uses a bunch of shared libraries, e.g. the STL, which are already loaded into memory, there is a gap between the amount of memory actually allocated due to the execution of your program and the amount it would allocate if it were the only process.
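
Not part of the original answer, but one way to check how much of the resident size is shared with other processes is to aggregate /proc/<pid>/smaps yourself and compare Rss with the proportional (Pss) and private shares. A minimal sketch, reading its own smaps (field names as documented in proc(5)):

// Sum Rss, Pss and the private pages over all mappings of the process.
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

int main() {
    std::ifstream smaps("/proc/self/smaps");
    std::string line;
    long rss = 0, pss = 0, priv = 0; // all values in kB
    while (std::getline(smaps, line)) {
        std::istringstream fields(line);
        std::string key;
        long value = 0;
        fields >> key >> value;
        if (key == "Rss:") rss += value;
        else if (key == "Pss:") pss += value;
        else if (key == "Private_Clean:" || key == "Private_Dirty:") priv += value;
    }
    std::cout << "Rss: " << rss << " kB, Pss: " << pss
              << " kB, private: " << priv << " kB\n";
    return 0;
}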

answered 2012-11-21T19:57:40.417
0

By default, Massif reports only the heap size. top reports the actual size in memory, including the memory used by the program code itself as well as the stack.

Try supplying Massif with the --stacks=yes option, which tells it to also measure stack space, and see if that changes the picture.

answered 2012-11-21T10:11:28.430