I have a multithreaded program (implemented in C using Pthreads on Linux) that runs on a multicore machine. I am using Valgrind with the memcheck tool to track down some memory issues in my code, but the program hangs under Valgrind. To give a complete picture of the problem, here is the background.
The code has a sequential part at the start for initialization; it then creates 8 threads (using the Pthreads API) and runs to completion. After some time, my code dumps core. I used GDB, and it gives the following trace.
======= Backtrace: =========
/lib/tls/i686/cmov/libc.so.6[0xb7cd47cd]
/lib/tls/i686/cmov/libc.so.6(cfree+0x90)[0xb7cd7e30]
/home/kumar/CycleSim/slack_cp/sim-outorder[0x819a6c9]
/home/kumar/CycleSim/slack_cp/sim-outorder[0x8167e3e]
/home/kumar/CycleSim/slack_cp/sim-outorder[0x804f5e4]
/lib/tls/i686/cmov/libpthread.so.0[0xb7f8c31b]
/lib/tls/i686/cmov/libc.so.6(clone+0x5e)[0xb7d3c57e]
======= Memory map: ========
08048000-081b5000 r-xp 00000000 08:11 11813248 /home/kumar/CycleSim/slack_cp/sim-outorder
081b5000-081b8000 rw-p 0016c000 08:11 11813248 /home/kumar/CycleSim/slack_cp/sim-outorder
081b8000-08549000 rw-p 081b8000 00:00 0 [heap]
ab9fd000-ab9fe000 ---p ab9fd000 00:00 0
ab9fe000-ac1fe000 rw-p ab9fe000 00:00 0
ac1fe000-ac1ff000 ---p ac1fe000 00:00 0
ac1ff000-ac9ff000 rw-p ac1ff000 00:00 0
ac9ff000-aca00000 ---p ac9ff000 00:00 0
aca00000-ad2cb000 rw-p aca00000 00:00 0
ad2cb000-ad300000 ---p ad2cb000 00:00 0
ad3bf000-ad3c0000 ---p ad3bf000 00:00 0
ad3c0000-adbc0000 rw-p ad3c0000 00:00 0
adbc0000-adbc1000 ---p adbc0000 00:00 0
adbc1000-ae3c1000 rw-p adbc1000 00:00 0
ae3c1000-ae3c2000 ---p ae3c1000 00:00 0
ae3c2000-aebc2000 rw-p ae3c2000 00:00 0
aebc2000-aebc3000 ---p aebc2000 00:00 0
aebc3000-b2e7d000 rw-p aebc3000 00:00 0
b2e7d000-b2e7e000 ---p b2e7d000 00:00 0
b2e7e000-b367e000 rw-p b2e7e000 00:00 0
b367e000-b367f000 ---p b367e000 00:00 0
b367f000-b7c6d000 rw-p b367f000 00:00 0
b7c6d000-b7da8000 r-xp 00000000 08:01 12895490 /lib/tls/i686/cmov/libc-2.5.so
b7da8000-b7da9000 r--p 0013b000 08:01 12895490 /lib/tls/i686/cmov/libc-2.5.so
b7da9000-b7dab000 rw-p 0013c000 08:01 12895490 /lib/tls/i686/cmov/libc-2.5.so
b7dab000-b7dae000 rw-p b7dab000 00:00 0
b7dae000-b7dde000 r-xp 00000000 08:21 3828021 /usr/lib/libgslcblas.so.0.0.0
b7dde000-b7ddf000 rw-p 0002f000 08:21 3828021 /usr/lib/libgslcblas.so.0.0.0
b7ddf000-b7f7d000 r-xp 00000000 08:21 3828022 /usr/lib/libgsl.so.0.9.0
b7f7d000-b7f87000 rw-p 0019d000 08:21 3828022 /usr/lib/libgsl.so.0.9.0
b7f87000-b7f9a000 r-xp 00000000 08:01 12895516 /lib/tls/i686/cmov/libpthread-2.5.so
b7f9a000-b7f9c000 rw-p 00013000 08:01 12895516 /lib/tls/i686/cmov/libpthread-2.5.so
b7f9c000-b7f9f000 rw-p b7f9c000 00:00 0
b7f9f000-b7fc4000 r-xp 00000000 08:01 12895498 /lib/tls/i686/cmov/libm-2.5.so
b7fc4000-b7fc6000 rw-p 00024000 08:01 12895498 /lib/tls/i686/cmov/libm-2.5.so
b7fc9000-b7fd4000 r-xp 00000000 08:01 12861504 /lib/libgcc_s.so.1
b7fd4000-b7fd5000 rw-p 0000a000 08:01 12861504 /lib/libgcc_s.so.1
b7fd5000-b7fd9000 rw-p b7fd5000 00:00 0
b7fd9000-b7ff2000 r-xp 00000000 08:01 12861461 /lib/ld-2.5.so
b7ff2000-b7ff4000 rw-p 00019000 08:01 12861461 /lib/ld-2.5.so
bf8a0000-bf8b5000 rw-p bf8a0000 00:00 0 [stack]
ffffe000-fffff000 r-xp 00000000 00:00 0 [vdso]
Although I compiled with the -g option and no -O flags, the trace does not give the exact code location where the problem occurs.
After searching the internet, I understood that this happens because I am corrupting memory, either by writing to an array out of bounds (yes, I am using a big array, but I explicitly check the index before accessing every element) or by accessing illegal heap memory. Since the code is huge, I could not find the bug just by reading it, so I turned to Valgrind to see where the memory corruption happens.
When I run the code under Valgrind, it runs fine through the sequential part, but when it reaches the parallel part (Pthread creation), it does not do anything. With "top -H -p pid" I can see that all the threads have been created, but they are all sleeping. The original code (without Valgrind) does not hang, even when run for a long time (though I cannot guarantee that it is deadlock-free). Would Helgrind (Valgrind's thread error detector) be of any use here?
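To make the two suspected causes of corruption concrete (an out-of-bounds array write, or touching heap memory that is no longer valid), here is a minimal sketch of a guarded-access and release-once pattern for a heap array shared between threads. It is not code from sim-outorder: the shared_buf type and the shared_buf_* functions are hypothetical names invented for illustration, and the sketch assumes a single int array protected by one mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical shared buffer; illustrative only, not taken from sim-outorder. */
typedef struct {
    int *data;              /* heap array shared by all threads */
    size_t len;             /* number of elements in data */
    pthread_mutex_t lock;   /* guards data and len */
} shared_buf;

/* Allocate len zeroed elements. Returns 0 on success, -1 on failure. */
int shared_buf_init(shared_buf *b, size_t len)
{
    assert(b != NULL);
    if (len == 0)
        return -1;
    b->data = calloc(len, sizeof *b->data);
    if (b->data == NULL)
        return -1;
    b->len = len;
    if (pthread_mutex_init(&b->lock, NULL) != 0) {
        free(b->data);
        b->data = NULL;
        b->len = 0;
        return -1;
    }
    return 0;
}

/* Bounds-checked store. Returns 0 on success, -1 if idx is out of
 * range or the buffer has already been released. */
int shared_buf_set(shared_buf *b, size_t idx, int value)
{
    int rc = -1;
    pthread_mutex_lock(&b->lock);
    if (b->data != NULL && idx < b->len) {
        b->data[idx] = value;
        rc = 0;
    }
    pthread_mutex_unlock(&b->lock);
    return rc;
}

/* Free the array at most once, even if several threads call this.
 * Returns 1 if this call freed it, 0 if it was already released. */
int shared_buf_release(shared_buf *b)
{
    int freed = 0;
    pthread_mutex_lock(&b->lock);
    if (b->data != NULL) {
        free(b->data);
        b->data = NULL;     /* later calls see NULL and do nothing */
        b->len = 0;
        freed = 1;
    }
    pthread_mutex_unlock(&b->lock);
    return freed;
}
```

With a buffer of 4 elements, shared_buf_set(&b, 3, 7) succeeds while shared_buf_set(&b, 4, 7) is rejected, and a second shared_buf_release(&b) becomes a no-op instead of a second free(). The mutex itself is never destroyed here, to keep the sketch short.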
Can anyone point me to documentation or a similar reported issue? I am using Valgrind version 2 on an i686 machine running Linux.
Thanks,
D. L. Kumar