While debugging memory leaks in one of my private projects, I discovered that GDB and Valgrind can actually operate together in a very nice fashion.
GDB can debug remote programs, as is common in embedded software development, by using a remote protocol to communicate with a proxy on the device.
Valgrind is an almost indispensable tool if you work with dynamically allocated memory. It follows each allocation in your program and tracks whether it is returned properly, continues to be referenced, or is lost in space, which is a ‘memory leak’. And as with any leak, given enough time you will drown: the program requires more and more memory until it either eats up your whole computer or runs out of memory.
Valgrind can also communicate with an external process to obey some interesting commands.
The glue between Valgrind and GDB, provided by the Valgrind team, is called vgdb. vgdb is a small process that connects GDB with the Valgrind process.
The Valgrind website explains what to do. Here’s how I hook things up.
- Start Valgrind on your program, e.g.
valgrind --vgdb=yes --vgdb-error=0 <program> <arguments>
Valgrind now loads your program and stops just before starting it, much like GDB does, waiting for a connection and commands. It is actually very helpful and prints exactly what you should do next:
==21399== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==21399== /path/to/gdb <path to program>
==21399== and then give GDB the following command
==21399== target remote | /usr/lib/valgrind/../../bin/vgdb --pid=21399
- So start GDB on your program, e.g.
gdb <program>
- Ensure that GDB isn’t using non-stop mode (Valgrind doesn’t like that)
gdb> set non-stop off
- Connect GDB’s remote functionality to the Valgrind gdbserver. You could copy the exact command from the Valgrind output, but this is usually sufficient:
gdb> target remote | vgdb
The ‘--vgdb-error’ option on the Valgrind command line indicates how many errors Valgrind should detect before stopping the program. So 1 (one) means that as soon as Valgrind detects an error (accessing non-existent memory, double freeing, etc.) it hands control to the client/debugger as if a breakpoint had been hit. You can use 0 (zero) to give GDB control before execution starts, for example to set breakpoints.
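To see the difference between the two settings it helps to have a program with a deliberate error in it. The fragment below is only a sketch and is made up for this purpose (the function name `count_positive` is hypothetical). The last array element is never initialised, so Memcheck is expected to complain at the comparison. With --vgdb-error=1 that should be the point where GDB gets control; with --vgdb-error=0 it gets control before the program starts.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical example with a deliberate bug, for trying out --vgdb-error.
   Counts how many of n values are positive; returns -1 if allocation fails. */
int count_positive(size_t n)
{
    int *values = malloc(n * sizeof *values);
    if (values == NULL)
        return -1;

    /* Bug: the loop stops one element early, so values[n - 1]
       stays uninitialised. */
    for (size_t i = 0; i + 1 < n; i++)
        values[i] = 1;

    int count = 0;
    for (size_t i = 0; i < n; i++) {
        /* Memcheck is expected to flag this comparison when i == n - 1,
           because the branch depends on an uninitialised value. */
        if (values[i] > 0)
            count++;
    }

    free(values);
    return count;
}
```

To try it, call `count_positive(4)` from a `main` of your own and build with debug information (-g) so that GDB can show you the source line.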
What you can do
Once you have set this up you can of course benefit from stepping and breaking in your program as it actually is running with Valgrind. (You can’t ‘r’un, only ‘c’ontinue. To restart you have to do just that, start over.)
But you can also, at any point, let Valgrind do its thing, doing the leak check by sending a command to the ‘monitor’ (GDB lingo for the remote process):
gdb> monitor leak_check
This gives you a summary, exactly as at the end of any Valgrind run, but at precisely this point in the execution. You can add various options to the command to get more out of it, but since this is sufficient for the purpose of this blog, those are left as an exercise for the reader.
One other very handy command is
gdb> monitor who_points_at <addr>
This gives you hints about where this particular memory is referenced from. Usually these are the places that get overwritten, at which point the memory at <addr> gets lost.
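Here is a minimal sketch of the “overwritten reference” situation, in case it helps to have something concrete to experiment with. The program is made up (`current_name`, `set_name` and `clear_name` are hypothetical names); the GDB and monitor commands in the comment are the real ones.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: current_name holds the only reference to its buffer. */
char *current_name = NULL;

/* Stores a private copy of name; returns 0 on success, -1 on failure.
   Bug: the previous buffer is not freed before the pointer is overwritten. */
int set_name(const char *name)
{
    char *copy = malloc(strlen(name) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, name);
    current_name = copy;   /* the old buffer, if any, is lost here */
    return 0;
}

/* Frees the current buffer. Buffers lost earlier stay lost. */
void clear_name(void)
{
    free(current_name);
    current_name = NULL;
}

/* Session idea, assuming set_name() is called at least twice:
 *   break set_name
 *   continue                      (until the second call)
 *   print (void *) current_name   (address of the buffer about to be lost)
 *   monitor who_points_at <that address>
 *   finish
 *   monitor leak_check
 * Before the overwrite, who_points_at should mention current_name.
 * After it, the leak check should report the first buffer.
 */
```

Call `set_name()` twice from a `main` of your own, followed by `clear_name()`, and build with -g before running it under Valgrind.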
The Valgrind server can do a lot of other stuff too.
If your program has leaks, as detected by a normal batch run through Valgrind with --leak-check=full, then you probably want to get rid of them.
Sometimes the excellent backtrace of where the memory was allocated makes it obvious where the memory is lost. Sometimes not so much.
Here’s my little technique:
- Hook up your program under GDB and Valgrind
- Put a breakpoint where you think the memory is lost
- Continue there and run a leak check
- If there is no leak yet, slowly proceed forward, doing a leak check after each step or at every new breakpoint
- Once you see a leak, that leak occurred between the last stop and this one
- Restart and take even smaller steps, with a leak check between every one of them, and pinpoint the exact statement that creates the leak.
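To practise these steps, a function with a leak on only one path is handy. The sketch below is a made-up example (the name `value_length` is hypothetical): it leaks only when the input has no ‘=’ in it, so a leak check right after the allocation shows nothing and the leak only appears once the function has returned.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: returns the length of the value in "key=value",
   or -1 if there is no '=' or the allocation fails. */
int value_length(const char *line)
{
    /* Stop 1: break here. A leak check shows nothing lost yet. */
    char *copy = malloc(strlen(line) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, line);

    /* Stop 2: copy still points at the block, so still no leak. */
    char *eq = strchr(copy, '=');
    if (eq == NULL)
        return -1;          /* bug: copy is not freed on this path */

    int len = (int)strlen(eq + 1);
    free(copy);
    return len;
}

/* Stop 3: after 'finish' on a call such as value_length("novalue"),
 * 'monitor leak_check' is expected to report the block. The leak was
 * therefore created between stop 2 and stop 3, i.e. by the early return.
 */
```

One caveat: Memcheck also treats the CPU registers as places a pointer can live, so a stale copy of the pointer left in a register may delay the report by a step or two.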
Now you know exactly which statement creates the leak. Why? Now that is your *real* problem…