Code Pinpointing - Zen Approach #3

February 23, 2013

This is part of a series of posts dedicated to code pinpointing. If you missed Zen Approach #1 or Zen Approach #2, please check them out, since we may build upon them.

Today we’ll talk about memory leaks. Some readers asked for more GNU/Linux techniques, so that’s the environment I’ll use to present an old but powerful debugging trick. As usual, let’s catch up with the basics first.

Every C/C++ programmer with minimal experience knows that resource management is hard, especially when dealing with complex components and manual memory handling.

Usually, in such systems, dynamic (heap-based) memory management is done with malloc/free or new/delete pairs (the standard constructs). When everything is done correctly, every allocated memory block is eventually freed.

Memory leaks happen when memory isn’t properly returned to the heap manager. Sophisticated profilers, tools, compilers, and runtimes can help programmers find this kind of bug. Unfortunately, it’s not always possible to use this automatic support to spot problems.

In common GNU/Linux systems, glibc is the C/C++ runtime responsible for the standard memory manager. This is a user-space component (a shared library, most of the time) that provides a heap on top of raw operating-system/kernel memory primitives. With special options, glibc’s memory manager can help programmers track heap usage.

But what if we can’t count on automatic support to identify where a program is leaking memory? How can we pinpoint the offending code? (e.g., in release builds under stress testing)

Knowing how glibc’s heap is implemented, we can exploit one fact: if a program leaks memory with some consistency, it leaves behind lots of memory blocks with good “signatures”. We can use a debugger to collect some of these “trails”. With some luck, these “footprints” are recognizable enough to help pinpoint the problematic code.

Random memory picking is not as bad as it sounds. First, we can narrow our search. Second, and more important, leaking consistency helps a lot. After a while, when out-of-memory errors start to happen, the leaked memory is much larger than the memory still being tracked. Without much struggle, just dumping some memory blocks is enough to hit something: strings, numbers, pointers, whatever. Chances are we’ll find familiar patterns.

gdb is ubiquitous, so we’ll assume it as our debugger from now on. To present the technique, we’ll use list 1:

//
// list1 - two memory leaks simulation
//
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[])
{
    const size_t LEN1 = 32;
    const size_t LEN2 = 3 * 1024 * 1024;

    unsigned char *p1 = (unsigned char *) malloc(LEN1);
    unsigned char *p2 = (unsigned char *) malloc(LEN2);

    memset(p1, 0xAA, LEN1);
    memset(p2, 0xBB, LEN2);

    getchar();

    return 0;
}

The program can be compiled with the command “gcc leak.c -o leak”, which produces the executable “leak”. When executed, it just allocates two memory blocks, puts recognizable “signatures” inside them, and waits indefinitely for a key press. This is enough to emulate the behavior of a long-lived buggy program waiting for some kind of I/O (network activity, filesystem operations, etc.).

We use memory blocks with very different sizes to trigger two different strategies employed by glibc: small blocks are allocated with sbrk (glibc __brk, kernel syscall brk), and big blocks are allocated with mmap (glibc __mmap, kernel syscall mmap2). These strategies can be further investigated in glibc’s malloc.c.

After loading gdb and attaching to an already-running “leak” process, we can start to pinpoint. First, let’s examine the small blocks, using gdb’s info proc status command:

(gdb) info proc status
process 12553
cmdline = './leak'
cwd = '/tmp'
exe = '/tmp/leak'
Name: leak
State: T (tracing stop)
(...)
StaBrk: 09777000 kB
Brk:    09798000 kB
StaStk: bf89c3e0 kB
(...)

Some parts of the dump were omitted for brevity. StaBrk and Brk represent the start and end of the process’s data segment. As we’ve seen earlier, that’s a good memory region in which to look for small heap blocks.

If we dump some words from StaBrk, we’ll find the 0xAA byte signature used in list 1. In real situations, it’s better to pick memory backwards, starting from Brk, because higher addresses will probably hold the most recently leaked data (some jumps back and forth may be needed before finding something useful).

We don’t have a large number of small leaked blocks here, so for illustration purposes we dump from the start (using the x command):

(gdb) x/32wx 0x09777000
 0x9777000: 0x00000000 0x00000029 0xaaaaaaaa 0xaaaaaaaa
 0x9777010: 0xaaaaaaaa 0xaaaaaaaa 0xaaaaaaaa 0xaaaaaaaa
 0x9777020: 0xaaaaaaaa 0xaaaaaaaa 0x00000000 0x00020fd9
 0x9777030: 0x00000000 0x00000000 0x00000000 0x00000000
(...)

To search for big memory blocks, we have to change our approach slightly. Since mmap is used for big allocations, the process’s data segment isn’t useful as a search range. To find candidate addresses, gdb’s info proc mappings command can help:

(gdb) info proc mappings
process 12553
cmdline = './leak'
cwd = '/tmp'
exe = '/tmp/leak'
Mapped address spaces:
Start Addr   End Addr       Size     Offset objfile
  0x5de000   0x5f8000    0x1a000          0 /lib/ld-2.5.so
  0x5f8000   0x5f9000     0x1000    0x19000 /lib/ld-2.5.so
(...)
  0x743000   0x744000     0x1000   0x141000 /lib/libc-2.5.so
  0x744000   0x747000     0x3000   0x744000
  0xf10000   0xf11000     0x1000   0xf10000 [vdso]
 0x8048000  0x8049000     0x1000          0 /tmp/leak
 0x8049000  0x804a000     0x1000          0 /tmp/leak
 0x9777000  0x9798000    0x21000  0x9777000 [heap]
0xb7c89000 0xb7f8c000   0x303000 0xb7c89000
0xb7f95000 0xb7f96000     0x1000 0xb7f95000
0xbf889000 0xbf89e000    0x15000 0xbffe9000 [stack]

For the sake of completeness, the program’s data segment shows up again, this time as the [heap] range from 0x9777000 to 0x9798000 (this shows that brk “spelunking” can be done in more than one way).

We’re interested in anonymous memory ranges, because glibc’s heap manager calls mmap() with PROT_READ|PROT_WRITE protection and the MAP_ANONYMOUS|MAP_PRIVATE flags.

The candidates are the ranges above with no objfile name (those starting at 0x744000, 0xb7c89000, and 0xb7f95000). Unfortunately, we sometimes get lots of ranges (when a great number of big blocks are leaked). To narrow the options, we can use the fact that glibc controls its allocation strategy with an mmap threshold (it’s dynamic and configurable; the gory details are not so relevant for our discussion). Very small mmap’ed regions are not good candidates for dumping.

The first candidate above has only 12 KiB (0x3000), and the last one has only 4 KiB (0x1000). The second one, with about 3 MiB (0x303000), has potential. Once more, gdb’s x command is used to dump part of this region (keep in mind that searching backwards is frequently more promising in real situations; again, for illustration purposes, the dump is taken from the beginning):

(gdb) x/32x 0xb7c89000
 0xb7c89000: 0x00000000 0x00301002 0xbbbbbbbb 0xbbbbbbbb
 0xb7c89010: 0xbbbbbbbb 0xbbbbbbbb 0xbbbbbbbb 0xbbbbbbbb
 0xb7c89020: 0xbbbbbbbb 0xbbbbbbbb 0xbbbbbbbb 0xbbbbbbbb
(...)

There we go: the 0xBB byte signature from list 1 is found, providing the context for the code pinpointing.

In practice, some playing with the memory ranges may be needed, coupled with different dump sizes. But the “zen” works at its best: no special binaries are used, and no complex or slow tools are necessary (the gdb-heap extension still doesn’t support C++). Basic debugger commands alone are enough to give the programmer some insight.

Next in series: Zen Approach #4