Code Pinpointing - Zen Approach #1

October 31, 2012

We live in an era of non-native code. [Byte/Managed]code generated by layers of so-called enterprise software - used by many, understood in its entirety by few. Unfortunately, it is programmed by corporate developers who have little interest - or no time at all - to deal with close-to-the-metal issues (translation: low-level programming).

Having started my career as a “dim-rs-as-recordset-to-iterate-and-generate-reports” kind of programmer, I don’t want to bash anyone. In fact, this is a very natural path for an early career in IT. But someone has to write the OS, compiler, interpreter, database… And the virtual machines don’t write themselves either (there’s still room for that kind of lost programming “lore”).

Nowadays, even though C/C++/low-level code is not used very often for business tasks (at least, not for the majority of them), developers assigned to traditional business-level development still have to deal with code interactions belonging to application servers, low-level database layers, and/or the underlying operating system (just to name a few).

As we all in the field know, when things get ugly, they get really, really ugly. And according to the well established Murphy’s Law, this usually happens in the wild (we get very screwed).

I believe everybody has seen variations of:

[Screenshot: Windows General Protection Fault dialog - information covered to protect the guilty]
(Windows 95… x86… “good” ol’ days… SoftICE, I miss you!)

and variations of:

[Screenshot: Linux segfault output - information covered to protect the guilty]
(Linux… x86-64… from time-to-time, Humanity evolves!)

Tracking bugs/dumps/logs left behind by layers of ABIs, compilers, libs and multithreading can be overwhelming. Especially if the “crumbs” look like Greek to you.

This way, I’m planning to start a series of posts dedicated to code pinpointing. Like Dorothy says, “there’s no place like home”. And discovering where our problems began, starting from just a bunch of weird hex numbers, can make you feel exactly this way: like coming home. And, from there, things turn out to be easy once again.

With basic logic, some experience/guidance (luck also counts, as we’ll see later), and simple techniques, I believe anyone can at least try to find out where some obscure bug originated. Or extract its context. But please… don’t underestimate the power of simplicity. With only this first “zen approach” under your belt, you’ll be amazed by what can be inferred.

I won’t cover (right now) some tricky details about low-level CPU architecture, stack layout, address-space layout randomization, and the like. And I also won’t waste the reader’s time explaining how simple tasks like editing, compiling, linking, and running programs (inside or outside a debugger) are done.

As a “warm up”, I prepared a very dumb/buggy code snippet, to demonstrate how to start to approach core dumps, crash reports, and minidumps. It was written in C, and compiled with gcc:

$ emacs -nw bug.c

#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    printf("about to generate a fault...\n");

    char msg[4] = { 0 };

    strcpy(msg, argv[1]);

    return 0;
}

$ gcc -o bug.exe bug.c
$ strip -s bug.exe

After preparing the sample program, let’s simulate one specific fault condition, to exercise our reading capabilities (some details removed for brevity):

$ gdb --args ./bug.exe xxxxxx-sometimes-eyes-just-cannot-see-it
 Reading symbols from /tmp/bug.exe...(no debugging symbols found)...done.
 (gdb) run
 about to generate a fault...

Program received signal SIGSEGV, Segmentation fault.
 0x08048420 in puts ()
 (gdb) info r
 eax            0x0       0
 ecx            0x732d7878       1932359800
 edx            0xbfffe8b2      -1073747790
 ebx            0x74656d6f       1952804207

 esp            0x732d7874       0x732d7874
 ebp            0x73656d69       0x73656d69
 esi            0x244ca0 2378912
 edi            0x0      0
 eip            0x8048420        0x8048420 <puts+312>
 eflags         0x210286 [ PF SF IF RF ID ]
 cs             0x73     115
 ss             0x7b     123
 ds             0x7b     123
 es             0x7b     123
 fs             0x0      0
 gs             0x33     51

The idea is simple. We executed the program with some input. It was processed, and the software crashed. GNU debugger (GDB) was used to simplify the exercise. In typical real scenarios, we’ll almost always have to deal with core dump files (or other postmortem data). But the principles are the same: the first thing to look for in a dump like this is the value of CPU register EIP (RIP in 64-bit environments). Why?

EIP holds the address of the faulting instruction, and it may carry important context. In this particular case, we got no luck: EIP actually holds the address of a valid instruction (one that “touches” an invalid thing). We can confirm this with a simple disassembly command (more about this in future posts).

The second thing to look for in a dump like this is the value of CPU register ESP (RSP in 64-bit environments). Why?

ESP is the stack pointer, and it can also carry some clues. As local variables are often handled through registers ESP and EBP, we can pinpoint important code from them. Finally, we take note of the EBP value (RBP in 64-bit environments).

Before starting to use the aforementioned steps, a small digression about why we approach the available data like this is necessary.

x86/x86-64 CPUs have an architectural trait regarding stack frames: inside them, we can look for previous stack frame information, and we always have the return address of the current function’s caller (this value becomes the EIP/RIP, at the moment a “ret” machine instruction is executed).

As local function variables may also live on the current stack frame, there lies the issue: if we are not careful when manipulating them, chances are we’ll overwrite important data. Maybe we’re lucky enough to only trash an unused memory block (like padding). But, unfortunately, most of the time, we’ll overwrite important structures. And the result will be a corrupted stack frame. This means that EIP, ESP and EBP may assume corrupted values.

Astute readers may have already got the point: if we know how to look at these 3 registers in the right way, we may be able to locate the offending code. Without resorting to long debugging sessions. Just like that. Straight to the point. And this is how it’s done: knowing that x86/x86-64 CPUs are little-endian pieces of silicon, and assuming, just for the sake of an honest try, that the fault is related to (non-Unicode, for now) string processing, we’ll see each described register as a char array. Not a WORD. Not a DWORD. Not a QWORD. Just bytes/octets.

Going back to our dump analysis, we already know that EIP is a real memory address, and we try the trick directly with ESP, that holds the value 0x732d7874. First, we invert the byte order, because we’re dealing with little-endianness. The result: 0x74, 0x78, 0x2D, 0x73. Interpreting them as a small ASCII array, we arrive at ‘t’, ‘x’, ‘-‘, ‘s’ - the string “tx-s”. Not very promising, though. But with EBP, things get a little more interesting, as can be seen below in the hex editor:

[Screenshot: hex editor view of the EBP bytes]

Starting from 0x73656d69, we derive the inverted array 0x69, 0x6D, 0x65, 0x73. Interpreting them as ASCII, finally, we get to the string “imes”. And now we found something!

I (artificially) used the argument “xxxxxx-sometimes-eyes-just-cannot-see-it” as input to the test program. Something at some stack-frame/function-call was corrupted, and EBP became a thing that can be seen as the string “imes”. This is a substring of the input: “xxxxxx-sometimes-eyes-just-cannot-see-it”. A great hint to pinpoint the problematic code.

Going back to the strcpy() call in the sample snippet, the input is processed inside a local buffer. In the very buggy/faulty way described.

As contrived (and pathetic!) as this example may sound, you’ll be amazed by how many times this works in practice. Just one substring, that can lead us directly to the problematic code location. Bypassing multiple layers of code, and saving a lot of time and effort.

My coworkers know that, as a rule of thumb, I spend as little time as possible stepping in debuggers. And this is on purpose. The “…breakpoint, step, inspect, step…” cycle is tedious, repetitive, and counterproductive. Sometimes, it’s even impossible to use it right away, as when we are debugging complex code (specialized hardware firmware, critical game paths, and kernel code are such examples).

It’s not unusual for me to be away from the debugger for months; in fact, I almost always use debuggers only to practice/study reverse engineering, not to develop/debug production code.

I invite the reader to try leveraging this simple technique. It’s a more relaxed (starting) approach to bug hunting, one that usually pays for itself multiple times. Especially with nasty memory bugs, where correlating cause and effect is a very difficult task.

Next in series: Zen Approach #2