
Code Pinpointing - Zen Approach #2

November 10, 2012

This is part of a series of posts dedicated to code-pinpointing. If you missed Zen Approach #1, please check it out first, since this post builds upon it.

Today’s technique is dedicated to the adventurous guys who have to deal with the mysteries of Win[32/64]/MSVS/MSVC (native?) programming. First, let’s catch up with the basics (disclaimer: I may be a little rusty regarding the MS ecosystem… one forgets…).

Microsoft’s compiler (or maybe CRT/Windows itself) fills some blocks of memory with known values when debugging (I’m talking here about “debug-/GZ-/RTC-etc-yada-yada” builds, and/or running a program under a debugger). The setup:

- stack-based buffers are filled with 0xCC bytes; a nice property of this value is that, on x86/x86-64 CPUs, 0xCC is the opcode for the int3 instruction (software breakpoint interrupt); more on this later;

- heap-based buffers are filled with a bunch of weird values (0xFD, 0xDD, 0xCD, 0xAB, 0xBAADF00D, 0xFEEEFEEE, 0xE0, 0xF0, etc); details can be found on MSDN for OS/CRT.
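To keep the tags at hand during triage, the list above can be turned into a small lookup table. This is only a sketch: the descriptions are the commonly documented meanings as I remember them (check MSDN before trusting them), the names `FILL_BYTES`, `FILL_WORDS` and `describe_fill` are made up for this example, and the freed-heap word is written in its full 32-bit form, 0xFEEEFEEE. The 0xE0 and 0xF0 values are left out because I can't vouch for their meaning.

```python
# Fill patterns as commonly documented for MSVC debug builds and the
# Windows heap when running under a debugger. A summary from memory,
# not an authoritative list.
FILL_BYTES = {
    0xCC: "uninitialized stack memory (MSVC runtime checks)",
    0xCD: "CRT debug heap: allocated but not yet initialized",
    0xDD: "CRT debug heap: freed memory",
    0xFD: "CRT debug heap: guard bytes around an allocation",
    0xAB: "Windows heap: guard bytes after an allocation",
}

# Patterns that are written as whole 32-bit words rather than single bytes.
FILL_WORDS = {
    0xBAADF00D: "Windows heap: allocated but not yet initialized",
    0xFEEEFEEE: "Windows heap: freed memory",
}


def describe_fill(value):
    """Return a description for a known fill byte or 32-bit word, else None."""
    if value in FILL_WORDS:
        return FILL_WORDS[value]
    return FILL_BYTES.get(value)
```

Something like `describe_fill(0xDD)` then gives a quick reminder of what a tag seen in a minidump or a watch window is supposed to mean.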

The theory underlying this behavior is as follows: memory/variable initialization is a good programming practice (isn’t it?!). There’s some controversy regarding this subject, though. Depending on who you ask, this may or may not be seen as a good thing to be done behind our backs (debug/release builds are usually very distinct, and forcing the stack/heap to behave differently when debug-[building/running] may widen the gap even more, making for hard-to-find Heisenbugs).

Memory-related bugs are hard fellows. But as soon as we encounter the special “tags” (in minidumps, debugger windows, logs, etc) we should keep in mind that we’re (probably) dealing with this kind of programming error. And if we get the dreaded “access violation” (exception 0xC0000005; see GetExceptionCode() for more details), we can start to pinpoint the offending code more consciously. How? We must be doing one of two things:

a) TOUCHING invalid memory locations (read or write operations); or

b) EXECUTING invalid memory locations.
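The two cases can usually be told apart from the exception record itself. For an access violation, Windows documents that the first ExceptionInformation value is 0 for a read, 1 for a write and 8 for a DEP (execute) violation, and that the second value is the offending address. The sketch below decodes those two numbers; the function name and the optional `ip` parameter are my own additions, and the "fault address equals instruction pointer" rule is a heuristic for systems where DEP is not reporting the execute case.

```python
EXCEPTION_ACCESS_VIOLATION = 0xC0000005


def describe_access_violation(code, info, ip=None):
    """Classify an access violation as scenario [a] (touching) or [b] (executing).

    code -- the exception code
    info -- the first two ExceptionInformation values: [flag, address]
    ip   -- optional instruction pointer at the time of the fault
    """
    if code != EXCEPTION_ACCESS_VIOLATION or len(info) < 2:
        return None
    flag, address = info[0], info[1]
    kind = {0: "read", 1: "write", 8: "execute"}.get(flag, "unknown")
    # With DEP the flag is 8; without it, executing a bad address shows up
    # as a read whose fault address equals the instruction pointer.
    executing = kind == "execute" or (ip is not None and ip == address)
    scenario = "b" if executing else "a"
    return f"[{scenario}] {kind} at 0x{address:08X}"
```

For instance, a write to 0xCCCCCCCC while executing at 0x5A0FB586 comes out as `[a] write at 0xCCCCCCCC`.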

Let’s tackle the immediate case [a] first. In such scenarios, chances are we’re getting crashes like the following:

[Image: access violation dialog, touching memory through an uninitialized pointer]

We can see that something is being written to address 0xCCCCCCCC. With the basics at hand, there’s a high probability that we’re dealing with an easy-to-spot uninitialized pointer indirection. And this (probable) pointer is a local/stack variable, declared in some function we can spot effortlessly (because we also have the valid EIP/RIP address being executed, 0x5A0FB586).

How can we infer this? The trigger is the 0xCC byte mentioned earlier, used for uninitialized stack buffers in debug builds. A local/stack pointer variable just happens to be a small buffer, 4 bytes in length (8 bytes in 64-bit environments). That’s why the “touched” location is 0xCCCCCCCC (32-bit environment here), and the pinpoint is as easy as it gets (normally, the developer has the context, because a debug build/run is in place).
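The "pointer is just a small buffer" argument is easy to check with a few lines of code. This is a minimal sketch with made-up helper names (`fill_pointer`, `fill_byte_of`): the first one builds the value a pointer holds when every byte of it was filled with the same tag, the second one goes the other way and recovers the tag from a faulting address.

```python
def fill_pointer(fill_byte, pointer_size):
    """Value of a pointer whose every byte equals fill_byte."""
    return int.from_bytes(bytes([fill_byte]) * pointer_size, "little")


def fill_byte_of(address, pointer_size):
    """Return the repeated byte if address is one byte repeated, else None."""
    raw = address.to_bytes(pointer_size, "little")
    return raw[0] if len(set(raw)) == 1 else None
```

With 4-byte pointers, `fill_pointer(0xCC, 4)` is 0xCCCCCCCC, the address from the crash above; with 8-byte pointers the same tag gives 0xCCCCCCCCCCCCCCCC. A valid address like 0x5A0FB586 yields no tag at all.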

It sounds too easy, but one can get lost when dealing with layers of OOP code, smart-pointers, ATL/COM, etc. Assuming an uninitialized pointer indirection is a small detail, but it may offer even more context for properly pinpointing the offending code. By the way, talking about smart-pointers, if we’re using heap-based/dynamic variables (instead of local/stack ones), the invalid location will probably be 0xCDCDCDCD[CDCDCDCD] (uninitialized heap memory). The other heap patterns may also occur in practice; another common “address” is 0xDDDDDDDD[DDDDDDDD] (freed heap memory), for the “use after free” type of bug. I guess, at this point, the reader has already gotten the idea.
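One practical wrinkle with objects and smart-pointers (my addition, not something the crash above shows): when a member is accessed through a bad object pointer, the faulting address is the tag plus the member's offset, so it won't be a perfectly repeated byte. A rough sketch that tolerates a small offset follows; `match_with_offset` and the `max_offset` limit are hypothetical choices for illustration.

```python
# Only the three tags discussed in the text are checked here.
PATTERNS = {
    0xCC: "uninitialized stack",
    0xCD: "uninitialized heap",
    0xDD: "freed heap",
}


def match_with_offset(address, pointer_size=4, max_offset=0x1000):
    """Return (label, offset) if address is a tag pointer plus a small offset."""
    for fill_byte, label in PATTERNS.items():
        base = int.from_bytes(bytes([fill_byte]) * pointer_size, "little")
        offset = address - base
        if 0 <= offset <= max_offset:
            return label, offset
    return None
```

A fault at 0xDDDDDDE5 would then be reported as freed heap memory plus an offset of 8, which hints at a member 8 bytes into an already-freed object.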

Things get interesting when we talk about [b] scenarios: the access violation is raised because the CPU tries to execute from an invalid address held in EIP/RIP (and that address shows one of the recognizable patterns described above).

Knowing what we learned from the [a] scenarios, it’s not hard to extrapolate and suspect the use of an uninitialized or already-freed buffer (be it local, from the stack, or dynamic, from the heap). The nasty detail here is how the instruction pointer got the invalid value in the first place, given that this register can’t be written directly (there’s no such thing as “mov eip/rip, val”; there are techniques for indirect EIP/RIP manipulation, but they’re outside the scope of this article, and not found in typical programs).

As we saw in Zen Approach #1, a corrupted stack can change EIP/RIP indirectly, by overrunning the return address stored in a stack frame. That’s our pinpoint. Just by looking at the registers, we find out that (probably) the contents of an uninitialized/already-freed buffer (or part of it) overflowed a local/stack buffer. Zen Approach #1 can improve the pinpointing even more. We skip EIP/RIP altogether (it caused the address execution fault; there’s nothing more to be learned there). EBP/RBP and ESP/RSP are analyzed directly, and the remaining “puzzle” pieces eventually snap into place.
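The "just looking at the registers" step can be sketched as a scan of a register dump for the tags. Everything here is illustrative: the function name is invented, the dump is assumed to be a plain name-to-value dictionary, and the register values in the usage note below are made up.

```python
# Tags discussed in the text: stack, uninitialized heap, freed heap.
TAG_BYTES = (0xCC, 0xCD, 0xDD)


def tagged_registers(registers, pointer_size=4):
    """Return {name: tag_byte} for registers holding one tag byte repeated."""
    tagged = {}
    for name, value in registers.items():
        raw = value.to_bytes(pointer_size, "little")
        if len(set(raw)) == 1 and raw[0] in TAG_BYTES:
            tagged[name] = raw[0]
    return tagged
```

Given a hypothetical dump such as `{"eip": 0xCCCCCCCC, "ebp": 0xCCCCCCCC, "esp": 0x0012FF40, "eax": 0}`, both EIP and EBP come back tagged with 0xCC. That is what one would expect after the saved frame pointer and the return address were both overwritten, and ESP is then the register worth following.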

[BONUS NOTES]

1) Sometimes, soft breakpoint exceptions are raised for no (apparent) reason whatsoever, even after we disable all the explicit breakpoints created before, or switch to release builds just to stop the weirdness.

Now we can explain what causes them (for the sake of the example, let’s pretend things like buffer-overflow detection and the NX-bit are not used): if we ever try to execute code in uninitialized stack memory, the 0xCC bytes can be fetched and executed by the CPU as int3 instructions. With release builds, the usual culprits are the padding bytes written by the linker to align sections.
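When a stray breakpoint fires, a quick look at the bytes under the instruction pointer tells whether we landed in a run of 0xCC. The helper below is a minimal sketch (the name is made up) that counts the consecutive int3 opcodes starting at a given offset of a raw byte buffer, for example one copied out of a memory window.

```python
INT3 = 0xCC  # x86/x86-64 opcode for the one-byte int3 instruction


def leading_int3_count(code, offset=0):
    """Count consecutive 0xCC bytes in code, starting at offset."""
    count = 0
    for byte in code[offset:]:
        if byte != INT3:
            break
        count += 1
    return count
```

A long run right at the faulting address suggests fill or padding bytes rather than a breakpoint someone placed on purpose.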

2) Unfortunately, sometimes, a previously perfect system starts to raise soft breakpoints (or access violations) randomly, when executing unrelated applications, for no apparent reason (hardware issues aside).

Malware writers are normal people too, and some hooking techniques are easier to debug with explicit 0xCC opcode injections. At “release” - in the wild - that leftover trash may happen to be executed.

When my wife infected every possible electronic device made by humankind, further investigation (hard debug sessions, let me tell you) revealed the presence of malicious software on two computers running Windows. They behaved as if random software exceptions were the rule, not the… - well, exception (no pun intended). So, watch out.

Next in series: Zen Approach #3