ELF symbol visibility and the perils of name clashing

January 04, 2013

ELF is the de facto standard object format on most modern Unix-like operating systems, used for building, linking and loading (executables, shared libraries, core dumps, etc.). Originally used in System V and Solaris environments, the format was embraced by GNU/Linux systems and tools a long time ago, to overcome the limitations of older binary executable/linking formats (like “a.out” and COFF).

I don’t want to repeat the good information provided by better sources, such as Ulrich Drepper’s classic How to Write Shared Libraries and John Levine’s Linkers and Loaders. Suffice it to say that, compared to the more widely known PE format, ELF is definitely much more powerful and flexible (though some concepts are similar: PLT/GOT versus IATs, lazy binding versus delay-loading, etc.).

This complexity comes at a price, both in execution time and in developer effort: it’s much more difficult to program correctly with ELF (especially in C/C++) without knowing at least some of the dirty under-the-hood details of the related specifications. (Compared to them, Matt Pietrek’s An In-Depth Look into the Win32 Portable Executable File Format is a walk in the park.)

One trait of the ELF scheme that not infrequently causes trouble for programmers is that ELF dynamic linking employs a single namespace (a simplification; in practice, dynamic linking scopes are a dense topic). Contrast this with the Windows DLL loader, which uses a separate namespace for each library/PE.

ELF executables and libraries list the dynamic symbols they need resolved at runtime, as well as their shared object dependencies. But, aside from versioned symbols, no binding between a symbol and the library expected to provide it is recorded. (Off-topic note: a discussion about dynamic linking, versioned symbols, and other interesting stuff can be found here.)

This comes as no surprise, because symbol interposition/preemption was deliberately designed as a powerful enabler of dynamic hooks, interface specialization, and other useful techniques. But here comes the bite: if programmers are not careful or knowledgeable enough, their DSOs (dynamic shared objects) may expose unnecessary symbols, which can then be bound the wrong way at runtime, clashing with other libraries.

There are many ways to keep elements out of the ELF dynamic symbol table (the one the dynamic linker uses for resolution): C declarations can be made static (or, in C++, placed in anonymous namespaces), gcc attributes and command-line options exist to control symbol scope/visibility, and export maps can be fed to the linker at build time, with very granular control.
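As a quick illustration of the source-level options, here is a minimal sketch with hypothetical function names (not taken from libsnmp or libslp). A `static` helper gets internal linkage, `__attribute__((visibility("hidden")))` keeps a function callable across the DSO’s own object files but out of its dynamic symbol table, and only the function marked `visibility("default")` is exported. Compiling with gcc’s `-fvisibility=hidden` makes hidden the default, so only explicitly marked symbols are exported; a linker version script (passed with `-Wl,--version-script=...`) gives similar control at link time.

```c
/* Hypothetical library source: only public_checksum() is meant to be exported. */

/* Internal linkage: at most a local symbol in .symtab, never in .dynsym. */
static unsigned add_byte(unsigned acc, unsigned char b)
{
    return acc * 31u + b;
}

/* External linkage inside the DSO, but STV_HIDDEN: other ELF objects can
   neither bind to it nor preempt it. */
__attribute__((visibility("hidden")))
unsigned internal_checksum(const unsigned char *buf, unsigned long len)
{
    unsigned acc = 0;
    for (unsigned long i = 0; i < len; i++)
        acc = add_byte(acc, buf[i]);
    return acc;
}

/* Public interface: default visibility, so it lands in .dynsym as a global
   symbol (and stays subject to interposition, by design). */
__attribute__((visibility("default")))
unsigned public_checksum(const unsigned char *buf, unsigned long len)
{
    return internal_checksum(buf, len);
}
```

Built with `gcc -shared -fPIC`, `objdump -T` on the result should list `public_checksum` as a global symbol, while `internal_checksum` and `add_byte` should not appear in the dynamic symbol table at all.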

I won’t discuss them, because they’re presented in great depth in the aforementioned references. Instead, I want to talk about an ELF symbol clash I had to deal with recently, and present a binary patch technique to overcome the problem when changing the source code is not a viable solution (it was not, in this specific case).

Something similar happened to the Samba project several years ago (original bug): they hit a name clash at runtime, and a specific function symbol was preempted/interposed by another one, defined in a different library.

When the function signatures differ, the net result is usually stack corruption (at best!), which normally raises a SIGSEGV when the call returns. Undefined behavior also arises when the functions involved have exactly the same signature (or compatible arguments) but different, incompatible implementations.

The culprit was the memdup() symbol, exported and used by two different ELFs loaded into the same address space. The fix applied was immediate: rename Samba’s memdup() implementation to samba_memdup(). That is one possible path, but an incomplete one nonetheless: the function is internal only and should never have been exported (warning: I believe the same applies to the “interposer”, libsnmp, but didn’t check).

The simple fix documented there was possible because they could change the code and keep it up to date. Unfortunately, this is not always the case. Some clashes involve binary-only components; others involve code that just can’t be forked and maintained, because the associated costs are too high.

memdup()’s clash was exactly the situation I had to confront when I tried to integrate libslp, libsnmp, and a complex program. The resulting SIGSEGV was caught by a specialized handler, which produced the following typical backtrace (edited for brevity and illustration purposes):

[fcollyer@localhost tmp]$ cat UNHANDLED_FAULT_BACKTRACE.log.4923
./app.exe(_Z25exception_handleri+0x25)[0x8060800]
/lib/libc.so.6[0x857310]
 /usr/lib/libsnmp.so.20(memdup+0x7c)[0xb768a1fc]
 /usr/lib/libslp.so.1(NetworkConnectToSA+0xa1)[0xb7705411]
/usr/lib/libslp.so.1(ProcessSrvDeReg+0x128)[0xb77021d8]
/usr/lib/libslp.so.1(SLPDereg+0x114)[0xb7702364]
/app_path/app_core.so(_Z22notify_changei+0x182)[0xb3408507]
./app.exe(main+0xa49)[0x805f6e8]
/lib/libc.so.6(__libc_start_main+0xe6)[0x737ce6]
./app.exe[0x804e491]

Two frames stand out, the libsnmp memdup frame and the libslp NetworkConnectToSA frame right below it: libslp somehow managed to call into the totally unrelated libsnmp just before the crash.

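Just looking at “memdup+0x7c” was enough to realize that memdup() was wrongly exposed and used across the libraries: backtrace_symbols_fd() could resolve the name, and it can only resolve symbols present in an object’s dynamic symbol table (which is also why executables must be linked with -rdynamic for their own function names to show up).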
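For reference, here is a minimal sketch of the kind of logging that produces a backtrace like the one above. It is a hypothetical helper, not the author’s handler, and it is glibc-specific (`<execinfo.h>`). `backtrace_symbols_fd()` writes one line per frame in the `path(symbol+offset)[address]` format seen in the log, and writes straight to a file descriptor instead of returning a malloc()ed array, which makes it the usual choice in crash handlers.

```c
#include <execinfo.h>

#define MAX_FRAMES 64

/* Capture the current call stack and write one symbolized line per frame
   to fd. Returns the number of frames captured. Names are only available
   for symbols in each object's dynamic symbol table. */
int dump_backtrace(int fd)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);
    backtrace_symbols_fd(frames, n, fd);
    return n;
}
```

A SIGSEGV handler installed with sigaction() could call `dump_backtrace()` with the descriptor of an already-open log file. The glibc documentation notes that the first call to `backtrace()` may load libgcc dynamically (which can allocate memory), so calling it once at startup is a common precaution before relying on it inside a signal handler.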

Further investigation ruled out buffer overflows (which can scramble the stack and the execution order in unpredictable ways), and confirmed that both codebases define and call their own memdup, so each depends on that symbol resolving to its own definition. The snippets from their sources:

 // a) libslp

/*=========================================================================*/
void* memdup(const void* src, int srclen)
/* Generic memdup analogous to strdup()                                    */
/*=========================================================================*/
{
    char* result;
    result = (unsigned char*)xmalloc(srclen);

    if(result)
    {
        memcpy(result,src,srclen);
    }

    return result;
}

 // b) libsnmp

/** Duplicates a memory block.
 *  Copies a existing memory location from a pointer to another, newly
    malloced, pointer.

 *    @param to      Pointer to allocate and copy memory to.
 *      @param from    Pointer to copy memory from.
 *      @param size    Size of the data to be copied.
 *    
 *    @return SNMPERR_SUCCESS    on success, SNMPERR_GENERR on failure.
 */
int
memdup(u_char ** to, const void * from, size_t size)
{
    if (to == NULL)
        return SNMPERR_GENERR;

    if (from == NULL) {
        *to = NULL;
        return SNMPERR_SUCCESS;
    }

    if ((*to = (u_char *) malloc(size)) == NULL)
        return SNMPERR_GENERR;

    memcpy(*to, from, size);

    return SNMPERR_SUCCESS;

}                              /* end memdup() */
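To make the mismatch concrete, here are the two contracts side by side, renamed (hypothetically) so they can coexist in one file. The logic follows the snippets above, with malloc() standing in for libslp’s xmalloc(), and 0/-1 standing in for libsnmp’s SNMPERR_SUCCESS/SNMPERR_GENERR; this is an illustration, not either library’s actual code.

```c
#include <stdlib.h>
#include <string.h>

/* libslp-style contract: return the copy (NULL if allocation fails). */
void *slp_style_memdup(const void *src, int srclen)
{
    void *result = malloc((size_t)srclen);
    if (result)
        memcpy(result, src, (size_t)srclen);
    return result;
}

/* libsnmp-style contract: store the copy in *to and return a status
   code (0 = success, -1 = failure in this sketch). */
int snmp_style_memdup(unsigned char **to, const void *from, size_t size)
{
    if (to == NULL)
        return -1;
    if (from == NULL) {
        *to = NULL;
        return 0;
    }
    if ((*to = malloc(size)) == NULL)
        return -1;
    memcpy(*to, from, size);
    return 0;
}
```

A libslp call site written as `p = memdup(src, len)` that ends up bound to libsnmp’s definition hands `src` over as `to`, the integer `len` as the pointer `from`, and leaves `size` to whatever happens to occupy the third argument slot. libsnmp’s code then writes through, reads from, and allocates with values never meant for those roles, which is consistent with the crash inside memdup in the backtrace above.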

With the help of objdump, I checked the dynamic symbol tables for the clash:

[fcollyer@localhost tmp]$ objdump -T /usr/lib/libsnmp.so|grep memdup
00058180 g DF .text 00000080 Base memdup

[fcollyer@localhost tmp]$ objdump -T /usr/lib/libslp.so|grep memdup
008c14f0 g DF .text 0000004d Base memdup

“memdup” is really a global symbol in both ELFs (the standalone ‘g’ flag in each line above), so the definition found first in the dynamic linker’s search scope is used to resolve the clashing symbol. In my case, libsnmp was searched first, and libslp ended up calling libsnmp’s memdup implementation instead of its own. Interposition/preemption at its finest!
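To find out which library comes first in a given process, glibc’s `LD_DEBUG=bindings` environment variable reports each symbol binding as the dynamic linker performs it. Alternatively, here is a minimal sketch using `dl_iterate_phdr()` (the helper names are hypothetical) that lists loaded objects in the loader’s order. For libraries loaded at startup this approximates the global lookup order; objects later dlopen()ed with RTLD_LOCAL are listed too, even though they are not part of the global scope.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>

/* dl_iterate_phdr() callback: print each loaded object in the order the
   loader keeps them. glibc reports the main program with an empty name. */
static int print_object(struct dl_phdr_info *info, size_t size, void *data)
{
    int *count = data;
    (void)size;
    printf("%2d: %s\n", *count,
           (info->dlpi_name && info->dlpi_name[0]) ? info->dlpi_name
                                                   : "(main program)");
    ++*count;
    return 0;  /* returning 0 keeps the iteration going */
}

/* Print every loaded object and return how many were visited. */
int list_loaded_objects(void)
{
    int count = 0;
    dl_iterate_phdr(print_object, &count);
    return count;
}
```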

I took the ELF binary patch approach to fix the situation, avoiding libsnmp and libslp source-code management altogether. Workarounds involving any kind of intra-app redirection (or any app change at all) could pollute the project’s codebase with very strange code/logic (possibly even from external components). Finally, the patched libs are used in a firmware runtime environment that stays stable for years. The patch approach was easy to understand, simple to document, and fast to apply and test for correctness.

The idea behind the patch is simple: do what the libs should have done. Instead of keeping the symbols global, where they can satisfy another ELF’s undefined references, we turn them local and hidden, so they’re not visible outside the library containing each definition. The practical effect is that libsnmp and libslp each use their own memdup implementation. It may sound complicated, but only a little ELF knowledge is needed. For the patch itself, a regular hex editor is enough, without resorting to specialized tools (such as the ELF shell).

I’ll show the steps as applied on the app’s runtime system, which differs slightly from the environment I use for development (note the different offset and size for memdup below). First, the dump that lets us locate the symbol entry, and later verify the patch:

[fcollyer@host2 /]$ objdump -T /usr/lib/libsnmp.so|grep memdup
 000542e0 g DF .text 0000006e Base memdup

The first number, 0x000542E0 above, is the symbol’s offset, held in the st_value field of the Elf32_Sym struct (declared in /usr/include/elf.h, shown below). As we’re dealing with a 32-bit little-endian system, we reverse the byte order (e0 42 05 00) and search for that sequence in a hex editor. The result will look something like the image after the struct declaration:

typedef struct
{
  Elf32_Word    st_name;                /* Symbol name (string tbl index) */
  Elf32_Addr    st_value;               /* Symbol value */
  Elf32_Word    st_size;                /* Symbol size */
  unsigned char st_info;                /* Symbol type and binding */
  unsigned char st_other;               /* Symbol visibility */
  Elf32_Section st_shndx;               /* Section index */
} Elf32_Sym;

[Image: dyn_sym_patch1, hex-editor view of memdup’s Elf32_Sym entry in libsnmp.so]

st_name and st_value are highlighted in red, st_size in green, and the patch targets st_info / st_other in purple. To understand what is encoded in st_info, again, we look in /usr/include/elf.h:

/* Legal values for ST_BIND subfield of st_info (symbol binding).  */

#define STB_LOCAL       0               /* Local symbol */
#define STB_GLOBAL      1               /* Global symbol */
#define STB_WEAK        2               /* Weak symbol */

/* Legal values for ST_TYPE subfield of st_info (symbol type).  */

#define STT_NOTYPE      0               /* Symbol type is unspecified */
#define STT_OBJECT      1               /* Symbol is a data object */
#define STT_FUNC        2               /* Symbol is a code object */

From the definitions and comments, st_info is interpreted as two separate nibbles: the high one holds the symbol binding and the low one the symbol type (elf.h also provides the ELF32_ST_BIND and ELF32_ST_TYPE macros to extract them). As expected, memdup is encoded in libsnmp.so as 0x12: STB_GLOBAL (1) in the high nibble and STT_FUNC (2) in the low nibble, i.e. a global function symbol. For st_other, the encoding is:

/* Symbol visibility specification encoded in the st_other field.  */
#define STV_DEFAULT     0               /* Default symbol visibility rules */
#define STV_INTERNAL    1               /* Processor specific hidden class */
#define STV_HIDDEN      2               /* Sym unavailable in other modules */
#define STV_PROTECTED   3               /* Not preemptible, not exported */
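Putting the two excerpts together, here is a small sketch (with hypothetical helper names) that unpacks the st_info and st_other bytes using the ELF32_ST_BIND, ELF32_ST_TYPE and ELF32_ST_VISIBILITY macros that glibc’s elf.h defines alongside these constants.

```c
#include <elf.h>
#include <stdio.h>

/* The three attributes packed into st_info and st_other. */
struct sym_attrs {
    unsigned bind;  /* STB_*: high nibble of st_info */
    unsigned type;  /* STT_*: low nibble of st_info */
    unsigned vis;   /* STV_*: low two bits of st_other */
};

/* Unpack the raw bytes of one symbol table entry. */
struct sym_attrs decode_sym(unsigned char st_info, unsigned char st_other)
{
    struct sym_attrs a;
    a.bind = ELF32_ST_BIND(st_info);
    a.type = ELF32_ST_TYPE(st_info);
    a.vis = ELF32_ST_VISIBILITY(st_other);
    return a;
}

/* Print the raw bytes next to their decoded values. */
void print_sym(unsigned char st_info, unsigned char st_other)
{
    struct sym_attrs a = decode_sym(st_info, st_other);
    printf("st_info=0x%02x st_other=0x%02x -> bind=%u type=%u vis=%u\n",
           st_info, st_other, a.bind, a.type, a.vis);
}
```

For memdup’s original entry, `print_sym(0x12, 0x00)` prints `st_info=0x12 st_other=0x00 -> bind=1 type=2 vis=0`, i.e. STB_GLOBAL, STT_FUNC and STV_DEFAULT.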

memdup uses the default visibility, encoded as the byte 0x00. To change both the binding and the visibility, we patch libsnmp.so so that the entry looks like this:

[Image: dyn_sym_patch2, the same Elf32_Sym entry after the patch]

The bytes highlighted in purple are the patch values, both 0x02. In st_info, 0x02 encodes a new binding, STB_LOCAL (0, local to the defining file), in the high nibble, with the same type as before, STT_FUNC (2), in the low nibble (memdup is still a function). In st_other, 0x02 is STV_HIDDEN, which hides the symbol from other ELFs.
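The same edit can be expressed in code. The sketch below is a hypothetical helper, assuming a 32-bit little-endian ELF file already read into memory; it mirrors the hex-editor session rather than any existing tool. It encodes st_value in little-endian order (0x000542E0 becomes e0 42 05 00), finds it, steps back to the start of the 16-byte Elf32_Sym entry, and rewrites st_info (offset 12) and st_other (offset 13). The struct offsets are checked at compile time.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Field offsets inside one 16-byte Elf32_Sym entry, checked at compile time. */
_Static_assert(sizeof(Elf32_Sym) == 16, "unexpected Elf32_Sym size");
_Static_assert(offsetof(Elf32_Sym, st_value) == 4, "unexpected st_value offset");
_Static_assert(offsetof(Elf32_Sym, st_info) == 12, "unexpected st_info offset");
_Static_assert(offsetof(Elf32_Sym, st_other) == 13, "unexpected st_other offset");

/* Search image for the little-endian bytes of st_value, treat the first match
   whose entry is a global symbol as the target Elf32_Sym, and rewrite it as a
   local, hidden symbol of the same type. Returns the file offset of the
   patched entry, or -1 if nothing matched. Like the manual search, it does
   not parse section headers. */
long patch_symbol_local_hidden(unsigned char *image, size_t len,
                               Elf32_Addr st_value)
{
    /* Build the pattern byte by byte, so the host's byte order is irrelevant. */
    const unsigned char pattern[4] = {
        (unsigned char)(st_value & 0xffu),
        (unsigned char)((st_value >> 8) & 0xffu),
        (unsigned char)((st_value >> 16) & 0xffu),
        (unsigned char)((st_value >> 24) & 0xffu),
    };
    const size_t value_off = offsetof(Elf32_Sym, st_value);
    const size_t info_off = offsetof(Elf32_Sym, st_info);
    const size_t other_off = offsetof(Elf32_Sym, st_other);

    /* i is the position of st_value; the whole entry must fit in the buffer. */
    for (size_t i = value_off; i - value_off + sizeof(Elf32_Sym) <= len; i++) {
        if (memcmp(image + i, pattern, sizeof pattern) != 0)
            continue;
        unsigned char *entry = image + (i - value_off);
        if (ELF32_ST_BIND(entry[info_off]) != STB_GLOBAL)
            continue;  /* not a global symbol: keep searching */
        unsigned char type = (unsigned char)ELF32_ST_TYPE(entry[info_off]);
        entry[info_off] = (unsigned char)ELF32_ST_INFO(STB_LOCAL, type); /* 0x12 -> 0x02 */
        entry[other_off] = STV_HIDDEN;                                   /* 0x00 -> 0x02 */
        return (long)(i - value_off);
    }
    return -1;
}
```

Because it only matches raw bytes, just like the manual search, it can hit an unrelated occurrence of the same value, so a follow-up objdump check remains essential. A sturdier tool would locate .dynsym through the section headers and compare names via its linked string table.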

Let’s check the patch’s effectiveness with another objdump round:

[fcollyer@host2 /]$ objdump -T /usr/lib/libsnmp.so|grep memdup
 000542e0 l DF .text 0000006e Base .hidden memdup

Compared with the previous dump, the memdup symbol isn’t global anymore: the ‘l’ flag shows that it’s now a local symbol. The “.hidden” attribute also appeared, confirming that we changed the correct Elf32_Sym entry.

For the sake of brevity, I only showed the process for libsnmp, but the same patch should also be applied to libslp (as we did in our runtime).

As a final observation, it’s important for the reader to note that binary techniques still have their place in complex scenarios. But they should be used with great caution and precision, to avoid reintroducing the very maintainability costs and risks they were meant to mitigate in the first place.