
Linux and Win32 Native Thread Naming

November 07, 2015

Multithreaded programming is hard. One technique that helps when developing complex software is to name threads, so they can be identified later in the debugger (or in logs).

Linux and Windows take very different approaches to native thread naming. While Linux offers a direct and globally accessible method - in the best Unix-like I/O tradition - Win32 relies on kludgy conventions to achieve the same goal.

WINDOWS GOT IT FIRST

A long time ago, Matt Pietrek (of Under The Hood and BoundsChecker fame) wrote about the thread information block (TIB), a data structure associated with each Win32 thread. He documented common TIB fields, and it didn’t take long for programmers to set up the pvArbitrary pointer as an extra TLS slot.

One interesting pvArbitrary application is described by McKay and Woodring in their book “Debugging Windows Programs”. To set its name, a thread calls the special function below (snippet from the book):

BOOL SetThreadName( char *pszName )
{
(...)
    char **ppszThreadName = 0;

    __asm {
        mov eax, fs:[0x18] // calling thread TIB
        add eax, 0x14      // pvArbitrary is at offset 0x14 in TIB

        // ppszThreadName = &TIB->pvArbitrary
        mov [ppszThreadName], eax
    }

(...)

    *ppszThreadName = pszName;

(...)
}

When debugging, the @TIB pseudo-register can be used in a watch window expression to retrieve the name:

(char *)(dw(@TIB + 0x14))

They also described a second technique, omitted from the SetThreadName() listing above for readability. It was first presented by Jay Bazuzi at the 1999 Microsoft TechEd Conference (back then, he was a developer on the Visual C++ team).

This second mechanism is now the standard: a special SEH exception is raised to name a thread. Tools that implement this protocol catch the first-chance exception, receive the thread ID, read the target process's memory with ReadProcessMemory(), and recover the name.

Unfortunately, the Windows kernel has no knowledge of these user-space conventions. Problems can arise in unpredictable ways, for example when pvArbitrary is already taken by a third-party component. The RaiseException() mechanism is also error-prone: it is subject to annoying races and does nothing when no one is listening.

BUT LINUX GOT IT RIGHT

Until kernel version 2.6.9, there was no way to natively set thread names in Linux, and it was not until release 2.6.11 that a corresponding get facility was added.

Since glibc 2.12, pthread_setname_np() and pthread_getname_np() can be called to set/get thread names. Pthreads leverages kernel functionality directly through prctl() or ‘/proc/self/task/[tid]/comm’ files.
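As a minimal sketch of the pthreads route, assuming glibc 2.12 or newer, the helper below names the calling thread and reads the name back. The function name name_current_thread() is hypothetical and exists only for illustration; pthread_setname_np() and pthread_getname_np() are the real glibc calls.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* the *_np calls are GNU extensions */
#endif
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: names the calling thread, then reads the name back
 * into out (which should hold at least 16 bytes).
 * Returns 0 on success or an errno-style code on failure. */
int name_current_thread(const char *name, char *out, size_t out_len)
{
    int rc = pthread_setname_np(pthread_self(), name);
    if (rc != 0)
        return rc;             /* ERANGE if name exceeds 15 chars + '\0' */

    return pthread_getname_np(pthread_self(), out, out_len);
}
```

A thread would typically call something like this once, right at the start of its entry function. Older toolchains may need -pthread when compiling and linking.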

Some confusion surrounds the prctl() options. Old documentation stated that PR_SET_NAME and PR_GET_NAME worked on process names. In fact, they operate on tasks, specifically the calling thread (under Linux, threads and processes are both abstracted as schedulable tasks; see the clone() system call for more details).
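The per-task behaviour can be checked with a small experiment. In the sketch below, a worker thread renames itself with PR_SET_NAME, and the caller compares its own name before and after. The helper names rename_self() and rename_is_per_thread() and the name "worker" are made up for this illustration.

```c
#include <pthread.h>
#include <string.h>
#include <sys/prctl.h>

/* Thread body: rename only the calling task, then report the stored name. */
static void *rename_self(void *arg)
{
    char *seen = arg;                      /* caller-supplied 16-byte buffer */

    prctl(PR_SET_NAME, (unsigned long) "worker");
    prctl(PR_GET_NAME, (unsigned long) seen);
    return NULL;
}

/* Hypothetical helper: returns 1 if renaming a worker thread left the
 * calling thread's name untouched, 0 if it changed, -1 on thread error. */
int rename_is_per_thread(char worker_name[16])
{
    char before[16] = "", after[16] = "";
    pthread_t t;

    prctl(PR_GET_NAME, (unsigned long) before);
    if (pthread_create(&t, NULL, rename_self, worker_name) != 0)
        return -1;
    pthread_join(t, NULL);
    prctl(PR_GET_NAME, (unsigned long) after);

    return strcmp(before, after) == 0;
}
```

If PR_SET_NAME acted on the whole process, the caller's name would change along with the worker's.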

Thread naming is still easy to implement on systems with an outdated glibc:

#include <sys/prctl.h>

char name[16]; // 15 chars plus the terminating '\0'

// task name is restricted to 16 chars, including '\0';
// longer names are silently truncated
prctl(PR_SET_NAME, (unsigned long) "thread_name");

// the buffer must be at least [15 + 1] chars in length
prctl(PR_GET_NAME, (unsigned long) name);

Old GDB versions can’t show task names with info threads. In that case, a bash loop run inside the appropriate ‘/proc/[pid]/task’ directory is enough to generate a list:

$ for t in */; do echo -n "$t - "; cat "$t"/comm; done
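The same listing can be produced from inside a process, for instance to dump thread names into a log. This is only a sketch, assuming /proc is mounted; the helper name list_thread_names() is hypothetical.

```c
#include <dirent.h>
#include <stdio.h>

/* Hypothetical helper: writes "tid - name" for every task of the calling
 * process and returns how many were listed, or -1 if /proc is unavailable. */
int list_thread_names(FILE *out)
{
    DIR *dir = opendir("/proc/self/task");
    struct dirent *entry;
    int count = 0;

    if (dir == NULL)
        return -1;

    while ((entry = readdir(dir)) != NULL) {
        char path[288], name[32];
        FILE *comm;

        if (entry->d_name[0] == '.')           /* skip "." and ".." */
            continue;

        snprintf(path, sizeof path, "/proc/self/task/%s/comm", entry->d_name);
        comm = fopen(path, "r");
        if (comm == NULL)
            continue;                          /* thread may have exited */

        if (fgets(name, sizeof name, comm) != NULL) {
            /* the comm file already ends with a newline */
            fprintf(out, "%s - %s", entry->d_name, name);
            count++;
        }
        fclose(comm);
    }

    closedir(dir);
    return count;
}
```

Calling list_thread_names(stderr) prints one line per thread, in the same "tid - name" shape as the shell loop.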

The Linux thread name limit of 16 chars comes from TASK_COMM_LEN. When the prctl() syscall is made with PR_SET_NAME or PR_GET_NAME, the size of the current task’s ‘comm’ field enforces the limit indirectly. This small buffer capacity is a low price to pay for very consistent functionality.
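The limit is easy to observe. The sketch below sets a name through prctl() and reports how many characters the kernel actually kept; the helper name stored_name_length() is hypothetical. Note that pthread_setname_np() behaves differently here: it rejects an over-long name with ERANGE instead of truncating it.

```c
#include <string.h>
#include <sys/prctl.h>

/* Hypothetical helper: sets the calling task's name and returns the
 * length of the name the kernel actually stored, or -1 on error. */
int stored_name_length(const char *requested)
{
    char stored[16] = "";      /* TASK_COMM_LEN bytes, including '\0' */

    if (prctl(PR_SET_NAME, (unsigned long) requested) != 0)
        return -1;
    if (prctl(PR_GET_NAME, (unsigned long) stored) != 0)
        return -1;

    return (int) strlen(stored);
}
```

Passing a 25-character name such as "a_rather_long_thread_name" yields 15, the most that fits next to the terminating '\0'.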