glibc select() is an accident waiting to happen

August 26, 2017

Asynchronous I/O [0] is an important part of any programmer’s toolbox. It is very often done with the select() function, which lets a program monitor multiple file descriptors at once.
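Here is a minimal sketch of the usual pattern, assuming a Linux/glibc system. The helper name select_readable() is hypothetical. It waits until any of several descriptors becomes readable. The first argument to select() is the highest descriptor number plus one. The helper also refuses descriptors outside [0, FD_SETSIZE) instead of handing them to FD_SET(), for the reason discussed below.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical helper: wait up to timeout_ms milliseconds (negative means
 * "forever") for any of the n descriptors in fds to become readable.
 * Returns the number of readable descriptors, 0 on timeout, -1 on error. */
int select_readable(const int *fds, size_t n, int timeout_ms)
{
    fd_set readfds;
    int maxfd = -1;

    FD_ZERO(&readfds);
    for (size_t i = 0; i < n; i++) {
        /* FD_SET() with a descriptor outside [0, FD_SETSIZE) is undefined
         * behavior, so reject it instead of writing past the bitmap. */
        if (fds[i] < 0 || fds[i] >= FD_SETSIZE) {
            errno = EINVAL;
            return -1;
        }
        FD_SET(fds[i], &readfds);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    struct timeval tv = {
        .tv_sec = timeout_ms / 1000,
        .tv_usec = (timeout_ms % 1000) * 1000,
    };
    struct timeval *ptv = timeout_ms < 0 ? NULL : &tv;

    /* nfds is the highest descriptor number plus one, not a count. */
    return select(maxfd + 1, &readfds, NULL, NULL, ptv);
}
```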

Although the Linux kernel itself does not strictly need it, glibc select() is hard-limited by FD_SETSIZE, defined as 1024: fd_set is a fixed-size bitmap, and POSIX allows such an upper bound. Unfortunately, if we use select() to monitor a descriptor numbered 1024 or higher, the behavior is undefined (an edge case that is easy to overlook).
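Such descriptors are easy to end up with. The sketch below uses two hypothetical helpers, assuming Linux/glibc. fits_in_fd_set() checks whether a descriptor may be passed to FD_SET(). open_high_fd() mimics a busy process. It raises the soft RLIMIT_NOFILE, if the hard limit allows it, and then calls fcntl() with F_DUPFD to obtain the lowest free descriptor at or above FD_SETSIZE. The kernel is perfectly happy with that descriptor. Only fd_set cannot hold it.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <sys/resource.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: true if fd may safely be used with FD_SET()/FD_ISSET(). */
bool fits_in_fd_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Hypothetical helper: return a descriptor numbered >= FD_SETSIZE (a dup of
 * /dev/null), as a long-running process with many open files eventually
 * gets. Returns -1 if the limits or the environment do not allow it. */
int open_high_fd(void)
{
    struct rlimit rl;
    rlim_t want = FD_SETSIZE + 64;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_cur < want) {
        if (rl.rlim_max < want)
            return -1;              /* hard limit too low to go past 1023 */
        rl.rlim_cur = want;
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
            return -1;
    }

    int fd = open("/dev/null", O_RDONLY);
    if (fd < 0)
        return -1;
    /* F_DUPFD picks the lowest free descriptor >= its third argument. */
    int high = fcntl(fd, F_DUPFD, FD_SETSIZE);
    close(fd);
    return high;
}
```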

In a system that had worked for ages, I had to debug a nasty issue that appeared when the highest descriptor number passed to select() suddenly exceeded 1023. The resulting memory corruption zeroed a socket variable, so the routine closing of that socket closed stdin (descriptor 0) instead, and all processing turned into a data corruption race [1].

Play it safe, even if you think your program only ever handles a bounded number of file descriptors [2]: always use poll() instead of select().
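A minimal poll()-based replacement for the select() pattern might look like the following sketch. The helper name poll_readable() is hypothetical. The struct pollfd array is sized at run time, so descriptor numbers are not capped at FD_SETSIZE. An unopened descriptor is simply reported with POLLNVAL instead of corrupting memory.

```c
#include <errno.h>
#include <poll.h>
#include <stdlib.h>

/* Hypothetical helper: wait up to timeout_ms milliseconds (negative means
 * "forever") for any of the n descriptors in fds to become readable.
 * Returns the number of descriptors with pending events (readable, hung up,
 * error or invalid), 0 on timeout, -1 on error. */
int poll_readable(const int *fds, size_t n, int timeout_ms)
{
    if (n == 0)
        return poll(NULL, 0, timeout_ms);

    struct pollfd *pfds = calloc(n, sizeof *pfds);
    if (pfds == NULL)
        return -1;
    for (size_t i = 0; i < n; i++) {
        pfds[i].fd = fds[i];        /* any descriptor number is fine */
        pfds[i].events = POLLIN;
    }

    int rc = poll(pfds, (nfds_t)n, timeout_ms);
    int saved_errno = errno;        /* keep poll()'s errno across free() */
    free(pfds);
    errno = saved_errno;
    return rc;
}
```

poll() ignores entries with a negative fd and returns zero in their revents. This makes it easy to disable an entry without reshuffling the array.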

[0] - sometimes called I/O multiplexing;
[1] - according to Murphy’s law, it first happened in the production environment;
[2] - maybe your code runs as a library inside a host with limited I/O activity… what happens if its workload changes?;