Back to the Basics: The Mighty Assert

January 17, 2013

Some programmers still develop software without any kind of protective measures. The result is more time spent in the debugger and less spare time with their families (and the less time we spend stepping through code in a debugger, the better).

I consider asserts the first line of defense for proactive debugging. For starters, what is an assert? An assertion is a statement placed in a program at a point where the developer believes a given predicate is always true.

I do most of my development in C/C++, and that’s what I’ll use to present the technique. This is not to say that higher-level languages wouldn’t benefit from assertions; quite the opposite! High-level languages and frameworks nowadays provide rich assertion primitives, precisely because of their importance.

Picture a routine that manipulates human heights. Sooner or later, we’ll have to ask: does a height of 0.0 make sense for a human being? Or a negative one? Not at all, of course. So we can encode this belief in an assertion, e.g.:

// [1] (...)
assert( height > 0.0 );
// [2] (...)
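For completeness, here is a minimal sketch of such a routine in C. The name handle_height() is taken from the glibc message below; its body (a hypothetical meters-to-centimeters conversion) is an assumption for illustration only:

```c
#include <assert.h>

/* Hypothetical routine: converts a human height from meters to
 * centimeters. The assert encodes our belief that a height of zero or
 * less is impossible; it is checked only when NDEBUG is not defined. */
double handle_height(double height)
{
    /* [1] program state before the check */
    assert( height > 0.0 );
    /* [2] program state after the check: the assert changed nothing */

    return height * 100.0;
}
```

In a debug build, handle_height(-1.0) produces a failure message like the one below and aborts; with NDEBUG defined, the check is compiled out and the call simply returns -100.0.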

Any time the assert line is executed, we’re sure that no “impossible” height is in place. But what if this assumption is wrong? Then we say the assert “failed”, and we had better be notified of such an extraordinary event.

There are all kinds of tricks for notifying the programmer that a specific assert failed. Here is an example of the standard assert() notification (glibc, GNU/Linux system):

app.exe: app.c:8: handle_height: Assertion `height > 0.0' failed.

Unfortunately, the ANSI C/C++ assert() is so drastic and fatalistic that, any time an assertion fails, the program is aborted (after printing a message like the one above, it calls abort()). This may or may not be the desired behavior; for the kind of software I work on, it is not, and we’ll see one way to overcome this later.

The basic assertion rules:

a) assertions are NOT error handling;
b) they should tell us what is wrong, why, and where;
c) they are reserved for debug builds;
d) they must not change the software behavior (unless explicitly allowed by the programmer).

In fact, we can consider item [c] a corollary of [a]: if asserts were part of the program’s error handling, they would have to be present in every build.

Keeping asserts only in debug builds matters in another respect: performance. Since they are removed from the final product, there’s no reason to avoid them (at least not in C/C++). Assert as much as you can, without worrying about runtime cost in release builds.

Rule [d] is very important, both for custom assert implementations and for proper usage. It says that the program state before and after an assertion must be the same (e.g., between points [1] and [2] above). And unless the programmer being notified explicitly changes something (such as breaking into the debugger), asserts must not cause any behavioral difference between debug and final builds.
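A common way to break rule [d] is a predicate with side effects: the whole expression disappears when NDEBUG is defined. A minimal sketch, with hypothetical names (issued, next_id(), bad_reserve(), good_reserve()), contrasting the wrong and the right way:

```c
#include <assert.h>

/* Hypothetical counter standing in for any state-changing call. */
static int issued = 0;

/* Hypothetical helper: hands out the next positive ID (side effect!). */
static int next_id(void)
{
    return ++issued;
}

/* BAD: the call lives inside the assert, so release builds (NDEBUG)
 * never call next_id() and the program behaves differently. */
void bad_reserve(void)
{
    assert( next_id() > 0 );
}

/* GOOD: the side effect happens unconditionally; the assert only reads. */
int good_reserve(void)
{
    int id = next_id();
    assert( id > 0 );
    return id;
}
```

In a debug build, each call to either function advances the counter; in a release build, only good_reserve() still does, so bad_reserve() behaves differently between builds, which is exactly what rule [d] forbids.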

Before discussing some good and bad examples of asserting (taken from Subversion, a widely known free-software project), let’s dig into a simple C++ alternative that plays by the rules:

#include <assert.h>
#include <errno.h>
#include <stdio.h>

void dbg_assert(bool pred, const char *file, int line, const char *exp)
{
    // Save global state that the notification below may change (rule [d]).
    const int original_errno = errno;

    // Sanity checks on the preprocessor-generated strings.
    assert( file != NULL );
    assert( exp != NULL );

    if ( !pred ) {
        (void) fprintf(stderr,
            "\tAssertion failed! \"%s\", file '%s', line %d\n",
            exp, file, line);

        (void) fflush(stderr);

        // Debug hook placeholder: platform-specific code
        // (e.g., breaking into the debugger) goes here.
    }

    // Restore global state before returning.
    errno = original_errno;
}

#ifndef NDEBUG
// #pred stringizes the predicate exactly as written, before any
// macro inside it is expanded.
#define ASSERT(pred) dbg_assert((pred), __FILE__, __LINE__, #pred)
#else // release
#define ASSERT(pred) ((void) (0))
#endif // !NDEBUG

The standard assert() is still used, but in a subtle way: the two assert() calls at the top of dbg_assert() check the file and expression strings. If the compiler, runtime, or system is so broken that these static strings (generated by the preprocessor at compile time) are not passed to the function properly, aborting really is the way to go: we can no longer trust our execution environment.

To obey rule [d], two steps must always be considered: if any kind of global state (even per-thread state, as in TLS) may be changed, it’s prudent to save it first and restore it afterwards. Saving errno at the start of dbg_assert() and restoring it at the end does exactly that, just in case: fprintf() and fflush() are allowed to modify errno.

Debug hooks would go right after the fflush() call; the snippet only leaves room for them. There’s no portable way to write them, for several reasons: small assembly stubs may have to be written to raise specific faults, and each operating system has its own “debug interface/environment”. Even ptrace (a Unix facility, not part of POSIX) can only be used in very specific scenarios, such as launching a particular debugger (GDB? DBX?) to attach to our own process.
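As one possible POSIX-only sketch of such a hook (an assumption, not part of the original code), the hook could raise SIGTRAP, which stops the process under an attached debugger such as GDB. Because the default SIGTRAP action terminates the process when no debugger is attached, the hook is kept opt-in, gated by a hypothetical environment variable DBG_ASSERT_TRAP, so behavior changes only when the programmer explicitly asks for it (rule [d]). The function name dbg_break_hook() is also hypothetical:

```c
#include <signal.h>   /* raise(); SIGTRAP is POSIX, not ISO C */
#include <stdlib.h>   /* getenv() */

/* Hypothetical debug hook, meant to be called from dbg_assert() right
 * after the failure message is flushed. Returns 1 if a trap was raised,
 * 0 if the hook was disabled. */
int dbg_break_hook(void)
{
    /* Opt-in only: without the (hypothetical) DBG_ASSERT_TRAP variable
     * the hook does nothing, so program behavior is unchanged. */
    if (getenv("DBG_ASSERT_TRAP") == NULL)
        return 0;

    /* Under GDB this stops at the failing assert; without a debugger,
     * the default SIGTRAP action terminates the process. */
    (void) raise(SIGTRAP);
    return 1;
}
```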

The notification mechanism is immediate: a message is sent to stderr if the given predicate doesn’t hold (i.e., is false). Finally, rules [a] and [c] are respected because non-debug builds (in C/C++, the ones that define NDEBUG, among other things) are not encumbered by unnecessary code; the #ifndef NDEBUG block at the end of the snippet takes care of this.


Now, let’s tackle some real-world uses of assert. In svn_error.h, Subversion (SVN) defines some debugging infrastructure (which, in some cases, turns into error handling in release code).

We’re interested in the debugging aspects of the implementation (and the patterns applied). For simplicity, we’ll treat all the underlying boilerplate as a single ASSERT, disregarding some peculiarities of the codebase.

From libsvn_client/commit.c:

ASSERT(rel_targets != NULL);

This is straightforward. If the routine is ever called with rel_targets set to NULL, the assert will fail. This is good in several complementary ways, because all the rules are respected.

Another good one:

ASSERT(depth != svn_depth_infinity);

At this point, we can already identify a good pattern for asserting: if possible, check only one condition per predicate.

A counterexample, bad in a subtle way:

ASSERT(rel_target != NULL && *rel_target != '\0');

This is a classic problem (sometimes a real mistake). Relying on the short-circuit evaluation of && in C/C++ avoids a segmentation fault when the pointer is NULL. The problem, however, is a violation of rule [b]: when confronted with the assertion failure message, only the “what” and the “where” are known for sure. The “why” is lost because two conditions are checked as one predicate. Without further investigation, there’s no way to know whether the rel_target pointer is NULL or the string is actually empty.

A better way of doing this, taking into account that the string pointer may be NULL, is to write extra debug-only code:

ASSERT( rel_target != NULL );
ASSERT( debug_str_len(rel_target) > 0 );

debug_str_len() could be defined as a thin wrapper around strlen() that handles the special NULL case (returning 0). Another way is to simply accept that the pointer got corrupted and let the second assert raise a segmentation fault if it comes to that.
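A minimal sketch of that helper, assuming it exists only in debug builds. The name debug_str_len() comes from the text, but this body is just one possible implementation, and use_target() is a hypothetical caller. Standard assert() keeps the sketch self-contained; the NULL-safe helper really pays off with a non-aborting ASSERT, where execution continues past the first failed check:

```c
#include <assert.h>
#include <stddef.h>   /* size_t, NULL */
#include <string.h>   /* strlen() */

#ifndef NDEBUG
/* Debug-only helper: like strlen(), but treats a NULL pointer as an
 * empty string instead of crashing, so a later check can still report
 * a clean failure. */
static size_t debug_str_len(const char *s)
{
    return (s != NULL) ? strlen(s) : 0;
}
#endif /* !NDEBUG */

/* Hypothetical routine: one condition per assert, one "why" per message. */
void use_target(const char *rel_target)
{
    assert( rel_target != NULL );             /* why #1: pointer is NULL */
    assert( debug_str_len(rel_target) > 0 );  /* why #2: string is empty */
    (void) rel_target;                        /* silence NDEBUG warnings */
}
```

Under NDEBUG both asserts vanish, taking the only references to debug_str_len() with them, so release builds never need the helper.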

Moving forward:

ASSERT(depth != svn_depth_unknown && depth != svn_depth_exclude);

Now things get a little more interesting. How are we supposed to handle cases like this? One solution (obvious, but not always applied): if we’re dealing with a specific set of invalid values and just one valid value, the code changes slightly:

ASSERT(depth == the_only_valid_option_in_context);

If there isn’t such a unique expected value, unfortunately, we have to stick with compound predicates (and document them). It’s good practice to comment complex asserts, because they are project code too: as such, they evolve over time and are subject to maintenance.
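A sketch of what a documented compound assert might look like. The depth_t enum below is hypothetical, loosely modeled on Subversion’s depth values rather than copied from them, and walk_depth_value() is an invented routine:

```c
#include <assert.h>

/* Hypothetical depth type (illustrative; not Subversion's svn_depth_t). */
typedef enum {
    depth_unknown = -2,   /* sentinel: depth not determined   */
    depth_exclude = -1,   /* sentinel: node excluded entirely */
    depth_empty = 0,
    depth_files,
    depth_immediates,
    depth_infinity
} depth_t;

/* Hypothetical routine that only accepts concrete depths. */
int walk_depth_value(depth_t depth)
{
    /* Compound predicate, documented on purpose: four values are valid
     * here, so there is no single value to compare against. The two
     * sentinels are the only invalid inputs; if this fires, check which
     * sentinel the caller forgot to resolve. */
    assert( depth != depth_unknown && depth != depth_exclude );

    return (int) depth;
}
```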

From libsvn_fs/fs-loader.c:

ASSERT((depth == svn_depth_empty) ||
       (depth == svn_depth_files) ||
       (depth == svn_depth_immediates) ||
       (depth == svn_depth_infinity));

Here we have the same pattern as before, with inverted logic. If there’s only one invalid option in context, check for it explicitly:

ASSERT(depth != the_only_invalid_option_in_context);

From libsvn_wc/node.c:

ASSERT(walk_depth >= svn_depth_empty && walk_depth <= svn_depth_infinity);

This one deserves further comment. Some authors assert index variables this way to make sure valid ranges are used. It’s pretty common to see this when programmers consider the composite predicate a single (semantic) entity.

I still feel, though, that this is not an ideal assert structure, even if the pattern is common practice. I believe it’s a rule [b] violation, so I break such asserts in two: one dealing with the lower bound of the range, and the other with its upper bound.
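Applied to a range check, the split might look like the sketch below; the bounds and the function name are hypothetical:

```c
#include <assert.h>

/* Hypothetical valid range for a walk depth index. */
enum { WALK_DEPTH_MIN = 0, WALK_DEPTH_MAX = 3 };

/* Hypothetical routine: each bound gets its own assert, so a failure
 * message says which side of the range was violated (rule [b]). */
int checked_walk_depth(int walk_depth)
{
    assert( walk_depth >= WALK_DEPTH_MIN );   /* lower bound */
    assert( walk_depth <= WALK_DEPTH_MAX );   /* upper bound */

    return walk_depth;
}
```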

As a final remark, it’s important for the reader to note that common sense is indispensable when complex assertion expressions are needed. The best way to assert must be judged case by case, taking the rules as general guidance, not laws written in stone.