17 May 2015

The Preprocessor, your friend and enemy.

This little post contains snippets adapted from my "Macro of the Day" series, a way to show off neat C preprocessor tricks which are both useful and completely fucking stupid. You can decide for yourself which are which.

Also check out Leandro Pereira's blog for more stuff like this; it's my new favorite "tricks" blog.

While small and simplistic, this first macro has a huge impact for many reasons:

#define lz_safe_free(_var, _freefn) do { \
        _freefn((_var));                 \
        (_var) = NULL;                   \
}  while (0)

What does it do?

It frees the variable _var using the function passed as the second argument, _freefn.

_var is then set to NULL. A later use-after-free now crashes immediately on a NULL dereference, so you find silly memory corruption bugs right away instead of waiting around for your allocator to realize something isn't right, which can take a while.
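A minimal usage sketch, assuming the allocation comes from plain malloc() so that free() can be passed as _freefn. The helper make_and_release() is hypothetical and only exists to show the pointer being cleared:

```c
#include <assert.h>
#include <stdlib.h>

#define lz_safe_free(_var, _freefn) do { \
        _freefn((_var));                 \
        (_var) = NULL;                   \
}  while (0)

/* hypothetical helper: allocate, release, and hand back the (now NULL) pointer */
static char *
make_and_release(void) {
        char * buf = malloc(64);

        lz_safe_free(buf, free);  /* frees buf, then sets buf = NULL */
        lz_safe_free(buf, free);  /* harmless: free(NULL) does nothing */

        return buf;
}
```

The second call is only harmless because free(NULL) is defined to do nothing; a custom _freefn has to tolerate NULL to get the same guarantee.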

Questions asked

@g33kgmr: why was the do {} while(0) needed?

A few reasons:

  • C allows you to omit the braces around the body of an if or else as long as that body is a single statement:
if (var == 1)
   do_something();
else
   do_other();

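While I don't condone such things, you see it everywhere. The do {} while(0) exists to survive exactly this pattern. If the macro expanded to a bare { ... } block, the semicolon after lz_safe_free(...) would end the if statement and leave the else without a matching if, which is a syntax error; if it expanded to two loose statements, only the first would be guarded by the if. Wrapping the body in do {} while(0) turns the whole macro into a single statement that requires the trailing semicolon. This is common practice, and compilers optimize the one-shot loop away, so it does not affect performance.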

  • Another reason is scoping: any variables declared inside the block are local to it, so they do not clash with declarations at the call site.
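Both reasons fit in one small sketch. release_if() and SWAP_INT() are hypothetical helpers, not part of the original series: the first uses lz_safe_free as the lone statement of an unbraced if/else, which only compiles because of the do {} while(0), and the second keeps its temporary inside the block so it does not clash with a tmp the caller already declared.

```c
#include <assert.h>
#include <stdlib.h>

#define lz_safe_free(_var, _freefn) do { \
        _freefn((_var));                 \
        (_var) = NULL;                   \
}  while (0)

/* hypothetical: the macro is the only statement of an unbraced if */
static int
release_if(int cond, char ** pp) {
        if (cond)
                lz_safe_free(*pp, free);  /* one statement; the ';' ends it */
        else
                return 0;

        return 1;
}

/* hypothetical: tmp only exists inside the do {} block */
#define SWAP_INT(a, b) do { \
        int tmp = (a);      \
        (a) = (b);          \
        (b) = tmp;          \
} while (0)

static int
swap_demo(void) {
        int tmp = 100;  /* caller's own tmp: no redeclaration error */
        int a   = 1;
        int b   = 2;

        SWAP_INT(a, b);

        return a * 10 + b + tmp;  /* a == 2, b == 1, tmp untouched */
}
```

The scoping is not full hygiene, though: SWAP_INT(tmp, x) would still break, because an argument named tmp gets captured by the macro's own tmp.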

Make your preprocessor do the work for you.

If you have spent the time to learn proper design, you know it is good practice to keep your structures hidden, so they can only be accessed or modified through accessor/setter functions.

This is a common mistake that even experienced programmers still make: once you have given users the ability to access all the little details of your structures and design in a public manner, they WILL use them. But what happens when you want to change something around: names, include order, or whatever? Your users may suddenly find themselves in a bit of trouble. So doing it right the first time is better than begging for forgiveness two years down the line, something I learned very quickly with libevhtp.

But you can let your preprocessor generate all of this boring-to-write code with a simple macro, which makes renaming trivial and avoids all those stupid header-ordering problems.

In the example I gave in the tweet, I had a C file and a header file. I will shorten it up a bit here, but you can still see the longer version.

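#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* the public header shown below; its plain (non-inline) prototypes make
 * the inline definitions in this file external definitions, so they link */
#include "blah.h"

struct hidden_data {
        char   * stuff;
        bool     things;
        uint8_t  foo;
        uint16_t bar;
        uint32_t baz;
};

#define _GEN_ACCESSOR(valname, valtype)                       \
        inline valtype                                        \
        hidden_data_get_ ## valname(struct hidden_data * h) { \
                assert(h != NULL);                            \
                return h->valname;                            \
        }

/* no trailing ';': each expansion is already a complete function definition */
_GEN_ACCESSOR(stuff, const char *)
_GEN_ACCESSOR(things, bool)
_GEN_ACCESSOR(foo, uint8_t)
_GEN_ACCESSOR(bar, uint16_t)
_GEN_ACCESSOR(baz, uint32_t)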

Let's take _GEN_ACCESSOR(stuff, const char *) and look at how it expands:

inline const char *
hidden_data_get_stuff(struct hidden_data * h) {
        assert(h != NULL);
        return h->stuff;
}

This is a very basic example; you can of course create much more complex code generators.

But no matter what, your header file with the function declarations will always be trivial:

#ifndef __BLAH_H__
#define __BLAH_H__

#include <stdbool.h>
#include <stdint.h>

struct hidden_data; /* note we do not expose the data within */

#define GEN_ACCESSOR(vname, vtype) \
        vtype hidden_data_get_ ## vname(struct hidden_data *)

GEN_ACCESSOR(stuff, const char *);
GEN_ACCESSOR(things, bool);
GEN_ACCESSOR(foo, uint8_t);
GEN_ACCESSOR(bar, uint16_t);
GEN_ACCESSOR(baz, uint32_t);

#endif

If the return type can just be the member's own type (no cast or extra const needed), we can slim the macro down even more with typeof!

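/* typeof is a GNU extension (standardized in C23). It yields plain char * for
 * stuff, so that prototype in the header has to change to match; a leading
 * const would only qualify the returned value (char * const), not the chars. */
#define _GEN_ACCESSOR(valname)                                   \
        inline typeof(((struct hidden_data *)0)->valname)        \
        hidden_data_get_ ## valname(struct hidden_data * h) {    \
                assert(h != NULL);                               \
                return h->valname;                               \
        }

_GEN_ACCESSOR(stuff)
_GEN_ACCESSOR(things)
_GEN_ACCESSOR(foo)
_GEN_ACCESSOR(bar)
_GEN_ACCESSOR(baz)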

To see this type of method applied in the most extreme manner, check out the BSD <sys/tree.h> and <sys/queue.h> header files. These are full-on hardcore code generators for various list and tree data structures.
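To give a taste of that style without depending on those headers, here is a much smaller hypothetical generator: GEN_STACK(name, type) stamps out a node struct plus push and pop functions for any element type. It is an illustration in the same spirit, not the <sys/queue.h> API:

```c
#include <assert.h>
#include <stdlib.h>

/* hypothetical generator: a typed singly linked stack for any element type */
#define GEN_STACK(name, type)                                          \
        struct name ## _node {                                         \
                type                   value;                          \
                struct name ## _node * next;                           \
        };                                                             \
                                                                       \
        static int                                                     \
        name ## _push(struct name ## _node ** head, type value) {      \
                struct name ## _node * n;                              \
                assert(head != NULL);                                  \
                if ((n = malloc(sizeof(*n))) == NULL) {                \
                        return -1;                                     \
                }                                                      \
                n->value = value;                                      \
                n->next  = *head;                                      \
                *head    = n;                                          \
                return 0;                                              \
        }                                                              \
                                                                       \
        static int                                                     \
        name ## _pop(struct name ## _node ** head, type * out) {       \
                struct name ## _node * n;                              \
                assert(head != NULL);                                  \
                if ((n = *head) == NULL) {                             \
                        return -1;  /* empty */                        \
                }                                                      \
                *out  = n->value;                                      \
                *head = n->next;                                       \
                free(n);                                               \
                return 0;                                              \
        }

/* one line per element type you need */
GEN_STACK(intstack, int)
GEN_STACK(dblstack, double)
```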

I think I saw @nmathewson try to do a hashing mechanism using pure preprocessor generation; I may have secretly bonked him on the head.

TL;DNR: I've heard plenty about the "overhead" of such abstractions, but no matter who you are, I've never seen them cause any noticeable issues, even in the most extreme cases. Chase down your shitty string and memory twiddling functions instead. It's not things like this, I assure you.

Comparison Macro for Kernel Versions.

So many distributions, so little time. Many of them fail to update <linux/version.h> (actually, it always seems to be out of date).

So I always have to rely on this macro to make sure my code will run on (or work around) a particular version of Linux.

/* << binds more loosely than +, so every shift needs its own parentheses */
#define KVER(major, minor, patch) \
        (((major) << 16) + ((minor) << 8) + (patch))
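Since <linux/version.h> only tells you what you compiled against, one option is to compare against the kernel you are actually running. A minimal sketch, assuming a POSIX uname(2) and a release string that starts with numeric components; running_kver() is a hypothetical helper, and the macro is repeated with full parentheses, using the same packing as the kernel's own KERNEL_VERSION() macro:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>

#define KVER(major, minor, patch) \
        (((major) << 16) + ((minor) << 8) + (patch))

/* Hypothetical helper: parse the running kernel's release string from
 * uname(2) instead of trusting a possibly stale <linux/version.h>.
 * Returns -1 if the release string cannot be parsed. */
static int
running_kver(void) {
        struct utsname u;
        int            major = 0;
        int            minor = 0;
        int            patch = 0;

        if (uname(&u) != 0) {
                return -1;
        }

        /* a release with only two numeric components leaves patch at 0 */
        if (sscanf(u.release, "%d.%d.%d", &major, &minor, &patch) < 2) {
                return -1;
        }

        return KVER(major, minor, patch);
}
```

Something like if (running_kver() >= KVER(3, 9, 0)) then guards a code path at runtime. One byte per field also means a patch level above 255 would spill into the minor number.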

Kicking puppies.

OK, I know, this is extreme abuse, like kicking puppies. But it's good to know, and it comes in handy in certain situations, which I will explain.

Everyone, at some point, has wanted to have their own version of some system call, but had to do all kinds of stupid things like LD_PRELOAD and garbage like that (even going as far as writing kmods; this is bad juju, kids).

So here is a useful hack to rid yourself of such silliness:

#define OVERLOAD_SYMBOL(x, y) \
        typeof(x)(x)__asm__(y)

You can use this to over(write|load) commonly used syscalls with your own. For example:

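#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <sys/types.h>

#define OVERLOAD_SYMBOL(x, y) \
        typeof(x)(x)__asm__(y)

static void * sym_handle = NULL;
static int (* openfn)(const char *, int, ...);

static void
init(void) {
        if (sym_handle) {
                return;
        }

        /* grab the real open() from libc (glibc-specific soname) */
        sym_handle = dlopen("libc.so.6", RTLD_LAZY);
        openfn     = dlsym(sym_handle, "open");
}

/* declare my_open first so typeof() has something to inspect, and attach
 * the asm label to that declaration before the function is defined */
int my_open(const char * pathname, int flags, ...);
OVERLOAD_SYMBOL(my_open, "open");

int
my_open(const char * pathname, int flags, ...) {
        mode_t mode = 0;

        /* a mode argument is only passed along with O_CREAT (or O_TMPFILE) */
        if (flags & O_CREAT) {
                va_list ap;

                va_start(ap, flags);
                mode = va_arg(ap, mode_t);
                va_end(ap);
        }

        init();

        /* do your own stuff here */
        return (openfn)(pathname, flags, mode);
}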

Now, for anything linked against this, open() is actually my_open(), without some of the whack steps people go through for the same type of functionality.

Do you use vtables? If so, use this macro for better abstraction!

#define LZ_this(_PARENT) \
        typeof(THIS_OBJ) * this = (typeof(THIS_OBJ) *)_PARENT

What does it do?

Look, this is very vtable-specific; if you don't use them, then go away.

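The _PARENT part is the pointer to the C vtable's parent structure, and THIS_OBJ is an object of the concrete type, so typeof(THIS_OBJ) lets the macro declare a correctly typed this pointer and cast the parent to it. The cast is only valid because the parent structure is the first member of the object: C guarantees that a pointer to a structure, suitably converted, points to its first member (and vice versa).

The idea is that the data behind each vtable implementation can be easily accessed by doing the following: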

typedef struct  {
        struct parent _; /* this is our parent structure data */
        int           foo;
        int           bar;
        char        * baz;
} my_object;

static my_object THIS_OBJ;

Then, when the parent calls one of this object's vtable functions, the macro keeps things neat:

static int
my_vtbl_function_sum(struct parent * parent, int a, int b) {
        LZ_this(parent);

        this->foo = a;
        this->bar = b;
        this->baz = "woop";

        return a + b;
}

In reality, this is just a wrapper around something like this:

static int
my_vtbl_function_sum(struct parent * parent, int a, int b) {
        my_object * this = (my_object *)parent;

        this->foo = a;
        this->bar = b;
        this->baz = "woop";

        return a + b;
}

It's just cleaner to use the macro method; and who says C can't be OO?
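To see the whole thing in one place, here is a minimal self-contained sketch. The struct parent layout, the parent_vtbl table, and the parent_sum() dispatcher are hypothetical, since the post doesn't show them. It also swaps the dummy static THIS_OBJ object for a #define of the type name, which typeof accepts just as well, and uses the __typeof__ spelling so it compiles in strict ISO modes too:

```c
#include <assert.h>

/* hypothetical parent "class": every object embeds this first */
struct parent;

struct parent_vtbl {
        int (* sum)(struct parent * parent, int a, int b);
};

struct parent {
        const struct parent_vtbl * vtbl;
};

typedef struct {
        struct parent _;   /* must be the FIRST member for the cast to be valid */
        int           foo;
        int           bar;
        const char  * baz;
} my_object;

/* a type name works inside __typeof__, so no dummy object is needed */
#define THIS_OBJ my_object
#define LZ_this(_PARENT) \
        __typeof__(THIS_OBJ) * this = (__typeof__(THIS_OBJ) *)(_PARENT)

static int
my_vtbl_function_sum(struct parent * parent, int a, int b) {
        LZ_this(parent);   /* 'this' is an ordinary identifier in C */

        this->foo = a;
        this->bar = b;
        this->baz = "woop";

        return a + b;
}

static const struct parent_vtbl my_object_vtbl = { my_vtbl_function_sum };

/* generic caller: only knows about struct parent */
static int
parent_sum(struct parent * p, int a, int b) {
        assert(p != 0 && p->vtbl != 0);
        return p->vtbl->sum(p, a, b);
}
```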

A smug preprocessor trainwreck.

I recently came across a preprocessor hack that earned an Elvis sneer, a head cock, squinted eyes, and a raised eyebrow. Sure, I've shown off some nifty things here, some of which could be construed as something you would never use in real code. But there was something about this one that made me want to slam my head into a brick wall to forget what I had just seen.

Generating unique temporary variable names like this is actually standard practice.

But if you're constantly using it, what a waste of precious stack space.

The following is not the actual code, but it has the basic constructs of what the original developer wrote.

#define UNIQ2(X, Y) X ## Y
#define UNIQ1(X, Y) UNIQ2(X, Y)
#define UNIQ(X)     UNIQ1(X, __LINE__)
#define U UNIQ

#define FOR_EACH(max)                        \
    for (int U(i) = 0; U(i) < max; U(i)++)   \
        printf("%d\n", U(i))

int
main(int argc, char ** argv) {
    FOR_EACH(5);
    return 0;
}

So what does this look like after running it through gcc -E?

int
main(int argc, char ** argv) {
    for (int i12 = 0; i12 < 5; i12++) printf("%d\n", i12);
    return 0;
}

The first thing you've probably noticed is that it relies on C99. OK, I'm fine with that. We all have our own opinions, but to me, having declarations haphazardly placed wherever they happen to land is ugly as fuck. So let's just skip past that part before I rage out.

The next piece of smug trickery is generating "random" variable names with the standard predefined __LINE__ macro. Let's go through this mess.

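  1. Assume our call to U sits on line 12, using the original code (int U(i) = 0;)
  2. U is simply an alias for UNIQ
  3. UNIQ takes a single argument; in this example it is the variable name i
  4. UNIQ(i) expands to UNIQ1(i, __LINE__). UNIQ1 does not apply ## to its parameters itself, so its arguments are fully macro-expanded before substitution: __LINE__ becomes 12 and we get UNIQ2(i, 12)
  5. UNIQ2(i, 12) finally pastes the two tokens together with i ## 12, giving i12. Without the UNIQ1 middle step, ## would paste the literal token __LINE__ and you would get i__LINE__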

If all goes well, we see the following:

int i12 = 0;
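You can watch the indirection matter by stringifying the result instead of declaring a variable. A small sketch, where NAIVE_UNIQ, STR and STR2 are hypothetical helpers added for the demonstration: pasting X ## __LINE__ directly never expands __LINE__, while going through UNIQ1 does:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define UNIQ2(X, Y) X ## Y
#define UNIQ1(X, Y) UNIQ2(X, Y)
#define UNIQ(X)     UNIQ1(X, __LINE__)

/* hypothetical: paste directly, with no extra level of indirection */
#define NAIVE_UNIQ(X) X ## __LINE__

/* STR expands its argument first, then STR2 turns it into a string literal */
#define STR2(x) #x
#define STR(x)  STR2(x)

/* Returns 1 if UNIQ(i) really produced "i" followed by the current line. */
static int
uniq_has_line_number(void) {
        char         expected[32];
        const char * got = STR(UNIQ(i)); int line = __LINE__; /* same line on purpose */

        snprintf(expected, sizeof(expected), "i%d", line);
        return strcmp(got, expected) == 0;
}

/* Without the indirection, ## glues on the literal token __LINE__. */
static const char *
naive_name(void) {
        return STR(NAIVE_UNIQ(i));
}
```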

This is NOT the oddest way to achieve "temporary variables" I've ever seen, thank god.

Let's head for the low-hanging fruit first. U is just garbage code; its only use case is to confuse fellow programmers more than they already are. This isn't amateur hour, toss it aside.

#define UNIQ2(X, Y) X ## Y
#define UNIQ1(X, Y) UNIQ2(X, Y)
#define UNIQ(X)     UNIQ1(X, __LINE__)

#define FOR_EACH(max)                        \
    for (int UNIQ(i) = 0; UNIQ(i) < max; UNIQ(i)++)   \
        printf("%d\n", UNIQ(i))

The output is the same, aside from the variable now being declared as i11, since removing U moves the FOR_EACH(5) call up one line:

int
main(int argc, char ** argv) {
    for (int i11 = 0; i11 < 5; i11++) printf("%d\n", i11);
    return 0;
}

This is fine; all in all it's short and sweet, with some extra stack waste added on top:

#include <stdio.h>
#include <stdlib.h>

#define UNIQ2(X, Y) X ## Y
#define UNIQ1(X, Y) UNIQ2(X, Y)
#define UNIQ(X)     UNIQ1(X, __LINE__)

#define FOR_EACH(max)                           \
    int UNIQ(i) = 1;                            \
    for (UNIQ(i) = 0; UNIQ(i) < max; UNIQ(i)++) \
        printf("%d\n", UNIQ(i))

int
main(int argc, char ** argv) {
    if (argc < 2) {
        return 1;
    }

    for (int UNIQ(i) = 0; UNIQ(i) < atoi(argv[1]); UNIQ(i)++) {
        FOR_EACH(5);
    }

    return 0;
}

Still not a fan of the C99 anarchy, but I guess it works fine as long as you keep these little hacks small. As a small note: have we still not learned that wrapping multi-statement macros in do { ... } while(0) is good? This FOR_EACH expands to a declaration plus a for loop, so dropping it into an unbraced if would not even compile. Are we to assume all developers are the same? If you disagree, immediately solve everything here: https://github.com/ellzey/c_code_puzzles/
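For comparison, a sketch of the same idea wrapped properly. SUM_EACH and sum_if() are hypothetical: the loop adds up the indices instead of printing them, and because the declaration and the loop live inside do { ... } while(0), the macro works as the lone statement of an unbraced if/else:

```c
#include <assert.h>

#define UNIQ2(X, Y) X ## Y
#define UNIQ1(X, Y) UNIQ2(X, Y)
#define UNIQ(X)     UNIQ1(X, __LINE__)

/* hypothetical: adds 0 .. max-1 into acc; the counter is scoped to the block */
#define SUM_EACH(max, acc) do {                           \
        int UNIQ(i);                                      \
        for (UNIQ(i) = 0; UNIQ(i) < (max); UNIQ(i)++) {   \
                (acc) += UNIQ(i);                         \
        }                                                 \
} while (0)

static int
sum_if(int flag) {
        int total = 0;

        if (flag)
                SUM_EACH(5, total);  /* a single statement, so the else still binds */
        else
                total = -1;

        return total;
}
```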


Why not just write it plainly?

#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char ** argv) {
    int i;
    int y;

    if (argc < 2) {
        return 1;
    }

    for (i = 0; i < atoi(argv[1]); i++) {
        for (y = 0; y < 5; y++) {
            printf("%d\n", y);
        }
    }

    return 0;
}

Or maybe even use statement expressions (a GNU C extension) if we want to get motherfucking fancy about this shit.

#include <stdio.h>
#include <stdlib.h>

#define LOOP(X, Y) (                     \
        {                                \
            int _y;                      \
            for (_y = X; _y < Y; _y++) { \
                ;                        \
            }                            \
            _y;                          \
        })

int
main(int argc, char ** argv) {
    int i;
    int z;

    if (argc < 2) {
        return 1;
    }

    for (i = 0; i < atoi(argv[1]); i++) {
        z = LOOP(0, 5);
    }

    return 0;
}

I guess my whole point about this crap is: stop trying to be slick. It's all chest thumping at this point. Clear, readable code will always survive the evolution of your product, and the revolving door of developers that follow.

Use these nifty macros ONLY WHEN IT MAKES SENSE; don't give in to this sort of tripe just to show off your macro prowess.