Immutable Lists and Function Pointers in C

30 Apr, 2011

Lists, lists, lists…

I have written a lot of Clojure code recently. With Clojure inheriting from Lisp, and List standing for List Processor, I have thought quite a lot about lists recently.

I have also written a huge amount of C code recently. I took a piece of code that I wrote five years ago, noted that I would need to do a lot of work to add the new feature, so I decided to overhaul it and rewrite it The Correct Way. In this case, that means using MySQL’s prepared statement API instead of string hackery. Trust me when I say that string hackery sucks, especially since I know about SQL injection and had to protect against that, too.

I found that I had a lot of isomorphisms in the code, so like the abstraction astronaut that I aspire to become, I started templating the code. No, not the horrific C++ templates, but something more Lispy: creating functions that I could pass around easily. On and on I went, and then I started struggling with parameters to all of these various functions that I needed in order to build up and populate my queries.

“Lists!”, I said!

“No!”, said C.

C said it loudly, so I ultimately used a list data structure that a co-worker wrote for the internal library we use. It weighs a lot and does not allow for certain forms of elegance that I wanted.

So, independently, I played with the concept quite a lot and ultimately wrote my own list library.

Ah. Yes. Another one. And here I’d sworn to never do it again.

List Representations

I have a list of data. I need to store it, pass it around, maybe even modify it. I have some options.

First… a list object. I have implemented several of them over the years, starting all the way back in my data structures classes. Singly-linked lists, doubly-linked lists, circular lists, and on and on. Then later I even did some optimizations and I took up the Standard Template Library, because using it was less painful than implementing Yet Another List.

I want a list object when I plan to work with a lot of data, and I really want the object if the list itself will change a lot. But, list objects fall outside of this because they end up so thoroughly customized and optimized. Further, they do not lend themselves well to language-level idiom.n C The rest of this article will explain what I mean by that by showing what I have tried to accomplish.

Given that, I can enumerate three different common list representations:

  • counted lists
  • terminated lists
  • vararg lists

You know about these. You have a terminated list every time you declare a string:

const char my_str[] = "this list has a zero character terminating it"

However, your list could look more like this:

const char *my_strings[5];
my_strings[0] = "String 0";
my_strings[1] = "String 1";
my_strings[2] = "String 2";
my_strings[3] = NULL;

In this case, the list has a NULL pointer at the end of it. If you know this, you can iterate over the list like so:

ssize_t i;
for (i = 0; my_strings[i] != NULL; i++)
    printf("[%d] %s\n", i, my_strings[i];

But, what if you want to embed a NULL in the middle of the list?

const char *name[3];
name[0] = "Savanni";
name[1] = NULL;
name[2] = "D'Gerinel";

At this point, you have to count the string. You know that the string contains three values and that some of them may contain NULL. With that in mind, your downstream code will need to understand that it may encounter NULL pointers, but you will also need to explicitely tell it how many values to look at.

You will also encounter this if you deal with binary data. Technically, all data in a computer is binary, but in this case, I mean a string of data in which each cell could contain any value, including 0.

size_t img_size = 1024;
const unsigned char *img = <1024 bytes of an image>

Now for the really big huge catch.

For one thing, I have never seen any functions for list manipulation, except sort(), in glibc.

For another, functions written for one list representation will not work for another.

Let’s table this for a short time and explore a use case near and dear to my heart.

Anonymous Lists

In Clojure, nobody bats an eyelash if I do this (assuming print-names has already been defined):

(print-names "Savanni" "D'Gerinel")

In C, I have a few options:

print_name("Savanni", "D'Gerinel");
print_name((char *[]){"Savanni", "D'Gerinel"});

char * lst[] = {"Savanni", "D'Gerinel"};

Assuming I got the code right, print_name((char *[])…) and print_name(lst) are the same. This shows an example of declaring a list inline, anonymously. I can use this trick in other cases, too, such as declaring a structure. For instance:

struct timespec rem;
nanosleep(&(const struct timespec){5, 0},

In this case, I anonymously declared the timespec that nanosleep needs. Sometimes declaring a data structure that exists solely to become a parameter to a function is a total pain, so being able to do this anonymously adds elegance to my code. Nobody ever taught me this, nor have I seen it in example code or tutorials anywhere. I just discovered it one day digging through some of the code at my office.

DANGER: From here on out we leave behind the realms of Type Safety. We begin to play firmly in the realm of void *, and static type-checking becomes a dream of the past. You have been warned.

Returning to print_name, though, I have a problem. I did not create a terminate list, nor did I actually tell print_name how many elements the list contains. This creates disasters in C, so I have to do one or the other. Since I want the most generalized code I can possibly create, I have opted for a counted list:

print_name(2, (int []){"Savanni", "D'Gerinel"});

The other way I want to create such a list would be like so:

print_name(2, "Savanni", "D'Gerinel");

The point is to be able to call print_name() with as many different varieties of lists as possible. The problem, though, is that the two examples here require separate implementations, one accepting a counted list, and one accepting C’s vararg system.

void print_name_cnt (ssize_t cnt, int *lst);
void print_name_var (ssize_t cnt, ...);

While this would be just fine for one function, I was doing stuff like this for many different functions, and it all became very unweildy very quickly.

Enter my solution

The solution lay not in making the functions flexible, but in flexibly constructing the lists. This is something object oriented programmers probably knew about. I needed a single coherent list implementation and functions for building that list. Along with such, I needed a library of list operations, because what is the point of having such a list abstraction unless I have functions to go along with it.

Enter, my Immutable Lists API.

Through some trial and error, I developed the basic implementation:

typedef struct im_list_s {
  size_t cnt;
  const void *data[0];
} im_list_t;

In my implementation, I assume that the size of the list Will Not Change. The “im_” in “im_list_s” stands for “immutable”. Further, the entire list gets allocated with a single malloc() operation, so I can use free() to deallocate it no matter which list operation I use to call it.

Note: I do not provide ironclad guarantees of immutability. The length of the original list cannot change, but you could always hack cnt. You could change the pointers that the list points to. Be sure you understand what you are doing in that case, because I cannot protect you.

Now, the basic construction functions:

im_list_t * im_list_create_vararg (ssize_t cnt, va_list vap);
im_list_t * im_list_create_cnt (ssize_t cnt, ...);
im_list_t * im_list_create_term (void *term, ...);

Given these three functions, I can build a list in three different ways:

im_list_create_cnt(2, "Savanni", "D'Gerinel");
im_list_create_vararg (2, vap); // in this case, vap was built by some other function accepting a vararg
im_list_create_term (NULL, "Savanni", "D'Gerinel", NULL);

Given these three, I need only a single implementation for print_name:

void print_name (im_list_t *lst);

print_name(im_list_create_cnt(2, "Savanni", "D'Gerinel"));

There’s Always A Catch

This is all well and good, but what about Memory Management?

These lists that I am creating cannot be made on the stack. I have no choice but to allocate some heap space for them because I never know, until the list gets created at run time, how much space the list will need. As always, I must deallocate any allocated heap space. If I delegate that to the calling function, though, I lose all of the elegance of inline construction. I may as well go with a heavier solution.

im_list_t *lst = im_list_create_cnt(2, "Savanni", "D'Gerinel");

No good.

So, in my library, I decided that lists would be allocated, but managed in static variables.

im_list_create_cnt keeps a static variable internally, builds the list into that static, and returns it. If that variable already has a value, it will deallocate the old value first. This means that you cannot be totally haphazard in when you create lists – you must be done using one list before you can create another – but it does allow you the flexibility of creating a list anonymously as I wanted to do without leaking memory.

However, sometimes I do want to keep that list, so I created im_list_dup to duplicate any list. The original list remains under the control of im_list_create_cnt, but I now have a new copy that I can safely use as I desire.

im_list_t *lst = im_list_dup(im_list_create_cnt(2, "Savanni", "D'Gerinel"));
print_name(im_list_create(2, "Jim", "Thorpe"));


I am not satisfied with such a small number of operations. These operations form the basis for a list, and they certainly provide the most interesting part of my list solution, but they are not enough to be fully useful. In my code, I have already implemented a small set of additional helpful operations, and I will add more as I use the list in real projects. This article, however, describes how we can actually accomplish a little elegance in C, extending the language without going all the way to macro hackery.

Be sure to examine all of the code in more detail. I have already included an example to show my use cases and full documentation of the code proper.

Have fun, and tell me about other awesomely elegant hacks! C could really use elegant dictionary and set data structures, giving it a nice rounding of the structures that make Clojure so powerful.

Immutable Lists and Function Parameters in C by Savanni D'Gerinel is licensed under a Creative Commons Attribution-NonCommercial-SharAlike 3.0 Unported License. You can link to it, copy it, redistribute it, and modify it, but don't sell it or the modifications and don't take my name from it.

Dreamer, Shaper, Seeker, Maker