Martin Uecker, 2025-07-20
I discuss the implementation of type and bounds safe generic containers in C.
Previously, I discussed a span type,
and bounds checking using arrays.
Here, I will discuss a vector type. A vector type is essentially
a resizable array. A vector type could be used as in the following
example.
int main()
{
vec(int) *vec_ptr = calloc(1, sizeof *vec_ptr);
if (!vec_ptr) // memory out
abort();
for (int i = 0; i < 10; i++)
vec_push(int, &vec_ptr, i);
int sum = sum_vec(vec_ptr);
free(vec_ptr);
printf("sum: %d\n", sum);
}
In C, it is actually fairly easy to define a vector type.
#define vec(T) struct vec_##T { ssize_t N; T data[/* .N */]; }
It is very similar to the span type I discussed two weeks ago and you may want to read this post first. In comparison to the span type, the last element is not a pointer to an array but a so-called flexible array member, which extends the structure by an array of unknown length.
The key feature is that we can append elements to a vector, which
we can implement using realloc
.
#define vec_push(T, v, x) \
({ \
vec(T) **_vp = (v); \
ssize_t _N = (*_vp)->N + 1; \
ssize_t _S = _N * (ssize_t)sizeof((*_vp)->data[0]) \
+ (ssize_t)sizeof(vec(T)); \
if (!(*_vp = realloc(*_vp, _S))) abort(); \
(*_vp)->N++; \
(*_vp)->data[_N - 1] = (x); \
})
I simply terminate the program using abort
in case memory
can not be allocated. Of course, this may not be what you want. If
you need to handle errors (e.g. in embedded or systems programming),
you need to adapt the API to return an error. You might also
notice that I use ssize_t
instead of size_t
.
the idea is to use the signed overflow sanitizer to catch overflow. If
overflow must be handled explicitly, C23's checked integers could be used.
Many vector types include a capacity field, so that resizing on every
push can be avoided. I do not include one, because simplicity is more
important to me and realloc
often does this already internally.
In most scenarios, the performance is already good enough.
Still, there are cases where it is really important to get optimal performance.
For this, I experiment with two other interfaces. The first uses an external integer
for tracking the capacity. One can use this interface in a performance critical
routine to avoid repeated allocation. At the end, a single call to vec_fit
can be used to release the memory that is not needed.
T vec_push_cap(_Type T, ssize_t *cap_ptr, vec(T) *v, T x);
T vec_pop_cap(_Type T, ssize_t *cap_ptr, vec(T) *v);
vec(T) *vec_fit(_Type T, vec(T) *v);
On top of this interface I have another version that always assumes that the available capacity is the current size rounded up to the next power of two (at least below a certain limit).
T vec_push_auto(_Type T, vec(T) *v, T x);
T vec_pop_auto(_Type T, vec(T) *v);
I am still experimenting with these interfaces, but it seems that usability and performace can be quite good. One open question is for the automatic version is how to implement a hysteresis. If I find some time I will do some benchmarking and post the results.
Safety assumes that the members are not accessed directly but by using
the API. Exactly as for the span type, the type of the data
array
should be T[.N]
so that is has a known length.
Until we get something like this in C, we can use the same
workaround and define a macro that adds the length back to the
array type.
#define vec2array(T, v) (*({ vec(T) *_v = (v); (T(*)[_v->N])_v->data; }))
Using such a macro, this vector type nicely interoperates with conventional
C arrays. In the following example, we use the array_slice
macro from my previous post. I would
usually write a loop, but I wanted to show how one can continue to do
bounds safe processing with the array that represents the vector.
Example:
static int array_sum(int N, int (*arr_ptr)[N])
{
return N ? (*arr_ptr)[0] + array_sum(N - 1, &array_slice(*arr_ptr, 1, N)) : 0;
}
int main()
{
vec(int) *vec_ptr = calloc(1, sizeof *vec_ptr);
for (int i = 0; i < 10; i++)
vec_push(int, &vec_ptr, i);
// int (*arr_ptr)[*] = ...
auto arr_ptr = &vec2array(int, vec_ptr);
int sum = array_sum(_Countof *arr_ptr, atr_ptr);
free(vec_ptr);
printf("sum: %d\n", sum);
}
In practice, there is still one bounds safety issue in the code, because the
undefined behavior sanitizer does not check that the first argument in the call
to array_sum
matches the second argument. I hope we can fix this
omission at some point.
If you want, you can check out my experimental library where I am experimenting
with these ideas: link. If you
have ideas on how to do this better, let me know!