Martin Uecker, 2025-08-10
I discuss the implementation of type and bounds safe generic containers in C.
Previously, I discussed a span type,
bounds checking using arrays. and a
a vector type.
This time, I will discuss maybe
inspired by Haskell. This type
can used to return a value that may not exist, e.g. because an error was
encountered during the computation. The following examples shows for
a divide
function that catches division by zero.
static maybe(int) divide(int a, int b)
{
return (b != 0) ? maybe_just(int, a / b) : maybe_nothing(int);
}
But careful, there is another error case not checked here! Which is it?
As usual, we can define it simply as a macro that expands into a structure,
and define simple type constructors.
#define maybe(T) struct maybe_##T { bool ok; T value; }
#define maybe_just(T, x) (maybe(T)){ .value = (x), .ok = true }
#define maybe_nothing(T) (maybe(T)){ .value = (T){ }, .ok = false }
In the caller, we can then check whether the value exists or not.
int main()
{
int d = 2; // 0
maybe(int) p = divide(6, d);
if (p.ok) {
printf("%d\n", p.value);
} else {
printf("division by zero\n");
fflush(stdout);
}
return 0;
}
Can we make this safer to use? In principle, we like to get some
error if we try to use the value although it does not exist. For this,
we add a macro maybe_value
which includes a check.
#define maybe_value(T, x) (*({ maybe(T) *_p = &(x); _p->ok ? &_p->value : (void*)0; }))
Here, instead of handling the error condition, I create an lvalue that
points nowhere in case of an error because it then corresponds to
(*({ (void*)0; }))
, relying on the null sanitizer to transform
it into a run-time trap for safety.
maybe(int) p = divide(6, d);
if (p.ok) {
printf("%d\n", maybe_value(p));
}
You can find the full example here: Godbolt
But as mentioned above, there is another case where integer division has undefined
behavior in C. If we divide the smallest representable integer by minus one, then the
result is one larger than the biggest representable integer. Let's also add a test
for this!
maybe(int) unsafe_divide(int a, int b)
{
if (b == -1 && a == INT_MAX)
return maybe_nothing(int);
return (b != 0) ? maybe_just(int, a / b) : maybe_nothing(int);
}
So we created a safe function for integer division. But can we be sure it
is safe? Maybe we made mistake. Now, there are tools and a complete industry
that may be able to help with this, but instead let's first simply look at the
assembly generated by GCC when
using the signed overflow sanitizer in trapping mode with
-O2 -fsanitize=signed-integer-overflow,integer-divide-by-zero -fsanitize-trap=undefined
.
unsafe_divide:
cmp esi, -1
sete dl
cmp edi, 2147483647
jne .L2
test dl, dl
je .L2
.L4:
xor eax, eax
ret
.L2:
test esi, esi
je .L4
cmp edi, -2147483648
je .L20
mov eax, edi
cdq
idiv esi
sal rax, 32
or rax, 1
ret
.L20:
test dl, dl
jne .L18
mov eax, edi
cdq
idiv esi
sal rax, 32
or rax, 1
ret
safe_divide.cold:
.L18:
ud2
This is strange, there is still a code path that ends in a trap in the form of
the ud2
instruction. So either the optimizer was not able to see
that this is not possible or our check was incorrect. In fact, I got it wrong and
we have to check against INT_MIN
and not INT_MAX
. Here
is the corrected and nicer version.
maybe(int) safe_divide(int a, int b)
{
if (b == 0 || (b == -1 && a == INT_MIN))
return maybe_nothing(int);
return maybe_just(int, a / b);
}
The assembly looks different now
and does not contain a code path leading to an ud2
anymore.
safe_divide:
test esi, esi
je .L2
cmp esi, -1
jne .L3
cmp edi, -2147483648
je .L2
.L3:
mov eax, edi
cdq
idiv esi
sal rax, 32
or rax, 1
ret
.L2:
xor eax, eax
ret
The optimizer has proven that there is no overflow or division by zero left in our function!
Does this also work for our complete example
using maybe
? It does! The optimizer has statically shown that there is no overflow
and that all error cases are handled. Isn't this cool! One could now turn off the sanitizer
and still be sure that there is no overflow possible, as it was statically proven.
Of course, this can not be used to show that C programs are completely memory safe, as
there areas which are not covered by the sanitizers, and there are also sanitizers that do
not catch all undefined behavior in their respective domain. In particular, lifetime issues
and pointer arithmetic are not covered. Thus, one has to stress that this approach is
very limited when trying to prove safety properties of legacy C programs in this way.
Still, if you use VLAs and variably modified types instead of pointer arithmetic as discussed
previously, you can even have your bounds checked;
see for yourself example!
If you want, you can check out my experimental library where I am experimenting
with these ideas: link. If you
have ideas on how to do this better, let me know!