Writing Bounds-Safe Code in C with Arrays

Martin Uecker, 2025-07-09


One challenge when writing C code is that it is easy to make mistakes when doing pointer arithmetic. There are many scenarios where pointer arithmetic can be made safe, but this is not the topic of this post. Anyway, there is really not much reason to use pointer arithmetic in new code. Instead, one can use array types. Arrays are types that know their length, i.e. they are dependent on an integer that specifies the number of elements. Instead of pointer arithmetic, we can use safe operations on arrays and then let the compiler check the access.


	int arr[3] = { 0, 1, 2 }; 

	arr[4] = 3;  // bug!
	

Compilers can detect this problem at translation time.


	example.c:9:16: warning: array subscript 4 is above array bounds of 'int[3]' [-Warray-bounds=]
	

But this works only for arrays whose length is known at translation time. What about arrays of run-time length? For those, one cannot generally detect the problem at translation time. Still, because the type depends on the length, it can be detected at run time. With -fsanitize=bounds one gets the expected error.


	int n = 10;
	int arr[n];

	arr[10] = 3;  // bug!
	

	example.c:10:5: runtime error: index 10 out of bounds for type 'int [*]'
	

We also need to be able to pass pointers to arrays of run-time length around. But in C this also works just fine (Godbolt Example). This is one of the coolest features of C!


	void foo(int n, int (*p)[n])
	{
	  (*p)[10] = 1;
	}
	

This essentially gives us a slice type. It is slightly inconvenient as the size needs to be passed separately. A much bigger problem is that we cannot store such pointers in structures or unions, or return them from functions. These limitations could (and should) be fixed on the language level, but for now I suggest looking at the span type I discussed last week, which addresses these issues and seamlessly interoperates with arrays, as one can get access to the area the span points to as an array.
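
To make the pointer-to-array idiom above more concrete, here is a minimal sketch of a caller and a callee. The names sum_array and sum_first are hypothetical and not from the post. Inside the callee the length can be recovered from the type with sizeof, so it does not have to be tracked separately from the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the post): sums an array that is
   passed as a pointer to an array of run-time length n (n > 0). */
int sum_array(size_t n, int (*p)[n])
{
	/* The length is part of the type, so sizeof recovers it at run time. */
	size_t len = sizeof *p / sizeof (*p)[0];
	assert(len == n);

	int sum = 0;
	for (size_t i = 0; i < len; i++)
		sum += (*p)[i];	/* an array access, no pointer arithmetic */
	return sum;
}

/* Hypothetical caller: fills a local array of run-time length n (n > 0)
   with 0, 1, ..., n - 1 and passes a pointer to the whole array. */
int sum_first(size_t n)
{
	int arr[n];
	for (size_t i = 0; i < n; i++)
		arr[i] = (int)i;
	return sum_array(n, &arr);
}
```

When built with -fsanitize=bounds, the accesses through (*p) are the ones the sanitizer can check against the run-time length.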

The array_slice macro

A key operation is to select a smaller array from a larger one. Later operations should then be restricted to the subarray. I will explain how this can be done safely in C. Consider the following example.


	char str[] = "hallo";

	auto slice = &array_slice(str, 1, 1 + 3);
	(*slice)[0] = 'A';
	(*slice)[1] = 'L';
	(*slice)[3] = 'L';	// bug!
	

The code has an out-of-bounds violation because the slice only has three elements and we expect a bounds checker to prevent this. As part of the undefined behavior sanitizer we can use -fsanitize=bounds and then one gets the expected error. See for yourself: Godbolt: Example.


	example.c:16:13: runtime error: index 3 out of bounds for type 'char [*]'
	

One should note that by default the undefined behavior sanitizer only diagnoses the error but does not terminate the program. This can be controlled via command-line flags, e.g. -fno-sanitize-recover=bounds makes the program exit after the first error. In production, you may also want to turn this directly into a trap (e.g. with -fsanitize-trap=bounds in recent versions of GCC and Clang) to avoid the overhead of the code that prints the diagnostic.
How does this work? The idea behind the array_slice macro is simply to cast a pointer to the first element of the subarray to an array type with the length of the slice.


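	// Uses GNU statement expressions, C23 auto and typeof, and _Countof (C2y).
	// CHECK is assumed to be an assertion-like macro defined elsewhere.
	#define array_slice(x, start, end)				\
	(*({								\
		auto _y = &(x);						\
		size_t _start = (start);				\
		size_t _end = (end);					\
		CHECK(_start >= 0);					\
		CHECK(_end >= _start);					\
		CHECK(_end <= _Countof(*_y));				\
		(typeof((*_y)[0])(*)[_end - _start])&(*_y)[_start];	\
	}))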
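
The cast at the heart of the macro can also be written out by hand in standard C, without statement expressions. The following is only a sketch under assumptions: the post does not show CHECK, so assert() stands in for it here, and fill_range is a hypothetical helper that is not part of the author's library.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the post does not define CHECK; assert() stands in for it. */
#define CHECK(cond) assert(cond)

/* Hypothetical helper: sets elements [start, end) of *buf to c
   by going through a slice of run-time length. */
void fill_range(size_t n, char (*buf)[n], size_t start, size_t end, char c)
{
	/* Strict comparison so that the slice is never empty:
	   ISO C requires array lengths to be greater than zero. */
	CHECK(start < end);
	CHECK(end <= n);

	/* Core idea: cast a pointer to the first element of the subarray
	   to a pointer to an array with the length of the slice. */
	char (*slice)[end - start] = (char (*)[end - start])&(*buf)[start];

	size_t len = sizeof *slice / sizeof (*slice)[0];
	for (size_t i = 0; i < len; i++)
		(*slice)[i] = c;	/* checked against end - start */
}
```

A call such as fill_range(sizeof str, &str, 1, 4, 'X') on the "hallo" example touches only the three elements of the slice; an access like (*slice)[3] is what -fsanitize=bounds would be expected to report.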
	

Remaining Issues

Overall, there are still some remaining issues. First, variably-modified types - though extremely useful - are an underdeveloped feature of the C language. Obvious deficiencies are that one cannot store them in structures or unions and cannot return them from functions. For the former problem, one suggestion is to introduce syntax for a dependent structure type.


	struct foo {
	  int N;
	  char (*p)[.N];
	};
	

For returning from functions, it has been pointed out that, theoretically, this should already work when exploiting the peculiarity of C's declarator syntax.


	char (*foo(int N))[N];
	
Existing compilers do not support this and it was clarified that the second N is not in the scope of the function prototype. Alternatively, one could consider introducing C++'s trailing return types.

	auto foo(int N) -> (char(*)[N]);
	

This works when the length depends on an argument. If the length should not depend on an argument of the function, one should return a structure instead.
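
As a rough illustration of returning a structure, the following sketch returns a length together with a plain pointer and converts it back to a pointer to an array at the point of use. The type struct char_slice and the functions first_word and count_char are hypothetical names chosen for this example; this is not the span type from the library mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slice structure (illustrative only). */
struct char_slice {
	size_t len;
	char *data;
};

/* Hypothetical function whose result length is only known at run time:
   returns the part of *buf before the first space or null character. */
struct char_slice first_word(size_t n, char (*buf)[n])
{
	assert(n > 0);

	size_t i = 0;
	while (i < n && (*buf)[i] != ' ' && (*buf)[i] != '\0')
		i++;
	return (struct char_slice){ .len = i, .data = &(*buf)[0] };
}

/* Hypothetical consumer: turns the structure back into a pointer to an
   array so that the accesses go through an array type again. */
size_t count_char(struct char_slice s, char c)
{
	if (s.len == 0)
		return 0;	/* avoid forming a zero-length array type */

	char (*arr)[s.len] = (char (*)[s.len])s.data;

	size_t count = 0;
	for (size_t i = 0; i < s.len; i++)
		if ((*arr)[i] == c)
			count++;
	return count;
}
```

The design choice here is that the structure only transports the length and the address, while all element accesses still happen through an array type that a bounds checker can see.
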
Another problem is that assignments between such pointer types are not checked by the undefined behavior sanitizer for consistency. For example, the error in the following example is not detected.


	void foo(int n, char (*q)[n])
	{
	  (*q)[10] = 1;
	}

	void bar()
	{
	  char buf[10];
	  foo(11, &buf);
	}
	

This is a major hole which still needs to be fixed, and I have a patch for GCC that would add the necessary checks.
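
Until such checks exist in the compiler, a similar consistency check can be written by hand at the call site. The sketch below is only an illustration of the idea and not the GCC patch: set_last and set_last_checked are hypothetical names, and the macro evaluates its arguments more than once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callee: writes to the last element of an array of
   run-time length n (n > 0). */
void set_last(int n, char (*q)[n])
{
	(*q)[n - 1] = 1;
}

/* Hypothetical call-site wrapper: checks that the claimed length n
   matches the length of the array that p actually points to.
   Note: n and p are evaluated more than once. */
#define set_last_checked(n, p)						\
	(assert((size_t)(n) == sizeof *(p) / sizeof (*(p))[0]),	\
	 set_last((n), (p)))
```

With char buf[10], the call set_last_checked(10, &buf) passes the check, while set_last_checked(11, &buf) would fail the assertion instead of silently writing past the end of buf.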

If you want, you can check out my experimental library where I am experimenting with these ideas: link. If you have ideas on how to do this better, let me know!

References