Using Address Sanitizer for a C or C++ Library Linked to a Rust Executable

Posted 2024-07-12

When using Rust, it’s easy to forget about segfaults, buffer overflows, and related memory safety bugs. Recently, I linked a C++ library to my Rust executable, and that library was producing segfaults (among other things). I decided to tackle the problem with Address Sanitizer (ASan). I’ll summarize the procedure here in case anyone else runs into the same problem.

Scenario

Our scenario is this: we have a C library whose functions we invoke from a Rust executable via Rust’s FFI. This example will work just as well if the library is written in C++ and exposes its functions via extern "C" linkage.

Let’s assume that we have the source code of the C or C++ library and we are able to rebuild it. Here, we’ll use CMake for building, but the specific build system is not important, as long as we can pass the correct compiler and linker flags. Other options are available; this workflow has just become my personal preference. Finally, I also assume that we are running on x86-64 Linux, since I am not sure what the status of Address Sanitizer is on Windows and I am too lazy to find out.

The C Library Code

The code in this article contains everything you need to run this example. To keep it short, we’ll use some silly code for both the C and the Rust part, just enough to trigger problems that we can examine with ASan.

// lib.c
#include <stdint.h>
#include <stdlib.h>

// "private", only used internally
static uint32_t access_internal(uint32_t *pointer,
                                size_t index);

// the "public" function
uint32_t allocate_and_access(size_t const size,
                             size_t const index) {
  uint32_t* data = calloc(size, sizeof(uint32_t));
  uint32_t const value = access_internal(data, index);
  free(data);
  return value;
}

static uint32_t access_internal(uint32_t *const pointer,
                                size_t const index) {
  return pointer[index];
}

When compiled as a shared library, it exports the function allocate_and_access, which allocates size zero-initialized uint32_t integers on the heap and returns the value at index index. If the index is within the range [0,...,size-1], the function returns 0, because calloc initializes the memory to zero. If index is outside of that range, the read is an out-of-bounds memory access, which is undefined behavior. I have used an internal “private” [1] function that performs the actual access, just so that Address Sanitizer has more of a stack trace to report.
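For comparison, here is a minimal sketch of what the same logic looks like in safe Rust. The function name allocate_and_access_checked is made up for this illustration and is not part of the project. It shows the bounds check that the C code leaves out: an index inside the range yields the zero that calloc would have produced, and an index outside of it yields None instead of undefined behavior.

```rust
/// Safe Rust analogue of the C function `allocate_and_access`.
/// (Hypothetical helper for illustration; not part of the project.)
fn allocate_and_access_checked(size: usize, index: usize) -> Option<u32> {
    // `vec![0u32; size]` zero-initializes the buffer, like `calloc` does in C.
    let data = vec![0u32; size];
    // `get` performs the bounds check that the C code omits and
    // returns `None` for an out-of-bounds index.
    data.get(index).copied()
}

fn main() {
    // In bounds: the element exists and is zero.
    println!("{:?}", allocate_and_access_checked(10, 3));
    // Out of bounds: no read happens at all.
    println!("{:?}", allocate_and_access_checked(10, 11));
}
```

The C version has no such check, which is exactly why we need a tool like ASan to catch the bad access at runtime.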

Before we see how to build the library with Address Sanitizer so that it plays nicely with a Rust executable, let’s take a look at the overall project structure and the Rust code.

The Rust Project

Our simple project is structured as follows:

.
|- build.rs
|- Cargo.toml
|- myclib
|   |- CMakeLists.txt
|   |- lib.c
|- src
    |- main.rs

It’s just a bog standard Rust executable project that contains our C library in a subdirectory. Our Rust executable calls the C function via the foreign function interface and prints the result.

// main.rs
extern "C" {
    fn allocate_and_access(size: usize, index: usize) -> u32;
}

fn main() {
    println!("Calling C function...");
    let retval = unsafe { allocate_and_access(10, 11) };
    println!("...returned: {}", retval)
}

The call passes index 11 for a buffer of 10 elements, so the program will perform an out-of-bounds access when run like that. In this case, it will likely print some random nonsense [2]. For completeness, let’s see what the build.rs file looks like, before we get into how to debug the program with ASan.

//build.rs
fn main() {
    let dst = cmake::build("myclib");
    println!("cargo:rustc-link-search=native={}/lib", dst.display());
    println!("cargo:rustc-link-lib=dylib=myclib");
}

This has a build-dependency on the cmake crate, just because that is how I like to build my C and C++ projects. Let’s now look at the CMake file for the C project, because that is where we pass the necessary compiler and linker flags to build our library with Address Sanitizer enabled.

Building Our C Project With (Static) Address Sanitizer

Enabling Address Sanitizer for our library is as easy as passing the correct compiler and linker flags. However, if we just pass -fsanitize=address, we will run into this error when executing our program:

==82818==Your application is linked against incompatible ASan runtimes.

That’s because, at the time of writing, rustc only bundles the static version of the Address Sanitizer runtime. Thus, to use our C library with our Rust executable, we have to make sure that we also link the library with the static version of the Address Sanitizer runtime. Here is how that looks in our simple CMakeLists.txt file:

# CMakeLists.txt
cmake_minimum_required(VERSION 3.10)
project(MyCProject C)
add_library(myclib SHARED lib.c)

# the flags to use ASan statically
target_compile_options(myclib PUBLIC 
 -fsanitize=address)
target_link_options(myclib PUBLIC
 -fsanitize=address 
 -static-libasan)

# needed for the cmake crate
install(TARGETS myclib
        LIBRARY DESTINATION lib
        ARCHIVE DESTINATION lib)

We have now made sure that the static runtime of ASan gets linked to our library.

Running the Executable With Address Sanitizer

At the time of writing, we need the nightly Rust compiler to use Address Sanitizer in our programs [3]. So we have to make sure that our project uses the nightly toolchain, if it’s not already doing that:

$ rustup override set nightly

This will make sure that the given project, and only that, uses the nightly compiler, which allows us to pass the -Z family of flags. To run our program, we essentially just cargo run it with some extra compiler flags:

$ RUSTFLAGS="-Z sanitizer=address" cargo run --target x86_64-unknown-linux-gnu

Note that we also have to specify the target explicitly [4].

Analyzing the Address Sanitizer Output

We’ll now have a quick look at how to analyze the output of Address Sanitizer, without going into great detail, because there are tons of tutorials on the web. But let’s at least see how we can make use of the output. When we run our program we get an output such as this:

Calling C function...
=================================================================
==10654==ERROR: AddressSanitizer: heap-buffer-overflow on address [...]
READ of size 4 at 0x50400000003c thread T0
    #0 0x78d074480247  (./target/debug/build/[...]/out/lib/libmyclib.so+0x1247) [...]
    #1 0x78d0744801d4  (./target/debug/build/[...]/out/lib/libmyclib.so+0x11d4) [...]
    #2 0x63b647f83a64  (./target/debug/asan-rust+0xeea64) (BuildId: c547f75d1a8b7697)
    #3 0x63b647f83e1a  (./target/debug/asan-rust+0xeee1a) (BuildId: c547f75d1a8b7697)
    #4 0x63b647f83c3d  (./target/debug/asan-rust+0xeec3d) (BuildId: c547f75d1a8b7697)
    #5 0x63b647f84084  (./target/debug/asan-rust+0xef084) (BuildId: c547f75d1a8b7697)
[...]

There’s more useful output, but this truncated version should suffice. ASan tells us that we have a heap buffer overflow, as expected. It also gives us a stack trace, which can be very helpful in debugging how that buffer overflow was actually triggered. The top of the stack trace shows us where the overflow happened. There’s just one problem: ASan only gives us raw offsets into our binaries, such as libmyclib.so+0x1247. That means “the code at offset 0x1247 of libmyclib”, which is still not very human readable. There are a couple of things we can do about that. Let’s see some of them.

Using addr2line

We can use the GNU addr2line tool to convert addresses in binaries into lines of source code. This, and all the other things I’ll mention below, requires that our library be compiled with debug symbols enabled. We have implicitly done that: as far as I can tell, the cmake crate derives the CMake build type from the Cargo profile, so a plain cargo build or cargo run without optimizations gives us a debug build of the library.

$ addr2line -f -p -e ./target/debug/build/[...]/out/lib/libmyclib.so 0x1247
access_internal at ./myclib/lib.c:20

The addr2line tool tells us the function name and the line in the source code that produced the buffer overflow. Not surprisingly, this is exactly the line return pointer[index]; in the access_internal function. Doing it like that surely works, but it can become tedious quickly.
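To take some of the tedium out of this, the lookup can be scripted. The following is only a sketch under a few assumptions: parse_frame and addr2line_args are hypothetical helper names, the sample path /tmp/example/libmyclib.so is made up, and the parser assumes the unsymbolized frame layout shown above, where the first parenthesized group is module+offset. It extracts the module and the offset from one frame and prints the matching addr2line invocation.

```rust
/// Extracts `(module_path, offset)` from one unsymbolized ASan stack frame,
/// e.g. `#0 0x78d0...  (/path/libmyclib.so+0x1247) [...]`.
/// (Hypothetical helper; assumes the first "(...)" group is `module+offset`.)
fn parse_frame(line: &str) -> Option<(&str, &str)> {
    let start = line.find('(')? + 1;
    let end = start + line[start..].find(')')?;
    // The offset follows the last '+', so split from the right.
    let (module, offset) = line[start..end].rsplit_once('+')?;
    if offset.starts_with("0x") {
        Some((module, offset))
    } else {
        None
    }
}

/// Builds the argument list for `addr2line -f -p -e <module> <offset>`.
fn addr2line_args(module: &str, offset: &str) -> Vec<String> {
    vec![
        "-f".to_string(),
        "-p".to_string(),
        "-e".to_string(),
        module.to_string(),
        offset.to_string(),
    ]
}

fn main() {
    // Made-up sample frame in the same layout as the ASan output above.
    let frame = "    #0 0x78d074480247  (/tmp/example/libmyclib.so+0x1247) (BuildId: abc)";
    if let Some((module, offset)) = parse_frame(frame) {
        println!("addr2line {}", addr2line_args(module, offset).join(" "));
    }
}
```

A library path that itself contains parentheses would confuse this simple parser, so treat it as a starting point rather than a robust tool.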

Using the Symbolizer

If we have LLVM installed, we can use its llvm-symbolizer tool. The binary might not be called exactly that, e.g. for my particular installation it’s called llvm-symbolizer-18. We tell ASan where to find it with the ASAN_SYMBOLIZER_PATH environment variable. Then, we use the ASAN_OPTIONS environment variable to instruct ASan to use it to prettify its output.

$ export ASAN_SYMBOLIZER_PATH=$(which llvm-symbolizer-18)
$ export ASAN_OPTIONS=symbolize=1

If we now run our program as above, the output is much easier to grasp:

Calling C function...
=================================================================
==13863==ERROR: AddressSanitizer: heap-buffer-overflow on address [...]
READ of size 4 at 0x50400000003c thread T0
    #0 0x70e4a4ca2247 in access_internal ./myclib/lib.c:20:17
    #1 0x70e4a4ca21d4 in allocate_and_access ./myclib/lib.c:12:26
    #2 0x5b2614b25d74 in asan_rust::main::h67804a126392dee0 ./src/main.rs:7:27
[...]

This was exactly the information that I needed to fix my particular problem. If the output of ASan is particularly unwieldy, we can also direct it into a file, either with the log_path setting of the ASAN_OPTIONS environment variable or by redirecting stderr to a file.
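ASAN_OPTIONS takes key=value pairs separated by colons, so several settings can be combined in one variable. As a small sketch, the hypothetical helper asan_options below assembles such a string. The file name asan.log is just an example, and as far as I know ASan appends the process ID to the log_path value when it writes the file.

```rust
/// Joins `key=value` pairs with ':' in the format ASAN_OPTIONS expects.
/// (Hypothetical helper for illustration.)
fn asan_options(pairs: &[(&str, &str)]) -> String {
    pairs
        .iter()
        .map(|(key, value)| format!("{}={}", key, value))
        .collect::<Vec<_>>()
        .join(":")
}

fn main() {
    // Symbolize the report and write it to a file instead of stderr.
    let options = asan_options(&[("symbolize", "1"), ("log_path", "asan.log")]);
    println!("ASAN_OPTIONS={}", options);
}
```

The resulting string can be exported in the shell as before, or passed to a child process with std::process::Command::env.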

Conclusion

ASan proved invaluable for me, because it helped me find and eventually fix a weird out-of-bounds memory access that was producing segfaults sometimes and hot garbage at other times. I was really happy that the integration across languages was pretty smooth once I had figured out that I needed the static runtime. There is much more we can do with ASan in Rust. For example, it can also help us find some problems in unsafe Rust code. That is a small but important subset of what miri does, and ASan lets the program run much faster than miri.

Endnotes

  1. The correct term is static or internal linkage. In effect, the function cannot be called from outside the library (or even outside of this particular C File… compilation unit… damn you voice in my head!). 

  2. But we cannot rely on that. Accessing an invalid pointer is undefined behavior, which can mean any number of things. 

  3. Sanitizers may soon be stabilized, so make sure to check. See also this stabilization PR. Thanks to u/Shnatsel for pointing that out.

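  4. I have also seen it work without the explicit target, but better safe than sorry. As far as I understand, passing --target keeps Cargo from applying RUSTFLAGS to build scripts and proc macros, so only the code for the actual target gets instrumented.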
