Rust vs Common C++ Bugs
I used to like C++. I still do, but I used to, too. Joking aside, I am not one to tell any C++ programmer to just use Rust. There are a ton of valid reasons why companies or individuals decide to use C++ and I don’t wish to deter anyone from doing so. What I am trying to do in this article is to see how Rust stacks up against a handful of very common (and severe) bugs that are easy to produce in C++. I tried to make this article worthwile for both Rust and C++ programmers.
I’ll use Louis Brandy’s excellent CppCon 2017 talk Curiously Recurring C++ Bugs at Facebook as the basis for what constitutes a common and scary bug easily produced in C++. Louis draws his experience from working on the C++ codebase at facebook (now Meta). I am aware that facebook does not represent every C++ use case, but my personal experience is very much compatible with the given list. I’ll try to not repeat the talk too much because it is an excellent presentation that I urge you to watch yourself. In the talk, Louis gives mitigations against many of the bugs, mostly involving Sanitizers. I won’t go into those kind of mitigations here because I want to explore how Rust stacks up on a more fundamental level.
Bug #1: Vector Out of Bounds Access Using []
We probably all know how array or vector access using operator []
can cause out of
bounds access in C++. For example, for std::vector
it does not perform bounds checking,
so an illegal memory access can occurr unnoticed1. The problem is not that this
is possible at all, since a systems language that cares about performance
must offer these kinds of unchecked accesses. The problem is that
this most fundamental way to access a vector element, deep in every programmers DNA, is
inherently unsafe.
There is, of course a bounds-checked API for element access using
std::vector::at
, but few people seem to use it. Rust does the trade-off
differently. On slices, arrays, vectors and the like operator []
is bounds-checked
and will panic2 for out of bounds access.
There is a method get_unchecked
that allows unchecked access to the elements if you tell the compiler trust me,
I know what I am doing using the unsafe
keyword. I personally think this is
the right default to have, since the other way leads to the number one bug
contained in Louis’ presentation.
Bug #2: std::map
Access using []
This one is a pretty well known confusing API: on a std::map
in C++ the operator
[]
actually means get me a reference to the element or insert the default value
and then get me a reference to that. There are cases where this is a useful
API to have. Imbuing the operator []
with that behavior seems problematic
because it violates the principle of least surprise.
Louis Brandy has an excellent example of how that becomes a serious problem.
Take a look at this constructor for a Widget
class that performs some logging:
Widget::Widget(
const std::map<std::string, int>& settings) :
m_settings(settings) {
std::cout << "Widget initialized..."
<< "timeout is:"
<< m_settings["timeout"]
<< "\n";
}
Here the programmer’s mindset is let me just log the timeout real quick, but it
is easy to forget that this operation inserts a timeout of 0
(which in often
means infinite wait) if no such key was already present.
Rust’s BTreeMap
exposes a much less surprising API, which makes this kind of error arguably
impossible to make3.
Let’s have a quick look at the API for accessing entries in Rust’s BTreeMap
.
We can use the []
operator
for accessing an (immutable) reference to an element in the map via its key. This operator
panics if the key is not present in the map. That might seem peculiar at first,
but is consistent with how a vector or array behaves when using the []
operator,
where it means bounds-checked access to the element or panic.
We can also use the get
or get_mut
methods for immutable or mutable access respectively if we are not sure the key
exists in the map. Rather than throw an exception they return an optional reference
to the element if it exists. Optional references are safe in Rust.
But is there a way to have the behavior of operator[]
as in C++ in a clearer
manner? Turns out there is, using the entry
method. Say we have a BTreeMap
named map
: then we can write
map.entry(key).or_default()
to get the equivalent
behavior to C++’s map[key]
. The expression returns a mutable reference that can be manipulated,
just as the C++ operator would. I would argue that this is a much cleaner API.
Bug #3: References to Temporaries
While the bugs described above might theoretically be fixed by an API redesign in the standard library4, the next one is inherent in the language. It has to do with the lifetime of temporaries. I assume that many of us know about lifetime in C++ and also know that lifetimes of temporaries may be extended under certain circumstances. However, I personally would be hard-pressed to recount all of those rules and exceptions to them. Louis gives a very cool motivating example.
Building on the discussion about maps above, let’s write a function that gets an element from a map or returns a given default value. He presents us with the following function:
std::string get_or_default(
const std::map<std::string,std:string>& map,
const std::string& key,
const std::string& default_value) {
auto it = map.find(key);
return (it != map.end()) ?
it->second : default_value;
}
This function works perfectly fine and it is commonly used like so:
get_or_default(people_map,"name","John Doe");
The important thing here is that giving the character literal "John Doe"
as the default parameter implies the construction of a temporary string.
There is nothing wrong with that. The problem arises only when we try
to optimize the get_or_default
function implementation. The function always returns
a copy of the value it chooses to return. Being good C++ programmers, we might
want to get rid of that extra copy-construction and change the return value to a
constant reference to string instead of the by-value return that causes the copy.
Let me quote Louis Brandy: “this code [would be] hopelessly broken”. The
problem does not manifest if we return a reference to an element inside the map.
This is of course fine, but if the map does not contain the given key, we return a reference to the
temporary. This is a dangling reference and thus undefined behavior.
As mentioned above, there are rules for lifetime extension,
which make it defined behavior to bind a temporary value to a reference to const
.
However, we find this exception to the rules:
a temporary bound to a return value of a function in a
return
statement is not extended: it is destroyed immediately at the end of thereturn
expression. Suchreturn
statement always returns a dangling reference.
References to temporaries in C++ are hard, even in 2022 and
address sanitizer and static analyzers can be used as mitigations to some degree.
Still, it would surely be nice not to have that problem at all. This is where
one of Rust’s most well-known features comes in: the Borrow Checker.
This is such an integral and well-documented part of the Rust language
that I’ll keep the example brief. We can write the get_or_default
function in Rust like so:
fn get_or_default<'a,'b,'c>(
map : &'a BTreeMap<String,String>,
key : &String,
default_val : &'b String) -> &'c String
where 'a : 'c,
'b : 'c {
match map.get(key) {
Some(val) => val,
None => default_val,
}
}
This is not idiomatic Rust since there are much cleaner ways to achieve this
logic by using the BTreeMap
API, but it’s a straightforward translation of the
C++ code.
The interesting thing is that the get_or_default
function takes references to all
of it’s parameters (even the default value) and returns a reference, which will
point to either the default value or the entry in the map. Those weird 'a
,
'b
, and 'c
parameters that look like generics are actually named lifetimes
that are enforced completely at compile time to check that references can never
be dangling. What we have told the compiler here is that the map has a lifetime of 'a
.
References to the elements inside the map will also have this lifetime. For the key
we don’t have to specify an explicit lifetime, because the
lifetime of the reference returned from the function is completely independent from it.
The compiler can take care of that on its own. The lifetime
of the default value is 'b
and finally the lifetime of the returned reference is
'c
. The real magic happens in the where
clause, where we indicate the relationship
between the lifetimes. We tell the compiler that the lifetime 'a
and 'b
must
be at least as long as 'c
. Thus, at the callsite, the compiler will check that the
returned reference is only used as long as the map and the default value are still
alive. Again, this is enforced at compile time without any runtime overhead.
For any program that compiles, we can be sure that the get_or_default
function
never returns a dangling reference.
If that syntax seems weird and complicated to you, don’t panic because there is such a thing as lifetime elision which makes those explicit lifetime annotations unneccessary in many common usecases. Still, sometimes we have to help the compiler by annotating the lifetimes which is the price we have to pay in Rust (at compile time) for the absence of dangling references5.
Bug #4: Volatile for Atomic
Louis Brandy tells a success story here. Years ago, the volatile
keyword was
commonly misused to enforce synchronization across threads. With the advent of
std::atomic
, and more generally the addition of library and language facilities
for concurrent programming in C++11, programmers stopped misusing volatile
. We
started using the newly available language and library facilities, since
they were simple to use and simple to teach.
Let’s enjoy this success story, before we get back to the broader topic of thread safety in the next section. Harking back to the first two sections, this success story shows that good API design does go a long way in preventing bugs.
Bug #5: Is shared_ptr
Thread Safe?
In a very fun section of this great talk, Louis shows that developers seem
to forget that a shared_ptr
does not enforce any synchronization for its
pointed to element, yet the reference count is synchronized
across threads6. This distinction seems to be sufficiently confusing
so that bugs are regularly produced by sharing shared_ptr
instances across threads.
Rather than blaming the developers, it seems to me that this is a systemic
issue that should be addressed.
This is where Rusts notion of fearless concurrency
kicks in. It’s a topic too broad to discuss in depth, because a lot of language facilities
work together to create it. It’s instructive to look how
Rust deals with the equivalent problem of shared pointers. Let’s look at this
program, where we’ll try to use a reference counted shared pointer Rc<T>
to share an integer value across threads:
use std::sync::Arc;
use std::rc::Rc;
type SharedPtr<T> = Rc<T>;
fn main() {
let val = SharedPtr::new(10);
let shared1 = SharedPtr::clone(&val);
let handle1 = std::thread::spawn(move || {
println!("thread1: {}",shared1);
});
let shared2 = SharedPtr::clone(&val);
let handle2 = std::thread::spawn(move || {
println!("thread2: {}",shared2);
});
println!("main: {}", val);
let _ = handle1.join();
let _ = handle2.join();
}
We have typedef’d SharedPtr<T>
to the standard library type Rc<T>
. Then we
created a new instance via Rc::new
(think of this somewhat like std::make_shared
)
and share a copy of that shared pointer across different threads (using a method
called clone
because otherwise Rust would move the value out of our scope).
All threads including main
just print the value of the shared integer. One problem:
this program does not compile. The compiler will complain that Rc
does not
implement the Send
trait.
The Send
trait is Rust’s way, via the type system, to indicate whether it is safe to
send a value to a different thread. This is important for types that carry
shared ownership semantics, like Rc<T>
.
It turns out that Rc<T>
is not safe to send across different threads because
the reference counting is not atomic as it would be in std::shared_ptr<T>
. The
correct equivalent for an atomically reference counted shared pointer would be
the Rust type Arc
. This
type does indeed implement Send
and thus using type SharedPtr = Arc
makes this
code compile.
If we now tried to mutate the value from two different threads,
we would find out that we cannot simply do that. We’ll eventually hit an error
telling us we need a second fundamental trait for concurrency, called Sync
.
This trait, again via the typesystem, indicates whether a value is safe to share
between threads. The short story is that we have to
explicitly use a synchronization mechanism for the data that makes it safe to
share among threads, such as a mutex. I won’t go into detail here, because
I have written a two part series comparing mutexes in Rust and C++ (part 1,
part 2).
The takeaway here is that Rust does indeed prevent this type of thread-safety problem (and many more) by making thread safety part of the type system. So thread-safety is checked at compile time. This even presents an opportunity for optimization by giving us non-atomically reference counted pointers that are only safe to use from one thread. We only pay for atomic counting if we need it and we cannot forget to use it during a refactor because the compiler will tell us.
However, it’s only fair to note that the thread-safety guarantees aren’t all sunshine and rainbows in Rust and truly getting into the specifics of what thread-safety guarantees Rust actually gives is complicated. The term is just so loosely defined. For one, Rust does prevent data races and on the other hand you can still deadlock all you want. Also, at the time of writing, Rust does not have a defined memory model, unlike C++ 7.
Bug #6: A Vexing Parse that Only Pretends To Lock A Mutex
This bug is a really curious one. Did you know that std::string(foo);
is valid C++
and is parsed as std::string foo;
? Both produce a default constructed instance
of a string named foo
. This can lead to an interesting problem, when trying
to lock a mutex using the RAII guards in the standard library. Take a look at
this deceptive example:
void Widget::update_state() {
std::unique_lock<std::mutex>(m_mutex);
do_some_update(m_state);
}
The first line of the function does not actually lock the mutex m_mutex
, but
instead defines a default constructed instance of a std::unique_lock
called
m_mutex
. A correction would just require two more characters like so:
std::unique_lock<std::mutex> g(m_mutex)
. The difference is not easy to spot.
Louis shows that despite a running linter, he found
two instances that made it into the production codebase with exactly that mistake.
Evaluating whether this mistake is easy to make in Rust is interesting. Rust’s
Mutex
works very
differently than in C++. In Rust a mutex is a class template8
where the instance of a mutex is the sole owner of the contained data. When we lock this mutex
we get a pointer-like guard that can be dereferenced to access the underlying value.
Once the guard goes out of scope, the mutex is locked again. This data-oriented
approach to mutexes makes it impossible to forget to lock a mutex that protects data, because
we cannot otherwise access the data. I have written about this paradigm in
an article and also explored data-oriented mutexes for C++.
Spoiler: there’s no way to have data-oriented mutexes in C++ without the risk of running
into trouble with lifetimes.
While using mutexes to protect data as described above is surely a very common (maybe
the most common) use case, there are valid reasons to use a mutex to protect
sections of code rather than data. We can achieve this in Rust with the Mutex<()>
specialization, where ()
in Rust is (in this case) equivalent to void
. To
lock this mutex we can just call
{
let _ = my_mutex.lock();
// ... code in section
}
…and that would produce the same problem as the C++ code above! Question to Rust developers: did you spot the problem?
It turns out that let _ =...
is not actually binding the mutex guard to a named
variable. This means the created temporary is immediately destructed, rather
than at the end of the scope, unlike a named entity. In fact, let _ = expr
is semantically equivalent to (void)(expr)
in C++ and serves to e.g. suppress
compiler warnings about unused return values. On the other hand let _g = expr
is somewhat like auto _g = expr
in C++ and binds to a named entity whose destructor
is invoked at the end of its scope. At the time of writing the code shown above
compiles without warnings in Rust. The mutex will not be locked until
the end of the scope.
I really think the semantic
difference between let _ = expr
and let _g = expr
is a confusing design choice,
as evidenced e.g. by the discussions here
and here. There is a clippy lint
9 that will catch the mistake in the code using mutexes above, but it cannot prevent
deeper underlying problem for other types of RAII guards. We could, of course,
argue that using Mutex<()>
is an indication of bad design, but that would
be unfair to C++. Then we could always say producing bugs is bad design.
If Mutex<()>
exists, people will eventually use it. Furthermore, the problem with
let _ = ...
goes beyond just mutexes.
Summary
We looked at six common errors in C++ in the facebook codebase and we saw that Rust enables us to catch most of the bugs at compile time and also emphasizes memory safety and simplicity in its API design. But we also discovered a thread safety bug caused by a syntax quirk in C++ that exists in Rust as well, albeit through a completely different syntax quirk. We also learned that thread safety is a very underspecified aspect of the Rust language despite some amazing compile time guarantees.
I don’t want to pass judgement on Rust vs C++ as languages in general, but it is nice to see that a language which is often hailed as a safe successor to C++ does indeed help to prevent many of the bugs common in C++. And yet we saw that Rust is not without its flaws.
Endnotes
-
This is true for other array like containers in the STL as well as C-style arrays. ↩
-
A panic is an early, but orderly termination of the program with stack unwinding. ↩
-
Half of the reason this operation is so problematic in C++ is that it does not work with
const
-correctness in an intuitive way. It will not work on aconst
map, but Louis shows that this does not deter programmers from using the operator. We just get rid of theconst
. ↩ -
However, I’d be surprised if they are ever fixed, because of backwards compatibility. ↩
-
There is such a thing as
unsafe
Rust which can cause all kinds of UB, including dangling references. Unsafe Rust is not disabling the Borrow Checker. It can introduce legitimate safety problems, but since it’s opt-in we can decide not to use the unsafe subset of Rust and loads of libraries and applications can be and are written without it. ↩ -
A reader on reddit characterized this statement as “both factually true and incredibly misleading”. See their comment on the matter. ↩
-
“Templates” in Rust are not called templates and also work rather differently than templates in C++. But in very broad strokes, Rust generics are somewhat like C++ templates. ↩
-
Clippy is the static analyzer that ships with the rust toolchain Rust and many projects use it to run extra analyses on their code. However, the compiler does not complain and so I feel it would be unfair to C++ to disregard that problem in the Rust language. ↩
Comments
You can comment on this post using your GitHub account.
Join the discussion for this article on this ticket. Comments appear on this page instantly.