Geo’s Notepad

Global Fitting of Multiple Right Hand Sides with Variable Projection

2024-01-10T00:00:00+00:00

About three years ago, in my previous article on variable projection, I announced that I would write a follow-up about VarPro with multiple right hand sides. This is it. Global fitting with multiple right hand sides is an application where VarPro shines, because it can bring significant computational savings. Let’s dive right in.

What is Global Fitting?

Global fitting is a term that I came across back when I was working in fluorescence lifetime imaging, see e.g. (Warren2013). I am not sure whether it is a widely used term, but the data analysis software OriginLab Origin® also seems to use it, and their definition is quite instructive¹:

The term “global fitting” generally refers to simultaneous curve fitting operations performed on multiple right hand sides. Because right hand sides remain distinct, they may or may not “share” parameter values during the fit process. When a parameter is shared, a single parameter value is calculated for all right hand sides. When a parameter is not shared, a separate parameter value is calculated for each right hand side.

The topic of this article is global fitting of a vector valued function \(\boldsymbol f \in \mathbb{R}^m\) to \(S \in \mathbb{N}\) vector valued right hand sides \(\boldsymbol y_s \in \mathbb{R}^m\), \(s = 1,\dots,S\). We are concerned only with fitting certain kinds of functions, so-called separable model functions. Those are functions \(\boldsymbol f\) which can be written as a linear combination of \(n\) nonlinear basis functions. For our problem, we will assume that the nonlinear parameters are shared across all members of the dataset, while the linear coefficients are allowed to vary between members. I’ll give a more formal description soon.

The fitting problem as stated above allows us to use VarPro to its full potential and reap potentially massive computational benefits. However, not every fitting problem fits the bill. Firstly, we need a model function that is truly separable. Secondly, we need a problem where it is justified to assume that the nonlinear parameters are shared across the dataset, while the linear coefficients are not. Fluorescence lifetime imaging is an example of such a problem².

VarPro: A Quick Recap

Since this article is a follow-up to my previous article, I assume that you, kind reader, are familiar with it. I’ll use the same notation as before, so that it’s easy to go back and forth between the articles, and I’ll keep repetition to a minimum.

In the last article, we were concerned with least squares fitting a vector valued, separable model function \(\boldsymbol{f}\), which is written as a linear combination of nonlinear basis functions

\[\boldsymbol{f}(\boldsymbol{\alpha},\boldsymbol{c}) = \boldsymbol{\Phi}(\boldsymbol\alpha)\boldsymbol{c} \in \mathbb{R}^m \label{def-f} \tag{0}\]

to a vector of observations \(\boldsymbol{y} \in \mathbb{R}^m\). Here \(\boldsymbol \alpha\) are the nonlinear parameters of the model function, while \(\boldsymbol c\) are the linear coefficients. The vector \(\boldsymbol y\) is the (single) right hand side of our problem.

Specifically, we were concerned with finding \(\boldsymbol{\alpha} \in \mathbb{R}^q\) and \(\boldsymbol c \in \mathbb{R}^n\) such that the weighted sum of squared residuals \(\rho_{WLS}\) is minimized

\[\begin{eqnarray} &\min_{\boldsymbol \alpha, \boldsymbol c}& \rho_{WLS}(\boldsymbol \alpha, \boldsymbol c) \label{min-rho-single} \tag{1} \\ \rho_{WLS}(\boldsymbol \alpha, \boldsymbol c) &:=& \lVert \boldsymbol{r_w}(\boldsymbol \alpha, \boldsymbol c) \rVert^2_2 \label{def-rwls} \tag{2}\\ \boldsymbol r_w(\boldsymbol \alpha, \boldsymbol c) &:=& \boldsymbol W (\boldsymbol y - \boldsymbol{f}(\boldsymbol \alpha, \boldsymbol c)) \\ &=& \boldsymbol y_w - \boldsymbol \Phi_w(\boldsymbol \alpha) \boldsymbol c \label{def-rw} \tag{3} \\ \boldsymbol \Phi_w &:=& \boldsymbol W \boldsymbol \Phi \label{def-phiw} \tag{4} \\ \boldsymbol y_w &:=& \boldsymbol W \boldsymbol y \label{def-yw} \tag{5}, \end{eqnarray}\]

where \(\boldsymbol W \in \mathbb{R}^{m\times m}\) is a weighting matrix. We learned that the magic of VarPro is to rewrite the problem from a minimization over \(\boldsymbol \alpha\) and \(\boldsymbol c\) to a minimization over the nonlinear parameters \(\boldsymbol \alpha\) only:

\[\boldsymbol r_w (\boldsymbol \alpha) = \boldsymbol P^\perp_{\boldsymbol \Phi_w(\boldsymbol \alpha)} \boldsymbol y_w. \label{rw-P-y} \tag{6}\]

The previous article goes into detail on how the projection matrix \(\boldsymbol P^\perp_{\boldsymbol \Phi_w(\boldsymbol \alpha)}\), which depends on \(\boldsymbol \Phi(\boldsymbol \alpha)\) and \(\boldsymbol W\), is calculated. To minimize the sum of squared residuals, we feed \(\boldsymbol r_w (\boldsymbol \alpha)\) into a least squares solver of our choice, e.g. the Levenberg-Marquardt algorithm. We are typically required to provide the Jacobian matrix \(\boldsymbol J(\boldsymbol \alpha)\) of the residuals as well. It turns out that we can calculate the \(k\)-th column \(\boldsymbol j_k\) of the Jacobian as

\[\boldsymbol j_k = \frac{\partial \boldsymbol r_w}{\partial \alpha_k} = \frac{\partial\boldsymbol P^\perp_{\boldsymbol \Phi_w(\boldsymbol \alpha)}}{\partial \alpha_k} \boldsymbol y_w, \label{jk} \tag{7}\]

where the expression for the derivative of the projection matrix is given in the previous article. Now we have all the ingredients together to tackle the global fitting problem.

Global Fitting with VarPro

In this section I’ll follow the clear presentation of Bärligea and Hochstaffl (Baerligea2023)³. As I said above, this article is concerned with fitting separable models to a dataset where the nonlinear parameters are shared across the whole dataset, while the linear coefficients are allowed to vary across the members of the set. Let’s formalize this now. Our dataset is an ordered set of vector valued right hand sides \(\left\{\boldsymbol y_s \in \mathbb{R}^m | s=1,\dots,S\right\}\). We’ll now collect the members of the dataset into a matrix:

\[\boldsymbol Y = \left(\begin{matrix} \vert & & \vert \\ \boldsymbol y_1, & \dots, & \boldsymbol y_S \\ \vert & & \vert \\ \end{matrix}\right) \in \mathbb{R}^{m \times S}. \label{def-Y} \tag{8}\]

Since we allowed the linear coefficients to vary across the dataset, each member has its own vector of linear coefficients \(\boldsymbol c_s\). We can also group those into a matrix

\[\boldsymbol C = \left(\begin{matrix} \vert & & \vert \\ \boldsymbol c_1, & \dots, & \boldsymbol c_S \\ \vert & & \vert \\ \end{matrix}\right) \in \mathbb{R}^{n \times S}.\label{def-C} \tag{9}\]

Finally, we can group the weighted residual vectors into a matrix:

\[\boldsymbol R_w = \left(\begin{matrix} \vert & & \vert \\ \boldsymbol r_{w,1}, & \dots, & \boldsymbol r_{w,S} \\ \vert & & \vert \\ \end{matrix}\right) \in \mathbb{R}^{m \times S}, \label{def-Rmatrix} \tag{10}\]

where

\[\begin{eqnarray} \boldsymbol r_{w,s} &=& \boldsymbol W (\boldsymbol y_s - \boldsymbol \Phi(\boldsymbol \alpha) \boldsymbol c_s) \tag{11} \label{weighted-data}\\ &=& \boldsymbol y_{w,s} - \boldsymbol \Phi_w(\boldsymbol \alpha) \boldsymbol c_s \\ \boldsymbol y_{w,s} &:=& \boldsymbol W \boldsymbol y_s \end{eqnarray}\]

This implies that the same weights are applied to each member of the dataset. Note further that \(\boldsymbol \alpha\), and thus \(\boldsymbol \Phi_w(\boldsymbol \alpha)\), are the same for each residual vector.

Our fitting problem now is to minimize the sum of the squared 2-norms of the residual vectors, i.e. \(\sum_s \lVert \boldsymbol r_{w,s} \rVert_2^2\), which we can write in matrix form like so:

\[\begin{eqnarray} &\min_{\boldsymbol \alpha, \boldsymbol C}& \rho_{WLS}(\boldsymbol \alpha, \boldsymbol C) \label{min-rho-mrhs} \tag{12} \\ \rho_{WLS}(\boldsymbol \alpha, \boldsymbol C) &:=& \lVert \boldsymbol R_w(\boldsymbol \alpha, \boldsymbol C) \rVert_F^2 \label{redef-rho} \tag{13} \\ \boldsymbol R_w(\boldsymbol \alpha, \boldsymbol C) &:=& \boldsymbol W (\boldsymbol Y - \boldsymbol \Phi(\boldsymbol \alpha) \boldsymbol C) \label{def-residual-matrix} \tag{14} \\ &=& \boldsymbol Y_w - \boldsymbol \Phi_w(\boldsymbol \alpha) \boldsymbol C, \end{eqnarray}\]

where \(\boldsymbol Y_w = \boldsymbol W \boldsymbol Y\) and \(\lVert \cdot \rVert_F\) is the Frobenius norm. I have reused the symbol \(\rho_{WLS}\) for the sum of the squared residuals, since this trivially contains eq. \(\eqref{def-rwls}\) as a special case for a dataset with only one member (\(S = 1\)).

Using the ideas of VarPro as presented in the previous article, we can rewrite minimization problem \(\eqref{min-rho-mrhs}\) into a minimization over \(\boldsymbol \alpha\) only:

\[\begin{eqnarray} &\min_{\boldsymbol \alpha}& \rho_{WLS}(\boldsymbol \alpha) \label{min-rho-mrhs-varpro} \tag{15} \\ \rho_{WLS} (\boldsymbol \alpha) &=& \lVert \boldsymbol R_w (\boldsymbol \alpha) \rVert_F^2 \label{rho-varpro} \tag{16} \\ \boldsymbol R_w(\boldsymbol \alpha) &=& \boldsymbol P^\perp_{\boldsymbol \Phi_w(\boldsymbol \alpha)} \boldsymbol Y_w \label{rw-varpro} \tag{17} \end{eqnarray}\]

The matrix equations \(\eqref{rho-varpro},\eqref{rw-varpro}\) are generalizations of the vector identities \(\eqref{def-rwls}, \eqref{def-rw}\). But there’s a problem that prevents us from just plugging these results into off-the-shelf nonlinear least squares minimizers, as we did in the previous article. The problem is that those implementations usually require us to give the residual as one single vector. Additionally, we typically need to specify the Jacobian matrix of that residual vector.

Luckily, all is not lost, and we are not forced to resort to inefficient approaches⁴ to shoehorn our nice matrix equations into vector format. The quantity \(\rho_{WLS}\) in eq. \(\eqref{rho-varpro}\) is just the sum of the squared elements of the matrix \(\boldsymbol R_w\). It follows immediately that \(\lVert \boldsymbol R_w (\boldsymbol \alpha) \rVert_F^2\) is the same as the squared norm \(\lVert \boldsymbol z_w (\boldsymbol \alpha)\rVert_2^2\) of a vector \(\boldsymbol z_w (\boldsymbol \alpha)\) defined as:

\[\boldsymbol z_w(\boldsymbol \alpha) := \text{vec}\; \boldsymbol R_w (\boldsymbol \alpha) = \left( \begin{matrix} \boldsymbol r_{w,1} \\ \vdots \\ \boldsymbol r_{w,S} \\ \end{matrix} \right) = \left( \begin{matrix} \boldsymbol P^\perp_{\boldsymbol \Phi_w (\boldsymbol \alpha)} \boldsymbol y_{w,1}\\ \vdots \\ \boldsymbol P^\perp_{\boldsymbol \Phi_w (\boldsymbol \alpha)} \boldsymbol y_{w,S}\\ \end{matrix} \right) \in \mathbb{R}^{m\cdot S}. \label{z-vec} \tag{18}\]

The mathematical operation \(\text{vec}\) is called vectorization⁵ and turns a matrix into a vector by stacking the matrix columns on top of each other. We have now obtained a vector that we can pass into our nonlinear minimization step. We can use the matrix form of eq. \(\eqref{rw-varpro}\) to calculate \(\boldsymbol R_w\) and then turn the resulting matrix into the vector \(\boldsymbol z_w\) by stacking the columns. Ideally, this is a very cheap operation in our linear algebra backend: if matrices are stored in column-major order, vectorization needs no copy at all, because the underlying buffer already holds the columns one after the other.
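To make the vectorization step concrete, here is a minimal sketch in Rust. The `ColMajor` type is a hypothetical stand-in written for illustration only; a real implementation would use a linear algebra crate. It shows that for column-major storage \(\text{vec}\) is just a view of the buffer, and that \(\lVert \boldsymbol R_w \rVert_F^2 = \lVert \boldsymbol z_w \rVert_2^2\). The residual numbers are made up.

```rust
/// Hypothetical minimal column-major matrix type, for illustration only;
/// real code would use a linear algebra crate.
struct ColMajor {
    rows: usize,
    cols: usize,
    // Column s occupies data[s * rows..(s + 1) * rows].
    data: Vec<f64>,
}

impl ColMajor {
    /// Builds the matrix from its columns, e.g. r_{w,1}, ..., r_{w,S}.
    fn from_columns(columns: &[Vec<f64>]) -> Self {
        let rows = columns[0].len();
        let mut data = Vec::with_capacity(rows * columns.len());
        for col in columns {
            assert_eq!(col.len(), rows, "all columns must have the same length");
            data.extend_from_slice(col);
        }
        Self { rows, cols: columns.len(), data }
    }

    /// vec(R): for column-major storage this is a view of the buffer, no copy.
    fn vectorize(&self) -> &[f64] {
        &self.data
    }

    /// Squared Frobenius norm as the sum of the squared column 2-norms.
    fn frobenius_norm_sq(&self) -> f64 {
        (0..self.cols)
            .map(|s| {
                self.data[s * self.rows..(s + 1) * self.rows]
                    .iter()
                    .map(|x| x * x)
                    .sum::<f64>()
            })
            .sum()
    }
}

fn main() {
    // Two made-up residual vectors with m = 3 elements each (S = 2).
    let r = ColMajor::from_columns(&[vec![1.0, -2.0, 0.5], vec![0.0, 3.0, -1.0]]);
    let z = r.vectorize();
    // z_w = (r_{w,1}; r_{w,2}): the columns stacked on top of each other.
    assert_eq!(z.to_vec(), vec![1.0, -2.0, 0.5, 0.0, 3.0, -1.0]);
    let z_norm_sq: f64 = z.iter().map(|x| x * x).sum();
    assert!((r.frobenius_norm_sq() - z_norm_sq).abs() < 1e-12);
    println!("||R_w||_F^2 = ||z_w||_2^2 = {z_norm_sq}");
}
```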

The final piece of the puzzle is an expression for the Jacobian of \(\boldsymbol z_w(\boldsymbol \alpha)\), which we’ll denote \(\boldsymbol J \{\boldsymbol z_w\}(\boldsymbol \alpha) \in \mathbb{R}^{m\cdot S\, \times \, q}\). Its \(k\)-th column is, by definition, just

\[\boldsymbol j_k^{(z)} = \frac{\partial \boldsymbol z_w}{\partial \alpha_k} \in \mathbb{R}^{m \cdot S},\]

which, using the same insights as above, we can write as

\[\boldsymbol j_k^{(z)} = \left( \begin{matrix} \frac{\partial \boldsymbol r_{w,1} }{\partial \alpha_k} \\ \vdots \\ \frac{\partial \boldsymbol r_{w,S} }{\partial \alpha_k} \\ \end{matrix} \right) = \left( \begin{matrix} \frac{\partial \boldsymbol P^\perp_{\boldsymbol \Phi_w (\boldsymbol \alpha)}}{\partial \alpha_k} \boldsymbol y_{w,1}\\ \vdots \\ \frac{\partial \boldsymbol P^\perp_{\boldsymbol \Phi_w (\boldsymbol \alpha)}}{\partial \alpha_k} \boldsymbol y_{w,S}\\ \end{matrix} \right) = \text{vec} \left( \frac{\partial \boldsymbol P^\perp_{\boldsymbol \Phi_w (\boldsymbol \alpha)}}{\partial \alpha_k} \boldsymbol Y_w \right). \label{jkz} \tag{19}\]

The previous article shows how to calculate the matrix \(\frac{\partial \boldsymbol P^\perp_{\boldsymbol \Phi_w (\boldsymbol \alpha)}}{\partial \alpha_k}\). It’s the same matrix as for a single right hand side. Again, we can use the matrix form to efficiently calculate eq. \(\eqref{jkz}\) and then transform the matrix into a column vector. If we compare the equations for the single right hand side \(\eqref{rw-P-y}, \eqref{jk}\) with the equations for multiple right hand sides \(\eqref{rw-varpro}, \eqref{jkz}\), we can see that the matrix equations are just pretty straightforward generalizations of the original vector identities.
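As a concrete illustration of eqs. \(\eqref{rw-varpro}\) and \(\eqref{jkz}\), here is a minimal Rust sketch for the simplest possible separable model: a single basis function \(\phi_i(\alpha) = e^{-\alpha t_i}\), so \(n = q = 1\), with unit weights \(\boldsymbol W = \boldsymbol I\). Both are assumptions made for brevity. For one basis function the projection reduces to \(\boldsymbol P^\perp \boldsymbol y = \boldsymbol y - \boldsymbol\phi\, (\boldsymbol\phi^T\boldsymbol y)/(\boldsymbol\phi^T\boldsymbol\phi)\), and the derivative in the code is obtained by differentiating that expression directly, not from the general formula of the previous article. Function names and data are hypothetical. Note that everything that depends only on \(\alpha\) is computed once, while each right hand side costs just two dot products.

```rust
/// Hypothetical single-exponential basis phi_i = exp(-alpha * t_i)
/// and its derivative d(phi_i)/d(alpha) = -t_i * phi_i.
fn basis(alpha: f64, t: &[f64]) -> (Vec<f64>, Vec<f64>) {
    let phi: Vec<f64> = t.iter().map(|&ti| (-alpha * ti).exp()).collect();
    let dphi: Vec<f64> = t.iter().zip(&phi).map(|(&ti, &p)| -ti * p).collect();
    (phi, dphi)
}

fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Returns z = vec(P^perp Y) and the single Jacobian column vec(dP^perp/dalpha Y),
/// assuming W = I. `ys` holds the right hand sides y_1, ..., y_S.
fn residual_and_jacobian(alpha: f64, t: &[f64], ys: &[Vec<f64>]) -> (Vec<f64>, Vec<f64>) {
    // Computed once for the whole dataset, since alpha is shared.
    let (phi, dphi) = basis(alpha, t);
    let norm_sq = dot(&phi, &phi); // N = phi^T phi
    let phi_dphi = dot(&phi, &dphi); // phi^T phi'

    let mut z = Vec::with_capacity(t.len() * ys.len());
    let mut jac = Vec::with_capacity(t.len() * ys.len());
    for y in ys {
        // Per right hand side, only two dot products are needed.
        let a = dot(&phi, y); // phi^T y
        let b = dot(&dphi, y); // phi'^T y
        for i in 0..t.len() {
            // P^perp y = y - phi (phi^T y) / N
            z.push(y[i] - phi[i] * a / norm_sq);
            // dP^perp/dalpha y = -(phi' a + phi b) / N + 2 (phi^T phi') a phi / N^2
            jac.push(-(dphi[i] * a + phi[i] * b) / norm_sq
                + 2.0 * phi_dphi * a * phi[i] / (norm_sq * norm_sq));
        }
    }
    (z, jac)
}

fn main() {
    let t: Vec<f64> = (0..20).map(|i| 0.1 * i as f64).collect();
    // Two made-up right hand sides: shared decay rate 1.5, different amplitudes.
    let ys: Vec<Vec<f64>> = [2.0, 0.5]
        .iter()
        .map(|&c| {
            t.iter()
                .map(|&ti| c * (-1.5 * ti).exp() + 0.01 * (7.0 * ti).sin())
                .collect::<Vec<f64>>()
        })
        .collect();

    let alpha = 1.2;
    let (z, jac) = residual_and_jacobian(alpha, &t, &ys);

    // Sanity check of the hand-derived Jacobian via central finite differences.
    let h = 1e-6;
    let (zp, _) = residual_and_jacobian(alpha + h, &t, &ys);
    let (zm, _) = residual_and_jacobian(alpha - h, &t, &ys);
    let max_err = jac
        .iter()
        .zip(zp.iter().zip(&zm))
        .map(|(j, (p, m))| (j - (p - m) / (2.0 * h)).abs())
        .fold(0.0, f64::max);
    println!("rho = {}, max |J - J_fd| = {max_err:e}", dot(&z, &z));
}
```

For more than one basis function, an implementation would typically rely on a matrix decomposition of \(\boldsymbol \Phi_w\) (e.g. QR or SVD) instead of the closed form used here, but the structure stays the same: one decomposition per \(\boldsymbol \alpha\), then cheap matrix products with \(\boldsymbol Y_w\).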

Advantages and Limitations

The presented approach to solving problems with multiple right hand sides using variable projection has many advantages. VarPro eliminates the linear parameters from the nonlinear minimization process. Together with the fact that the nonlinear parameters are shared across the dataset, this means that instead of \(S\cdot n + q\) parameters, the nonlinear solver only has to solve for \(q\) parameters. This is a substantial reduction in parameters even for moderately sized datasets. Furthermore, the matrix \(\boldsymbol P^\perp_{\boldsymbol \Phi_w(\boldsymbol \alpha)}\) and its derivative only need to be calculated once for the whole dataset for a given value of \(\boldsymbol \alpha\). This can massively speed up the fit.
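To put a rough number on the parameter reduction, consider a hypothetical FLIM-like dataset (sizes chosen purely for illustration): a \(256 \times 256\) pixel image gives \(S = 65536\) right hand sides, and a biexponential model has \(n = 2\) amplitudes and \(q = 2\) lifetimes. Then

\[S \cdot n + q = 65536 \cdot 2 + 2 = 131074 \quad \text{versus} \quad q = 2.\]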

However, this comes at a price: the whole calculation that I presented here depends on the fact that the same weights are applied to all members of the dataset, see eq. \(\eqref{weighted-data}\). This might not be as bad a limitation as it sounds at first. Warren et al. show that it’s fairly simple to come up with decent global weights even for Poisson distributed data (Warren2013).

Limitations and Extensions

I’ll conclude this article here, since what I have presented is the simplest and most efficient application of VarPro to problems with multiple right hand sides. Let me mention some limitations of the presented approach and how to overcome them, without going into too much detail.

Depending on the Data Index

One important limitation of the approach presented here is that the model functions and the weights must be the same for each member of the dataset, i.e. they may not vary with the index \(s\). This limitation enables us to calculate the projection matrix and its derivative only once for the whole dataset and brings substantial computational savings. It is fairly straightforward to extend the method presented here to allow a dependency on \(s\), cf. e.g. (Baerligea2023). We will then need to recalculate the matrix \(\boldsymbol \Phi_w^{(s)}\) for every index \(s\), and likewise the projection matrix and its derivative. This costs significant compute⁶ but can still beat a purely nonlinear minimization without VarPro (Baerligea2023).

More Efficient Solvers

The methodology presented here assumes we want to plug our residual and Jacobian into an existing nonlinear least squares solver. This has many advantages. For example, in our implementation we can concentrate on the actual VarPro part and leave the minimization to a well crafted third party library. We can also switch out the minimization backend, by switching to a different library or exchanging the underlying algorithm. Usually, VarPro implementations use the Levenberg-Marquardt (LM) algorithm for minimization, but any nonlinear least squares solver will do. Bärligea actually presents some evidence that solvers other than LM can be more efficient for certain problems (Baerligea2023).

However, there are some downsides that come with this approach. One problem is that both the residual vector and the Jacobian matrix have \(m\cdot S\) rows, which can become quite large for big datasets. In their paper, Warren et al. report an approach termed partitioned variable projection, which implements a modified version of the LM solver that does not require storing the full Jacobian (Warren2013).

References

(Warren2013) Warren SC, et al. (2013) “Rapid Global Fitting of Large Fluorescence Lifetime Imaging Microscopy Datasets,” PLOS ONE 8(8): e70687. (link)

(Baerligea2023) Bärligea, A. et al. (2023) “A Generalized Variable Projection Algorithm for Least Squares Problems in Atmospheric Remote Sensing,” Mathematics 2023, 11, 2839 (link)

Endnotes

They use the term dataset instead of right hand side in their definition, but I am going to use the term dataset slightly differently. That is why I changed it to right hand side. ↩
Fluorescence Lifetime Imaging (FLIM) requires us to fit a number of lifetimes (the nonlinear parameters) from a multiexponential decay, with varying amplitudes of the individual exponential decays (the linear coefficients). It is a reasonable approximation that only a handful of distinct lifetimes are present in any one particular sample (corresponding to different fluorophores), but that the linear coefficients (corresponding to fluorophore concentration) might vary spatially across a sample (Warren2013). However, it is also well known that the fluorescence lifetime of a fluorophore depends on its chemical surroundings, among other things. So the most likely scenario is that both the concentration and the lifetimes actually change across a sample. However, exponential fitting is a notoriously ill-conditioned problem, and the change in lifetime might or might not be detectable within the accuracy of the fit. At the end of the day, it’s a decision that must be made based on our knowledge of the data. Also consider the principle that “all models are wrong, but some are useful”. ↩
They extend the method to datasets where the members of a dataset may have different numbers of elements. This is out of scope for this article, because we have to sacrifice computational savings for this extension. However, it’s definitely worth checking out their paper. ↩
If you’re interested, check out the section titled naive approach in the Bärligea paper. ↩
Not to be confused with the concept of vectorization in programming. ↩
Maybe this could help for the case where only the weights vary with \(s\). But I’m not so sure… ↩

Variable Projection Update

2023-12-17T00:00:00+00:00

Announcing a major update of my article on the variable projection algorithm, which you can find on this blog by following this link. The article now contains all the information you need to implement your own VarPro library in a language of your choice.

I’ve decided not to move the original post since it would break all internal and external links to it. The update lays the foundation for a future article on variable projection for problems with multiple right hand sides. That one is finally coming, after I had announced it at the end of the previous article roughly three years ago.

Idiomatic Rust (for C++ Devs): Constructors & Conversions

2023-11-25T00:00:00+00:00

When I started out in Rust as a C++ developer, one of the features I missed most at first was constructors. In this post, we explore the many roles that constructors have in C++ and see how they can (or can’t) be mapped to different parts of the Rust language. Although I’ll use C++ terminology, this article is likely helpful for developers coming to Rust from other languages as well.

There are many types of constructors in C++ and, as we’ll see, they serve a wide range of purposes. I’ll address these purposes section by section and we’ll explore if and how we can map them to Rust.

Initialization

One of the most obvious (and most important) purposes of constructors is to provide a way to initialize an instance of a type¹. Say we have a simple class declaration like so:

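class Rectangle {
public:
    Rectangle(double width, double height);
private:
    double width;
    double height;
};

// Constructor definition: initialize the private fields from the arguments
Rectangle::Rectangle(double width, double height)
    : width(width), height(height) {}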

// Usage
int main() {
    Rectangle rect(1.,2.);
    //...
}

Here, the constructor allows us to initialize the private fields of the Rectangle class, which is pretty useful if we appreciate encapsulation.

Before we jump into how to map this use case to Rust, let’s note that constructors are so-called special member functions in C++. They have special syntax for both how they are defined and how they are used, but really constructors are just functions that take some arguments and return an instance of the type. There’s nothing stopping us from just creating a static member function that takes the width and height and returns a rectangle². In fact, this is known in C++ as the Named Constructor Idiom. The ISO C++ website calls it a “technique that provides more intuitive and/or safer construction operations for users of your class” (link).

Interestingly, that’s exactly how we would do it in Rust. We just write an associated function (the equivalent of a static member function) that returns Self³ and takes some arguments.

struct Rectangle {
    width: f64,
    height: f64,
}

impl Rectangle {
    pub fn new(width: f64, height: f64) -> Self {
        Self {
            width,
            height,
        }
    }

    pub fn square(dim : f64) -> Self {
        Self {
            width : dim,
            height : dim,
        }
    }
}

Rust has no special syntax or semantics for these types of constructors. If we want to provide them, we just write associated functions that return an instance of our type. Since new is not a keyword in Rust, it is customary (but by no means mandatory) to name one constructor of our type new, preferably one that a lot of users will interact with. That new function can take whatever arguments make sense (including no arguments). For our rectangle, those are the width and height.

If we want to provide another constructor, Rust forces us to create another associated function and call it by a different name than new. There is no function overloading in Rust⁴ and there are no default arguments, so we are forced to create a different function, which makes us think about a useful name. For our Rectangle struct that would be the square associated function, which makes its purpose very explicit. In C++ we might be tempted to overload a constructor that just takes one parameter. While this might be okay-ish for a rectangle type, it can quickly become a problem for more complex types.

One example from the Rust standard library is Vec::new, which takes no arguments and constructs an empty vector. To create a vector with a given capacity, we use Vec::with_capacity and pass it the initial capacity.
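Here is a small runnable sketch of how these named constructors look at the call site. The area method is a hypothetical helper that I added only so the example has something to check; it is not part of the Rectangle type above.

```rust
struct Rectangle {
    width: f64,
    height: f64,
}

impl Rectangle {
    pub fn new(width: f64, height: f64) -> Self {
        Self { width, height }
    }

    pub fn square(dim: f64) -> Self {
        Self { width: dim, height: dim }
    }

    /// Hypothetical helper, only here so the example has something to check.
    pub fn area(&self) -> f64 {
        self.width * self.height
    }
}

fn main() {
    // Differently named constructors make the intent explicit at the call site.
    let rect = Rectangle::new(1.0, 2.0);
    let square = Rectangle::square(3.0);
    assert_eq!(rect.area(), 2.0);
    assert_eq!(square.area(), 9.0);

    // The same pattern in the standard library.
    let empty: Vec<i32> = Vec::new();
    let preallocated: Vec<i32> = Vec::with_capacity(16);
    assert!(empty.is_empty() && preallocated.is_empty());
    assert!(preallocated.capacity() >= 16);
}
```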

Enforcing Invariants with Constructors that Fail

Constructors are also a great way to help us enforce invariants of a type. In our Rectangle example, we might want to make sure that the dimensions are nonnegative. In a C++ constructor, we would have to signal a violation with an exception. Even if we were big proponents of errors as values, we could not return a std::expected from a constructor. A constructor for a type T is a special member function that can only produce a T. However, the named constructor idiom would allow us to make this adjustment.

There’s a lot to be said about the pros and cons of exceptions, and this is not the place to say it. Rust does not have them, and thus the way to communicate errors is as values⁵. If we have a constructor that can fail, we just make it explicit in the signature and have it return Result<Self, Error>, where Error is an appropriate error type. An example in the standard library is CString::new, which takes a string of bytes and transforms it into a C-style (null-terminated) string. It returns an error if there are interior null bytes in the given input.
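A minimal sketch of such a fallible constructor for our rectangle might look like this. RectangleError is a hypothetical error type made up for the example. The second half shows the real CString::new rejecting an interior null byte.

```rust
use std::ffi::CString;

#[allow(dead_code)] // the fields are not read in this tiny example
#[derive(Debug)]
struct Rectangle {
    width: f64,
    height: f64,
}

/// Hypothetical error type for invalid rectangle dimensions.
#[derive(Debug, PartialEq)]
enum RectangleError {
    InvalidDimension,
}

impl Rectangle {
    /// Fallible constructor: the Result in the signature tells callers
    /// that construction can fail.
    pub fn new(width: f64, height: f64) -> Result<Self, RectangleError> {
        // Written as a negated check so that NaN is rejected as well.
        if !(width >= 0.0 && height >= 0.0) {
            return Err(RectangleError::InvalidDimension);
        }
        Ok(Self { width, height })
    }
}

fn main() {
    assert!(Rectangle::new(1.0, 2.0).is_ok());
    assert_eq!(
        Rectangle::new(-1.0, 2.0).unwrap_err(),
        RectangleError::InvalidDimension
    );

    // CString::new fails if the input contains an interior null byte.
    assert!(CString::new("no nulls here").is_ok());
    assert!(CString::new("oops\0inside").is_err());
}
```

Whether to call such a constructor new or something like try_new is a naming choice; CString::new shows that the standard library itself uses new for fallible construction.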

Default Constructors

Another supremely important use case of constructors is default construction. In C++, a constructor that can be called with no arguments⁶ is a default constructor and is required e.g. for many operations in standard library containers. It’s so essential in C++ that writing T t; for a user defined T will not give us an uninitialized instance of T but a default constructed one.

The extent of what default constructed means semantically will vary from type to type, but at least it implies that the instance will not consist of utter random nonsense. A default constructed std::shared_ptr will not be safe to dereference, but at least it contains a null pointer, which is much better than a random address. A default constructed std::string will be empty and can be safely printed. If we are designing a numerical optimization library, the default constructed instance of our Optimizer type could contain sensible default values for stopping criteria and tolerances and thus might be ready for use as-is.

The way we signal that a type is default constructible in Rust is to implement the Default trait on it. Default requires exactly one associated function fn default() -> Self, which already implies that default construction cannot return an error. It’s a good idea to check whether a type implements Default. For example Vec implements Default, and it turns out writing Vec::default() is the same as writing Vec::new(). If we write a type that exposes a new() -> Self constructor it is customary to also implement the Default trait. In fact, there is even a clippy lint for exactly that. Don’t worry if you don’t know enough about traits yet to understand how to actually use the Default trait in practice. I’ll go over an example of using trait bounds further below in the section on conversions.

Now, we can implement the Default trait for our struct manually like so:

impl Default for Rectangle {
    fn default() -> Self {
        Self {
            width : Default::default(),
            height : Default::default(),
        }
    }
}

Here, we have implemented the default constructor of our struct by just calling the default constructors of all the member fields. This is what the Default::default() behind the field names boils down to. I could have written width: 0.0 instead of width: Default::default(), but this way we can see that we could implement pretty much any struct’s default constructor just by calling the default constructors of its member fields, provided the members are default constructible. That’s a lot of boilerplate, isn’t it? That is why we could instead have stuck the line #[derive(Default)] just above our rectangle definition to let the compiler handle the boilerplate for us like so:

#[derive(Default)]
struct Rectangle {
    width : f64,
    height : f64,
}

Now we can construct a rectangle with zero width and height by calling Rectangle::default().
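Here is a short runnable sketch of the derived Default in action. It also uses std::mem::take, a standard library function that requires T: Default, as one example of generic code relying on default construction. The extra Debug and PartialEq derives are only there so the values can be compared in assertions.

```rust
#[derive(Default, Debug, PartialEq)]
struct Rectangle {
    width: f64,
    height: f64,
}

fn main() {
    // The derived implementation default-constructs every field (0.0 for f64).
    assert_eq!(Rectangle::default(), Rectangle { width: 0.0, height: 0.0 });

    // For Vec, default() and new() both give an empty vector.
    assert_eq!(Vec::<i32>::default(), Vec::<i32>::new());

    // std::mem::take moves the value out and leaves a default-constructed one behind.
    let mut rect = Rectangle { width: 3.0, height: 4.0 };
    let old = std::mem::take(&mut rect);
    assert_eq!(rect, Rectangle::default());
    assert_eq!(old.width * old.height, 12.0);
}
```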

Copy Constructors

Another important use case of constructors is copying instances using the copy constructor. If we invoke a copy constructor, that is because we want a (you guessed it) copy of the instance that we can manipulate independently from the original⁷. In C++, manually implementing a copy constructor means that some logic has to be executed that goes beyond just invoking the copy constructor of each member field. Otherwise we would have been fine with the default copy constructor.

If we want to implement a copy constructor in Rust, we implement the Clone trait for our type. Note that the trait is called Clone, not Copy. We’ll get to the Copy trait later, but for general purpose copy constructors, the correct trait to use is Clone. To give our rectangle a copy constructor, we can simply implement one like so:

impl Clone for Rectangle {
    fn clone(&self) -> Self {
        Self {
            width : self.width.clone(),
            height: self.height.clone(),
        }
    }
}

The first thing we can see is that the clone function takes self by shared reference and returns an instance of Self. That’s how the C++ copy constructor works as well. However, copy construction cannot fail since it does not return a Result⁸.

The second thing is that I wrote the constructor a bit peculiarly by invoking the copy constructors of the member fields. I just did that to make it obvious that, just as with the default constructor above, we can let the compiler implement the boilerplate for us. We do that by sticking the #[derive(Clone)] annotation before the struct definition. This is the semantic equivalent of defaulting the copy constructor in C++, only that the Rust compiler will never implement a copy constructor for us without being explicitly told to. Since we already derived Default, this now looks like so:

#[derive(Default, Clone)]
struct Rectangle {
    width : f64,
    height : f64,
}

For more complex cases, we have to write the logic ourselves. But, just as with defaulting the copy constructor in C++, a good old #[derive(Clone)] is often just what we need.

Using Clone for Explicit Copy Construction

If you’ve worked a bit with Rust, you’ll know that it has destructive move semantics and single ownership. That means if you pass an instance of a type into a function by value⁹, the instance gets moved, the ownership gets transferred, and you can no longer access the instance. Say we have this code:

fn flip(img : Image) -> Image {
    //logic
}

let original = Image::new("my_image.jpg");
let flipped = flip(original);

In this case original is not accessible any more after it has been passed to the flip function. That can be pretty helpful because it allows the flip function to reuse the already allocated image buffers to perform its magic. But what if we wanted to display both the original and the flipped image next to each other? Well, that’s where Clone can come in handy, assuming our Image implements it¹⁰:

let flipped = flip(original.clone());

Now we have access to the original and the flipped image, since only the temporary instance that was created via a call to clone is moved into the function. This is not unlike pass-by-value semantics in C++. However, in Rust we have to explicitly call the copy constructor via a call to clone, while in C++ the copy constructor is invoked implicitly. I found the explicit cloning pretty irritating at first, but I realized soon that it’s a great way to spot optimization opportunities and it encourages me to think more carefully about the ownership semantics of my APIs.
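To see this end to end, here is a runnable sketch. The Image type and flip function below are hypothetical stand-ins (a width plus a pixel buffer, flipped horizontally row by row), not a real image library.

```rust
/// Hypothetical stand-in for an image: a row width and a pixel buffer.
#[derive(Clone, Debug, PartialEq)]
struct Image {
    width: usize,
    pixels: Vec<u8>,
}

/// Flips the image horizontally, reusing the buffer it received by value.
fn flip(mut img: Image) -> Image {
    for row in img.pixels.chunks_mut(img.width) {
        row.reverse();
    }
    img
}

fn main() {
    let original = Image { width: 3, pixels: vec![1, 2, 3, 4, 5, 6] };
    // Only the clone is moved into flip, so `original` stays usable.
    let flipped = flip(original.clone());
    assert_eq!(original.pixels, vec![1, 2, 3, 4, 5, 6]);
    assert_eq!(flipped.pixels, vec![3, 2, 1, 6, 5, 4]);
    // Writing flip(original) instead would move `original`,
    // and the first assertion above would no longer compile.
}
```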

Deriving Clone on Generic Types

Using the derive macro to implement traits that can be trivially implemented is usually the right thing to do. However, for generic types it might be necessary to implement Clone manually even if all the fields are Clone. For example

use std::rc::Rc;

#[derive(Clone)]
struct MyPointer<T> {
    inner : Rc<T>,
}

This struct is generic over T and contains only one field, a reference counted shared pointer of type Rc<T>, which can always be cloned. However, the derive macro will implement Clone for MyPointer<T> only where T implements Clone. In many cases, this is the correct trait bound to enforce, but in this case it’s more restrictive than it needs to be, since Rc<T> is always Clone. So we are better off manually implementing the Clone trait here, which amounts to just calling inner.clone(). Here is a good article going into a bit more detail on the subject.
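A sketch of that manual implementation could look like this. NotClone is a hypothetical type that deliberately does not implement Clone; with #[derive(Clone)] on MyPointer, the call to a.clone() below would fail to compile.

```rust
use std::rc::Rc;

struct MyPointer<T> {
    inner: Rc<T>,
}

// No `T: Clone` bound: cloning an Rc<T> only increments the reference count.
impl<T> Clone for MyPointer<T> {
    fn clone(&self) -> Self {
        Self { inner: Rc::clone(&self.inner) }
    }
}

/// Hypothetical type that deliberately does not implement Clone.
struct NotClone(u32);

fn main() {
    let a = MyPointer { inner: Rc::new(NotClone(42)) };
    let b = a.clone(); // compiles even though NotClone is not Clone
    assert_eq!(b.inner.0, 42);
    // Both pointers share the same allocation.
    assert_eq!(Rc::strong_count(&a.inner), 2);
}
```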

Trivially Copyable Types

For some types, even calling default copy constructors (which in turn invoke the copy constructors of the type’s members) may be unnecessarily expensive, because just doing a byte-for-byte copy to a new location would suffice. That’s why, in C++, we have the concept¹¹ of trivially copyable. Those are types that can be copied by doing a byte-for-byte copy of the memory. Whether a type is trivially copyable can be tested at compile time with the std::is_trivially_copyable type trait, which the compiler will specialize for our types. Say we define a struct that is an aggregate of trivially copyable types like so:

struct Point {
    double x;
    double y;
};

This type, as per the standard, is then also trivially copyable. If we then create an aggregate of multiple Point fields, it will still be trivially copyable, and so on. That’s a really neat thing in C++, because it lets the compiler replace calls to many copy constructors with one bulk copy¹².

Rust also has the concept of trivially copyable types and has a marker trait called Copy for it. Marker traits are traits that have no associated methods and instead tell the compiler some semantic properties about our type. Although Copy has no methods, the compiler will not automatically implement it for our types, since Rust wants us to be explicit about the semantics of our types¹³. Implementing the Copy trait for our type is easy since it has no methods:

impl Copy for Rectangle {}

You could also, you guessed it, stick a #[derive(Copy)] above the struct definition. There are a couple of important things to note about types that implement Copy. Firstly, a type that implements Copy must also implement Clone. Secondly, every field of a type that is Copy must itself be Copy. The compiler enforces both of these. Further, the Rust documentation states:

Types that are Copy should have a trivial implementation of Clone. More formally: if T: Copy, x: T, and y: &T, then let x = y.clone(); is equivalent to let x = *y;. Manual implementations should be careful to uphold this invariant; however, unsafe code must not rely on it to ensure memory safety.

The compiler cannot enforce that, but it can help us do the right thing if we just #[derive(Clone,Copy)]. However, the aforementioned caveats on implementing Clone on generic types apply.
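For comparison with the C++ Point from earlier, here is a sketch of a Rust counterpart that derives both traits. The function norm_squared is a hypothetical helper that only serves to show that passing a Copy type by value does not move it:

```rust
// Both derives are needed: Copy requires Clone as a supertrait.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}

// Takes `Point` by value; because Point is Copy, the caller keeps its value.
fn norm_squared(p: Point) -> f64 {
    p.x * p.x + p.y * p.y
}

fn main() {
    let a = Point { x: 3.0, y: 4.0 };
    let b = a; // implicit bitwise copy, `a` is still usable
    let n = norm_squared(a); // another copy, not a move
    assert_eq!(a, b);
    assert_eq!(n, 25.0);
    // The derived Clone is trivial: it agrees with a plain copy.
    assert_eq!(a.clone(), a);
}
```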

Copy vs Clone

We already saw how Clone comes in handy for passing a copy of an instance by value. Copy, on the other hand, gives our types semantics that are much closer to C++: every assignment or pass-by-value becomes an implicit bitwise copy instead of a destructive move. Copy is the reason we can use primitive types like i32, f32, or u64 like this:

fn add(lhs: i32, rhs: i32) -> i32 {
    lhs + rhs
}

fn main() {
    let x = 5;
    let y = x; // (1)
    let z = add(x, y); // (2)
    println!("{x}+{y}={z}");
}

If i32 weren’t Copy, then x would have been moved in ① and y would have been moved in ②. It’s a useful semantic to have, but if you’re a library maintainer it’s also a hell of a commitment to make, because if you remove Copy from a type, your users will have to go through quite a bit of pain to migrate.

Move Constructors

I’m not sure if this concept is common in other languages, but both C++ and Rust have move semantics. However, the way the two languages think about move semantics is very different. That makes this section pretty straightforward.

We’ve already mentioned that Rust has destructive move semantics and ownership is transferred with a move. That means that there is no need for a move constructor and Rust simply does not have an equivalent. That also frees us from the burden of leaving moved-from values in a defined state. In Rust, there is no such thing as a moved-from value, since the compiler will not let us access it.

Conversions

The last major use case of constructors in C++ (hoping I did not forget one) is conversions. In fact, every constructor that can be called with one argument is a converting constructor, unless it is declared with the keyword explicit, which is considered the better default in modern C++. Implicit conversions can come in very handy, but if we are not careful they can lead to surprising behavior. Take this example from the Core Guidelines on the topic:

class String {
public:
    String(int);
    // ...
};

String s = 10;   // surprise: string of size 10

Surprising indeed, which is why Rust provides a way to make conversions possible but explicit¹⁴. As with a lot of the previous sections, the solution comes in the form of traits, in this case the two generic traits From<T> and Into<U>. To implement From<T> for our type U, we must implement the function from(value: T) -> U, which transforms a value of type T into a U. To implement Into<U> for a type T, we must implement the function into(self) -> U, which also transforms T into U. If both traits seem like redundant ways to define a transformation T -> U, bear with me; we’ll revisit this very soon.

As an important aside, note that if we don’t want to take ownership of the value, we can also implement the traits on references, e.g. implement From<&T> for U and so on ¹⁵.

For an example let’s pretend we’re working on a BigInteger type that can store large integer numbers. Naturally, we want to offer a converting constructor from types like i32, u64 and so on.

struct BigInteger {
    //...
}

impl From<i32> for BigInteger {
    fn from(value : i32) -> Self {
        //...
    }
}

impl From<u64> for BigInteger {
    fn from(value: u64) -> Self {
        //...
    }
}

Now we can create a big integer like so:

let first = BigInteger::from(-1i32);
let second = BigInteger::from(10u64);

Behind the scenes, Rust also implements Into<BigInteger> for both i32 and u64, so that we can convert both those types to BigInteger by calling into:

fn add(first :BigInteger, second:BigInteger) -> BigInteger {
    //...
}

let x = 10i32;
let y = 20u64;
let z = add(x.into(),y.into());

If we implement From<T> for U, then the type T gets a so-called blanket implementation of Into<U>, courtesy of the standard library. That’s why it is recommended to implement From<T> for U rather than Into<U> for T. In fact, since Rust 1.41 it is never necessary to implement Into manually ¹⁶. The reason is that we get the Into implementation for free when implementing From, but not the other way round¹⁷.

In contrast to the recommendation on implementing conversion traits, the advice is flipped around when we use them as trait bounds. That is, if we want to accept a type U that can be converted into a T, we should use the bound U: Into<T> rather than T: From<U>, because possibly more types satisfy the former than the latter. So if we want to make our add function generic and have it perform the addition without us having to call into manually, we would write it like so:

fn add<T,U> (first : T, second : U) -> BigInteger
    where T: Into<BigInteger>, 
          U: Into<BigInteger> {
    first.into() + second.into()
}

That is, provided our BigInteger type supports addition, which we can implement using the Add trait. But I’ll leave that for another time.
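If you’d like something that actually compiles, here is a minimal sketch. It assumes a toy BigInteger that just wraps an i128 (a real implementation would store arbitrarily many digits) and includes a bare-bones Add implementation so that the generic add can use +. I bind the converted values to typed variables, which makes the target of each into explicit:

```rust
use std::ops::Add;

// Hypothetical stand-in: a real BigInteger would store arbitrarily many
// digits; here we just wrap an i128 so the example stays short.
#[derive(Debug, Clone, Copy, PartialEq)]
struct BigInteger {
    value: i128,
}

impl From<i32> for BigInteger {
    fn from(value: i32) -> Self {
        BigInteger { value: i128::from(value) }
    }
}

impl From<u64> for BigInteger {
    fn from(value: u64) -> Self {
        BigInteger { value: i128::from(value) }
    }
}

// Bare-bones addition so that the generic `add` below can use `+`.
impl Add for BigInteger {
    type Output = BigInteger;
    fn add(self, rhs: BigInteger) -> BigInteger {
        BigInteger { value: self.value + rhs.value }
    }
}

// Constrain on `Into<BigInteger>`; the `From` impls above give i32 and u64
// this bound for free via the standard library's blanket impl.
fn add<T, U>(first: T, second: U) -> BigInteger
where
    T: Into<BigInteger>,
    U: Into<BigInteger>,
{
    let first: BigInteger = first.into();
    let second: BigInteger = second.into();
    first + second
}

fn main() {
    let z = add(-1i32, 10u64);
    assert_eq!(z, BigInteger::from(9i32));
    // BigInteger itself also satisfies Into<BigInteger> (reflexive impl).
    assert_eq!(add(z, 1i32).value, 10);
}
```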

Conversions that Fail

This article is already pretty long, so I’ll keep this brief. Looking at the signatures of the associated functions of From and Into, we can see that they cannot return an error. To implement a conversion that can fail, we can use the TryFrom and TryInto traits. They work analogously to their From and Into counterparts, but let us specify an associated error type Error and return a Result to indicate that a conversion can fail.
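A minimal sketch of a fallible conversion, assuming a hypothetical Percent type that only accepts values from 0 to 100 and a hypothetical OutOfRange error type:

```rust
// Needed on edition 2018; both traits are in the prelude since edition 2021.
use std::convert::{TryFrom, TryInto};

// Hypothetical type that only holds values in 0..=100.
#[derive(Debug, PartialEq)]
struct Percent(u8);

// Hypothetical error type carrying the rejected input.
#[derive(Debug, PartialEq)]
struct OutOfRange(i32);

impl TryFrom<i32> for Percent {
    type Error = OutOfRange;

    fn try_from(value: i32) -> Result<Self, Self::Error> {
        if (0..=100).contains(&value) {
            Ok(Percent(value as u8))
        } else {
            Err(OutOfRange(value))
        }
    }
}

fn main() {
    assert_eq!(Percent::try_from(42i32), Ok(Percent(42)));
    // TryInto comes for free, just like Into does for From.
    let p: Result<Percent, _> = 150i32.try_into();
    assert_eq!(p, Err(OutOfRange(150)));
}
```

As with From and Into, implementing TryFrom gives us TryInto through a blanket implementation in the standard library, so TryFrom is the one to implement.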

Summary

I hope I have covered all the important use cases of constructors in C++ and shown if and how we can map them to Rust. If not, please let me know and I’ll add more use cases here. The one point that I tried to hammer home is that Rust values explicitness: explicit copies, clones, and conversions, and even explicitness in what is implemented, where the compiler could trivially do it for us. When I go back to C++ now, I often find myself adhering to these more Rusty idioms.

Endnotes

It would not be C++ if there weren’t potentially many ways of initialization that interact in complex ways with each other. There’s even a book dedicated solely to this very topic. ↩
One drawback I can think of with using static functions as named constructors in C++ is that they won’t be useful in contexts like emplace or std::make_shared where we use perfect forwarding of constructor arguments for in place construction of objects. ↩
In an impl block, Self refers to the type itself. For our Rectangle we could just have written out Rectangle, but for more complex types involving generics it becomes tedious quickly. ↩
You can abuse the trait system to get something like overloading. Don’t do that. ↩
There is another, orthogonal, mechanism to signal failure: panic. I won’t go into detail here. ↩
Note the wording “can be called with no arguments”, not “takes no arguments”. See here. ↩
What I mean by that is that the copy is independent from the original, but it could still refer to some common data as is e.g. the case with std::shared_ptr. ↩
The copy constructor can of course panic. ↩
The wording here is a bit C++ inspired, but if we want to be absolutely precise, everything in Rust is passed by value, including references. In Rust, references are pointers with a whole lot of compile time guardrails, but pointers nonetheless. That’s why we have to dereference them. In that sense it’s closer to C than C++. A pointer itself in C is also passed by value, but its pointee may be modifiable, depending on constness. ↩
This is just to serve as an illustration of explicit cloning. I’m not saying this is the best API for that particular problem. ↩
It’s not a concept in the C++20 meaning of the word. ↩
It’s important to note that adding manual specializations of std::is_trivially_copyable is undefined behavior. ↩
There are a handful of marker traits that the compiler will implicitly implement for you if appropriate. Those are called Auto Traits and are very carefully chosen. The most common auto traits that programmers interact with are the Send and Sync traits that are important for describing thread safety via the type system. ↩
Although Rust forces most conversions to be explicit, it does not abandon all forms of implicit conversions either. There is a mechanism called Deref coercion. ↩
Reddit user u/quxfoo pointed out here that it might not be generally good advice to implement From (or Into) on references. ↩
The reason is that with Rust 1.41 the orphan rules changed and made manual implementations of Into unnecessary. Thanks to reddit user u/Zde-G for pointing that out here. ↩
The standard library implementors could just as well have chosen to do the blanket implementation the other way round. It’s just the way they chose to do it at the time. ↩

Curiously Cumbersome Rust: Type-level Programming

2023-09-01T00:00:00+00:00

The moment that spawned this article was when I asked myself: how hard can it be to make sure two types have the same size at compile time? Well… it’s complicated. In here, we’ll do a deep dive into the limits of compile time metaprogramming in today’s (and tonight’s) Rust.

Motivation

In my last article I wrote a function to perform an in place mapping from a Vec<T> to a Vec<U> where the important precondition was that T and U have the same size and alignment. The function looked something like this:

fn map_in_place<T,U,F>(v: Vec<T>, mut f: F) -> Vec<U> 
    where F: FnMut(T) -> U {
    assert_eq!(std::alloc::Layout::new::<T>(), 
        std::alloc::Layout::new::<U>());
    // loads of unsafe code here
    todo!()
}

I won’t go into the unsafe code here because that was the topic of the aforementioned article. The thing that bugged me was that assert_eq! in there. Not the fact that it was in there at all, but that the panic would only occur at runtime ¹. As a die-hard metaprogramming fan, it felt weird to check a condition at runtime that we can clearly check at compile time. It would be great if we could stick this condition into the function signature, possibly using traits to make it blatantly obvious that we want the types T and U to have the same size and alignment.

Today’s Problem: Same Size for Two Types

For this post, let’s consider a slightly simplified problem and just check that the two types T and U have the same size and not bother with the alignment. This is just to keep the examples concise, because once we figure out how to check for size, adding an alignment requirement is trivial. So what we want is this:

fn do_something<T,U>(t: T, u: U) 
    // where: T and U have same size 
{
    // do something
}

// this compiles
do_something(1u8,2u8);
// this also compiles
do_something(1f32,2i32);
// this must not compile
do_something(1u8,2u32);

Our goal is to make the compiler accept the code only if our two types T and U have the same size and emit an error message otherwise. The compiler knows the size of a (sized) type at compile time. That’s why it should be simple enough to find a solution that enforces identical size at compile time, right? Right?

Using Associated Constants

My first intuition was that associated constants would help us elegantly enforce trait bounds to restrict our do_something function arguments to types with same size.

There are many ways to skin this cat and my ideas were definitely influenced by how metaprogramming in C++ uses associated types and compile time constants to give us metafunctions. I married this with a more Rusty idea of trait bounds and I came up with the following, pretty straightforward, code:

pub trait SameSizeAs<T> {
    const VALUE: bool;
}

impl<T,U> SameSizeAs<T> for U {
    const VALUE: bool = 
        std::mem::size_of::<T>() == std::mem::size_of::<U>();
}

So what we do is implement a trait SameSizeAs for every type U, which indicates whether T and U have the same size via an associated constant. That’s not too bad. We can use the trait like so:

pub fn do_something<T,U>(t: T, u: U) 
where U: SameSizeAs<T,VALUE=true> {
    // do something
}

I find this pretty elegant and concise and it turns out the error messages are very readable if we try to call the function with two types of different size. There’s just one problem with this: it does not compile on stable Rust. Current stable Rust (1.72 at the time of writing) does not allow us to compare associated constants for equality; we need the feature associated_const_equality to compile it. I found that a bit disappointing because I liked the simplicity of the solution and I would like this to work on stable Rust.
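To make the limitation concrete: the trait and its blanket implementation from above compile on stable, and so does reading the constant. What is gated is only the VALUE=true equality in the bound. Here is a sketch with a hypothetical helper same_size that reads the constant, which gives us a boolean but, of course, no compile error:

```rust
use std::mem::size_of;

pub trait SameSizeAs<T> {
    const VALUE: bool;
}

// Same blanket impl as above: this part compiles fine on stable Rust.
impl<T, U> SameSizeAs<T> for U {
    const VALUE: bool = size_of::<T>() == size_of::<U>();
}

// Stable Rust lets us *read* the constant; we just cannot put
// `VALUE = true` into a trait bound without associated_const_equality.
fn same_size<T, U>() -> bool {
    <U as SameSizeAs<T>>::VALUE
}

fn main() {
    assert!(same_size::<f32, i32>());
    assert!(!same_size::<u8, u32>());
}
```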

For completeness let me link to another known way of using compile time booleans in where clauses via a clever combination of Const Generics and Traits. However, it requires the unstable feature generic_const_exprs. I won’t go into detail here but we will see this feature pop up in a different context.

Using Associated Types

So the problem with the solution above was that we cannot yet compare associated constants for equality in trait bounds. But we definitely can compare types for equality in trait bounds and so that is the core of the next approach I took. Translate the boolean values into types and do the equality comparison on the types rather than the values. So that’s what I tried next.

Coming from C++, I know that metaprogramming with types can get a bit hairy at times. However, I was pretty confident that I could find a solution. Because after all I was only trying to make the compiler enforce something that it already knows!

Now, since I didn’t want to use actual boolean values at compile time I had to translate the idea of booleans into types:

struct TrueType;
struct FalseType;

trait BoolType {}
impl BoolType for TrueType {}
impl BoolType for FalseType {}

Strictly speaking, the whole BoolType trait is not necessary but I feel it makes the downstream code easier to read. Now we can define a trait that tells us whether a type T that implements it has the same size as another type U:

pub trait SameSizeAs<U> {
    type Value : BoolType;
}

You can see why I like the BoolType trait here: it mirrors the syntax we would use to define the type of a struct field or an associated constant. Compare this declaration with the associated constant version above. Finally, we can add a nice where clause into our function definition:

pub fn do_something<T,U>(t: T, u: U) 
where T: SameSizeAs<U,Value=TrueType> {
    // do something
}

This reads very similar to the constant based code above but now it is fine to write Value=TrueType in the where clause in stable Rust. The reason is that we are testing for equality of an associated type and not a compile time constant value.
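To see that such a bound is accepted on stable, here is a sketch with hand-written implementations for two hypothetical pairs of types. Writing impls like these for every pair obviously does not scale, which is exactly the problem the blanket implementation has to solve:

```rust
pub struct TrueType;
pub struct FalseType;

pub trait BoolType {}
impl BoolType for TrueType {}
impl BoolType for FalseType {}

pub trait SameSizeAs<U> {
    type Value: BoolType;
}

// Hypothetical hand-written impls for two concrete pairs of types.
impl SameSizeAs<i32> for f32 {
    type Value = TrueType; // both are 4 bytes
}

impl SameSizeAs<u32> for u8 {
    type Value = FalseType; // 1 byte vs. 4 bytes
}

// Equality on an associated type is allowed in a where clause on stable.
pub fn do_something<T, U>(_t: T, _u: U) -> &'static str
where
    T: SameSizeAs<U, Value = TrueType>,
{
    "same size"
}

fn main() {
    assert_eq!(do_something(1f32, 2i32), "same size");
    // Does not compile: the associated type is FalseType, not TrueType.
    // do_something(1u8, 2u32);
}
```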

Finally, there is just one thing missing: a blanket implementation of SameSizeAs that serves our purpose. We need some way to go from a compile time known condition (a const bool) to a type. Since Rust 1.51 we have Const Generics to help us make this transition, and that’s the only way I saw to do it. In C++ we would use a templated struct with a boolean template parameter and an associated type. In Rust we can do a similar thing when we bring traits into the mix:

pub struct Condition<const B: bool>;

pub trait TruthType {
    type ValueType : BoolType;
}

impl TruthType for Condition<true> {
    type ValueType = TrueType;
}

impl TruthType for Condition<false> {
    type ValueType = FalseType;
}

We can use the struct and the trait together to go from a compile time known condition to a type. Unfortunately, as of the time of writing we cannot simply write Condition<B>::ValueType; we have to use the fully qualified path so that the compiler can resolve the associated type, even though it is actually unambiguous. That means we must write <Condition<B> as TruthType>::ValueType, which is a bit cumbersome but does the trick ²:

impl<T,U> SameSizeAs<U> for T 
where Condition<{core::mem::size_of::<T>() 
        == core::mem::size_of::<U>()}>: TruthType {
    type Value = <Condition<{core::mem::size_of::<T>() 
        == core::mem::size_of::<U>()}> as TruthType>::ValueType;
}

We have now created a metafunction that transforms a compile time known boolean into a type. We can use it to find out whether two given types are of the same size. That’s great and all, but we again have to use an unstable feature for that. This is where the feature generic_const_exprs pops up again. We need it to use the generic parameters T and U as part of the Const Generic parameter for Condition. It’s a bit unfortunate, since the whole exercise was to go from a compile time boolean to a type, and it seems to me we need an unstable feature to accomplish that in our particular case. I would be happy to be proven wrong here.
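As far as I can tell, the unstable feature is only needed because the condition depends on the generic parameters T and U. When the condition involves only concrete types, the same Condition/TruthType machinery works on stable Rust. A sketch with a hypothetical helper require_true (the BoolType bound is omitted for brevity):

```rust
use std::mem::size_of;

pub struct TrueType;
pub struct FalseType;

// Same const-generic bridge as above.
pub struct Condition<const B: bool>;

pub trait TruthType {
    type ValueType;
}

impl TruthType for Condition<true> {
    type ValueType = TrueType;
}

impl TruthType for Condition<false> {
    type ValueType = FalseType;
}

// Hypothetical helper: only callable if C's associated type is TrueType.
fn require_true<C>() -> bool
where
    C: TruthType<ValueType = TrueType>,
{
    true
}

fn main() {
    // The const argument mentions only concrete types, so stable Rust
    // accepts it without generic_const_exprs.
    assert!(require_true::<Condition<{ size_of::<f32>() == size_of::<i32>() }>>());
    // Rejected at compile time, since the condition evaluates to false:
    // require_true::<Condition<{ size_of::<u8>() == size_of::<u32>() }>>();
}
```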

Be that as it may, we can now use our type and trait to restrict the generic types passed to our do_something function:

pub fn do_something<T,U>(t: T, u: U) 
where T: SameSizeAs<U,Value=TrueType> {
    // do something 
}

Now the compiler will let us invoke do_something with types of the same size and will give an error otherwise. It’s hard to say which of the two unstable features has a better chance of making it to stable soon, but it is worth noting that, as of now, generic_const_exprs is still described as “highly experimental” in its tracking issue and that the compiler issues a dedicated warning when it is used.

Rethinking and Making it Work on Stable

There is another way to go about the whole problem, which does not involve traits. Since Rust 1.57, stable Rust allows panicking in const evaluated contexts, and a panic in a const context produces a compile error. Framing the problem like this makes it conceptually similar to a static_assert in C++, though it is not quite as straightforward.

To invoke a const panic, we need to force the compiler to evaluate the assertion in a const context, for example by using it to initialize a const item:

const ASSERTION : () = assert!(Cond,"condition was not satisfied");

Here, Cond needs to be a compile time known boolean. This code produces a compile error if and only if Cond evaluates to false. So now we might just try to replace the runtime assertion in our function by a compile time assertion like so:

fn do_something<T,U>(t: T, u: U) {
    const ASSERTION : () = assert!(core::mem::size_of::<T>()
                            ==core::mem::size_of::<U>(),
                           "T and U must have the same size");
    // do something
}

However, this does not compile because the compiler points to T and U with the error message use of generic parameter from outer function. What does that mean? The way it was explained to me is that const items exist as if they were global, even if they are defined inside a function. That is why we cannot access the generic parameters of the function in the const item ASSERTION. But there is a way around it: let’s make ASSERTION an associated constant of a struct:

struct SameSize<T, U> {
    phantom: std::marker::PhantomData<(T, U)>,
}

impl<T,U> SameSize<T,U> {
    const ASSERTION: () = assert!(std::mem::size_of::<T>() 
                           == std::mem::size_of::<U>(),
                          "types do not have the same size");
}

Now what we have to do is force the creation of that constant inside the function. But we can’t just use another const item to do that because that would, again, not allow us to access the types T and U for the reasons stated above. However, we can do it in a context that is not const evaluated and whose only purpose is to force the monomorphization of the compile time assertion we are interested in.

pub fn do_something<T,U>(t: T, u: U) {
    _ = SameSize::<T,U>::ASSERTION;
    // do something
}

Now when we try to invoke do_something with types of different sizes the compiler will print an error message. This one finally works on stable Rust, which is pretty satisfying. However, while it is nice that this does work at compile time, there is no indication in the function signature that we require T and U to be of the same size. We must relegate this fact to the documentation.
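Putting the pieces together, here is a self-contained sketch of the stable approach. The body of do_something is a hypothetical placeholder that returns the shared size, just so there is something to check. Keep in mind that this is a post-monomorphization error, so it may only show up with cargo build and not with cargo check:

```rust
use std::marker::PhantomData;
use std::mem::size_of;

#[allow(dead_code)]
struct SameSize<T, U> {
    phantom: PhantomData<(T, U)>,
}

impl<T, U> SameSize<T, U> {
    // Evaluated at compile time for each (T, U) pair that references it.
    const ASSERTION: () = assert!(
        size_of::<T>() == size_of::<U>(),
        "types do not have the same size"
    );
}

// Hypothetical body: returns the shared size so there is something to test.
pub fn do_something<T, U>(_t: T, _u: U) -> usize {
    // Referencing the constant forces its evaluation for this T and U.
    let () = SameSize::<T, U>::ASSERTION;
    size_of::<T>()
}

fn main() {
    assert_eq!(do_something(1u8, 2u8), 1);
    assert_eq!(do_something(1f32, 2i32), 4);
    // do_something(1u8, 2u32); // compile error: types do not have the same size
}
```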

Providing Fallback Implementations

The stated goal of this article was to enforce that T and U have the same size at compile time and we have achieved that in different ways, one of which works on stable. But what if we did not want to issue a compile error in case T and U have different sizes but rather provide a fallback implementation? Let’s go very briefly through the presented solutions starting with the last one:

I see no way of using compile time assertions for branching in code generation, because their only purpose is to emit a compile error. So that one is out, I think. The case is different when using associated types in traits, because in principle we could write two incarnations of do_something: one where T: SameSizeAs<U,Value=TrueType> and one where T: SameSizeAs<U,Value=FalseType>. However, the trait solver in Rust currently does not recognize these two as disjoint cases, so that won’t work yet. There are some clever ways around those limitations, but I am not sure they’ll work for this case. You can read all about it (shameless plug incoming) in my article on mutually exclusive traits in Rust. Lastly, using associated constants: again, we could in theory write two implementations, but as of now the cases U: SameSizeAs<T,VALUE=true> and U: SameSizeAs<T,VALUE=false> are not recognized as disjoint. However, this is stated as a future goal in the associated tracking issue.

If you are aware of specialization you’ll recognize that this would offer another way of providing a fallback implementation. It does not work quite like the solutions outlined above but it can be used to achieve something to that effect. Specialization is a big complex of features that is, as of the time of writing, unsound and even a minimal subset is still unstable.

Fallback Implementations without Specialization

This section is an idea that was posted by reddit user u/Dragon-Hatcher here that has to be one of the most brilliant applications of the KISS principle I have seen. Let’s just do the obvious thing and stick an if inside the function like so:

fn do_stuff<T,U>(t: T, u: U) {
    use std::mem::size_of;
    if size_of::<T>() == size_of::<U>() {
        // do one thing
    } else {
        // do another thing
    }
}

What happens here is that the compiler will evaluate the condition at compile time and just optimize out the branch that is not taken. Don’t believe me? Try it on godbolt. The one thing I don’t know is whether the compiler will always evaluate a const fn at compile time when it can ³, but here it clearly works.
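A runnable version of this idea, where the two branches just return placeholder strings so we can see which one was taken:

```rust
use std::mem::size_of;

// Returns which branch was taken, purely for illustration.
fn do_stuff<T, U>(_t: T, _u: U) -> &'static str {
    // size_of is a const fn, so for each concrete (T, U) the condition is a
    // constant and the optimizer can remove the branch that is not taken.
    if size_of::<T>() == size_of::<U>() {
        "same size"
    } else {
        "fallback"
    }
}

fn main() {
    assert_eq!(do_stuff(1f32, 2i32), "same size");
    assert_eq!(do_stuff(1u8, 2u32), "fallback");
}
```

One thing to keep in mind with this design: both branches still have to type-check for every T and U, so the fallback cannot use anything that is only valid when the sizes match.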

Final Thoughts

First of all, I’m happy to hear all the things I got wrong in this article because this is indeed a complex topic. Secondly, I would be interested in other ways to solve this problem that I missed here, especially ones that work on stable.

While this writeup has been fun, it has demonstrated to me that interacting with types in nontrivial ways during metaprogramming in Rust is hard, especially in the context of conditional compilation. Furthermore, the trait system still has some rough edges, where stuff that intuitively should work does not ⁴. That’s a compliment to Rust, because it is surprising to run into these problems in such a well designed language. I’m also not trying to say that the current trait system is badly implemented, because when it works (which is almost all of the time) it works amazingly, but this exercise would have been a one-liner in Modern C++™ ⁵.

Endnotes

The actual condition inside the assert is likely evaluated at compile time and the code is optimized accordingly, but the panic will occur only at runtime. Thanks Shnatsel for pointing that out here. ↩
If it strikes you as odd that we have to repeat the exact same condition in the where clause that we used in the body, you are not alone. In principle the compiler should know that TruthType is implemented for all incarnations of Condition. It also does not help if we write where Condition<true>: TruthType, Condition<false>: TruthType. I suspect those are limitations in the current trait solver. ↩
Meaning when we don’t force the evaluation in a const context. ↩
There are efforts to implement a new trait solver with the aim of improving the current situation. Thanks to reddit user u/Sharlinator for pointing out here that this was not Chalk, as I had stated in a previous version of this endnote. ↩
I know C++ has massive problems and I will choose Rust over it any time but the (non macro based) metaprogramming and conditional compilation is currently stronger in C++. Though for normal (non-meta) usecases Traits beat Concepts any day of the week. ↩

Learn Unsafe Rust From My Mistakes

2023-07-24T00:00:00+00:00

A project of mine required me to dive into unsafe Rust and when I was done with it, I had understood something that I wanted to share. However, since I wasn’t sure if I made any subtle mistakes, I did ask the community to review my code and oh boy did it turn out that I had missed some vital things. Bear with me and hopefully you’ll gather something useful, too.

A Warning

Be warned that most (possibly all) the examples in this post contain unsafe code with bugs of varying subtlety. Do not blindly copy code from here. If you read till the end you’ll learn why you will likely never have to bother with the particular code in this post… and also that you’ve been using it already, maybe unknowingly.

If you spot errors in this article, please do reach out, either via the comments on this page or by shooting me a mail using the link at the bottom of this page.

The Task At Hand: In Place Mapping

The task we’re trying to tackle here is to transform a Vec<T> into a Vec<U> in place, given that types T and U have the same memory layout, i.e. the same size and alignment. Transforming in place means we are reusing the storage of the initial vector. Our mapping function will look something like this:

fn map_in_place<T,U,F>(v: Vec<T>, mut f: F) -> Vec<U> 
    where F: FnMut(T) -> U {
    assert_eq!(std::alloc::Layout::new::<T>(), 
        std::alloc::Layout::new::<U>());

    todo!()
}

Vec does not expose an obvious, high level, and safe API to accomplish what we want ¹, so we have to dive into unsafe. The rest of this article is concerned with replacing the todo!(), but let’s take a step back first.

Unsafe Rust Confusion

I’ve struggled a lot with understanding when and how to use unsafe Rust. Part of the reason is that there is a (very justified) hesitation to use unsafe code within the Rust ecosystem, and it’s a clear plus when a crate advertises itself as written in 100% safe Rust ². Then add to that some overly simplistic sentiments I’ve come across, such as unsafe Rust is not about circumventing the borrow checker. I think this implanted the idea in my head that I could spot incorrect usage of unsafe code by the mere fact that it was “circumventing the borrow checker”, whatever that meant. It turned out that this wasn’t much help and I needed a better mental model.

Unsafe Is Not About Circumventing Anything

A very important realization for me was to stop thinking about whether or not I was circumventing the borrow checker with unsafe Rust. I’ll try to reframe it in this section. The first thing to realize is that the interaction between the borrow checker and unsafe is too narrow of a view. It’s about the interaction of the fundamental language rules of Rust with unsafe code. The borrow checker is a well known part of the Rust language and it enforces the aliasing rules. It’s an important part of what makes safe Rust memory safe, but it’s only a part of what gives the language its safety guarantees.

The second thing to realize is that unsafe does not change any fundamental rules of the language and so it also does not, for example, turn off the borrow checker. Take a look at this invalid Rust code:

// does not compile
let mut x = 10;
let r1 = &mut x;
let r2 = &mut x;
*r1 += 1; // r1 is still in use while r2 exists
*r2 += 1;

The compiler will reject this code because we are violating one of Rust’s basic assumptions by trying to take two mutable references to one piece of data ³. Merely placing the same code into an unsafe block does not make it valid Rust code. The aliasing rules for references apply everywhere, so the compiler will always assume they are true. It will stop you from violating them wherever it can.

unsafe {
  // still rejected
  let mut x = 10;
  let r1 = &mut x;
  let r2 = &mut x;
  *r1 += 1;
  *r2 += 1;
}

Now let’s look at this example, where we use unsafe to eventually obtain two mutable references:

let mut x = 10;
let ptr: *mut i32 = &mut x;
unsafe {
  let r1 = &mut *ptr;
  let r2 = &mut *ptr;
  *r1 += 1; // two live mutable references to x: undefined behavior
  *r2 += 1;
}

This code compiles, so we have just circumvented Rust’s Borrow Checker using unsafe, haven’t we? In a way yes, but that is not a helpful way to think about it. The problem is not that we have done something that the borrow checker would not allow us to do; the problem is that we have violated the aliasing rules of the language when we used the powers bestowed upon us via the unsafe keyword. In unsafe Rust, the compiler lets us work with raw pointers (which are outside the scope of the borrow checker), but it expects us to still adhere to the rules of the language. In this case we have created two mutable references to one piece of data, which breaks the aliasing rules.

The compiler always assumes that the language rules apply and subsequently that two mutable references can never point to the same memory. It is allowed to optimize our program as if that assumption is always true and that will, in turn, result in the dreaded undefined behavior… even in an unsafe block because –and I know I am belaboring this point– unsafe Rust still assumes the rules of safe Rust are unbroken.

Detecting Rule Violations

At this point you might be wondering if there are any tools to help you detect rule violations and the answer is Miri. It is an analysis tool that can run your program or test suite and detect certain kinds of undefined behavior by using an interpreter for Rust’s mid-level IR. You can use it as a cargo plugin with cargo miri run or cargo miri test.

This is immensely helpful for finding some classes of undefined behavior, but there are some caveats. Because Miri is an interpreter, it is much slower than the compiled binary, so running your whole test suite or program might not be feasible. Furthermore, it works by running your code through the interpreter, and in this sense it works at runtime. That means even if it is theoretically able to detect a source of undefined behavior (UB), you must actually hit the UB during a run. This is all the more reason to have an exhaustive test suite for your unsafe code, and keep in mind that there are certain classes of UB that Miri does not detect regardless.

Unsafe as a Gateway

Among other things, unsafe gives us the power to dereference raw pointers and to call unsafe functions ⁴. The language rules that apply to references are not enforced on raw pointers. That is not an accident but one of the defining features of pointers. We are able to use them to write correct programs that the borrow checker would reject because it errs on the side of caution. A famous example is implementing doubly linked lists.

In unsafe Rust it is now our responsibility to uphold the language rules. It is especially easy to make mistakes when transitioning from unsafe constructs to safe constructs, as we did above when going from pointers to references ⁵. In unsafe land we are able to express things that we cannot express in safe Rust, such as “I need multiple mutable references to one piece of data”. You must never actually use two mutable references to the same data (even if you trick the compiler), but it is perfectly fine to use two pointers to the same data. In fact, pointers are exactly the language construct we should use for that particular problem, because there is no way to express the same intent in safe Rust ⁶.

let mut x = 10;
let p1: *mut i32 = &mut x;
let p2 = p1;
unsafe {
  // ...
}

This is perfectly fine Rust code ⁷, and the compiler will not break it by making assumptions about what the pointers can or can’t point to. Again, it would not be very helpful to frame this as circumventing the borrow checker, because the same could be said about the broken code further above. We have now stepped into unsafe land, and there are things we can do in unsafe land that we simply cannot do in safe land. If you only think about the part where we are “circumventing” the borrow checker, both code snippets seem equivalent, but they are not. Using unsafe code to express things we cannot express in safe Rust is one of the major use cases of unsafe. On the other hand, using unsafe code to make safe language constructs behave in forbidden ways is an abuse.
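To make the elided part of the snippet above a bit more tangible, here is a small sketch of my own (the function name is hypothetical) that actually writes through both pointers. It assumes that both pointers are copies of one raw pointer, so they share the same provenance and neither invalidates the other; no `&mut` is created while they are in use.

```rust
/// Writes to one `i32` through two copies of the same raw pointer.
/// Hypothetical helper for illustration only.
fn alias_through_raw_pointers(start: i32) -> i32 {
    let mut x = start;
    // Coerce a unique reference into a raw pointer, then copy the pointer.
    let p1: *mut i32 = &mut x;
    let p2 = p1; // raw pointers are Copy: p1 and p2 now both point to x
    unsafe {
        *p1 += 1; // x == start + 1
        *p2 *= 2; // x == 2 * (start + 1)
        *p1 += *p2; // reading through p2 while p1 is still in use is fine
    }
    // The raw pointers are not used anymore, so accessing x directly is fine again.
    x
}
```

Starting from 10, the value goes 11, 22 and finally 44. Doing the same with two live `&mut i32` would be rejected in safe code and undefined behavior if forced through unsafe.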

So one of the lessons for me was to get more comfortable using unsafe constructs and not try to weasel my way back into safe constructs as soon as possible. However, I’ve found the ergonomics of unsafe constructs (such as pointers) much more cumbersome than those of safe language constructs (such as references), and that makes it very tempting to cross the border prematurely and write broken code. Let’s take a look at how all of this applies to our in-place mapping problem.

The Transformative Unsafe Journey

Armed with the understanding above, I set out on my journey of implementing the in-place mapping function. I was not going to make the rookie mistake of using pointers to get me some illegal references, no siree. I was going to stay in pointer-land as long as I needed to and everything would be fine… or so I thought.

A Clear, Simple (and Wrong) Solution

So let’s have a look at a first solution that avoids the obvious error of violating Rust’s aliasing rules and does quite a few things correctly.

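#![feature(vec_into_raw_parts)] // nightly only: Vec::into_raw_parts is unstable
use std::alloc::Layout;

fn map_in_place<T, U, F>(v: Vec<T>, mut f: F) -> Vec<U>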
where F: FnMut(T) -> U {
    assert_eq!(Layout::new::<T>(), Layout::new::<U>());
    unsafe {
        // (1)
        let (pstart, len, cap) = v.into_raw_parts();
        // (2)
        for pt in (0..len).map(|j| pstart.add(j)) {
            // (3)
            let t = pt.read();
            let u = f(t);
            // (4)
            let pu: *mut U = pt.cast();
            pu.write(u);
        }
        // (5)
        Vec::from_raw_parts(pstart.cast(), len, cap)
    }
}

I should mention that the latest stable Rust version at the time of writing is Rust 1.71, which is probably not terribly important, but it should be noted nonetheless. The code above uses the unstable Vec::into_raw_parts API just for clarity. The effect can easily be replicated in stable Rust.
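One way to replicate `into_raw_parts` on stable Rust is sketched below. The helper name `into_raw_parts_stable` is my own invention and not part of std; the trick is simply to wrap the vector in `ManuallyDrop` so that its destructor never runs, and then read out pointer, length and capacity.

```rust
use std::mem::ManuallyDrop;

/// Stable stand-in for the unstable `Vec::into_raw_parts`.
/// The name is hypothetical; it is not part of std.
fn into_raw_parts_stable<T>(v: Vec<T>) -> (*mut T, usize, usize) {
    // Wrapping the Vec in ManuallyDrop stops it from dropping its elements
    // and freeing its buffer, so ownership of both passes to the caller.
    let mut v = ManuallyDrop::new(v);
    (v.as_mut_ptr(), v.len(), v.capacity())
}
```

The caller is then responsible for eventually rebuilding a `Vec` with `Vec::from_raw_parts` (or otherwise cleaning up), exactly as with the unstable API.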

Let’s pretend that this was my first draft of the code ⁸. Before we go into the problems with this function, let’s look at the code line by line. After making sure the types T and U have the same memory layout, ① we destructure the vector to obtain its raw parts: the pointer pstart to the first element, the number of elements len, and the capacity cap. ② Then we iterate through the elements of the vector via the pointer pt of type *mut T. ③ Now we read the element into our stack variable t and transform it into an element u of type U. Using pt.read() makes this work even for non-Copy elements because it just performs a bitwise copy of the value. ④ Then we obtain a second pointer to the element that we are iterating over. This pointer pu is of type *mut U. We use cast rather than as to cast the pointer, which is good practice because it will catch changes in mutability at compile time ⁹. Then we write the transformed value to the memory location using pu. Note that we are using pu.write(u) instead of *pu = u, because the latter drops the value behind the pointer before assigning to it, to avoid possible memory leaks ¹⁰. That would be a giant problem here, because the contained value would be dropped as if it were of type U, while it is actually of type T ¹¹. If we use write, the pointed-to value is intentionally not dropped. ⑤ Finally, we piece a new Vec together from the transformed storage.

You can see that we already took care of a lot of details that are easy to miss and yet the logic is still broken. Let’s see why.

Panic Safety

Yes, panic safety. Coming from a C++ background, it’s something I should be much more mindful of, but I usually forget to take it into account. The reason is that panics in Rust are semantically very different from exceptions, and the general advice is not to catch them the way you would catch exceptions in C++. And while by default a panic will unwind the stack and call destructors in an orderly fashion, that behavior can be changed to simply abort. In essence, this is what makes it easy (at least for me) to forget that code should behave correctly even in case of a panic, whether or not a user relies on it.
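The claim that unwinding calls destructors in an orderly fashion is easy to check with a small experiment. The sketch below is my own (the `DropCounter` type and the function name are hypothetical); it assumes the default `panic = "unwind"` strategy, because with `panic = "abort"` the process would terminate before `catch_unwind` returns.

```rust
use std::cell::Cell;
use std::panic::{self, AssertUnwindSafe};

/// Increments a shared counter when dropped.
struct DropCounter<'a>(&'a Cell<u32>);

impl Drop for DropCounter<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Panics while two locals are alive and reports how many destructors ran.
/// Assumes the default unwinding panic strategy.
fn destructors_run_during_unwind() -> u32 {
    let counter = Cell::new(0);
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        let _a = DropCounter(&counter);
        let _b = DropCounter(&counter);
        panic!("boom");
    }));
    assert!(result.is_err());
    // Both locals were dropped while the stack unwound.
    counter.get()
}
```

This is exactly the mechanism a panic-safe helper can hook into: anything that owns resources on the unwinding stack gets its `Drop` implementation called.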

When the function f panics at some point during the loop, there are three things we need to ensure ¹²: firstly, we need to call the destructors of all elements that have already been transformed to type U; secondly, we need to call the destructors of all the remaining elements of type T; and thirdly, we need to deallocate the storage of the vector to prevent a memory leak ¹³. One way to do this is to implement a helper structure that keeps track of the elements while they are transitioning from T to U and makes sure that they are appropriately dropped in case of a panic. First we create an untagged union and the helper structure like so:

use std::mem::ManuallyDrop;

union Union<T, U> {
    pub first: ManuallyDrop<T>,
    pub second: ManuallyDrop<U>,
}

struct TransitioningVec<T, U> {
    vector: Vec<Union<T, U>>,
    u_len: usize,
}

The union type allows us to store the elements in a single vector regardless of whether they are of type T or U. When vector is dropped, it will free the storage correctly, but it cannot call the element destructors. We have to do that ourselves, and for that we have to keep track of the number u_len of elements that have been transformed from T to U. The reason we wrapped the types of the union fields in ManuallyDrop is that the compiler cannot know which field a union currently holds (unions are untagged, unlike enums), so it cannot call the appropriate destructor. Hence Rust only accepts union fields whose types have no drop glue, such as Copy types or ManuallyDrop<T>. To create this helper structure from our initial Vec instance, we write this constructor:
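The constructor below reinterprets a `Vec<T>` allocation as a `Vec<Union<T, U>>`, which is only valid if the union has exactly the layout of `T`. The union above uses the default representation, whose layout is not specified by the language. The sketch below adds `#[repr(C)]` (my addition, not in the original code), for which the Rust Reference does specify the layout: the alignment is the largest field alignment, and the size is the largest field size rounded up to that alignment. `ReprCUnion` and `union_has_layout_of_first` are hypothetical names.

```rust
use std::alloc::Layout;
use std::mem::ManuallyDrop;

// Same shape as the union above, plus #[repr(C)] so its layout is specified.
#[repr(C)]
#[allow(dead_code)]
union ReprCUnion<T, U> {
    first: ManuallyDrop<T>,
    second: ManuallyDrop<U>,
}

/// True if the union has exactly T's layout, i.e. a Vec<T> allocation
/// could be reused as a Vec<ReprCUnion<T, U>>.
fn union_has_layout_of_first<T, U>() -> bool {
    Layout::new::<ReprCUnion<T, U>>() == Layout::new::<T>()
}
```

When `T` and `U` share a layout (as the `assert_eq!` in the constructor enforces), this check holds; when they differ, the union grows to the larger one and the reinterpretation would be wrong.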

impl<T, U> TransitioningVec<T, U> {
    pub fn new(v: Vec<T>) -> Self {
        assert_eq!(Layout::new::<T>(), Layout::new::<U>());
        let (ptr,len,cap) = v.into_raw_parts();
        let data = unsafe { Vec::from_raw_parts(ptr.cast(), len, cap) };
        Self {
            vector: data,
            u_len: 0,
        }
    }
}

We set the number of transformed elements to zero and we change the data type of the vector, so that we can store both Ts and Us in it but we leave the actual data untouched. Before we get to the implementation of the mapping functionality, we need to implement Drop for our helper structure so it can act appropriately when it is dropped while the elements are still transitioning:

impl<T, U> Drop for TransitioningVec<T, U> {
    fn drop(&mut self) {
        let start = self.vector.as_mut_ptr();
        unsafe {
            let u_slice: &mut [U] = std::slice::from_raw_parts_mut(
                start.cast(),
                self.u_len,
            );
            let t_slice: &mut [T] = std::slice::from_raw_parts_mut(
                start.add(self.u_len).cast(),
                self.vector.len() - self.u_len,
            );
            std::ptr::drop_in_place(u_slice);
            std::ptr::drop_in_place(t_slice);
        }
    }
}

We will transform elements from “left to right” and keep track of the number u_len of transformed elements. This allows us to split the memory into two slices, the first with elements of type U and the second with elements of type T. We then drop those slices individually, making sure that the appropriate destructors are called. If you are like me, you might be tempted to loop over the elements and drop them one by one. Don’t do that: the Rust type system is your friend, drop_in_place understands that you are dropping a slice, and it will do the correct thing for you ¹⁴.
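Here is a small, self-contained sketch of that point, written by me with a hypothetical helper name. It uses `Rc` strong counts to observe that a single `drop_in_place` call on a slice runs the destructor of every element. The elements are stored as `ManuallyDrop<Rc<()>>` so that the vector itself does not drop them a second time; `ManuallyDrop<T>` is documented to have the same layout as `T`, which makes the pointer cast valid.

```rust
use std::mem::ManuallyDrop;
use std::rc::Rc;

/// Drops a whole slice with one `drop_in_place` call and returns the
/// remaining strong count of the shared `Rc` (hypothetical demo helper).
fn drop_slice_in_place() -> usize {
    let shared = Rc::new(());
    // Three clones, wrapped so that the Vec itself will not drop them.
    let mut v: Vec<ManuallyDrop<Rc<()>>> =
        (0..3).map(|_| ManuallyDrop::new(Rc::clone(&shared))).collect();
    assert_eq!(Rc::strong_count(&shared), 4);
    unsafe {
        // ManuallyDrop<T> has the same layout as T, so the cast is valid.
        let slice: &mut [Rc<()>] =
            std::slice::from_raw_parts_mut(v.as_mut_ptr().cast(), v.len());
        // One call runs the destructor of every element in the slice.
        std::ptr::drop_in_place(slice as *mut [Rc<()>]);
    }
    // Frees the buffer; the ManuallyDrop wrappers prevent a second drop.
    drop(v);
    Rc::strong_count(&shared)
}
```

Only the original `shared` handle survives, so the function returns 1.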

Finally we can implement the actual functionality in an associated function like so:

impl<T, U> TransitioningVec<T, U> {
    #[inline]
    pub fn map_in_place<F: FnMut(T) -> U>(mut self, mut f: F) -> Vec<U> {
        // (1)
        let start_ptr: *mut T = self.vector.as_mut_ptr().cast();
        while self.u_len < self.vector.len() {
            unsafe {
                let t_ptr = start_ptr.add(self.u_len);
                let u_ptr: *mut U = t_ptr.cast();
                let t = t_ptr.read();
                u_ptr.write(f(t));
            }
            self.u_len += 1;
        }
        // (2)
        let mut me = ManuallyDrop::new(self);
        unsafe {
            Vec::from_raw_parts(
                me.vector.as_mut_ptr().cast(),
                me.vector.len(),
                me.vector.capacity(),
            )
        }
    }
}

① This loop is conceptually identical to the one we had previously, but now we keep track of the number of elements we have transformed. If the function f panics at any point, the destructor of our TransitioningVec instance is invoked: it drops the elements appropriately and frees the allocated storage by dropping its vector field. ② At this point all elements have completed their transformation, so we make sure that the destructor of our instance does not get called anymore, and we return a Vec that is now the sole owner of the transformed elements and the allocated storage.

Since we don’t want to expose this helper type publicly, we hide it in the map_in_place free function like so:

fn map_in_place<T, U, F>(v: Vec<T>, f: F) -> Vec<U>
where
    F: FnMut(T) -> U,
{
    TransitioningVec::new(v).map_in_place(f)
}

And voilà we’re done… or are we?

Is That It? All Good Now?

No. For example, we haven’t handled zero-sized types yet. The code above implicitly assumes that the elements of the vector have a nonzero size in memory. I think that is the last piece of the puzzle to make this sound, but please do reach out if there is more unsound code in my examples or there are mistakes in my explanations.

Update: More Mistakes

What follows is a collection of additional problems pointed out by readers.

Panic Double Drop

This one was pointed out by reddit user u/MaxVerevkin here. Say the function f panics when we transform the element at index n. Then u_len holds the value n, since it only gets incremented after f returns. The element at this index will get dropped twice: once when f unwinds (f owns the element at that point) and once when we drop the slice of Ts in the destructor. That’s a problem. Their solution is to replace this line in the destructor

let t_slice: &mut [T] = std::slice::from_raw_parts_mut(
    start.add(self.u_len).cast(),
    self.vector.len() - self.u_len,
);

with

let t_slice: &mut [T] = std::slice::from_raw_parts_mut(
    start.add(self.u_len + 1).cast(),
    self.vector.len() - self.u_len - 1,
);

This makes sense to me since the element at position u_len will already have been dropped as a T when the mapping function unwound its stack. Further, we can only enter the destructor of our helper when vector.len() >= 1 and u_len < len, so the expression self.vector.len() - self.u_len - 1 can never underflow.
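For convenience, here is a consolidated sketch of the helper with this fix applied, written for stable Rust. It is my own assembly of the pieces above with two changes that I am adding: the unstable `into_raw_parts` is replaced by the `ManuallyDrop` trick, and the union gets `#[repr(C)]` so that its layout provably matches `T`. It still ignores zero-sized types, and its `Drop` relies on the invariant that it is only reached through a panic in `f`.

```rust
use std::alloc::Layout;
use std::mem::ManuallyDrop;

#[repr(C)] // my addition: makes the union's layout equal to T's when T and U match
#[allow(dead_code)]
union Union<T, U> {
    first: ManuallyDrop<T>,
    second: ManuallyDrop<U>,
}

struct TransitioningVec<T, U> {
    vector: Vec<Union<T, U>>,
    u_len: usize, // number of elements already converted to U
}

impl<T, U> TransitioningVec<T, U> {
    fn new(v: Vec<T>) -> Self {
        assert_eq!(Layout::new::<T>(), Layout::new::<U>());
        // Stable replacement for the unstable Vec::into_raw_parts.
        let mut v = ManuallyDrop::new(v);
        let (ptr, len, cap) = (v.as_mut_ptr(), v.len(), v.capacity());
        let vector = unsafe { Vec::from_raw_parts(ptr.cast::<Union<T, U>>(), len, cap) };
        Self { vector, u_len: 0 }
    }

    fn map_in_place<F: FnMut(T) -> U>(mut self, mut f: F) -> Vec<U> {
        let start: *mut T = self.vector.as_mut_ptr().cast();
        while self.u_len < self.vector.len() {
            unsafe {
                let t_ptr = start.add(self.u_len);
                let t = t_ptr.read();
                // If f panics, it owns t and drops it while unwinding.
                t_ptr.cast::<U>().write(f(t));
            }
            self.u_len += 1;
        }
        // Everything is a U now: disarm Drop and hand the buffer to a Vec<U>.
        let mut me = ManuallyDrop::new(self);
        let (ptr, len, cap) = (me.vector.as_mut_ptr(), me.vector.len(), me.vector.capacity());
        unsafe { Vec::from_raw_parts(ptr.cast::<U>(), len, cap) }
    }
}

impl<T, U> Drop for TransitioningVec<T, U> {
    fn drop(&mut self) {
        // Only reached when f panicked, hence u_len < len and the
        // subtraction below cannot underflow.
        let start = self.vector.as_mut_ptr();
        let len = self.vector.len();
        unsafe {
            // [0, u_len) already holds Us.
            let u_slice = std::ptr::slice_from_raw_parts_mut(start.cast::<U>(), self.u_len);
            // Index u_len was dropped inside f; [u_len + 1, len) still holds Ts.
            let t_slice = std::ptr::slice_from_raw_parts_mut(
                start.add(self.u_len + 1).cast::<T>(),
                len - self.u_len - 1,
            );
            std::ptr::drop_in_place(u_slice);
            std::ptr::drop_in_place(t_slice);
        }
        // self.vector is dropped next: it frees the buffer but runs no
        // element destructors, because the union fields are ManuallyDrop.
    }
}
```

A quick way to convince yourself of the panic behavior is to map a vector of `Rc` clones, panic halfway, and check that every clone was dropped exactly once (the strong count falls back to 1, and no clone is released twice).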

How To Do It Safely

Okay. Somewhere further above I alluded to the fact that there was a better way of doing all this. One that does not require us to deal with the unsafe details first hand. But didn’t I just write that map_in_place was never stabilized and the whole functionality got removed? Not quite, the magic happens somewhere else now. The way to do the in-place transformation of a vector in today’s Rust (1.71 at the time of writing) is:

v.into_iter().map(f).collect()

Don’t take my word for it, try it on godbolt. And yes, I know: you’re not getting the 15 minutes of your life back (unless you scrolled ahead in which case shame on you 😆).

The iterator implementations are smart enough under the hood to specialize for the case where the types T and U have the same memory layout, so they perform the transformation without reallocating. This is not explicitly guaranteed or documented, but I was pretty mind-blown when I learned it. This is truly a zero-cost abstraction if I ever saw one.
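If you would rather check this locally than on godbolt, a rough sketch (the helper name is mine) is to compare the buffer address before and after the `collect`. Since the reuse is an optimization of the standard library and not a guarantee, the function only reports whether it happened instead of relying on it.

```rust
/// Maps a Vec<u32> to a Vec<f32> with the plain iterator API and reports
/// whether the output reuses the input's heap buffer. The reuse is an
/// observed optimization of the standard library, not a guarantee.
fn map_and_check_reuse(v: Vec<u32>) -> (Vec<f32>, bool) {
    let before = v.as_ptr() as usize;
    let out: Vec<f32> = v.into_iter().map(|x| x as f32).collect();
    let reused = out.as_ptr() as usize == before;
    (out, reused)
}
```

Here `u32` and `f32` share size and alignment, which is the situation the in-place specialization targets.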

Endnotes

It will turn out that there is, in fact, a high-level API to achieve this. It’ll even turn out that the API itself is pretty obvious but the fact that it does the transformation in place is not. ↩
I also see this as an advantage because it means I can trust that someone else’s code is free of many classes of bugs. That’s great. ↩
As a matter of fact, even if r2 had merely been an immutable reference, this code would have been rejected by the compiler. ↩
There are a couple more things that unsafe lets us do; the Rust Book lovingly calls them unsafe superpowers. ↩
Yes, I am referring to pointers as unsafe constructs. Yes, I know that you can create them in safe code but you can’t truly use them, so for all intents and purposes they are a language construct in unsafe land. ↩
Of course there are safe wrappers, like Arc> that serve a conceptually similar, but very different use case. If we want to manipulate data on a low level, pointers are what we want. ↩
It wasn’t always. Perfectly fine code, that is. See the comments from @GoldsteinE below. ↩
It was not. I’ll link the full story further below. ↩
As pointed out by Rust forum member H2CO3 here. ↩
As pointed out by forum members H2CO3 and kpreid independently here. ↩
If you are wondering who drops the value pointed to by pt: Since we assigned that value to t by memcopying it, its Drop logic will be executed when t is dropped. ↩
All of this was pointed out to me by forum user scottmcm here. ↩
If nobody catches the panic and the program terminates, an operating system (if present) will take care of freeing the allocated storage, but it won’t call the destructors. The destructors might perform important logic like communicating with external processes or hardware. ↩
This was pointed out to me by forum users H2CO3 and steffahn here ↩

Rust Deep Dive: Borked Vtables and Barking Cats

2023-03-15T00:00:00+00:00

No, this post does not contain cruelty towards animals but only to our own sanity. We will explore a particular aspect of how Rust’s trait objects work behind the scenes and take a deep dive down the rabbit hole. Sometimes it’s good to be reminded that all the nice things we have as programmers are just sugar on top of ones and zeros in the imagination of some sand that we tricked into thinking.

Inspiration and Motivation

This post was inspired by this brilliantly titled video on the always entertaining and instructive Creel YouTube channel. In that video, the author shows how dynamic dispatch with inheritance works in C++ and how we can break it in interesting ways. We are going to take a look at how a similar thing can be achieved in Rust with trait objects. At the end of the post we’re going to make this piece of code work:

let mut kitty: Box<dyn Pet> = Box::new(Cat::new("Kitty"));
// ... some magic ...
greet_pet(kitty);

and generate this output

You: Hello, Kitty!
Kitty: Woof!

which indicates that something very peculiar is going on with our cat, because clearly it should go "Meow!" and not "Woof!". The reason we snuck a mut in front of the kitty will become apparent once we work our evil magic. But first let’s take a step back.

Dynamic Polymorphism: Meowing Cats and Barking Dogs

What we saw at work in our listing above is dynamic polymorphism. Wikipedia has the following to say about polymorphism in general:

In programming language theory and type theory, polymorphism is the provision of a single interface to entities of different types

Dynamic polymorphism is the kind of polymorphism that happens at runtime, in contrast to e.g. static polymorphism with generics that happens at compile time. There are different ways of achieving dynamic polymorphism, but for this article I am interested in the kind of dynamic polymorphism that works with Rust’s trait objects and pointers to them.

In a more object-oriented language¹ like C++ (or Java), the equivalent concept is dynamic dispatch through inheritance hierarchies. With it, we can call the methods of a derived class via a pointer (or reference) to its base class. Inheritance is the classic object oriented way of enabling dynamic polymorphism. In Rust we don’t have inheritance but we have traits and trait objects. Let’s look at a silly example that will accompany us through the rest of the post. We have a Pet trait and an implementor Cat like so. Feel free to skim the next part, because it’s all just boilerplate and none of it will surprise you if you’ve ever implemented a trait.

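trait Pet {
  // Declaration order matters later on: it determines the order of the
  // method pointers in the vtable.
  fn sound(&self) -> String;
  fn name(&self) -> String;
}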

struct Cat {
  life : u8, //keeping track of the 9 lives
  age : u8,
  my_name : String,
}

impl Cat {
  pub fn new(name : impl Into<String>) -> Self {
    Self {
      my_name: name.into(),
      age : 0,
      life : 0,
    }
  }
}

impl Pet for Cat {
  fn name(&self) -> String {
    // return an owned copy to match the String return type
    self.my_name.clone()
  }
  fn sound(&self) -> String {
    "Meow!".into()
  }
}

We could also implement all kinds of other pet types, like Dog, Bird, and so on. You get the idea. Finally, with this boilerplate out of the way we can implement a function to greet a Pet trait object like so:

fn greet_pet(pet : Box<dyn Pet>) {
  println!("You: Hello, {}!", pet.name());
  println!("{}: {}", pet.name(), pet.sound());
}

This way, we can pass in the kitty instance from above and get a completely unsurprising output:

You: Hello, Kitty!
Kitty: Meow!

Dynamic polymorphism using trait objects is what makes this code work. If we passed in a dog instance instead (assuming we had coded a Dog type), greeting it would print Woof!. The greet_pet function calls the correct sound(...) member function of the cat, the dog, or any other type for which we chose to implement the Pet trait.
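Here is a minimal sketch of that situation, with a hypothetical `Dog` type and a trimmed-down `Cat` (no `life` or `age` fields). A single call site inside `greet_all` (also a hypothetical helper) ends up calling different `sound` implementations depending on the concrete type behind each `Box<dyn Pet>`.

```rust
trait Pet {
    fn name(&self) -> String;
    fn sound(&self) -> String;
}

struct Cat { my_name: String }
struct Dog { my_name: String } // hypothetical second implementor

impl Pet for Cat {
    fn name(&self) -> String { self.my_name.clone() }
    fn sound(&self) -> String { "Meow!".into() }
}

impl Pet for Dog {
    fn name(&self) -> String { self.my_name.clone() }
    fn sound(&self) -> String { "Woof!".into() }
}

/// The same call site dispatches to whichever `sound` the concrete type provides.
fn greet_all(pets: &[Box<dyn Pet>]) -> Vec<String> {
    pets.iter()
        .map(|pet| format!("{}: {}", pet.name(), pet.sound()))
        .collect()
}
```

For a vector holding a cat named Kitty followed by a dog named Rex, this yields `"Kitty: Meow!"` and `"Rex: Woof!"`.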

How does it know which sound(...) member function to call at runtime? Remember, this even works for a vector of arbitrary trait objects implementing the Pet trait ². So the compiler has no way of choosing the correct method at compile time, as it would if we had used generics. So what’s the magic here?

A Peek Behind the Curtain: Vtables

A vtable ³, or virtual function table, is what makes the magic above work. We’ll take a step by step look at what those are and how they help to accomplish this. Bear with me, I promise there will be a nice graphic long before this is all over.

Hidden Vtables

When we implement the Pet trait for Cat, the compiler generates a vtable instance, which is a hidden data structure that it puts into our program’s binary. This is our Pet-vtable for Cat. There is also going to be a vtable instance for every other type that implements the Pet trait, meaning a Pet-vtable for Dog, for Bird and so on. Before we go any further let me emphasize that we are entering territory that is dangerous, evolving and not intended to be messed with. The specifics of how all the things described in this post are implemented in the Rust compiler might change at any point in time ⁴.

We’ll get to the specific layout of a vtable below, but for now suffice it to say that a vtable is a contiguous piece of memory that is (mostly) an array of function pointers. The Pet-vtable for Cat contains function pointers to the implementations of the trait methods for Cat, the Pet-vtable for Dog contains pointers to the implementations for Dog, and so on. This lets us resolve dynamic dispatch at runtime, provided that for every trait object Box<dyn Pet> we keep track of which vtable is associated with it. Let’s now look at how that association is made.
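To make the idea concrete before we look at the real thing, here is a hand-rolled imitation in plain Rust. This is my own sketch with hypothetical names (`PetTable`, `PetRef`, `cat_name`, ...); the compiler-generated vtable additionally holds drop, size and alignment entries and is not something we can name in code. It shows the two ingredients: one static table of function pointers per type, and a two-word handle that pairs a data pointer with a pointer to the right table.

```rust
use std::marker::PhantomData;

// One table of function pointers per concrete type (hypothetical names).
struct PetTable {
    name: unsafe fn(*const ()) -> String,
    sound: unsafe fn(*const ()) -> String,
}

struct Cat { my_name: String }
struct Dog { my_name: String }

// Contract: `this` must point to a live value of the matching type.
unsafe fn cat_name(this: *const ()) -> String { unsafe { (*(this as *const Cat)).my_name.clone() } }
unsafe fn dog_name(this: *const ()) -> String { unsafe { (*(this as *const Dog)).my_name.clone() } }
fn cat_sound(_this: *const ()) -> String { "Meow!".to_string() }
fn dog_sound(_this: *const ()) -> String { "Woof!".to_string() }

// One global table per type, shared by every instance of that type.
static CAT_TABLE: PetTable = PetTable { name: cat_name, sound: cat_sound };
static DOG_TABLE: PetTable = PetTable { name: dog_name, sound: dog_sound };

/// Two machine words, like `&dyn Pet`: a data pointer and a table pointer.
#[derive(Clone, Copy)]
struct PetRef<'a> {
    data: *const (),
    table: &'static PetTable,
    _borrow: PhantomData<&'a ()>, // ties the handle to the borrowed pet
}

impl<'a> PetRef<'a> {
    fn from_cat(c: &'a Cat) -> Self {
        PetRef { data: c as *const Cat as *const (), table: &CAT_TABLE, _borrow: PhantomData }
    }
    fn from_dog(d: &'a Dog) -> Self {
        PetRef { data: d as *const Dog as *const (), table: &DOG_TABLE, _borrow: PhantomData }
    }
    // Dynamic dispatch: fetch the function from the table, pass it the data pointer.
    fn name(&self) -> String {
        // SAFETY: the constructors always pair `data` with its own type's table.
        unsafe { (self.table.name)(self.data) }
    }
    fn sound(&self) -> String {
        unsafe { (self.table.sound)(self.data) }
    }
}
```

An array `[PetRef::from_cat(&kitty), PetRef::from_dog(&rex)]` then behaves much like a slice of `&dyn Pet`, and each handle is exactly two pointers wide.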

Hidden Pointers to Vtables

A naive approach would be to store the whole vtable as part of each trait object. But that would be wasteful for multiple reasons. First of all, this approach would add a number of pointer members to each instance, which wastes precious memory. Secondly, not every instance of a cat requires its own vtable instance: all trait-related function pointers of one type point to the same functions anyway. Specifically, for our cat example the function pointer for sound will always point to the code for the <Cat as Pet>::sound function, and that is true for every instance of Cat. Thus, we only need one vtable per type, so it makes sense to create one global instance of this vtable and refer to it through pointers. Both the Rust compiler and the commonly used C++ compilers do it like that, but there is a crucial difference in how they keep track of the pointer to the vtable. In C++, it is common to make the pointer to the vtable a hidden member of each instance of a class or struct ⁵. The Rust compiler goes a different route and uses so-called fat pointers ⁶.

Fat Pointers

Maybe you’ve heard of fat pointers in the context of slices, where a slice reference is really just a tuple of two elements ⁷: the first element is the pointer to the beginning of the data and the second element is the length of the slice. But if you’re like me, you will be (or already were) surprised to learn that the pointer types Box<T>, &T, and &mut T are different from the pointer types Box<dyn Trait>, &dyn Trait, and &mut dyn Trait. The latter are fat pointers. They, again, consist of two elements: the first is the pointer to the actual data (the T instance) and the second is the pointer to the associated vtable instance (the Trait-vtable for type T).
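The size difference is easy to observe with `std::mem::size_of`. The sketch below (my own, with a hypothetical helper name and a unit struct `Cat` as a stand-in for any sized type) measures a few pointer types in units of machine words: thin pointers take one word, fat pointers take two.

```rust
use std::mem::size_of;

trait Pet {
    fn sound(&self) -> String;
}

struct Cat; // any sized type works here

/// Sizes of thin and fat pointers, measured in machine words.
fn pointer_widths() -> [usize; 6] {
    let word = size_of::<usize>();
    [
        size_of::<&Cat>() / word,         // thin: address only
        size_of::<Box<Cat>>() / word,     // thin
        size_of::<&[u8]>() / word,        // fat: address + length
        size_of::<&dyn Pet>() / word,     // fat: address + vtable pointer
        size_of::<&mut dyn Pet>() / word, // fat
        size_of::<Box<dyn Pet>>() / word, // fat
    ]
}
```

On current compilers this returns `[1, 1, 2, 2, 2, 2]`.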

(Fat) Pointer and Vtable Memory Layout

We now have all the pieces together to understand how pointers to trait objects work and how to mess with them. Before we start doing that though, let me summarize what we saw so far in a graphic:

Figure 1. Pointer and vtable layout for a type implementing a single trait. Plain old pointers just store the address of the data. Pointers to dyn Trait objects store both the address of the data and a pointer to the vtable. There is one global vtable instance for each type T implementing trait Trait, meaning all dynamic cats point to the same vtable, while all dynamic dogs point to another one.

The figure was adapted from Ralph Levien’s now out of date container cheat sheet. We see that the vtable contains a function pointer to the destructor, then usize fields for size and alignment, respectively, and finally the pointers to the member functions in order of declaration ⁸.
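One place where the size and alignment entries come into play is `std::mem::size_of_val` and `align_of_val`: for a `&dyn Pet` the compiler cannot know the concrete type statically, so in rustc these values come from the vtable. The sketch below is mine; `Mouse` is a hypothetical zero-sized implementor and `dyn_size_align` a hypothetical helper.

```rust
use std::mem::{align_of_val, size_of_val};

trait Pet {
    fn sound(&self) -> String;
}

#[allow(dead_code)]
struct Cat { life: u8, age: u8, my_name: String }
struct Mouse; // hypothetical zero-sized implementor

impl Pet for Cat { fn sound(&self) -> String { "Meow!".into() } }
impl Pet for Mouse { fn sound(&self) -> String { "Squeak!".into() } }

/// Size and alignment of the value behind a trait object; for `dyn Pet`
/// these are looked up at runtime via the vtable.
fn dyn_size_align(pet: &dyn Pet) -> (usize, usize) {
    (size_of_val(pet), align_of_val(pet))
}
```

For a `Cat` this reports the same numbers as `size_of::<Cat>()` and `align_of::<Cat>()`, and for the zero-sized `Mouse` it reports `(0, 1)`.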

Some words of warning: this figure is accurate enough at the time of writing with the Rust compiler version 1.68.0, but it does not show the full picture. If supertraits get involved, the vtable gets more complicated to accommodate the planned trait upcasting feature. The most comprehensive documentation on the current vtable layout I could find is here. But it also pays to look at the rustc source code along with this helpful answer on a reddit thread on the topic. So much for the nitty-gritty details; now let’s get our hands dirty and venture even deeper into don’t-try-this-in-prod territory.

Fun With Vtables

It’s time to revisit the code from above and see if we can’t make our kitty go woof. Let’s first create a data structure for our vtable so that we can manipulate it with some more finesse.

use std::ffi::c_void;

#[repr(C)]
#[derive(Copy, Clone)]
struct PetVtable {
    drop : fn(*mut c_void),
    size : usize,
    align : usize,
    sound : fn(*const c_void) -> String,
    name : fn(*const c_void) -> String,
}

By making the struct #[repr(C)], we make sure that it has the same memory layout as the Pet-vtable. For the function pointers we have basically copied the signature of the member functions, with one important difference. We have made the &self parameter a pointer to void, so that we can reuse the structure for any implementor of Pet. Still, we have some degree of type safety by choosing appropriate integer and function pointer types for the members. Now let’s revisit the code from above and see how to mess with the vtable.

const POINTER_SIZE : usize = std::mem::size_of::<usize>();

fn main() {
    unsafe {
        // (1)
        let mut kitty : Box<dyn Pet> = Box::new(Cat::new("Kitty"));
        // (2)
        let addr_of_data_ptr = &mut kitty as *mut _ as *mut c_void as usize;
        // (3)
        let addr_of_pointer_to_vtable = addr_of_data_ptr + POINTER_SIZE;
        // (4)
        let ptr_to_ptr_to_vtable = addr_of_pointer_to_vtable as *mut *const PetVtable;
        // (5)
        let mut new_vtable = **ptr_to_ptr_to_vtable; 
        // (6)
        new_vtable.sound = bark;
        // (7)
        *ptr_to_ptr_to_vtable = &new_vtable;

        greet_pet(kitty);
    }
}

fn bark(_this : *const c_void) -> String {
    "Woof!".to_string()
}

This will produce a barking cat as teased above. Try this example on the playground.

You: Hello, Kitty!
Kitty: Woof!

Okay, now let’s take this step by step: ① We create a pointer to a Pet trait object with a cat instance. From the section above we know that kitty consists of two elements right next to each other in memory: the pointer to the data and the pointer to the vtable. ② Now we find out the address of kitty and store it as an integer (because we can, since the value of a pointer is just an address). It’s important to note that we really care about the address of kitty and not the address of the cat instance that it points to. The address of kitty is also the address of the data pointer. ③ Now we add one pointer size in bytes to this value, which will give us the address of the pointer to the vtable. ④ Finally we cast this address into a pointer. Note that the type we now have is a pointer to pointer to PetVtable. If you are confused why we need two pointer indirections, bear with me. I’ll explain later. ⑤ Here, we copy the vtable into a stack variable, which is possible because our vtable derives Copy. Since we have two pointer indirections, we have to dereference twice. ⑥ Next, we make the function pointer for the sound member function point to the bark function, which is just a free function that has the correct signature. It takes a void pointer as its first argument, which is the reference to self, aka the pointer-to-data part of the fat pointer. ⑦ Finally we make the pointer to the vtable point to the newly created vtable. When passed to the greet_pet function the kitty will now bark.

References or Boxes

The code above works just the same if we use references instead of boxes. The memory layout of both pointer types is the same; the only difference is where the pointed-to element is stored.
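We can check this by reading the two machine words of a `Box<dyn Pet>` the same way we peeked at kitty above, and comparing them with the words of a `&dyn Pet` borrowed from it. This sketch is mine (hypothetical helpers `ref_words` and `box_words`) and, like the article's code, it relies on the current, unspecified rustc layout in which the data pointer comes first and the vtable pointer second.

```rust
use std::mem::transmute;

trait Pet {
    fn sound(&self) -> String;
}

struct Cat { _lives: u8 }

impl Pet for Cat {
    fn sound(&self) -> String { "Meow!".into() }
}

/// Reads the two machine words of a `&dyn Pet`.
fn ref_words(pet: &dyn Pet) -> [usize; 2] {
    // Relies on &dyn Pet being two pointer-sized words (true in current rustc).
    unsafe { transmute::<&dyn Pet, [usize; 2]>(pet) }
}

/// Reads the two machine words stored inside a `Box<dyn Pet>`,
/// mirroring how the article inspects `kitty`.
fn box_words(pet: &Box<dyn Pet>) -> [usize; 2] {
    unsafe { *(pet as *const Box<dyn Pet> as *const [usize; 2]) }
}
```

For a boxed cat, both functions return the same pair, and the first word is the address of the `Cat` on the heap.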

(Im)mutable Vtables

In the code above, we used two pointer indirections to manipulate the vtable. We copied the existing vtable into a new one, manipulated the new vtable and then set the pointer-to-vtable to point to the new vtable. Couldn’t we just grab a mutable pointer to the vtable itself and make the sound function point somewhere else?

Let’s think about what it would mean if we could do that: we would change the vtable for all Cat instances in our program, present and future, because only one vtable instance is created for the Cat type and all trait objects of Cat point to it. If it were possible to manipulate it, this could quickly turn into all kinds of nightmares (debugging, security, you name it). This is why the compiler places all vtables into a special read-only section of the program binary, so that trying to write to one is a runtime error (typically a segmentation fault). That’s a good thing.

We’re Not in Kansas Anymore

Before we end this adventure, let’s try to break the whole thing some more. In the code above, we made the function pointer members type safe. Granted, we did use pointers to void, but we restricted the function signatures. Very reasonable, but why would we restrict ourselves like that? We’ve already been naughty and treated pointers as numbers, so let’s alter our vtable structure a bit and remove any semblance of type safety from it.

#[repr(C)]
#[derive(Copy, Clone)]
struct RawPetVtable {
    drop : usize,
    size : usize,
    align : usize,
    sound : usize,
    name : usize,
}

Now we can stick any address (any number really) into the function pointers. Let’s now create two functions that do not obey the expected signature at all:

fn bark2() -> String {
    "Woof!".to_string()
}

fn add(a : usize, b : usize) -> String {
    format!("{} + {} = {}", a, b, a + b)
}

The bark2 function barks but it takes no arguments. We expect the sound member function to be called with exactly one argument, which is the address of self. The add function is even wackier in this context. It takes not one but two arguments. The first argument is the address of self surely, but where does the second argument come from? Well, as Darth Vader once said to poor Lando: “I am altering the deal. Pray I don’t alter it any further…” At least we give the program the expected return value!

I’ve got a link to the playground with the code here. There are no new surprises in there, so feel free to explore the code for yourself. Let’s take a look at the outputs. If we assign the bark2 function address to the sound member of the vtable, we get the following output:

You: Hello, Kitty!
Kitty: Woof!

That’s surprising, isn’t it? Not the "Woof!", we’re already used to that. However, the fact that the program even runs without crashing might surprise us. The code just works even though the function signature is missing an argument. Now let’s look at what happens if we put the add function address into the sound member. That output is not deterministic but it will look something like this:

You: Hello, Kitty!
Kitty: 93842120210704 + 93842100797168 = 187684221007872

We learn two things: our kitty is surprisingly good at math, and the program still runs. Good grief, why? Well, the thing is: function pointers are just addresses of code, and code is just ones and zeros. Type safety is an illusion that exists while the program is being compiled, but not afterwards. The CPU just executes the instructions it was told to jump to, and those will go ahead and operate on whatever data is found where the function arguments are expected, whether someone put something useful there or not. Strictly speaking, calling a function through a pointer of the wrong type is undefined behavior, so the fact that this runs at all is luck, not a guarantee.

Conclusion

As promised, we took a deep dive into one particular aspect of trait objects, but in many respects we only scratched the surface. We also went down some rabbit holes to remind us that, as programmers, we really just live in a world of ones and zeros with a whole lot of sugar poured on top of it ⁹.

I know there is some confusion around traits and trait objects and I wish I could say that this deep dive cleared all (or even any) of that up, but I don’t think that’s the case. If anything this probably left you more confused… all I can hope for is that some fun was had while reading this. I sure had fun writing it.

Endnotes

I mean that C++ is a more object-oriented (OO) language than Rust, not that C++ is a purely OO language. Further, I don’t want to imply that Rust is an OO language at all. ↩
Not every trait in Rust can be made into a trait object. The key concept here is object safety. In this article, we are only concerned with object safe traits. ↩
Also called virtual method tables, but they also go by many other names. Pronounced “veeh-table”. Vtables are commonly used in C++ compilers for dispatching to the method of a derived class via a pointer to its base class. The virtual keyword plays an important role in dynamic dispatch via inheritance in C++, hence the V in vtable. Our animal example in C++ would be calling the Cat::sound member function via a pointer to super class Pet, where the Cat class derives from Pet, which has a virtual member function sound(). I’ll leave it at that for now and I urge anyone interested in the C++ aspects to check out the aforementioned video on the Creel YouTube channel. ↩
It’s also important that vtables aren’t really a language feature but an implementation detail of the compiler, as reddit user u/myrrlyn pointed out here. This might change in future versions of rustc and might even be different in other Rust compilers, once they become available. ↩
The vtable pointer in C++ may e.g. be placed at the beginning of an object. This is why you must not rely on the fact that the address of an object is also the address of its first member in C++. ↩
I’m not going to pretend to understand why Rust chose fat pointers over thin ones, but if you are interested, here and here are some insightful discussions on the topic. ↩
A twople,… get it? Sorry about that… ↩
If you’re wondering why a vtable contains size and alignment, you’re not alone. Let me also link a nifty crate that takes on the very niche problem of providing &dyn Fn but without the vtable. ↩
Even those ones and zeros are abstractions over physical realities involving electrons. ↩

Now With RSS/Atom Feed

2023-03-02T00:00:00+00:00

This blog now has a feed to which you can subscribe using the RSS logo at the top or at the bottom of the page. Or by clicking here. Updates to this blog are not frequent, but they are mostly high-effort posts and I am happy if they reach a lot of people, so if you like my stuff please subscribe :)

The feed uses the jekyll-feed gem, which is very simple to set up. If you’ve read this far, then you have surely spotted my lie. The feed is not really an RSS feed… it’s an Atom feed, but the typical feed readers can usually parse both.

Rust vs Common C++ Bugs

2022-12-21T00:00:00+00:00

I used to like C++. I still do, but I used to, too. Joking aside, I am not one to tell any C++ programmer to just use Rust. There are a ton of valid reasons why companies or individuals decide to use C++ and I don’t wish to deter anyone from doing so. What I am trying to do in this article is to see how Rust stacks up against a handful of very common (and severe) bugs that are easy to produce in C++. I tried to make this article worthwhile for both Rust and C++ programmers.

I’ll use Louis Brandy’s excellent CppCon 2017 talk Curiously Recurring C++ Bugs at Facebook as the basis for what constitutes a common and scary bug easily produced in C++. Louis draws his experience from working on the C++ codebase at facebook (now Meta). I am aware that facebook does not represent every C++ use case, but my personal experience is very much compatible with the given list. I’ll try not to repeat the talk too much, because it is an excellent presentation that I urge you to watch yourself. In the talk, Louis gives mitigations against many of the bugs, mostly involving sanitizers. I won’t go into those kinds of mitigations here because I want to explore how Rust stacks up on a more fundamental level.

Bug #1: Vector Out of Bounds Access Using `[]`

We probably all know how array or vector access using operator [] can cause out of bounds access in C++. For example, std::vector’s operator [] does not perform bounds checking, so an illegal memory access can occur unnoticed¹. The problem is not that this is possible at all, since a systems language that cares about performance must offer these kinds of unchecked accesses. The problem is that this most fundamental way to access a vector element, deep in every programmer’s DNA, is inherently unsafe.

There is, of course, a bounds-checked API for element access using std::vector::at, but few people seem to use it. Rust makes the trade-off differently. On slices, arrays, vectors and the like, operator [] is bounds-checked and will panic² on out of bounds access. There is a method get_unchecked that allows unchecked access to the elements if you tell the compiler trust me, I know what I am doing using the unsafe keyword. I personally think this is the right default to have, since the other way leads to the number one bug in Louis’ presentation.
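To make the Rust side of this trade-off concrete, here is a small sketch of the three access styles on a slice. The helper function names are hypothetical and only serve the illustration; demonstrating the panic with std::panic::catch_unwind assumes the default unwinding panic strategy.

```rust
use std::panic;

/// Checked access via `get`: returns `None` instead of reading out of bounds.
fn checked_get(v: &[i32], idx: usize) -> Option<i32> {
    v.get(idx).copied()
}

/// Indexing with `[]` is bounds-checked and panics on an invalid index.
/// We catch the panic here only to show that it happens.
fn index_panics(v: &[i32], idx: usize) -> bool {
    panic::catch_unwind(|| v[idx]).is_err()
}

/// Unchecked access has to be requested explicitly with `unsafe`.
fn unchecked_first(v: &[i32]) -> i32 {
    assert!(!v.is_empty());
    // SAFETY: we just asserted that index 0 is in bounds.
    unsafe { *v.get_unchecked(0) }
}
```

Calling get_unchecked with an out of bounds index would be undefined behavior, which is exactly why the compiler makes us spell out unsafe and uphold the bounds invariant ourselves.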

Bug #2: `std::map` Access using `[]`

This one is a well-known, confusing API: on a std::map in C++, operator [] actually means get me a reference to the element, or insert the default value and then get me a reference to that. There are cases where this is a useful API to have. However, imbuing operator [] with that behavior seems problematic because it violates the principle of least surprise. Louis Brandy has an excellent example of how that becomes a serious problem. Take a look at this constructor for a Widget class that performs some logging:

Widget::Widget(
  const std::map<std::string, int>& settings) :
    m_settings(settings) {
      std::cout << "Widget initialized..."
                << "timeout is:"
                << m_settings["timeout"]
                << "\n";
}

Here the programmer’s mindset is let me just log the timeout real quick, but it is easy to forget that this operation inserts a timeout of 0 (which often means infinite wait) if no such key was already present. Rust’s BTreeMap exposes a much less surprising API, which arguably makes this kind of error impossible³.

Let’s have a quick look at the API for accessing entries in Rust’s BTreeMap. We can use the [] operator to get an (immutable) reference to an element in the map via its key. This operator panics if the key is not present in the map. That might seem peculiar at first, but it is consistent with how a vector or array behaves with the [] operator, where it means bounds-checked access to the element or panic. If we are not sure the key exists in the map, we can use the get or get_mut methods for immutable or mutable access, respectively. Instead of panicking, they return an optional reference to the element, i.e. an Option<&V> or Option<&mut V> that is None if the key is absent. Optional references are safe in Rust.

But is there a way to have the behavior of operator[] as in C++ in a clearer manner? Turns out there is, using the entry method. Say we have a BTreeMap named map: then we can write map.entry(key).or_default() to get the equivalent behavior to C++’s map[key]. The expression returns a mutable reference that can be manipulated, just as the C++ operator would. I would argue that this is a much cleaner API.
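Here is a minimal sketch of these access styles, using a hypothetical settings map with i32 values to mirror the Widget example. The function names are made up for this illustration; get, get_mut, entry and or_default are the standard BTreeMap and Entry APIs.

```rust
use std::collections::BTreeMap;

/// Builds a small settings map that has no "timeout" key.
fn settings() -> BTreeMap<String, i32> {
    let mut map = BTreeMap::new();
    map.insert("retries".to_string(), 3);
    map
}

/// Read-only lookup: `get` never inserts, it only returns an Option.
fn timeout_or_none(map: &BTreeMap<String, i32>) -> Option<i32> {
    map.get("timeout").copied()
}

/// Mutable lookup of a key that may or may not exist, via `get_mut`.
fn bump_retries(map: &mut BTreeMap<String, i32>) {
    if let Some(r) = map.get_mut("retries") {
        *r += 1;
    }
}

/// The explicit equivalent of C++'s `map["timeout"]`:
/// insert the default value (0 for i32) if the key is missing.
fn timeout_or_insert_default(map: &mut BTreeMap<String, i32>) -> &mut i32 {
    map.entry("timeout".to_string()).or_default()
}
```

Only the last function changes the map when the key is missing, and the name of the call chain says so.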

Bug #3: References to Temporaries

While the bugs described above might theoretically be fixed by an API redesign in the standard library⁴, the next one is inherent in the language. It has to do with the lifetime of temporaries. I assume that many of us know about lifetimes in C++ and also know that lifetimes of temporaries may be extended under certain circumstances. However, I personally would be hard-pressed to recount all of those rules and the exceptions to them. Louis gives a very cool motivating example.

Building on the discussion about maps above, let’s write a function that gets an element from a map or returns a given default value. Louis presents us with the following function:

#include <map>
#include <string>

std::string get_or_default(
  const std::map<std::string,std::string>& map,
  const std::string& key,
  const std::string& default_value) {
    auto it = map.find(key);
    return (it != map.end()) ?
            it->second : default_value;
}

This function works perfectly fine and it is commonly used like so:

get_or_default(people_map,"name","John Doe");

The important thing here is that passing the string literal "John Doe" as the default parameter implies the construction of a temporary string. There is nothing wrong with that. The problem arises only when we try to optimize the get_or_default implementation. The function always returns a copy of the value it chooses to return. Being good C++ programmers, we might want to get rid of that extra copy-construction and change the return type to a reference to const string instead of the by-value return that causes the copy. Let me quote Louis Brandy: “this code [would be] hopelessly broken”. The problem does not manifest when the key is found, because then we return a reference to an element inside the map, which is fine. But if the map does not contain the given key, we return a reference to the temporary. Once that temporary is destroyed, the reference dangles, and using it is undefined behavior.

As mentioned above, there are rules for lifetime extension, which make it defined behavior to bind a temporary value to a reference to const. However, we find this exception to the rules:

a temporary bound to a return value of a function in a return statement is not extended: it is destroyed immediately at the end of the return expression. Such return statement always returns a dangling reference.

References to temporaries in C++ are hard, even in 2022. AddressSanitizer and static analyzers can be used as mitigations to some degree. Still, it would surely be nice not to have that problem at all. This is where one of Rust’s most well-known features comes in: the Borrow Checker. This is such an integral and well-documented part of the Rust language that I’ll keep the example brief. We can write the get_or_default function in Rust like so:

use std::collections::BTreeMap;

fn get_or_default<'a,'b,'c>(
  map : &'a BTreeMap<String,String>,
  key : &String,
  default_val : &'b String) -> &'c String 
where 'a : 'c,
      'b : 'c {
    match map.get(key) {
        Some(val) => val,
        None => default_val,
    }
}

This is not idiomatic Rust since there are much cleaner ways to achieve this logic by using the BTreeMap API, but it’s a straightforward translation of the C++ code.
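For comparison, here is one way a cleaner version might look. This is a sketch, not the only option: it assumes we are happy to take &str instead of &String and to tie the map and the default value to a single lifetime 'a. The compiler then picks a lifetime no longer than either borrow, which expresses the same constraint as the where clause, and Option::unwrap_or replaces the match.

```rust
use std::collections::BTreeMap;

/// Returns the value stored under `key`, or `default_val` if the key is absent.
/// A single lifetime `'a` states that the result must not outlive the map
/// or the default value. The key's lifetime is unrelated to the result.
fn get_or_default<'a>(
    map: &'a BTreeMap<String, String>,
    key: &str,
    default_val: &'a str,
) -> &'a str {
    map.get(key).map(String::as_str).unwrap_or(default_val)
}
```

Lifetime elision cannot fill in the annotations here, because the function takes more than one reference and has no self parameter, so one named lifetime remains.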

The interesting thing is that the get_or_default function takes references to all of its parameters (even the default value) and returns a reference, which will point to either the default value or the entry in the map. Those weird 'a, 'b, and 'c parameters that look like generics are actually named lifetimes that are enforced completely at compile time to check that references can never be dangling. What we have told the compiler here is that the map has a lifetime of 'a. References to the elements inside the map will also have this lifetime. For the key we don’t have to specify an explicit lifetime, because the lifetime of the reference returned from the function is completely independent of it. The compiler can take care of that on its own. The lifetime of the default value is 'b and finally the lifetime of the returned reference is 'c. The real magic happens in the where clause, where we indicate the relationship between the lifetimes. We tell the compiler that the lifetimes 'a and 'b must be at least as long as 'c. Thus, at the callsite, the compiler will check that the returned reference is only used as long as the map and the default value are still alive. Again, this is enforced at compile time without any runtime overhead. For any program that compiles, we can be sure that the get_or_default function never returns a dangling reference.

If that syntax seems weird and complicated to you, don’t panic, because there is such a thing as lifetime elision, which makes those explicit lifetime annotations unnecessary in many common use cases. Still, sometimes we have to help the compiler by annotating the lifetimes, which is the price we have to pay in Rust (at compile time) for the absence of dangling references⁵.

Bug #4: Volatile for Atomic

Louis Brandy tells a success story here. Years ago, the volatile keyword was commonly misused to enforce synchronization across threads. With the advent of std::atomic, and more generally the addition of library and language facilities for concurrent programming in C++11, programmers stopped misusing volatile. We started using the newly available language and library facilities, since they were simple to use and simple to teach.

Let’s enjoy this success story, before we get back to the broader topic of thread safety in the next section. Harking back to the first two sections, this success story shows that good API design does go a long way in preventing bugs.

Bug #5: Is `shared_ptr` Thread Safe?

In a very fun section of this great talk, Louis shows that developers seem to forget that a shared_ptr does not enforce any synchronization for its pointed-to element, even though the reference count is synchronized across threads⁶. This distinction seems to be confusing enough that bugs are regularly produced by sharing shared_ptr instances across threads. Rather than blaming the developers, it seems to me that this is a systemic issue that should be addressed.

This is where Rust’s notion of fearless concurrency kicks in. It’s a topic too broad to discuss in depth, because a lot of language facilities work together to create it. It’s instructive to look at how Rust deals with the equivalent problem of shared pointers. Let’s look at this program, where we’ll try to use the reference counted shared pointer Rc to share an integer value across threads:

use std::sync::Arc;
use std::rc::Rc;

type SharedPtr<T> = Rc<T>;

fn main() {
    let val = SharedPtr::new(10);

    let shared1 = SharedPtr::clone(&val);
    let handle1 = std::thread::spawn(move || {
        println!("thread1: {}",shared1);
    });

    let shared2 = SharedPtr::clone(&val);
    let handle2 = std::thread::spawn(move || {
        println!("thread2: {}",shared2);
    });

    println!("main: {}", val);
    let _ = handle1.join();
    let _ = handle2.join();
}

We have typedef’d SharedPtr to the standard library type Rc. Then we create a new instance via Rc::new (think of this somewhat like std::make_shared) and share a copy of that shared pointer with different threads (using a method called clone, because otherwise Rust would move the value out of our scope). All threads, including main, just print the value of the shared integer. One problem: this program does not compile. The compiler will complain that Rc does not implement the Send trait.

The Send trait is Rust’s way, via the type system, to indicate whether it is safe to send a value to a different thread. This is important for types that carry shared ownership semantics, like Rc. It turns out that Rc is not safe to send across different threads because its reference counting is not atomic, unlike that of std::shared_ptr. The correct equivalent for an atomically reference counted shared pointer is the Rust type Arc. This type does indeed implement Send (as long as T is Send and Sync), and thus using type SharedPtr<T> = Arc<T>; makes this code compile.
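Here is a sketch of the same program with that one change, restructured as a function so its results can be checked. Returning the values instead of printing them, and reading the strong count at the end, are additions for this illustration only.

```rust
use std::sync::Arc;
use std::thread;

// The program from above with `Rc` swapped for the atomically
// reference counted `Arc`, which is `Send` because `i32: Send + Sync`.
type SharedPtr<T> = Arc<T>;

/// Shares `value` with two threads. Returns what main and both threads saw,
/// plus the strong count once both threads have finished.
fn share_across_threads(value: i32) -> (i32, i32, i32, usize) {
    let val = SharedPtr::new(value);

    let shared1 = SharedPtr::clone(&val);
    let handle1 = thread::spawn(move || *shared1);

    let shared2 = SharedPtr::clone(&val);
    let handle2 = thread::spawn(move || *shared2);

    let seen1 = handle1.join().expect("thread 1 panicked");
    let seen2 = handle2.join().expect("thread 2 panicked");
    // Each thread dropped its clone when its closure returned.
    (*val, seen1, seen2, Arc::strong_count(&val))
}
```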

If we now tried to mutate the value from two different threads, we would find out that we cannot simply do that. We’ll eventually hit an error telling us we need a second fundamental trait for concurrency, called Sync. This trait, again via the type system, indicates whether a value is safe to share between threads. The short story is that we have to explicitly use a synchronization mechanism for the data that makes it safe to share among threads, such as a mutex. I won’t go into detail here, because I have written a two part series comparing mutexes in Rust and C++ (part 1, part 2).
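To sketch what that looks like in practice, here is the common Arc<Mutex<T>> combination in a hypothetical concurrent_increments function: Arc provides shared ownership across threads, and the Mutex makes the inner value safe to mutate from several of them. Swapping the Arc for an Rc, or trying to mutate through the Arc without the Mutex, would make this fail to compile.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Increments a shared counter `per_thread` times from each of `n_threads` threads.
fn concurrent_increments(n_threads: usize, per_thread: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..n_threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // The lock guard is the only way to reach the data.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    // Bind to a local so the guard is dropped before `counter` goes away.
    let result = *counter.lock().unwrap();
    result
}
```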

The takeaway here is that Rust does indeed prevent this type of thread-safety problem (and many more) by making thread safety part of the type system. So thread-safety is checked at compile time. This even presents an opportunity for optimization by giving us non-atomically reference counted pointers that are only safe to use from one thread. We only pay for atomic counting if we need it and we cannot forget to use it during a refactor because the compiler will tell us.

However, it’s only fair to note that the thread-safety guarantees in Rust aren’t all sunshine and rainbows, and truly getting into the specifics of what thread-safety guarantees Rust actually gives is complicated. The term is just so loosely defined. On the one hand, Rust does prevent data races; on the other hand, you can still deadlock all you want. Also, at the time of writing, Rust does not have a defined memory model, unlike C++ ⁷.

Bug #6: A Vexing Parse that Only Pretends To Lock A Mutex

This bug is a really curious one. Did you know that std::string(foo); is valid C++ and is parsed as std::string foo;? Both produce a default constructed instance of a string named foo. This can lead to an interesting problem, when trying to lock a mutex using the RAII guards in the standard library. Take a look at this deceptive example:

void Widget::update_state() {
  std::unique_lock<std::mutex>(m_mutex);
  do_some_update(m_state);
}

The first line of the function does not actually lock the mutex m_mutex, but instead defines a default constructed instance of a std::unique_lock called m_mutex. The correction just requires two more characters, like so: std::unique_lock<std::mutex> g(m_mutex);. The difference is not easy to spot. Louis shows that, despite a running linter, he found two instances of exactly that mistake that made it into the production codebase.

Evaluating whether this mistake is easy to make in Rust is interesting. Rust’s Mutex works very differently from the one in C++. In Rust, a mutex is a class template⁸ where the instance of a mutex is the sole owner of the contained data. When we lock this mutex, we get a pointer-like guard that can be dereferenced to access the underlying value. Once the guard goes out of scope, the mutex is unlocked again. This data-oriented approach to mutexes makes it impossible to forget to lock a mutex that protects data, because we cannot otherwise access the data. I have written about this paradigm in an article and also explored data-oriented mutexes for C++. Spoiler: there’s no way to have data-oriented mutexes in C++ without the risk of running into trouble with lifetimes.
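The following sketch illustrates this data-oriented design with std::sync::Mutex: the vector is only reachable through the guard, a second try_lock fails while the guard is alive, and the mutex is available again once the guard’s scope ends. The function name guard_scope_demo is hypothetical.

```rust
use std::sync::Mutex;

/// Returns (locked while the guard was alive, unlocked after its scope, final data).
fn guard_scope_demo() -> (bool, bool, Vec<i32>) {
    // The mutex owns the data; the Vec cannot be reached without locking.
    let data = Mutex::new(vec![1, 2]);
    let locked_while_guard_alive;
    {
        let mut guard = data.lock().unwrap();
        guard.push(3); // the guard derefs to the protected Vec
        // While `guard` is alive, a non-blocking second lock attempt fails.
        locked_while_guard_alive = data.try_lock().is_err();
    } // `guard` goes out of scope here and unlocks the mutex
    let unlocked_after_scope = data.try_lock().is_ok();
    (locked_while_guard_alive, unlocked_after_scope, data.into_inner().unwrap())
}
```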

While using mutexes to protect data as described above is surely a very common (maybe the most common) use case, there are valid reasons to use a mutex to protect sections of code rather than data. We can achieve this in Rust with Mutex<()>, where the unit type () plays (in this case) the role of void in C++. To lock this mutex we can just call

{
  let _ = my_mutex.lock();
  // ... code in section
}

…and that would produce the same problem as the C++ code above! Question to Rust developers: did you spot the problem?

It turns out that let _ = ... does not actually bind the mutex guard to a named variable. This means the created temporary is destroyed immediately, rather than at the end of the scope as a named entity would be. In fact, let _ = expr is semantically equivalent to (void)(expr) in C++ and serves, e.g., to suppress compiler warnings about unused return values. On the other hand, let _g = expr is somewhat like auto _g = expr in C++ and binds to a named entity whose destructor is invoked at the end of its scope. At the time of writing, the code shown above compiles without warnings in Rust. The mutex does not stay locked until the end of the scope; it is released right after the let statement.
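Since the problem is not specific to mutexes, a rough way to see it in general is a hypothetical RAII guard of our own that logs when it is dropped. The sketch below records the order of events for let _ = ... and let _g = ...; the Guard type and the helper functions are made up for illustration.

```rust
use std::cell::RefCell;

/// A hypothetical RAII guard that records its own destruction in a log.
struct Guard<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

/// Creates a guard and logs the creation.
fn make_guard<'a>(name: &'static str, log: &'a RefCell<Vec<String>>) -> Guard<'a> {
    log.borrow_mut().push(format!("create {}", name));
    Guard { name, log }
}

/// Returns the order in which guards are created and dropped.
fn let_underscore_demo() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        // `_` is a pattern, not a binding: the guard is dropped at the end of this statement.
        let _ = make_guard("underscore", &log);
        log.borrow_mut().push("body after let _".to_string());
    }
    {
        // `_g` is a named binding: the guard lives until the end of the block.
        let _g = make_guard("named", &log);
        log.borrow_mut().push("body after let _g".to_string());
    }
    log.into_inner()
}
```

The underscore guard is already gone before the body runs, while the named one protects the whole block.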

I really think the semantic difference between let _ = expr and let _g = expr is a confusing design choice, as evidenced e.g. by the discussions here and here. There is a clippy lint ⁹ that will catch the mistake in the code using mutexes above, but it cannot prevent the deeper underlying problem for other types of RAII guards. We could, of course, argue that using Mutex<()> is an indication of bad design, but that would be unfair to C++. Then we could always say producing bugs is bad design. If Mutex<()> exists, people will eventually use it. Furthermore, the problem with let _ = ... goes beyond just mutexes.

Summary

We looked at six common errors in C++ in the facebook codebase and we saw that Rust enables us to catch most of the bugs at compile time and also emphasizes memory safety and simplicity in its API design. But we also discovered a thread safety bug caused by a syntax quirk in C++ that exists in Rust as well, albeit through a completely different syntax quirk. We also learned that thread safety is a very underspecified aspect of the Rust language despite some amazing compile time guarantees.

I don’t want to pass judgement on Rust vs C++ as languages in general, but it is nice to see that a language which is often hailed as a safe successor to C++ does indeed help to prevent many of the bugs common in C++. And yet we saw that Rust is not without its flaws.

Endnotes

This is true for other array like containers in the STL as well as C-style arrays. ↩
A panic is an early, but orderly termination of the program with stack unwinding. ↩
Half of the reason this operation is so problematic in C++ is that it does not work with const-correctness in an intuitive way. It will not work on a const map, but Louis shows that this does not deter programmers from using the operator. We just get rid of the const. ↩
However, I’d be surprised if they are ever fixed, because of backwards compatibility. ↩
There is such a thing as unsafe Rust, which can cause all kinds of UB, including dangling references. Unsafe Rust does not disable the Borrow Checker. It can introduce legitimate safety problems, but since it’s opt-in, we can decide not to use the unsafe subset of Rust, and loads of libraries and applications can be and are written without it. ↩
Part of this confusion surely stems from the fact that the control block of a shared pointer is synchronized. This implies that the reference count is actually thread-safe, but this does not mean that the pointed-to instance is safe to access across threads. It’s as safe as sharing a raw pointer would be, i.e. read-only sharing would be safe while any form of mutation could incur a data race. ↩
A reader on reddit characterized this statement as “both factually true and incredibly misleading”. See their comment on the matter. ↩
“Templates” in Rust are not called templates and also work rather differently than templates in C++. But in very broad strokes, Rust generics are somewhat like C++ templates. ↩
Clippy is the static analyzer that ships with the Rust toolchain, and many projects use it to run extra analyses on their code. However, the compiler itself does not complain, so I feel it would be unfair to C++ to disregard that problem in the Rust language. ↩

Implementing Rayon’s Parallel Iterators - A Tutorial

2022-10-30T00:00:00+00:00

The rayon library is a brilliant piece of engineering that is so simple to use that one seldom has to descend into its depths. This is the story of how I learned to implement rayon’s ParallelIterator trait for my own type. There are tons of guides on how to use rayon’s parallel iterators and there are a few explanations on how they work under the hood. However, I found no tutorials on how to implement a parallel iterator from the ground up. This article aims to close that gap.

There is a fair bit of complexity around rayon’s parallel iterators and this tutorial cannot explain every nook and cranny. What I’d rather do is give a guide for a not-too-trivial example. It might or might not be enough for your use case, but you’ll have an understanding of the map of the territory either way.

Existing Literature

First, here is a collection of prior art on the subject of implementing parallel iterators. I’ve ordered it by ascending usefulness (as perceived by me). I recommend reading this guide first and then going back to the literature referenced in this section. Eventually, reading the source will prove invaluable, though it would not be my first port of call.

Inside the rayon repository, there is a plumbing/README.md. It was too terse as an introduction for me, but it does come in handy as a refresher or if you have prior knowledge. What I found very helpful in understanding how rayon thinks about parallel iterators is the three-part blog series (Part 1 - Foundations, Part 2 - Producers, Part 3 - Consumers) by Niko Matsakis, rayon’s creator. It’s a brilliant introduction to this subject and I hope this guide will complement it nicely. We’re going to see the principles applied to an example.

Finally, it’s worth noting that often you don’t really have to implement your own parallel iterator from the ground up because you can use what’s already there in rayon. Here and here are examples of how the par_bridge and par_chunks functionality can be used as quick alternatives to implementing custom iterators. Here is an example of how to make use of rayon’s existing iterators to implement your own iterator with less overhead. But what if it turns out you do have to (or want to) implement a parallel iterator from the ground up? That is where this guide comes in.

Groundwork

First let’s get to know our example and draw a very rough map of the rayon territory.

Our Example

We’ll implement parallel iterators for a collection of some data where sequential iterators are already present. This is a common use case. Our example will be deliberately simple, which is why I use vectors and slices as the underlying ways of storing and accessing our data. Those already give us sequential iterators¹. Note that rayon already has parallel iterators for Vecs and slices, but we will not use them. This way, we learn how to implement parallel iterators from first principles. For convenience, we’ll use i32s as a stand-in for the data² inside our collection.

type Data = i32;

struct DataCollection {
  data : Vec<Data>,
}

We will make heavy use of the fact that we can split a vector into slices and that there are sequential iterators over slices. Again, we will not exploit rayon’s parallel iterators over slices or Vecs.

Rayon Tour de Force

I am interested in writing an iterator that implements both rayon’s ParallelIterator as well as IndexedParallelIterator, which makes this (in rayon’s terms) a “random access” iterator with an exactly known length. Some of what I am going to say will be true for other types of parallel iterators but some things won’t be, so keep that in mind.

We will start out by writing a structure for the parallel iterator over our data and we’ll see that we can implement all but one required method of ParallelIterator and IndexedParallelIterator pretty easily. For the final piece of the puzzle, we have to understand rayon’s concept of a Producer. It helps to think of rayon as a divide and conquer multithreading library. It wants to split the whole iteration into smaller and smaller chunks, distribute them across threads, then fall back to regular sequential iterators to perform the actual work within the threads. Producers are the glue that allows rayon to understand how to split your iteration into smaller chunks and how to iterate over those chunks sequentially. If that all seems a bit much now, bear with me.
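To get a feel for this divide and conquer idea without rayon, here is a std-only sketch using scoped threads (std::thread::scope, Rust 1.63 or later). It is not how rayon works internally: rayon uses a work-stealing thread pool instead of spawning a thread per split, and the threshold below is an arbitrary assumption. It only illustrates the pattern of splitting until the pieces are small and then falling back to a sequential iterator.

```rust
use std::thread;

/// Below this many elements we stop splitting (an arbitrary value for illustration).
const SEQUENTIAL_THRESHOLD: usize = 1024;

/// Sums a slice by recursively splitting it in half, handing the left half to a
/// new scoped thread and processing the right half on the current thread.
fn divide_and_conquer_sum(data: &[i32]) -> i64 {
    if data.len() <= SEQUENTIAL_THRESHOLD {
        // Sequential leaf: an ordinary slice iterator does the actual work.
        return data.iter().map(|&x| i64::from(x)).sum();
    }
    let (left, right) = data.split_at(data.len() / 2);
    thread::scope(|s| {
        let left_handle = s.spawn(|| divide_and_conquer_sum(left));
        let right_sum = divide_and_conquer_sum(right);
        left_handle.join().unwrap() + right_sum
    })
}
```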

The Implementation

Here we’ll see how to implement parallel iterators that iterate over (mutably or immutably) borrowed data.

(Indexed) Parallel Iterators

Since we are borrowing the data, the easiest way is to provide the iterator with a reference to a slice of the data. Let’s start with iterators over immutable references first.

struct ParDataIter<'a> {
  data_slice : &'a [Data]
}

I already mentioned that I want to write a parallel iterator that has an exactly known size. So the two traits we have to implement are ParallelIterator and IndexedParallelIterator on our ParDataIter. Let’s start out by implementing all the required methods by just putting a todo!() into each body to appease the compiler. This looks something like this:

impl<'a> ParallelIterator for ParDataIter<'a> {
    type Item = &'a Data;

    fn drive_unindexed<C>(self, consumer: C) -> C::Result
    where C: UnindexedConsumer<Self::Item> {
        todo!()
    }
}

The ParallelIterator trait only has one required method, which seems not that bad, right? The associated type Item is clear, because we want to iterate over references to the data, so we make it &'a Data right away. Now let’s look at the second iterator trait before we go any further:

impl<'a> IndexedParallelIterator for ParDataIter<'a> {
    fn with_producer<CB: ProducerCallback<Self::Item>>(
        self,
        callback: CB,
    ) -> CB::Output {
        todo!()
    }

    fn drive<C: Consumer<Self::Item>>(self, consumer: C) -> C::Result {
        todo!()
    }

    fn len(&self) -> usize {
        todo!()
    }
}

This one has three methods we need to implement. The simplest one is len, which must return the number of elements that this parallel iterator produces. This is just self.data_slice.len() and we’re done. The next two methods we implement are drive_unindexed and drive of ParallelIterator and IndexedParallelIterator, respectively. The three-part series by Niko Matsakis linked above gives a great explanation of the logic behind these methods. Here, we’ll take a pragmatic approach and look at how rayon goes about implementing them. In rayon’s parallel iterator implementation for slices, in lines 708 and 721, we can see that both methods are implemented by a simple call to bridge(self, consumer). Interesting! If we look at the documentation of rayon::iter::plumbing::bridge, we find:

This helper function is used to “connect” a parallel iterator to a consumer. […] This is useful when you are implementing your own parallel iterators: it is often used as the definition of the drive_unindexed or drive methods.

The last sentence tells us that this is exactly what we need. It is worth noting that bridge requires the first argument (i.e. self) to implement IndexedParallelIterator, which is no problem for us, because that is what we are doing anyway. That lets us fill all but one method with the correct logic. Before we see how to implement with_producer, let’s throw in a low-hanging optimization.

ParallelIterator has a method opt_len(&self) -> Option<usize> that returns the length of this iterator if it is known. We can just return Some(self.len()), which calls the len method of self as an implementor of IndexedParallelIterator. In summary, this leaves us with this code:

impl<'a> ParallelIterator for ParDataIter<'a> {
    type Item = &'a Data;

    fn drive_unindexed<C>(self, consumer: C) -> C::Result
    where
        C: UnindexedConsumer<Self::Item> {
        bridge(self,consumer)
    }

    fn opt_len(&self) -> Option<usize> {
      Some(self.len())
    }
}

impl<'a> IndexedParallelIterator for ParDataIter<'a> {
    fn with_producer<CB: ProducerCallback<Self::Item>>(
        self,
        callback: CB,
    ) -> CB::Output {
        todo!()
    }

    fn drive<C: Consumer<Self::Item>>(self, consumer: C) -> C::Result {
        bridge(self,consumer)
    }

    fn len(&self) -> usize {
        self.data_slice.len()
    }
}

So the only thing left to implement is with_producer and we’re done.

Producers

It’s pretty important to understand rayon’s concept of producers when we want to implement parallel iterators. They work hand in hand with a concept called consumers. Put simply, producers produce elements and consumers consume them. I know: thanks, Captain Obvious! But bear with me. Functions like fold on parallel iterators work with a FoldConsumer under the hood that consumes the produced elements. That’s about all we need to know about consumers here, but we need to dive into producers a little bit more. Producers are described by the rayon documentation like this:

A Producer is effectively a “splittable IntoIterator”. That is, a producer is a value which can be converted into an iterator at any time: at that point, it simply produces items on demand, like any iterator. But what makes a Producer special is that, before we convert to an iterator, we can also split it at a particular point using the split_at method.

So a producer allows us to split the range over which we iterate, and it can be turned into a sequential iterator at any time. Let’s try to create a producer for the elements of our DataCollection, starting with a new structure for the producer³:

struct DataProducer<'a> {
    data_slice: &'a [Data],
}

To implement the Producer trait for this structure we have to know essentially three things.

What is the sequential iterator into which this producer can be made?
What type of item does said iterator produce?
How can we split the producer into two at a given index?

Let’s start at the top. The sequential iterator should be the iterator over a borrowed slice of Data, i.e. std::slice::Iter<'a, Data>. This implies that the iterator yields items of type &'a Data. It is worth noting that Producer requires the returned iterator to implement both DoubleEndedIterator and ExactSizeIterator. This is no problem for us, because slice iterators implement both of these traits⁴. Finally, we can split our producer by splitting the borrowed slice with the slice’s own split_at method. Our implementation can thus look like this:

impl<'a> Producer for DataProducer<'a> {
    type Item = &'a Data;
    type IntoIter = std::slice::Iter<'a, Data>;

    fn into_iter(self) -> Self::IntoIter {
        self.data_slice.iter()
    }

    fn split_at(self, index: usize) -> (Self, Self) {
        let (left, right) = self.data_slice.split_at(index);
        (
            DataProducer { data_slice: left },
            DataProducer { data_slice: right },
        )
    }
}
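Both extra requirements on IntoIter are easy to check with the standard library alone. The std-only sketch below uses a hypothetical function, len_and_ends, which is not part of rayon. It accepts exactly the bounds Producer puts on its sequential iterator: DoubleEndedIterator plus ExactSizeIterator. The tests apply it to the two halves of a slice split the same way DataProducer::split_at splits its data.

```rust
/// Accepts only iterators that satisfy the same two extra bounds rayon's
/// `Producer` places on its `IntoIter` type, and returns the length plus
/// the first and the last item.
fn len_and_ends<I>(mut iter: I) -> (usize, Option<I::Item>, Option<I::Item>)
where
    I: DoubleEndedIterator + ExactSizeIterator,
{
    let len = iter.len(); // known up front thanks to ExactSizeIterator
    let first = iter.next();
    let last = iter.next_back(); // taken from the back: DoubleEndedIterator
    (len, first, last)
}
```

Adapters such as filter do not implement ExactSizeIterator, because the number of items that pass the predicate is not known in advance. This is the kind of situation endnote 4 alludes to.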

And just like that we have our producer. Let’s add some convenience functionality to go from a parallel iterator to a producer. This will come in handy momentarily.

impl<'a> From<ParDataIter<'a>> for DataProducer<'a> {
    fn from(iterator: ParDataIter<'a>) -> Self {
        Self {
            data_slice: iterator.data_slice,
        }
    }
}

Finally, we can revisit our implementation of IndexedParallelIterator for ParDataIter and fill in the missing piece.

impl<'a> IndexedParallelIterator for ParDataIter<'a> {
    fn with_producer<CB: ProducerCallback<Self::Item>>(
        self,
        callback: CB,
    ) -> CB::Output {
        let producer = DataProducer::from(self);
        callback.callback(producer)
    }

    // --- other methods unchanged ---
    // [...]
}

If you are wondering what on earth a producer callback is, I recommend reading the appropriately titled section “What on earth is ProducerCallback” in the rayon README. For us as implementors, we just need to remember to invoke that callback on a producer that we create from our iterator. We do that with the slightly awkward (but very cleverly designed) callback.callback(producer) syntax.
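The callback pattern can also be illustrated without rayon. In the std-only sketch below, IterCallback, Numbers, its method with_iter, and the Summarize callback are all hypothetical. They only mirror the shape of ProducerCallback and with_producer. The callback’s method is generic over the iterator type, so the data structure chooses the concrete iterator and hands it over. It does not have to return a type that the caller would need to name.

```rust
/// Hypothetical callback trait mirroring the *shape* of rayon's
/// `ProducerCallback` (not its exact definition): the method is generic over
/// the iterator type, so the callee picks the concrete iterator.
trait IterCallback<T> {
    type Output;

    fn callback<I>(self, iter: I) -> Self::Output
    where
        I: ExactSizeIterator<Item = T>;
}

/// Stand-in for a data structure; `with_iter` plays the role of `with_producer`.
struct Numbers {
    values: Vec<i32>,
}

impl Numbers {
    fn with_iter<CB: IterCallback<i32>>(self, callback: CB) -> CB::Output {
        // std::vec::IntoIter<i32> stays an implementation detail of this method.
        callback.callback(self.values.into_iter())
    }
}

/// A callback that reports the number of items and their sum.
struct Summarize;

impl IterCallback<i32> for Summarize {
    type Output = (usize, i64);

    fn callback<I>(self, iter: I) -> Self::Output
    where
        I: ExactSizeIterator<Item = i32>,
    {
        let count = iter.len();
        let sum: i64 = iter.map(i64::from).sum();
        (count, sum)
    }
}
```

As far as I can tell from rayon’s sources, adapters such as map rely on the same inversion. Their with_producer forwards a wrapping callback to the base iterator’s with_producer. That wrapper wraps the base producer before invoking the outer callback.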

Usage and Ergonomics

Now we have successfully implemented a parallel iterator. There is one final thing to do before we can use it: we have to expose an interface on our data structure to obtain one. We can, for example, expose a member function:

impl DataCollection {
    pub fn parallel_iterator(&self) -> ParDataIter<'_> {
        ParDataIter {
            data_slice: &self.data,
        }
    }
}

Now we can call this function on our collection to obtain a parallel iterator. That is a perfectly valid way to obtain parallel iterators. As a matter of fact, for structures that have more than one way to iterate over their data, it is good practice to implement descriptively named member functions that return different kinds of parallel iterators. Think e.g. of a matrix that can have element-wise, column-wise, and row-wise parallel iterators.
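Here is a sequential, std-only sketch of that naming convention for a hypothetical row-major Matrix type. The type and its methods elements, rows, column and columns are illustrative and do not come from the original code. A parallel version would return parallel iterator types from the same kind of descriptively named methods.

```rust
use std::iter::{Skip, StepBy};
use std::slice::{ChunksExact, Iter};

/// Hypothetical row-major matrix, used only to illustrate exposing several
/// descriptively named iterators. Assumes `ncols > 0` and that `data.len()`
/// is a multiple of `ncols`.
struct Matrix {
    ncols: usize,
    data: Vec<i32>,
}

impl Matrix {
    /// All elements in row-major order.
    fn elements(&self) -> Iter<'_, i32> {
        self.data.iter()
    }

    /// Rows as contiguous slices.
    fn rows(&self) -> ChunksExact<'_, i32> {
        self.data.chunks_exact(self.ncols)
    }

    /// One column: start at index `c`, then take every `ncols`-th element.
    fn column(&self, c: usize) -> StepBy<Skip<Iter<'_, i32>>> {
        self.data.iter().skip(c).step_by(self.ncols)
    }

    /// All columns, one strided iterator per column.
    fn columns(&self) -> impl Iterator<Item = StepBy<Skip<Iter<'_, i32>>>> + '_ {
        (0..self.ncols).map(move |c| self.column(c))
    }
}
```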

However, if there is just one reasonable way of iterating over the elements of our data structure, rayon offers a nice shortcut through blanket implementations. The trait IntoParallelRefIterator exposes a method par_iter() that iterates over references to the elements. We don’t implement this trait directly. Instead, we get it for free by implementing IntoParallelIterator for &DataCollection. So let’s do that:

impl<'a> IntoParallelIterator for &'a DataCollection {
    type Iter = ParDataIter<'a>;
    type Item = &'a Data;

    fn into_par_iter(self) -> Self::Iter {
        ParDataIter { data_slice: &self.data }
    }
}
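The blanket implementation trick is not specific to rayon. The std-only sketch below mimics it with a hypothetical trait, RefIter, which is not a real std or rayon trait. A single blanket impl gives ref_iter() to every type T for which &T implements IntoIterator. As a result, implementing IntoIterator for &DataCollection is the only thing we write by hand. DataCollection here is a simplified stand-in, with Data assumed to be i32.

```rust
/// Hypothetical trait mirroring how rayon provides `IntoParallelRefIterator`:
/// it is never implemented by hand. A blanket impl hands it to every `T`
/// for which `&T` implements `IntoIterator`.
trait RefIter<'data> {
    type Iter: Iterator<Item = Self::Item>;
    type Item;

    fn ref_iter(&'data self) -> Self::Iter;
}

impl<'data, T: 'data + ?Sized> RefIter<'data> for T
where
    &'data T: IntoIterator,
{
    type Iter = <&'data T as IntoIterator>::IntoIter;
    type Item = <&'data T as IntoIterator>::Item;

    fn ref_iter(&'data self) -> Self::Iter {
        IntoIterator::into_iter(self)
    }
}

/// Element type, assumed to be `i32` as in the rest of the tutorial.
type Data = i32;

struct DataCollection {
    data: Vec<Data>,
}

/// The only impl written by hand, analogous to implementing
/// `IntoParallelIterator for &DataCollection` above.
impl<'a> IntoIterator for &'a DataCollection {
    type Item = &'a Data;
    type IntoIter = std::slice::Iter<'a, Data>;

    fn into_iter(self) -> Self::IntoIter {
        self.data.iter()
    }
}
```

As a bonus, the IntoIterator impl for &DataCollection also enables for x in &collection loops. This is the sequential counterpart of what the rayon impl unlocks.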

Example Code on the Rust Playground

You can find all the code, plus more, on the playground. The playground code also includes an implementation of iterators over mutable data. Putting everything together allows us to do something like this:

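// brings par_iter() and par_iter_mut() into scope; assumes DataCollection
// derives Debug and that &mut DataCollection implements IntoParallelIterator
use rayon::prelude::*;

fn main() {
    let mut data = DataCollection {
        data: vec![1, 2, 3, 4],
    };

    // negate every element in parallel
    data.par_iter_mut().for_each(|x| *x = -*x);

    println!("data = {:?}", data);

    // x is a &Data here, and &i32 * &i32 yields an i32
    let sum_of_squares: Data = data.par_iter().map(|x| x * x).sum();

    println!("sum = {}", sum_of_squares);
}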

Parallel Iterators for Mutable Data

So far we have only seen how to iterate immutably over our data. The good news is that adding parallel iterators for mutable data is dead simple. We can essentially replace all our &'a with &'a mut. The producer then splits with split_at_mut instead of split_at, and it turns into a std::slice::IterMut<'a, Data> via iter_mut instead of iter. So we create a second iterator for mutable iteration, ParDataIterMut, which references a mutable slice, and we implement the two iterator traits just as above. That means we’ll have to create an analogous DataProducerMut, plug everything together again, and voilà, we’re done⁵. The playground link above has the code for mutable iterators as well.
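The sketch below shows why disjoint mutable halves are safe to process concurrently. It is std-only and negates a mutable slice by recursively splitting it with split_at_mut, the operation a DataProducerMut::split_at would rely on. The function negate_all and its min_len cutoff are hypothetical. The scoped threads stand in for rayon’s thread pool, and this is not how rayon actually schedules work.

```rust
use std::thread;

/// Negates every element by recursively splitting the mutable slice with
/// `split_at_mut`. The two halves are disjoint `&mut` borrows, so the borrow
/// checker lets us hand one of them to another thread.
fn negate_all(data: &mut [i32], min_len: usize) {
    if data.len() <= min_len.max(1) {
        // Sequential base case: `iter_mut` takes the place of `iter`.
        for x in data.iter_mut() {
            *x = -*x;
        }
        return;
    }
    let mid = data.len() / 2;
    let (left, right) = data.split_at_mut(mid);
    thread::scope(|s| {
        // The scope joins the spawned thread before `negate_all` returns.
        s.spawn(move || negate_all(left, min_len));
        negate_all(right, min_len);
    });
}
```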

Conclusion

Multithreading is hard, and it is a testament to the genius design of the rayon library that so much of the complexity is abstracted away from us. Ninety-nine percent of the time we can just replace iter() with par_iter() and enjoy the magic. We usually don’t have to know how to implement our own parallel iterators. If you do find yourself wanting to, I hope this tutorial has given you an idea of how to go about it. Now is probably a good time to look at all the prior art that I mentioned at the beginning of this article, if you haven’t already.

Endnotes

As a matter of fact, those iterators implement ExactSizeIterator as well as DoubleEndedIterator, which will be important later. ↩
Note that i32 is Send, which is also important later. ↩
The astute reader will have noticed that the producer here looks just like the parallel iterator structure. I have seen this code duplication inside of the rayon codebase as well. There’s no reason why we could not implement the Producer trait on our parallel iterator type. This will not work for every use case, but it certainly would for our particular example. I have also written parallel iterators where I was able to modify the sequential iterators so that they implemented the Producer trait. This, of course, is only possible if you own the codebase that contains the sequential iterators. We’ll stick to the most general case here and won’t bother ourselves with reducing code duplication too much. ↩
If the sequential iterator for your use case does not implement these traits, this gets trickier. You can either try and implement them for the iterator (if you own the codebase) or create a new sequential iterator that implements them, possibly by wrapping the existing one. ↩
While it truly is that simple for our case, it does not have to be in all cases. The borrow checker might complain about certain code containing mutable references that it will accept for immutable references. ↩

Compile-Time If-Expressions for Types in C++11

2022-08-20T00:00:00+00:00

The other day I wondered if there is a way to metaprogram a syntax that returns a type based on a conditional expression, akin to an if…else if…else expression. The idea came to me when I searched for a programmatic way to choose the right size of integer for a bitset, given a certain number of bits. And yes, I know there’s std::bitset, but I’ll take any excuse to investigate an interesting metaprogramming problem.

Simple Conditionals

The standard library already gives us a metafunction, std::conditional, which is the semantic equivalent of a ternary operator for types. If the (compile time known) boolean condition C evaluates to true, we get the type T, otherwise F. There’s no else-if logic in there, but it is still instructive to see how to implement this ourselves. We start by defining a structure template Conditional like so:

template<bool Condition, typename T, typename E>
struct Conditional {};

Then we specialize this for the boolean cases true and false, respectively:

template<typename T, typename E>
struct Conditional<true,T,E> {
    using type = T;
};

template<typename T, typename E>
struct Conditional<false,T,E> {
    using type = E;
};

// typename is required here before C++20 because ::type is a dependent name
template<bool Condition, typename T, typename E>
using Conditional_t = typename Conditional<Condition,T,E>::type;

All of this might look unwieldy at first, but is from the standard metaprogramming bag of tricks. Now we can use this expression to switch a type based on a compile time known condition like so:

// assuming BIT_COUNT is compile time known
using MyType = Conditional_t<BIT_COUNT <= 8,int8_t,int64_t>;

This conditional enables simple logic with only two branches. Furthermore, without looking up how Conditional (or std::conditional for that matter) works, I don’t find this code very readable¹. So I set out to do something both more powerful and more readable.

The Goal

A few years ago I implemented if expressions for values, which are also constexpr. So they work for compile time known values, but not for types. However, I like their syntax, and I wanted to see whether I could come up with a similar syntax that allows us to choose types at compile time. The syntax I ended up implementing looks like this²:

using IntType = If<(BIT_COUNT <= 8)>
                ::Then<uint8_t>
                ::ElseIf<(BIT_COUNT <= 16)>
                ::Then<uint16_t>
                ::ElseIf<(BIT_COUNT <= 32)>
                ::Then<uint32_t>
                ::Else<uint64_t>;
IntType my_integer = 0;

Whether this construct is useful in other circumstances than this particular example I don’t know. But I find it very readable and was excited to challenge myself with the implementation.

We can build on the general idea of the Conditional implementation above, which uses specialization to select types. Additionally, we can draw inspiration from the if expressions implementation, where we created transient types for the if, then, else if, and else parts of the expression.

The Implementation

Let’s have a look at the implementation and then walk through it. I suggest only skimming this code for now and jumping to the next paragraph, where we walk through it in detail. The code is laid out to keep the compiler happy: the declarations need to appear in a particular order that is less intuitive to a human reader.

// (1)
template <bool Condition, typename T>
struct ThenType {};

// (2)
template <bool IfCondition,
          typename T,
          bool ElifCondition>
struct ElifType {
  // (3)
  template <typename E>
  using Then = typename Conditional<IfCondition,
                        ThenType<true, T>,
                        ThenType<ElifCondition, E>
                        >::type;
};

// (4)
template <typename T>
struct ThenType<true, T> {
  //(5)
  template <typename E>
  using Else = T; 
  // (6)
  template <bool ElifCondition>
  using ElseIf = ElifType<true, T, ElifCondition>;
};

// (7)
template <typename T>
struct ThenType<false, T> {
  // (8)
  template <typename E>
  using Else = E;
  // (9)
  template <bool ElifCondition>
  using ElseIf = ElifType<false, T, ElifCondition>;
};

// (10)
template <bool Condition>
struct If {
  // (11)
  template <typename T>
  using Then = ThenType<Condition, T>;
};

Let’s start at the bottom with the ⑩ If structure template, which takes a compile time known boolean condition Condition. It is the entry point to our metafunction, and the key trick is that it has an associated templated type ⑪ Then. Then evaluates to the ThenType type, which we’ll discuss in a moment. We can think of this templated associated type as a function in type space. It is only possible with alias templates (templated using declarations), which were introduced in C++11. Before that we would have had to use a typedef, and I know of no way to make a typedef generic over a template parameter.

The ① ThenType structure template remembers the condition Condition from the original ⑩ If and takes a type T that should be returned if this condition is true. The template specializations ④ ThenType<true, T> and ⑦ ThenType<false, T> both have ⑤⑧ Else associated types that take a type parameter E. Since else is the last piece of the chain, this Else metafunction evaluates to a concrete type depending on the condition. If the condition is true, it evaluates to T, the type given to Then. Otherwise it evaluates to E, the type given to Else. These pieces enable us to write If<(X<3)>::Then<T>::Else<E>. To my mind, this is already more readable than std::conditional, but we want to go further.

To enable the else if part of the expression we add ⑥⑨ ElseIf associated types in the ThenType structure template specializations. This type is the reason we have to have this weird out of order declaration of ThenType in the first place. You surely noticed that we declared the ThenType template on top, then the ElifType template and only after that we specialized the ThenType template for the possible outcomes of the condition. This is because we want to use ThenType from ElifType but also ElifType from ThenType. Since the compiler needs to see those declarations before usage, we help it by essentially forward declaring the ThenType template. ② ElifType has one member metafunction ③ Then which uses our Conditional in such a way that we propagate the correct logic and types through the chain.

Conclusions and Code

In summary, we have enabled a syntax that uses if…else if…else logic to expressively select types at compile time based on logical conditions in C++11. This nicely complements my previous if expressions implementation, which we can use to select values (but not types) at compile time. Both implementations are available as part of my func++ repository on GitHub.

Endnotes

I know we can nest std::conditional to get an else if logic but that only exacerbates the problems with readability. ↩
For my first try I used constexpr functions on the If type, which can be made to return (instances of) different types using if constexpr. There are many problems with this approach, but the biggest is that we have to extract the return type using decltype, which does not make for a pretty syntax. ↩
