Implementing Rayon’s Parallel Iterators - A Tutorial
The rayon library is a brilliant piece of engineering that is so simple to use
that one seldom has to descend into its depths. This is the story of how I
learned to implement rayon’s
ParallelIterator
trait for my own type. There are tons of guides on how to use rayon’s parallel
iterators and there are a few explanations on how they work under the hood. However,
I found no tutorials on how to implement a parallel iterator from the ground up.
This article aims to close that gap.
There is a fair bit of complexity around rayon’s parallel iterators and this tutorial cannot explain every nook and cranny. What I’d rather do is give a guide for a not-too-trivial example. It might or might not be enough for your use case, but you’ll have an understanding of the map of the territory either way.
Existing Literature
First, here is a collection of prior art on the subject of implementing parallel iterators. I’ve ordered this in ascending order of usefulness (as perceived by me). I recommend to read this guide first and then go back to the literature referenced in this section. Eventually, reading the source will prove invaluable, though it would not be my first port of call.
Inside the rayon repository, there is a plumbing/README.md. It was too terse as an introduction for me, but it does come in handy as a refresher or if you have prior knowledge. What I found very helpful in understanding how rayon thinks about parallel iterators is the three part blog series (Part 1 - Foundations, Part 2 - Producers, Part 3 - Consumers) by Niko Matsakis, rayon’s creator. It’s a brilliant introduction to this subject and I hope this guide will complement it nicely. We’re going to see the principles applied to an example.
Finally, it’s worth noting that often you don’t really have to implement your own
parallel iterator from the ground up because you can use what’s already there
in rayon. Here
and here are examples of how the
par_bridge
and par_chunks
functionality can be used as quick alternatives to implementing custom iterators.
Here is an example of how to
make use of rayon’s existing iterators to implement your own iterator with less
overhead. But what if it turns out you do have to (or want to) implement a parallel
iterator from the ground up? That is where this guide comes in.
Groundwork
First let’s get to know our example and draw a very rough map of the rayon territory.
Our Example
We’ll implement parallel iterators for a collection of some data where sequential
iterators are already present. This is a common use case. Our example will be
deliberately simple, which is why I use vectors and slices as the underlying ways
of storing and accessing our data. Those already give us sequential iterators1.
Note that rayon already has parallel iterators for Vec
s and slices, but we will
not use them. So we learn how to implement parallel iterators from first principles.
For convenience, we’ll use i32
s as a stand-in for the data2 inside our collection.
type Data = i32;
struct DataCollection {
data : Vec<Data>,
}
We will make heavy use of the fact that we can split a vector into slices and
that there are sequential iterators over slices. Again, we will not exploit
rayon’s parallel iterators over slices or Vec
s.
Rayon Tour de Force
I am interested in writing an iterator that implements both rayon’s
ParallelIterator
as well as IndexedParallelIterator
,
which makes this (in rayon’s terms) a “random access” iterator with an exactly
known length. Some of what I am going to say will be true for other types
of parallel iterators but some things won’t be, so keep that in
mind.
We will start out by writing a structure for the parallel iterator over our data
and we’ll see that we can implement all but one required method of
ParallelIterator
and IndexedParallelIterator
pretty easily. For the final
piece of the puzzle, we have to understand rayon’s concept of a Producer
.
It helps to think of rayon as a divide and conquer multithreading library.
It wants to split the whole iteration into smaller and smaller chunks,
distribute them across threads, then fall back to regular sequential iterators
to perform the actual work within the threads.
Producers are the glue that allows rayon to understand how to
split your iteration into smaller chunks and how to iterate over those chunks
sequentially. If that all seems a bit much now, bear with me.
The Implementation
Here we’ll see how to implement parallel iterators that iterate over (mutably or immutably) borrowed data.
(Indexed) Parallel Iterators
Since we are borrowing the data, the easiest way is to provide the iterator with a reference to a slice of the data. Let’s start with iterators over immutable references fist.
struct ParDataIter<'a> {
data_slice : &'a [Data]
}
I already mentioned that I want to write a parallel iterator that has an exactly
known size. So the two traits we have to implement are ParallelIterator
and IndexedParallelIterator
on our ParDataIter
. Let’s start out by implementing all the required methods by
just putting a todo!()
into each body to appease the compiler.
This looks something like this:
impl<'a> ParallelIterator for ParDataIter<'a> {
type Item = &'a Data;
fn drive_unindexed<C>(self, consumer: C) -> C::Result
where C: UnindexedConsumer<Self::Item> {
todo!()
}
}
The ParallelIterator
trait only has one required method, which seems not that
bad, right? The associated type Item
is clear, because we want to iterate over
references to the data, so we make it &'a Data
right away. Now let’s look at the
second iterator trait before we go any further:
impl<'a> IndexedParallelIterator for ParDataIter<'a> {
fn with_producer<CB: ProducerCallback<Self::Item>>(
self,
callback: CB,
) -> CB::Output {
todo!()
}
fn drive<C: Consumer<Self::Item>>(self, consumer: C) -> C::Result {
todo!()
}
fn len(&self) -> usize {
todo!()
}
}
This one has three methods we need to implement. The simplest one is len
, which
must return the number of elements that this parallel iterator produces. This is
just self.data_slice.len()
and we’re done. The next two methods we implement
are drive_unindexed
and drive
of ParallelIterator
and IndexedParallelIterator
,
respectively. The three part series by Niko Matsakis linked above gives a great
explanation of what the logic behind these methods is. Here, we’ll take a pragmatic
approach and look at how rayon goes about implementing these methods.
In the parallel iterator Implementation for slices
in lines 708 and 721, we can see that both methods are implemented by a simple
call to bridge(self, consumer)
. Interesting! If we look at the documentation of
rayon::iter::plumbing::bridge
we find:
This helper function is used to “connect” a parallel iterator to a consumer. […] This is useful when you are implementing your own parallel iterators: it is often used as the definition of the
drive_unindexed
ordrive
methods.
The last sentence tells us that this is exactly what we need. It is worth noting
that bridge
requires the first argument (i.e. self
) to implement IndexedParallelIterator
,
which is no problem for us, because that is what we are doing anyways. That lets
us fill all but one method with the correct logic. Before we see how to implement
with_producer
, let’s throw in a low-hanging optimization.
ParallelIterator
has a method opt_len(&self)->Option<usize>
that returns the length of this iterator if it is known. We can just return
Some(self.len())
, which calls the len
method of self
as an implementor of
IndexedParallelIterator
. In summary, this leaves us with this code:
impl<'a> ParallelIterator for ParDataIter<'a> {
type Item = &'a Data;
fn drive_unindexed<C>(self, consumer: C) -> C::Result
where
C: UnindexedConsumer<Self::Item> {
bridge(self,consumer)
}
fn opt_len(&self) -> Option<usize> {
Some(self.len())
}
}
impl<'a> IndexedParallelIterator for ParDataIter<'a> {
fn with_producer<CB: ProducerCallback<Self::Item>>(
self,
callback: CB,
) -> CB::Output {
todo!()
}
fn drive<C: Consumer<Self::Item>>(self, consumer: C) -> C::Result {
bridge(self,consumer)
}
fn len(&self) -> usize {
self.data_slice.len()
}
}
So the only thing left to implement is with_producer
and we’re done.
Producers
It’s pretty important to understand rayon’s concept of producers when we want
to implement parallel iterators. They work in hand in hand with a concept called
consumers. Put simply, producers produce elements and consumers consume them.
I know: thanks, Captain Obvious! But bear with me. Functions like fold
on
parallel iterators work with a FoldConsumer
under the hood
that consumes the produced elements. That’s about all we
need to know about consumers here, but we need to dive into producers a little
bit more. Producers are described by the rayon documentation like this:
A Producer is effectively a “splittable IntoIterator”. That is, a producer is a value which can be converted into an iterator at any time: at that point, it simply produces items on demand, like any iterator. But what makes a Producer special is that, before we convert to an iterator, we can also split it at a particular point using the
split_at
method.
So a producer allows us to split the range over which we iterate and it can
be made into a sequential iterator at any time. So let’s try and create a
producer for the elements of our DataCollection
. Let’s create a new structure
for the producer 3:
struct DataProducer<'a> {
data_slice : &'a [Data],
}
To implement the Producer
trait for this structure we have to know essentially three things.
- What is the sequential iterator into which this producer can be made?
- What type of item does said iterator produce?
- How can we split the producer into two at a given index?
Let’s start at the top. The sequential iterator should be the iterator over a
borrowed slice of Data
, i.e. std::slice::Iter<'a,Data>
.
This implies that the type of item returned from this iterator is &'a Data
. It is
worth noting that Producer
requires the returned iterator to implement both
DoubleEndedIterator
as well as ExactSizeIterator
.
This is no problem for us because slice iterators implement both these traits4.
Finally, we can split our producer by splitting the borrowed slice by
using split_at
.
Our implementation can thus look like this:
impl<'a> Producer for DataProducer<'a> {
type Item = &'a Data;
type IntoIter = std::slice::Iter<'a, Data>;
fn into_iter(self) -> Self::IntoIter {
self.data_slice.iter()
}
fn split_at(self, index: usize) -> (Self, Self) {
let (left, right) = self.data_slice.split_at(index);
(
DataProducer { data_slice: left },
DataProducer { data_slice: right },
)
}
}
And just like that we have our producer. Let’s add some convenience functionality to go from a parallel iterator to a producer. This will come in handy momentarily.
impl<'a> From<ParDataIter<'a>> for DataProducer<'a> {
fn from(iterator: ParDataIter<'a>) -> Self {
Self {
data_slice: iterator.data_slice,
}
}
}
Finally, we can revisit our implementation of IndexedParallelIterator
for
ParDataIter
and fill in the missing piece.
impl<'iter> IndexedParallelIterator for ParDataIter<'iter> {
fn with_producer<CB: ProducerCallback<Self::Item>>(
self,
callback: CB,
) -> CB::Output {
let producer = DataProducer::from(self);
callback.callback(producer)
}
// --- other methods unchanged ---
// [...]
}
If you are wondering what on earth a producer callback is, I recommend to
read the appropriately titled section “What on earth is ProducerCallback
”
in the rayon README. For us as implementors, we just need to remember
to invoke that callback function on a producer that we create from
our iterator. We do that by using the slightly awkward (but very cleverly
designed) callback.callback(producer)
syntax.
Usage and Ergonomics
Now we have sucessfully implemented a parallel iterator. There is one final thing we need to do before we can use it. We have to expose an interface on our data structure to get one. We can for example expose a member function
impl DataCollection {
pub fn parallel_iterator(&self) -> ParDataIter {
ParDataIter {
data_slice : &self.data,
}
}
}
Now we can call this function on our collection to obtain a parallel iterator. That is a perfectly valid way to obtain parallel iterators. As a matter of fact, for structures that have more than one way to iterate over their data it is good practice to implement descriptive member functions that return different kinds of parallel iterators. Think e.g. of a matrix that can have element wise, column wise, and row wise parallel iterators.
However, if there is just one reasonable way of iterating over the elements
in our data structure, rayon has a nice feature through blanket implementations.
There is a trait IntoParallelRefIterator
that exposes a function par_iter()
that iterates over references of elements.
We don’t implement this trait directly, but we get it for free when implementing
IntoParallelIterator
for &DataCollection
. So let’s do that:
impl<'a> IntoParallelIterator for &'a DataCollection {
type Iter = ParDataIter<'a>;
type Item = &'a Data;
fn into_par_iter(self) -> Self::Iter {
ParDataIter { data_slice: &self.data }
}
}
Example Code on the Rust Playground
You can find all the code plus more on the playground. The playground also code includes an implementation of iterators over mutable data. Putting everything together, allows us to do something like this:
fn main() {
let mut data = DataCollection{
data : vec![1, 2, 3, 4]
};
data
.par_iter_mut()
.for_each(|x| *x = -*x);
println!("data = {:?}", data);
let sum_of_squares: Data = data
.par_iter()
.map(|x| x * x)
.sum();
println!("sum = {}", sum_of_squares);
}
Parallel Iterators for Mutable Data
So far we have only seen an implementation to immutably iterate over our data.
The good thing is that adding parallel iterators for mutable data
is dead simple, because we can just replace all our &'a
with &'a mut
for mutable iteration. So what we do is create a second iterator for mutable iteration
ParDataIterMut
that references a mutable slice. We implement the two iterator
traits just as above. That means we’ll have to create an analogous DataProducerMut
,
plug everything together again and voilà we’re done5. The playground
link above has the code for mutable iterators as well.
Conclusion
Multithreading is hard and it is a testament to the genius design of the
rayon library, that so much of the complexity is abstracted away from us.
Ninetynine percent of the time we can just replace iter()
for par_iter()
and enjoy the magic. We usually don’t have to know how to implement our own parallel
iterators, but if you find yourself wanting to, I hope this tutorial has given
you an idea of how to go about it. Now is probably a good time to look at all the
prior art that I mentioned at the beginning of this article, if you haven’t
already.
Endnotes
-
As a matter of fact, those iterators implement
ExactSizeIterator
as well asDoubleEndedIterator
, which will be important later. ↩ -
Note that
i32
isSend
, which is also important later. ↩ -
The astute reader will have noticed that the producer here looks just like the parallel iterator structure. I have seen this code duplication inside of the rayon codebase as well. There’s no reason why we could not implement the
Producer
trait on our parallel iterator type. This will not work for every use case, but it certainly would for our particular example. I have also written parallel iterators where I was able to modify the sequential iterators so that they implemented theProducer
trait. This, of course, is only possible if you own the codebase that contains the sequential iterators. We’ll stick to the most general case here and won’t bother ourselves with reducing code duplication too much. ↩ -
If the sequential iterator for your use case does not implement these traits, this gets trickier. You can either try and implement them for the iterator (if you own the codebase) or create a new sequential iterator that implements them, possibly by wrapping the existing one. ↩
-
While it truly is that simple for our case, it does not have to be in all cases. The borrow checker might complain about certain code containing mutable references that it will accept for immutable references. ↩
Comments
You can comment on this post using your GitHub account.
Join the discussion for this article on this ticket. Comments appear on this page instantly.