Andre Bogus Andre "llogiq" Bogus is Chief Rustacean (yes, that's his official title) for Synth, a Rust contributor, and Clippy maintainer. A musician-turned-programmer, he has worked in many fields, from voice acting and teaching, to programming and managing software projects. He enjoys learning new things and telling others about them.

Improving overconstrained Rust library APIs

In one of my earlier posts, “How to write CRaP Rust code,” I warned you about overusing generics. And for a binary crate or an initial version of any code, that is still a good idea.

However, when designing Rust library crate APIs, you can often use generics to good effect: being more lenient with your inputs may give callers the chance to avoid some allocations, or to find a different representation of the input data that suits them better.

In this guide, we’ll demonstrate how to make Rust library APIs more lenient without losing any functionality. But before we start, let’s examine the possible downsides of doing this.

First, generic functions offer the type system less information about what is what. If what was a concrete type now becomes an impl, the compiler will have a harder time inferring the types of each expression (and will probably fail more often). This may require your users to add more type annotations to get their code to compile, leading to arguably worse ergonomics.

Also, by specifying one concrete type, we get exactly one version of our function compiled into the resulting code. With generics, we either pay with dynamic dispatch’s runtime cost or risk bloating the binary with multiple versions by choosing monomorphization — in Rust lingo, we choose dyn Trait vs. impl Trait.

Which point on the tradeoff you choose depends mostly on the use case. Note that dynamic dispatch has some runtime cost, but code bloat will also increase cache misses and thus can negatively affect performance. As always, measure twice, code once.

Even so, there are some rules of thumb you can follow for all public methods.

A slice of traits

Take a slice (&[T]) instead of a &Vec<T> if you can (Clippy actually has a lint for that). Your callers may be using a VecDeque (which has a .make_contiguous() method that returns a &mut [T]) instead of a Vec, or perhaps an array.

If you can also take two slices, VecDeque::as_slices can work for your users without moving any values. You will, of course, still need to know your use case to decide whether that’s worth it.
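As a minimal sketch (with a hypothetical `sum` function), a slice parameter accepts all of these representations:

```rust
use std::collections::VecDeque;

// Hypothetical helper: takes a slice, so it doesn't care how the
// caller actually stores the numbers.
fn sum(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let vec = vec![1, 2, 3];
    let array = [4, 5, 6];
    let mut deque: VecDeque<i32> = [7, 8, 9].into_iter().collect();

    assert_eq!(sum(&vec), 6); // &Vec<i32> coerces to &[i32]
    assert_eq!(sum(&array), 15); // arrays work directly
    assert_eq!(sum(deque.make_contiguous()), 24); // &mut [i32] coerces too
}
```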

If you only dereference your slice elements, you can use &[impl Deref<Target = T>]. Note that besides Deref, there is also the AsRef trait, which is quite often used in path handling, because std methods may take an AsRef<T> for a cheap reference conversion.

For example, if you’re taking a set of file paths, &[impl AsRef<Path>] will work with far more types than &[String]:

fn run_tests(
    config: &compiletest::Config,
    filters: &[String],
    mut tests: Vec<tester::TestDescAndFn>,
) -> Result<bool, io::Error> { 
    // much code omitted for brevity
    for filter in filters {
        if dir_path.ends_with(&*filter) {
            // etc.
        }
    }
    // ..
}

The above might be expressed as:

fn run_tests(
    config: &compiletest::Config,
    filters: &[impl AsRef<Path>],
    mut tests: Vec<tester::TestDescAndFn>,
) -> Result<bool, io::Error> {
    // ..
}

Now filters could be a slice of String, &str, or even Cow<'_, OsStr>. For mutable types, there is AsMut<T>. Similarly, if we require that any reference to T works the same as T itself in terms of equality, order and hashing, we can use Borrow<T> / BorrowMut<T> instead.

What does that even mean? It means that types implementing Borrow must guarantee that a.borrow() == b.borrow(), a.borrow() < b.borrow(), and a.borrow().hash() return the same results as a == b, a < b, and a.hash(), respectively, if the type in question implements Eq, Ord, and Hash.
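This guarantee is exactly what lets a HashMap<String, _> be queried by any key that borrows as str. A sketch, with a hypothetical `lookup` helper:

```rust
use std::borrow::Borrow;
use std::collections::HashMap;

// Hypothetical lookup: any key type that borrows as str works here,
// because Borrow guarantees it hashes and compares like the str it
// borrows.
fn lookup(map: &HashMap<String, u32>, key: impl Borrow<str>) -> Option<u32> {
    map.get(key.borrow()).copied()
}

fn main() {
    let mut map = HashMap::new();
    map.insert("answer".to_string(), 42);

    assert_eq!(lookup(&map, "answer"), Some(42)); // &str key
    assert_eq!(lookup(&map, "answer".to_string()), Some(42)); // String key
    assert_eq!(lookup(&map, "nope"), None);
}
```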

Let’s re-iterate

Similarly, if you only iterate over the bytes of a string slice, you can simply take an AsRef<[u8]> argument, unless your code somehow requires the UTF-8 validity that str and String guarantee.
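A sketch of that idea, with a hypothetical digit counter that never needs UTF-8 validity:

```rust
// Hypothetical byte counter: works with &str, String, Vec<u8>, &[u8],
// and anything else that can cheaply hand us a byte slice.
fn count_ascii_digits(data: impl AsRef<[u8]>) -> usize {
    data.as_ref().iter().filter(|b| b.is_ascii_digit()).count()
}

fn main() {
    assert_eq!(count_ascii_digits("a1b2c3"), 3); // &str
    assert_eq!(count_ascii_digits(String::from("2023")), 4); // String
    assert_eq!(count_ascii_digits(vec![b'7', b'x']), 1); // Vec<u8>
}
```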

In general, if you only iterate once, you can even take an Iterator<Item = T>. This allows your users to supply their own iterators, which may use non-contiguous regions of memory, intersperse other operations with your code, or even calculate your inputs on the fly. You don’t even need to make the item type generic, because the iterator can usually produce a T easily if one is needed.

In effect, you can use an impl Iterator<Item = impl Deref<Target = T>> if your code iterates only once; use a slice or two if you need the items more than once. If your iterator returns owned items, such as the recently added array IntoIterators, you can forgo the impl Deref and use impl Iterator<Item = T>.
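Here is a minimal sketch of the single-pass pattern, with a hypothetical `join_lines` function that accepts owned and borrowed items alike:

```rust
use std::ops::Deref;

// Hypothetical single-pass consumer: accepts any iterator whose items
// deref to str, whether they are owned Strings or borrowed &strs.
fn join_lines(lines: impl Iterator<Item = impl Deref<Target = str>>) -> String {
    let mut out = String::new();
    for line in lines {
        out.push_str(&line); // deref coercion turns &impl Deref into &str
        out.push('\n');
    }
    out
}

fn main() {
    // borrowed &strs from an array
    assert_eq!(join_lines(["a", "b"].iter().copied()), "a\nb\n");
    // owned Strings, computed on the fly
    assert_eq!(join_lines((0..2).map(|i| i.to_string())), "0\n1\n");
}
```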

Unfortunately, IntoIterator‘s into_iter consumes self, so there’s no generic way to take an iterator that lets us iterate multiple times, unless perhaps we take an argument of impl Iterator<Item = T> + Clone; but that Clone operation might be costly, so I wouldn’t advise using it.
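If you do reach for it anyway, it looks like this hypothetical two-pass sketch (cloning a slice iterator is cheap, but cloning other iterators may not be):

```rust
// Hypothetical two-pass computation: clone the iterator to find the
// maximum first, then consume the original to normalize every value.
fn normalized(values: impl Iterator<Item = f64> + Clone) -> Vec<f64> {
    let max = values.clone().fold(f64::MIN, f64::max);
    values.map(|v| v / max).collect()
}

fn main() {
    let data = [1.0, 2.0, 4.0];
    assert_eq!(normalized(data.iter().copied()), vec![0.25, 0.5, 1.0]);
}
```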

Into the woods

Not related to performance, but often welcome, is the implicit conversion of impl Into<_> arguments. This can make for an API that feels magical, but beware: Into conversions may be expensive.

Still, there are a few tricks you can pull for nice usability wins. For example, taking an Into<Option<T>> instead of an Option<T> will let your users omit the Some. For example:

use std::collections::HashMap;

fn with_optional_args<'a>(
    _foo: u32,
    bar: impl Into<Option<&'a str>>,
    baz: impl Into<Option<HashMap<String, u32>>>
) {
    let _bar = bar.into();
    let _baz = baz.into();
    // etc.
}

// we can call this in various ways:
with_optional_args(1, "this works", None);
with_optional_args(2, None, HashMap::from([("boo".into(), 0)]));
with_optional_args(3, None, None);

Again, there may be types that implement Into<Option<T>> in a costly fashion. This is yet another example of the choice between a beautiful API and making costs obvious; in Rust, the latter is usually considered more idiomatic.

Keeping code bloat in check

Rust monomorphizes generic code. That means for each unique type your function gets called with, a version of all of its code using that specific type will be generated and optimized.

This has the upside that it lends itself to inlining and other optimizations that give Rust the great performance qualities we all know and love. It also has the downside that potentially a lot of code gets generated.

As a possible extreme example, consider the following function:

use std::fmt::Display;

fn frobnicate_array<T: Display, const N: usize>(array: [T; N]) {
    for elem in array {
        // ...2kb of generated machine code
    }
}

This function will be instantiated once for each combination of item type and array length, even if all we do is iterate. Unfortunately, there is no way to avoid this code bloat without copying or cloning, because arrays (and their by-value iterators) carry their length in their type.

If we can make do with referenced items, we can go unsized and iterate over slices instead:

use std::fmt::Display;

fn frobnicate_slice<T: Display>(slice: &[T]) {
    for elem in slice {
        // ...2kb of generated machine code
    }
}

This will at least only generate one version per item type. Even then, let’s say we only use the array or slice to iterate. We can then factor out a frobnicate_item method that is dependent on the type. What’s more, we can decide whether to use static or dynamic dispatch:

use std::fmt::Display;

/// This gets instantiated for each type it's called with
fn frobnicate_with_static_dispatch(_item: impl Display) {
    todo!()
}

/// This gets instantiated once, but adds some overhead for dynamic dispatch
/// also we need to go through a pointer
fn frobnicate_with_dynamic_dispatch(_item: &dyn Display) {
    todo!()
}

The outer frobnicate_array method now only contains a loop and a method call, which is not that much code to instantiate. Code bloat averted!
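A sketch of the resulting split, using the dynamic-dispatch variant (the small `format!` body is a hypothetical stand-in for the heavy shared code):

```rust
use std::fmt::Display;

// The heavy shared code lives here and is compiled exactly once.
fn frobnicate_item(item: &dyn Display) -> String {
    format!("[{}]", item)
}

// The generic shell is only a loop and a call, so each instantiation
// per item type and array length stays tiny.
fn frobnicate_array<T: Display, const N: usize>(array: [T; N]) -> Vec<String> {
    // &T coerces to &dyn Display at the call site
    array.iter().map(|e| frobnicate_item(e)).collect()
}

fn main() {
    assert_eq!(
        frobnicate_array([1, 2]),
        vec!["[1]".to_string(), "[2]".to_string()]
    );
    assert_eq!(frobnicate_array(["a"]), vec!["[a]".to_string()]);
}
```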

In general, it’s a good idea to take a good look at your method’s interface and see where the generics get either used or cast away. In both cases, there’s a natural border at which we can factor out a function that removes the generics.

If you don’t want all this typing and are OK with adding a small bit of compile time, you can use my momo crate to factor out generic traits such as AsRef or Into.

What’s so bad about code bloat?

For some background: code bloat has an unfortunate consequence. Today’s CPUs employ a hierarchy of caches. While those allow very good speed when working on local data, they make the effects of memory usage highly nonlinear: if your code takes up more of any cache level, it may make other code run slower! So Amdahl’s law no longer helps you find the right place to optimize where memory is concerned.

For one, that means that it may be counterproductive to optimize a part of your code in isolation by measuring a microbenchmark (because the whole code might actually become slower). For another, when writing library code, optimizing your library may pessimize your users’ code. But neither you nor they could learn that from microbenchmarks.

How, then, should we decide when to use dynamic dispatch and when to generate multiple copies? I don’t have a clear rule here, but I do note that dynamic dispatch is certainly underused in Rust! First, it has the stigma of being considered slower (which isn’t exactly wrong, considering the vtable lookups do add some overhead). Second, it’s often unclear how to get there while avoiding allocation.

Even so, Rust makes it easy enough to go from dynamic to static dispatch if measurement shows that it’s beneficial, and since dynamic dispatch can save a lot of compile time, I’d suggest starting out dynamic where possible and only going monomorphic when measurement shows it to be faster. This gives us a faster turnaround and thus more time to improve performance elsewhere. Best of all, measure an actual application, as opposed to a microbenchmark.

This concludes my rant on how to effectively use generics in Rust library code. Go forth and Rust happily!
