Andre Bogus Andre "llogiq" Bogus is VP of Engineering at Aleph Alpha GmbH, a Rust contributor, and Clippy maintainer. A musician turned programmer, he has worked in many fields, from voice acting, to programming, to teaching, to managing software projects. He enjoys learning new things and telling others about them.

How to Organize Your Rust Tests

Whenever you write any kind of code, it’s critical to put it to the test. In this guide, we’ll walk you through how to test Rust code.

But before we get to that, I want to explain why it’s so important to test. To put it plainly, code has bugs. This unfortunate truth was uncovered by the earliest programmers and it continues to vex programmers to this day. Our only hope is to find and fix the bugs before we bring our code to production.

Testing is a cheap and easy way to find bugs. The great thing about unit tests is that they are inexpensive to set up and can be rerun at a modest cost.

Think of testing as a game of hide-and-seek on a playground after dark. You could bring a flashlight, which is highly portable and durable but only illuminates a small area at any given time. You could even combine it with a motor to rotate the light to reveal more, random spots. Or, you could bring a large industrial lamp, which would be heavy to lug around, difficult to set up, and more temporary, but it would light up half the playground on its own. Even so, there would still be some dark corners.

Testing methodology

There is a whole spectrum of test methodologies, from test-driven design, to full-coverage testing, to the test-whenever-you-feel-like-it approach. I won’t judge you no matter what your preference, but I’ll note that my testing strategy depends heavily on the task at hand.

When to assert

If the function under review has preconditions that must be true for all possible inputs or post-conditions that must hold when the function returns, it’s acceptable to assert those conditions within the method, provided checking the condition is not prohibitively expensive in terms of time and/or memory.

In some cases, a debug_assert! will be OK. Doing this is usually a win because those assertions will be checked whether the code is under test or not. You can always remove them later once you no longer need them. Consider them part of the scaffolding to build a program.

Tests usually know their outputs and can assert the exact equivalence. That said, it’s sometimes OK to “just” have the code under test check itself.
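As a sketch of the idea, here is a hypothetical normalize function (the name and behavior are mine, not taken from any particular crate) that checks its precondition and postcondition with debug_assert!:

```rust
/// Scales the weights so they sum to 1.0 (hypothetical example).
/// The precondition (non-empty input, positive sum) and the
/// postcondition (sum close to 1.0) are checked with debug_assert!,
/// so the checks disappear in release builds.
fn normalize(weights: &mut [f64]) {
    debug_assert!(!weights.is_empty(), "empty input");
    let sum: f64 = weights.iter().sum();
    debug_assert!(sum > 0.0, "sum must be positive");
    for w in weights.iter_mut() {
        *w /= sum;
    }
    // Postcondition: the normalized weights sum to (almost) exactly 1.0
    debug_assert!((weights.iter().sum::<f64>() - 1.0).abs() < 1e-6);
}

fn main() {
    let mut weights = vec![1.0, 3.0];
    normalize(&mut weights);
    println!("{:?}", weights); // [0.25, 0.75]
}
```

A unit test for this function can then focus on exact outputs, while the assertions keep guarding every call site in debug builds.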

Dual-use doctests

Your first round of testing should usually consist of doctests for the happy paths of your API. It’s often acceptable to not assert anything, provided the code has sufficient self-checks. It’s OK to unwrap() or use the ? operator here.

A little-known feature of doctests is that you can still omit the main method if the last line is Ok::<_, T>(()) (for some type T). For example, my aleph-alpha-tokenizer crate has the following module-level doctest:

//! ```
//!# use std::error::Error;
//! use aleph_alpha_tokenizer::AlephAlphaTokenizer;
//!
//! let source_text = "Ein interessantes Beispiel";
//! let tokenizer = AlephAlphaTokenizer::from_vocab("vocab.txt")?;
//! let mut ids: Vec<i64> = Vec::new();
//! let mut ranges = Vec::new();
//! tokenizer.tokens_into(source_text, &mut ids, &mut ranges, None);
//! for (id, range) in ids.iter().zip(ranges.iter()) {
//!      let _token_source = &source_text[range.clone()];
//!      let _token_text = tokenizer.text_of(*id);
//!      let _is_special = tokenizer.is_special(*id);
//!      // etc.
//! }
//!# Ok::<_, Box<dyn Error + Send + Sync>>(())
//! ```

We see a few things here:

  • All module-level doc comment lines start with //! to refer to the outer scope. Other item comments usually start with /// to refer to the item below
  • Within the example, we can append a # hash mark to the line prefix to have rustdoc omit the line when rendering the example
  • The last line has the Ok(()) with explicit error type ascribed via the famous turbofish syntax

Having doctests for all public APIs is a very good start. It proves both that the methods work as intended in the non-error case and that the API is at least somewhat usable. To return to our hide-and-seek analogy, it’s a very handy flashlight. You’ll only see a small area of the playground, but it’s bright and in good detail.

Lead by example

It can also be useful, especially for library crates, to provide example programs that show typical usage. Those go into the examples subdirectory, and while they are not executed by cargo test, they can be executed via cargo run --example <name>. Those examples can be fully fledged and are especially helpful for libraries to give possible users a good starting point.
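As a sketch, a hypothetical examples/count.rs for a byte-counting library might look like the following. In a real crate, count would be imported from the library; it is inlined here only so the snippet stands alone:

```rust
// examples/count.rs — a hypothetical example program for a byte-counting
// library. In a real crate, `count` would come from the library itself
// (e.g. `use yourcrate::count;`); it is inlined here to keep the sketch
// self-contained.
fn count(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}

fn main() {
    let text = b"hello world";
    println!("'l' occurs {} times", count(text, b'l')); // 3
}
```

Running cargo run --example count would then print the count, giving users a copy-paste starting point.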

Test things going wrong

Depending on the required level of assurance, this may be far from enough. Bugs tend to lurk in the various error cases. So it makes sense to add tests for all error classes that may happen to an extent that is economical.

For example, for a parser, I would write a test to verify that it behaves well with empty inputs. Those tests add nothing to the documentation, so I would write a test function rather than a doctest:

#[test]
#[should_panic(expected = "empty input")]
fn empty_input() {
    parse("").unwrap();
}

The expected parameter takes the expected output message substring of the panic. Without it, any panic is deemed a successful test, so you won’t be notified when your test breaks. Note that should_panic doesn’t work with Result-returning test functions. In the context of our playground game, this is a spotlight pointing toward the bushes.

Black and white boxes

In this case, the parse method belongs to the public interface, so the test is a black-box test, meaning it only uses the public API of your crate. Black-box tests usually belong in one or more files in the tests subdirectory of your crate. A good convention is to have one test file per module to make it easy to find the corresponding tests.
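A black-box test file might look like the following sketch. Here, parse is a stand-in defined inline so the snippet is self-contained; in a real tests/parse.rs it would be imported from the crate under test:

```rust
// tests/parse.rs — a black-box test: only the public API is exercised.
// This `parse` is a hypothetical stand-in; a real test file would
// import it from the crate (e.g. `use yourcrate::parse;`).
fn parse(input: &str) -> Result<Vec<&str>, String> {
    if input.is_empty() {
        return Err("empty input".to_string());
    }
    Ok(input.split_whitespace().collect())
}

#[test]
fn parses_words() {
    assert_eq!(parse("a b c").unwrap(), vec!["a", "b", "c"]);
}

#[test]
fn rejects_empty() {
    assert!(parse("").is_err());
}
```

Because each file in tests/ is compiled as its own crate, these tests can only see what your library exports, which is exactly the point of black-box testing.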

Sometimes it makes sense to test private functionality so you can better pinpoint a bug or regression. Those tests are called white-box tests. Because they need to access the crate internals, they must be defined within the crate. The best practice is to include a test submodule directly in your crate and only compile it under test.

#[cfg(test)]
mod test {
    use super::{parse_inner, check};

    #[test]
    fn test_parse_inner() { /* .. */ }

    #[test]
    fn test_check() { /* .. */ }
}

Verify that things fail the right way

You might even want to verify that something doesn’t compile. There are two crates to enable that. The first one, called compiletest, is part of the testing infrastructure of the Rust compiler and maintained by the Rust developers.

With UI testing, you simply write a Rust program that should fail to compile, and compiletest runs the compiler, writing the error output into a .stderr file per test file. If the error output changes, the test fails. The Rustc dev guide has good setup documentation.

For example, my compact_arena crate has a number of UI tests, one of which I’ll reproduce below.

extern crate compact_arena;

use compact_arena::mk_nano_arena;

fn main() {
    mk_nano_arena!(a);
    mk_nano_arena!(b);
    let i = a.add(1usize);
    let j = b.add(1usize);
    let _ = b[i];
}

Running the test will create a .stderr file in the target/debug/ui_tests crate subdirectories (you can configure this). Copying those files next to the test programs will make the test pass, as long as the compiler output stays the same. That means it will fail whenever the error messages are improved, which happens quite often.

Incidentally, when trying this out with a new Rustc, all the tests failed due to improved diagnostics, which motivated me to port the tests to compile_fail tests. Those embed matchers for the various error, warning, and help messages as comments, which makes them a bit more resilient against changes. The above test would look like this as a compile_fail test:

extern crate compact_arena;

use compact_arena::mk_nano_arena;

fn main() {
    mk_nano_arena!(a);
    mk_nano_arena!(b);
    //~^ ERROR `tag` does not live long enough [E0597]
    let i = a.add(1usize);
    let j = b.add(1usize);
    let _ = b[i];
}

Each matcher must start with //~. Optionally, any number of ^s moves the expected line of the error message up by one each, and the rest of the matcher is a substring of the actual error message. Again, you can read more about this in the Rustc dev guide.

The second crate, called trybuild, is younger and smaller. Written by the unstoppable David Tolnay, it’s quite similar to the UI tests in compiletest, but it only prints the expected and actual output, not the difference.

Note that doctests can also be marked compile_fail (optionally, with an expected error code, which is currently only checked on nightly), but I’d only use it if making some things unrepresentable in working code is one of the main selling points of your crate.
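A compile_fail doctest is written like a regular doctest, with the annotation on the code fence. The snippet below is a generic sketch (not taken from any particular crate) showing the classic borrow-check rejection:

```rust
/// This doc comment carries a compile_fail doctest: rustdoc verifies
/// that the fenced snippet does NOT compile. The E0502 code after the
/// comma is only checked on nightly.
///
/// ```compile_fail,E0502
/// let mut v = vec![1, 2, 3];
/// let first = &v[0];
/// v.push(4); // cannot borrow `v` as mutable: `first` is still in use
/// println!("{}", first);
/// ```
fn _doc_demo() {}

fn main() {
    _doc_demo();
}
```

Running cargo test compiles the embedded snippet and reports a failure if it unexpectedly builds.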

Test randomly

Now that we’ve shined a spotlight on a few different areas, it’s time to bring in the big guns. Remember that motor we talked about that rotates the flashlight to various spots on the playground? That’s our property test. The idea is to define a property that must hold and then randomly generate inputs to test with.

Of course, this only works if you have a clearly defined property to test with. Depending on your use case, it may be very simple or very difficult to come up with suitable properties. Some things to keep in mind:

  • Many functions have some sort of bounds. You should test that they stay within them
  • Sometimes, certain types have invariants — properties that always hold outside of the code within a module. For example, a Vec has a length and a capacity, and the length is always less than or equal to the capacity
  • If you have a simple version and a fast version of an algorithm, you can test that both arrive at the same solution
  • Parsers may fail, but should not crash with random input
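The Vec invariant from the list above can even be checked directly in a plain loop before reaching for a property-testing crate:

```rust
// Checks the Vec invariant mentioned above: after any sequence of
// pushes, len() never exceeds capacity().
fn main() {
    let mut v: Vec<u8> = Vec::new();
    for i in 0..1000u32 {
        v.push(i as u8); // the cast wraps at 256, which is fine here
        assert!(v.len() <= v.capacity());
    }
    println!("invariant held for {} pushes", v.len());
}
```

A property-testing crate automates exactly this pattern, but with randomized inputs and shrinking on failure.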

Good property testing tools will not only come up with a failing test case, but also shrink the input to something minimal that still exhibits the failure — a very helpful trait if you don’t want to look through 50,000 integers to find the three-integer sequence that triggered a bug.

There are two crates to help with property testing. One is Andrew Gallant’s QuickCheck. You can call it from a unit test, and it will quickly generate 100 inputs to test with. For example, my bytecount crate has both a simple and a fast count function and tests them against each other:

use quickcheck::quickcheck;

quickcheck! {
    fn check_count_correct((haystack, needle): (Vec<u8>, u8)) -> bool {
        count(&haystack, needle) == naive_count(&haystack, needle)
    }
}

In this case, the inputs are fairly simple, but you can either derive values from bytes or implement QuickCheck’s Arbitrary trait for the types to test.

The other crate for random property testing is proptest. Compared to QuickCheck, it has much more refined shrinking machinery, which, on the other hand, takes a bit more time to run. Otherwise, they are quite similar. In my experience, both produce good results.

Our example as proptest would look like this:

use proptest::prelude::*;

proptest! {
    #[test]
    fn check_count_correct(haystack: Vec<u8>, needle: u8) {
        prop_assert_eq!(count(&haystack, needle), naive_count(&haystack, needle));
    }
}

One benefit of proptest, besides the higher flexibility regarding strategy, is failure persistence: all found errors are written into files to be run automatically the next time. This creates a simple regression test harness.

Fuzzing

Let’s say we have a clever apparatus that will bias our light motor’s random gyrations to shine more light on previously dark places. Code coverage is used to steer the randomness away from already-tried code paths. This simple idea makes for a surprisingly powerful technique, one that has potential to uncover a whole lot of bugs.

The easiest way to do this from Rust is to use cargo-fuzz. This is installed with cargo install cargo-fuzz. Note that running the fuzz tests will require a nightly Rust compiler for now.

$ cargo fuzz init
$ cargo fuzz add count
$ # edit the fuzz target file
$ cargo fuzz run count

For my bytecount crate, one fuzz target looks like this:

#![no_main]
use libfuzzer_sys::fuzz_target;

use bytecount::{count, naive_count};
fuzz_target!(|data: &[u8]| {
    if data.is_empty() {
        return;
    }
    let needle = data[0];
    let haystack = &data[1..];
    assert_eq!(count(&haystack, needle), naive_count(&haystack, needle));
});

I can happily attest that even after two days of running, libfuzzer didn’t find any error in this implementation. This fills me with great confidence in the correctness of my code.

Test helpers

Complex systems make for complex tests, so it’s often useful to have some test helpers. For unit tests, you can define helper functions under #[cfg(test)] so they won’t turn up in production code. Files in the tests subdirectory can, of course, contain arbitrary functions.
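A minimal sketch of such a helper follows. make_input is a hypothetical fixture builder; in a real crate you would annotate it with #[cfg(test)], which is omitted here only so the snippet stays self-contained and runnable:

```rust
// A shared fixture builder for tests. In a real crate, gate this behind
// #[cfg(test)] so it is compiled only when testing:
//
//     #[cfg(test)]
//     pub fn make_input(n: usize) -> Vec<u32> { ... }
pub fn make_input(n: usize) -> Vec<u32> {
    // Strictly descending values — a handy worst case for sorting code
    (0..n as u32).rev().collect()
}

fn main() {
    let mut input = make_input(5);
    input.sort();
    println!("{:?}", input); // [0, 1, 2, 3, 4]
}
```

Several tests can then share the same fixture without repeating setup code.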

In rare cases, tests can benefit from mocking. There are a number of mocking crates for Rust. Alan Somers created a good comparison page.

The sum of its parts

Of course, unit tests only look at components in isolation. Integration tests often uncover problems in the definition of the interface between components, even when the components themselves are tested successfully. So don’t spend all your budgeted time on unit tests; leave some for integration tests, too.
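As a sketch, an integration test wires components together through their shared interface. Both functions below are hypothetical stand-ins, inlined so the snippet stands alone; a real tests/pipeline.rs would import them from the crate under test:

```rust
// tests/pipeline.rs — exercising two components together. `parse` and
// `sum` are hypothetical stand-ins; a real integration test would
// import them from the crate (e.g. `use yourcrate::{parse, sum};`).
fn parse(input: &str) -> Vec<i64> {
    input
        .split_whitespace()
        .filter_map(|s| s.parse().ok())
        .collect()
}

fn sum(values: &[i64]) -> i64 {
    values.iter().sum()
}

#[test]
fn parse_then_sum() {
    // The interface contract: whatever `parse` emits, `sum` must accept
    assert_eq!(sum(&parse("1 2 3")), 6);
}
```

Even if both functions pass their unit tests in isolation, only a test like this catches a mismatch in the data they exchange.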

Two Unit Tests, Zero Integration Tests

As always, know when to stop

Testing is a nice activity. The goals are clearly defined, and test code is usually very simple and easy to write. Even so, it’s vital to keep in mind that the goal of tests is to find bugs. A test that never fails is pretty much worthless.

Unfortunately, you can’t foresee when a test will start to fail, so I wouldn’t suggest deleting tests that haven’t failed in a while. You’ll have to weigh the trade-off between having a fast test suite and guarding against regressions.
