Andre Bogus Andre "llogiq" Bogus is VP of Engineering at Aleph Alpha GmbH, a Rust contributor, and Clippy maintainer. A musician turned programmer, he has worked in many fields, from voice acting, to programming, to teaching, to managing software projects. He enjoys learning new things and telling others about them.

Rust serialization: What’s ready for production today?


Serialization has always been a strong point of Rust. In particular, Serde was available well before Rust 1.0.0 was released (though the derive macro was unstable until 1.15.0). The idea behind it is to use traits to decouple the objects to (de)serialize from the serialization format, which is a very powerful idea. Format writers only need to implement Serde’s (de)serializer traits, and users can #[derive(Serialize, Deserialize)] to get serialization for their objects, regardless of the format.

Of course, there are format-specific crates, such as protocol buffers, bincode, FlatBuffers, etc. Those can offer good compile-time and runtime performance, but they lock the data into their respective protocols, often with implementations available in other languages. For many uses, and especially in polyglot environments, this is an acceptable tradeoff.

In this guide, we’ll zoom in on both kinds of frameworks, considering API usability and performance. While I’m sure you’ll find plenty of value in examining this juxtaposition, make no mistake: we are comparing apples to oranges.

For our benchmark, we’ll use this relatively simple data structure (please don’t use this for anything in production):

pub enum StoredVariants {
    YesNo(bool),
    Small(u8),
    Signy(i64),
    Stringy(String),
}

pub struct StoredData {
    pub variant: StoredVariants,
    pub opt_bool: Option<bool>,
    pub vec_strs: Vec<String>,
    pub range: std::ops::Range<usize>,
}

The benchmark will then serialize and deserialize a slice of those StoredDatas. We’ll measure the time it takes to compile the benchmark, as well as the time to bytes and back.
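
As a rough illustration of what measuring “time to bytes and back” involves, here is a minimal sketch using only the standard library. It is not the benchmark code used for this article: `encode`, `decode`, `time_per_iter`, and `bench_round_trip` are hypothetical names, and the hand-rolled encoder merely stands in for a real format crate. A proper benchmark harness would also deal with warm-up and statistical noise.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a serializer: writes a u64 as 8 little-endian bytes.
fn encode(value: u64, out: &mut Vec<u8>) {
    out.extend_from_slice(&value.to_le_bytes());
}

/// Hypothetical stand-in for a deserializer: reads the first 8 bytes back.
fn decode(bytes: &[u8]) -> Option<u64> {
    let mut head = [0u8; 8];
    head.copy_from_slice(bytes.get(..8)?);
    Some(u64::from_le_bytes(head))
}

/// Runs `f` `iters` times and returns the mean wall-clock time per call.
fn time_per_iter<F: FnMut()>(iters: u32, mut f: F) -> Duration {
    assert!(iters > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed() / iters
}

/// Times one round trip (value -> bytes -> value) and returns the mean duration.
fn bench_round_trip(iters: u32) -> Duration {
    let mut buf = Vec::with_capacity(8);
    time_per_iter(iters, || {
        buf.clear();
        encode(black_box(42), &mut buf);
        // black_box keeps the optimizer from discarding the work.
        black_box(decode(&buf));
    })
}
```

Calling `bench_round_trip(1_000_000)` returns the mean time per round trip as a `Duration`. The absolute number depends entirely on the machine and compiler flags.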

Serde

Serde, the incumbent serialization/deserialization library, is elegant, flexible, fast to run, and slow to compile.

The API has three facets. The Serialize and Deserialize traits bridge the formats to the data structures the user might want to serialize and deserialize. You rarely need to implement those traits manually since Serde comes with powerful derive macros that allow many tweaks to the output format. For example, you can rename fields, define defaults for missing values on deserialization, and more. Once you have derived or implemented those traits, you can use the format crates, such as serde_json or bincode, to (de)serialize your data.

The third facet is the Serializer and Deserializer traits that format crates need to implement to (de)serialize arbitrary data structures. This means reducing the problem of serializing N data structures to M formats from M × N to M + N.
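
The M + N idea can be sketched with two toy traits. These are hypothetical, heavily simplified stand-ins (`MiniSerializer`, `MiniSerialize`) and not Serde’s real trait definitions, which are considerably richer. The formats and data types are likewise made up for illustration.

```rust
/// Hypothetical stand-in for a format-side trait: one impl per *format*.
trait MiniSerializer {
    fn write_bool(&mut self, v: bool);
    fn write_u8(&mut self, v: u8);
}

/// Hypothetical stand-in for a data-side trait: one impl per *data type*.
trait MiniSerialize {
    fn serialize<S: MiniSerializer>(&self, s: &mut S);
}

/// Format 1: a textual format that writes values as space-terminated words.
struct Text(String);
impl MiniSerializer for Text {
    fn write_bool(&mut self, v: bool) {
        self.0.push_str(if v { "true " } else { "false " });
    }
    fn write_u8(&mut self, v: u8) {
        self.0.push_str(&format!("{} ", v));
    }
}

/// Format 2: a binary format that writes one byte per value.
struct Binary(Vec<u8>);
impl MiniSerializer for Binary {
    fn write_bool(&mut self, v: bool) {
        self.0.push(v as u8);
    }
    fn write_u8(&mut self, v: u8) {
        self.0.push(v);
    }
}

/// Data type 1.
struct Flag(bool);
impl MiniSerialize for Flag {
    fn serialize<S: MiniSerializer>(&self, s: &mut S) {
        s.write_bool(self.0);
    }
}

/// Data type 2, composed of a `Flag` and a byte.
struct Pair {
    flag: Flag,
    small: u8,
}
impl MiniSerialize for Pair {
    fn serialize<S: MiniSerializer>(&self, s: &mut S) {
        self.flag.serialize(s);
        s.write_u8(self.small);
    }
}

/// Works for every data type, without knowing any of them.
fn to_text<T: MiniSerialize>(value: &T) -> String {
    let mut s = Text(String::new());
    value.serialize(&mut s);
    s.0
}

/// Works for every data type, without knowing any of them.
fn to_binary<T: MiniSerialize>(value: &T) -> Vec<u8> {
    let mut s = Binary(Vec::new());
    value.serialize(&mut s);
    s.0
}
```

Two formats and two data types need four trait implementations (2 + 2), yet all four combinations (2 × 2) work. Adding a third format means one more implementation, not one per data type.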

Because Serde relies heavily on monomorphisation to facilitate great performance despite its famous flexibility, compile time has been an issue from the beginning. To counter this, multiple crates have appeared, from miniserde, to tinyserde, to nanoserde. The idea behind these tools is to use runtime dispatch to reduce code bloat due to monomorphisation.
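
The difference between monomorphisation and runtime dispatch can be shown with the standard library alone. The sketch below illustrates the general Rust mechanism only; it makes no claim about how any of the crates named above are implemented, and the function names are made up for this example.

```rust
use std::io::{self, Write};

/// Generic version: the compiler emits a separate copy of this function for
/// every concrete writer type it is called with (monomorphisation).
fn write_greeting_generic<W: Write>(out: &mut W) -> io::Result<()> {
    out.write_all(b"hello")
}

/// Trait-object version: only one copy of the function exists, and the call
/// to `write_all` is dispatched at runtime through a vtable.
fn write_greeting_dyn(out: &mut dyn Write) -> io::Result<()> {
    out.write_all(b"hello")
}

/// Calls both versions with two different writer types and returns what
/// ended up in the `Vec<u8>` writer.
fn demo() -> io::Result<Vec<u8>> {
    let mut buf: Vec<u8> = Vec::new();
    let mut sink = io::sink();

    // Two instantiations: one for Vec<u8>, one for io::Sink.
    write_greeting_generic(&mut buf)?;
    write_greeting_generic(&mut sink)?;

    // A single function body serves both writer types.
    write_greeting_dyn(&mut buf)?;
    write_greeting_dyn(&mut sink)?;
    Ok(buf)
}
```

The generic version gives the optimizer more to work with, at the price of more code to compile. The `dyn` version compiles once but pays for an indirect call.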

Our benchmark will serialize our data structure into a Vec<u8> and then deserialize it again. The Vec<u8> is created with sufficient capacity to ensure that allocation for the output does not disturb the benchmark.
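
A minimal sketch of the preallocation idea, assuming a hypothetical `fake_serialize` function in place of a real serializer: the buffer is allocated once and only cleared between runs, so the measured loop never has to grow it.

```rust
/// Hypothetical stand-in for a serializer call: appends `payload` to `out`.
fn fake_serialize(payload: &[u8], out: &mut Vec<u8>) {
    out.extend_from_slice(payload);
}

/// Serializes `payload` `runs` times into one preallocated buffer and reports
/// whether the buffer ever had to grow.
fn buffer_grew(payload: &[u8], capacity: usize, runs: usize) -> bool {
    let mut out: Vec<u8> = Vec::with_capacity(capacity);
    let initial_capacity = out.capacity();
    for _ in 0..runs {
        // clear() resets the length but keeps the allocation.
        out.clear();
        fake_serialize(payload, &mut out);
    }
    out.capacity() != initial_capacity
}
```

With a 100-byte payload, `buffer_grew(&[0u8; 100], 128, 1000)` reports no growth, whereas a capacity of 16 forces a reallocation on the first run.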


JSON

The serde_json crate allows serialization to and deserialization from JSON, which is plain text and thus (somewhat) readable, at the cost of some overhead during parsing and formatting.

Serializing can be done with to_string, to_vec, or to_writer, each with a _pretty variant that writes out nicely formatted instead of minified JSON. For deserializing, serde_json has from_reader, from_str, and from_slice. serde_json also has its own Value type with to_value and from_value functions.

Of course, it’s not the fastest approach. Serializing StoredData takes about a quarter of a microsecond on my machine. Deserializing takes less than half a microsecond. The minified data takes up 100 bytes. Prettified, it becomes 153 bytes. The overhead will vary depending on the serialized types and values. For example, a boolean might ideally take up 1 bit, but it will take at least 4 bytes to serialize to JSON (the literal true, not counting the key).
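
The boolean example can be made concrete with a few lines of standard-library Rust. The helper names below are made up for this sketch; the bit-packing function shows the “ideal” one bit per boolean that a hand-written binary layout could reach.

```rust
/// Bytes needed for the JSON text of a bool value, not counting the key.
fn json_bool_len(value: bool) -> usize {
    if value { "true".len() } else { "false".len() }
}

/// Packs up to eight bools into a single byte, one bit each
/// (the first bool lands in the least significant bit).
fn pack_bools(flags: &[bool]) -> u8 {
    assert!(flags.len() <= 8, "at most eight flags fit into one byte");
    flags
        .iter()
        .enumerate()
        .fold(0u8, |byte, (i, &flag)| byte | ((flag as u8) << i))
}
```

Eight booleans therefore cost at least 32 bytes of JSON values (plus keys and punctuation), while `pack_bools` fits them into a single byte.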

YAML

YAML’s name is a recursive acronym: YAML ain’t markup language. This is another textual format with multiple language bindings like JSON, but it’s very idiosyncratic. Serializing takes about 3 microseconds and deserializing takes about 7, so it’s more than 10 times slower than JSON for this particular workload. The size is 99 bytes, comparable with JSON.

Using YAML probably only makes sense if your data is already in YAML.

RON

Rusty Object Notation is a new, Rust-derived textual format. It’s a wee bit terser than JSON at 91 bytes, but slower to work with. Serialization took a tad more than half a microsecond; deserializing took roughly two microseconds. This is slower than JSON, but not egregiously so.

bincode

Like serde_json, bincode also works with serde to serialize or deserialize any types that implement the respective traits. Since bincode uses a binary format, there are obviously no String-related methods. serialize creates a Vec<u8>, and serialize_into takes a writer (any type that implements std::io::Write) along with the value to serialize.

Because Write is implemented for a good deal of types (notably Vec<u8>, File, and TcpStream, as well as mutable references to them), this simple API gives us plenty of bang for the buck. Deserialization works similarly: deserialize takes a &[u8], and deserialize_from takes any type that implements Read.

A small downside of all this genericity is that when you get the arguments wrong, the type errors may be confusing. Case in point: when I wrote the benchmark, I forgot a &mut at one point and had to look up the implementors of the Write trait before I found the solution.
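
The role of the `&mut` can be seen in a standard-library sketch. `encode_into` and `decode_from` are hypothetical functions written for this example; they only mimic the shape of an API that accepts any writer or reader by value, and they are not bincode’s functions.

```rust
use std::io::{self, Read, Write};

/// Hypothetical encoder that accepts any writer by value.
fn encode_into<W: Write>(mut writer: W, value: u32) -> io::Result<()> {
    writer.write_all(&value.to_le_bytes())
}

/// Hypothetical decoder that accepts any reader by value.
fn decode_from<R: Read>(mut reader: R) -> io::Result<u32> {
    let mut bytes = [0u8; 4];
    reader.read_exact(&mut bytes)?;
    Ok(u32::from_le_bytes(bytes))
}

/// Encodes two values into one buffer, then decodes them again.
fn round_trip(a: u32, b: u32) -> io::Result<(u32, u32)> {
    let mut buf: Vec<u8> = Vec::new();
    // `&mut Vec<u8>` implements `Write`, so the buffer is only borrowed and
    // can be reused. Passing `buf` itself would move it into the call.
    encode_into(&mut buf, a)?;
    encode_into(&mut buf, b)?;

    // `&[u8]` implements `Read`. Reading through `&mut reader` advances the
    // slice, so the second call continues where the first one stopped.
    let mut reader: &[u8] = &buf;
    let first = decode_from(&mut reader)?;
    let second = decode_from(&mut reader)?;
    Ok((first, second))
}
```

Dropping the `&mut` in the first `encode_into` call still type-checks, because `Vec<u8>` itself implements `Write`, but the buffer is then moved away and the error shows up at the next use, which is part of why such mistakes can be confusing.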

Performance-wise, it is, predictably, faster than serde_json. Serializing took roughly 35 nanoseconds and deserializing a bit less than an eighth of a microsecond. The serialized data ended up taking 58 bytes.

MessagePack

MessagePack is another binary format with multiple language bindings. It prides itself on being very terse on the wire, which our benchmark case validated: the data serialized to just 24 bytes.

The Rust implementation of the MessagePack protocol is called rmp, and its companion crate rmp-serde makes it work with serde. The interface (apart from the small differences in naming) is the same as the above. Its thriftiness when it comes to space comes with a small performance overhead compared to bincode. Serializing takes a tenth of a microsecond, while deserializing comes at roughly one-third of a microsecond.
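
Part of MessagePack’s terseness comes from picking the smallest encoding for each value. The sketch below hand-implements the unsigned integer family from the MessagePack specification (positive fixint, then the 0xcc, 0xcd, 0xce, and 0xcf markers followed by big-endian bytes). It is an illustration of the format, not code taken from rmp, and `encode_uint` and `encoded` are hypothetical names.

```rust
/// Appends `value` to `out` using the smallest MessagePack unsigned encoding.
fn encode_uint(value: u64, out: &mut Vec<u8>) {
    if value < 0x80 {
        // positive fixint: the value itself is the single byte
        out.push(value as u8);
    } else if value <= 0xff {
        out.push(0xcc); // uint 8
        out.push(value as u8);
    } else if value <= 0xffff {
        out.push(0xcd); // uint 16, big-endian
        out.extend_from_slice(&(value as u16).to_be_bytes());
    } else if value <= 0xffff_ffff {
        out.push(0xce); // uint 32, big-endian
        out.extend_from_slice(&(value as u32).to_be_bytes());
    } else {
        out.push(0xcf); // uint 64, big-endian
        out.extend_from_slice(&value.to_be_bytes());
    }
}

/// Convenience wrapper that returns the encoded bytes.
fn encoded(value: u64) -> Vec<u8> {
    let mut out = Vec::new();
    encode_uint(value, &mut out);
    out
}
```

A small number such as 5 takes one byte on the wire, whereas a fixed-width 64-bit encoding would always spend eight. The branching needed to choose the representation is one plausible source of the runtime overhead mentioned above.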

jsonway

jsonway provides an interface to construct a serde_json JsonValue. However, there is no derive macro, so we have to implement the serialization by hand.

struct DataSerializer<'a> {
    data: &'a StoredData,
}

impl Serializer for DataSerializer<'_> {
    fn root(&self) -> Option<&str> { None }
    fn build(&self, json: &mut jsonway::ObjectBuilder) {
        let data = &self.data;
        use StoredVariants::*;
        // Emit the enum as an object keyed by the variant name.
        json.object("variant", |json| match data.variant {
            YesNo(b) => { json.set("YesNo", b) },
            Small(u) => { json.set("Small", u) },
            Signy(i) => { json.set("Signy", i) },
            Stringy(ref s) => { json.set("Stringy", s) },
        });
        // A missing opt_bool is simply left out of the output.
        match data.opt_bool {
            Some(true) => json.set("opt_bool", true),
            Some(false) => json.set("opt_bool", false),
            None => {},
        };
        json.array("vec_strs", |json| json.map(data.vec_strs.iter(), |s| s[..].into()));
        json.object("range", |json| {
            json.set("start", data.range.start);
            json.set("end", data.range.end);
        });
    }
}

Note that this will only construct a serde_json::Value, which is pretty fast (to the tune of only a few nanoseconds), but not exactly a serialized object. Serializing this cost us about 5 millis, which is far slower than using serde_json directly.

CBOR

Concise Binary Object Representation (CBOR) is another binary format that in our test came out on the larger side, taking 72 bytes. Serialization was speedy enough at roughly 140 nanoseconds, but deserialization was, unexpectedly, slower at almost half a microsecond.

Postcard

The Postcard crate was built for use in embedded systems. At 41 bytes, it’s a good compromise between size and speed: at 60 nanoseconds to serialize and 180 nanoseconds to deserialize, it’s roughly 1.5x slower than bincode while producing roughly 70 percent of the message size.
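
The comparison with bincode can be checked with a little arithmetic. The sketch below uses the rounded measurements quoted in this article (35 ns, 120 ns, and 58 bytes for bincode; 60 ns, 180 ns, and 41 bytes for Postcard); the function names are made up for this example.

```rust
/// Returns how many times larger `a` is than `b`.
fn ratio(a: f64, b: f64) -> f64 {
    a / b
}

/// Postcard relative to bincode, using the rounded figures from the article:
/// (serialize time ratio, deserialize time ratio, size ratio).
fn postcard_vs_bincode() -> (f64, f64, f64) {
    let serialize = ratio(60.0, 35.0); // nanoseconds
    let deserialize = ratio(180.0, 120.0); // nanoseconds
    let size = ratio(41.0, 58.0); // bytes
    (serialize, deserialize, size)
}
```

The three ratios come out at roughly 1.7, 1.5, and 0.71, which matches the “roughly 1.5x slower” and “roughly 70 percent” figures, with serialization being slightly worse than deserialization.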

The relatively fast serialization and the thrifty format are a natural fit for embedded systems. MessagePack might overtax the embedded CPU, whereas we often have a beefier machine to deserialize the data.

FlexBuffers

FlexBuffers is a FlatBuffers-derived, multilanguage, binary, schemaless format. In this benchmark, it performed even worse than RON for serialization and worse than JSON for deserialization. That said, the format is as compact as Postcard with 41 bytes.

I would only use this if I had to work with a service that already uses it. For a freely chosen polyglot format, msgpack bests it in every respect, and JSON is faster, albeit larger.

There are other Serde-based formats, but those are mainly to interface to existing systems — e.g., Python’s Pickle format, Apache Hadoop’s Avro, or DBus’ binary wire format.

FlatBuffers

From Google comes a polyglot serialization format with bindings to C, C++, Java, C#, Go, Lua, and Rust, among others. To bring them all together, FlatBuffers has its own interface definition language, which you’ll have to learn (I had to learn it while writing this).

It’s got structs, enums (which are C-style single-value enums), unions, and, for some reason, tables. Interestingly, the latter are FlatBuffers’ main way to define types. They work like Rust structs with all-Optional fields, which is done to facilitate interface evolution. FlatBuffers structs are similar, only with nonoptional fields.

Apart from that, the basic types are mostly there, only with different names:

| Rust | FlatBuffers |
|---|---|
| u8, u16, u32, u64 | uint8, uint16, uint32, uint64 |
| i8, i16, i32, i64 | int8, int16, int32, int64 |
| bool | bool |
| String | string |
| [T; N], e.g., [String; 5] | [T:N], e.g., [string:5] |
| Vec<T> | [T] |

FlatBuffers’ unions are akin to Rust enums, but they always have a NONE variant and all variants must be tables, presumably to allow for changes later on.

Below is the StoredData from above as a FlatBuffers schema, which goes into a storeddata.fbs file. The fbs extension stands for “FlatBuffer Schema.”

file_identifier "SDFB";

table Bool {
    value: bool;
}

table Uint8 {
    value: uint8;
}

table Int64 {
    value: int64;
}

table String {
    value: string;
}

union StoredVariants {
    Bool,
    Uint8,
    Int64,
    String
}

struct Range {
    start: uint64;
    end: uint64;
}

table StoredData {
    variant_content: StoredVariants;
    opt_bool: bool;
    vec_strs: [String];
    range: Range;
}

root_type StoredData;

FlatBuffers appears to lack a direct representation of pointer-length integers (e.g., usize) and of Ranges, so in this example, I just picked uint64 and a struct holding two of them to represent a Range<usize>. This is less than ideal on 32-bit machines. The documentation also tells us that FlatBuffers will store integer values as little endian, so on big endian machines, this will cost you in terms of performance. But that’s the price to pay for being widely applicable.
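
The two concerns here, integer width and byte order, can be illustrated with the standard library. The sketch below is not the FlatBuffers wire layout; `range_to_le_bytes` and `range_from_le_bytes` are hypothetical helpers that store a `Range<usize>` as two little-endian `u64` values and check on the way back that the values fit into the local `usize`.

```rust
use std::convert::TryFrom;
use std::ops::Range;

/// Encodes a range as two little-endian u64 values (16 bytes in total).
fn range_to_le_bytes(range: &Range<usize>) -> [u8; 16] {
    let mut out = [0u8; 16];
    // usize is at most 64 bits wide on current Rust targets, so this
    // widening cast does not lose information.
    out[..8].copy_from_slice(&(range.start as u64).to_le_bytes());
    out[8..].copy_from_slice(&(range.end as u64).to_le_bytes());
    out
}

/// Decodes the range again. Returns `None` if a bound does not fit into this
/// platform's usize, which can happen on 32-bit machines.
fn range_from_le_bytes(bytes: &[u8; 16]) -> Option<Range<usize>> {
    let mut start = [0u8; 8];
    let mut end = [0u8; 8];
    start.copy_from_slice(&bytes[..8]);
    end.copy_from_slice(&bytes[8..]);
    let start = usize::try_from(u64::from_le_bytes(start)).ok()?;
    let end = usize::try_from(u64::from_le_bytes(end)).ok()?;
    Some(start..end)
}
```

On a little-endian CPU, `to_le_bytes` and `from_le_bytes` compile down to plain copies. On a big-endian CPU they have to swap bytes, which is the performance cost mentioned above.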

Compiling this to Rust code requires flatc, the FlatBuffers schema compiler, which is available as a Windows binary. It may also be in your Linux distribution’s repository. Otherwise, it can be built from source. To build it, you’ll need bazel, a build tool developed by Google.

After installing that, you can clone the FlatBuffers repo (I used the 1.12.0 release) and build it with bazel build flatc.

Once the build is done, the executable will be at bazel-out/k8-fastbuild/bin/flatc. Putting it on the $PATH allows the following command line.

$ flatc --rust storeddata.fbs

Now we have a storeddata_generated.rs file we can include! in our code. Obviously, this is not our original data structure, but for the sake of comparability, we’ll benchmark the serialization and deserialization via this slightly modified type. Deserialization is basically a memcpy, so it’s nearly free. However, this is misleading, since the accessors do all the actual work. To account for this, I added code to convert FlatBuffers’ types into our own type on deserialization.
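
To give an idea of what such a conversion looks like, here is a sketch with a hand-written stand-in for the union. `VariantView` and `to_owned_variant` are hypothetical and do not reproduce the code that flatc generates; they only mirror the shape of the schema: a NONE member, scalar payloads, and a string payload that may be absent and is borrowed from the buffer.

```rust
/// Our own enum from the benchmark.
#[derive(Debug, PartialEq)]
pub enum StoredVariants {
    YesNo(bool),
    Small(u8),
    Signy(i64),
    Stringy(String),
}

/// Hypothetical, hand-written mirror of the schema's union. The payloads
/// borrow from the serialized buffer instead of owning their data.
enum VariantView<'a> {
    None,
    Bool(bool),
    Uint8(u8),
    Int64(i64),
    String(Option<&'a str>),
}

/// Converts the borrowed view into the owned benchmark type.
/// Returns `None` for the NONE member or a missing string.
fn to_owned_variant(view: VariantView<'_>) -> Option<StoredVariants> {
    match view {
        VariantView::None => None,
        VariantView::Bool(b) => Some(StoredVariants::YesNo(b)),
        VariantView::Uint8(u) => Some(StoredVariants::Small(u)),
        VariantView::Int64(i) => Some(StoredVariants::Signy(i)),
        // Copying the string out of the buffer is where the allocation happens.
        VariantView::String(s) => s.map(|s| StoredVariants::Stringy(s.to_owned())),
    }
}
```

The conversion has to handle the extra NONE and missing-field cases that the Rust enum rules out by construction, and the string copy is real work that a benchmark of the raw buffer access alone would not show.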

Abomonation

This crate’s name is a slight misnomer, because it really is an abomination. Using it is definitely unsafe and likely unsound. Using it for anything but benchmarking to measure the maximum theoretical performance of serialization and deserialization is downright inadvisable. You have been warned.

Abomonation does not use serde and has its own Abomonation trait to both serialize and deserialize any data instead. What it really does is basically a memcpy, plus fixing the occasional pointer so it can handle things like Vec and String. However, for now, it lacks handling of BTreeMap, HashMap, VecDeque, and other standard data structures — including Range, which we use in our StoredData. I cloned the repository and set up a pull request to implement the Abomonation trait for Range. Until it’s merged, I’ll use my own branch in this benchmark.

For deserialization, we need to keep the data alive as long as we want to use the decoded values because Abomonation won’t copy the memory; instead, it’ll opt to reuse it in place. This also means we have to copy the data on each benchmark run. Otherwise, we would corrupt the data.
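
The “keep the data alive” requirement can be illustrated with a safe analogue: a decoder that hands out references into the input buffer instead of copying. This is not how Abomonation works internally (it uses unsafe code and patches pointers in the buffer); the wire layout and the names `BorrowedRecord` and `decode` are made up for this sketch.

```rust
/// A decoded value that borrows its text from the input buffer instead of
/// copying it.
#[derive(Debug, PartialEq)]
struct BorrowedRecord<'buf> {
    id: u8,
    name: &'buf str,
}

/// Hypothetical wire layout for this sketch: one id byte, one length byte,
/// then that many bytes of UTF-8 text.
fn decode<'buf>(bytes: &'buf [u8]) -> Option<BorrowedRecord<'buf>> {
    let id = *bytes.first()?;
    let len = *bytes.get(1)? as usize;
    let text = bytes.get(2..2 + len)?;
    // No copy: `name` points straight into `bytes`.
    let name = std::str::from_utf8(text).ok()?;
    Some(BorrowedRecord { id, name })
}
```

Because the `'buf` lifetime ties the record to the buffer, the compiler rejects any attempt to drop or overwrite the buffer while the record is still in use. With Abomonation, upholding that guarantee is largely left to the caller.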

While the resulting data takes up 116 bytes, it is indeed very fast. Serialization takes a bit more than 15ns, and deserialization takes just a smidgen more than 10ns, even with the additional copy.

Again, please be warned before using it in production. For many types, it is actually unsound, and every time you use it, the coding gods kill a sweet little crab. Or something.

Results

The following table shows all formats with the serialized size and rounded time to serialize/deserialize (measured on my machine — your mileage will vary).

| Format | Crate version | Bytes | Time to serialize | Time to deserialize |
|---|---|---|---|---|
| json | 1.0.57 | 100 | 250ns | 450ns |
| yaml | 0.8.13 | 99 | 3µs | 7µs |
| ron | 0.6.0 | 91 | 650ns | 2µs |
| bincode | 1.3.1 | 58 | 35ns | 120ns |
| msgpack | 0.14.4 | 24 | 100ns | 200ns |
| cbor | 0.11.1 | 72 | 140ns | 460ns |
| postcard | 0.5.1 | 41 | 60ns | 180ns |
| flexbuffers | 0.1.1 | 41 | 1.5µs | 750ns |
| flatbuffers | 0.6.1 | 104 | 180ns | 120ns |
| abomonation | master* | 116 | 15ns | 10ns |

*With an additional change to allow for serializing Ranges

Conclusion

Serialization really is a strong point of Rust. The libraries are mature and fast.

I feel I should sing Serde’s praise here. It’s a great piece of engineering and I highly recommend it.

Let’s quickly summarize what we learned about the other choices:

  • For your choice of format, if you need fast serialization and deserialization, bincode is the best you can do
  • For the smallest possible serialized size, MessagePack is the format to beat, though you pay with more runtime on deserialization
  • Postcard offers a good compromise between size and speed that allows for embedded usage
  • FlatBuffers are complex, feel decidedly un-Rust-y, and take up more space than they should. Unless you use the schema definition in multiple languages, there is really no reason to use it. Even then, JSON is faster
  • JSON is the fastest of the three readable formats, which makes sense since it has seen wide industry usage and benefits from SIMD optimizations

Regarding maturity, only bincode and JSON are marked with a 1.* major version number. Still, there’s a tendency in the Rust world to be very careful — perhaps even too careful — when it comes to going 1.0, so this doesn’t say much about the actual maturity of the crates.

I found all of the Serde-based crates easy to use, though more consistency of interfaces between the crates wouldn’t hurt. The benchmark serializes with to_writer, serialize_into, serialize(Serializer::new(_)), to_slice, or to_bytes and deserializes with from_slice, from_bytes, from_read_ref, or deserialize.

My benchmark code is available on GitHub. If you find problems or improvements, feel free to send me an issue or PR.

