Serialization has always been a strong point of Rust. In particular, Serde was available well before Rust 1.0.0 was released (though the derive macro was unstable until 1.15.0). The idea is to use traits to decouple the objects to (de)serialize from the serialization format — a very powerful idea. Format writers only need to implement Serde's (de)serializer traits, and users can #[derive(Serialize, Deserialize)] to get serialization for their objects, regardless of the format.
Of course, there are also format-specific crates, such as protocol buffers, bincode, FlatBuffers, etc. These can offer good compile-time and runtime performance, but they lock the data into their respective protocols, often with implementations available in other languages. For many uses, and especially in polyglot environments, this is an acceptable tradeoff.
In this guide, we’ll zoom in on both kinds of frameworks, considering API usability and performance. While I’m sure you’ll find plenty of value in examining this juxtaposition, make no mistake: we are comparing apples to oranges.
For our benchmark, we’ll use this relatively simple data structure (please don’t use this for anything in production):
// The serde-based formats below rely on these derives.
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
pub enum StoredVariants {
    YesNo(bool),
    Small(u8),
    Signy(i64),
    Stringy(String),
}

#[derive(Serialize, Deserialize)]
pub struct StoredData {
    pub variant: StoredVariants,
    pub opt_bool: Option<bool>,
    pub vec_strs: Vec<String>,
    pub range: std::ops::Range<usize>,
}
The benchmark will then serialize and deserialize a slice of those StoredData values. We'll measure the time it takes to compile the benchmark, as well as the time to turn the data into bytes and back.
Serde, the incumbent serialization/deserialization library, is elegant, flexible, fast to run, and slow to compile.
The API has three facets. The Serialize and Deserialize traits bridge the formats to the data structures the user might want to serialize and deserialize. You rarely need to implement those traits manually, since Serde comes with powerful derive macros that allow many tweaks to the output format. For example, you can rename fields, define defaults for missing values on deserialization, and more. Once you have derived or implemented those traits, you can use the format crates, such as serde_json or bincode, to (de)serialize your data.
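For example (this Config type and its fields are purely illustrative, not part of the benchmark), two of those derive tweaks look like this:

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
pub struct Config {
    // Serialized as "maxConnections" instead of the Rust field name.
    #[serde(rename = "maxConnections")]
    pub max_connections: u32,
    // Falls back to Default::default() if missing on deserialization.
    #[serde(default)]
    pub verbose: bool,
}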
The third facet is the Serializer and Deserializer traits, which format crates need to implement to (de)serialize arbitrary data structures. This reduces the problem of serializing N data structures to M formats from M × N implementations to just M + N.
Because Serde relies heavily on monomorphization to achieve great performance despite its famous flexibility, compile time has been an issue from the beginning. To counter this, multiple crates have appeared, from miniserde to tinyserde to nanoserde. The idea behind these is to use runtime dispatch to reduce the code bloat caused by monomorphization.
Our benchmark will serialize, then deserialize, our data structure to a Vec<u8> with sufficient capacity, to ensure that allocating the output does not disturb the benchmark.
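As a sketch (with illustrative names, and serde_json standing in for whichever format crate is under test), each measured iteration boils down to something like this:

// Pre-allocate once so the allocator stays out of the measurement.
let mut buf: Vec<u8> = Vec::with_capacity(128 * 1024);
for _ in 0..iterations {
    buf.clear(); // keeps the capacity, drops the old contents
    serde_json::to_writer(&mut buf, &stored_data_slice).unwrap();
    let _decoded: Vec<StoredData> = serde_json::from_slice(&buf).unwrap();
}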
The serde_json crate allows serialization to and deserialization from JSON, which is plain text and thus (somewhat) human-readable, at the cost of some overhead during parsing and formatting. Serializing can be done with to_string, to_vec, or to_writer, each with a _pretty variant that writes nicely formatted instead of minified JSON. For deserializing, serde_json has from_reader, from_str, and from_slice. serde_json also has its own Value type, with to_value and from_value functions to convert to and from it.
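Putting those functions together, a hedged round-trip sketch (assuming the derives on our benchmark type) looks like this:

fn json_roundtrip(data: &StoredData) -> serde_json::Result<StoredData> {
    let bytes = serde_json::to_vec(data)?;             // minified JSON
    let _pretty = serde_json::to_string_pretty(data)?; // formatted variant
    let _value: serde_json::Value = serde_json::to_value(data)?; // generic JSON tree
    serde_json::from_slice(&bytes)                     // and back again
}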
Of course, it's not the fastest approach. Serializing StoredData takes about a quarter of a microsecond on my machine; deserializing takes less than half a microsecond. The minified data takes up 100 bytes; prettified, it becomes 153 bytes. The overhead will vary depending on the serialized types and values. For example, a boolean might ideally take up a single bit, but it takes at least four bytes to serialize to JSON (true, not counting the key).
simd-json is a crate that tries to make the most of our CPUs' vector units to make JSON serialization really fast. Which it unsurprisingly does. More surprising (at least to me) is that deserialization lagged considerably in comparison. Perhaps the workload is just not simple enough to vectorize, or my benchmark does something wrong.
YAML's name is a recursive acronym: YAML Ain't Markup Language. Like JSON, it is a textual format with bindings for many languages, but it's very idiosyncratic. Serializing takes about 3 microseconds and deserializing about 7, so it's more than ten times slower than JSON for this particular workload. The size is 99 bytes, comparable with JSON.
Using YAML probably only makes sense if your data is already in YAML.
Rusty Object Notation (RON) is a new, Rust-derived textual format. It's a wee bit terser than JSON at 91 bytes, but slower to work with: serialization took a tad more than half a microsecond, and deserializing a bit more than a microsecond. This is slower than JSON, but not egregiously so.
bincode
Like serde_json, bincode also works with serde to serialize or deserialize any types that implement the respective traits. Since bincode uses a binary format, there are obviously no String-related methods: serialize creates a Vec<u8>, and serialize_into takes a writer (anything implementing Write) along with the value to serialize. Because Write is implemented for a good deal of types (notably &mut Vec<u8>, File, and TcpStream), this simple API gives us plenty of bang for the buck. Deserialization works similarly: deserialize takes a &[u8], and deserialize_from takes any type that implements Read.
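A minimal sketch of that API, again assuming the derives on our benchmark type:

fn bincode_roundtrip(data: &StoredData, buf: &mut Vec<u8>) -> bincode::Result<StoredData> {
    buf.clear();
    bincode::serialize_into(&mut *buf, data)?; // &mut Vec<u8> implements Write
    bincode::deserialize(&buf[..])             // deserialize takes a &[u8]
}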
A small downside of all this genericity is that when you get the arguments wrong, the type errors can be confusing. Case in point: when I wrote the benchmark, I forgot a &mut at one point and had to look up the implementors of the Write trait before I found the solution.
Performance-wise, it is, predictably, faster than serde_json: serializing took roughly 35 nanoseconds and deserializing a bit less than an eighth of a microsecond. The serialized data ended up taking 58 bytes.
MessagePack is another binary format with multiple language bindings. It prides itself on being very terse on the wire, which our benchmark case validated: the data serialized to just 24 bytes.
The Rust implementation of the MessagePack protocol is called rmp, and it also works with serde. Apart from small differences in naming, the interface is the same as the above. Its thriftiness on the wire comes with a small performance overhead compared to bincode: serializing takes a tenth of a microsecond, while deserializing takes roughly a fifth of a microsecond.
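For completeness, a hedged sketch of the rmp-serde calls (function names as of the 0.14 release benchmarked below):

fn msgpack_roundtrip(data: &StoredData) -> Result<StoredData, rmp_serde::decode::Error> {
    let bytes = rmp_serde::to_vec(data).expect("serialization failed");
    // from_read_ref deserializes from anything that is AsRef<[u8]>.
    rmp_serde::from_read_ref(&bytes)
}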
jsonway provides an interface to construct a serde_json Value. However, there is no derive macro, so we have to implement the serialization by hand:
use jsonway::Serializer; // the trait providing root() and build()

struct DataSerializer<'a> {
    data: &'a StoredData,
}

impl Serializer for DataSerializer<'_> {
    fn root(&self) -> Option<&str> { None }

    fn build(&self, json: &mut jsonway::ObjectBuilder) {
        let data = &self.data;
        use StoredVariants::*;
        json.object("variant", |json| match data.variant {
            YesNo(b) => { json.set("YesNo", b) },
            Small(u) => { json.set("Small", u) },
            Signy(i) => { json.set("Signy", i) },
            Stringy(ref s) => { json.set("Stringy", s) },
        });
        match data.opt_bool {
            Some(true) => json.set("opt_bool", true),
            Some(false) => json.set("opt_bool", false),
            None => {},
        };
        json.array("vec_strs", |json| json.map(data.vec_strs.iter(), |s| s[..].into()));
        json.object("range", |json| {
            json.set("start", data.range.start);
            json.set("end", data.range.end);
        });
    }
}
Note that this only constructs a serde_json::Value, which is pretty fast (to the tune of only a few nanoseconds), but not yet a serialized object. Serializing it cost us about 5 milliseconds, which is far slower than using serde_json directly.
Concise Binary Object Representation (CBOR) is another binary format. In our test, it came out on the larger side, taking 72 bytes. Serialization was speedy enough at roughly 140 nanoseconds, but deserialization was, unexpectedly, slower at almost half a microsecond.
The Postcard crate was built for use in embedded systems. At 41 bytes, it's a good compromise between size and speed: at 60 nanoseconds to serialize and 180 nanoseconds to deserialize, it's roughly 1.5 times slower than bincode, at roughly 70 percent of the message size.
The relatively fast serialization and the thrifty format are a natural fit for embedded systems, where MessagePack might overtax the CPU; we often have a beefier machine on the other end to deserialize the data.
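A hedged sketch of the allocation-free usage that suits such targets, with to_slice writing into a caller-provided buffer:

fn postcard_roundtrip(data: &StoredData, buf: &mut [u8]) -> Result<StoredData, postcard::Error> {
    // to_slice returns the sub-slice that was actually written.
    let used = postcard::to_slice(data, buf)?;
    postcard::from_bytes(used)
}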
FlexBuffers is a FlatBuffers-derived, multi-language, binary, schemaless format. In this benchmark, it performed even worse than RON for serialization and worse than JSON for deserialization. That said, the format is as compact as Postcard at 41 bytes.
I would only use this if I had to work with a service that already uses it. For a freely chosen polyglot format, MessagePack bests it in every respect, and JSON is much faster to process, if less compact.
There are other Serde-based formats, but those mainly exist to interface with existing systems — e.g., Python's Pickle format, Apache Hadoop's Avro, or DBus' binary wire format.
From Google comes a polyglot serialization format with bindings to C, C++, Java, C#, Go, Lua, and Rust, among others. To bring them all together, FlatBuffers has its own interface definition language, which you’ll have to learn (I had to learn it while writing this).
It's got structs, enums (which are C-style, single-value enums), unions, and, for some reason, tables. Interestingly, the latter are FlatBuffers' main way to define types: they work like Rust structs in which every field is optional. Plain structs are the same, only with non-optional fields. This is done to facilitate interface evolution.
Apart from that, the basic types are mostly there, only with different names:
Rust | FlatBuffers
u8, u16, u32, u64 | uint8, uint16, uint32, uint64
i8, i16, i32, i64 | int8, int16, int32, int64
bool | bool
String | string
[T; N], e.g., [String; 5] | [T:N], e.g., [string:5]
Vec<T> | [T]
FlatBuffers' unions are akin to Rust enums, but they always have a NONE variant and all variants must be tables, presumably to allow for changes later on.
Below is the StoredData from above as a FlatBuffers schema, which goes into a stored_data.fbs file. The fbs extension stands for "FlatBuffer schema."
file_identifier "SDFB";
table Bool {
value: bool;
}
table Uint8 {
value: uint8;
}
table Int64 {
value: int64;
}
table String {
value: string;
}
union StoredVariants {
Bool,
Uint8,
Int64,
String
}
struct Range {
start: uint64;
end: uint64;
}
table StoredData {
variant_content: StoredVariants;
opt_bool: bool;
vec_strs: [String];
range: Range;
}
root_type StoredData;
FlatBuffers appears to lack a direct representation of pointer-length integers (e.g., usize) or of Ranges, so in this example I just picked uint64 and a two-field struct to represent them. This is less than ideal on 32-bit machines. The documentation also tells us that FlatBuffers stores integer values as little-endian, so on big-endian machines this will cost you some performance. But that's the price to pay for wide applicability across the network.
Compiling this schema to Rust code requires flatc, which is available as a Windows binary and may also be in your Linux distribution's repository. Otherwise, it can be built from source; to do that, you'll need bazel, a build tool developed by Google. After installing that, you can clone the FlatBuffers repo (I used the 1.12.0 release) and build it with bazel build flatc.
Once the build is done, the executable will be at bazel-out/k8-fastbuild/bin/flatc. Putting it on the $PATH allows the following command line:
$ flatc --rust stored_data.fbs
Now we have a stored_data_generated.rs file we can include! in our code. Obviously, this is not our original data structure, but for the sake of comparability, we'll benchmark serialization and deserialization via this slightly modified type. Deserialization is basically a memcpy (or even no memcpy, though then you can only read the data once, which would have made benchmarking that much harder), so it's nearly free. However, if you don't want to yield control over your types to FlatBuffers, you need code to convert the FlatBuffers types into your own type on deserialization. This is what I benchmarked.
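To give a flavor of the generated API, here is a rough sketch; the generated helper names are assumptions about flatc's output for this schema, so treat the commented lines as pseudocode:

// Building happens bottom-up through a FlatBufferBuilder:
let mut builder = flatbuffers::FlatBufferBuilder::new();
// ... create strings and vectors first, then the tables referencing them,
// via generated helpers like StoredData::create(&mut builder,
// &StoredDataArgs { .. }) (names assumed) ...
// builder.finish(root_offset, Some("SDFB")); // attach the file identifier
// let bytes = builder.finished_data();       // the serialized buffer
// Reading it back is just reinterpreting those bytes:
// let decoded = get_root_as_stored_data(bytes); // assumed generated accessor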
After we published this post, people asked why I left out Cap'n Proto, so I added it to the benchmark. It works similarly to FlatBuffers, but the interface is somewhat impenetrable, so I cannot guarantee the results. The high runtime suggests I did something wrong.
UPDATE, Sept. 25, 2020: One of the Cap'n Proto crate maintainers sent a PR my way that showed I did indeed do something wrong: I used a nested struct to represent an Option<bool>, where an OptionBool enum (like the one in my optional crate) would be a better fit. The updated benchmark results show some improvement (and the resulting bytestream shrinks a bit), but it's still slower than comparable solutions.
UPDATE, Nov. 24, 2020: There has been another update from the Cap'n Proto team that improved performance again, although, as they explained to me, they cannot match the performance of other solutions because they do far more checks on the data. Which might be a good point, security-wise.
This crate is a slight misnomer, because it really is an abomination. Using it is definitely unsafe and likely unsound. Using it for anything but benchmarking the maximum theoretical serialization and deserialization performance is downright inadvisable. You have been warned.
Abomonation does not use serde; instead, it has its own Abomonation trait to both serialize and deserialize data. What it does is basically a memcpy, plus fixing up the occasional pointer so it can handle things like Vec and String. However, for now, it lacks handling of BTreeMap, HashMap, VecDeque, and other standard data structures — including Range, which we use in our StoredData. I cloned the repository and set up a pull request to implement the Abomonation trait for Range. Until it's merged, I'll use my own branch in this benchmark.
For deserialization, we need to keep the encoded bytes alive as long as we want to use the decoded values, because Abomonation won't copy the memory — it'll opt to reuse it in place. This also means we have to copy the data on each benchmark run; otherwise, we would corrupt it.
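A hedged sketch of the round trip, assuming StoredData implements the Abomonation trait (including the Range fix mentioned above); both calls are unsafe for good reason:

fn abomonation_roundtrip(data: &StoredData) {
    let mut bytes = Vec::new();
    // encode writes the raw bytes of `data` (and whatever it points to) into `bytes`.
    unsafe { abomonation::encode(data, &mut bytes) }.expect("write failed");
    // decode fixes up the pointers in place and hands back a borrowed value,
    // which is why `bytes` must stay alive (and untouched) while we use it.
    if let Some((decoded, rest)) = unsafe { abomonation::decode::<StoredData>(&mut bytes[..]) } {
        assert!(rest.is_empty());
        let _use_it: &StoredData = decoded;
    }
}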
While the resulting data takes up 116 bytes, it is indeed very fast: serialization takes a bit more than 15 nanoseconds, and deserialization just a smidgen more than 10 nanoseconds, even with the additional copy.
Again, please be warned before using it in production. For many types, it is actually unsound, and every time you use it, the coding gods kill a sweet little crab. Or something.
The following table shows all formats with the serialized size and rounded time to serialize/deserialize (measured on my machine — your mileage will vary).
Format | Crate version | Bytes | Time to serialize | Time to deserialize
json | 1.0.57 | 100 | 284.85 ns | 443.51 ns
simd-json | 0.3.22 | 100 | 77.819 ns | 456.60 ns
yaml | 0.8.13 | 99 | 3.2847 µs | 8.0961 µs
ron | 0.6.0 | 91 | 642.23 ns | 1.1271 µs
bincode | 1.3.1 | 58 | 43.451 ns | 127.10 ns
msgpack | 0.14.4 | 24 | 93.329 ns | 197.28 ns
cbor | 0.11.1 | 72 | 194.83 ns | 534.99 ns
postcard | 0.5.1 | 41 | 55.363 ns | 210.12 ns
flexbuffers | 0.1.1 | 41 | 1.4346 µs | 815.03 ns
flatbuffers | 0.6.1 | 104 | 167.30 ns | 114.11 ns
capnproto | 0.13.6 | 96 | 143.10 ns | 357.23 ns
capnproto (packed) | 0.13.6 | 39 | 231.22 ns | 485.49 ns
abomonation | master* | 116 | 15.157 ns | 12.711 ns

*With an additional change to allow serializing Ranges, which has been merged but not yet published.
Serialization really is a strong point of Rust. The libraries are mature and fast.
I feel I should sing Serde’s praise here. It’s a great piece of engineering and I highly recommend it.
Let's quickly summarize what we learned about the other choices:

- If you want a compact, fast binary format, bincode is the best you can do
- If you need fast JSON, look at the simd-json crate

Regarding maturity, only bincode and serde_json are marked with a 1.x major version number. Still, there's a tendency in the Rust world to be very careful — perhaps even too careful — when it comes to going 1.0, so this doesn't say much about the actual maturity of the crates.
I found all of the Serde-based crates easy to use, though more consistency of interfaces between the crates wouldn't hurt. Depending on the crate, the benchmark serializes with to_writer, serialize_into, serialize(Serializer::new(_)), to_slice, or to_bytes, and deserializes with from_slice, from_bytes, from_read_ref, or deserialize.
My benchmark code is available on GitHub. If you find problems or improvements, feel free to send me an issue or PR.