Andre "llogiq" Bogus is a Rust contributor and Clippy maintainer. A musician-turned-programmer, he has worked in many fields, from voice acting and teaching to programming and managing software projects. He enjoys learning new things and telling others about them.
Rust compression libraries
Editor’s note: This post was updated on Jan. 8, 2021 to correct errors in the original benchmark.
Data compression is an important component in many applications. Fortunately, the Rust community has a number of crates to deal with this.
Unlike in my review of Rust serialization libraries, it doesn't make much sense to compare performance across the different formats here. For some formats, all we have are thin wrappers around C libraries; here's hoping most will be ported to Rust soon.
There are two variants of compression utilities: stream compressors and archivers. A stream compressor takes a stream of bytes and emits a shorter stream of compressed bytes. An archiver enables you to serialize multiple files and directories. Some formats (such as the venerable .zip) can both collect and compress files.
For the web, there are only two stream formats that have achieved widespread implementation: gzip/deflate and Brotli. I list gzip and deflate under the same title because they implement the same underlying algorithm; gzip just adds more checks and header information. Some clients also allow bzip2 compression, but this isn't as widespread anymore, since gzip can achieve a similar compression ratio with Zopfli (trading compression time) and you can use Brotli to go even smaller.
What are the best compression libraries for Rust?
Like any problem, there are myriad solutions that have different trade-offs in terms of runtime for compression and decompression, CPU and memory use vs. compression ratio, the capability to stream data, and safety measures such as checksums. We’ll focus only on lossless compression — no image, audio or video-related lossy compression formats.
For a somewhat realistic benchmark, I’ll use the following files to compress and decompress, ranging from very compressible to very hard to compress:
A 100MB file of zeroes created with cat /dev/zero | head -c $[1024 * 1024 * 100] > zeros.bin
The concatenated markdown of my personal blog, small and text-heavy (created with cat llogiq.github.io/_posts/*.md > blogs.txt)
An image of my cat
The x86_64 rustc binary of the current stable toolchain (1.47.0)
A TV recording of the movie “Hackers,” which I happen to have lying around
A 100MB file of pseudorandom numbers created with cat /dev/urandom | head -c $[1024 * 1024 * 100] > randoms.bin
All compression and decompression will be from and into memory. My machine has a Ryzen 3550H four-core CPU running Linux 5.8.18_1.
Note: Some of the compressors failed the roundtrip test, meaning the compressed and then decompressed data did not match the original. This may mean my implementation of the benchmark is faulty, or it may be a fault in the library. I will conduct further tests. For now, I have marked the respective libraries with an asterisk (*).
Stream compression libraries for Rust
In the stream compressor department, the benchmark covers the following crates:
DEFLATE/zlib
DEFLATE is an older algorithm that uses a dictionary and Huffman encoding. It has three variants: plain (without any headers), gzip (with some headers, popularized by the UNIX compression utility of the same name), and zlib (which is often used in other file formats and also by browsers). We will benchmark the plain variant.
yazi 0.1.3 has a simple interface that will return its own Vec, but there is a more complex interface we benchmark that allows us to supply our own. On the upside, we can choose the compression level
deflate 0.8.6 does not allow supplying an output Vec, so its runtime contains allocation*
flate2 1.0.14 comes with three possible backends, two of which wrap a C implementation. This benchmark only uses the default backend because I wanted to avoid the setup effort — sorry
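To make the DEFLATE interfaces concrete, here is a minimal round trip through flate2's default backend. This is a sketch rather than the benchmark code itself; the data and buffer handling are illustrative:

use std::io::{Read, Write};

use flate2::{read::DeflateDecoder, write::DeflateEncoder, Compression};

fn roundtrip(input: &[u8]) -> std::io::Result<Vec<u8>> {
    // Compress: the encoder wraps any Write sink; here we write into a Vec.
    let mut encoder = DeflateEncoder::new(Vec::new(), Compression::default());
    encoder.write_all(input)?;
    let compressed = encoder.finish()?;

    // Decompress: the decoder wraps any Read source.
    let mut decompressed = Vec::with_capacity(input.len());
    DeflateDecoder::new(&compressed[..]).read_to_end(&mut decompressed)?;
    Ok(decompressed)
}

fn main() -> std::io::Result<()> {
    let data = b"the quick brown fox jumps over the lazy dog".repeat(100);
    assert_eq!(roundtrip(&data)?, data);
    Ok(())
}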
Snappy
Snappy is Google’s 2011 answer to LZ77, offering fast runtime with a fair compression ratio.
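As an illustration of what driving a Snappy implementation from Rust can look like, here is a sketch assuming the snap crate (1.x) and its raw block format; the crate and version used in the benchmark tables may differ:

// Illustrative only; assumes the snap crate, 1.x, raw (block) format.
fn snappy_roundtrip(input: &[u8]) -> Result<Vec<u8>, snap::Error> {
    let compressed = snap::raw::Encoder::new().compress_vec(input)?;
    snap::raw::Decoder::new().decompress_vec(&compressed)
}

fn main() {
    let data = b"Snappy favors speed over ratio".repeat(40);
    let restored = snappy_roundtrip(&data).expect("snappy round trip");
    assert_eq!(restored, data);
}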
LZ4
Also released in 2011, LZ4 is another speed-focused algorithm in the LZ77 family. It does away with arithmetic and Huffman coding, relying solely on dictionary matching. This makes the decompressor very simple.
There is some variance in the implementations. Though all of them use basically the same algorithm, the resulting compression ratio may differ since some implementations choose defaults that maximize speed whereas others opt for a higher compression ratio.
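For example, lz4_flex, one of the pure-Rust implementations covered by the benchmark, exposes block-level helpers that prepend the uncompressed size so the decompressor knows how much to allocate. A minimal sketch:

use lz4_flex::{compress_prepend_size, decompress_size_prepended};

fn main() {
    let data = b"an example payload that compresses reasonably well".repeat(50);
    // The original length is stored in front of the compressed block.
    let compressed = compress_prepend_size(&data);
    let restored = decompress_size_prepended(&compressed).expect("valid LZ4 block");
    assert_eq!(restored, data);
}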
Zstandard
The Zstandard (or ‘zstd’) algorithm, published in 2016 by Facebook, is meant for real-time applications. It offers very fast compression and decompression with zlib-level or better compression ratios. It is also used in other cases where time is of the essence, e.g., in Btrfs file system compression.
zstd 0.5.3 binds the standard C-based implementation. This needs pkg-config and the libzstd library, including headers*
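The zstd crate also exposes convenience functions on top of the C library (alongside streaming Encoder/Decoder types that wrap Read/Write); a minimal round trip might look like this, where level 0 selects the library's default level:

fn zstd_roundtrip(input: &[u8]) -> std::io::Result<Vec<u8>> {
    // encode_all/decode_all read from any Read source and return a Vec.
    let compressed = zstd::encode_all(input, 0)?;
    zstd::decode_all(&compressed[..])
}

fn main() -> std::io::Result<()> {
    let data = b"zstd aims for real-time compression".repeat(64);
    assert_eq!(zstd_roundtrip(&data)?, data);
    Ok(())
}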
LZMA
Going in the other direction and trading runtime for higher compression, the LZMA algorithm extends the dictionary-based LZ algorithm class with Markov chains. This algorithm is quite asymmetric in that compression is far slower and requires much more memory than decompression. It is often used for Linux distributions' package formats, where it reduces network usage while keeping decompression CPU and memory requirements agreeable.
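As a hedged illustration, the xz2 crate, a binding to liblzma, is a common choice for LZMA/XZ in Rust, though it may not be the exact crate behind the benchmark numbers. Compression and decompression follow the usual Write/Read adapter pattern:

// Illustrative sketch assuming the xz2 crate (bindings to liblzma).
use std::io::{Read, Write};

use xz2::{read::XzDecoder, write::XzEncoder};

fn xz_roundtrip(input: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut encoder = XzEncoder::new(Vec::new(), 6); // preset 6, the xz default
    encoder.write_all(input)?;
    let compressed = encoder.finish()?;

    let mut restored = Vec::with_capacity(input.len());
    XzDecoder::new(&compressed[..]).read_to_end(&mut restored)?;
    Ok(restored)
}

fn main() -> std::io::Result<()> {
    let data = b"LZMA trades compression time for ratio".repeat(32);
    assert_eq!(xz_roundtrip(&data)?, data);
    Ok(())
}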
Zopfli
Zopfli is a zlib-compatible compression algorithm that trades a much longer compression runtime for a superior compression ratio. It is useful on the web, where zlib decompression is widely implemented.
Though Zopfli takes more time to compress, this is an acceptable tradeoff for reducing network traffic. Since the format is DEFLATE-compatible, the crate only implements compression.
zopfli 0.4.0* (could not be unpacked to match the original using either deflate or flate2)
Brotli
Brotli, developed by the Google team that created Zopfli, extends the LZ77-based compression algorithm with second-order context modeling, which gives it an edge in compressing otherwise hard-to-compress data streams.
brotli 3.3.0 is a rough translation of the original C++ source that has seen a good number of tweaks and optimizations (as the high version number attests). The interface is somewhat unidiomatic, but it works well enough
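A minimal sketch of the brotli crate's writer-based API; quality 11 and a 22-bit window are the "maximum compression" settings, and lower quality values trade ratio for speed:

use std::io::{Read, Write};

fn brotli_roundtrip(input: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut compressed = Vec::new();
    {
        // Arguments: sink, internal buffer size, quality (0-11), lg_window_size.
        let mut writer = brotli::CompressorWriter::new(&mut compressed, 4096, 11, 22);
        writer.write_all(input)?;
    } // dropping the writer finishes the Brotli stream

    let mut restored = Vec::with_capacity(input.len());
    brotli::Decompressor::new(&compressed[..], 4096).read_to_end(&mut restored)?;
    Ok(restored)
}

fn main() -> std::io::Result<()> {
    let data = b"Brotli adds second-order context modeling".repeat(32);
    assert_eq!(brotli_roundtrip(&data)?, data);
    Ok(())
}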
Archiving libraries for Rust
For the archivers, the benchmark has the tar 0.4.30, zip 0.5.8, and rar 0.2.0 crates.
zip is probably the most well-known format. Its initial release was in 1989, and it's the most widely supported format of its kind. tar, the venerable Tape ARchiver, is the oldest format, with an initial release in 1979. It actually has no compression of its own but, in typical UNIX fashion, delegates to stream compressors such as gzip (DEFLATE), bzip2 (Burrows-Wheeler), and xz (LZMA). The rar format is somewhat younger than zip and rose to popularity on file-sharing services due to lower file overhead and slightly better compression.
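Since the benchmark works in memory, both archivers have to be pointed at in-memory buffers. Here is a rough sketch of how that can be done with the tar and zip crates; the entry name and data are made up for illustration:

use std::io::{Cursor, Write};

// Build a tar archive in memory: tar needs a header per entry and applies
// no compression of its own.
fn tar_in_memory(name: &str, data: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut builder = tar::Builder::new(Vec::new());
    let mut header = tar::Header::new_gnu();
    header.set_size(data.len() as u64);
    header.set_mode(0o644);
    header.set_cksum();
    builder.append_data(&mut header, name, data)?;
    builder.into_inner()
}

// Build a zip archive in memory: the writer needs Seek, so we wrap the Vec
// in a Cursor; entries are deflated with the default options.
fn zip_in_memory(name: &str, data: &[u8]) -> zip::result::ZipResult<Vec<u8>> {
    let mut writer = zip::ZipWriter::new(Cursor::new(Vec::new()));
    writer.start_file(name, zip::write::FileOptions::default())?;
    writer.write_all(data)?;
    Ok(writer.finish()?.into_inner())
}

fn main() {
    let tarball = tar_in_memory("cat.jpg", b"not actually a cat picture").unwrap();
    let zipped = zip_in_memory("cat.jpg", b"not actually a cat picture").unwrap();
    println!("tar: {} bytes, zip: {} bytes", tarball.len(), zipped.len());
}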
Interfaces
For stream compressors, there are three possible ways to shape the API. The easiest approach is to take a &[u8] slice of bytes and return a Vec<u8> with the compressed data. An obvious optimization is not to return the compressed data, but to take a &mut Vec<u8> mutable reference to a Vec into which the compressed data is written instead.
The most versatile interface is obviously a method that takes an impl Read and an impl Write, reading from the former and writing into the latter. This may not be optimal for all formats, though, because sometimes you need to go back and fix block lengths in the written output. Such an interface would leave some performance on the table unless the output also implements Seek — which, for Vecs, can be done with a std::io::Cursor. On the other hand, it allows us to work somewhat comfortably with data that may not fit in memory.
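To make the three interface shapes concrete, here they are as Rust signatures with a stand-in "compressor" that merely copies bytes; the function names are illustrative and not taken from any particular crate:

use std::io::{self, Cursor, Read, Seek, SeekFrom, Write};

// 1. Simplest interface: take a slice, allocate and return the output.
fn compress_to_vec(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len());
    compress_into(input, &mut out);
    out
}

// 2. Let the caller supply (and reuse) the output buffer.
fn compress_into(input: &[u8], output: &mut Vec<u8>) {
    // Stand-in for a real compressor: just copy the bytes.
    output.extend_from_slice(input);
}

// 3. Most general: stream from any reader into any writer. Requiring Seek
//    lets a format go back and patch block lengths after writing the data.
fn compress_stream<R: Read, W: Write + Seek>(mut input: R, mut output: W) -> io::Result<u64> {
    let written = io::copy(&mut input, &mut output)?;
    output.seek(SeekFrom::Start(0))?; // e.g., to fix up a length field in a header
    Ok(written)
}

fn main() -> io::Result<()> {
    let data = b"hello, compression";
    assert_eq!(compress_to_vec(data), data.to_vec());

    // A Vec<u8> does not implement Seek, but a Cursor<Vec<u8>> does.
    let cursor = Cursor::new(Vec::new());
    compress_stream(&data[..], cursor)?;
    Ok(())
}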
In any event, for a somewhat meaningful comparison, we'll compress from RAM to RAM, preallocating where the API allows it. Exceptions are marked as such.
Some libraries allow you to set options to (de)activate certain features, such as checksums, and pick a certain size/runtime tradeoff. This benchmark takes the default configuration, sometimes varying compression level if this is readily available in the API.
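The comments below suggest the benchmark harness uses Criterion. As a hedged sketch (not the actual code from the benchmark repository), one of the DEFLATE measurements could be wired up like this, reusing the blogs.txt input from the list above:

use std::io::Write;

use criterion::{black_box, criterion_group, criterion_main, Criterion};
use flate2::{write::DeflateEncoder, Compression};

fn bench_deflate(c: &mut Criterion) {
    // blogs.txt is the concatenated-markdown input from the list above.
    let data = std::fs::read("blogs.txt").expect("benchmark input present");
    c.bench_function("flate2 deflate blogs.txt", |b| {
        b.iter(|| {
            let mut encoder = DeflateEncoder::new(Vec::new(), Compression::default());
            encoder.write_all(black_box(&data)).unwrap();
            black_box(encoder.finish().unwrap())
        })
    });
}

criterion_group!(benches, bench_deflate);
criterion_main!(benches);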
Rust compression libraries: Benchmarks
Without further ado, here are the results, presented in six tables to avoid cluttering up your display:
Zip compressing 100mb random data in 60kb? That’s impossible. In fact, it’s very similar for all test inputs, another huge red flag.
What is happening (from a quick look): After compression, you take the position of the Cursor instead of the length of the compressed data – the whole point of a Cursor is that it’s seekable.
Also, take a look at the lz4_flex and lzzzz, decompressing in 5.3/7.6 ns – impossibly fast – regardless of input (with one exception that has realistic time). It’s not actually decompressing the data. I don’t know why though, maybe criterion::black_box is failing?
Yes, there is an error in the measurement of the ZIP benchmark – the code looks at the cursor position, but zip resets the cursor to write the length.
Yep, zip is beating them all hands down.
Hi, this is a nice comparison and you put quite some effort into it.
However, like many other statistics, it would require additional details on how the data was measured to make it usable.
For example: are those numbers from a single run? Is it a mean value? If it is a mean value, how often was it executed (cold cache, hot cache, …)? If it is a mean value, how big are the outliers, variance, or standard deviation?
Disclaimer: did not look at the Github project, since I prefer to have this information directly available with the data tables.