Dylan Tientcheu I build experiences to make your everyday life simpler.

Encoding and decoding packages for Rust

Encoding is the process of converting data from one form to another. Decoding is the reverse process: it converts encoded data back to its original form. Though it’s often defined as a process, encoding also refers to a particular form of data (character encoding or media encoding).

Character encoding/decoding is particularly crucial in programming because computers recognize only binary data. It’s how we translate a sequence of characters (letters, numbers, symbols, punctuation, etc.) into a specialized format to help us speak the computer’s language and understand what it says back.

In this guide, we’ll demonstrate how to encode and decode your data in Rust.

Encoding and decoding in Rust

If you think encoding and decoding sound like a drag, you’re not alone. There are many edge cases and the process can be quite complex.

Fortunately, in Rust, as in many other programming languages, encoding and decoding are handled by modules that have been thoroughly tested against most of these edge cases. Efficient encoding and decoding libraries are especially critical for a language as close to the machine as Rust.

Encoding in Rust is relatively simple. Though it doesn’t come in the core Rust package, the few solutions developed by the community handle the job quite well. These tools enable you to pass a string of characters through a function and receive the desired result (an encoded or decoded string).

base64 Rust library

base64 is designed to encode and decode to/from base64 as fast and precisely as possible. As its name suggests, it works only with base64. It literally has two transforming functions — encode() and decode() — along with configuration functions to help you shape the way it decodes and encodes.

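// Works with base64 releases before 0.21; newer releases deprecate the
// free encode()/decode() functions in favor of the Engine trait.
use base64::encode;

fn main() {
    let a = "hello world";
    println!("{}", encode(a)); // -> aGVsbG8gd29ybGQ=
}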

Believe it or not, base64 is a first-class necessity when dealing with binary files on your computer. Base64 is commonly used to encode binary data (images, sound files, etc.) in much of what we share on the web, from email attachments to files saved in our databases.
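To see what happens to the bytes, here is a minimal sketch of base64 encoding that uses only the standard library. It is not how the base64 crate is implemented; base64_encode is a hypothetical helper written for illustration, and it assumes the standard padded alphabet from RFC 4648.

```rust
// Standard base64 alphabet from RFC 4648.
const ALPHABET: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Hypothetical helper (not part of the base64 crate): padded base64 encoding.
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into one 24-bit group, zero-filling the gaps.
        let b0 = u32::from(chunk[0]);
        let b1 = u32::from(*chunk.get(1).unwrap_or(&0));
        let b2 = u32::from(*chunk.get(2).unwrap_or(&0));
        let group = (b0 << 16) | (b1 << 8) | b2;

        // Emit four 6-bit symbols; symbols with no input bytes become '='.
        out.push(ALPHABET[((group >> 18) & 63) as usize] as char);
        out.push(ALPHABET[((group >> 12) & 63) as usize] as char);
        out.push(if chunk.len() > 1 {
            ALPHABET[((group >> 6) & 63) as usize] as char
        } else {
            '='
        });
        out.push(if chunk.len() > 2 {
            ALPHABET[(group & 63) as usize] as char
        } else {
            '='
        });
    }
    out
}

fn main() {
    // Same input as the base64 crate example.
    println!("{}", base64_encode(b"hello world"));
}
```

Every three input bytes become four output characters, which is why base64 text is roughly a third larger than the data it represents.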

For a much deeper dive, head to base64 guru.

rust-encoding

rust-encoding supports a large number of character encodings (all those supported by the WHATWG standards). Quite uniquely, it lets you choose how errors are handled: it can replace a sequence that fails to encode or decode with a specified value, or, in strict mode, stop completely at the first error.

Encoding’s encode and decode methods convert a String to Vec<u8> and vice versa. Since there’s support for a lot of encoding types, the library ships with two ways to get your encoding:

    1. encoding::all, from which you import the specific encoding you’ll use. All the unused encoding types are discarded from the binary.
    2. encoding::label, which looks up the encoding based on the label given and returns the static encoding type, resulting in a bigger binary.

use encoding::{Encoding, EncoderTrap};
use encoding::all::ISO_8859_1; // use with all
use encoding::label::encoding_from_whatwg_label; // use with label

fn main() {
    // Strict mode returns an error instead of substituting a replacement
    assert_eq!(ISO_8859_1.encode("caf\u{e9}", EncoderTrap::Strict),
               Ok(vec![99, 97, 102, 233]));
    // Look up an encoding by its WHATWG label
    let euckr = encoding_from_whatwg_label("euc-kr").unwrap();
    assert_eq!(euckr.name(), "windows-949");
}
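To illustrate the difference between stopping on an error and replacing it, here is a small standard-library sketch of an ISO-8859-1 encoder. The Trap enum and encode_latin1 function are hypothetical names chosen for this example; they only mimic the idea and are not rust-encoding’s API.

```rust
/// What to do when a character has no ISO-8859-1 byte (hypothetical names).
#[derive(Clone, Copy)]
enum Trap {
    Strict,      // stop and report the offending character
    Replace(u8), // substitute a fixed byte and keep going
}

/// Hypothetical helper: encodes text as ISO-8859-1 bytes.
fn encode_latin1(input: &str, trap: Trap) -> Result<Vec<u8>, char> {
    let mut out = Vec::with_capacity(input.len());
    for ch in input.chars() {
        let code_point = ch as u32;
        if code_point <= 0xFF {
            // ISO-8859-1 maps U+0000..=U+00FF straight onto a single byte.
            out.push(code_point as u8);
        } else {
            match trap {
                Trap::Strict => return Err(ch),
                Trap::Replace(byte) => out.push(byte),
            }
        }
    }
    Ok(out)
}

fn main() {
    // Every character fits, so strict mode succeeds.
    println!("{:?}", encode_latin1("caf\u{e9}", Trap::Strict));
    // The euro sign does not fit; it is replaced with '?'.
    println!("{:?}", encode_latin1("5 \u{20ac}", Trap::Replace(b'?')));
}
```

Strict mode suits data that must round-trip exactly, while replacement is often good enough for text that is only displayed.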

At the time of writing, rust-encoding is one of the top downloaded libraries (3.5k/week) even though it hasn’t been updated in four years, which suggests that it is stable and robust.

data-encoding

data-encoding handles 15 different encoding types and allows users to define their own custom edge cases. Encoding and decoding your characters with data-encoding is very specific and simple, assuming you know what type you want to encode to or decode from.

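use data_encoding::{BASE64, HEXLOWER};
let encoded = BASE64.encode(&input_to_encode); // bytes in, String out
let decoded = HEXLOWER.decode(&input_to_decode); // bytes in, Result<Vec<u8>, DecodeError> out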

The library also gives you the latitude to define your own little-endian ASCII base-conversion encodings for bases of size 2 to 64. The predefined encodings are particular instances of this more general mechanism.
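The idea behind a custom base can be sketched with the standard library alone. The encode_base function below is a hypothetical helper, not data-encoding’s API. It assumes an alphabet whose size is a power of two, emits the most significant bits first, and adds no padding; those choices are assumptions made for this example.

```rust
/// Hypothetical helper: encodes bytes with a custom power-of-two alphabet,
/// most significant bits first and without padding.
fn encode_base(input: &[u8], symbols: &[u8]) -> Option<String> {
    let n = symbols.len();
    if n < 2 || n > 64 || !n.is_power_of_two() {
        return None; // this sketch only accepts 2, 4, 8, 16, 32, or 64 symbols
    }
    let bits = n.trailing_zeros(); // number of bits carried by each symbol
    let mask = n as u32 - 1;
    let mut out = String::new();
    let mut acc: u32 = 0; // bits waiting to be emitted
    let mut acc_bits: u32 = 0;
    for &byte in input {
        acc = (acc << 8) | u32::from(byte);
        acc_bits += 8;
        while acc_bits >= bits {
            acc_bits -= bits;
            out.push(symbols[((acc >> acc_bits) & mask) as usize] as char);
        }
        acc &= (1 << acc_bits) - 1; // keep only the bits not yet emitted
    }
    if acc_bits > 0 {
        // Zero-fill the final partial symbol.
        out.push(symbols[((acc << (bits - acc_bits)) & mask) as usize] as char);
    }
    Some(out)
}

fn main() {
    // A 16-symbol alphabet gives lowercase hexadecimal.
    println!("{:?}", encode_base(b"hello", b"0123456789abcdef"));
    // A 2-symbol alphabet gives a binary dump.
    println!("{:?}", encode_base(&[0xA5], b"01"));
}
```

With this shape, hexadecimal, binary, and unpadded base64 differ only in the alphabet passed in.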

data-encoding is a small, modern library. Quite popular and well-maintained, it’s a great choice if you ever have to work with any of their supported encoding types.



integer-encoding

Integers also need to be encoded and decoded whenever they are written to or read from a stream of bytes. integer-encoding supports two integer encodings: FixedInt (fixed-size) and VarInt (variable-length). It also provides efficient read and write types to simplify working with the integers.

This library has a special use case, well-known among developers working with Google’s protocol buffers, that illustrates the need to encode to and decode from uncommon types.
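Protocol buffers store integers as base-128 varints: seven bits of the value per byte, with the high bit signalling that more bytes follow. Here is a minimal standard-library sketch of that scheme for unsigned values. The encode_varint and decode_varint functions are hypothetical helpers, not integer-encoding’s API.

```rust
/// Hypothetical helper: encodes an unsigned integer as a base-128 varint.
fn encode_varint(mut value: u64) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let low_bits = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(low_bits); // last byte: continuation bit left clear
            return out;
        }
        out.push(low_bits | 0x80); // more bytes follow
    }
}

/// Hypothetical helper: returns the decoded value and the bytes consumed,
/// or None if the input ends before the varint does.
fn decode_varint(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    // A u64 never needs more than ten varint bytes.
    for (i, &byte) in bytes.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

fn main() {
    let encoded = encode_varint(300);
    println!("{:x?}", encoded); // two bytes instead of a fixed four or eight
    println!("{:?}", decode_varint(&encoded));
    // For comparison, a fixed-size little-endian encoding from the standard library.
    println!("{:x?}", 300u32.to_le_bytes());
}
```

Small values take a single byte, which is why variable-length integers are popular in wire formats where most numbers are small.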

Rust encoding and decoding for the web

One of the most practical and common use cases for encoding and decoding is on the web, where we make two different entities (backend and frontend) communicate through strings, forms, arrays, JSON, etc. It may look seamless, but a lot goes on behind the scenes to make it happen.

Let’s zoom in on some of the libraries that help make this communication possible.

urlencoded

People often wonder how browsers and servers are able to read parameters and form data from URLs, along with other unusual characters such as spaces and question marks. Because this is a web standard, every language needs to be able to read and understand it in order to communicate with a user’s browser.

In Rust, urlencoded acts as middleware for the Iron web framework. Its job is to parse URL query strings into hashmaps, which are much easier to work with in Rust. The values are kept in a Vec to ensure that no information is lost when a key is duplicated. Hence, the query a=b&a=c results in a mapping from a to [b, c].
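The mapping described above can be sketched with a standard-library HashMap. The parse_query function is a hypothetical helper, not the urlencoded middleware’s API, and for brevity it assumes the keys and values contain no percent-escapes.

```rust
use std::collections::HashMap;

/// Hypothetical helper: groups query-string values by key, keeping duplicates.
/// Percent-escapes are left untouched in this sketch.
fn parse_query(query: &str) -> HashMap<String, Vec<String>> {
    let mut map: HashMap<String, Vec<String>> = HashMap::new();
    for pair in query.split('&').filter(|p| !p.is_empty()) {
        // Split on the first '=' only, so values may themselves contain '='.
        let mut parts = pair.splitn(2, '=');
        let key = parts.next().unwrap_or("");
        let value = parts.next().unwrap_or("");
        map.entry(key.to_string())
            .or_insert_with(Vec::new)
            .push(value.to_string());
    }
    map
}

fn main() {
    let parsed = parse_query("a=b&a=c&page=2");
    println!("{:?}", parsed.get("a"));
    println!("{:?}", parsed.get("page"));
}
```

Storing a Vec per key is what keeps both b and c when the key a appears twice.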

urlencoded can also parse a POST body of the MIME type application/x-www-form-urlencoded. This comes in handy when working with forms on the frontend.

urlencoded is a complete library when it comes to URL encoding and decoding. With fewer than four dependencies, it is considered stable and ready for production.

multer-rs

Heavily inspired by multipart and multipart-async, multer-rs is known for its ability to asynchronously parse multipart/form-data content types in Rust. Previous libraries couldn’t handle any async server, which is crucial because Rust’s ecosystem is moving toward asynchronicity.

multer-rs accepts a Stream of Bytes as a source and produces an iterable of all the fields parsed from the multipart/form-data body, including file data, which can then be written to a file of your choice.
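To give a feel for the format being parsed, here is a deliberately simplified standard-library sketch. It is synchronous, works on a complete in-memory string rather than a stream, and assumes each part carries a Content-Disposition header with a quoted name. The parse_multipart function is a hypothetical helper and has nothing to do with multer-rs’s actual API.

```rust
/// Hypothetical helper: extracts (field name, content) pairs from a complete
/// multipart/form-data body held in memory.
fn parse_multipart(body: &str, boundary: &str) -> Vec<(String, String)> {
    let delimiter = format!("--{}", boundary);
    let mut fields = Vec::new();
    for part in body.split(delimiter.as_str()) {
        // Real parts start with CRLF; the preamble and the closing "--" do not.
        let part = match part.strip_prefix("\r\n") {
            Some(rest) => rest,
            None => continue,
        };
        // A blank line separates the part headers from the part content.
        let (headers, content) = match part.split_once("\r\n\r\n") {
            Some(split) => split,
            None => continue,
        };
        // Pull the field name out of the Content-Disposition header.
        let name = match headers.split_once("; name=\"") {
            Some((_, rest)) => rest.split('"').next().unwrap_or(""),
            None => continue,
        };
        // Drop the CRLF that precedes the next delimiter.
        let content = content.strip_suffix("\r\n").unwrap_or(content);
        fields.push((name.to_string(), content.to_string()));
    }
    fields
}

fn main() {
    let body = "--X\r\nContent-Disposition: form-data; name=\"title\"\r\n\r\nhello\r\n\
--X\r\nContent-Disposition: form-data; name=\"file\"; filename=\"a.txt\"\r\n\
Content-Type: text/plain\r\n\r\ndata\r\n--X--\r\n";
    for (name, content) in parse_multipart(body, "X") {
        println!("{} = {}", name, content);
    }
}
```

A production parser also has to cope with binary content, data arriving in chunks, and size limits, which is exactly the work an asynchronous library takes off your hands.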

multer-rs rose in popularity due to the need for a multipart decoder that works asynchronously. The previous popular ones fell short. Although its maintainers are still working on new issues, multer-rs can be considered stable.

percent-encoding

percent-encoding, used together with the url crate from the same rust-url project, is a great alternative to urlencoded. They share a lot of similar features, but the rust-url crates are different because:

    1. They don’t rely on the Iron web framework
    2. The url crate is a complete URL parser that can split a URL’s structure to get any specific part rather than only the query parameters
    3. They parse and serialize the application/x-www-form-urlencoded syntax, as used in HTML forms

// code from url docs
use url::{ParseError, Position, Url};

fn main() -> Result<(), ParseError> {
    let issue_list_url = Url::parse(
        "https://github.com/rust-lang/rust/issues?labels=E-easy&state=open"
    )?;

    assert!(issue_list_url.scheme() == "https"); // takes the scheme
    assert!(issue_list_url.host_str() == Some("github.com")); // get the URL's host
    assert!(issue_list_url.port() == None); // get the port if there's one
    assert!(issue_list_url.path() == "/rust-lang/rust/issues"); // the whole path
    assert!(issue_list_url.query() == Some("labels=E-easy&state=open")); // get only the query
    assert!(&issue_list_url[Position::BeforePath..] == "/rust-lang/rust/issues?labels=E-easy&state=open");
    Ok(())
}

This library handles edge cases gracefully, like having relative URLs. It enables you to join URLs and create new ones.

use url::{ParseError, Url};

fn main() -> Result<(), ParseError> {
    let this_document = Url::parse("http://servo.github.io/rust-url/url/index.html")?;
    let css_url = this_document.join("../main.css")?;
    assert_eq!(css_url.as_str(), "http://servo.github.io/rust-url/main.css");
    Ok(())
}

When comparing the developer experience, percent-encoding has an edge over urlencoded, especially if you’re not working with the Iron framework. Besides, although it still targets the older Rust 2015 edition, percent-encoding appears to be more robust.
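Percent-encoding itself is easy to sketch with the standard library: every byte outside a safe set is written as % followed by two hexadecimal digits. The percent_encode and percent_decode functions below are hypothetical helpers, not the percent-encoding crate’s API, and the safe set used here (the unreserved characters from RFC 3986) is an assumption made for this example.

```rust
/// Hypothetical helper: escapes every byte except RFC 3986 unreserved characters.
fn percent_encode(input: &str) -> String {
    let mut out = String::new();
    for &byte in input.as_bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Hypothetical helper: reverses percent_encode; returns None on a malformed
/// escape or when the decoded bytes are not valid UTF-8.
fn percent_decode(input: &str) -> Option<String> {
    let raw = input.as_bytes();
    let mut bytes = Vec::with_capacity(raw.len());
    let mut i = 0;
    while i < raw.len() {
        if raw[i] == b'%' {
            // An escape needs exactly two hexadecimal digits after the '%'.
            let hex = input.get(i + 1..i + 3)?;
            if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
                return None;
            }
            bytes.push(u8::from_str_radix(hex, 16).ok()?);
            i += 3;
        } else {
            bytes.push(raw[i]);
            i += 1;
        }
    }
    String::from_utf8(bytes).ok()
}

fn main() {
    let encoded = percent_encode("a b?c=\u{e9}");
    println!("{}", encoded);
    println!("{:?}", percent_decode(&encoded));
}
```

Which bytes count as safe depends on where in the URL the text ends up, so real libraries let you choose the set instead of hard-coding one.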

base64-url

base64-url is — you guessed it — another base64 encoder, now in the URL. Though it sounds improbable, this is a handy feature when you need to shorten a URL or include some binary data.
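Standard base64 uses +, /, and =, all of which have special meaning in URLs. The usual URL-safe convention from RFC 4648 swaps the first two for - and _ and drops the padding. Here is a minimal standard-library sketch of that conversion; to_url_safe and from_url_safe are hypothetical helpers, not base64-url’s API.

```rust
/// Hypothetical helper: converts standard base64 text to the URL-safe form.
fn to_url_safe(standard: &str) -> String {
    standard
        .trim_end_matches('=') // padding is dropped in URLs
        .chars()
        .map(|c| match c {
            '+' => '-',
            '/' => '_',
            other => other,
        })
        .collect()
}

/// Hypothetical helper: converts URL-safe text back to padded standard base64.
fn from_url_safe(url_safe: &str) -> String {
    let mut out: String = url_safe
        .chars()
        .map(|c| match c {
            '-' => '+',
            '_' => '/',
            other => other,
        })
        .collect();
    // Restore the padding so the length is a multiple of four again.
    while out.len() % 4 != 0 {
        out.push('=');
    }
    out
}

fn main() {
    println!("{}", to_url_safe("+/8="));
    println!("{}", from_url_safe("-_8"));
}
```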

blob-uuid

Sometimes you encode simply to shorten a string. blob-uuid does that for 36-character UUIDs, which shrink to 22 characters after encoding. It can also help you obscure a UUID if you ever want to share it in a URL.

use uuid::Uuid;
let uuid = Uuid::parse_str("557c8018-5e21-4b74-8bb0-9040e2e8ead1").unwrap();
assert_eq!("VXyAGF4hS3SLsJBA4ujq0Q", blob_uuid::to_blob(uuid));
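Where do the 22 characters come from? A UUID is 16 bytes, and 16 bytes of unpadded base64 take exactly 22 symbols. The standard-library sketch below reproduces the value above under the assumption that the encoding is URL-safe base64 without padding. The uuid_bytes and to_blob functions are hypothetical helpers for illustration, not blob-uuid’s implementation.

```rust
// URL-safe base64 alphabet from RFC 4648.
const URL_SAFE: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/// Hypothetical helper: parses a hyphenated UUID string into its 16 bytes.
fn uuid_bytes(text: &str) -> Option<[u8; 16]> {
    let hex: String = text.chars().filter(|c| *c != '-').collect();
    if hex.len() != 32 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let mut bytes = [0u8; 16];
    for (i, byte) in bytes.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(bytes)
}

/// Hypothetical helper: unpadded URL-safe base64 of the 16 UUID bytes.
fn to_blob(bytes: &[u8; 16]) -> String {
    let mut out = String::with_capacity(22);
    for chunk in bytes.chunks(3) {
        let b0 = u32::from(chunk[0]);
        let b1 = u32::from(*chunk.get(1).unwrap_or(&0));
        let b2 = u32::from(*chunk.get(2).unwrap_or(&0));
        let group = (b0 << 16) | (b1 << 8) | b2;
        // A chunk of n bytes yields n + 1 symbols; no '=' padding is added.
        for i in 0..chunk.len() + 1 {
            out.push(URL_SAFE[((group >> (18 - 6 * i)) & 63) as usize] as char);
        }
    }
    out
}

fn main() {
    let bytes = uuid_bytes("557c8018-5e21-4b74-8bb0-9040e2e8ead1").unwrap();
    let blob = to_blob(&bytes);
    println!("{} ({} characters)", blob, blob.len());
}
```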

Playground

I prepared a playground for these libraries on my Repl.it. Make sure you add the library you want using Cargo.toml and hit cargo run on the terminal.

Serialization in Rust

Serialization means converting data (arrays, objects, and similar structures) into a single string so it can be stored or transmitted easily. Serialization is a very specific topic and is considered a subset of encoding. However, for the sake of simplicity, this article doesn’t include serialization libraries.

Rust’s ecosystem offers some excellent serialization libraries, including serde_json, toml, and bincode. These also undergo a form of encoding and decoding behind the scenes.

Conclusion

There are many more Rust encoding libraries than what we described above. We highlighted some of the most common. In a real-life scenario, you would most likely choose an encoding library according to the format you need to work with.

Encoding and decoding aren’t limited to character encoding. There is also a wide variety of media encoding libraries for Rust.

All the libraries listed above are well-established, popular, and stable for production use. This is simply because encoding and decoding is a mandatory feature in any modern language. The list is diversified and each library handles particular use cases differently. That makes it quite difficult to compare them to one another.

Rust has a strong foundation of encoding and decoding libraries. The standard library already handles UTF-8 through its String and str types, but I believe Rust would feel more mature if other popular encoding types such as base64 also shipped with it and were designed to be as malleable as the module available in Python.
