In this guide, we’ll explain everything you need to know about unsafe Rust. We’ll focus on the following:

- Myths about unsafe code
- When to use unsafe code
- Working with uninitialized memory
- Interior mutability
- Intrinsics and inline assembly
- Interfacing with other languages
- Tools for writing unsafe code
## Myths about unsafe code

Before we explain how and when to use (or not use) `unsafe` in Rust, I want to dispel a few persistent myths about unsafe code in Rust.
### Myth: All Rust code is unsafe

No. The distinction is subtle, but safe Rust code cannot violate the safety guarantees, as long as neither the compiler nor the unsafe code it builds on have any bugs that allow this to happen. So, unlike in other low-level languages, where the safety of the code rests on every line of code plus the compiler implementation, you can considerably reduce the attack surface you need to audit for errors.
The RustBelt project mathematically proved that if you have a portion of safe code and a portion of unsafe code that guards its invariants, the safe code cannot break the guarantees as long as the unsafe code doesn’t let it.
As an aside, an invariant is a condition that is unchanged by all methods of a type or by all functions of a module.
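For example — a minimal sketch, with a made-up `NonZeroDivisor` type — a constructor can establish an invariant that every method then preserves:

```rust
/// Invariant: `divisor` is never 0.
pub struct NonZeroDivisor {
    divisor: u32,
}

impl NonZeroDivisor {
    /// The constructor establishes the invariant...
    pub fn new(divisor: u32) -> Option<Self> {
        if divisor == 0 { None } else { Some(NonZeroDivisor { divisor }) }
    }

    /// ...and every method preserves it, so this can never divide by zero.
    pub fn divide(&self, dividend: u32) -> u32 {
        dividend / self.divisor
    }
}
```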
### Myth: Unsafe code is everywhere

Statistically speaking, less than 1 percent of the Rust code found on crates.io — which is likely less than 50 percent of the code out there, but should nevertheless be a sufficiently representative sample — is unsafe code, and many crates contain not a single unsafe line of code.
### Myth: The standard library is full of unsafe code

Yes, the standard library has more unsafe code than the average crate, which is to be expected, since many of the abstractions it offers cannot be implemented efficiently — or at all — in safe Rust. Also, we can rest assured that the standard library has received more review and is, therefore, more trustworthy than your average crate.
That is not to say it’s perfect — after all, bugs have been found in the past. Still, there are solid ongoing efforts to both verify and fuzz large parts of the standard library, which makes all Rust code safer.
### Myth: With unsafe, I can turn off the borrow checker, summon the devil, open a portal to the pain dimension, etc.

No. The Rustonomicon meticulously lists the additional powers `unsafe` grants you in exchange for guaranteeing to uphold the safety invariants in that part of the code. For example, you can:

- Call `unsafe` functions (including C functions, compiler intrinsics, and the raw allocator)
- Implement `unsafe` traits
- Use `union`s

There is no dark ritual and no switching off any borrow checker. However, even these possibly benign-sounding powers can have consequences you should be mindful of:
- Calling `unsafe` functions carries the risk of calling them with arguments that fail to meet their safety requirements, with possibly exploitable effects
- Implementing `unsafe` traits for types that fail to uphold the traits’ invariants could also lead to callers inadvertently causing their safety requirements to fail, with possibly exploitable effects
- `union`s might let you interpret data as a type of which it is not a valid instance, or observe uninitialized data (if the types differ in length, one field could include the padding of another), both of which lead to undefined behavior and possibly exploitable effects

So while unsafe code is not quite the fearsome beast some make it out to be, care needs to be taken to handle it safely. Then you can build safe foundations on top of unsafe code.
### Myth: A little unsoundness is no big deal

No. Once you present a safe interface on top of unsafe code, your code either upholds the safety invariants no matter what, or your code is unsound.

Some people feel very strongly about unsoundness, but it’s no reason to get riled up. It’s still a bug, and you should address it openly and calmly. If the bug can be solved with a more careful design, go for it. In the meantime, you can state openly that your code is unsound and that users need to take extra care not to violate the safety invariants. And if you come up with a sound design, yank all published unsound versions and report the vulnerability.
The problem with undefined behavior is not that it will fail directly. In fact, it may never fail at all. It may also work right up until you put the code in production, at which point it could fail catastrophically. Or, it might work until a hacker takes a stab at it and crafts just the correct input to break your unsound code. Now all your users have a crypto-extortion Trojan on their PC.
Even running it multiple times will give you zero assurance that it will work the next time. As the space shuttle Columbia disaster shows, just because something worked 112 times doesn’t mean it cannot fail on the 113th attempt.
### Myth: Rust’s safety guarantees prevent memory leaks

Leaking cannot be reliably avoided and does not, by itself, pose any danger to memory safety — though the operating system may stop your program, or simply crash, if you exhaust the available memory, this will at worst lead to denial of service. Therefore, leaking was deemed out of scope for the memory safety guarantees, and `mem::forget` became a safe function. So if your code relies on some value not leaking for safety, at some point that leak may just happen, and the loss of safety guarantees is on you.

Mind you, this question is so nontrivial that it took until just before Rust 1.0 to settle on allowing leaks in safe code. The usual solution to this source of unsafety is leak amplification: before attempting the leak-prone unsafe operation, leak all state that could become observably corrupt if the leak happens, and put everything back together afterwards. This way, a leak will become bigger, sometimes much bigger, but at least it cannot subvert memory safety.
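To see that leaking really is possible in safe Rust, note that `mem::forget` is safe and that a reference-counting cycle leaks without any unsafe code at all. A minimal sketch:

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct Node {
    // each node may point to another node, allowing cycles
    next: RefCell<Option<Rc<Node>>>,
}

fn main() {
    // `forget` is safe: the boxed value is simply never dropped
    std::mem::forget(Box::new([0u8; 1024]));

    // an Rc cycle in purely safe code: a -> b -> a keeps both reference
    // counts above zero forever, so neither node is ever freed
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(a.clone())) });
    *a.next.borrow_mut() = Some(b);
}
```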
### Myth: The unsafe block is the boundary of unsafety

The commonly accepted boundary is actually the module, of which the crate is a special kind. So the usual course of action is to put the unsafe code in its own module. This module is usually not meant to be used from the outside, but it can sometimes be public, because people may want to call the unsafe methods directly, assuming the ensuing responsibility in exchange for performance (or other benefits).

The next step is to write another module that presents a safe interface on top of the aforementioned unsafe code. This module should be the minimal abstraction that allows all other use cases — the core functionality, if you will. Leave out everything that can be implemented by building on top of this safe code. This is the part that needs careful auditing.

Finally, write the actual interface you intend people to use on top of your safe API; the whole layering is sketched below. Since you’re in safe Rust, this code needs less attention. The compiler will uphold all its guarantees, provided you did a good job on your core interface implementation.
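A minimal sketch of this layering — the module names and the `RawBuf`/`Buf` types are made up for illustration:

```rust
// layer 1: the unsafe core, kept as small as possible
mod raw {
    use std::mem::MaybeUninit;

    pub struct RawBuf {
        // invariant: all elements below `len` are initialized
        pub(super) data: [MaybeUninit<u8>; 64],
        pub(super) len: usize,
    }

    impl RawBuf {
        pub fn new() -> Self {
            RawBuf {
                // an array of `MaybeUninit` may start out uninitialized
                data: unsafe { MaybeUninit::uninit().assume_init() },
                len: 0,
            }
        }
    }
}

// layer 2: the minimal safe abstraction — this is what needs careful auditing
mod safe {
    use std::mem::MaybeUninit;

    pub struct Buf(super::raw::RawBuf);

    impl Buf {
        pub fn new() -> Self {
            Buf(super::raw::RawBuf::new())
        }

        pub fn push(&mut self, byte: u8) -> bool {
            if self.0.len < 64 {
                self.0.data[self.0.len] = MaybeUninit::new(byte);
                self.0.len += 1; // upholds the invariant
                true
            } else {
                false
            }
        }

        pub fn as_slice(&self) -> &[u8] {
            // sound because `push` upholds the initialization invariant
            unsafe {
                std::slice::from_raw_parts(self.0.data.as_ptr().cast(), self.0.len)
            }
        }
    }
}

// layer 3: the convenience API, written in purely safe Rust
pub fn collect_bytes(bytes: impl IntoIterator<Item = u8>) -> Vec<u8> {
    let mut buf = safe::Buf::new();
    for b in bytes {
        if !buf.push(b) {
            break;
        }
    }
    buf.as_slice().to_vec()
}
```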
Now that we’ve dispelled the myths associated with unsafe Rust code, we have but one thing to discuss before going into actual code.
## When to use unsafe code

More often than not, unsafe code is actually used in the quest for performance. But as I wrote in “How to write CRaP Rust code,” you should always have a slow and safe version working, if only as a snapshot to test and a baseline to benchmark against.
Just because unsafe code can sometimes be faster doesn’t mean that it has to be. Measure accordingly and keep the safe version if it turns out to be as fast or faster.
For example, while trying to speed up one of the Benchmark Game entries as an exercise, I wanted to remove an allocation by using an array instead of a `Vec`, which required a bit of unsafe code to deal with uninitialized data. However, this turned out to be slower than the `Vec`-based version, so I dropped the effort. Cliff L. Biffle wrote about a similar experience in “Learn Rust the Dangerous Way.”
With unsafe code, not only do you get fewer assurances from the compiler, the compiler also has fewer guarantees to work with, so some optimizations may in fact be disabled to avoid breaking the code. So always measure before making the switch, and keep the safe version around.
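If you want a starting point for such measurements, here’s a minimal benchmark sketch using the criterion crate — `safe_version` and `unsafe_version` are stand-ins for your two implementations:

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn safe_version(n: u64) -> u64 { (0..n).sum() }
fn unsafe_version(n: u64) -> u64 { (0..n).sum() } // your unsafe variant here

fn bench(c: &mut Criterion) {
    // black_box keeps the compiler from optimizing the input away
    c.bench_function("safe", |b| b.iter(|| safe_version(black_box(1000))));
    c.bench_function("unsafe", |b| b.iter(|| unsafe_version(black_box(1000))));
}

criterion_group!(benches, bench);
criterion_main!(benches);
```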
Ok, let’s get started!
## Working with uninitialized memory

When Rust hit 1.0.0, it had one unsafe function to get uninitialized memory: `std::mem::uninitialized()` (there is also `std::mem::zeroed()`, but the only distinction between the two is that the latter fills the returned memory region with `0` bytes).

This has been widely considered a bad idea, and the function is now deprecated; people are advised to use the `std::mem::MaybeUninit` type instead. The reason for `uninitialized`’s woes is that the value might implicitly be dropped, be it on panic or other early return. For example:
```rust
let x = std::mem::uninitialized();
this_function_may_panic();
mem::forget(x);
```
If the `this_function_may_panic` function actually panics, `x` gets dropped before we even get to forgetting it. However, dropping an uninitialized value is undefined behavior, and since drops are usually implicit, this can be very hard to avoid. Thus, `MaybeUninit` was conceived to deal with potentially uninitialized data. The type never drops automatically (like `std::mem::ManuallyDrop`), is known by the compiler to be potentially uninitialized, and has a number of functions to handle uninitialized data soundly.
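For a single value, the workflow looks like this (a minimal sketch):

```rust
use std::mem::MaybeUninit;

// reserve space for one u32 without initializing it
let mut slot: MaybeUninit<u32> = MaybeUninit::uninit();
// `write` initializes the slot without dropping any (garbage) old value
slot.write(42);
// only now that the slot is initialized may we call `assume_init`
let value = unsafe { slot.assume_init() };
assert_eq!(value, 42);
```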
Let’s recap. We may not `std::ptr::read` from uninitialized memory. We may not even create a reference (a `&` or `&mut`) to it, since the contract of references requires the referenced value to be a valid instance of the referenced type, and uninitialized data usually is not (one exception is a reference to one or more `MaybeUninit<_>`s, since those expressly don’t require initialization).

Because of that, we may also not drop it, since dropping would create a mutable reference (remember, `fn drop(&mut self)`). We can `transmute` it to other types whose contract allows uninitialized data (this is still the canonical way of creating arrays of uninitialized values), or `std::ptr::write` to the pointer we get from its `as_mut_ptr()` method, or assign another `MaybeUninit` to it — and that’s about it. Note that we can assign to a `MaybeUninit` even if it is uninitialized, because the type does not drop.
As an example, let’s assume we want to create an array of values with a function. Perhaps the component type of our array is not `Copy` or has no `const` initializer, or LLVM is unable to optimize out the double store for some reason. So we go `unsafe`:
```rust
use std::mem::{MaybeUninit, transmute};

unsafe {
    // first part: initialize the array. This is one of the rare cases where
    // directly calling `assume_init` is OK, because an array of
    // `MaybeUninit` may contain uninitialized data.
    let mut array: [MaybeUninit<MyType>; 256] =
        MaybeUninit::uninit().assume_init();

    // second part: initialize the data. This is safe because we assign
    // to a `MaybeUninit`, which is exempt from `drop`.
    for (i, elem) in array.iter_mut().enumerate() {
        *elem = MaybeUninit::new(calculate_elem(i));
    }

    // third part: transmute to the initialized array. This works because
    // `MaybeUninit<T>` is guaranteed to have the same layout as `T`.
    transmute::<_, [MyType; 256]>(array)
}
```
If any of the `calculate_elem(_)` calls fails, the whole array of `MaybeUninit`s will be dropped. Because `MaybeUninit` does not `drop` its contents, all elements calculated so far will be leaked.

To avoid this, we need some extra moving parts:
```rust
use std::mem::{forget, MaybeUninit, transmute};

// first extra part: we need a "guard" that drops all *initialized* elements
// on drop
struct Guard<'a> {
    // a mutable ref to access the array from
    array: &'a mut [MaybeUninit<MyType>; 256],
    // the index until which all elements are initialized
    index: usize,
}

impl Drop for Guard<'_> {
    // drop all elements that are initialized
    fn drop(&mut self) {
        for i in 0..self.index {
            unsafe {
                std::ptr::drop_in_place(self.array[i].as_mut_ptr());
            }
        }
    }
}

unsafe {
    let mut array: [MaybeUninit<MyType>; 256] =
        MaybeUninit::uninit().assume_init();
    // second extra part: here we initialize the guard. From here on, it
    // borrows our array mutably. All access will be done through the guard
    // (because the borrow checker won't let us access `array` directly
    // while it's mutably borrowed).
    let mut guard = Guard { array: &mut array, index: 0 };
    for i in 0..256 {
        guard.array[guard.index] = MaybeUninit::new(calculate_elem(i));
        // update the index so `drop` will include the newly created element
        guard.index += 1;
    }
    // third extra part: forget the guard to avoid dropping the initialized
    // elements and also end the borrow
    forget(guard);
    transmute::<_, [MyType; 256]>(array)
}
```
If you think that’s a lot of machinery just to initialize an array, you’d be right. Also, at this point, be sure to measure the effect on performance; I cannot say how a `Vec<MyType>` would compare.
Anyway, this shows the canonical way to deal with uninitialized data: identify an invariant (“all elements until `index` are initialized”), maintain it (“increment `index` after writing an element”), and you can reap the benefits — in this case, no leak in case of panics.
This approach also works well with data structures, although there the invariant is typically used differently. For example, before a resizing operation that would copy the memory over to a larger backing store, a `Vec`’s length is set to zero and reinstated once the operation finishes (leak amplification). This ensures that a `drop` operation cannot observe uninitialized or already-freed memory.
Depending on your data structure, the invariants can be rather arcane. For example, you could use a bit set to encode which elements are initialized, thus requiring one-eighth more memory than the plain array but allowing arbitrary elements to be set or unset. In this case, the invariant would be: “where a bit is set, the corresponding value is initialized.” Rust’s `HashMap`s basically do this.
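A minimal sketch of that invariant, for a made-up eight-slot container (a complete version would also implement `Drop` using the same invariant):

```rust
use std::mem::MaybeUninit;

struct Slots<T> {
    // invariant: bit i of `init` is set iff `data[i]` is initialized
    init: u8,
    data: [MaybeUninit<T>; 8],
}

impl<T> Slots<T> {
    fn new() -> Self {
        Slots {
            init: 0,
            // as above, an array of `MaybeUninit` may start uninitialized
            data: unsafe { MaybeUninit::uninit().assume_init() },
        }
    }

    fn set(&mut self, i: usize, value: T) {
        assert!(i < 8);
        let bit = 1u8 << i;
        if self.init & bit != 0 {
            // drop the old value first so it isn't leaked
            unsafe { self.data[i].assume_init_drop() };
        }
        self.data[i] = MaybeUninit::new(value);
        self.init |= bit; // maintain the invariant
    }

    fn get(&self, i: usize) -> Option<&T> {
        if i < 8 && self.init & (1u8 << i) != 0 {
            // sound: the invariant guarantees this slot is initialized
            Some(unsafe { self.data[i].assume_init_ref() })
        } else {
            None
        }
    }
}
```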
## Interior mutability

Rust’s rules around aliasing — meaning how many things can read from or write to a location at any given point in time — are quite strict. But sometimes we do need to bend the rules a little.
To allow this, Rust has blessed a single (`unsafe`, of course) type with interior mutability: `std::cell::UnsafeCell`. You can get a mutable pointer (not a reference, of course) to its contents from an immutable borrow using the `get(&self)` method. There’s also a `get_mut(&mut self)` method that returns a mutable borrow to the contents.
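A minimal sketch of what that looks like:

```rust
use std::cell::UnsafeCell;

let cell = UnsafeCell::new(42u8);
// `get` is safe because it only hands out a raw pointer...
let ptr: *mut u8 = cell.get();
// ...but dereferencing it is unsafe: we must guarantee that no other
// reference or pointer accesses the contents at the same time
unsafe { *ptr = 23 };
assert_eq!(unsafe { *cell.get() }, 23);
```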
This means that the compiler will assume that whatever is in an `UnsafeCell` is aliased. The standard library offers a number of safe abstractions on top of it, notably `Cell`, `RefCell`, `RwLock`, `Mutex`, and the various `Atomic*` types. For example, `AtomicBool` is defined as follows (annotations removed for brevity):
```rust
pub struct AtomicBool {
    v: UnsafeCell<u8>,
}
```
Of course, the implementation has to make sure to avoid data races, which is done by using actual atomic operations via LLVM intrinsics. I have not checked what the upcoming Cranelift backend does, but it appears to have some kind of implementation, too.
Again, before you use `UnsafeCell` directly, please check whether one of the safe wrappers will work for you, and determine whether going `unsafe` gives you a big enough performance boost (or other benefit) to be worth it.
## Intrinsics

Rust’s standard library has a set of intrinsics per CPU type in the `std::arch` module. All of them are defined as `unsafe`, mainly because they may not be implemented on your CPU. Luckily, there’s a canonical way to ensure you have a matching CPU, either at compile time or at runtime.
Let’s assume you have built your algorithm the “normal” way, looked at the assembly, and decided that the autovectorizer didn’t do a good enough job on its own. It’s time to break out the big guns: you’re basically going to write assembly in Rust (because the `arch` intrinsics map to single machine instructions).
As stated above, you need to make sure that the user has the correct platform. The following code shows the way to check support at compile time and runtime:
```rust
// compile time check
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
mod simd {
    fn calculate() -> .. {
        // is_x86_feature_detected! is a std library macro that allows us to
        // safely access CPU features by doing runtime detection
        if is_x86_feature_detected!("avx2") {
            // code using avx2 intrinsics here
        } else if is_x86_feature_detected!("sse2") {
            // code using sse2 intrinsics here
        } else {
            // code using no special features; using e.g. avx2 intrinsics
            // here would be UB!
        }
    }
}

#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
mod simd {
    // fallback code here
}
```
This example only has specialized code for `x86` and `x86_64`, plus various runtime-detected CPU features. If you want your program to use SIMD intrinsics on other platforms (e.g., ARM NEON), you’d need to add yet another `#[cfg]`’d module declaration. Needless to say, you’ll end up with a lot of code.
Apart from availability, some instructions also have requirements regarding alignment. To simplify a bit, alignment determines how many of the low bits of an address must be zero. For example, a 32-bit value might have an alignment of 4, which means the last two bits of its address should be zero. Please refer to the library documentation for the specifics, and to the last chapter for help getting it right.
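A minimal sketch of checking alignment at runtime, which can be useful in debug assertions before using instructions that require aligned operands:

```rust
use std::mem::align_of;

fn is_aligned<T>(ptr: *const T) -> bool {
    // an address is aligned for T if it is a multiple of T's alignment,
    // i.e., the low bits are zero
    ptr as usize % align_of::<T>() == 0
}

let x = 0u32;
assert!(is_aligned(&x as *const u32));
```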
## Inline assembly

Let’s say you’re writing a kernel and need to do some really weird things with the stack pointer, or other stuff that absolutely, positively requires assembly. Rust has two foreign language interfaces: one to C and one to assembly. Unfortunately, inline assembly is both unstable and unsafe, so you’ll need a nightly compiler, the `#![feature(asm)]` attribute, and an `unsafe` block. Obviously, Rust cannot check what you do within the assembly code.
The specifics of using inline assembly are out of scope for this article. Please look at the Rust Book chapter or the RFC text. For the purpose of this article, inline assembly is just a funky variant of Foreign-Function Interface (FFI).
## Interfacing with other languages

You have that large C codebase and want to move it to Rust — a daunting task. A good way to do this is to use the foreign function interface to move over smaller parts of the codebase first, then go module by module until the whole thing is in Rust and you can throw away the C (this is what librsvg has been doing, by the way). Or you may want to use Rust from C++ code.
In either case, you have to build a bridge between the safe, coddled world of Rust and the hard, uncaring world beyond. And since that world out there is dangerous, of course you need `unsafe` to do it.
First, make sure you have the interface right, lest you be faced with many unhappy hours spent debugging. The bindgen (for accessing C from Rust) and cbindgen (for accessing Rust from C) utilities are great for this.
If you access Rust from C (or C++ via a C interface), be mindful of object lifetimes and keep the Rust objects’ lifetimes in Rust code — that is, have Rust `drop` them — and vice versa for C structs/pointers. As we all know, Rust is very particular about who owns what and for how long it needs to be accessible, so document your requirements.
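A minimal sketch of keeping a Rust object’s lifetime in Rust while handing C an opaque pointer — the `Thing` type and the function names are made up for illustration:

```rust
pub struct Thing {
    value: u64,
}

#[no_mangle]
pub extern "C" fn thing_new(value: u64) -> *mut Thing {
    // hand ownership to the C caller as an opaque pointer
    Box::into_raw(Box::new(Thing { value }))
}

#[no_mangle]
pub extern "C" fn thing_value(thing: *const Thing) -> u64 {
    // safety requirement (document this!): `thing` must be a valid,
    // non-null pointer obtained from `thing_new`
    unsafe { (*thing).value }
}

#[no_mangle]
pub extern "C" fn thing_free(thing: *mut Thing) {
    if !thing.is_null() {
        // reclaim ownership so that Rust, not C, drops the value
        drop(unsafe { Box::from_raw(thing) });
    }
}
```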
If, on the other hand, you wrap C (or C++ via `extern "C"`) in Rust, you will find that C libraries usually also need to take the lifetimes of their data into account. Once you have the plain bindings in place, try to structure your types to take the lifetime relations into account. The unofficial patterns book has an instructive chapter about this.
If you interface with C++, you might want to use the cxx crate. However, please note that, unlike the usual binding generators, cxx will not mark your functions as `unsafe`! The authors’ argument is that the binding mechanism, which is built partly in Rust and partly in C++, is safe, and it is up to you to audit the C++ side. You may still want to wrap the resulting interface in a friendlier one that is hopefully impossible to misuse in an unsafe way.
## Tools for writing unsafe code

As we’ve seen, writing unsafe Rust takes a bit more care than safe Rust, because you can no longer rely on the compiler to have your back. So if you go this route, you’d best take all the help you can get:
### Miri

Miri is an interpreter for Rust’s MIR, the intermediate representation Rust uses to optimize its programs before handing them over to LLVM or Cranelift. You can install it via rustup with `rustup component add miri`. You run it by writing `cargo miri` instead of plain `cargo` — e.g., `cargo miri test` will run your tests in the interpreter.
Miri has a number of tricks up its sleeve to detect undefined behavior such as accessing uninitialized data, and will tell you if something is amiss. However, it will only detect undefined behavior on code paths that are actually executed, so it cannot give you full assurance.
### Clippy and lints

Rust’s official lint suite, Clippy, has a number of lints that may also be helpful for unsafe code. At the very least, the `missing_safety_doc` lint will help you keep the requirements of all your unsafe methods documented. Also, the Rust compiler doesn’t activate all lints by default; calling `rustc -W help` will show you an up-to-date list.
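For example, here’s the kind of documentation that lint expects on an unsafe function (a minimal sketch):

```rust
/// Doubles the value behind the pointer.
///
/// # Safety
///
/// `p` must be non-null, well-aligned, and point to an initialized `u32`
/// that is not concurrently accessed.
pub unsafe fn double_in_place(p: *mut u32) {
    *p *= 2;
}
```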
### Prusti

Prusti is still in development (and currently has some problems being updated to the newest stable Rust, so the latest stable version targets a 2018 Rust compiler), but it’s a very promising tool that lets you mathematically verify that, given certain preconditions, certain postconditions of the code hold.

Basically, you construct a proof that certain invariants of your code hold, which is ideal for safe abstractions that must uphold the invariants of the underlying unsafe code. See the user guide for more information.
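A minimal sketch of what such contracts look like, assuming the prusti_contracts crate (the function itself is made up for illustration):

```rust
use prusti_contracts::*;

// Prusti statically verifies that the precondition rules out division by
// zero and that the postcondition holds for all callers
#[requires(divisor > 0)]
#[ensures(result <= dividend)]
fn divide(dividend: u32, divisor: u32) -> u32 {
    dividend / divisor
}
```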
### Fuzzers

The Rust Fuzz Book lists a number of fuzzers that can be used with Rust. It currently has detailed sections on using cargo-fuzz/libfuzzer and American Fuzzy Lop/afl.rs. Both will create oodles of test inputs for your code and run it to find some combination that triggers a crash.
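A minimal cargo-fuzz target sketch (`my_crate::parse` is a stand-in for whatever function you want to fuzz):

```rust
// fuzz/fuzz_targets/parse.rs
#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: &[u8]| {
    // the fuzzer searches for inputs that make this panic or crash
    let _ = my_crate::parse(data);
});
```

You would then run it with `cargo fuzz run parse`.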
For detecting the use of uninitialized memory, libdiffuzz is a drop-in memory allocator that will initialize each memory allocation with different values. By running your code twice and comparing results, you can determine if any portion of uninitialized memory made it into the result. Even better, memory sanitizer is now supported at least on nightly (the tracking issue lists various sanitizers and their support across platforms), and will detect any uninitialized read, even if it doesn’t change the result.
While fuzzers are statistically more likely to find code paths than plain property tests, there is no guarantee that they will find a specific one within any given amount of time. I personally had a bug in a tokenizing function that was triggered by a wide Unicode whitespace character I found in a random document on the internet — and that was after running cargo fuzz for a week with billions of test cases without finding anything. Still, the Rust fuzz trophy case shows a good number of bugs that were uncovered by fuzzing. If you find one, please add it.
### rutenspitz

rutenspitz is a procedural macro that is great for model testing stateful code, e.g., data structures. Model testing means you have a “model,” i.e., a simple but slow version that exhibits the behavior you want to ensure, and you use it to test your unsafe implementation against. rutenspitz will then generate sequences of operations to check that the equality relation holds. If you’ve followed my advice above, you should already have a safe implementation to test against.