BufRead for faster Rust I/O speed

One of the many reasons people write applications in Rust is to have the highest possible performance. This is the sort of thing that is hard to compare across languages, but some benchmarks generally show that Rust is the fastest major language not named C or C++.
Rust makes it easy to write high-performance code because its expensive data copies are generally explicit; you know when they’re happening because you have to write .clone() or something similar to make them happen. However, there are still ways to write lower-performing code in Rust without realizing it, and using unbuffered I/O is one of them!
First, let’s go over a few definitions. Buffered I/O is input or output that goes through a buffer before going to or coming from disk. Unbuffered I/O is input or output that doesn’t go through a buffer.
Buffering I/O is important for performance because doing many small reads or writes on disk can carry significant overhead: each one requires a system call, and the disk driver has to set up the data access. In the worst case, all of that overhead is paid for every character that is read or written.
In reality, buffering also happens at other layers (for example, the operating system’s page cache), so it’s rarely that bad. But using unbuffered I/O in Rust can still have a noticeable impact on performance.
Let’s look at an example. The test data is a large text file (around 8 MB) from a GitHub repo, containing a list of English word frequencies. Each line has a word made up of ASCII characters, followed by a space and then a number. We want to calculate the total of all the numbers. I wrote four different functions to do this and benchmarked them.
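Every function below calls a helper, get_count_from_line(), whose body the article doesn’t show. The following is a hypothetical version that assumes each line is a word, a single space, and a count. The trim_end() call matters because the read_line()-based version below passes lines that still end in a newline.

```rust
/// Hypothetical helper: the article doesn't show get_count_from_line(),
/// so this body is an assumption based on the "<word> <count>" line format.
pub fn get_count_from_line(line: &str) -> u64 {
    line.trim_end() // drop a trailing "\n" or "\r\n" left by read_line()
        .rsplit(' ') // the count is the last space-separated field
        .next()
        .and_then(|count| count.parse::<u64>().ok())
        .unwrap_or(0) // malformed or empty lines contribute nothing
}
```

Returning 0 for malformed lines is a design choice for the sketch. A stricter version could return a Result instead.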
Whenever you’re concerned about performance, there are many things you can try to make your code run faster, but there’s no substitute for measuring how fast it runs to see what actually helps!
Unfortunately, benchmarking code can be tricky, because many things can affect performance. To be sure you’re measuring what you care about, you need to take a number of measurements and average the results. But how many runs is enough? 10? 100? 1,000? Too few runs and your results won’t be reliable; too many and you’re wasting the computer’s time and yours.
The criterion crate is a handy way to handle this using statistics: it collects many samples and analyzes them to estimate the running time and how much it varies. It also has a warmup step where it runs the code a few times to make sure everything is loaded into any caches. The criterion documentation describes its analysis in more detail. It also remembers the results of the last run and gives you a comparison, which is helpful when trying out changes!
To use criterion for this experiment, I added it as a development dependency with cargo add --dev criterion, then added these lines to Cargo.toml:
```toml
[[bench]]
name = "process_lines"
harness = false
```
Then I added process_lines.rs to the benches directory, with a function to measure each of the approaches listed below. Each function looks something like this:
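```rust
use criterion::{criterion_group, criterion_main, Criterion};
// Assumes the read_* functions are public in the crate's library and imported here.

fn bench_unbuffered_one_character_at_a_time(c: &mut Criterion) {
    c.bench_function("unbuffered_one_character_at_a_time", |b| {
        b.iter(|| read_unbuffered_one_character_at_a_time())
    });
}

// With `harness = false`, these macros generate the benchmark's main().
criterion_group!(benches, bench_unbuffered_one_character_at_a_time);
criterion_main!(benches);
```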
The closure passed to b.iter() is what actually gets timed. In this case, the function we want to benchmark doesn’t take arguments, so this works nicely.
Because you’ll be running these benchmarks in release mode, you need to be careful that the compiler doesn’t optimize away your function! You can use criterion::black_box() (or std::hint::black_box(), stable since Rust 1.66) to tell the compiler not to do this. Here’s an example from the criterion book:
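```rust
use criterion::{black_box, Criterion};

fn criterion_benchmark(c: &mut Criterion) {
    // black_box(20) hides the constant so the call can't be precomputed.
    c.bench_function("fib 20", |b| b.iter(|| fibonacci(black_box(20))));
}
```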
In our case, this isn’t necessary. The functions read from a file, and the compiler can’t optimize that I/O away. In addition, criterion passes the value returned by the closure through black_box itself.
To run the benchmarks, simply run cargo bench, which builds your crate in release mode and then calls criterion to take the measurements. It prints out what it’s doing and the results, including any outliers it finds.
Here’s the output for the buffered_allocate_string_every_time benchmark:
```
buffered_allocate_string_every_time
                        time:   [45.728 ms 45.784 ms 45.851 ms]
                        change: [-49.792% -48.593% -47.393%] (p = 0.00 < 0.05)
                        Performance has improved.
```
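Here, the middle time value (45.784 ms) is criterion’s best estimate of the time per iteration, and the values on either side are the lower and upper bounds of its confidence interval. The output also shows that performance has improved since the last run, and that the change is statistically significant. (This is the difference plugging in my laptop makes!)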
In order from slowest to fastest (as measured on my laptop while plugged in), here they are:
Speed: 10.1 seconds
This is the read_unbuffered_one_character_at_a_time() function, which is implemented as:
```rust
use std::fs::File;
use std::io::{self, Read};

// FILENAME and get_count_from_line() are defined elsewhere in the crate.
pub fn read_unbuffered_one_character_at_a_time() -> io::Result<u64> {
    let mut file = File::open(FILENAME)?;
    let len = file.metadata()?.len() as usize;
    // Pre-size the buffer to hold the whole file.
    let mut v = vec![0u8; len];
    // One read_exact() per byte: roughly one read system call per byte.
    for index in 0..len {
        file.read_exact(&mut v[index..(index + 1)])?;
    }
    let s = String::from_utf8(v).expect("file is not UTF-8?");
    let mut total = 0u64;
    for line in s.lines() {
        total += get_count_from_line(line);
    }
    Ok(total)
}
```
This is the worst-case scenario. The read_exact() call reads exactly one byte at a time (one character, since the file is ASCII) until the whole file has been read. For this file, that means more than eight million reads! This takes more than 200 times longer than the next slowest method.
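To see where the difference comes from, the following illustrative sketch (not part of the benchmark code) counts how often read() is called on an underlying reader. CountingReader, read_byte_by_byte(), and compare_read_calls() are hypothetical names. An in-memory Cursor stands in for the file, so each counted call represents what would be a system call on a real File. Reading one byte at a time directly costs one call per byte. Wrapping the same reader in a BufReader reduces that to one call per buffer refill.

```rust
use std::io::{self, BufReader, Cursor, Read};

/// Hypothetical wrapper that counts calls to `read` on the inner reader,
/// standing in for system calls on a real File.
pub struct CountingReader<R> {
    inner: R,
    pub calls: usize,
}

impl<R: Read> CountingReader<R> {
    pub fn new(inner: R) -> Self {
        CountingReader { inner, calls: 0 }
    }
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

/// Reads `reader` one byte at a time, like the unbuffered benchmark,
/// and returns the number of bytes read.
pub fn read_byte_by_byte<R: Read>(mut reader: R) -> io::Result<usize> {
    let mut byte = [0u8; 1];
    let mut total = 0;
    loop {
        match reader.read_exact(&mut byte) {
            Ok(()) => total += 1,
            // read_exact reports end of input as UnexpectedEof.
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return Ok(total),
            Err(e) => return Err(e),
        }
    }
}

/// Returns (bytes read, inner read calls) without and with a BufReader.
pub fn compare_read_calls(data: &[u8]) -> io::Result<((usize, usize), (usize, usize))> {
    // Unbuffered: every 1-byte read goes straight to the inner reader.
    let mut unbuffered = CountingReader::new(Cursor::new(data));
    let n1 = read_byte_by_byte(&mut unbuffered)?;

    // Buffered: BufReader refills its internal buffer in large chunks,
    // and most 1-byte reads are served from that buffer.
    let mut counting = CountingReader::new(Cursor::new(data));
    let n2 = read_byte_by_byte(BufReader::new(&mut counting))?;

    Ok(((n1, unbuffered.calls), (n2, counting.calls)))
}
```

For 100,000 bytes, the unbuffered path makes 100,001 inner reads (the extra one detects end of input). With the current 8 KiB default, the buffered path needs only roughly a dozen.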
Speed: 45.8 milliseconds
This is the read_buffered_allocate_string_every_time() function, which looks like this:
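```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

pub fn read_buffered_allocate_string_every_time() -> io::Result<u64> {
    let file = File::open(FILENAME)?;
    // BufReader pulls the file in large chunks (currently 8 KiB by default).
    let reader = BufReader::new(file);
    let mut total = 0u64;
    // lines() comes from the BufRead trait and allocates a new String per line.
    for line in reader.lines() {
        let s = line?;
        total += get_count_from_line(&s);
    }
    Ok(total)
}
```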
Here, we’re using the BufReader struct to wrap the file and read it one buffer-sized chunk at a time. (BufReader implements the BufRead trait, which can be implemented by any sort of reader that has an internal buffer.) Then we can just call lines() on the BufReader to get an iterator over each line of the file, which is very convenient!
Note that, by default, BufReader has a buffer size of 8 KiB, though this may change in the future. If you want a different size, you can construct it with BufReader::with_capacity() instead of BufReader::new().
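The following minimal sketch uses BufReader::with_capacity() with a 64 KiB buffer. That size is an arbitrary example, not a tuned recommendation. big_buffered() and count_lines() are hypothetical helpers that accept any Read, so they can be tried on an in-memory Cursor.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Example buffer size (64 KiB); an arbitrary choice for illustration.
pub const BIG_BUFFER: usize = 64 * 1024;

/// Wraps `inner` in a BufReader whose buffer is BIG_BUFFER bytes
/// instead of the default size.
pub fn big_buffered<R: Read>(inner: R) -> BufReader<R> {
    BufReader::with_capacity(BIG_BUFFER, inner)
}

/// Counts lines through the larger buffer. The behavior matches
/// BufReader::new(); only the size of each underlying read changes.
pub fn count_lines<R: Read>(inner: R) -> io::Result<usize> {
    let mut count = 0;
    for line in big_buffered(inner).lines() {
        line?; // surface I/O or UTF-8 errors
        count += 1;
    }
    Ok(count)
}
```

You can check the size you actually got with the reader’s capacity() method.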
Speed: 29.4 milliseconds
This is the read_buffered_reuse_string() function, which is implemented as:
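```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

pub fn read_buffered_reuse_string() -> io::Result<u64> {
    let file = File::open(FILENAME)?;
    let mut reader = BufReader::new(file);
    // A single String, reused for every line.
    let mut string = String::new();
    let mut total = 0u64;
    // read_line appends the next line (including its '\n') and returns 0 at EOF.
    while reader.read_line(&mut string)? > 0 {
        total += get_count_from_line(&string);
        // Empty the String but keep its allocation for the next line.
        string.clear();
    }
    Ok(total)
}
```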
This is very similar in concept to the previous function. The only difference is that we allocate a single String and pass it to reader.read_line(), which appends each line to the existing String instead of allocating a new one. We then call clear() to empty it before the next line. Avoiding one allocation per line is enough to make this method run about 1.5 times faster than the previous function!
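Two details make the String reuse work: read_line() appends to whatever is already in the String, and it keeps the trailing newline. The following sketch is a hypothetical, generic variant, not the benchmarked code. It takes any BufRead so it can be tested on an in-memory Cursor, and parse_count() is a simplified stand-in for get_count_from_line().

```rust
use std::io::{self, BufRead};

/// Hypothetical generic variant of the String-reusing loop: sums the count at
/// the end of each line of any BufRead, using a single String buffer.
pub fn sum_reusing_buffer<R: BufRead>(mut reader: R) -> io::Result<u64> {
    let mut line = String::new();
    let mut total = 0u64;
    // read_line() returns Ok(0) once the input is exhausted.
    while reader.read_line(&mut line)? > 0 {
        total += parse_count(&line);
        // read_line() appends, so empty the String (keeping its capacity).
        line.clear();
    }
    Ok(total)
}

/// Simplified stand-in for get_count_from_line(): parses the last field.
fn parse_count(line: &str) -> u64 {
    // trim_end() removes the '\n' that read_line() leaves in place.
    line.trim_end()
        .rsplit(' ')
        .next()
        .and_then(|s| s.parse::<u64>().ok())
        .unwrap_or(0)
}
```

Forgetting the clear() call wouldn’t cause an error. It would only produce wrong results and an ever-growing String, since each line would be appended to the ones before it.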
Speed: 22.9 milliseconds
The final function we’ll look at is read_buffer_whole_string_into_memory(), which looks like this:
```rust
use std::fs::File;
use std::io::{self, Read};

pub fn read_buffer_whole_string_into_memory() -> io::Result<u64> {
    let mut file = File::open(FILENAME)?;
    let mut s = String::new();
    // Read the entire file into one String in essentially a single pass.
    file.read_to_string(&mut s)?;
    let mut total = 0u64;
    for line in s.lines() {
        total += get_count_from_line(line);
    }
    Ok(total)
}
```
This is the extreme version of a buffer: we allocate one big buffer and read the whole file into it at once. This is the best way of showing that the number of read calls is really the determining factor in our performance. This function, which reads the whole file in essentially one large read, is the fastest of all. It is about 1.3 times faster than the next fastest version.
The downside to this technique is that you need enough memory to hold the entire file at once. In this case, the file is only around 8 MB, which is not much memory, but if you’re writing a program to process arbitrary files, this could easily fail. In general, it’s safer to use a BufReader as described above. You can increase its buffer size with BufReader::with_capacity() if you’re comfortable using more memory.
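One way to act on this trade-off is to check the file size first and only read small files whole. The following is a hypothetical helper, not part of the benchmarks, and it rests on several assumptions. The 64 MiB WHOLE_FILE_LIMIT is an arbitrary value. The helper counts lines rather than summing counts, to keep it short. It also uses std::fs::read_to_string(), a standard-library shorthand for the open-then-read_to_string pattern above.

```rust
use std::fs::{self, File};
use std::io::{self, BufRead, BufReader};
use std::path::Path;

/// Hypothetical cutoff: files up to 64 MiB are read whole; larger ones are streamed.
const WHOLE_FILE_LIMIT: u64 = 64 * 1024 * 1024;

/// Counts lines, reading small files into memory in one go and
/// streaming large ones through a BufReader to bound memory use.
pub fn count_lines_adaptive(path: &Path) -> io::Result<usize> {
    let size = fs::metadata(path)?.len();
    if size <= WHOLE_FILE_LIMIT {
        // One allocation holding the whole file; fastest for small inputs.
        let s = fs::read_to_string(path)?;
        Ok(s.lines().count())
    } else {
        // Memory use stays bounded by the buffer plus the longest line.
        let mut count = 0;
        for line in BufReader::new(File::open(path)?).lines() {
            line?;
            count += 1;
        }
        Ok(count)
    }
}
```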
BufReader is more capable than we’ve shown here: it can wrap any type that implements the Read trait. Notably, this includes the TcpStream struct, so you can use BufReader for network connections too.
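As a sketch of the network case, the following self-contained example starts a server on a local port and writes three lines to the connection. The client side then reads them back through a BufReader. read_lines_from() and loopback_demo() are hypothetical names, and the data is made up. In a real program, the server would be some other process.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

/// Reads newline-delimited messages from a TCP stream through a BufReader,
/// so most lines are served from the buffer rather than a fresh socket read.
pub fn read_lines_from(stream: TcpStream) -> io::Result<Vec<String>> {
    BufReader::new(stream).lines().collect()
}

/// Self-contained demo: a local server thread writes three lines,
/// and the client side reads them back via a BufReader.
pub fn loopback_demo() -> io::Result<Vec<String>> {
    // Port 0 asks the OS for any free port.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    let server = thread::spawn(move || -> io::Result<()> {
        let (mut conn, _) = listener.accept()?;
        conn.write_all(b"apple 3\nbanana 4\ncherry 5\n")?;
        Ok(()) // dropping `conn` closes the socket, ending the client's lines()
    });

    let lines = read_lines_from(TcpStream::connect(addr)?)?;
    server.join().expect("server thread panicked")?;
    Ok(lines)
}
```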
In a larger sense, whenever you’re making repeated calls to read from something, consider using buffered I/O; it can make a big difference in performance!