A benchmark is a type of function that executes a code segment multiple times and measures its performance, giving you a standard against which the code's efficiency can be judged. Go includes built-in tools for writing benchmarks in the testing package and the go tool, so you can write useful benchmarks without installing any dependencies.
In this tutorial, we’ll introduce some best practices for running consistent and accurate benchmarks in Go, covering the fundamentals of writing benchmark functions and interpreting the results.
To follow along with this tutorial, you’ll need a basic knowledge of the Go syntax and a working installation of Go on your computer. Let’s get started!
For benchmarking to be useful, the results must be consistent across executions; otherwise, it will be difficult to gauge the true performance of the code being tested.
Benchmarking results can be greatly affected by the state of the machine on which the benchmark is running. The effects of power management, background processes, and thermal management can impact the test results, making them inaccurate and unstable.
Therefore, we need to minimize the environmental impact as much as possible. When possible, you should use either a physical machine or a remote server where nothing else is running to perform your benchmarks.
However, if you don’t have access to a reserved machine, you should close as many programs as possible before running the benchmark, minimizing the effect of other processes on the benchmark’s results.
Additionally, to ensure more stable results, you should run the benchmark several times before recording measurements, ensuring that the system is sufficiently warmed up.
Lastly, it’s crucial to isolate the code being benchmarked from the rest of the program, for example, by mocking network requests.
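For example, if the function under test normally calls out to a remote service, you can hide that dependency behind an interface and benchmark against an in-memory fake. The sketch below illustrates the idea; the Store interface, fakeStore type, and lookup function are hypothetical names invented for this illustration, not part of this tutorial's example:

```go
// A sketch of isolating benchmarked code from the network by
// substituting an in-memory fake for a remote dependency.
package main

import "testing"

type Store interface {
	Get(key string) (string, bool)
}

// fakeStore satisfies Store entirely in memory, so the benchmark
// measures only the lookup logic, never network latency.
type fakeStore map[string]string

func (f fakeStore) Get(key string) (string, bool) {
	v, ok := f[key]
	return v, ok
}

// lookup depends only on the Store interface, which makes it easy
// to benchmark in isolation.
func lookup(s Store, key string) string {
	if v, ok := s.Get(key); ok {
		return v
	}
	return "missing"
}

func BenchmarkLookup(b *testing.B) {
	s := fakeStore{"a": "1", "b": "2"}
	for i := 0; i < b.N; i++ {
		lookup(s, "a")
	}
}
```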
Let's demonstrate the fundamentals of benchmarking in Go by writing a simple benchmark. We'll measure the performance of the following function, which computes all of the prime numbers below a given integer:
```go
// main.go
package main

import "math"

// primeNumbers returns all primes below max using trial division
// up to the square root of each candidate.
func primeNumbers(max int) []int {
	var primes []int
	for i := 2; i < max; i++ {
		isPrime := true
		for j := 2; j <= int(math.Sqrt(float64(i))); j++ {
			if i%j == 0 {
				isPrime = false
				break
			}
		}
		if isPrime {
			primes = append(primes, i)
		}
	}
	return primes
}
```
The function above determines if a number is a prime number by checking whether it is divisible by a number between two and its square root. Let's go ahead and write a benchmark for this function in main_test.go:
```go
// main_test.go
package main

import "testing"

var num = 1000

func BenchmarkPrimeNumbers(b *testing.B) {
	for i := 0; i < b.N; i++ {
		primeNumbers(num)
	}
}
```
Like unit tests in Go, benchmark functions are placed in a _test.go file, and each benchmark function is expected to have the signature func BenchmarkXxx(*testing.B), with the testing.B type managing the benchmark's timing.

b.N specifies the number of iterations; the value is not fixed, but dynamically adjusted by the testing framework, ensuring that the benchmark runs for at least one second by default.

In the BenchmarkPrimeNumbers() function above, the primeNumbers() function is executed b.N times, with the framework increasing b.N on successive runs until it is confident that the measurement is stable.
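If your benchmark needs expensive setup before the measured loop, the testing.B type also lets you exclude that work from the timing with b.ResetTimer(). Here's a minimal sketch; buildInput and process are hypothetical stand-ins for real setup and real work, not functions from this tutorial:

```go
// A sketch of excluding setup cost from a benchmark's measurement.
// buildInput and process are illustrative placeholders.
func buildInput() []int {
	data := make([]int, 1<<20)
	for i := range data {
		data[i] = i
	}
	return data
}

func process(data []int) int {
	sum := 0
	for _, v := range data {
		sum += v
	}
	return sum
}

func BenchmarkProcess(b *testing.B) {
	data := buildInput() // expensive setup we don't want to measure
	b.ResetTimer()       // discard the time spent above this line
	for i := 0; i < b.N; i++ {
		process(data)
	}
}
```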
To run a benchmark in Go, we'll append the -bench flag to the go test command. The argument to -bench is a regular expression that specifies which benchmarks should be run, which is helpful when you want to run a subset of your benchmark functions. To run all benchmarks, use -bench=., as shown below:
```
$ go test -bench=.
goos: linux
goarch: amd64
pkg: github.com/ayoisaiah/random
cpu: Intel(R) Core(TM) i7-7560U CPU @ 2.40GHz
BenchmarkPrimeNumbers-4            14588             82798 ns/op
PASS
ok      github.com/ayoisaiah/random     2.091s
```
goos, goarch, pkg, and cpu describe the operating system, architecture, package, and CPU specifications, respectively. BenchmarkPrimeNumbers-4 denotes the name of the benchmark function that was run. The -4 suffix denotes the number of CPUs used to run the benchmark, as specified by GOMAXPROCS.
On the right side of the function name, you have two values, 14588 and 82798 ns/op. The former indicates the total number of times the loop was executed, while the latter is the average amount of time each iteration took to complete, expressed in nanoseconds per operation. Note that 14,588 iterations at roughly 82,798 ns each works out to about 1.2 seconds, which is how this run satisfied the one-second default.

On my laptop, the primeNumbers(1000) function ran 14,588 times, and each call took an average of 82,798 nanoseconds to complete. To verify that the benchmark produces a consistent result, you can run it multiple times by passing a number to the -count flag:
```
$ go test -bench=. -count 5
goos: linux
goarch: amd64
pkg: github.com/ayoisaiah/random
cpu: Intel(R) Core(TM) i7-7560U CPU @ 2.40GHz
BenchmarkPrimeNumbers-4            14485             82484 ns/op
BenchmarkPrimeNumbers-4            14557             82456 ns/op
BenchmarkPrimeNumbers-4            14520             82702 ns/op
BenchmarkPrimeNumbers-4            14407             87850 ns/op
BenchmarkPrimeNumbers-4            14446             82525 ns/op
PASS
ok      github.com/ayoisaiah/random     10.259s
```
If any unit test functions are present in the test files, they will also be executed when you run the benchmark, making the entire process take longer, or failing the run if a test fails. To avoid executing any test functions, pass a regular expression to the -run flag:
```
$ go test -bench=. -count 5 -run=^#
```
The -run flag is used to specify which unit tests should be executed. By passing ^# as its argument, a pattern that matches no test function names, we effectively filter out all of the unit tests (a pattern like ^$ works equally well).
When benchmarking your code, it's essential to test how a function behaves when it is presented with a variety of inputs. We'll utilize the table-driven test pattern that is commonly used for writing unit tests in Go to specify a variety of inputs. Next, we'll use the b.Run() method to create a sub-benchmark for each input:
```go
// main_test.go
package main

import (
	"fmt"
	"testing"
)

var table = []struct {
	input int
}{
	{input: 100},
	{input: 1000},
	{input: 74382},
	{input: 382399},
}

func BenchmarkPrimeNumbers(b *testing.B) {
	for _, v := range table {
		b.Run(fmt.Sprintf("input_size_%d", v.input), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				primeNumbers(v.input)
			}
		})
	}
}
```
When you run the benchmark, the results will be presented in the format shown below. Notice how the name for each sub-benchmark is appended to the main benchmark function name; it’s considered best practice to give each sub-benchmark a distinct name that reflects the input being tested:
```
$ go test -bench=.
BenchmarkPrimeNumbers/input_size_100-4            288234              4071 ns/op
BenchmarkPrimeNumbers/input_size_1000-4            14337             82603 ns/op
BenchmarkPrimeNumbers/input_size_74382-4              43          27331405 ns/op
BenchmarkPrimeNumbers/input_size_382399-4              5         242932020 ns/op
```
For larger input values, the function required more time to calculate the result, and it completed fewer iterations.
For the largest input, the previous benchmark completed only five iterations, a sample size too small to trust. For a more accurate result, we can increase the minimum amount of time the benchmark runs using the -benchtime flag:
```
$ go test -bench=. -benchtime=10s
BenchmarkPrimeNumbers/input_size_100-4           3010218              4073 ns/op
BenchmarkPrimeNumbers/input_size_1000-4           143540             86319 ns/op
BenchmarkPrimeNumbers/input_size_74382-4             451          26289573 ns/op
BenchmarkPrimeNumbers/input_size_382399-4             43         240926221 ns/op
PASS
ok      github.com/ayoisaiah/random     54.723s
```
The argument to -benchtime sets the minimum amount of time that the benchmark function will run. In this case, we set it to ten seconds.
An alternative way to control how long a benchmark runs is to specify the desired number of iterations for each benchmark. To do so, we'll pass an input in the form Nx to -benchtime, with N as the desired number:
```
$ go test -bench=. -benchtime=100x
BenchmarkPrimeNumbers/input_size_100-4               100              4905 ns/op
BenchmarkPrimeNumbers/input_size_1000-4              100             87004 ns/op
BenchmarkPrimeNumbers/input_size_74382-4             100          24832746 ns/op
BenchmarkPrimeNumbers/input_size_382399-4            100         241834688 ns/op
PASS
ok      github.com/ayoisaiah/random     26.953s
```
The Go runtime also tracks memory allocations made by the code being tested, helping you determine if a portion of your code can use memory more efficiently.
To include memory allocation statistics in the benchmark output, add the -benchmem flag while running the benchmarks:
```
$ go test -bench=. -benchtime=10s -benchmem
BenchmarkPrimeNumbers/input_size_100-4           3034203              4170 ns/op             504 B/op          6 allocs/op
BenchmarkPrimeNumbers/input_size_1000-4           138378             83258 ns/op            4088 B/op          9 allocs/op
BenchmarkPrimeNumbers/input_size_74382-4             422          26562731 ns/op          287992 B/op         19 allocs/op
BenchmarkPrimeNumbers/input_size_382399-4             46         255095050 ns/op         1418496 B/op         25 allocs/op
PASS
ok      github.com/ayoisaiah/random     55.121s
```
In the output above, the fourth and fifth columns indicate the average number of bytes allocated per operation and the number of allocations per operation, respectively.
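If allocs/op looks high, one common remedy is to size slices up front so that append doesn't have to grow the backing array repeatedly. Below is a sketch of this idea applied to our function; the max/2 capacity estimate is our own rough upper bound (every prime except 2 is odd), not something the benchmark output prescribes:

```go
// A sketch of reducing allocations by preallocating the result slice.
// If the capacity estimate is ever exceeded, append still grows the
// slice correctly; we just pay for one extra allocation.
func primeNumbersPrealloc(max int) []int {
	primes := make([]int, 0, max/2) // rough upper bound on primes below max
	for i := 2; i < max; i++ {
		isPrime := true
		for j := 2; j <= int(math.Sqrt(float64(i))); j++ {
			if i%j == 0 {
				isPrime = false
				break
			}
		}
		if isPrime {
			primes = append(primes, i)
		}
	}
	return primes
}
```

Re-running the benchmark with -benchmem against a variant like this is a quick way to confirm whether the allocation count actually drops.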
If you’ve determined that the acceptable performance threshold is not being met by the function you are benchmarking, the next step is to find a way to make the operation faster.
Depending on the operation in question, there are a couple of different ways to do this. For one, you can try using a more efficient algorithm to achieve the desired result. Alternatively, you can execute different parts of the computation concurrently, as sketched below.
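Here's a rough sketch of the concurrency route: it splits the search range across one goroutine per CPU. The concurrentPrimeNumbers and primesInRange names are our own illustration, not part of the tutorial's code, and for small inputs the goroutine overhead can easily outweigh the gains:

```go
// A sketch of checking different parts of the range concurrently.
package main

import (
	"math"
	"runtime"
	"sync"
)

// primesInRange applies the same trial-division test as primeNumbers,
// but only to the half-open range [lo, hi).
func primesInRange(lo, hi int) []int {
	var primes []int
	if lo < 2 {
		lo = 2
	}
	for i := lo; i < hi; i++ {
		isPrime := true
		for j := 2; j <= int(math.Sqrt(float64(i))); j++ {
			if i%j == 0 {
				isPrime = false
				break
			}
		}
		if isPrime {
			primes = append(primes, i)
		}
	}
	return primes
}

// concurrentPrimeNumbers splits [2, max) into one chunk per CPU,
// checks the chunks in parallel, and concatenates the results in order.
func concurrentPrimeNumbers(max int) []int {
	workers := runtime.NumCPU()
	chunk := (max + workers - 1) / workers
	results := make([][]int, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		lo, hi := w*chunk, (w+1)*chunk
		if hi > max {
			hi = max
		}
		wg.Add(1)
		go func(w, lo, hi int) {
			defer wg.Done()
			results[w] = primesInRange(lo, hi)
		}(w, lo, hi)
	}
	wg.Wait()
	var primes []int
	for _, r := range results {
		primes = append(primes, r...)
	}
	return primes
}
```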
In our example, the performance of the primeNumbers() function is acceptable for small numbers; however, its running time grows quickly as the input gets larger. To improve its performance, we can switch to a faster algorithm, like the Sieve of Eratosthenes:
```go
// main.go
func sieveOfEratosthenes(max int) []int {
	b := make([]bool, max) // b[k] becomes true once k is known to be composite
	var primes []int
	for i := 2; i < max; i++ {
		if b[i] {
			continue
		}
		primes = append(primes, i)
		// Mark every multiple of i, starting from i*i, as composite.
		for k := i * i; k < max; k += i {
			b[k] = true
		}
	}
	return primes
}
```
The benchmark for the new function is the same as the BenchmarkPrimeNumbers function; however, the sieveOfEratosthenes() function is called instead:
```go
// main_test.go
func BenchmarkSieveOfEratosthenes(b *testing.B) {
	for _, v := range table {
		b.Run(fmt.Sprintf("input_size_%d", v.input), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				sieveOfEratosthenes(v.input)
			}
		})
	}
}
```
After running the benchmark, we receive the following results:
```
$ go test -bench=Sieve
BenchmarkSieveOfEratosthenes/input_size_100-4          1538118             764.0 ns/op
BenchmarkSieveOfEratosthenes/input_size_1000-4          204426              5378 ns/op
BenchmarkSieveOfEratosthenes/input_size_74382-4           2492            421640 ns/op
BenchmarkSieveOfEratosthenes/input_size_382399-4           506           2305954 ns/op
PASS
ok      github.com/ayoisaiah/random     5.646s
```
At first glance, we can see that the Sieve of Eratosthenes algorithm is much more performant than the previous one. However, instead of eyeballing the results to compare performance between runs, we can use a tool like benchstat, which helps us compute and compare benchmarking statistics.

To compare the output of both implementations with benchstat, let's start by storing each in a file. First, run the benchmark for the old primeNumbers() implementation and save its output to a file called old.txt:
```
$ go test -bench=Prime -count 5 | tee old.txt
```
The tee command sends the output of the command to the specified file and prints it to the standard output. Now, we can view the benchmark's results with benchstat. First, let's ensure it's installed:
```
$ go install golang.org/x/perf/cmd/benchstat@latest
```
Then, run the command below:
```
$ benchstat old.txt
name                               time/op
PrimeNumbers/input_size_100-4      3.87µs ± 1%
PrimeNumbers/input_size_1000-4     79.1µs ± 1%
PrimeNumbers/input_size_74382-4    24.6ms ± 1%
PrimeNumbers/input_size_382399-4    233ms ± 2%
```
benchstat displays the mean time per operation across the samples along with the percentage variation. In my case, the ± variation was between one and two percent, which is ideal. Anything greater than five percent suggests that some of the samples are not reliable. In such cases, you should rerun the benchmark, keeping your environment as stable as possible to increase reliability.
Next, change the call to primeNumbers() in BenchmarkPrimeNumbers() to sieveOfEratosthenes() and run the benchmark command again, this time piping the output to a new.txt file:
```
$ go test -bench=Prime -count 5 | tee new.txt
```
After the benchmark finishes running, use benchstat to compare the results:
```
$ benchstat old.txt new.txt
name                               old time/op  new time/op  delta
PrimeNumbers/input_size_100-4      3.90µs ± 1%  0.76µs ± 2%  -80.48%  (p=0.008 n=5+5)
PrimeNumbers/input_size_1000-4     79.4µs ± 1%   5.5µs ± 1%  -93.11%  (p=0.008 n=5+5)
PrimeNumbers/input_size_74382-4    25.0ms ± 1%   0.4ms ± 1%  -98.47%  (p=0.008 n=5+5)
PrimeNumbers/input_size_382399-4    236ms ± 1%     2ms ± 0%  -99.13%  (p=0.008 n=5+5)
```
The delta column reports the percentage change in performance, the p-value, and the number of samples that are considered valid, n. If you see an n value lower than the number of samples you took, it may mean that your environment wasn't stable enough while the samples were being collected. See the benchstat docs for the other options available to you.
Benchmarking is a useful tool for measuring the performance of different parts of your code. It allows us to identify potential opportunities for optimization, performance improvements, or regressions after we’ve made a change to the system.
The tools provided by Go for benchmarking are easy to use and reliable. In this article, we’ve only scratched the surface of what is possible with these packages. Thanks for reading, and happy coding!