Ayooluwa Isaiah
I'm a software developer from Nigeria with a keen interest in web technologies, security, and performance. I'm currently working on my own products and teaching programming via my website freshman.tech.

Benchmarking in Golang: Improving function performance


A benchmark is a type of function that executes a code segment many times, measuring how long each execution takes so you can assess the code's overall performance. Go includes built-in tools for writing benchmarks in the testing package and the go tool, so you can write useful benchmarks without installing any dependencies.

In this tutorial, we’ll introduce some best practices for running consistent and accurate benchmarks in Go, covering the fundamentals of writing benchmark functions and interpreting the results.

To follow along with this tutorial, you’ll need a basic knowledge of the Go syntax and a working installation of Go on your computer. Let’s get started!

Setting the right conditions for benchmarking

For benchmarking to be useful, the results must be consistent and similar across executions; otherwise, it will be difficult to gauge the true performance of the code being tested.

Benchmarking results can be greatly affected by the state of the machine on which the benchmark is running. The effects of power management, background processes, and thermal management can impact the test results, making them inaccurate and unstable.

Therefore, we need to minimize the environmental impact as much as possible. When possible, you should use either a physical machine or a remote server where nothing else is running to perform your benchmarks.

However, if you don’t have access to a reserved machine, you should close as many programs as possible before running the benchmark, minimizing the effect of other processes on the benchmark’s results.

Additionally, to ensure more stable results, you should run the benchmark several times before recording measurements, ensuring that the system is sufficiently warmed up.

Lastly, it’s crucial to isolate the code being benchmarked from the rest of the program, for example, by mocking network requests.
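
For example, if the function under test normally calls a remote API, you can hide that call behind an interface and benchmark against an in-memory fake, so the benchmark measures your code rather than the network. Here's a minimal sketch of that idea; the UserStore interface, fakeStore type, and greetUser function are hypothetical names used purely for illustration:

// main_test.go (sketch): UserStore, fakeStore, and greetUser are
// hypothetical names used only to illustrate the technique.
type UserStore interface {
    UserName(id int) (string, error)
}

// fakeStore satisfies UserStore in memory, with no network calls.
type fakeStore struct{}

func (fakeStore) UserName(id int) (string, error) { return "gopher", nil }

// greetUser is the code under test; it depends only on the interface,
// so the benchmark can swap in the fake.
func greetUser(s UserStore, id int) string {
    name, err := s.UserName(id)
    if err != nil {
        return "hello, stranger"
    }
    return "hello, " + name
}

func BenchmarkGreetUser(b *testing.B) {
    store := fakeStore{}

    for i := 0; i < b.N; i++ {
        greetUser(store, 42)
    }
}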

Writing a benchmark in Golang

Let’s demonstrate the fundamentals of benchmarking in Go by writing a simple benchmark. We’ll determine the performance of the following function, which computes all of the prime numbers below a given integer:

// main.go
package main

import "math"

func primeNumbers(max int) []int {
    var primes []int

    for i := 2; i < max; i++ {
        isPrime := true

        for j := 2; j <= int(math.Sqrt(float64(i))); j++ {
            if i%j == 0 {
                isPrime = false
                break
            }
        }

        if isPrime {
            primes = append(primes, i)
        }
    }

    return primes
}

The function above determines if a number is a prime number by checking whether it is divisible by a number between two and its square root. Let’s go ahead and write a benchmark for this function in main_test.go:

package main

import (
    "testing"
)

var num = 1000

func BenchmarkPrimeNumbers(b *testing.B) {
    for i := 0; i < b.N; i++ {
        primeNumbers(num)
    }
}

Like unit tests in Go, benchmark functions are placed in a _test.go file, and each benchmark function is expected to have func BenchmarkXxx(*testing.B) as a signature, with the testing.B type managing the benchmark’s timing.

b.N specifies the number of iterations; the value is not fixed but is adjusted dynamically, ensuring that the benchmark runs for at least one second by default.

In the BenchmarkPrimeNumbers() function above, the primeNumbers() function will be executed b.N times, with the testing framework adjusting b.N until it is satisfied that the measurement is stable.
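
The testing.B type also provides methods for controlling exactly what gets timed. If a benchmark requires expensive setup, you can exclude it from the measurement with b.ResetTimer(). Here's a minimal sketch, where buildInput() and process() are hypothetical placeholders for your own setup and code under test:

func BenchmarkWithSetup(b *testing.B) {
    data := buildInput() // hypothetical: expensive setup we don't want to measure

    b.ResetTimer() // discard the time spent on the setup above

    for i := 0; i < b.N; i++ {
        process(data) // hypothetical: the code under test
    }
}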



Running a benchmark in Go

To run a benchmark in Go, we’ll append the -bench flag to the go test command. The argument to -bench is a regular expression that specifies which benchmarks should be run, which is helpful when you want to run a subset of your benchmark functions.

To run all benchmarks, use -bench=., as shown below:

$ go test -bench=.
goos: linux
goarch: amd64
pkg: github.com/ayoisaiah/random
cpu: Intel(R) Core(TM) i7-7560U CPU @ 2.40GHz
BenchmarkPrimeNumbers-4            14588             82798 ns/op
PASS
ok      github.com/ayoisaiah/random     2.091s

goos, goarch, pkg, and cpu describe the operating system, architecture, package, and CPU specifications, respectively. BenchmarkPrimeNumbers-4 denotes the name of the benchmark function that was run. The -4 suffix denotes the number of CPUs used to run the benchmark, as specified by GOMAXPROCS.

On the right side of the function name, you have two values, 14588 and 82798 ns/op. The former indicates the total number of times the loop was executed, while the latter is the average amount of time each iteration took to complete, expressed in nanoseconds per operation.

On my laptop, the primeNumbers(1000) function ran 14,588 times, and each call took an average of 82,798 nanoseconds to complete. To verify that the benchmark produces a consistent result, you can run it multiple times by passing a number to the -count flag:

$ go test -bench=. -count 5
goos: linux
goarch: amd64
pkg: github.com/ayoisaiah/random
cpu: Intel(R) Core(TM) i7-7560U CPU @ 2.40GHz
BenchmarkPrimeNumbers-4            14485             82484 ns/op
BenchmarkPrimeNumbers-4            14557             82456 ns/op
BenchmarkPrimeNumbers-4            14520             82702 ns/op
BenchmarkPrimeNumbers-4            14407             87850 ns/op
BenchmarkPrimeNumbers-4            14446             82525 ns/op
PASS
ok      github.com/ayoisaiah/random     10.259s

Skipping unit tests

If any unit test functions are present in the test files, they will also be executed when you run the benchmark, making the entire process take longer, and any failing test will cause the run to fail.

To avoid executing any test functions in the test files, pass a regular expression to the -run flag:

$ go test -bench=. -count 5 -run=^#

The -run flag specifies which unit tests should be executed. Since no Go function name can start with the # character, passing ^# to -run effectively filters out all of the unit test functions.
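
Any pattern that matches none of the test function names works just as well; for example, ^$ only matches an empty string, which no test name can be:

$ go test -bench=. -count 5 -run=^$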

Benchmarking with various inputs

When benchmarking your code, it’s essential to test how a function behaves when presented with a variety of inputs. We’ll use the table-driven test pattern commonly used for writing unit tests in Go to specify a variety of inputs. Next, we’ll use the b.Run() method to create a sub-benchmark for each input (note that fmt must now be added to the imports in main_test.go):

var table = []struct {
    input int
}{
    {input: 100},
    {input: 1000},
    {input: 74382},
    {input: 382399},
}

func BenchmarkPrimeNumbers(b *testing.B) {
    for _, v := range table {
        b.Run(fmt.Sprintf("input_size_%d", v.input), func(b *testing.B) {
            for i := 0; i < b.N; i++ {
                primeNumbers(v.input)
            }
        })
    }
}

When you run the benchmark, the results will be presented in the format shown below. Notice how the name for each sub-benchmark is appended to the main benchmark function name; it’s considered best practice to give each sub-benchmark a distinct name that reflects the input being tested:

$ go test -bench=.
BenchmarkPrimeNumbers/input_size_100-4            288234              4071 ns/op
BenchmarkPrimeNumbers/input_size_1000-4            14337             82603 ns/op
BenchmarkPrimeNumbers/input_size_74382-4              43          27331405 ns/op
BenchmarkPrimeNumbers/input_size_382399-4              5         242932020 ns/op

For larger input values, the function required more time to calculate the result, and it completed fewer iterations.

Adjusting the minimum time

For its largest input, the previous benchmark completed only five iterations, a sample size too small to trust. For a more accurate result, we can increase the minimum amount of time that the benchmark should run using the -benchtime flag:

$ go test -bench=. -benchtime=10s
BenchmarkPrimeNumbers/input_size_100-4           3010218              4073 ns/op
BenchmarkPrimeNumbers/input_size_1000-4           143540             86319 ns/op
BenchmarkPrimeNumbers/input_size_74382-4             451          26289573 ns/op
BenchmarkPrimeNumbers/input_size_382399-4             43         240926221 ns/op
PASS
ok      github.com/ayoisaiah/random     54.723s

The argument to -benchtime sets the minimum amount of time that the benchmark function will run. In this case, we set it to ten seconds.


An alternative way to control the amount of time a benchmark should run is by specifying the desired number of iterations for each benchmark. To do so, we’ll pass an input in the form Nx to -benchtime, with N as the desired number:

$ go test -bench=. -benchtime=100x
BenchmarkPrimeNumbers/input_size_100-4               100              4905 ns/op
BenchmarkPrimeNumbers/input_size_1000-4              100             87004 ns/op
BenchmarkPrimeNumbers/input_size_74382-4             100          24832746 ns/op
BenchmarkPrimeNumbers/input_size_382399-4            100         241834688 ns/op
PASS
ok      github.com/ayoisaiah/random     26.953s

Displaying memory allocation statistics

The Go runtime also tracks memory allocations made by the code being tested, helping you determine if a portion of your code can use memory more efficiently.

To include memory allocation statistics in the benchmark output, add the -benchmem flag while running the benchmarks:

$ go test -bench=. -benchtime=10s -benchmem
BenchmarkPrimeNumbers/input_size_100-4           3034203              4170 ns/op             504 B/op          6 allocs/op
BenchmarkPrimeNumbers/input_size_1000-4           138378             83258 ns/op            4088 B/op          9 allocs/op
BenchmarkPrimeNumbers/input_size_74382-4             422          26562731 ns/op          287992 B/op         19 allocs/op
BenchmarkPrimeNumbers/input_size_382399-4             46         255095050 ns/op         1418496 B/op         25 allocs/op
PASS
ok      github.com/ayoisaiah/random     55.121s

In the output above, the fourth and fifth columns indicate the average number of bytes allocated per operation and the number of allocations per operation, respectively.
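
If you always want allocation statistics for a particular benchmark, whether or not -benchmem is passed, you can call b.ReportAllocs() inside it. A minimal sketch:

func BenchmarkPrimeNumbers(b *testing.B) {
    b.ReportAllocs() // report allocation stats for this benchmark unconditionally

    for i := 0; i < b.N; i++ {
        primeNumbers(num)
    }
}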

Making your code faster

If the function you are benchmarking doesn’t meet your acceptable performance threshold, the next step is to find a way to make the operation faster.

Depending on the operation in question, there are a few different ways to do this. For one, you can try using a more efficient algorithm to achieve the desired result. Alternatively, you can execute different parts of the computation concurrently.
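
On the measuring side, the testing package can also drive a benchmark body from multiple goroutines via b.RunParallel(), which is useful once you’ve made the code concurrent. Here's a minimal sketch that exercises primeNumbers() from GOMAXPROCS goroutines:

func BenchmarkPrimeNumbersParallel(b *testing.B) {
    b.RunParallel(func(pb *testing.PB) {
        // pb.Next() hands out iterations until the combined
        // count across all goroutines reaches b.N.
        for pb.Next() {
            primeNumbers(num)
        }
    })
}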

In our example, the performance of the primeNumbers() function is acceptable for small numbers; however, as the input grows, its running time grows much faster than linearly. To improve its performance, we can change the implementation to a faster algorithm, like the Sieve of Eratosthenes:

// main.go
func sieveOfEratosthenes(max int) []int {
    b := make([]bool, max)

    var primes []int

    for i := 2; i < max; i++ {
        if b[i] {
            continue
        }

        primes = append(primes, i)

        for k := i * i; k < max; k += i {
            b[k] = true
        }
    }

    return primes
}
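
Since sieveOfEratosthenes() is meant to be a drop-in replacement for primeNumbers(), a quick unit test can confirm that both implementations agree before we compare their speed. Here's a minimal sketch, assuming reflect is added to the imports in main_test.go:

// main_test.go
func TestPrimeImplementationsAgree(t *testing.T) {
    for _, v := range table {
        want := primeNumbers(v.input)
        got := sieveOfEratosthenes(v.input)

        if !reflect.DeepEqual(got, want) {
            t.Errorf("implementations disagree for input %d", v.input)
        }
    }
}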

The benchmark for the new function is the same as the BenchmarkPrimeNumbers() function, except that the sieveOfEratosthenes() function is called instead:

// main_test.go
func BenchmarkSieveOfEratosthenes(b *testing.B) {
    for _, v := range table {
        b.Run(fmt.Sprintf("input_size_%d", v.input), func(b *testing.B) {
            for i := 0; i < b.N; i++ {
                sieveOfEratosthenes(v.input)
            }
        })
    }
}

After running the benchmark, we receive the following results:

$ go test -bench=Sieve
BenchmarkSieveOfEratosthenes/input_size_100-4            1538118               764.0 ns/op
BenchmarkSieveOfEratosthenes/input_size_1000-4            204426              5378 ns/op
BenchmarkSieveOfEratosthenes/input_size_74382-4             2492            421640 ns/op
BenchmarkSieveOfEratosthenes/input_size_382399-4             506           2305954 ns/op
PASS
ok      github.com/ayoisaiah/random     5.646s

At first glance, we can see that the Sieve of Eratosthenes algorithm is much faster than the previous algorithm. However, instead of eyeballing the results to compare the performance between runs, we can use a tool like benchstat, which helps us compute and compare benchmarking statistics.

Comparing benchmark results

To compare the output of both implementations of our benchmark with benchstat, let’s start by storing each in a file. First, run the benchmark for the old primeNumbers() function implementation and save its output to a file called old.txt:

$ go test -bench=Prime -count 5 | tee old.txt

The tee command sends the output of the command to the specified file and prints it to the standard output. Now, we can view the benchmark’s results with benchstat. First, let’s ensure it’s installed:

$ go install golang.org/x/perf/cmd/benchstat@latest

Then, run the command below:

$ benchstat old.txt
name                              time/op
PrimeNumbers/input_size_100-4     3.87µs ± 1%
PrimeNumbers/input_size_1000-4    79.1µs ± 1%
PrimeNumbers/input_size_74382-4   24.6ms ± 1%
PrimeNumbers/input_size_382399-4   233ms ± 2%

benchstat displays the mean execution time across the samples along with the percentage variation. In my case, the ± variation was between one and two percent, which is ideal.

Anything greater than five percent suggests that some of the samples are not reliable. In such cases, you should rerun the benchmark, keeping your environment as stable as possible to increase reliability.

Next, change the call to primeNumbers() in BenchmarkPrimeNumbers() to sieveOfEratosthenes() and run the benchmark command again, this time piping the output to a new.txt file:

$ go test -bench=Prime -count 5 | tee new.txt

After the benchmark finishes running, use benchstat to compare the results:

$ benchstat old.txt new.txt
name                               old time/op  new time/op  delta
PrimeNumbers/input_size_100-4      3.90µs ± 1%  0.76µs ± 2%  -80.48%  (p=0.008 n=5+5)
PrimeNumbers/input_size_1000-4     79.4µs ± 1%   5.5µs ± 1%  -93.11%  (p=0.008 n=5+5)
PrimeNumbers/input_size_74382-4    25.0ms ± 1%   0.4ms ± 1%  -98.47%  (p=0.008 n=5+5)
PrimeNumbers/input_size_382399-4    236ms ± 1%     2ms ± 0%  -99.13%  (p=0.008 n=5+5)

The delta column reports the percentage change in performance, the p-value, and the number of samples considered valid, n. An n value lower than the number of samples taken may mean that your environment wasn’t stable enough while the samples were being collected. See the benchstat docs for the other options available to you.

Conclusion

Benchmarking is a useful tool for measuring the performance of different parts of your code. It allows us to identify potential opportunities for optimization, performance improvements, or regressions after we’ve made a change to the system.

The tools provided by Go for benchmarking are easy to use and reliable. In this article, we’ve only scratched the surface of what is possible with these packages. Thanks for reading, and happy coding!
