Emmanuel John I'm a full-stack software developer, mentor, and writer. I am an open source enthusiast. In my spare time, I enjoy watching sci-fi movies and cheering for Arsenal FC.


Building a Web Scraper in Go with Colly

Introduction

When building applications, you might need to extract data from a website or some other source to integrate with your application. Some websites expose an API you can use to get this information, while others do not. In those cases, you might need to extract the data from the website yourself. This is known as web scraping.

Web scraping means extracting data from websites: fetching the page, selecting the relevant parts, and presenting them in a readable or parsable format.

In this tutorial, we will take a look at Colly, a Go package that allows us to build web scrapers, and we will build a basic web scraper that gets product information from an ecommerce store and saves the data to a JSON file. Without further ado, let’s get started!

An intro to Colly

Colly is a Go framework that allows you to create web scrapers, crawlers, or spiders. According to the official documentation, Colly allows you to easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing, or archiving. Here are some of the features of Colly:

  • Speed: Colly is fast (more than 1,000 requests per second on a single core)
  • Sync/async/parallel scraping
  • Support for caching
  • Support for robots.txt

You can check out the official Colly documentation to learn more about it. Now that we know a bit about Colly, let’s build a web scraper with it.
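
To give a feel for some of these features before we dive in, here is a minimal sketch showing how asynchronous scraping, a parallelism limit, and response caching can be enabled. The cache directory and the parallelism value are just example settings for illustration, not recommendations:

package main

import (
    "fmt"

    "github.com/gocolly/colly"
)

func main() {
    // Create a collector that runs requests asynchronously and
    // caches responses on disk in ./colly_cache (example path).
    c := colly.NewCollector(
        colly.Async(true),
        colly.CacheDir("./colly_cache"),
    )

    // Allow at most two parallel requests per matching domain.
    c.Limit(&colly.LimitRule{DomainGlob: "*", Parallelism: 2})

    c.OnRequest(func(r *colly.Request) {
        fmt.Println("Visiting", r.URL)
    })

    c.Visit("https://jumia.com.ng/")
    // With Async(true), Wait blocks until all requests have finished.
    c.Wait()
}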

Prerequisites

To follow along with this tutorial, you need Go installed on your local machine and at least a basic knowledge of Go. If you don’t have it yet, follow the steps in the official Go installation guide to install it.

Make sure you can run Go commands in your terminal. To check this, type the command go version in the terminal. You should get output similar to this:

Check Go Commands

Diving into the code

Alright, let’s start writing some code. Create a file called main.go and add the following code:

package main

import (
   "github.com/gocolly/colly"
)

func main() {
   c := colly.NewCollector()
   c.Visit("https://jumia.com.ng")
}

Let’s take a look at what each line of code does. First, the package main directive tells Go that this file is part of the main package. Next, we import Colly, and finally, we have our main function. The main function is the entry point of any Go program, and here we create a new instance of a Colly collector object.

The collector object is the heart of web scraping with Colly. It lets you register functions that run whenever an event happens, such as a request completing successfully or a response being received.

Let’s take a look at some of these methods in action. Modify your main.go file to this:

package main

import (
   "fmt"
   "time"

   "github.com/gocolly/colly"
)

func main() {
   c := colly.NewCollector()
   c.SetRequestTimeout(120 * time.Second)
   c.OnRequest(func(r *colly.Request) {
       fmt.Println("Visiting", r.URL)
   })

   c.OnResponse(func(r *colly.Response) {
       fmt.Println("Got a response from", r.Request.URL)
   })

   c.OnError(func(r *colly.Response, e error) {
       fmt.Println("Got this error:", e)
   })

   c.Visit("https://jumia.com.ng/")
}

First, we import the Go fmt package, which allows us to print text to the console. We also import the time package so we can increase Colly’s request timeout and prevent our web scraper from failing too quickly.

Next, in our main function, we set the request timeout to 120 seconds and register three callback functions.

The first is OnRequest. This callback runs whenever Colly makes a request. Here we are just printing out "Visiting" along with the request URL.

The next is OnResponse. This callback runs whenever Colly receives a response. We are printing out "Got a response from" along with the request URL as well.



The final callback we have is OnError. This runs whenever Colly encounters an error while making a request.

Before you run this, here are a couple of things you have to do:

First, initialize Go modules in the current directory. To do this, use the go mod init command:

go mod init Command

Next, run go mod tidy to fetch all dependencies:

go mod tidy Command
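
For reference, the two commands look like this; the module name passed to go mod init is up to you, and web-scraper below is just an example:

go mod init web-scraper
go mod tidy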

Now, let’s test our code so far. Run go run main.go to run the Go program:

go run main Command
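
Assuming the site is reachable, the output from our OnRequest and OnResponse callbacks should look something like this:

Visiting https://jumia.com.ng/
Got a response from https://jumia.com.ng/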

As you can see, we successfully made a request to jumia.com.ng and got a response.

Analyzing the Jumia website

Alright, we have set up the basics of our web scraper, but before we go on, let’s analyze the website we are going to scrape. Navigate to the URL https://jumia.com.ng in your browser and let’s take a look at the DOM structure.

Jumia Website

As you can see, the website has a bunch of cards with product information. Let’s inspect these cards in our browser’s dev tools. Open the dev tools by right-clicking on a card and clicking Inspect, or by pressing Shift+Ctrl+J (on Windows) or Option+Command+J (on Mac).

Inspect Jumia Website

From the above, we can see that a single product card is an a tag with a class of core. Nested within it are div elements with classes of name, prc, and tag _dsct, which contain the product name, price, and discount, respectively. In Colly, we can use CSS selectors to select these elements and extract the data.
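
To make that structure concrete, here is a simplified sketch of what a single product card’s markup looks like, using only the classes and attributes we just identified; the values themselves are placeholders, not real data:

<a class="core" href="/example-product" data-price="10000">
    <img data-src="https://example.com/product.jpg" />
    <div class="name">Example product</div>
    <div class="prc">₦ 10,000</div>
    <div class="tag _dsct">25%</div>
</a>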

Now, let’s define the structure of a single product. Above your main function, add the following code:

type Product struct {
   Name     string
   Image    string
   Price    string
   Url      string
   Discount string
}

Here, we define a struct to hold the name, image (URL), price, URL, and discount of each product. Now, modify your main function to this:

func main() {
   c := colly.NewCollector()
   c.SetRequestTimeout(120 * time.Second)
   products := make([]Product, 0)

   // Callbacks

   c.OnHTML("a.core", func(e *colly.HTMLElement) {
       e.ForEach("div.name", func(i int, h *colly.HTMLElement) {
           item := Product{}
           item.Name = h.Text
           item.Image = e.ChildAttr("img", "data-src")
           item.Price = e.Attr("data-price")
           item.Url = "https://jumia.com.ng" + e.Attr("href")
           item.Discount = e.ChildText("div.tag._dsct")
           products = append(products, item)
       })

   })

   c.OnRequest(func(r *colly.Request) {
       fmt.Println("Visiting", r.URL)
   })

   c.OnResponse(func(r *colly.Response) {
       fmt.Println("Got a response from", r.Request.URL)
   })

   c.OnError(func(r *colly.Response, e error) {
       fmt.Println("Got this error:", e)
   })

   c.OnScraped(func(r *colly.Response) {
       fmt.Println("Finished", r.Request.URL)
       js, err := json.MarshalIndent(products, "", "    ")
       if err != nil {
           log.Fatal(err)
       }
       fmt.Println("Writing data to file")
        if err := os.WriteFile("products.json", js, 0664); err != nil {
            log.Fatal(err)
        }
        fmt.Println("Data written to file successfully")

   })

   c.Visit("https://jumia.com.ng/")
}

Wow, a lot is going on here. Let’s take a look at what this code is doing.

First, we create a slice of products and assign it to the products variable.

Next, we add two more callbacks: OnHTML and OnScraped.

The OnHTML callback runs when the web scraper receives an HTML response. It accepts two arguments: a CSS selector and a function to run. Colly finds the elements in the response that match the CSS selector and calls the function on each of them.

The function gets passed the HTML element matched by the CSS selector and performs some operations on it. Here, we select all a elements with a class name of core. Then, for each of these, we loop through the nested divs with a class of name. From there, we create an instance of the Product struct and set its name to the text of that div.


We use the e.ChildAttr function to get the data-src attribute of the first image tag nested within the card and assign that as the product’s image. We use the e.Attr function to get the card’s data-price attribute and set that as the product’s price. We get its URL from the href attribute in the same way. Finally, we use the e.ChildText function to select the text of the div element with the classes tag and _dsct and set that as the product’s discount.

Next, we append the product to the product list we created earlier.

The second callback we are defining is the OnScraped callback. This runs when the program has successfully finished the web-scraping job and is about to exit. Here, we are printing out "Finished" along with the request URL and then converting the products list to a JSON object.

Make sure to import the encoding/json, log, and os packages first. Note that we use the json.MarshalIndent function here so that the resulting JSON object is formatted and indented. Finally, we save the scrape results to a file.

Running our program

Now that the code is all done, let’s run our program. Before we do this though, here’s the full code as a reference:

package main

import (
   "encoding/json"
   "fmt"
   "log"
   "os"
   "time"

   "github.com/gocolly/colly"
)

type Product struct {
   Name     string
   Image    string
   Price    string
   Url      string
   Discount string
}

func main() {
   c := colly.NewCollector()
   c.SetRequestTimeout(120 * time.Second)
   products := make([]Product, 0)

   // Callbacks

   c.OnHTML("a.core", func(e *colly.HTMLElement) {
       e.ForEach("div.name", func(i int, h *colly.HTMLElement) {
           item := Product{}
           item.Name = h.Text
           item.Image = e.ChildAttr("img", "data-src")
           item.Price = e.Attr("data-price")
           item.Url = "https://jumia.com.ng" + e.Attr("href")
           item.Discount = e.ChildText("div.tag._dsct")
           products = append(products, item)
       })

   })

   c.OnRequest(func(r *colly.Request) {
       fmt.Println("Visiting", r.URL)
   })

   c.OnResponse(func(r *colly.Response) {
       fmt.Println("Got a response from", r.Request.URL)
   })

   c.OnError(func(r *colly.Response, e error) {
       fmt.Println("Got this error:", e)
   })

   c.OnScraped(func(r *colly.Response) {
       fmt.Println("Finished", r.Request.URL)
       js, err := json.MarshalIndent(products, "", "    ")
       if err != nil {
           log.Fatal(err)
       }
       fmt.Println("Writing data to file")
        if err := os.WriteFile("products.json", js, 0664); err != nil {
            log.Fatal(err)
        }
        fmt.Println("Data written to file successfully")

   })

   c.Visit("https://jumia.com.ng/")
}

In your terminal, run the command go run main.go.

go run main Command

Great! It works! You should now see that a new file called products.json has been created.

products.json File

Open this file and you will see the scrape results.
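
Because we marshal the products slice with json.MarshalIndent and the Product struct has no JSON tags, each entry uses the struct’s field names. The file should have roughly this shape (the values below are placeholders, not real scrape results):

[
    {
        "Name": "Example product",
        "Image": "https://example.com/product.jpg",
        "Price": "10000",
        "Url": "https://jumia.com.ng/example-product",
        "Discount": "25%"
    }
]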

Wrapping up

In this article, we have successfully built a web scraper with Go. We looked at how we can scrape product information from an ecommerce store. I hope you learned a lot and will be applying this in your personal projects.
