Puppeteer is a Node.js module built by Google that lets you control the Chrome or Chromium browser from a Node environment.
From the Puppeteer API docs: Puppeteer is a Node library which provides a high-level API to control Chromium or Chrome over the DevTools Protocol.
So basically, Puppeteer gives you a browser that you drive from Node.js. Its APIs mirror what you can do in the browser, so you can carry out operations such as loading pages and capturing screenshots of them.
In this post, we will learn how we can use Puppeteer to generate screenshots from a website URL.
Creating visuals of your webpage is quite easy using the Puppeteer Node module.
First, we install the puppeteer Node module:
npm i puppeteer
Then, we’ll create our .js file and require the “puppeteer” library.
const puppeteer = require("puppeteer")
Now, create a browser context and a new page:
const puppeteer = require("puppeteer")
const browser = await puppeteer.launch({ headless: true })
const page = await browser.newPage()
Note: The Puppeteer lib is promise-based. That means its APIs mostly return promise objects.
With _const browser = await puppeteer.launch({ headless: true })_, we created a new browser instance using the launch API of the puppeteer object. This actually launches a Chromium instance. browser is an instance of the Browser class.
The setting { headless: true } means that the browser will be a headless instance of Chromium, that is, one that runs without a visible window.
Notice that launch returns a promise, which resolves to a browser instance. Here, that would be browser.
With _const page = await browser.newPage()_, we created a new page. A browser can hold many pages, and the newPage() method in Browser creates a new page in the default browser context. page is an instance of the Page class.
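Since launch and newPage both return promises, the await lines have to run inside an async function, and it helps to close the browser even when a step fails. Here is a minimal sketch of that pattern. Note that withBrowser is a hypothetical helper name, not a Puppeteer API, and it receives the puppeteer module as an argument:

```javascript
// Hypothetical helper (not part of Puppeteer): launch a browser, hand a
// fresh page to `task`, and always close the browser afterwards.
async function withBrowser(puppeteer, task) {
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    return await task(page);
  } finally {
    // Runs on success and on failure, so no Chromium process is left behind.
    await browser.close();
  }
}

// Usage (assumes `npm i puppeteer` has been run):
// const puppeteer = require("puppeteer");
// withBrowser(puppeteer, (page) => page.goto("https://medium.com"))
//   .catch(console.error);
```

Passing the module in, instead of requiring it inside the helper, is only a design choice for this sketch. It makes the helper easy to try out with a stand-in object.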
Now, using the page object, we will load or navigate to the webpage that we want to take a screenshot of:
await page.goto('https://medium.com')
Here, we are loading the Medium home page. The goto method will resolve when the load event is fired by the browser, indicating that the page has successfully loaded. Now, with medium.com loaded, we can take the screenshot.
const screenShot = await page.screenshot({ path: "./screenshot.png", type: "png", fullPage: true })
The screenshot method of the page object does it all. It takes the screenshot of the current page.
The screenshot method takes in some configurations:
_path_: The file path, including the file name, where we want to save the image. Here, we save it in the current working directory. If there is no path, the image will not be saved to disk.
_type_: The type of image encoding to use, either png or jpeg.
_fullPage_: This will make the screenshot cover the full scrollable size of the page.
There are other settings, including:
_quality_: The quality of the image, between 0 and 100. Not applicable to png images.
_omitBackground_: Hides the default white background and allows capturing screenshots with transparency.
_encoding_: The encoding of the image, either base64 or binary. Defaults to binary.
The screenshot method returns a promise that will resolve to either a buffer or a base64 string, based on the encoding property value in the settings. In our example, no encoding is set, so the promise resolves to a buffer. The screenShot variable will hold the binary image of the Medium front page.
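Because the resolved value depends on encoding, code that receives a screenshot may want to normalize it first. The sketch below is one way to do that in Node.js. Both toImageBuffer and looksLikePng are hypothetical helper names introduced here for illustration:

```javascript
// Normalize the value resolved by page.screenshot() into a Node.js Buffer.
// `encoding` should be the same value that was passed to page.screenshot().
function toImageBuffer(result, encoding = "binary") {
  if (encoding === "base64") {
    return Buffer.from(result, "base64"); // result is a base64 string
  }
  return Buffer.from(result); // result is already binary data
}

// Quick sanity check: every PNG file starts with the byte 0x89 followed by "PNG".
function looksLikePng(buffer) {
  return (
    buffer.length >= 4 &&
    buffer[0] === 0x89 &&
    buffer.toString("latin1", 1, 4) === "PNG"
  );
}

// Usage inside an async function:
// const shot = await page.screenshot({ type: "png", encoding: "base64" });
// const image = toImageBuffer(shot, "base64");
// if (!looksLikePng(image)) throw new Error("unexpected screenshot data");
```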
With this, our code is complete:
// screenshot.js
const puppeteer = require("puppeteer");

(async function() {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('https://medium.com');
  // path must name a file, not a directory
  const screenShot = await page.screenshot({ path: "./screenshot.png", type: "png", fullPage: true });
  // close the browser so the Node process can exit
  await browser.close();
}());
We can run the file in our Node.js environment:
node screenshot.js
This will generate a screenshot of Medium and save it in our current directory.
You can substitute https://medium.com with the URL of any webpage you want to capture.
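Instead of editing the file for every site, the URL could be read from the command line and the output file named after it. The following is a minimal sketch. Treat outputNameFor as a hypothetical helper, and the naming scheme as an arbitrary choice:

```javascript
// Hypothetical helper: derive a file name such as "medium.com.png" from a URL.
function outputNameFor(url) {
  const { hostname, pathname } = new URL(url); // throws on an invalid URL
  const slug = (hostname + pathname)
    .replace(/[^a-z0-9.-]+/gi, "_") // anything unusual becomes "_"
    .replace(/^_+|_+$/g, ""); // trim leading and trailing "_"
  return `${slug}.png`;
}

// Usage inside screenshot.js (run as: node screenshot.js https://example.com):
// const url = process.argv[2] || "https://medium.com";
// await page.goto(url);
// await page.screenshot({ path: outputNameFor(url), fullPage: true });
```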
Unless we specify the fullPage option, Puppeteer captures only the visible viewport, which defaults to 800×600.
We can change the screenshot size using the setViewport API in the page class.
await page.setViewport({ width: 1200, height: 1500 })
This will generate a screenshot of the webpage with a width of 1200px and a height of 1500px.
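The same page can be captured at several sizes by calling setViewport before each screenshot. This is a rough sketch, assuming a page that has already been navigated. The captureAtSizes name and the file naming pattern are made up for this example:

```javascript
// Hypothetical helper: take one screenshot per viewport size.
// `page` is a Puppeteer page that has already loaded the target URL.
async function captureAtSizes(page, sizes, prefix = "shot") {
  const files = [];
  for (const { width, height } of sizes) {
    await page.setViewport({ width, height });
    const path = `${prefix}-${width}x${height}.png`;
    await page.screenshot({ path, type: "png" });
    files.push(path);
  }
  return files; // the file names that were written
}

// Usage inside an async function:
// await captureAtSizes(page, [
//   { width: 800, height: 600 },
//   { width: 1200, height: 1500 },
// ]);
```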
We can clip a region of the page and take a screenshot of it. In other words, we can take a screenshot of a specific area on a webpage.
This is done by passing a clip object to the page.screenshot(...) method.
const screenShot = await page.screenshot({ path: "./screenshot.png", type: "png", clip: { ... } })
The clip object has the following fields:
_x_: The x-coordinate of the top-left corner of the clip area
_y_: The y-coordinate of the top-left corner of the clip area
_width_: The width of the clip area
_height_: The height of the clip area
const screenShot = await page.screenshot({ path: "./screenshot.png", type: "png", clip: { x: 0, y: 0, width: 360, height: 400 } })
The above code would take a screenshot of the webpage starting from the point (0, 0) and going 360px right and 400px down.
const screenShot = await page.screenshot({ path: "./screenshot.png", type: "png", clip: { x: 50, y: 60, width: 360, height: 400 } })
This will start at the point (50, 60) on the page, move 360px right and 400px down, and then take a screenshot of the clipped area.
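In practice, the clip rectangle often comes from an element on the page rather than from hard-coded numbers. The sketch below builds a clip object from a bounding box of the shape that elementHandle.boundingBox() resolves to. The clipAround helper and its padding parameter are hypothetical:

```javascript
// Build a clip rectangle around a bounding box { x, y, width, height }.
// `padding` adds extra pixels on every side, without going below 0.
function clipAround(box, padding = 0) {
  const x = Math.max(0, box.x - padding);
  const y = Math.max(0, box.y - padding);
  return {
    x,
    y,
    width: box.x + box.width + padding - x,
    height: box.y + box.height + padding - y,
  };
}

// Usage inside an async function ("header" is just an example selector):
// const element = await page.$("header"); // null if nothing matches
// const box = await element.boundingBox(); // null if the element is not visible
// await page.screenshot({ path: "header.png", clip: clipAround(box, 10) });
```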
We can omit the default white background Puppeteer gives us in the screenshots by passing the omitBackground boolean option.
const screenShot = await page.screenshot({ ..., omitBackground: true })
This will take the screenshot of the webpage with a transparent background.
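Some of these options only make sense together. For example, quality does not apply to png images, and jpeg cannot store transparency. A small check like the one below could flag such combinations before the screenshot is taken. It is only a sketch: checkScreenshotOptions is a hypothetical helper, and it assumes that type is either set explicitly or left at the png default:

```javascript
// Hypothetical helper: return a list of problems found in screenshot options.
function checkScreenshotOptions(options) {
  const type = options.type || "png"; // assumption: png when type is not set
  const problems = [];
  if (options.quality !== undefined) {
    if (type === "png") {
      problems.push("quality is not applicable to png images");
    } else if (options.quality < 0 || options.quality > 100) {
      problems.push("quality must be between 0 and 100");
    }
  }
  if (options.omitBackground && type === "jpeg") {
    problems.push("jpeg cannot store transparency, so use png with omitBackground");
  }
  return problems;
}

// Usage:
// const options = { path: "./screenshot.png", type: "png", omitBackground: true };
// const problems = checkScreenshotOptions(options);
// if (problems.length > 0) throw new Error(problems.join("; "));
```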
With our current setup, a page is deemed to be fully loaded when the load event is fired.
Apart from the load event, Puppeteer gives us more options to indicate when navigation is finished.
_domcontentloaded_: Navigation is considered finished when the DOMContentLoaded event is fired.
_networkidle0_: Navigation is considered finished when there have been no network connections for at least 500ms.
_networkidle2_: Navigation is considered finished when there have been no more than 2 network connections for at least 500ms.
The default page load event is load. We can set the above options in the page.goto(...) call.
To set the domcontentloaded page load event, we do this:
await page.goto("https://medium.com", { waitUntil: 'domcontentloaded' });
The page load event is set in the waitUntil field in the options passed to page.goto(...). Here, the page load will be deemed complete when the DOMContentLoaded event is fired.
await page.goto("https://medium.com", { waitUntil: 'networkidle0' });
Here, the page load will be deemed complete when there are no network connections for at least 500ms.
await page.goto("https://medium.com", { waitUntil: 'networkidle2' });
Here, the page load will be deemed complete when there are no more than 2 network connections for at least 500ms.
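A misspelled option name in a plain options object can easily go unnoticed, so it may be worth building the navigation options in one place. Below is a minimal sketch. The navigationOptions helper is hypothetical, while waitUntil and timeout are options of page.goto, and 30000 ms mirrors Puppeteer's documented default navigation timeout:

```javascript
// The four values described above.
const WAIT_UNTIL = ["load", "domcontentloaded", "networkidle0", "networkidle2"];

// Hypothetical helper: validate the strategy and build options for page.goto.
// `waitUntil` may be a single value or an array of values.
function navigationOptions(waitUntil = "load", timeout = 30000) {
  const events = Array.isArray(waitUntil) ? waitUntil : [waitUntil];
  for (const event of events) {
    if (!WAIT_UNTIL.includes(event)) {
      throw new TypeError(`Unknown waitUntil value: ${event}`);
    }
  }
  return { waitUntil, timeout };
}

// Usage inside an async function:
// await page.goto("https://medium.com", navigationOptions("networkidle2", 60000));
```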
To take the screenshot of our webpage in landscape mode, we will pass the isLandscape option to the page.setViewport(...) call.
await page.setViewport({ ..., isLandscape: true })
This will take the screenshot of the webpage in landscape mode.
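If the viewport dimensions were written for portrait mode, you may also want the wider side to become the width. The sketch below does that and sets the flag. It is an assumption about what is wanted here, not a statement about how Puppeteer treats isLandscape, and toLandscape is a hypothetical helper:

```javascript
// Hypothetical helper: return a copy of `viewport` with the larger dimension
// as the width and isLandscape set to true. The input object is not modified.
function toLandscape(viewport) {
  const width = Math.max(viewport.width, viewport.height);
  const height = Math.min(viewport.width, viewport.height);
  return { ...viewport, width, height, isLandscape: true };
}

// Usage inside an async function:
// await page.setViewport(toLandscape({ width: 600, height: 800 }));
```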
We can extend this implementation to turn it into a service, like an online webpage to image service.
This service will let users take screenshots of their webpage or any selected webpage.
We will implement this service in Node.js.
Here is the backend code:
/** require dependencies */
const express = require("express")
const cors = require('cors')
const bodyParser = require('body-parser')
const helmet = require('helmet')
const path = require('path')
const puppeteer = require("puppeteer")

const app = express()
let port = process.env.PORT || 5000

/** set up middlewares */
app.use(cors())
app.use(bodyParser.json({limit: '50mb'}))
app.use(helmet())

/** static files */
app.use('/static', express.static(path.join(__dirname, 'static')))
app.use('/assets', express.static(path.join(__dirname, 'assets')))

app.get("/", (request, response) => {
  response.sendFile(path.join(__dirname, 'index.html'));
});
app.get("/style.css", (request, response) => {
  response.sendFile(path.join(__dirname, 'style.css'));
});

/** screenshot API: the handler is async so it can await takeScreenshot */
app.post('/api/screenshot', async (req, res) => {
  const { url } = req.body
  try {
    const screenshot = await takeScreenshot(url)
    res.send({ result: screenshot })
  } catch (err) {
    res.status(500).send({ error: err.message })
  }
})

/** the catch-all route goes last so it does not shadow the routes above */
app.get('*', function(req, res) {
  res.sendFile(path.resolve(__dirname, 'index.html'));
});

async function takeScreenshot(url) {
  const browser = await puppeteer.launch({ headless: true, args: ['--no-sandbox'] });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle0' });
    // return a base64 string so the image survives JSON serialization
    return await page.screenshot({ type: "png", fullPage: true, encoding: "base64" });
  } finally {
    // always close the browser, even if navigation fails
    await browser.close();
  }
}

/** start server */
app.listen(port, () => {
  console.log(`Server started at port: ${port}`);
});
The takeScreenshot function is the main code. We get the URL of the webpage we want to screenshot from the request body and pass it to the takeScreenshot function. A Puppeteer Browser instance is created, and we navigate to the URL. The page is then screenshotted and the screenshot is returned, which is then sent to the user.
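Since the URL comes straight from the request body, a public service would probably want to check it before handing it to Puppeteer. Here is a minimal sketch that only accepts http and https URLs. The parseTargetUrl helper is hypothetical, and a real deployment may need stricter rules, for example blocking internal addresses:

```javascript
// Hypothetical helper: return a normalized http(s) URL, or null if the input
// is missing, malformed, or uses another scheme such as file:.
function parseTargetUrl(input) {
  let url;
  try {
    url = new URL(String(input));
  } catch {
    return null; // not a valid absolute URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return null;
  }
  return url.href;
}

// Usage inside the '/api/screenshot' handler:
// const target = parseTargetUrl(req.body.url);
// if (!target) return res.status(400).send({ error: "Invalid URL" });
// const screenshot = await takeScreenshot(target);
```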
Let’s see the frontend code:
<title>Webpage to Image</title>
<style> </style>
<body>
  <header>
    <div class="title-bar"><h2>Webpage to Image</h2></div>
  </header>
  <div class="container">
    <div class="info close" id="info">
      <h3>Info</h3>
    </div>
    <div class="wrapper">
      <div class="div-input" style="">
        <input id="webpageUrl" type="text" placeholder="Type your webpage URL" />
      </div>
      <div class="div-button">
        <button id="webpageButton" onclick="return generateImage(event)"> Generate Image </button>
      </div>
    </div>
  </div>
</body>
<script src="./axios/axios.min.js"></script>
<script>
  webpageUrl.addEventListener("keydown", (evt) => {
    if(evt.key == "Enter") generateImage(evt)
  })

  function generateImage(evt) {
    evt.preventDefault()
    if(webpageUrl.value.length === 0) {
      showError("Please, type in the webpage URL.")
      return
    }
    showLoading(true)
    disableButton(true)
    axios.post("api/screenshot", { url: webpageUrl.value })
      .then(res => {
        // result is expected to be the PNG image as a base64 string
        const { result } = res.data
        const bytes = Uint8Array.from(atob(result), c => c.charCodeAt(0))
        const blob = new Blob([bytes], {type: 'image/png'})
        const link = document.createElement('a')
        link.href = window.URL.createObjectURL(blob)
        link.download = `your-file-name.png`
        link.click()
      })
      .catch(err => showError(err))
      .finally(() => {
        // reset the button only after the request has settled
        showLoading(false)
        disableButton(false)
      })
  }

  // show a message in the info box for 3 seconds
  function showError(message) {
    info.innerHTML = ` <h3>Error</h3> ${message} `
    info.classList.add("info-danger")
    info.classList.remove("close")
    setTimeout(() => {
      info.classList.add("close")
      info.classList.remove("info-danger")
    }, 3000)
  }

  function showLoading(show) {
    webpageButton.innerHTML = show ? "wait..." : "Generate Image"
  }

  function disableButton(disable) {
    // the attribute is called "disabled", not "disable"
    webpageButton.disabled = disable
  }
</script>
...
We have an input and a button. We type the URL of the webpage we want to screenshot in the input. When clicked, the button calls the backend API at “api/screenshot” and sends the webpage URL from the input box along with it. The backend code at “api/screenshot” generates the screenshot and sends it back to the user. The success callback of the Axios post call then runs, and we programmatically download the generated image to our storage.
It’s very simple.
There are more features you can add to this. For example, you can let users choose the screenshot options described above, such as the image type, the viewport size, or a clip region.
The sky is the limit.
Puppeteer is freaking awesome, no doubt.
It has so many APIs you can leverage to cook up some cool stuff. Find them here:
puppeteer/docs/api.md at v5.2.1 · puppeteer/puppeteer
If you have any questions regarding this, feel free to comment, email, or DM me.