Everyone acquainted with data science knows that Jupyter Notebooks are the way to go. They let you mix Markdown with actual code, creating a lively environment for research and learning. Code becomes user-friendly and nicely formatted: you can write about it and generate dynamic charts, tables, and images on the go.
Writing Notebooks is so enjoyable that it's only natural to want to share them on the internet. Sure, you can host them on GitHub or even in Google Colab, but Colab requires a running kernel, and neither is as friendly as a good ol' webpage.
Before we go any further, it's important to understand that a Jupyter Notebook is nothing more than a JSON document: a list of cells (their inputs and stored outputs) plus tons of metadata. Because the outputs are saved alongside the code, a Notebook can easily be converted into different formats (such as HTML).
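To make that concrete, here is a trimmed-down Notebook in the nbformat 4 layout, along with a small JavaScript function that walks its cells. The notebook content and the `summarizeCells` helper are illustrative assumptions, not part of any library. A real .ipynb file carries far more metadata, but the shape (a top-level `metadata` object plus a `cells` array, where code cells keep their `outputs`) is the same.

```javascript
// A trimmed-down notebook in nbformat 4 shape (illustrative content only).
const notebookJson = `{
  "metadata": { "kernelspec": { "name": "python3", "display_name": "Python 3" } },
  "nbformat": 4,
  "nbformat_minor": 2,
  "cells": [
    { "cell_type": "markdown", "metadata": {}, "source": ["# Hello"] },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {},
      "source": ["print(1 + 1)"],
      "outputs": [{ "output_type": "stream", "name": "stdout", "text": ["2\\n"] }]
    }
  ]
}`;

// Hypothetical helper: summarize each cell's type, source, and stored output kinds.
// nbformat allows `source` to be a string or an array of strings, so normalize it.
function summarizeCells(notebook) {
  return notebook.cells.map((cell) => ({
    type: cell.cell_type,
    source: [].concat(cell.source).join(""),
    outputs: (cell.outputs || []).map((output) => output.output_type),
  }));
}

const summary = summarizeCells(JSON.parse(notebookJson));
console.log(summary);
```

Markdown cells have no `outputs`, so they summarize to an empty list, while the code cell reports the `stream` output it saved when it ran. That stored output is exactly what lets a converter render the Notebook without re-executing it.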
Knowing that Notebooks can become HTML documents is all we need; what remains is finding a way to automate this process so a .ipynb file can become a static page on the internet. My solution to this problem is to use GatsbyJS, arguably one of the best static site generators out there, if not the single best.
Gatsby easily sources data from different formats (JSON, Markdown, YAML, you name it) and statically generates webpages that you can host on the world wide web. The final piece, then, is to do with a .ipynb file what we would normally do with Markdown: transform it into a post. The goal of this post is to walk you through this process.
A quick search on the web will show you gatsby-transformer-ipynb. It's a Gatsby plugin that parses Notebook files so we can access them later in our GraphQL queries. It's almost too good to be true!
And, in fact, it is. The hard work was done by the fine folks of nteract. However, the plugin hasn’t been maintained in a while, and things don’t simply work out of the box — not to mention the lack of customization that one would expect from a plugin.
I’ll spare you the boring stuff, but after fussing around the dark corners of GitHub, and with significant help from this post by Specific Solutions, I managed to create my own fork of gatsby-transformer-ipynb, which solves my problems and will suffice for the purpose of this post.
Note, however, that I have no intention of becoming an active maintainer, and most of what I've done was solely to get my own use case working. Use it at your own risk!
Enough with the preambles, let’s get to some code.
Firstly, the source code for what we are going to build can be found here on GitHub. We’ll start by creating a Gatsby project. Make sure you have Gatsby installed, and create a new project by running:
gatsby new jupyter-blog
cd jupyter-blog
Run gatsby develop and go to http://localhost:8000/ to make sure everything is working fine.
Since Jupyter Notebooks will be the data source for our brand-new blog, we need to start adding content. Within your project folder, go to src and create a notebooks folder. We'll make sure to read from this folder later.
It's time to create our first Notebook. For the purposes of this tutorial, I'll use this simple Notebook as a base. You can see its rendered output on GitHub, but feel free to use any Notebook you like.
In any case, it’s worth mentioning that some rich outputs such as dynamic charts generated by Plotly may need extra care — let me know if you want me to cover that in a later post! To keep this post short, however, we’ll handle only static images, tables, and Markdown.
Now that you have a Gatsby project with data, the next step is to query it using GraphQL.
One of the biggest advantages of Gatsby is flexibility when sourcing data. Virtually anything you want can become a data source that can be used to generate static content.
As mentioned above, we’ll be using my own version of the transformer. Go ahead and install it:
yarn add @rafaelquintanilha/gatsby-transformer-ipynb
The next step is to configure the plugins. In gatsby-config.js, add the following to your plugins array (you can always check GitHub when in doubt):
...
{
  resolve: `gatsby-source-filesystem`,
  options: {
    name: `notebooks`,
    path: `${__dirname}/src/notebooks`,
    ignore: [`**/.ipynb_checkpoints`],
  },
},
{
  resolve: `@rafaelquintanilha/gatsby-transformer-ipynb`,
  options: {
    notebookProps: {
      displayOrder: ["image/png", "text/html", "text/plain"],
      showPrompt: false,
    },
  },
},
...
Let’s break it down.
First, we add a gatsby-source-filesystem entry to the array. It tells Gatsby to look for files in src/notebooks, where our .ipynb files live, while ignoring the .ipynb_checkpoints folders that Jupyter creates automatically. Next, we configure the transformer and set some props:
displayOrder – the MIME types of the outputs we want to display, in order of preference
showPrompt – whether the prompt is displayed
While prompts (the In [1]: labels next to each cell) make sense in Notebooks, they lose their purpose in static pages. For that reason, we hide them to keep the content clean.
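To make displayOrder more concrete: a single Jupyter output can carry the same result in several MIME types at once (a pandas DataFrame, for instance, usually comes with both text/html and text/plain). A renderer then shows the first type in the display order that the output actually contains. The function below is a simplified, hypothetical illustration of that idea, not the plugin's or nteract's actual code, and the output bundles are placeholders.

```javascript
// Hypothetical helper illustrating how a display order is applied.
// `data` is the MIME bundle of one output, keyed by MIME type.
function pickMimeType(data, displayOrder) {
  const match = displayOrder.find((mime) =>
    Object.prototype.hasOwnProperty.call(data, mime)
  );
  return match || null; // null when no listed type is available
}

const displayOrder = ["image/png", "text/html", "text/plain"];

// Placeholder bundle shaped like a DataFrame output: HTML plus plain text.
const dataFrameOutput = {
  "text/html": "<table><tr><td>1</td></tr></table>",
  "text/plain": "   a\n0  1",
};

// Placeholder bundle shaped like a matplotlib figure: PNG plus a text fallback.
const figureOutput = {
  "image/png": "iVBORw0KGgo=",
  "text/plain": "<Figure size 432x288 with 1 Axes>",
};

console.log(pickMimeType(dataFrameOutput, displayOrder)); // "text/html"
console.log(pickMimeType(figureOutput, displayOrder)); // "image/png"
```

With the order used in our config, tables render as HTML and plots render as images, and plain text is only a fallback.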
Time to check whether everything went according to plan. Restart gatsby develop so Gatsby picks up the new configuration, then open GraphiQL by going to http://localhost:8000/___graphql and run the following query:
query MyQuery {
  allJupyterNotebook {
    nodes {
      html
    }
  }
}
Success! Note how the HTML of our notebooks was generated. All that is left is to inject this HTML into a React component and our process will be complete.
The worst is behind us now. The next step is to query this data in gatsby-node.js so we can generate static pages for each Notebook in src/notebooks.
Note, however, that we need some extra metadata for each Notebook, e.g., the author and the post title. There are several ways of doing it, and the simplest is probably to take advantage of the fact that .ipynb files are JSON and use their own metadata field. Open the .ipynb file and add the info you need:
{
  "metadata": {
    "author": "Rafael Quintanilha",
    "title": "My First Jupyter Post",
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.4-final"
    },
    "orig_nbformat": 2,
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2,
  "cells": [ ... ]
}
Pro tip: If you’re using VS Code, opening the file will probably launch the Jupyter kernel. You can disable it in the configs to edit the raw content, but I usually just open the file with another editor (such as gedit or Notepad++).
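If you'd rather not edit the raw JSON by hand, a short Node.js script can set these fields for you. The sketch below assumes Node.js 8.3 or later (for object spread); withMetadata and updateNotebookFile are hypothetical helpers, not part of Gatsby or the transformer. It writes the file back with a one-space indent, which is what Jupyter itself uses, although key order may differ from what Jupyter writes.

```javascript
const fs = require("fs");

// Hypothetical helper: returns a copy of the notebook with extra fields merged
// into its top-level metadata. Existing keys (kernelspec, language_info, ...)
// are kept, and the input object is not mutated.
function withMetadata(notebook, fields) {
  return {
    ...notebook,
    metadata: { ...(notebook.metadata || {}), ...fields },
  };
}

// Hypothetical wrapper: reads a .ipynb file, merges the fields, and saves it
// back in place with a one-space indent and a trailing newline.
function updateNotebookFile(filePath, fields) {
  const notebook = JSON.parse(fs.readFileSync(filePath, "utf8"));
  const updated = withMetadata(notebook, fields);
  fs.writeFileSync(filePath, JSON.stringify(updated, null, 1) + "\n");
}
```

For example, calling updateNotebookFile("src/notebooks/my-first-post.ipynb", { author: "Rafael Quintanilha", title: "My First Jupyter Post" }) would add both fields while leaving the rest of the metadata untouched (my-first-post.ipynb is a placeholder filename).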
The process now is exactly the same as for any other data source in Gatsby. We'll query the data in gatsby-node.js and pass the relevant info to a post template, which, in turn, will become a unique page on our domain.
Before getting to that, however, open gatsby-node.js and add the following:
exports.onCreateNode = ({ node, actions }) => {
  const { createNodeField } = actions
  // Only extend nodes created by the Jupyter transformer
  if (node.internal.type === "JupyterNotebook") {
    createNodeField({
      name: "slug",
      node,
      // Naive slug: lowercase every word of the title and join with hyphens
      value: node.json.metadata.title
        .split(" ")
        .map(token => token.toLowerCase())
        .join("-"),
    })
  }
}
For every node Gatsby creates, the excerpt above checks whether it is a Jupyter Notebook and, if so, extends it with a new field, slug. We are using a naive approach here, but you can use a more robust library such as slugify. The new field will be queried and used to generate the post path. In the same file, add the following:
const path = require(`path`);
exports.createPages = async ({ graphql, actions: { createPage } }) => {
  const blogPostTemplate = path.resolve(`src/templates/BlogPost.js`);
  // Fetch the slug of every Notebook node
  const results = await graphql(`
    {
      allJupyterNotebook {
        nodes {
          fields {
            slug
          }
        }
      }
    }
  `);
  // Fail the build if the query itself failed
  if (results.errors) {
    throw results.errors;
  }
  const posts = results.data.allJupyterNotebook.nodes;
  // Create one page per Notebook; the slug is passed on to the template's query
  posts.forEach((post) => {
    createPage({
      path: post.fields.slug,
      component: blogPostTemplate,
      context: {
        slug: post.fields.slug,
      },
    });
  });
};
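This queries the slug of every Notebook and creates one page for each, using the slug as the page path and passing it as context to src/templates/BlogPost.js, where it becomes the $slug variable of the page query. Let's create that template now: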
import React from "react"
import { graphql } from "gatsby"
import SEO from "../components/seo"
const BlogPost = ({
  data: {
    jupyterNotebook: {
      json: { metadata },
      html,
    },
  },
}) => {
  return (
    <div>
      <SEO title={metadata.title} />
      <h1>{metadata.title}</h1>
      <p>Written by {metadata.author}</p>
      <div dangerouslySetInnerHTML={{ __html: html }} />
    </div>
  )
}
export default BlogPost
export const query = graphql`
  query BlogPostBySlug($slug: String!) {
    jupyterNotebook(fields: { slug: { eq: $slug } }) {
      json {
        metadata {
          title
          author
        }
      }
      html
    }
  }
`
And that's it! Restart gatsby develop so it picks up the changes to gatsby-node.js, then hop over to http://localhost:8000/my-first-jupyter-post and see your Notebook as a static HTML page.
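The naive slug in onCreateNode keeps punctuation, so a title like "Pandas: Tips & Tricks!" would become pandas:-tips-&-tricks!. It also assumes every Notebook has a unique title: two posts with the same title would get the same path, and one page would likely replace the other. If you'd rather not add a dependency such as slugify, the hypothetical helpers below (meant to live in gatsby-node.js, not provided by Gatsby) sketch one way to handle both issues.

```javascript
// Hypothetical helper: a dependency-free alternative to the naive slug.
function toSlug(title) {
  if (typeof title !== "string" || title.trim() === "") {
    throw new Error("Notebook metadata is missing a title");
  }
  return title
    .normalize("NFKD") // split accented letters into base letter + accent mark
    .replace(/[\u0300-\u036f]/g, "") // drop the accent marks (é -> e)
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse any run of other characters into one hyphen
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
}

// Hypothetical helper: returns every slug that appears more than once.
function findDuplicateSlugs(slugs) {
  const seen = new Set();
  const duplicates = new Set();
  for (const slug of slugs) {
    if (seen.has(slug)) duplicates.add(slug);
    seen.add(slug);
  }
  return [...duplicates];
}
```

You could call toSlug(node.json.metadata.title) in place of the split/map/join chain in onCreateNode, and run findDuplicateSlugs over posts.map((post) => post.fields.slug) in createPages, throwing an Error if it returns anything. Failing the build is noisier than a silent overwrite, but it makes the collision obvious.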
Looking at the rendered page, you'll notice that a lot can be improved in terms of styling and design. This is beyond the scope of this post, but as a hint, you can use CSS Modules to enhance the layout and hide unnecessary stdout (text output that you don't care about in a blog post). Create BlogPost.module.css in src/templates, next to BlogPost.js, and add the following:
.content {
  max-width: 900px;
  margin-left: auto;
  margin-right: auto;
  padding: 40px 20px;
}
.content :global(.nteract-display-area-stdout),
.content :global(.nteract-outputs > .cell_display > pre) {
  display: none;
}
.content :global(.nteract-outputs > .cell_display > img) {
  display: block;
}
.content :global(.input-container) {
  margin-bottom: 20px;
}
.content :global(.input-container pre.input) {
  border-radius: 10px !important;
  padding: 1em !important;
}
.content :global(.input-container code) {
  line-height: 1.5 !important;
  font-size: 0.85rem !important;
}
.content :global(.input-container code:empty) {
  display: none;
}
@media only screen and (max-width: 940px) {
  .content {
    max-width: 100%;
    padding-left: 20px;
    padding-right: 20px;
    box-sizing: border-box;
  }
}
Now go back to BlogPost.js and add the class to our div:
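...
// Gatsby v2 syntax. From Gatsby v3 on, CSS Modules use named exports:
// import * as css from "./BlogPost.module.css"
import css from "./BlogPost.module.css"
...
return (
  <div className={css['content']}>
    ...
  </div>
);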
Note how much cleaner it looks now. The final result (with minor tweaks) is hosted on Netlify, and all changes are in the source code.
Transforming Jupyter Notebooks into HTML pages is not complicated, but it does involve a lot of small steps and adjustments. Hopefully, this post serves as a guide to getting started.
There are tons of changes and improvements that could be made, like supporting rich outputs (such as dynamic charts), improving the mobile experience, better metadata management, and more.
Notebooks are versatile and fun to work with, and being able to convert them automatically into webpages makes them even more useful.