Web development has taken a fascinating turn with the introduction of WebGPU, a new API that allows web applications to directly access a device’s Graphics Processing Unit (GPU). This development is significant because GPUs excel at the heavy, highly parallel computations that modern applications increasingly demand.
One project that illustrates the potential of WebGPU is WebGPT. It’s a simple application written in JavaScript and HTML, built to showcase the capability of the WebGPU API.
In this post, we’ll discuss why WebGPT is important and how to implement it both locally and in-browser. Let’s go!
Jump ahead:
- What are WebGPT and WebGPU?
- Implementing WebGPT
- Using custom WebGPT models
- Challenges and limitations of WebGPU
- The future of GPT and other transformer models
What are WebGPT and WebGPU?
Before we delve into the practical implementation of WebGPT, let’s briefly examine how it works under the hood.
WebGPT is a JavaScript and HTML implementation of a transformer model, a type of machine learning model designed to process sequential data efficiently. In natural language processing (NLP), sequential data usually means text, where the order of words and characters is crucial to meaning; “dog bites man” and “man bites dog” use the same words to say very different things.
Transformer models excel at handling this kind of sequence data, and they form the basis for many state-of-the-art NLP models, including GPT (Generative Pretrained Transformer).
WebGPT’s transformer model is designed to work with WebGPU, an API that allows web applications to access and use a device’s GPU. GPUs are particularly good at performing the type of parallel computations that machine learning models require, making them a powerful resource for WebGPT.
Before WebGPU, applications had to rely primarily on the device’s central processing unit (CPU) or older, less efficient APIs like WebGL. In contrast, WebGPT uses a transformer model explicitly designed to function in the browser using the WebGPU API.
When WebGPT receives input, it processes the data with its transformer model, performing the computations locally on the user’s device through the WebGPU API. The results are returned directly in the browser, making execution fast and efficient.
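To make this concrete, here’s a minimal sketch of the pattern any WebGPU application, WebGPT included, follows to run work on the GPU. This isn’t WebGPT’s actual code; the shader below simply doubles the numbers in a small buffer, standing in for the far larger matrix operations a transformer performs:

```javascript
// A minimal sketch of the general WebGPU pattern: request a GPU device,
// then dispatch a compute shader. The shader here just doubles every
// value in a 256-float buffer -- a stand-in for real transformer math.
async function runGpuCompute() {
  if (!navigator.gpu) {
    throw new Error("WebGPU is not supported in this browser");
  }
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) {
    throw new Error("No suitable GPU adapter found");
  }
  const device = await adapter.requestDevice();

  // A tiny WGSL compute shader: each invocation doubles one array element
  const module = device.createShaderModule({
    code: `
      @group(0) @binding(0) var<storage, read_write> data: array<f32>;
      @compute @workgroup_size(64)
      fn main(@builtin(global_invocation_id) id: vec3<u32>) {
        data[id.x] = data[id.x] * 2.0;
      }`,
  });

  const pipeline = device.createComputePipeline({
    layout: "auto",
    compute: { module, entryPoint: "main" },
  });

  // GPU buffer holding 256 floats, writable from the shader
  const buffer = device.createBuffer({
    size: 256 * 4,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC,
  });
  const bindGroup = device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [{ binding: 0, resource: { buffer } }],
  });

  // Record the GPU work and submit it to the device's queue
  const encoder = device.createCommandEncoder();
  const pass = encoder.beginComputePass();
  pass.setPipeline(pipeline);
  pass.setBindGroup(0, bindGroup);
  pass.dispatchWorkgroups(256 / 64); // 4 workgroups x 64 threads = 256 items
  pass.end();
  device.queue.submit([encoder.finish()]);
}
```

Roughly speaking, WebGPT’s inference boils down to many dispatches like this, with shaders implementing the transformer’s matrix multiplications and other tensor operations.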
Bringing such powerful machine learning models to the browser has profound implications for web development, including:
- Real-time data processing: There’s potential for real-time data processing with minimal latency when computations can be done on the client side. This could transform user experiences across a range of applications, from interactive tools and games to real-time analytics
- Enhanced privacy: Since data processing happens locally on the user’s device, sending potentially sensitive data to a server is unnecessary. This could be a game-changer for applications that deal with personal or sensitive data, bolstering user trust and privacy
- Cost efficiency: Companies could save on server costs by shifting the computational load from the server to the client side. This could make advanced machine learning features accessible to smaller companies or individual developers
Implementing WebGPT
WebGPT is designed to be simple to use: it only requires a set of HTML and JavaScript files to function. However, since WebGPU is a fairly new technology, you need a browser compatible with WebGPU.
As of July 2023, Chrome v113 and later support WebGPU. Alternatively, you can install Chrome Canary or Edge Canary to ensure compatibility.
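If you’re not sure whether your browser qualifies, a quick check in the DevTools console will tell you:

```javascript
// Paste this into your browser's DevTools console.
// If it logs false, this browser can't run WebGPT yet.
console.log("WebGPU supported:", !!navigator.gpu);
```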
Running WebGPT in the browser
You can try out WebGPT directly on its demo website at https://www.kmeans.org. Loading model weights remotely can be slower than loading them locally, so for a more responsive experience, it’s recommended to run WebGPT locally when possible.
Running WebGPT locally
To run WebGPT locally, follow these steps:
- Clone the WebGPT repository: You can clone the repository by running the following in your terminal:
git clone https://github.com/0hq/WebGPT.git
- Install Git LFS: After cloning the repository, you need to download the model files using Git LFS (Large File Storage), a Git extension for versioning large files. Install Git LFS on your local machine, then navigate to the WebGPT directory in your terminal and run:
git lfs install
- Download the model files: After that, run the following command to download the model files:
git lfs pull
- Launch the WebGPT files on a local server: You can use a simple HTTP server for this or a tool like Live Server for Visual Studio Code; a minimal Node.js server sketch follows after these steps
- Open the WebGPT page in your browser: Navigate to the URL of the local server you’re running WebGPT on. You should see a page that looks like this:
Our WebGPT page
Click any of the Load Model buttons to load the model weights. After that, you can enter text into the input box and click Generate to generate text based on the input.
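For step 4 above, any static file server will do. As one option, here’s a minimal Node.js server you could save as server.js in the cloned WebGPT directory; the filename and port are arbitrary choices for this sketch:

```javascript
// server.js -- a minimal static file server for running WebGPT locally.
// Run `node server.js` from the WebGPT directory, then open
// http://localhost:8000 in a WebGPU-enabled browser.
const http = require("http");
const fs = require("fs");
const path = require("path");

// Content types for the file extensions WebGPT is likely to serve
const types = {
  ".html": "text/html",
  ".js": "text/javascript",
  ".json": "application/json",
  ".bin": "application/octet-stream",
};

http.createServer((req, res) => {
  // Map "/" to index.html and strip any query string
  const urlPath = req.url === "/" ? "/index.html" : req.url.split("?")[0];
  const filePath = path.join(process.cwd(), decodeURIComponent(urlPath));
  fs.readFile(filePath, (err, data) => {
    if (err) {
      res.writeHead(404);
      res.end("Not found");
      return;
    }
    const type = types[path.extname(filePath)] || "application/octet-stream";
    res.writeHead(200, { "Content-Type": type });
    res.end(data);
  });
}).listen(8000, () => {
  console.log("Serving WebGPT at http://localhost:8000");
});
```

Alternatively, running python3 -m http.server from the same directory accomplishes the same thing.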

Using custom WebGPT models
WebGPT has two built-in models: a small GPT-Shakespeare model and GPT-2 with 117 million parameters. If you want to use a custom model, check the other/conversion_scripts directory in the repository for scripts to convert PyTorch models into a format that WebGPT can use.
Challenges and limitations of WebGPU
Since WebGPT is built on WebGPU, it’s important to understand the challenges and limitations of WebGPU. While WebGPU is a promising technology, it’s still a relatively new API, and as such it has some challenges to overcome. Some of these include:
- Lack of browser support: Not all browsers currently support WebGPU, and even those that do may not support it fully. This can make it difficult to develop WebGPU applications, let alone deploy them for public use
- Complexity: WebGPU is a complex API, and it can be difficult to learn and use. This can be a barrier to entry for developers who are not familiar with low-level graphics APIs
- Performance: WebGPU can be slower than WebGL in some cases, especially on older hardware. This is because WebGPU is an even lower-level API, and can take more time to compile shaders and set up the graphics pipeline
As the API matures and more browsers support it, we can expect to see these challenges addressed. In the meantime, tools like WebGPT can help with the experimentation and adoption of WebGPU.
The future of GPT and other transformer models
GPT and similar models are primarily run on servers due to their high computational demands; however, WebGPT demonstrates that these models can run directly in the browser, with performance that can potentially rival server-based setups.
With the capabilities offered by technologies like WebGPU and projects like WebGPT, we could significantly expand our use of transformer models like GPT. As the technology matures and optimizations improve, we could see even larger models running smoothly in the browser.
This could increase the availability of advanced AI features in web applications, from more sophisticated chatbots to robust, real-time text analysis and generation tools, and even accelerate research and development in transformer models. By making it easier and cheaper to deploy these models, more developers and researchers will have the opportunity to experiment with and improve upon them.
Conclusion
Bringing advanced machine learning models to the browser through WebGPU opens up many opportunities for developers, and it presents a vision of a future where web applications are more powerful, responsive, and privacy-conscious.
While the technology is still relatively new and has challenges to overcome, such as optimizing performance and ensuring stability with larger models, the potential benefits are significant. As developers start to embrace and experiment with these tools, we can expect to see more impressive implementations like WebGPT and new web applications that leverage in-browser machine learning.
Give WebGPT a try and let me know what you think in the comments. If you would like to keep in touch, consider subscribing to my YouTube channel and following me on LinkedIn or Twitter. Keep building!