A recent survey by Salesforce indicates that 23 percent of service companies are currently using chatbots, and that percentage is expected to more than double in the next 18 months. FastChat is a platform for training, serving, and evaluating large language models that are behind many of these chatbots.
In this article, we’ll explore how to use FastChat to implement a simple AI chatbot in a JavaScript web app, which is just one of the functionalities that FastChat makes available within its unique architecture.
FastChat is available as an open source Python library. We’ll install it using pip:
pip3 install "fschat[model_worker,webui]"
The above command installs fschat along with two optional modules: model_worker, to handle the different models that can be served by FastChat, and webui, to host a web interface for interacting with the chatbot model.
The FastChat platform provides access to multiple chatbot models, and each model is assigned to a worker (hence the model_worker name). The chatbot models are accessible through a fully OpenAI-compatible API server and can be used directly with the openai-python library.
This is a pretty smart design; you can test out different models with your chatbot just by changing the URL you use to address different workers. Most importantly, any pre-existing application that uses the OpenAI API can easily leverage a FastChat server without any modification in the source code.
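As a minimal sketch of this idea, here’s how an existing Node.js app built on the official openai package (v4 or later) could be pointed at a local FastChat server instead. The base URL and the dummy API key are the only FastChat-specific assumptions, and they match the server we’ll set up later in this article:

import OpenAI from "openai";

// Point the standard OpenAI client at a local FastChat server;
// the client requires an apiKey, but FastChat ignores its value.
const client = new OpenAI({
  baseURL: "http://localhost:8000/v1",
  apiKey: "EMPTY",
});

// Run this as an ES module (e.g., node app.mjs) so top-level await works
const completion = await client.chat.completions.create({
  model: "google/flan-t5-large",
  messages: [{ role: "user", content: "Hello! What is your name?" }],
});

console.log(completion.choices[0].message.content);

Nothing here is FastChat-specific except the baseURL: the same code would run against OpenAI’s hosted API, which is exactly the point of the compatible server.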
The “large” in large language model (LLM) is a hint that we’ll need large amounts of memory and computational power to train a new model for generative inference. Fortunately, there’s a pretty extensive list of language models that are compatible with FastChat.
To choose the model that’s most suitable for your project, consider the hardware resources you have available (GPUs, GPU memory, and/or CPU memory) and how much money you’re willing to invest in a cloud service. You should also consider the requirements of the system you’re designing in terms of the kind of generation you’re aiming for.
For the tutorial portion of this article, we’ll use the google/flan-t5-large model. It’s a fine-tuned version of Google’s T5 model that runs comfortably on an average laptop while remaining powerful enough to interact with fruitfully.
To start, download and serve the model using the following command:
python3 -m fastchat.serve.model_worker --model-path google/flan-t5-large
This command sets up a model_worker that will download the model weights and expose a suitable API for interacting with the specified model. You can also pass --device cpu as an additional parameter, which lets the model run on the CPU and rely only on RAM instead of your GPU’s VRAM.
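For example, to run the same worker entirely on the CPU:

python3 -m fastchat.serve.model_worker --model-path google/flan-t5-large --device cpu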
Next, use the following command to run the model interactively right in the command line:
python3 -m fastchat.serve.cli --model-path google/flan-t5-large --device cpu
The controller is the centerpiece of the FastChat architecture. It orchestrates calls to the instances of any model_worker you have running and checks the health of those instances with a periodic heartbeat.
This mechanism is well illustrated in the following diagram from the FastChat GitHub repository:
The web interface to the FastChat workers is a Gradio server that hosts all the UI components needed to interact with the models. The OpenAI-compatible API, in turn, is exposed by a dedicated API server that talks to the controller, which dispatches requests to the different workers, as shown in the above diagram.
The controller mechanism is agnostic to the kind of worker you use: both on-premise and cloud workers are acceptable. Use the following command to launch the controller, which will listen for model_worker instances to connect:
python3 -m fastchat.serve.controller
The controller will listen on http://localhost:21001/ and handle connections as new model_worker instances become available.
The fastest way to build the chatbot’s UI is to use the webui module. This module uses Gradio, a library of web components designed to simplify the deployment of UIs for interacting with AI pipelines and chatting with chatbots.
The webui module, implemented in gradio_web_server.py in the FastChat repository, provides a simple UI that addresses the correct model_worker. You can launch it with:
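python3 -m fastchat.serve.gradio_web_server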
The web app will be available on http://localhost:7860/ and will handle the interaction with the model.
Let’s build a simple web app, written in JavaScript, that uses the OpenAI API hosted by FastChat.
As you might expect, we will need the model_worker, as well as a controller to handle the worker. This time, we’ll use the openai_api_server to host the OpenAI API. This will be a full-fledged OpenAI API but, since we’ll host it on our own hardware with a pre-trained model, it won’t require an API key.
The code for this example is available on GitHub. The three Python scripts must each be run in a different shell window.
First, launch the controller:
python3 -m fastchat.serve.controller
Then, launch the model worker(s):
python3 -m fastchat.serve.model_worker --model-path google/flan-t5-large
Finally, launch the RESTful API server:
python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
Now, let’s test the API server. Once you have the OpenAI API server up and running, you’ll be able to interact with it on http://localhost:8000.
A quick curl will be enough to test it:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/flan-t5-large",
    "messages": [{"role": "user", "content": "Hello! What is your name?"}]
  }'
In the /node-webui directory, you’ll find a single index.html file containing the smallest amount of code required to interact with the OpenAI API through a user interface using simple JavaScript. Here’s the relevant code:
const apiKey = "YOUR_OPENAI_API_KEY";

...

try {
  const response = await fetch('http://localhost:8000/v1/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${apiKey}`
    },
    body: JSON.stringify({
      model: "flan-t5-large",
      prompt: input,
      max_tokens: 50
    })
  });
The UI may not be very interesting, but the relevant part here is the interaction with the OpenAI API. As you can see, the API is invoked simply by passing the standard API parameters.
In particular, the model parameter must contain the name of the model we want to address. This will be passed to the controller so it can route the call to the relevant model_worker.
The header contains another parameter, the Authorization field, which in the standard OpenAI API holds the API key we’d have to pay for. In the case of this demo, that API key is just a string that won’t be used in any way, since the model runs on our own infrastructure (which, in my case, is my laptop).
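To complete the picture, here’s a minimal sketch of how the response from the fetch call above could be handled. The choices[0].text field is where the standard OpenAI completions format places the generated text; the output element id is a hypothetical placeholder for your own markup:

  // Parse the standard OpenAI completions response
  const data = await response.json();
  // /v1/completions returns the generated text in choices[0].text
  // (the chat endpoint would use choices[0].message.content instead)
  const reply = data.choices[0].text;
  // "output" is a hypothetical element id; adapt it to your own page
  document.getElementById('output').textContent = reply;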
FastChat is an amazing package designed for benchmarking, interacting with, and experimenting with a plethora of LLMs. It offers a quick way to host a chat web interface using Gradio, a standard framework for building machine learning web apps.
FastChat is useful for testing a model on its own, but when hosting your own application, you can also expose a proper OpenAI API that will accommodate any pre-existing code compatible with it.
FastChat’s architecture is also particularly expandable. It’s possible to host multiple LLMs in order to test them side by side on the same input, or to orchestrate specialized LLMs for a specific task, as sketched below.
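For example, here’s a hedged sketch of registering two workers with the same controller so their models can be compared side by side. The port numbers and the second model are arbitrary choices for illustration; check python3 -m fastchat.serve.model_worker --help for the exact flags in your FastChat version:

# Worker A: flan-t5-large on port 31000
python3 -m fastchat.serve.model_worker --model-path google/flan-t5-large --port 31000 --worker-address http://localhost:31000

# Worker B: a second model on port 31001
python3 -m fastchat.serve.model_worker --model-path lmsys/fastchat-t5-3b-v1.0 --port 31001 --worker-address http://localhost:31001

Each worker registers with the same controller, so the OpenAI-compatible API server can route a request to the right worker based on the model name it carries.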