Response streaming is one of the simplest yet most effective ways to improve the user experience in AI-powered applications. Instead of making users wait for the entire response to finish generating, you can stream the output token by token and display it as it’s being produced.
This is the same effect you see when using ChatGPT or Gemini, where the text appears gradually, almost as if the AI is typing in real time. It makes your app feel faster and more interactive.
In this tutorial, you’ll learn how to stream AI-generated responses in a Next.js app using the Vercel AI SDK. We’ll start with real-time text streaming, add a smooth typing effect to make it feel more natural, and then extend it to include the model’s reasoning. You’ll see how this works across various AI providers, including OpenAI, Gemini, and Anthropic. We’ll also cover how to handle edge cases, such as network interruptions, and when streaming is actually worth using.
To follow along, you’ll need:
- Node.js installed on your machine
- Basic familiarity with Next.js and React
- An API key from an AI provider such as Google Gemini or OpenAI
Gemini is recommended for this tutorial because you can quickly create a free API key without needing to enter credit card details. However, if you already have an OpenAI, Claude, or Grok key, you can easily adapt the code for those providers.
Once you have your API key ready, let’s take a quick look at what actually happens behind the scenes when an AI response is streamed.
When you send a message to a large language model (LLM), it doesn’t have to wait until the entire response is finished before sending anything back. Instead, most providers stream the output in small pieces as they’re generated, typically delivered over a protocol called Server-Sent Events (SSE).
With SSE, your backend receives a continuous stream of data events like this:
data: {"delta": "Hello"}
data: {"delta": " there"}
data: {"delta": "!"}
data: [DONE]
Each delta represents a small chunk of text generated by the model. You can collect these chunks, append them together, and forward them to your frontend in real time, so the user sees the text appear gradually instead of waiting for the full response.
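To make this concrete, here’s a rough sketch of what consuming such a stream by hand could look like. The endpoint, payload shape, and helper name are purely illustrative, and the AI SDK will handle all of this for us shortly:

// A minimal, illustrative sketch of reading an SSE-style stream and appending deltas
async function streamCompletion(prompt, onUpdate) {
  const res = await fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  let fullText = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // SSE events arrive as "data: {...}" lines; the buffer handles lines
    // that get split across network chunks.
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? "";

    for (const line of lines) {
      if (!line.startsWith("data:")) continue;
      const payload = line.slice(5).trim();
      if (payload === "[DONE]") return fullText;

      const { delta } = JSON.parse(payload);
      fullText += delta;
      onUpdate(fullText); // render the partial response immediately
    }
  }

  return fullText;
}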
While each AI provider has its own approach to handling streaming, the core pattern remains the same. The Vercel AI SDK builds on this idea and removes all the complicated parts. Instead of writing custom logic for each provider’s streaming format, it gives you a unified SDK that works everywhere. You write your streaming code once, and it runs seamlessly with OpenAI, Gemini, Anthropic, and others.
With that covered, let’s set up the project and start building our own streaming chat app.
Let’s start by creating a new Next.js application by running the following command:
npx create-next-app ai-app
During setup, ensure that you select the App Router mode and Tailwind CSS. You can select other configuration options according to your preference.
Next, install the necessary dependencies by running the command below:
npm install ai @ai-sdk/react @ai-sdk/google @ai-sdk/openai zod
This command will install the core AI SDK, its React hooks, provider adapters for Gemini and OpenAI, and Zod for validation.
Create a new .env file in your project root directory and add your Gemini or OpenAI API key as shown below:
OPENAI_API_KEY=YOUR_OPENAI_KEY
GOOGLE_GENERATIVE_AI_API_KEY=YOUR_GEMINI_KEY
To add a streaming API route, create a new api/chat/route.ts file inside the default /app directory and paste the following code into it:
import { google } from "@ai-sdk/google";
import { streamText, UIMessage, convertToModelMessages } from "ai";
export const maxDuration = 30;
export async function POST(req: Request) {
const { messages }: { messages: UIMessage[] } = await req.json();
const result = streamText({
model: google("gemini-2.5-flash"),
messages: convertToModelMessages(messages),
});
return result.toUIMessageStreamResponse();
}
This route exposes a POST endpoint that receives the chat history from the client as JSON. Inside the handler, the streamText function sends that history to the model and starts receiving chunks of the response as they’re generated. Instead of waiting for the full output, it immediately begins streaming those chunks back to the browser as an HTTP stream.
The call to toUIMessageStreamResponse() is what makes this stream compatible with the React side. It converts the raw event data received from the provider into a structured format that the frontend can understand and render in real time.
If you’re using OpenAI instead of Gemini, you need to import the OpenAI adapter and change the model in the streamText() function as shown below:
import { openai } from "@ai-sdk/openai";
const result = streamText({
model: openai("gpt-4o"), // use an OpenAI model here
...
});
Other providers like Anthropic or Grok also work the same way once you install their adapter. For more details, refer to the AI SDK documentation.
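For example, switching the same route to Anthropic’s Claude would look roughly like this after running npm install @ai-sdk/anthropic (the model ID below is just an example, so use whichever Claude model your key has access to):

import { anthropic } from "@ai-sdk/anthropic";

const result = streamText({
  model: anthropic("claude-sonnet-4-20250514"), // swap in your preferred Claude model
  messages: convertToModelMessages(messages),
});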
To proceed, open the default app/page.js file and replace its content with the following code:
"use client";
import { useChat } from "@ai-sdk/react";
import { useState, useEffect } from "react";
export default function Chat() {
const [input, setInput] = useState("");
const { messages, sendMessage } = useChat();
return (
<div className="flex flex-col w-full max-w-md py-24 mx-auto stretch">
{messages.map((message) => (
<div key={message.id} className="whitespace-pre-wrap">
{message.role === "user" ? (
<div className="text-right">You: </div>
) : (
<div className="text-left">AI: </div>
)}
{message.parts.map((part, i) => {
switch (part.type) {
case "text":
return (
<div
className={`${
message.role === "user" ? "text-right" : "text-left"
}`}
key={`${message.id}-${i}`}
>
{part.text}
</div>
);
}
})}
</div>
))}
<form
onSubmit={(e) => {
e.preventDefault();
sendMessage({ text: input });
setInput("");
}}
>
<input
className="fixed bottom-0 w-full max-w-md p-2 mb-8 border border-zinc-300 rounded shadow-xl"
value={input}
placeholder="Say something..."
onChange={(e) => setInput(e.currentTarget.value)}
/>
</form>
</div>
);
}
This code creates the chat interface and connects it to the streaming API route. The useChat() hook from @ai-sdk/react manages all the chat state. It stores messages and handles sending new ones to the /api/chat endpoint we created earlier. When the user submits a message, the hook automatically streams the model’s response back and updates the UI in real time.
With this setup done, you already have a fully functional AI chat. To see it in action, start your development server with:
npm run dev
Then open http://localhost:3000 in your browser. Type something into the input box and hit Enter. The AI’s response will start coming in as it’s generated:

The streaming is already working under the hood, but it’s not yet noticeable. In the next step, we’ll add a smooth typing effect to make the streaming feel more natural and dynamic.
To proceed, create a new file app/components/StreamingText.js and paste the code below:
"use client";
import { useState, useEffect } from "react";
export default function StreamingText({
text,
isStreaming = false,
speed = 50,
}) {
const [displayedText, setDisplayedText] = useState("");
const [currentIndex, setCurrentIndex] = useState(0);
useEffect(() => {
if (text !== displayedText + text.slice(currentIndex)) {
setDisplayedText("");
setCurrentIndex(0);
}
}, [text]);
useEffect(() => {
if (currentIndex < text.length) {
const timer = setTimeout(() => {
setDisplayedText((prev) => prev + text[currentIndex]);
setCurrentIndex((prev) => prev + 1);
}, speed);
return () => clearTimeout(timer);
}
}, [currentIndex, text, speed]);
return (
<span className="inline-block">
{displayedText}
{isStreaming && currentIndex < text.length && (
<span className="typing-cursor">|</span>
)}
</span>
);
}
This component keeps track of two things: the portion of text that’s already displayed and the current index of the next character to show. Whenever the text changes (for example, when a new message starts streaming), it resets the index and begins revealing the text one character at a time based on the defined speed value. While the AI is still generating text, it also shows a simple blinking cursor to make the typing effect more noticeable.
Next, open the default app/globals.css file and add the animation for the cursor:
@keyframes typing-cursor {
0%,
50% {
opacity: 1;
}
51%,
100% {
opacity: 0;
}
}
.typing-cursor {
animation: typing-cursor 1s infinite;
}
Now, open app/page.js and replace its content with the code below:
"use client";
import { useChat } from "@ai-sdk/react";
import { useState, useEffect } from "react";
import StreamingText from "./components/StreamingText";
export default function Chat() {
const [input, setInput] = useState("");
const { messages, sendMessage, status } = useChat();
return (
<div className="flex flex-col w-full max-w-md py-24 mx-auto stretch">
{messages.map((message) => (
<div key={message.id} className="whitespace-pre-wrap">
{message.role === "user" ? (
<div className="text-right">You: </div>
) : (
<div className="text-left">AI: </div>
)}
{message.parts.map((part, i) => {
switch (part.type) {
case "text":
return (
<div
className={`${
message.role === "user" ? "text-right" : "text-left"
}`}
key={`${message.id}-${i}`}
>
{message.role === "assistant" ? (
<StreamingText
text={part.text}
isStreaming={
status === "in_progress" &&
messages[messages.length - 1]?.id === message.id
}
/>
) : (
part.text
)}
</div>
);
}
})}
</div>
))}
<form
onSubmit={(e) => {
e.preventDefault();
sendMessage({ text: input });
setInput("");
}}
>
<input
className="fixed dark:bg-zinc-900 bottom-0 w-full max-w-md p-2 mb-8 border border-zinc-300 dark:border-zinc-800 rounded shadow-xl"
value={input}
placeholder="Say something..."
onChange={(e) => setInput(e.currentTarget.value)}
/>
</form>
</div>
);
}
Here, we’re updating the message rendering logic to use the StreamingText component for the AI’s latest response. The check:
status === "in_progress" && messages[messages.length - 1]?.id === message.id
ensures that only the most recent AI message shows the typing animation while it’s streaming. Older messages are rendered instantly, so the chat history remains fast and smooth.
Once you save your changes and reload the browser, send a new message to the AI. This time, the response will appear as a live typing animation instead of a static block of text:
Our setup keeps things clean and scalable. The typing effect is fully contained within its own component, which means you can easily reuse or disable it without touching your core logic. The streaming still occurs through the Vercel AI SDK; the <StreamingText /> component only determines how the stream appears to the user.
Now that we’ve improved the streaming experience, let’s take it a step further by showing not just what the model says, but also how it thinks.
Reasoning is one of the major advancements that make newer AI models far more capable than older ones. Instead of jumping straight to an answer, they can think through a problem and correct themselves before responding. While this reasoning shouldn’t be mixed into the final response, exposing it can be useful for debugging and for transparency into how the AI reached its conclusion.
With just a few changes, the Vercel AI SDK lets us stream the model’s reasoning alongside the response, so we can extend our chat interface to display both in real time while keeping the main response clean and user-friendly.
To proceed, open api/chat/route.ts and update the streamText() call to match the one below:
const result = streamText({
model: google("gemini-2.5-flash"),
messages: convertToModelMessages(messages),
providerOptions: {
google: {
thinkingConfig: {
thinkingBudget: 8192,
includeThoughts: true,
},
},
},
});
This update instructs the provider to generate reasoning and include it in the stream. We also specify a thinking budget, which is the number of thinking tokens the model can use when generating a response.
Next, open app/page.js and add the following state and effect to control when reasoning is expanded:
const [reasoningExpandedById, setReasoningExpandedById] = useState({});
useEffect(() => {
const lastMessage = messages[messages.length - 1];
if (!lastMessage || lastMessage.role !== "assistant") return;
const hasReasoning = Array.isArray(lastMessage.parts)
? lastMessage.parts.some((p) => p.type === "reasoning")
: false;
if (!hasReasoning) return;
setReasoningExpandedById((prev) => {
const current = !!prev[lastMessage.id];
const isStreaming =
status === "in_progress" &&
lastMessage.id === messages[messages.length - 1]?.id;
if (isStreaming && !current) {
return { ...prev, [lastMessage.id]: true };
}
if (!isStreaming && current) {
return { ...prev, [lastMessage.id]: false };
}
return prev;
});
}, [messages, status]);
This update does three things. First, it tracks a per-message toggle, allowing the user to show or hide the model’s reasoning. Second, it auto-expands reasoning while the latest assistant message is still streaming, because that is when it is most informative. Third, it collapses reasoning when the stream finishes, so the transcript stays tidy.
Finally, update the message renderer to handle the reasoning part type by adding the snippet below as a new case in the same switch statement, right after the "text" case:
case "reasoning":
return (
<div
key={`${message.id}-${i}`}
className={`${
message.role === "user" ? "text-right" : "text-left"
}`}
>
{reasoningExpandedById[message.id] ? (
<div className="mt-1">
<div className="text-sm text-zinc-500 dark:text-zinc-400">
{part.text}
</div>
<button
type="button"
className="mt-1 text-xs text-zinc-400 dark:text-zinc-500 hover:underline"
onClick={() =>
setReasoningExpandedById((prev) => ({
...prev,
[message.id]: false,
}))
}
>
Hide reasoning
</button>
</div>
) : (
<button
type="button"
className="mt-1 text-xs text-zinc-400 dark:text-zinc-500 hover:underline"
onClick={() =>
setReasoningExpandedById((prev) => ({
...prev,
[message.id]: true,
}))
}
>
Show reasoning
</button>
)}
</div>
);
This renders a small block under the assistant’s message with either a "Show reasoning" or "Hide reasoning" button. When expanded, it prints the streamed thoughts in a lighter, smaller font so they feel secondary to the final answer.
With all these updates, your final app/page.js file should resemble the one in this GitHub file.
Start the app and send a prompt. You should see the final answer stream as usual, and you can reveal the model’s reasoning while it is being generated, as shown below:

It’s normal for errors to occur while the AI is generating a response, such as a network interruption, hitting the model’s token limit, or the provider timing out. The Vercel AI SDK provides built-in error handling to deal with these cases gracefully.
On the client, you can access the error value directly from the useChat() hook. When something goes wrong, this value contains a descriptive message about what happened:
const { messages, sendMessage, status, error } = useChat();
return (
<div>
{error && (
<div className="text-sm text-red-500 mb-2">
{error.message || "Something went wrong. Please try again."}
</div>
)}
</div>
);
This provides clear feedback instead of silently failing when something unexpected occurs during the streaming process.
Streaming changes how people experience AI. It makes responses feel natural and real-time, rather than static. However, that doesn’t mean it’s a good fit for all types of AI-native applications.
Streaming AI responses, especially with the typing effect, works best when the user benefits from seeing the AI think or create in real time. For example, in chatbots or personal assistants, it keeps the conversation flowing instead of showing a loading spinner. In AI coding tools, it helps users see code as it’s being written, so they can follow along or stop early. In creative tools, such as story or blog generators, it gives the feeling of co-writing with the AI, rather than waiting for a block of text to appear.
However, streaming isn’t worth it when users only care about the final result. For example, if your app generates structured data, such as JSON or SQL, performs summarization, or fetches factual answers, streaming adds unnecessary complexity. It’s simpler and faster to show the complete response once it’s ready.
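For those non-streaming cases, the AI SDK’s generateObject helper pairs nicely with the Zod package we installed earlier. Here’s a minimal sketch of a route that waits for a complete, validated object instead of streaming tokens; the schema and prompt are purely illustrative:

import { google } from "@ai-sdk/google";
import { generateObject } from "ai";
import { z } from "zod";

export async function POST(req: Request) {
  const { text } = await req.json();

  // Generate the whole object up front and validate it against the schema
  const { object } = await generateObject({
    model: google("gemini-2.5-flash"),
    schema: z.object({
      summary: z.string(),
      sentiment: z.enum(["positive", "neutral", "negative"]),
    }),
    prompt: `Summarize and classify the sentiment of: ${text}`,
  });

  return Response.json(object);
}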
A simple way to think about it is to stream when real-time feedback improves the experience and skip it when it doesn’t add any real benefit.
In this tutorial, we explored how to stream AI responses in Next.js using the Vercel AI SDK. You learned how to add real-time streaming with a typing effect, display the model’s reasoning separately, handle errors gracefully, and decide when streaming actually makes sense.
From here, you can extend your app with features such as a toggle to turn reasoning on or off, rendering markdown as formatted HTML, or even adding support for additional AI providers. You can also find the full source code for this tutorial on GitHub.
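If you want to try the markdown idea, one minimal approach is to wrap the react-markdown package (assuming you install it with npm install react-markdown) in a small component and use it in place of the plain part.text output inside the "text" case:

"use client";
import Markdown from "react-markdown";

// Renders streamed markdown as formatted HTML as it arrives
export default function MarkdownText({ text }) {
  return <Markdown>{text}</Markdown>;
}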
Thanks for reading!
