Large Language Models (LLMs) enable fluent, natural conversations, but most applications built on top of them remain fundamentally stateless. Each interaction starts from scratch, with no durable understanding of the user beyond the current prompt. This becomes a problem quickly. A customer support bot that forgets past orders or a personal assistant that repeatedly asks for preferences delivers an experience that feels disconnected and inefficient.
This article focuses on long-term memory for LLM-powered applications and why it is essential for building stateful, context-aware AI systems. We compare two emerging open-source libraries, mem0 and Supermemory, which approach memory management in fundamentally different ways while working with the same underlying LLMs.
LLMs maintain context by conditioning each response on the messages provided in the current request. In most applications, this takes the form of a conversation history passed as an ordered array of messages, where each message includes a role (system, user, or assistant) and associated text. The model generates its next response by attending to this entire sequence.
System messages are typically used to establish the assistant’s role or high-level behavior at the start of the conversation, while user and assistant messages capture the ongoing exchange. This approach works well for short-lived, single-session interactions, where all relevant context can be included directly in the prompt.
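To make this concrete, here is a generic sketch of the kind of message array a chat completion request typically carries; it is not tied to any particular provider:

// A minimal sketch of the message array an LLM API receives on each request
const messages = [
  { role: 'system', content: 'You are a helpful cooking assistant.' },
  { role: 'user', content: 'Suggest a quick weeknight dinner.' },
  { role: 'assistant', content: 'How about a 20-minute vegetable stir fry?' },
  { role: 'user', content: 'Make it vegetarian.' },
];
// The model attends to this entire sequence when generating its next reply.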
However, conversation history alone does not scale. As interactions grow longer or relevant context exists outside the current session, passing the full message history becomes inefficient and error-prone, setting the stage for approaches like retrieval and long-term memory.
This limitation led to Retrieval-Augmented Generation (RAG), a pattern that augments an LLM with external data at inference time. Relevant documents or records are retrieved based on the user’s query and injected into the prompt as additional context, allowing the model to generate responses grounded in up-to-date or user-specific information.
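As a rough sketch of the pattern, the example below retrieves semantically similar documents and injects them into the prompt before calling the model. It uses the AI SDK's generateText call; embedQuery and vectorStore are hypothetical stand-ins for whatever embedding model and vector database an application actually uses:

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

async function answerWithRag(query) {
  // Retrieve documents semantically similar to the query
  // (embedQuery and vectorStore are hypothetical application helpers)
  const queryEmbedding = await embedQuery(query);
  const docs = await vectorStore.search(queryEmbedding, { topK: 3 });

  // Inject the retrieved text into the prompt as additional context
  const context = docs.map((doc) => doc.text).join('\n');
  const { text } = await generateText({
    model: openai('gpt-4o'),
    system: `Answer using the following context:\n${context}`,
    prompt: query,
  });

  return text;
}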
RAG helps by injecting relevant external data at inference time, but it still relies on the model to correctly interpret and carry that context forward. It also struggles with long-lived, user-specific preferences, such as tone, communication style, or dietary constraints, which is where dedicated memory systems become necessary:

RAG is fundamentally a retrieval mechanism, not a memory system. It assumes that relevant context can be searched for at query time and that retrieved documents accurately reflect the user’s current constraints and preferences. This assumption breaks down when preferences evolve gradually across interactions.
Fetching a user’s order history is relatively straightforward. You can expose an API for this data and invoke it through an LLM function call. By contrast, handling user preferences introduces a different class of problems:
The last problem is the most difficult to solve. Let's walk through an example. Here are some user interactions with an LLM over time:
October 2025
user: I love eating pistachio ice cream.
November 2025
user: I have been diagnosed with diabetes and need to avoid sugary foods.
December 2025
user: I have been recommended to follow a dairy-free diet by my nutritionist.
January 2026
user: I’m in the mood for a dessert. Any suggestions?
Given this conversation history, a naïve RAG setup using semantic search might enthusiastically suggest pistachio ice cream as a dessert option. Unfortunately, that recommendation ignores two critical constraints: the user has diabetes and is following a dairy-free diet. In other words, the retrieval worked, but the memory failed.
This gap between retrieval and memory is what systems like mem0 and Supermemory are designed to address.
At first glance, managing user memory in-house may seem straightforward. A common approach is to store user preferences as JSON in a NoSQL database and periodically run batch jobs to summarize or prune that data. In practice, this approach quickly runs into fundamental limitations:
These constraints make general-purpose storage a poor fit for long-term memory. Purpose-built memory systems address them with semantic representations, relevance scoring, and built-in pruning mechanisms.
mem0 approaches long-term memory as a collection of explicit memory items managed through APIs. Developers are responsible for deciding when to add memories, how they should be structured, and when they should be retrieved. This design emphasizes control and observability, making memory behavior easy to inspect, debug, and audit over time.
Taken together, these capabilities allow applications to build a structured, queryable memory layer on top of unstructured conversation data, while retaining fine-grained control over how memory evolves.
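As a quick illustration of this explicit style, the sketch below adds and queries memories using the addMemories and getMemories helpers from mem0's Vercel AI SDK provider, which are covered in detail later in this article. It assumes a MEM0_API_KEY environment variable is configured and that the helpers behave as they do in the full example further down:

import { addMemories, getMemories } from '@mem0/vercel-ai-provider';

const userId = 'test-user';

// Explicitly store a memory item derived from a conversation turn
await addMemories(
  [{ role: 'user', content: 'I follow a dairy-free diet.' }],
  { user_id: userId }
);

// Explicitly query the memory store when relevant context is needed
const dietMemories = await getMemories(
  'What dietary restrictions does the user have?',
  { user_id: userId }
);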
mem0 also offers an MCP integration that exposes memory operations such as adding and searching memories as callable tools. This simplifies integration with MCP-compatible clients, but it requires the application to run within an MCP environment, which can limit its use in non-MCP workflows:

Overall, mem0 is well-suited for teams that want fine-grained control over how memory is captured, structured, and retrieved.
Supermemory takes a higher-level approach to memory management by organizing long-term context around user profiles. Instead of managing individual memory items directly, developers provide a user identifier and allow Supermemory to automatically infer, update, and retrieve relevant user context.
A user profile is an automatically maintained set of facts about a user, built incrementally from their interactions over time. The profile is continuously updated and selectively injected into the LLM’s context based on the current query. This reduces the need to search for individual memories on every request, while still allowing targeted memory retrieval when needed.
Developers do not need to manually maintain user profiles. Once a user ID is provided, Supermemory automatically handles the following steps:
In our LLM chat application, this means we do not need to explicitly separate memories into static and dynamic profiles. Instead, we integrate the tools exposed by Supermemory into the AI SDK and allow Supermemory to manage memory ingestion, updating, and retrieval end to end.
Supermemory also provides us with an alternative approach to integrate long-term memory into LLM applications using the memory router.
In this approach, our API request is proxied to Supermemory instead of hitting the LLM provider directly. Supermemory then takes care of fetching the relevant user profile and memories and adding them to the context before forwarding the request to the LLM provider.
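Conceptually, the integration looks something like the sketch below: the OpenAI-compatible client is pointed at Supermemory's proxy instead of the provider. The base URL and header names here are placeholders, not real values; the exact ones come from Supermemory's documentation:

import { createOpenAI } from '@ai-sdk/openai';

// Point the OpenAI-compatible client at Supermemory's proxy instead of OpenAI.
// The baseURL and header names below are placeholders, not the real values.
const openaiViaSupermemory = createOpenAI({
  baseURL: 'https://<supermemory-router-endpoint>/v1', // placeholder proxy URL
  headers: {
    'x-supermemory-api-key': process.env.SUPERMEMORY_API_KEY, // placeholder header
    'x-supermemory-user-id': 'test-user', // placeholder: whose profile to inject
  },
});

// The rest of the application keeps calling streamText / generateText as usual,
// only with openaiViaSupermemory('gpt-4o') as the model.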
This approach is easier to integrate because it requires no changes to existing application code. However, it adds some latency, since every request passes through Supermemory before reaching the LLM provider. It also offers no manual control over which memories are added to the context; Supermemory decides that based on its own algorithms. Finally, it consumes Supermemory tokens faster, because every LLM request is proxied through Supermemory.
Similar to mem0, Supermemory also provides an MCP (Model Context Protocol) integration that can be used to add memory management to LLM applications seamlessly. Let us now look at explicit integrations, which give us finer-grained control and work whether or not we are using an MCP client.
At a high level, mem0 and Supermemory solve the same problem but optimize for different tradeoffs. mem0 exposes memory as explicit, inspectable objects and gives developers precise control over memory lifecycle and retrieval. Supermemory abstracts memory into user profiles and prioritizes automatic context injection with minimal application logic.
The choice between them depends less on model compatibility and more on how much control, transparency, and automation an application requires.
Since Vercel's AI SDK has become a standard way to build LLM applications, we will look at how to integrate both mem0 and Supermemory into a Next.js application using ai-sdk. Some code snippets are shown below, but you can find the full code in this GitHub repo.
First, let us create a Next.js app.
pnpm create next-app@latest llm-memory-app
cd llm-memory-app
Then, let us install the packages required to make ai-sdk work.
pnpm add ai @ai-sdk/react @ai-sdk/openai zod
We also need to create a .env file in the root of the project to store our OpenAI API key.
OPENAI_API_KEY=your_openai_api_key
Then, create a new index.js page for the chat UI in the pages folder and an API route to handle /chat. With these changes, we should be able to see a chat UI that responds to user messages using the OpenAI model.
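For reference, a minimal chat page might look like the sketch below. It assumes AI SDK v5's useChat hook and the default /api/chat endpoint, so adjust the endpoint and file location to match your own route setup:

'use client';

import { useChat } from '@ai-sdk/react';
import { useState } from 'react';

export default function Chat() {
  const [input, setInput] = useState('');
  const { messages, sendMessage } = useChat();

  return (
    <div>
      {/* Render each message's text parts */}
      {messages.map((message) => (
        <div key={message.id}>
          <strong>{message.role}: </strong>
          {message.parts
            .filter((part) => part.type === 'text')
            .map((part, index) => (
              <span key={index}>{part.text}</span>
            ))}
        </div>
      ))}
      {/* Send the typed text to the chat route on submit */}
      <form
        onSubmit={(event) => {
          event.preventDefault();
          sendMessage({ text: input });
          setInput('');
        }}
      >
        <input
          value={input}
          onChange={(event) => setInput(event.target.value)}
          placeholder="Say something..."
        />
      </form>
    </div>
  );
}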
With the chat application in place, let us now integrate Supermemory and mem0, starting with Supermemory. First, we need to install the Supermemory ai-sdk package.
pnpm add @supermemory/tools
Next, we need a Supermemory API key, which can be obtained from the dashboard after creating a Supermemory account and a project (which is free). Once we have the API key, we import the Supermemory tools compatible with ai-sdk and supply them to the tools parameter of the streamText call.
This is what our final /chat route looks like with Supermemory integrated:
import { streamText, convertToModelMessages, stepCountIs } from 'ai';
import { openai } from '@ai-sdk/openai';
import { supermemoryTools } from '@supermemory/tools/ai-sdk';

// Hardcoded for the demo; derive this from the logged-in user in a real app
const customerId = 'test-user';

export async function POST(req) {
  const { messages } = await req.json();

  const result = streamText({
    model: openai('gpt-4o'),
    messages: await convertToModelMessages(messages),
    stopWhen: stepCountIs(5),
    // Expose Supermemory's memory tools to the model, scoped to this user
    tools: supermemoryTools(process.env.SUPERMEMORY_API_KEY, {
      containerTags: [customerId],
    }),
  });

  return result.toUIMessageStreamResponse();
}
This integration will start to add memories to Supermemory based on the conversations happening via this chat app. It will also maintain user profiles based on these conversations. We’ve hardcoded the customerId variable to a static value. In a real-world application, this should be dynamically generated based on the logged-in user.
Once enough memories have been stored, they will automatically be retrieved and added to the context when responding to user queries. For example, here are the memories returned by the Supermemory search tool when, after several earlier conversations, the user asks for suggestions on a gift to buy for themselves:
[
  {
    "chunks": [
      {
        "content": "User's favorite color is teal.",
        "position": 0,
        "isRelevant": true,
        "score": 0.6585373093441402
      }
    ],
    "createdAt": "2026-01-16T06:51:26.585Z",
    "documentId": "3kQFHH2D1S5NnLRP75upr5",
    "metadata": {},
    "score": 0.6585373093441402,
    "title": "User's Favorite Color: Teal",
    "updatedAt": "2026-01-16T06:51:26.585Z",
    "type": "text",
    "content": "User's favorite color is teal."
  },
  {
    "chunks": [
      {
        "content": "User is a sneakerhead.",
        "position": 0,
        "isRelevant": true,
        "score": 0.6341202695226393
      }
    ],
    "createdAt": "2026-01-16T06:51:27.903Z",
    "documentId": "K3pD6jzuB75ATS1EeXfDkX",
    "metadata": {},
    "score": 0.6341202695226393,
    "title": "Sneakerhead Profile Overview",
    "updatedAt": "2026-01-16T06:51:27.903Z",
    "type": "text",
    "content": "User is a sneakerhead."
  }
]
Based on the user query, the LLM decided to call the tool to fetch relevant memories. The tool's response was added to the message array as a tool result, and from that the LLM generated the final response to the user:

With the Supermemory tools integrated, we can effectively manage long-term memory and user profiles in our chat application.
Next, let’s integrate mem0 into our chat application. This integration is a little more verbose because there is no official ai-sdk tools package for mem0 yet, but we can use mem0's function utilities to achieve the same result.
mem0 provides three utilities for integrating with ai-sdk:

- retrieveMemories: Fetches memories in text format. Its response can be supplied as the system prompt.
- addMemories: Invoked whenever the LLM decides something about the user should be remembered. It takes a list of messages that the LLM supplies, containing the parts worth saving.
- getMemories: Retrieves specific memories about a user based on a query. The LLM can invoke this function as needed to fetch relevant memories.

We will modify the ai-sdk chat route as follows to integrate mem0:
import { streamText, convertToModelMessages, tool, stepCountIs } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';
import { addMemories, retrieveMemories, getMemories } from '@mem0/vercel-ai-provider';

// Hardcoded for the demo; derive this from the logged-in user in a real app
const userId = 'test-user';

export async function POST(req) {
  const { messages } = await req.json();

  // Fetch a text summary of existing memories and use it as the system prompt
  const memories = await retrieveMemories('Fetch user details', { user_id: userId });

  const result = streamText({
    model: openai('gpt-4o'),
    messages: await convertToModelMessages(messages),
    stopWhen: stepCountIs(5),
    tools: {
      // Lets the model save new facts about the user as memories
      addMemories: tool({
        description: 'Add relevant memories for the user preferences, traits, etc.',
        inputSchema: z.object({
          user: z.string().describe('The user to add memories for'),
          messages: z.array(z.object({
            role: z.enum(['user', 'assistant', 'system']).describe('The role of the message part'),
            content: z.string().describe('The content of the message part'),
          })).describe('The relevant messages to add as memories'),
        }),
        execute: async ({ user, messages }) => await addMemories(messages, { user_id: user }),
      }),
      // Lets the model look up stored memories relevant to the current query
      getMemories: tool({
        description: 'Retrieve relevant memories about the user preferences, traits, etc.',
        inputSchema: z.object({
          user: z.string().describe('The user to retrieve memories for'),
          prompt: z.string().describe('The prompt to retrieve memories for'),
        }),
        execute: async ({ user, prompt }) => await getMemories(prompt, { user_id: user }),
      }),
    },
    system: memories,
  });

  return result.toUIMessageStreamResponse();
}
We first retrieve the memories about the user with the retrieveMemories function and supply them as the system prompt. Next, we manually wire up the addMemories and getMemories functions as tools the LLM can invoke as needed, specifying the inputs each tool expects with Zod schemas and setting up execute functions that call the corresponding mem0 functions with the required parameters.
With that in place, whenever the user says something that needs to be remembered, the LLM invokes the tool as shown below:
Once the processing has completed, the status of the memory moves out of the PENDING state.
After that, whenever the user asks something that needs to fetch relevant memories (even after several days in a completely new session), the LLM invokes the getMemories tool as shown below:

This shows that the memory about the user liking pistachio ice cream was persisted in mem0 and successfully retrieved long after it was stored. If the user later mentions a change of preference, the LLM can invoke the addMemories tool again to update the memory in mem0.
Long-term memory is quickly becoming a baseline requirement for serious LLM applications. As we’ve seen, techniques like RAG help with retrieval, but they struggle to capture evolving, user-specific context, and rolling your own memory system introduces a lot of hidden complexity. Tools like mem0 and Supermemory offer practical ways to solve this today, each with a different philosophy around control and automation. As AI apps move beyond one-off chats and toward long-lived user relationships, choosing the right memory layer will increasingly shape how useful and reliable those experiences feel.
