Large language models are stateless: once a request returns, the model forgets everything. To build an agent that remembers who a user is and what happened in past conversations, you need to store that context yourself and feed it back into the prompt. In this tutorial we build a small but complete agent memory layer on Upstash Redis, with two tiers:
  • Working memory: the running conversation for the current session, stored in a single Redis key with a TTL so it expires on its own.
  • Long-term memory: durable facts about the user (preferences, events, decisions) stored as JSON documents and recalled with Redis Search full-text queries.
On every turn the agent recalls relevant long-term memories, answers using those plus the recent conversation, then remembers any new facts worth keeping.
This tutorial uses OpenAI for the chat and fact-extraction calls, but the memory layer itself is model-agnostic, so swap in any LLM you like.

Prerequisites

  • An Upstash Redis database (the REST URL and token).
  • An OpenAI API key.
Install the dependencies:
npm install @upstash/redis openai
Set your environment variables:
UPSTASH_REDIS_REST_URL="https://..."
UPSTASH_REDIS_REST_TOKEN="..."
OPENAI_API_KEY="sk-..."
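Since every later step depends on these three variables, it can help to fail fast when one is missing. The following is a minimal sketch, not part of the Upstash or OpenAI SDKs: missingEnv and REQUIRED_ENV are hypothetical names, and the function takes the environment as a plain object so it can be checked without touching the real process environment.

```typescript
// Hypothetical list of the variables this tutorial relies on.
const REQUIRED_ENV = [
  "UPSTASH_REDIS_REST_URL",
  "UPSTASH_REDIS_REST_TOKEN",
  "OPENAI_API_KEY",
] as const;

// Hypothetical helper: returns the names of required variables that are unset or blank.
function missingEnv(
  env: Record<string, string | undefined>,
  required: readonly string[] = REQUIRED_ENV,
): string[] {
  return required.filter((name) => !env[name]?.trim());
}
```

In a Node.js script you would typically call missingEnv(process.env) at startup and throw if the returned list is not empty.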

Step 1: Create the long-term memory index

Long-term memories are JSON documents stored under the memory: prefix. We index the text field for full-text recall, and keep userId and kind as exact-match keywords so we can scope a search to a single user. createdAt is a sortable number we can use to favor recent memories. Create the index once (e.g. in a setup script), not on every request.
// setup.ts
import { Redis, s } from "@upstash/redis";

const redis = Redis.fromEnv();

try {
  await redis.search.createIndex({
    name: "memories",
    dataType: "json",
    prefix: "memory:",
    schema: s.object({
      text: s.string(),        // full-text searchable fact
      userId: s.keyword(),     // exact-match owner
      kind: s.keyword(),       // "preference" | "event" | "fact" ...
      createdAt: s.number(),   // epoch ms, sortable
    }),
  });
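} catch (err) {
  // Expected when the index already exists and setup is re-run.
  // Log it anyway so unrelated failures (bad credentials, network) stay visible.
  console.warn("createIndex skipped:", err);
}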
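To keep application code in step with the schema, it may be useful to mirror the indexed fields in a TypeScript type and validate documents before writing them. This is a sketch under assumptions: MemoryDoc and isMemoryDoc are hypothetical names, not SDK exports, and the field list is simply copied from the schema above.

```typescript
// Assumed document shape, mirroring the fields declared in the index schema.
type MemoryDoc = {
  text: string;
  userId: string;
  kind: string;
  createdAt: number; // epoch ms
};

// Hypothetical guard: true only when every indexed field has the expected type.
function isMemoryDoc(value: unknown): value is MemoryDoc {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.text === "string" &&
    typeof v.userId === "string" &&
    typeof v.kind === "string" &&
    typeof v.createdAt === "number" &&
    Number.isFinite(v.createdAt)
  );
}
```

A document that fails the guard (for example, createdAt stored as a string) is better rejected before it is written than discovered later through a search that silently misses it.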

Step 2: Working (short-term) memory

Working memory is just the recent message history for a session. We store it as a single JSON value with a one-hour TTL and cap it to the last 20 messages so the prompt stays small. When the session goes quiet, Redis expires the key for us.
// memory.ts
import { Redis } from "@upstash/redis";

const redis = Redis.fromEnv();

export type Message = { role: "user" | "assistant"; content: string };

const SESSION_TTL = 60 * 60; // 1 hour
const MAX_MESSAGES = 20;

export async function loadHistory(sessionId: string): Promise<Message[]> {
  return (await redis.get<Message[]>(`chat:${sessionId}`)) ?? [];
}

export async function saveHistory(sessionId: string, messages: Message[]) {
  const trimmed = messages.slice(-MAX_MESSAGES);
  await redis.set(`chat:${sessionId}`, trimmed, { ex: SESSION_TTL });
}
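The cap is easiest to see in isolation. The sketch below reproduces only the trimming rule from saveHistory, with no Redis involved; trimHistory is a hypothetical name introduced for illustration.

```typescript
type Message = { role: "user" | "assistant"; content: string };

// Same trimming rule as saveHistory: keep only the newest `max` messages.
function trimHistory(messages: Message[], max = 20): Message[] {
  return messages.slice(-max);
}

// 25 messages alternating user/assistant, numbered 0..24.
const history: Message[] = Array.from({ length: 25 }, (_, i): Message => ({
  role: i % 2 === 0 ? "user" : "assistant",
  content: `message ${i}`,
}));

const trimmed = trimHistory(history);
console.log(trimmed.length);     // 20
console.log(trimmed[0].content); // "message 5"
```

The five oldest messages are dropped and the order of the rest is preserved, so the model always sees the most recent part of the conversation.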

Step 3: Recall relevant memories

To answer well, the agent needs the long-term facts that relate to the current message. We run a full-text query against the memories index, scoped to the user with the userId keyword. Redis Search ranks matches by relevance, so we take the top few.
const memories = redis.search.index({ name: "memories" });

export async function recall(
  userId: string,
  query: string,
  limit = 5,
): Promise<string[]> {
  const results = await memories.query({
    filter: { text: query, userId },
    limit,
  });

  // No memories yet → the index may not exist → results is null
  return (results ?? []).map((r) => r.data.text as string);
}
To bias recall toward recent memories, you can boost the score with the createdAt field using a score function, or sort with orderBy. We keep plain relevance ranking here for simplicity.
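If you would rather experiment with recency weighting before touching the query, one option is to re-rank hits on the client. The sketch below is not the server-side score function mentioned above; it is a hypothetical helper that assumes each hit carries a relevance score and the stored createdAt, and it applies an exponential decay with a half-life you choose.

```typescript
// Assumed hit shape: a relevance score plus the stored createdAt (epoch ms).
type Hit = { text: string; score: number; createdAt: number };

const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical re-ranker: halves a hit's score for every `halfLifeDays` of age.
function rerankByRecency(hits: Hit[], now: number, halfLifeDays = 30): Hit[] {
  const weighted = (h: Hit): number => {
    const ageDays = Math.max(0, now - h.createdAt) / DAY_MS;
    return h.score * Math.pow(0.5, ageDays / halfLifeDays);
  };
  // Copy before sorting so the caller's array is left untouched.
  return [...hits].sort((a, b) => weighted(b) - weighted(a));
}
```

With a 30-day half-life, a 60-day-old memory scoring 10 is weighted down to 2.5 and ranks below a fresh memory scoring 4. For this to be useful you would fetch more hits than you need and keep the top few after re-ranking.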

Step 4: Remember new facts

After each exchange we ask the model to pull out durable facts: things worth remembering across sessions, not small talk. Each fact becomes a JSON document under the memory: prefix, so the index picks it up automatically. Because full-text search gives us a cheap similarity check, we deduplicate before writing: if a very similar memory already exists for this user, we skip it.
import OpenAI from "openai";

const openai = new OpenAI();

// Heuristic: full-text scores are unbounded, so this threshold is tuned by feel.
const DEDUPE_SCORE = 8;

async function alreadyKnown(userId: string, text: string): Promise<boolean> {
  const hits = await memories.query({ filter: { text, userId }, limit: 1 });
  return !!hits?.length && hits[0].score > DEDUPE_SCORE;
}

export async function remember(userId: string, conversation: Message[]) {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          "Extract durable facts about the user worth remembering across " +
          "sessions (preferences, decisions, personal details). Ignore " +
          'small talk. Respond as JSON: {"facts": ["..."]}. Empty if none.',
      },
      { role: "user", content: JSON.stringify(conversation) },
    ],
  });

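  const parsed = JSON.parse(completion.choices[0].message.content ?? '{"facts":[]}');
  // The model may omit "facts" or return a non-array; fall back to an empty list.
  const facts: string[] = Array.isArray(parsed?.facts) ? parsed.facts : [];

  for (const text of facts) {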
    if (await alreadyKnown(userId, text)) continue;
    const id = crypto.randomUUID();
    await redis.json.set(`memory:${userId}:${id}`, "$", {
      text,
      userId,
      kind: "fact",
      createdAt: Date.now(),
    });
  }
}
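The model's reply is untrusted input, so it is worth parsing defensively before anything is written to Redis. The following is a sketch of one way to do that; parseFacts is a hypothetical helper, not part of the OpenAI SDK, and it assumes the {"facts": [...]} shape requested in the system prompt above.

```typescript
// Hypothetical helper: turns the raw model output into a clean list of facts.
function parseFacts(raw: string | null | undefined): string[] {
  if (!raw) return [];
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return []; // not valid JSON
  }
  const facts = (parsed as { facts?: unknown } | null)?.facts;
  if (!Array.isArray(facts)) return [];
  const cleaned = facts
    .filter((f): f is string => typeof f === "string")
    .map((f) => f.trim())
    .filter((f) => f.length > 0);
  // Drop exact duplicates within a single batch.
  return Array.from(new Set(cleaned));
}
```

Removing exact duplicates within a batch also complements the search-based check: two identical facts extracted in the same turn are collapsed before either is written.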

Step 5: The chat loop

Now we wire it together. Each turn: recall relevant memories, build a prompt from those plus the working memory, call the model, persist the updated history, and remember new facts.
export async function chat(userId: string, sessionId: string, input: string) {
  const [history, recalled] = await Promise.all([
    loadHistory(sessionId),
    recall(userId, input),
  ]);

  const system =
    "You are a helpful assistant. Use the following remembered facts about " +
    `the user when relevant:\n${recalled.map((m) => `- ${m}`).join("\n") || "(none yet)"}`;

  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: system },
      ...history,
      { role: "user", content: input },
    ],
  });

  const reply = completion.choices[0].message.content ?? "";

  const updated: Message[] = [
    ...history,
    { role: "user", content: input },
    { role: "assistant", content: reply },
  ];

  await saveHistory(sessionId, updated);
  await remember(userId, updated); // in production, run this in the background so it doesn't delay the reply

  return reply;
}
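The prompt assembly inside chat can be pulled out into a pure function, which makes it easy to inspect exactly what the model receives. This is a sketch that mirrors the logic above; buildMessages and PromptMessage are hypothetical names introduced for illustration.

```typescript
type Message = { role: "user" | "assistant"; content: string };
type PromptMessage = { role: "system" | "user" | "assistant"; content: string };

// Assembles the model input in the same order as chat():
// remembered facts, then working memory, then the new user turn.
function buildMessages(
  recalled: string[],
  history: Message[],
  input: string,
): PromptMessage[] {
  const facts = recalled.map((m) => `- ${m}`).join("\n") || "(none yet)";
  const system =
    "You are a helpful assistant. Use the following remembered facts about " +
    `the user when relevant:\n${facts}`;
  return [
    { role: "system", content: system },
    ...history,
    { role: "user", content: input },
  ];
}
```

With no recalled facts the system message ends in "(none yet)", so the model is told explicitly that nothing is known rather than being handed an empty list.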

Try it

Run two sessions for the same user. Even after the first session’s working memory expires, the facts learned there are recalled in the second:
await chat("user-1", "session-a", "I'm vegetarian and I love spicy food.");

// Redis Search indexes writes asynchronously, wait so the demo is deterministic.
await memories.waitIndexing();

// ...a brand new session...
const reply = await chat("user-1", "session-b", "Suggest a dinner for me.");
console.log(reply); // recalls "vegetarian" + "spicy" from long-term memory
Redis Search indexes writes asynchronously: a JSON.SET returns before the document is searchable. For a deterministic demo or test, call waitIndexing() to block until pending updates are applied. In a real app, the next user turn normally arrives after indexing has finished, so an explicit wait isn’t needed.

How it fits together

  • Working memory lives under chat:{sessionId} with a TTL: fast to read, self-expiring, scoped to one conversation.
  • Long-term memory lives under memory:{userId}:{id} and is searchable across sessions through the memories index.
  • Recall uses full-text relevance to surface the facts that matter for the current message; remember extracts and deduplicates new ones.
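Because both tiers depend on key layout, it can help to build keys in one place. The helpers below are hypothetical and only restate the two patterns listed above; userIdFromMemoryKey additionally assumes that neither the user ID nor the memory ID contains a colon, which holds for the UUIDs generated in Step 4.

```typescript
// Hypothetical helpers that centralize the two key layouts used in this tutorial.
const chatKey = (sessionId: string): string => `chat:${sessionId}`;
const memoryKey = (userId: string, id: string): string => `memory:${userId}:${id}`;

// Recovers the owner from a memory key; returns null for any other key.
// Assumes userId and id contain no ":" characters.
function userIdFromMemoryKey(key: string): string | null {
  const match = /^memory:([^:]+):[^:]+$/.exec(key);
  return match ? match[1] : null;
}
```

Keeping the memory: prefix in a single helper also makes it harder to write a document that the index, which only watches that prefix, would never see.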

Next steps

  • Have the extractor label each fact with a kind such as "preference" or "event" (the code above always writes "fact"), then filter recall by it.
  • Boost recent memories with a score function.
  • Summarize older working-memory messages instead of dropping them.
  • Stream the reply to a chat UI and animate it smoothly. See Smooth Text Streaming in AI SDK v5.
  • Learn more about what Redis Search can do in the Search docs.
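As a starting point for the summarization idea above, the first step is deciding which messages to summarize and which to keep verbatim. The sketch below covers only that split; splitForSummary is a hypothetical helper, and the summarization call itself (for example, another model request over the older messages) is left out.

```typescript
type Message = { role: "user" | "assistant"; content: string };

// Hypothetical helper: separates messages that would otherwise be dropped
// (`older`) from the ones kept verbatim in working memory (`recent`).
function splitForSummary(
  messages: Message[],
  keep = 20,
): { older: Message[]; recent: Message[] } {
  if (messages.length <= keep) return { older: [], recent: messages };
  const cut = messages.length - keep;
  return { older: messages.slice(0, cut), recent: messages.slice(cut) };
}
```

The older messages could then be condensed into a single summary and stored alongside the recent ones, so context survives the cap instead of being discarded.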