
AI SDK Powered by Upstash

Ali Tarık Şahin, Software Engineer @Upstash

AI SDK v5 is out, and with this new release it has already taken its place among the top three most popular AI libraries.

Personally, I am not an AI expert; I just use these models to realize the ideas that come to my mind from time to time, so I don't want to waste time on the hassle of configuration, settings, and so on.

That's why the more abstraction a library provides, the more useful and production-ready tools we get. In that sense, the AI SDK team did a great job standardizing interaction with AI models, and the newest release improves the developer experience even further.

Speaking of abstractions and developer experience, Upstash products follow the same philosophy: they hide complexity from developers behind handy SDKs.

In this blog, we're gonna go through how Upstash products go hand in hand with the AI SDK, and how to get the most out of it by using them together.

Let's start!


1. Cache Everything (Upstash Redis)

The simplest win? Stop paying for the same response twice. AI SDK v5's lifecycle callbacks make caching straightforward, and they work perfectly with Upstash Redis.

Working with LLM calls usually means having some slow, expensive snippets in your code, and depending on what your application does, the latency can become unbearable.

That's where caching comes into play: it mitigates the cost of repeated calls, and with a couple of extra lines Upstash Redis handles the caching for you, saving you from unnecessary calls as your project scales up.

Here is an example from the AI SDK docs that caches the LLM response for one hour:

app/api/chat/route.ts
import { openai } from "@ai-sdk/openai";
import { Redis } from "@upstash/redis";
import { convertToModelMessages, formatDataStreamPart, streamText } from "ai";
 
export const maxDuration = 30;
 
const redis = new Redis({
  url: process.env.UPSTASH_REDIS_REST_URL!,
  token: process.env.UPSTASH_REDIS_REST_TOKEN!,
});
 
export async function POST(req: Request) {
  const { messages } = await req.json();
 
  // Simple cache key - hash this for production
  const key = `chat:${JSON.stringify(messages)}`;
 
  // Check cache first
  const cached = await redis.get<string>(key);
  if (cached) {
    // Return cached response as a stream
    return new Response(formatDataStreamPart("text", cached), {
      headers: { "Content-Type": "text/plain" },
    });
  }
 
  // Not cached - stream from provider and save result
  const result = streamText({
    model: openai("gpt-4o"),
    messages: convertToModelMessages(messages),
    async onFinish({ text }) {
      await redis.set(key, text, { ex: 3600 }); // 1 hour TTL
    },
  });
 
  return result.toUIMessageStreamResponse();
}

Pro tip: Use smarter cache keys. Hash the messages, include the model name, and remember that Redis offers far more than set and get.
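
For instance, a minimal sketch of such a key helper (the cacheKey name is hypothetical, and it assumes the Node.js crypto module is available in your runtime):

import { createHash } from "crypto";
 
// Derive a short, stable cache key from the model name and the full
// message history instead of embedding raw JSON in the key itself.
function cacheKey(model: string, messages: unknown[]): string {
  const digest = createHash("sha256")
    .update(JSON.stringify({ model, messages }))
    .digest("hex");
  return `chat:${model}:${digest}`;
}
 
// Inside the route above: const key = cacheKey("gpt-4o", messages);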


2. Rate Limiting (Upstash Ratelimit)

Nothing kills an AI app faster than uncontrolled traffic burning through your quota and your budget. Add protection before that happens.

Nowadays, rate limiting is a must-have in case your app exceeds your expectations and goes viral. It is simple to implement and gives you control over your API routes.

Plus, Upstash Ratelimit has more features than plain rate limiting. Here are some examples, with a configuration sketch right after this list:

  • Caching: an in-memory ephemeral cache blocks identifiers that have already exceeded their limit on the spot, without extra Redis calls
  • Timeout: if Redis does not respond in time, requests are allowed through so a network issue never blocks your users
  • Analytics & Dashboard: monitor your endpoint and see who gets rate limited
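
A minimal configuration sketch showing how these options can be turned on (the limit and timeout values here are illustrative):

import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";
 
const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(10, "10 s"),
  // In-memory cache: identifiers that already exceeded their limit are
  // blocked on the spot, without an extra Redis round trip.
  ephemeralCache: new Map(),
  // If Redis does not respond within 1000 ms, let the request through
  // instead of blocking users on a network hiccup.
  timeout: 1000,
  // Collect usage analytics you can inspect in the Upstash console.
  analytics: true,
});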

And here is an example, adapted from the AI SDK docs, that showcases Upstash Ratelimit usage:

app/api/generate/route.ts
import { openai } from "@ai-sdk/openai";
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";
import { streamText } from "ai";
import { NextRequest } from "next/server";
 
// Allow streaming responses up to 30 seconds
export const maxDuration = 30;
 
// Create the rate limiter: 5 requests per 30-second window per identifier
const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.fixedWindow(5, "30s"),
});
 
export async function POST(req: NextRequest) {
  // Identify the caller by IP (fall back to a constant when unavailable)
  const ip = req.headers.get("x-forwarded-for") ?? "anonymous";
  const { success } = await ratelimit.limit(ip);
 
  // block the request if unsuccessful
  if (!success) {
    return new Response("Ratelimited!", { status: 429 });
  }
 
  const { messages } = await req.json();
 
  const result = streamText({
    model: openai("gpt-3.5-turbo"),
    messages,
  });
 
  return result.toUIMessageStreamResponse();
}


3. Knowledge Base Search (Upstash Search)

Grant your LLM models access to any knowledge base you provide.

Providing a search component as a tool gives your model the flexibility to use everything Upstash Search offers. Thanks to its semantic search capability, the model can fetch the most relevant results and ground its answers in your data instead of hallucinating.

Here's an example use case where a chatbot learns and remembers:

app/api/chat/route.ts
import { openai } from "@ai-sdk/openai";
import { Search } from "@upstash/search";
import {
  convertToModelMessages,
  stepCountIs,
  streamText,
  tool,
  UIMessage,
} from "ai";
import { z } from "zod";
 
const client = Search.fromEnv();
const index = client.index("knowledge-base");
 
export async function POST(req: Request) {
  const { messages }: { messages: UIMessage[] } = await req.json();
 
  const result = streamText({
    model: openai("gpt-4o"),
    messages: convertToModelMessages(messages),
    stopWhen: stepCountIs(5),
    tools: {
      addResource: tool({
        description: "add a resource to your knowledge base",
        inputSchema: z.object({
          resource: z
            .string()
            .describe("the content or resource to add to the knowledge base"),
        }),
        execute: async ({ resource }) => {
          await index.upsert({
            id: crypto.randomUUID(),
            content: { resource },
          });
          return `Added to knowledge base: ${resource}`;
        },
      }),
      getInformation: tool({
        description:
          "get information from your knowledge base to answer questions",
        inputSchema: z.object({
          query: z.string().describe("the user's question"),
        }),
        execute: async ({ query }) => index.search({ query }),
      }),
    },
  });
 
  return result.toUIMessageStreamResponse();
}

The magic is that the AI decides when to use these tools automatically. User mentions a new concept? It stores it. Asks about something from last week? It searches and finds it.

Pro tip: Upstash Search scales to massive datasets (they've indexed all of Wikipedia in 7 languages), so don't worry about hitting limits. Start small, let it grow.
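
If you already have content, you don't have to wait for the addResource tool to fill the index; you can seed it up front. A minimal sketch reusing the same index as the route above (the file name and sample documents are illustrative):

scripts/seed-knowledge-base.ts
import { Search } from "@upstash/search";
 
const client = Search.fromEnv();
const index = client.index("knowledge-base");
 
async function seed() {
  // Illustrative documents; in practice these might be docs, FAQs, or notes.
  const docs = [
    "Upstash Redis is a serverless, Redis-compatible database.",
    "AI SDK v5 standardizes streaming, tool calling, and UI messages.",
  ];
 
  for (const resource of docs) {
    await index.upsert({
      id: crypto.randomUUID(),
      content: { resource },
    });
  }
 
  // Semantic search returns the most relevant entries for a natural-language query.
  console.log(await index.search({ query: "What is Upstash Redis?" }));
}
 
seed();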


4. Resumable Streams (Upstash Workflow & Redis)

Ever had your AI chat break mid-conversation because of a network hiccup or page refresh? Traditional LLM streams die when connections drop, forcing users to restart expensive generations.

The solution? Build streams that survive anything - network outages, page refreshes, even closing your laptop. This creates an incredible user experience where conversations continue seamlessly no matter what happens.

Here's how to build truly durable LLM streams using Upstash Workflow, Upstash Redis, and the AI SDK:

app/api/llm-stream/route.ts
import {
  MessageType,
  StreamStatus,
  type ChunkMessage,
  type MetadataMessage,
} from "@/lib/message-schema";
import { redis } from "@/utils";
import { openai } from "@ai-sdk/openai";
import { serve } from "@upstash/workflow/nextjs";
import { streamText } from "ai";
 
interface LLMStreamResponse {
  success: boolean;
  sessionId: string;
  totalChunks: number;
  fullContent: string;
}
 
export const { POST } = serve(async (context) => {
  const { prompt, sessionId } = context.requestPayload as {
    prompt?: string;
    sessionId?: string;
  };
 
  if (!prompt || !sessionId) {
    throw new Error("Prompt and sessionId are required");
  }
 
  const streamKey = `llm:stream:${sessionId}`;
 
  await context.run("mark-stream-start", async () => {
    const metadataMessage: MetadataMessage = {
      type: MessageType.METADATA,
      status: StreamStatus.STARTED,
      completedAt: new Date().toISOString(),
      totalChunks: 0,
      fullContent: "",
    };
 
    await redis.xadd(streamKey, "*", metadataMessage);
    await redis.publish(streamKey, { type: MessageType.METADATA });
  });
 
  const res = await context.run("generate-llm-response", async () => {
    const result = await new Promise<LLMStreamResponse>(
      async (resolve, reject) => {
        let fullContent = "";
        let chunkIndex = 0;
 
        const { textStream } = streamText({
          model: openai("gpt-4o"),
          prompt,
          onError: (err) => reject(err),
          onFinish: async () => {
            resolve({
              success: true,
              sessionId,
              totalChunks: chunkIndex,
              fullContent,
            });
          },
        });
 
        for await (const chunk of textStream) {
          if (chunk) {
            fullContent += chunk;
            chunkIndex++;
 
            const chunkMessage: ChunkMessage = {
              type: MessageType.CHUNK,
              content: chunk,
            };
 
            await redis.xadd(streamKey, "*", chunkMessage);
            await redis.publish(streamKey, { type: MessageType.CHUNK });
          }
        }
      },
    );
 
    return result;
  });
 
  await context.run("mark-stream-end", async () => {
    const metadataMessage: MetadataMessage = {
      type: MessageType.METADATA,
      status: StreamStatus.COMPLETED,
      completedAt: new Date().toISOString(),
      totalChunks: res.totalChunks,
      fullContent: res.fullContent,
    };
 
    await redis.xadd(streamKey, "*", metadataMessage);
    await redis.publish(streamKey, { type: MessageType.METADATA });
  });
});
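
The workflow above only produces the stream; a separate endpoint can replay it so a client that refreshed the page picks up where it left off. A minimal sketch of the read side, assuming the same redis helper and llm:stream:<sessionId> key convention (the route path and response shape are illustrative):

app/api/llm-stream/[sessionId]/route.ts
import { redis } from "@/utils";
 
export async function GET(
  _req: Request,
  { params }: { params: { sessionId: string } },
) {
  const streamKey = `llm:stream:${params.sessionId}`;
 
  // Replay everything persisted so far: the metadata entries and every
  // chunk written by the workflow. A reconnecting client can rebuild the
  // response from these entries and then subscribe for new ones.
  const entries = await redis.xrange(streamKey, "-", "+");
 
  return Response.json(entries);
}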

What else can be achieved with Workflow? As you know, serverless platforms like Vercel, Cloudflare, and others come with limitations: you cannot run a task or function forever, right? You have to stop somewhere.

Upstash Workflow gives you a way around those limits for long-running tasks such as these LLM calls. Your application might pipeline several calls, and running them all inside a single invocation can hit those serverless limits. Workflow sidesteps the issue by running each step as a separate invocation while keeping the integration smooth.

It feels like building your API as usual, trusting Workflow to handle the rest. Check out the Agents API to see how Upstash Workflow and AI agents can work in harmony.
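
To make that concrete, here is a rough sketch with hypothetical steps: each context.run call is delivered as its own invocation, so the pipeline as a whole is not bound by a single function's execution limit (the helper functions are stubs for illustration):

app/api/pipeline/route.ts
import { serve } from "@upstash/workflow/nextjs";
 
// Hypothetical helpers, stubbed out for the sketch.
const draftOutline = async (topic: string) => `outline for ${topic}`;
const writeDraft = async (outline: string) => `draft based on ${outline}`;
 
export const { POST } = serve(async (context) => {
  const { topic } = context.requestPayload as { topic: string };
 
  // Each step runs in its own request; results are persisted between steps.
  const outline = await context.run("draft-outline", () => draftOutline(topic));
  const draft = await context.run("write-draft", () => writeDraft(outline));
 
  await context.run("log-draft", async () => {
    console.log(draft);
  });
});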


Wrapping Up

Building AI apps is the easy part. Making them production-ready? That's where most people get stuck.

AI SDK v5 handles the AI complexity—streaming, tool calling, multi-model support. Upstash handles the infrastructure complexity—caching, persistence, search, and workflows.

Together, they let you focus on what you want to build.

The result? AI apps that are fast, reliable, cost-effective, and actually work when real users hit them. No PhD in distributed systems required.

Start simple, add what you need, ship faster.


Further Reading

Want to dive deeper? Here are some resources to level up your AI app:
