
Orchestrating Reliable Agents on Upstash Workflow

Cahid Arda Oz, Software Engineer @Upstash
https://upstash.com/blog/reliable-agents-subagents

An AI agent is a loop. It calls a model, runs a tool, feeds the result back, and calls the model again. The loop is quick to prototype and awkward to run reliably on serverless, and it gets more expensive the longer it runs. This post covers how Upstash Workflow makes that loop durable, and how subagents keep it from getting expensive.
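To make the loop concrete, here is a minimal sketch in plain TypeScript. Everything in it is a stand-in: fakeModel, the tools map, and runAgent are hypothetical names for illustration, not part of any SDK, and the "model" is a function that asks for one tool call and then answers.

```typescript
// Toy agent loop: call the model, run the requested tool, feed the result back.
type Message = { role: "user" | "assistant" | "tool"; content: string };
type ModelReply =
  | { kind: "tool"; tool: string; input: string }
  | { kind: "final"; text: string };

// Hypothetical stand-in for a model call: requests one tool call, then answers.
function fakeModel(messages: Message[]): ModelReply {
  const question = messages.find((m) => m.role === "user")?.content ?? "";
  const toolResult = messages.find((m) => m.role === "tool");
  return toolResult
    ? { kind: "final", text: `Answer based on: ${toolResult.content}` }
    : { kind: "tool", tool: "search", input: question };
}

// Hypothetical tools keyed by name.
const tools: Record<string, (input: string) => string> = {
  search: (query) => `results for "${query}"`,
};

function runAgent(prompt: string, maxSteps = 5): { answer: string; messages: Message[] } {
  const messages: Message[] = [{ role: "user", content: prompt }];
  for (let step = 0; step < maxSteps; step++) {
    const reply = fakeModel(messages); // the whole history goes in on every call
    if (reply.kind === "final") {
      messages.push({ role: "assistant", content: reply.text });
      return { answer: reply.text, messages };
    }
    const tool = tools[reply.tool];
    if (!tool) throw new Error(`unknown tool: ${reply.tool}`);
    messages.push({ role: "assistant", content: `call ${reply.tool}` });
    messages.push({ role: "tool", content: tool(reply.input) });
  }
  throw new Error("agent did not finish within maxSteps");
}

const agentRun = runAgent("why is the sky blue");
```

Note that the history passed to the model grows on every iteration, which is the cost discussed later in this post.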

There is a working demo behind everything below: examples/agent-workflows in the workflow-js repo.

Why Upstash Workflow for agents

A multi-agent run has two failure modes on serverless. First, it can take longer than a function is allowed to run: a multi-step agent often needs minutes, while function time limits are often measured in seconds. Second, any single step can fail on a transient provider error or a rate limit, and without checkpoints that one failure loses the whole run.

Upstash Workflow turns the loop into a set of checkpointed steps that QStash orchestrates. Each step you wrap survives across invocations, so the loop is never held open inside one function call:

  • context.run executes a piece of work once and stores its result. On a retry or resume it replays the stored value instead of running again.
  • context.call hands an HTTP request, such as the call to your model provider, to QStash. Your function returns right away. QStash makes the request, applies retries, timeouts, and flow control, then calls you back with the response. The model call stops counting against your function's runtime, and rate limits become a setting rather than an incident.
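The replay behaviour of context.run can be illustrated with a toy model. This is not Upstash's implementation: ToyContext is a hypothetical class, a Map stands in for the durable step store, and everything is synchronous. It only shows the idea that a completed step is not executed again when the handler is re-invoked. context.call is not modeled here.

```typescript
// Toy model of checkpointed steps. A Map stands in for the durable store.
class ToyContext {
  private store: Map<string, unknown>;

  constructor(store: Map<string, unknown>) {
    this.store = store;
  }

  // Run `fn` once per step name; later invocations replay the stored result.
  run<T>(stepName: string, fn: () => T): T {
    if (this.store.has(stepName)) {
      return this.store.get(stepName) as T;
    }
    const result = fn();
    this.store.set(stepName, result);
    return result;
  }
}

let modelCalls = 0;
let postProcessAttempts = 0;

// Hypothetical handler: the second step fails on its first attempt.
function handler(context: ToyContext): string {
  const draft = context.run("call-model", () => {
    modelCalls++;
    return "draft answer";
  });
  return context.run("post-process", () => {
    postProcessAttempts++;
    if (postProcessAttempts === 1) throw new Error("transient failure");
    return draft.toUpperCase();
  });
}

const stepStore = new Map<string, unknown>();
let finalOutput = "";
try {
  finalOutput = handler(new ToyContext(stepStore)); // first invocation fails midway
} catch {
  finalOutput = handler(new ToyContext(stepStore)); // retry replays "call-model"
}
```

The retry re-enters the handler from the top, but the "call-model" step returns its stored value, so the expensive work runs only once.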

So you get agents that resume after failures, avoid function timeouts, and stay within provider rate limits, without writing that logic yourself.

@upstash/workflow-agents

You do not have to wire those primitives into an agent by hand. @upstash/workflow-agents takes the AI SDK and makes it durable.

The mechanism is small. The package keeps the AI SDK's generateText loop, but it overrides the model's fetch so each model request goes through context.call, and it wraps each tool's execute in context.run. Your model calls and tool executions become durable steps, and you still write the agent the way you would with the AI SDK: a model, a system prompt, and a set of tools.
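The tool half of that mechanism can be sketched in a few lines. This is an illustration under assumptions, not the package's source: StepContext, Tool, and makeToolDurable are hypothetical names, the types are simplified, and execution is synchronous for brevity (real tools are async). The fetch override is not shown.

```typescript
// Minimal stand-ins for a workflow context and an AI SDK-style tool.
interface StepContext {
  run<T>(stepName: string, fn: () => T): T;
}
interface Tool {
  description: string;
  execute: (input: string) => string;
}

// Wrap a tool so each execute call runs as a named step on the context.
// The step name includes a call counter so repeated calls get distinct names.
function makeToolDurable(context: StepContext, toolName: string, tool: Tool): Tool {
  let callCount = 0;
  return {
    ...tool,
    execute: (input) =>
      context.run(`tool:${toolName}:${callCount++}`, () => tool.execute(input)),
  };
}

// A context that only records which steps ran, to make the wrapping visible.
const executedSteps: string[] = [];
const recordingContext: StepContext = {
  run<T>(stepName: string, fn: () => T): T {
    executedSteps.push(stepName);
    return fn();
  },
};

const weatherTool = makeToolDurable(recordingContext, "weather", {
  description: "Looks up the weather for a city.",
  execute: (city) => `sunny in ${city}`,
});
const weatherReport = weatherTool.execute("Istanbul");
```

The agent code still calls execute as usual. The only change is that the call now passes through a step boundary.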

The thread keeps growing

Several users reported the same problem. An agent's state is its message history, and that history grows with every turn. Because each model call replays the whole conversation, two costs climb together as the agent works.

The first is bandwidth. Each durable step carries the accumulated messages through QStash, so a longer history means larger workflow payloads on every step. The second is tokens. Every model call re-sends the full transcript as input, so a twelve-step agent pays for its early messages a dozen times.
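A rough calculation shows how fast the token cost climbs. The numbers below are made up for illustration: a 200-token prompt and 500 tokens added to the transcript per step. The function name totalInputTokens is likewise hypothetical.

```typescript
// Total input tokens when every model call re-sends the whole transcript.
// Assumes a fixed prompt size and equal transcript growth per step.
function totalInputTokens(
  steps: number,
  promptTokens: number,
  tokensAddedPerStep: number,
): number {
  let total = 0;
  for (let step = 0; step < steps; step++) {
    // Input for this call: the prompt plus everything added by earlier steps.
    total += promptTokens + step * tokensAddedPerStep;
  }
  return total;
}

const fourSteps = totalInputTokens(4, 200, 500); // 3800
const twelveSteps = totalInputTokens(12, 200, 500); // 35400
```

Under these assumptions, tripling the number of steps multiplies the input tokens by more than nine, because the cost grows roughly with the square of the step count.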

For one agent doing one focused job this is fine. It becomes a problem when you ask a single agent to research, reason, and write, because its context grows on each step and you pay for that growth on every step that follows.

Subagents and the orchestrator-worker setup

The fix is to split the work instead of holding it in one thread. A small orchestrator hands self-contained subtasks to workers, and each worker runs in its own context and returns a short result. The orchestrator never sees a worker's intermediate reasoning, only its answer, so the main thread stays small.

This is the orchestrator-workers pattern, and it fits Workflow directly. Each agent is its own workflow, and the orchestrator delegates with context.invoke. A worker's long transcript stays inside the worker's own run. It never enters the orchestrator's payloads or token count.
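The shape of that delegation can be sketched without any workflow machinery. In this toy version, runWorker and runOrchestrator are hypothetical functions, a plain function call stands in for context.invoke, and transcripts are arrays of strings. The point is only which messages end up where.

```typescript
type Transcript = string[];

// Hypothetical worker: builds a long private transcript, returns one short line.
function runWorker(role: string, task: string): { result: string; transcript: Transcript } {
  const transcript: Transcript = [`task: ${task}`];
  for (let step = 1; step <= 5; step++) {
    transcript.push(`${role} intermediate reasoning ${step}`);
  }
  return { result: `${role} finished: ${task}`, transcript };
}

// Hypothetical orchestrator: delegates subtasks and keeps only the results.
function runOrchestrator(request: string): { transcript: Transcript; workerMessages: number } {
  const transcript: Transcript = [`request: ${request}`];
  let workerMessages = 0;
  for (const role of ["researcher", "writer"]) {
    const worker = runWorker(role, request); // stands in for context.invoke
    workerMessages += worker.transcript.length;
    transcript.push(worker.result); // the worker's own transcript is dropped here
  }
  return { transcript, workerMessages };
}

const orchestration = runOrchestrator("explain why the sky is blue");
```

The two workers produce twelve messages between them, while the orchestrator's transcript holds three: the request and one result per worker.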

serveAgents

Workflow already lets you serve several workflows from one endpoint with serveMany. In the demo we wrapped it in a small SDK, defineAgent and serveAgents, so a multi-agent system reads like configuration.

You define each agent with its input schema and the agents it may delegate to:

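import { z } from "zod";
// defineAgent and serveAgents come from the demo's small SDK; import them from wherever you copy it.

const researcher = defineAgent({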
  name: "researcher",
  description: "Gathers key facts about a topic.",
  input: z.object({ topic: z.string() }),
  background: "You are a thorough research assistant…",
});
 
const writer = defineAgent({
  name: "writer",
  description: "Turns research notes into polished prose.",
  input: z.object({ brief: z.string() }),
  background: "You are a skilled writer…",
});
 
const orchestrator = defineAgent({
  name: "orchestrator",
  description: "Coordinates research and writing.",
  input: z.object({ request: z.string() }),
  background: "Delegate to the researcher, then the writer…",
  subagents: [researcher, writer], // become typed context.invoke tools
});

defineAgent wraps createWorkflow and the agents runtime, and it turns each entry in subagents into a tool that delegates through context.invoke. You then serve all of them from one route:

export const { POST, trigger } = serveAgents({
  baseUrl: "https://your-app.com/api/agents",
  agents: [orchestrator, researcher, writer],
});

One serveAgents call gives you a single endpoint where each agent is addressable by name and agents can invoke each other. It is what serveMany already does, written in terms of agents.
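As a mental model, addressing agents by name behind one endpoint looks like a lookup in a registry. The sketch below assumes, for illustration only, that the agent name is the last segment of the request path. agentRegistry and dispatch are hypothetical names, and the handlers are trivial stand-ins for real agent workflows.

```typescript
type AgentHandler = (input: unknown) => string;

// Hypothetical registry: one entry per agent, keyed by name.
const agentRegistry = new Map<string, AgentHandler>([
  ["orchestrator", () => "orchestrator ran"],
  ["researcher", () => "researcher ran"],
  ["writer", () => "writer ran"],
]);

// Resolve the agent from the last path segment and run it.
function dispatch(pathname: string, input: unknown): string {
  const agentName = pathname.split("/").filter(Boolean).pop();
  const agentHandler = agentName ? agentRegistry.get(agentName) : undefined;
  if (!agentHandler) {
    throw new Error(`unknown agent: ${agentName ?? "(none)"}`);
  }
  return agentHandler(input);
}

const dispatched = dispatch("/api/agents/researcher", { topic: "sky" });
```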

A type-safe trigger

You normally start a workflow with client.trigger, which takes a URL and an untyped body. Nothing stops you from triggering the wrong route or sending the wrong shape.

serveAgents returns a trigger function that closes that gap. It accepts only known agent names, and it checks the input against that agent's Zod schema twice before dispatching: at compile time through the type inferred from the schema, and again at runtime by parsing the payload:

// name is checked, { request } is validated against the orchestrator's schema
const { workflowRunId } = await trigger("orchestrator", {
  request: "Explain why the sky is blue.",
});
 
// caught at compile time: unknown agent and wrong input shape
await trigger("orchstrator", { topik: "..." });

The same schema guards both ends. The caller validates before sending, and the agent validates the payload it receives.
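One way such a trigger could be typed is sketched below. It is not the demo's code: to stay dependency-free it uses a hand-rolled Schema interface in place of Zod, and makeTrigger, readString, and the two schemas are hypothetical. The agent name is constrained to the keys of the agent map, and the input type is derived from the matching schema.

```typescript
// Hand-rolled stand-in for a Zod schema so the sketch has no dependencies.
interface Schema<T> {
  parse(input: unknown): T;
}
type AgentDef<T> = { input: Schema<T> };
type InputOf<D> = D extends AgentDef<infer T> ? T : never;

// Read one required string field from an unknown payload, or throw.
function readString(input: unknown, key: string): string {
  const value =
    typeof input === "object" && input !== null
      ? (input as Record<string, unknown>)[key]
      : undefined;
  if (typeof value !== "string") throw new Error(`expected string field "${key}"`);
  return value;
}

const requestSchema: Schema<{ request: string }> = {
  parse: (input) => ({ request: readString(input, "request") }),
};
const topicSchema: Schema<{ topic: string }> = {
  parse: (input) => ({ topic: readString(input, "topic") }),
};

// Build a trigger whose allowed names and input types come from the agent map.
function makeTrigger<A extends Record<string, AgentDef<unknown>>>(agents: A) {
  return function trigger<N extends keyof A & string>(agentName: N, input: InputOf<A[N]>) {
    const payload = agents[agentName].input.parse(input); // runtime validation
    return { agentName, payload };
  };
}

const typedTrigger = makeTrigger({
  orchestrator: { input: requestSchema },
  researcher: { input: topicSchema },
});

const dispatchedRun = typedTrigger("orchestrator", {
  request: "Explain why the sky is blue.",
});
// typedTrigger("orchstrator", { topik: "..." }); // rejected by the type checker
```

The runtime parse still matters because payloads can arrive from callers that never went through the type checker, for example raw JSON from another service.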

Watching it run

A long agent run is opaque while it happens, so the demo streams it. Each agent's plan, tool calls, and final answer show up in the browser as they run.

Figure: Live multi-agent run in the demo app.

This view comes from Upstash Realtime, backed by Upstash Redis. A built-in log tool lets each agent post short notes, and a workflow middleware emits start, step, and finish events (see the Workflow + Realtime guide). Both go to a channel named after the run id, so every agent shares one feed. Realtime keeps the events in Redis streams and pushes them to the browser over SSE, where a typed useRealtime hook renders them as they arrive. The UI is in the example app.
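The shared feed can be pictured as a list of typed events per run id. The sketch below is an in-memory stand-in, not Upstash Realtime: RunEvent, RunFeed, and renderEvent are hypothetical, and the event shapes are guesses for illustration that may differ from the demo's real types.

```typescript
// Hypothetical event shapes for one run's feed.
type RunEvent =
  | { type: "start"; agent: string }
  | { type: "step"; agent: string; stepName: string }
  | { type: "log"; agent: string; note: string }
  | { type: "finish"; agent: string };

// In-memory stand-in for one realtime channel per run id.
class RunFeed {
  private channels = new Map<string, RunEvent[]>();

  emit(runId: string, runEvent: RunEvent): void {
    const events = this.channels.get(runId) ?? [];
    events.push(runEvent);
    this.channels.set(runId, events);
  }

  read(runId: string): RunEvent[] {
    return this.channels.get(runId) ?? [];
  }
}

// Render one event as a line of UI text; the switch covers every event type.
function renderEvent(runEvent: RunEvent): string {
  switch (runEvent.type) {
    case "start":
      return `${runEvent.agent} started`;
    case "step":
      return `${runEvent.agent} ran ${runEvent.stepName}`;
    case "log":
      return `${runEvent.agent}: ${runEvent.note}`;
    case "finish":
      return `${runEvent.agent} finished`;
  }
}

const runFeed = new RunFeed();
const demoRunId = "run_123"; // hypothetical run id
runFeed.emit(demoRunId, { type: "start", agent: "orchestrator" });
runFeed.emit(demoRunId, { type: "log", agent: "researcher", note: "found 3 sources" });
runFeed.emit(demoRunId, { type: "finish", agent: "orchestrator" });
const feedLines = runFeed.read(demoRunId).map(renderEvent);
```

Because every agent in a run emits to the same run id, the orchestrator's and the workers' events interleave in one ordered list.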

What's next

defineAgent and serveAgents live in the example repo for now. They are a thin layer over @upstash/workflow-agents and serveMany that you can copy into your own app today. They also point at two changes we are considering: an agent-oriented API in the workflow-agents package, and the type-safe trigger pattern in the core Workflow package, so that client.trigger and serveMany get the same checks.