Orchestrating Reliable Agents on Upstash Workflow
An AI agent is a loop. It calls a model, runs a tool, feeds the result back, and calls the model again. The loop is quick to prototype and awkward to run reliably on serverless, and it gets more expensive the longer it runs. This post covers how Upstash Workflow makes that loop durable, and how subagents keep it from getting expensive.
There is a working demo behind everything below: examples/agent-workflows in the workflow-js repo.
Why Upstash Workflow for agents
A multi-agent run has two failure modes on serverless. It can take longer than a function is allowed to run, since a multi-step agent often needs minutes while functions are measured in seconds. And any single step can fail on a transient provider error or a rate limit, which loses the whole run.
Upstash Workflow turns the loop into a set of checkpointed steps that QStash orchestrates. Each step you wrap survives across invocations, so the loop is never held open inside one function call:
- context.run executes a piece of work once and stores its result. On a retry or resume it replays the stored value instead of running again.
- context.call hands an HTTP request, such as the call to your model provider, to QStash. Your function returns right away. QStash makes the request, applies retries, timeouts, and flow control, then calls you back with the response. The model call stops counting against your function's runtime, and rate limits become a setting rather than an incident.
So you get agents that resume after failures, avoid function timeouts, and stay within provider rate limits, without writing that logic yourself.
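The replay behaviour of context.run can be pictured with a small in-memory model. This is a sketch, not the Upstash implementation: the real context.run is asynchronous and persists results through QStash, while makeContext, StepStore, and agentRun below are hypothetical names, and the model is synchronous to keep it short.

```typescript
// Toy model of checkpointed steps, keyed by step name.
type StepStore = Map<string, unknown>;

function makeContext(stepStore: StepStore) {
  return {
    // Run `fn` once per step name; later attempts replay the stored result.
    run<T>(stepName: string, fn: () => T): T {
      if (stepStore.has(stepName)) return stepStore.get(stepName) as T;
      const result = fn();
      stepStore.set(stepName, result);
      return result;
    },
  };
}

let searchCalls = 0;
let failOnce = true; // simulates one transient provider error

function agentRun(stepStore: StepStore): string {
  const context = makeContext(stepStore);
  const facts = context.run("search", () => {
    searchCalls += 1;
    return "Rayleigh scattering";
  });
  return context.run("summarize", () => {
    if (failOnce) {
      failOnce = false;
      throw new Error("transient provider error");
    }
    return `Summary: ${facts}`;
  });
}

const stepStore: StepStore = new Map();
let answer = "";
try {
  answer = agentRun(stepStore); // first attempt fails inside "summarize"
} catch {
  answer = agentRun(stepStore); // retry: "search" is replayed, not re-executed
}
```

In this model the failed attempt costs only the step that failed. The search result is already stored, so the retry picks up from the summarize step.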
@upstash/workflow-agents
You do not have to wire those primitives into an agent by hand. @upstash/workflow-agents (source) takes the AI SDK and makes it durable.
The mechanism is small. The package keeps the AI SDK's generateText loop, but it overrides the model's fetch so each model request goes through context.call, and it wraps each tool's execute in context.run. Your model calls and tool executions become durable steps, and you still write the agent the way you would with the AI SDK: a model, a system prompt, and a set of tools.
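The tool half of that mechanism can be sketched in a few lines. The shapes here are assumptions for illustration only: Tool, DurableRun, and wrapTool are hypothetical names, the step-naming scheme is invented, and the real package works with AI SDK tools and an asynchronous context.run.

```typescript
// Hypothetical, simplified shapes (synchronous for brevity).
type Tool = { description: string; execute: (input: string) => string };
type DurableRun = <T>(stepName: string, fn: () => T) => T;

// Return a copy of `tool` whose execute goes through `run`.
function wrapTool(toolName: string, tool: Tool, run: DurableRun): Tool {
  let callCount = 0;
  return {
    ...tool,
    execute: (input) => {
      callCount += 1; // give each invocation its own step name
      return run(`${toolName}-${callCount}`, () => tool.execute(input));
    },
  };
}

// A recording stand-in for context.run, so the step names are visible.
const recordedSteps: string[] = [];
const recordingRun: DurableRun = (stepName, fn) => {
  recordedSteps.push(stepName);
  return fn();
};

const upperTool = wrapTool(
  "upper",
  { description: "Uppercases text.", execute: (text) => text.toUpperCase() },
  recordingRun,
);
const shouted = upperTool.execute("sky");
upperTool.execute("blue");
```

The agent code that calls upperTool.execute does not change. Only the function behind it does, which is why the agent can still be written in the usual AI SDK style.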
The thread keeps growing
Several users reported the same problem. An agent's state is its message history, and that history grows with every turn. Because each model call replays the whole conversation, two costs climb together as the agent works.
The first is bandwidth. Each durable step carries the accumulated messages through QStash, so a longer history means larger workflow payloads on every step. The second is tokens. Every model call re-sends the full transcript as input, so a twelve-step agent pays for its early messages a dozen times.
For one agent doing one focused job this is fine. It becomes a problem when you ask a single agent to research, reason, and write, because its context grows on each step and you pay for that growth on every step that follows.
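A rough calculation shows how quickly the token cost climbs. The sketch below assumes every turn adds the same number of tokens, which real transcripts do not, and the figure of 500 tokens per turn is a made-up value for illustration.

```typescript
// Input tokens billed across a run when every call re-sends the transcript.
function cumulativeInputTokens(steps: number, tokensPerTurn: number): number {
  let total = 0;
  for (let step = 1; step <= steps; step++) {
    total += step * tokensPerTurn; // call number `step` sends `step` turns of history
  }
  return total;
}

const singleThread = cumulativeInputTokens(12, 500); // 500 * (1 + 2 + ... + 12) = 39000
const sentOnce = 12 * 500; // 6000 if each turn were billed only once
```

Under these assumptions a twelve-step run bills 39,000 input tokens for 6,000 tokens of actual content, and the gap widens quadratically with the number of steps.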
Subagents and the orchestrator-worker setup
The fix is to split the work instead of holding it in one thread. A small orchestrator hands self-contained subtasks to workers, and each worker runs in its own context and returns a short result. The orchestrator never sees a worker's intermediate reasoning, only its answer, so the main thread stays small.
This is the orchestrator-workers pattern, and it fits Workflow directly. Each agent is its own workflow, and the orchestrator delegates with context.invoke. A worker's long transcript stays inside the worker's own run. It never enters the orchestrator's payloads or token count.
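The isolation can be modelled without any workflow machinery. In this sketch runWorker stands in for a delegated worker run and runOrchestrator for the orchestrator. Both names, the Message shape, and the fixed five steps per worker are assumptions made for the example.

```typescript
type Message = { role: "user" | "assistant" | "tool"; content: string };

// A worker builds its own transcript and returns only a short answer.
function runWorker(task: string, steps: number): { answer: string; transcriptLength: number } {
  const transcript: Message[] = [{ role: "user", content: task }];
  for (let i = 1; i <= steps; i++) {
    transcript.push({ role: "tool", content: `intermediate result ${i}` });
  }
  return { answer: `done: ${task}`, transcriptLength: transcript.length };
}

function runOrchestrator(request: string): Message[] {
  const thread: Message[] = [{ role: "user", content: request }];
  for (const task of ["research", "write"]) {
    const { answer } = runWorker(task, 5); // stands in for context.invoke
    thread.push({ role: "tool", content: answer }); // only the answer enters the thread
  }
  return thread;
}

const orchestratorThread = runOrchestrator("Explain why the sky is blue.");
```

Each worker accumulates six messages here, while the orchestrator's thread ends at three: the request plus one short result per delegation.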
serveAgents
Workflow already lets you serve several workflows from one endpoint with serveMany. In the demo we wrapped it in a small SDK, defineAgent and serveAgents, so a multi-agent system reads like configuration.
You define each agent with its input schema and the agents it may delegate to:
import { z } from "zod";
// defineAgent is the demo's own helper (see examples/agent-workflows).

const researcher = defineAgent({
  name: "researcher",
  description: "Gathers key facts about a topic.",
  input: z.object({ topic: z.string() }),
  background: "You are a thorough research assistant…",
});

const writer = defineAgent({
  name: "writer",
  description: "Turns research notes into polished prose.",
  input: z.object({ brief: z.string() }),
  background: "You are a skilled writer…",
});

const orchestrator = defineAgent({
  name: "orchestrator",
  description: "Coordinates research and writing.",
  input: z.object({ request: z.string() }),
  background: "Delegate to the researcher, then the writer…",
  subagents: [researcher, writer], // become typed context.invoke tools
});

defineAgent wraps createWorkflow and the agents runtime, and it turns each entry in subagents into a tool that delegates through context.invoke. You then serve all of them from one route:
export const { POST, trigger } = serveAgents({
  baseUrl: "https://your-app.com/api/agents",
  agents: [orchestrator, researcher, writer],
});

One serveAgents call gives you a single endpoint where each agent is addressable by name and agents can invoke each other. It is what serveMany already does, written in terms of agents.
A type-safe trigger
You normally start a workflow with client.trigger, which takes a URL and an untyped body. Nothing stops you from triggering the wrong route or sending the wrong shape.
serveAgents returns a trigger function that closes that gap. It accepts only known agent names, and it checks the input against that agent's Zod schema twice before dispatching, through the schema's inferred type at compile time and through the schema itself at runtime:
// name is checked, { request } is validated against the orchestrator's schema
const { workflowRunId } = await trigger("orchestrator", {
  request: "Explain why the sky is blue.",
});

// caught at compile time: unknown agent and wrong input shape
await trigger("orchstrator", { topik: "..." });

The same schema guards both ends. The caller validates before sending, and the agent validates the payload it receives.
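One way such a trigger could be typed is sketched below. It is not the demo's code: Parser is a hypothetical stand-in for a Zod schema, makeTrigger and parseRequest are illustrative names, and the returned object takes the place of the actual dispatch.

```typescript
// A parser validates unknown input at runtime and fixes its type at compile time.
type Parser<T> = (value: unknown) => T;
type AgentParsers = Record<string, Parser<any>>;

function makeTrigger<P extends AgentParsers>(parsers: P) {
  return function trigger<K extends keyof P & string>(
    agentName: K,
    input: ReturnType<P[K]>,
  ): { agent: K; body: ReturnType<P[K]> } {
    const parse = parsers[agentName];
    // Runtime checks: these run even if the caller bypassed the type checker.
    if (!parse) throw new Error(`unknown agent: ${agentName}`);
    const body = parse(input) as ReturnType<P[K]>;
    return { agent: agentName, body }; // a real trigger would dispatch here
  };
}

const parseRequest: Parser<{ request: string }> = (value) => {
  const request = (value as { request?: unknown } | null)?.request;
  if (typeof request !== "string") throw new Error("expected { request: string }");
  return { request };
};

const trigger = makeTrigger({ orchestrator: parseRequest });

const dispatched = trigger("orchestrator", { request: "Explain why the sky is blue." });

// trigger("orchstrator", { request: "..." }); // compile error: unknown agent name
// trigger("orchestrator", { topik: "..." });  // compile error: wrong input shape
```

The agent name is constrained to the keys of the registry, and the input type is derived from the parser registered under that key, so the compile-time and runtime checks come from the same definition.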
Watching it run
A long agent run is opaque while it happens, so the demo streams it. Each agent's plan, tool calls, and final answer show up in the browser as they run.

This view comes from Upstash Realtime, backed by Upstash Redis. A built-in log tool lets each agent post short notes, and a workflow middleware emits start, step, and finish events (see the Workflow + Realtime guide). Both go to a channel named after the run id, so every agent shares one feed. Realtime keeps the events in Redis streams and pushes them to the browser over SSE, where a typed useRealtime hook renders them as they arrive. The UI is in the example app.
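The shape of such a feed can be sketched with a discriminated union and an in-memory map. Everything here is an assumption for illustration: the event fields, the emit and render helpers, and the run id are hypothetical, and the map only stands in for a Realtime channel.

```typescript
// Hypothetical event shapes; the demo's real payloads may differ.
type AgentEvent =
  | { kind: "start"; agent: string }
  | { kind: "step"; agent: string; step: string }
  | { kind: "log"; agent: string; note: string }
  | { kind: "finish"; agent: string };

// In-memory stand-in for a channel keyed by run id.
const feedsByRunId = new Map<string, AgentEvent[]>();

function emit(runId: string, agentEvent: AgentEvent): void {
  const feed = feedsByRunId.get(runId) ?? [];
  feed.push(agentEvent);
  feedsByRunId.set(runId, feed);
}

// One case per event kind; with strict settings a missing case fails to type-check.
function render(agentEvent: AgentEvent): string {
  switch (agentEvent.kind) {
    case "start":
      return `${agentEvent.agent} started`;
    case "step":
      return `${agentEvent.agent} ran ${agentEvent.step}`;
    case "log":
      return `${agentEvent.agent}: ${agentEvent.note}`;
    case "finish":
      return `${agentEvent.agent} finished`;
  }
}

const runId = "run-123"; // hypothetical run id
emit(runId, { kind: "start", agent: "orchestrator" });
emit(runId, { kind: "log", agent: "researcher", note: "looking up scattering" });
emit(runId, { kind: "finish", agent: "orchestrator" });
const renderedLines = (feedsByRunId.get(runId) ?? []).map(render);
```

Because every agent writes to the feed for the same run id, the orchestrator's and the researcher's events interleave in one ordered list.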
What's next
defineAgent and serveAgents live in the example repo for now. They are a thin layer over @upstash/workflow-agents and serveMany that you can copy into your own app today. They also point at two changes we are considering: an agent-oriented API in the workflow-agents package, and the type-safe trigger pattern in the core Workflow package, so that client.trigger and serveMany get the same checks.
Links
- Demo: examples/agent-workflows
- Package: @upstash/workflow-agents
- Patterns: orchestrator-workers, serveMany
- Steps: context.run, context.call
