March 13, 2026 · 5 min read

I Built 5 AI Agents to Review My PRs

Ilter Kavlak, Site Reliability Engineer @Upstash

We recently launched Upstash Box — a cloud computer for your agents with durable storage, serverless scaling, and usage-based pricing. Since day one of building the backend, the team has been pretty fired up about it.

Having Box changed how fast I can prototype. I can go from idea to working multi-agent flow much faster now. I could build these flows before too, but honestly I would postpone half of them.

I don't think agents are magic, but they are very useful on tired days. Some days are long, full of incidents and context switching, and you still need to ship.

And guess what? Bugs are watching us from the edge cases.

I care a lot about not shipping avoidable bugs. The problem is that after a long day, I miss things I normally would catch.

So I wrote my own PR review bot and called it Nitpick.

Why I built Nitpick

Agents were already part of my workflow for reviews, even for self-review. The routine was always the same: throw the change at multiple agents, compare outputs, validate findings, then manually triage everything.

This already helped a lot:

Fewer wrong findings
Better coverage of "what can go wrong?"
A second (third, fourth) brain when mine is tired

But the process itself was the problem. I was doing the same thing every time and it was boring me to death.

So I automated it. Mostly for myself.

Let me introduce you to Nitpick.

What Nitpick does

Nitpick runs a full PR review arena from your terminal.

You give it a GitHub PR, and it spins up multiple reviewer roles in parallel, runs scanners, verifies findings, lets you triage them, and gives you a final verdict with a merge recommendation.

It comes with:

5 AI reviewer roles: security, performance, architecture, testing, dx
3 automated scanners: secrets, linter, dependencies
PR summary and walkthrough before findings
A separate verifier agent to confirm/adjust/reject findings
Triage flow to accept or dismiss each finding
Markdown report with blockers, risk score, suggested commits
Optional GitHub PR review comments (--post-review)
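
To make the verifier step a bit more concrete, here is a minimal sketch of what a finding and the confirm/adjust/reject decision could look like. This is not Nitpick's actual code: the `Finding` fields, the `Decision` enum, and `apply_verification` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class Decision(Enum):
    """The three outcomes the verifier can return for a finding (assumed names)."""
    CONFIRM = "confirm"
    ADJUST = "adjust"
    REJECT = "reject"


@dataclass(frozen=True)
class Finding:
    """One reviewer finding. The field set is an assumption, not Nitpick's schema."""
    role: str       # e.g. "security", "performance"
    file: str
    severity: str   # "low", "medium", or "high" in this sketch
    message: str


def apply_verification(
    finding: Finding,
    decision: Decision,
    new_severity: Optional[str] = None,
) -> Optional[Finding]:
    """Return the finding to keep after verification, or None if it was rejected."""
    if decision is Decision.REJECT:
        return None
    if decision is Decision.ADJUST and new_severity is not None:
        # Adjusting keeps the finding but changes its severity.
        return replace(finding, severity=new_severity)
    # CONFIRM, or ADJUST without a new severity, keeps the finding unchanged.
    return finding
```

Keeping the verifier's output this small (keep, change, or drop) is what makes the later triage step quick: by the time a finding reaches you, it has already been checked once.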

So instead of review chaos, you get a repeatable flow.

The flow I wanted (and now have)

Pick repository and PR interactively (or pass the PR URL directly)
Choose reviewer roles
Run all reviewers and scanners in parallel
Read AI summary of what changed and where risk is concentrated
Triage findings one by one
Generate final verdict: merge / merge with caution / block
Optionally post review comments back to GitHub
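
The steps above can be sketched roughly like this. It is an illustration under assumptions, not how Nitpick is implemented: `run_reviewer` is a placeholder for a real agent call, and the rule in `final_verdict` (any high-severity finding blocks, anything else means caution) is a made-up threshold.

```python
import asyncio


async def run_reviewer(role: str, diff: str) -> list[dict]:
    """Placeholder for a real reviewer agent; returns one dummy finding per role."""
    await asyncio.sleep(0)  # stands in for the actual model call
    return [{"role": role, "severity": "low", "message": f"{role}: placeholder finding"}]


async def run_all(roles: list[str], diff: str) -> list[dict]:
    """Run every reviewer concurrently and flatten their findings into one list."""
    batches = await asyncio.gather(*(run_reviewer(role, diff) for role in roles))
    return [finding for batch in batches for finding in batch]


def final_verdict(accepted: list[dict]) -> str:
    """Map the accepted findings to one of the three verdicts (assumed rule)."""
    severities = {finding["severity"] for finding in accepted}
    if "high" in severities:
        return "block"
    if severities:
        return "merge with caution"
    return "merge"


if __name__ == "__main__":
    findings = asyncio.run(run_all(["security", "performance"], diff="..."))
    print(final_verdict(findings))
```

The verdict is computed only from findings that survived triage, so dismissing a false positive directly changes the recommendation.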

That is basically it. Fast and repeatable.

How Nitpick uses Upstash Box

The hard part of running five reviewers at once is not the AI calls. It is giving each one its own isolated environment where it can clone the repo, read files, run linters, and do its thing without stepping on the others. That is what Box handles.

Each reviewer role gets its own Box. The security reviewer is digging through auth flows in one container while the performance reviewer is profiling hot paths in another. They do not share state, they do not block each other, and when they are done, their findings get collected and passed to the verifier.
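
As a rough local analogy for that isolation (this does not use the Box API, which is not shown in this post), here is a sketch where each reviewer gets its own temporary directory and runs in its own thread. The function names and the `notes.txt` scratch file are hypothetical.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def review_in_isolation(role: str) -> dict:
    """Run one reviewer inside a private scratch directory that is removed afterwards."""
    with tempfile.TemporaryDirectory(prefix=f"nitpick-{role}-") as workdir:
        # A real reviewer would clone the repo and run tools here.
        scratch = Path(workdir) / "notes.txt"
        scratch.write_text(f"scratch notes for {role}\n")
        files = sorted(p.name for p in Path(workdir).iterdir())
        return {"role": role, "workspace": workdir, "files": files}


def review_all(roles: list[str]) -> list[dict]:
    """Run all reviewers in parallel; results come back in the same order as roles."""
    with ThreadPoolExecutor(max_workers=len(roles)) as pool:
        return list(pool.map(review_in_isolation, roles))
```

Temporary directories only separate files, not processes or networks, so this is much weaker than a container per reviewer. The shape is the same though: a private workspace per role, nothing shared, and cleanup when the reviewer is done.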

The nice part is I did not have to think about any of the infra. Box is serverless — containers spin up when Nitpick needs them and go away when the review is done. I do not manage instances, I do not pay for idle time.

Nitpick is stateless between runs for now. Each review starts fresh, produces a markdown report, and optionally posts comments back to GitHub. But Box supports durable storage, so persisting review history across runs is something I want to explore next.
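
For a sense of what the markdown report step might involve, here is a small sketch that turns accepted findings into a report string. The layout, the dictionary keys, and the rule that high severity means blocker are assumptions for illustration, not Nitpick's real report format.

```python
def render_report(pr_url: str, accepted: list[dict]) -> str:
    """Render accepted findings as a small markdown report (hypothetical layout)."""
    blockers = [f for f in accepted if f["severity"] == "high"]
    lines = [
        f"# Review report for {pr_url}",
        "",
        f"Accepted findings: {len(accepted)}",
        f"Blockers: {len(blockers)}",
        "",
    ]
    for finding in accepted:
        # High-severity findings are labelled as blockers in this sketch.
        label = "BLOCKER" if finding["severity"] == "high" else finding["severity"]
        lines.append(f"- [{label}] {finding['role']}: {finding['message']}")
    return "\n".join(lines) + "\n"
```

Because the report is built from plain data, writing the same findings to durable storage later would not require changing how reviewers or the verifier work.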

Why this matters to me

Nitpick is not about replacing engineering judgment. It is about backing me up when energy is low and I am rushing.

After a long day, your brain misses edges. Nitpick keeps looking anyway.

And the best part: it fits exactly how I already work. I was already inviting multiple agents into the process. I just stopped doing it manually and turned it into a tool.

Now the checks are done before I even think about running them.

I also added tiny graphics to make the process more fun than watching lines of text flow in your terminal. They do not add extra functionality, but they add a bit of joy.

Closing

If you also feel bugs are hiding in edge cases, Nitpick might be useful for you.

I built it because I wanted better review quality without adding more mental overhead. If it helps other teams ship safer and faster, even better.

My laziness still exists. Now it just has better tooling around it.
