Why Your Code Reviews Are Too Nice (And How to Fix That)
By Paul Preibisch
New here? What is BMAD?
BMAD Method is a free, open-source AI workflow framework for Claude Code. Instead of asking one AI everything, BMAD gives you a cast of specialized AI personas — each with a distinct role, perspective, and agenda — that work through problems together. You install it once with npx bmad-method@latest install and then invoke tools like this one with a slash command in Claude Code.
When a single AI reviews your code, it's one mind playing every role at once — security engineer, pragmatist, craftsman, adversary — and internally negotiating all of them before it speaks. The result is balanced, reasonable, and slightly soft. That's not a code review, it's a compromise. BMAD's Code Review Crew puts five genuinely different minds on your PR, each blind to the others, and lets them disagree.
The best first command
💬 Claude Code Chat ⭐ Best first run — independent agents, no consensus driftone real agent per reviewer, blind to the others
That opens five reviewers on your codebase. Here's what a session looks like after pointing them at an authentication PR:
🔒 Vex — Security Engineer
The JWT secret is pulled from process.env.JWT_SECRET at line 47, but there's no validation it's set before startup. An empty string means every token signed with "" is technically valid. I can forge one right now that passes this check.
😤 Grumbal — The Adversary
And the refresh token rotation at line 89 doesn't invalidate the old token on issue. If someone lifts a refresh token before it's used, both copies work until expiry. That's not a rotation, that's a copy.
🌶️ Boundary — Edge-Case Hunter
What happens if userId in the token payload is null? Line 112 passes it straight to the DB query with no null check. That's either a crash or an injection path depending on how your ORM handles null params.
🎯 Yui — The Craftsman
You've got validateToken() and verifyJWT() doing overlapping work — one calls the other but both have independent error-handling paths. Pick one.
🚢 Dana — The Pragmatist
Vex's startup validation is a must-ship fix, agreed. The refresh token issue is real but narrow — requires stealing a token in a specific window. File it. The null userId is a one-liner. Yui's refactor waits for tech debt week.
That's a code review. Five genuinely different minds, each attacking from their own angle, then arguing about what actually ships versus what gets filed. In subagent mode, each reviewer only sees your message — not what the others said. Their responses are woven together afterward. No single AI produces this — one mind can't simultaneously assume the code is broken and advocate for shipping the 80%.
The five reviewers
These aren't personality labels slapped on the same model. Each one has a distinct mandate and a different failure mode they're trained to find:
🔒
Security Engineer
Vex
Threat-models everything. Hunts injection, broken authz, leaked secrets, SSRF, supply-chain risk. Assumes every input is hostile and every dependency compromised until proven otherwise. Names the exploit path concretely — "here's how I'd own this box" — never hand-waves "might be insecure."
😤
The Adversary
Grumbal
Assumes the code is broken and his job is to prove it. Grumpy, blunt, zero praise sandwiches. Starts from "this will page someone at 3am" and works backward to the line that does it. Allergic to optimism and "should be fine."
🌶️
Edge-Case Hunter
Boundary
Walks every branch and boundary. Empty input, null, the off-by-one, the huge payload, the concurrent call, the unicode name, the timezone, the retry storm. Method-driven, not mean: "what happens when this is called twice at once?"
🎯
The Craftsman
Yui
Cares about simplicity, naming, and reuse. Allergic to cleverness and duplication. "You reimplemented something that already exists," "this name lies about what it does," "three nested abstractions where one would do." Wants the boring, obvious, maintainable version.
🚢
The Pragmatist
Dana
Counters the perfectionists so the room isn't a pile-on. "Does this actually matter to a user? Ship the 80%, file the rest." Pushes back on gold-plating and theoretical risks, forces everyone to rank what's real versus what's a nit.
Why five matters
Dana exists specifically to stop the others from blocking every PR. Without her, a crew of Vex, Grumbal, Boundary, and Yui would generate a backlog that never ships. The tension between Dana and the rest is the point — it forces prioritization, not just discovery.
Four modes — pick the right one
The same --party code-review-crew runs in four modes. The difference is whether each reviewer gets their own isolated agent or shares one mind:
session
One mind voices every persona inline. Zero overhead, fast, conversational. Good for quick takes and ideation — the default when you don't specify a mode.
auto
Voices inline for banter, spawns real independent agents when genuine divergence matters. BMAD decides when independent thinking changes the outcome.
subagent ⭐ recommended
Spawns a real agent for every substantive round — the opening banter included — so each persona thinks independently, not one mind voicing them all. Each agent only sees your message, not what the others said. Responses are woven together afterward. Best for code review.
agent-team
Persistent team that addresses each other directly. Claude Code only, highest fidelity, highest cost. The reviewers argue with each other, not just at you.
The other three commands, for when you want something faster:
The Code Review Crew finds problems in code you've already written. bmad-forge-idea finds the fatal crack in an idea before you write a line — when changing your mind is free.
It opens cold. No validation, no reassurance. You state the idea and the goal, and two voices join each branch of the interrogation:
🎭
Voice 1
From your roster
One persona from your installed BMAD agent pool or custom personas. Rotates every few turns — you get different angles as the interrogation deepens.
🔮
Voice 2
Conjured for this branch
A fresh persona the topic conjures — a hostile competitor, a skeptical CFO, a domain specialist, a historical persona or expert. Generated on the fly for the specific crack being tested.
Invoke it: type /bmad-forge-idea then describe your idea.
You
I want to add a free tier so more people try the product. Goal: harden the idea.
🔮 The Forge
Before we build the case — who is "more people"? Free tiers attract volume, not necessarily the buyers you want. What's the conversion rate you're assuming from free to paid, and what data do you have for that number?
🧑💼 CFO (generated for this branch)
Free users don't convert, they complain. Your support load triples, infrastructure costs climb, and the people who email you most will never pay. What's your cost per free user per month, and what's your payback model?
😤 Grumbal (from your roster)
You haven't defined what "free" means. Free-forever? Free trial? Feature-limited? These are completely different products with different conversion economics. You're not pressure-testing an idea, you're pressure-testing a category.
Two turns in and you've surfaced three unresolved questions: who you're targeting, what the conversion assumption is, and what "free tier" even means. Finding those in conversation costs nothing. Finding them after you've built the billing system costs weeks.
Three verdicts
The session ends with an HTML report stamped with one of three verdicts:
HARDENED
The idea survived the interrogation. The report includes a forged-idea.md — the hardened version with locked assumptions, surviving cracks, and what was stress-tested. A genuine artifact, not a templated summary.
KILLED
The idea did not survive. The report names the cause of death specifically — not "insufficient market validation" but the exact assumption that collapsed under pressure. Finding this cheaply is a win, not a failure.
Clearer
The idea wasn't killed and isn't hardened — but you think straighter now. The memlog stands as a record of the interrogation. No forged-idea.md needed; the clarity was the output.
What the HTML report contains
The verdict stamp with an inline SVG seal. Locked items — the assumptions that held under pressure. What was killed and why, in plain language. Cracks that held and the argument that sealed them. Persona credits by name, icon, and voice — a genuine keepsake, not a templated dump.
More in the BMAD 6.9.0 release
The Code Review Crew is one of five tools that shipped in 6.9.0. Each gets its own deep-dive.
Run a sprint's worth of PRs through the Code Review Crew with /bmad-party-mode --party code-review-crew --mode subagent. After a few sessions Grumbal has a running debt list, Vex has a map of your weak spots, Dana has a baseline for what actually ships versus what gets filed. That's a code review process worth having. For anything you're about to build first: run /bmad-forge-idea and let the CFO tell you why your conversion assumptions don't work before you build the billing system.
Comments
Loading comments…
Leave a comment
Want to work together?
If something here resonated, let's talk. I help teams build AI systems and automate workflows.
Comments
Loading comments…
Leave a comment