the landscape
Where this sits.
Every major spec-driven workflow converges on the same staged pipeline — what differs is what the gate between stages actually is.
human approval
A person clicking approve before the next stage runs.
LLM-run checklist
A checklist the LLM grades itself against.
exit-code on existence or schema
A script that fails when files are missing or malformed.
exit-code on content + evidence
A script that checks what’s in them — claims, citations, evidence.
Spec-driven development is a named, crowded category: a landscape tracker counts 30+ frameworks, Martin Fowler has a series on it, and the convergence claim above (research, plan, execute, review, ship) is the conclusion of the community’s own comparison, not this page’s. Blueprint’s seven stages are not the novelty; the question this read asks is which kind of gate sits between anyone’s stages.
This is a source-level read of that question, with sources read 2026-06-11: gate implementations from the open repos, vendor docs for the commercial tools (Kiro, Tessl), and one row, Superpowers, tabled from community data only and flagged as such. Star counts are approximate and point-in-time. If a row misrepresents a tool, file an issue and it gets fixed.
| Tool | Gate type | What its gate actually is | What it does best | Scope |
|---|---|---|---|---|
| GitHub Spec Kit ~111k stars | EXIT-CODE: EXISTENCE / HUMAN APPROVAL | Existence script; human approve prompt; LLM checklists | Reach: the most community-adopted SDD tool; 30+ agent integrations, GitHub’s backing | Spec → code pipeline |
| BMAD-METHOD ~49k stars | LLM CHECKLIST | LLM-run checklists and agent handoffs | Breadth — the only other tool spanning research, briefs, and PRDs through implementation | Research → code pipeline |
| GSD (Get Shit Done) ~64k stars | LLM CHECKLIST / EXIT-CODE: PARTIAL | Formal gate taxonomy, mostly LLM-enforced; a few exit-code scripts | The adversarial-verifier stance: the strongest distrust-the-agent posture in the field | Claude Code workflow |
| OpenSpec ~54k stars | EXIT-CODE: SCHEMA | CLI validator on spec structure; explicitly anti phase-gate | Visible dogfooding — its own changes/archive/ is a receipts trail; strong artifact discipline | Change-artifact pipeline |
| AWS Kiro commercial, GA | HUMAN APPROVAL | Property tests check code against requirements; stage flow is IDE-guided with implicit human approval | Property-based testing: EARS requirements become an executable specification | Commercial IDE |
| Superpowers ~224k stars | UNVERIFIED | Not read at source level this pass | Distribution — on the official Claude marketplace since Jan 2026 | Claude Code skills |
| PRP (PRPs-agentic-eng) ~2.2k stars | CODE-QUALITY LOOP | Loops until green — type-check, lint, tests, build | Executable validation embedded in the artifact — closest philosophical neighbor here | Implementation slices |
| ruflo (Claude Flow) / SPARC ~59k stars | AGENT-JUDGED | Agent-evaluated criteria between phases, stored in memory tables | Orchestration scale — swarms, adaptive memory, plugins; gate results stored for traceability | Agent meta-harness |
| spec-workflow-mcp ~4.2k stars | HUMAN APPROVAL | Dashboard approval — genuinely blocks until approved | The only other tool here where “blocked until approved” is enforced | MCP server + dashboard |
| Taskmaster ~27k stars | NONE | None — task ordering, no stage blocking | Task decomposition — PRD into dependency-ordered tasks, plus a dedicated research model | Task layer in editors |
| Agent OS ~4.8k stars | NONE | None — different job | Standards — extracts the conventions already in your codebase and injects them contextually | Context / standards injection |
| SuperClaude ~23k stars | NONE | LLM self-scored confidence thresholds — not executable, not a gate | Configuration breadth — commands, cognitive personas, modes, MCP integrations in one framework | Claude Code configuration framework |
| Tessl commercial | HUMAN APPROVAL | Human approval between spec and implementation | The Spec Registry — “npm for specs,” aimed at API hallucination | Spec registry + framework |
| Blueprint (this site) this repo | EXIT-CODE: CONTENT | Exit-code Node scripts on artifact content + evidence; fact-check stage | Self-application receipts — the demo and the methodology pages are its own output. Caveats in the footnote | Product pipeline |
Also in the field, read but not tabled: Spec Kitty (~1.3k stars, spec → kanban → auto-merge CLI), Augment Intent (commercial “living specs” platform with a pre-PR Verifier), and “Hermes” (landscape chatter; no substantive repo surfaced). Named on the community’s shortlist but not read this pass: gstack (~109k stars).
Row by row, with sources.
GitHub Spec Kit
scripts/bash/check-prerequisites.sh exits non-zero when required artifacts (spec.md, plan.md, tasks.md) are missing — a real exit-code gate, on file existence. The workflow-engine gate step (src/specify_cli/workflows/steps/gate/__init__.py) is an interactive human approve/reject prompt that falls back to PAUSED in CI. /speckit.checklist and /speckit.analyze are LLM-interpreted markdown — “unit tests for English,” in its own docs’ framing. The full pipeline: constitution → specify → clarify → plan → tasks → analyze → implement.
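To make the gate kind concrete, here is a minimal sketch of an exit-code gate on file existence. It is not Spec Kit's script: the function name `existence_gate` and the directory layout are assumptions for illustration, and only the three artifact names come from the description above.

```python
from pathlib import Path

# Artifact names taken from the description above; everything else is hypothetical.
REQUIRED = ("spec.md", "plan.md", "tasks.md")


def existence_gate(feature_dir: Path, required=REQUIRED) -> int:
    """Return 0 when every required artifact exists as a file, 1 otherwise."""
    missing = [name for name in required if not (feature_dir / name).is_file()]
    for name in missing:
        print(f"missing artifact: {name}")
    return 1 if missing else 0
```

A wrapper would pass the return value to `sys.exit` so CI sees the failure. Note the limit of this gate kind: an empty `spec.md` passes, because only presence is checked.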
BMAD-METHOD
v6 ships 12+ role agents (PM, Architect, Dev, UX), 34+ workflows, and web bundles for upfront research and PRD planning — the only tool in this read with product-pipeline breadth. Handoffs between agents are file-based and convention-enforced; checklists are prompts the agents run. No executable enforcement found in this read. The full pipeline: research → brief → PRD → architecture → stories → code.
GSD (Get Shit Done)
get-shit-done/references/gates.md defines a four-type gate taxonomy (pre-flight artifact checks, revision loops bounded at 3 iterations with stall detection, escalation to human, abort) with a matrix mapping phases to artifacts. agents/gsd-verifier.md instructs verifiers to work goal-backward and falsify the SUMMARY.md narrative: “Do NOT trust SUMMARY.md claims.” Most gates run LLM-side; a few real Node exit-code scripts exist (bin/lib/ui-safety-gate.cjs documents exit 0/1/2). Verification is code-vs-claims, not document-vs-sources. Scope: requirements → plan → execute → verify, inside Claude Code.
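The revision-loop pattern named in that taxonomy can be sketched as follows. This is an illustration of the general pattern, not GSD's implementation: `revision_loop`, `check`, `revise`, the return strings, and the definition of a stall (the same issues surviving a revision) are all assumptions. Only the bound of 3 comes from the description above.

```python
from typing import Callable, List

MAX_ITERATIONS = 3  # bound stated in the description above


def revision_loop(
    check: Callable[[str], List[str]],
    revise: Callable[[str, List[str]], str],
    draft: str,
) -> str:
    """Revise a draft until it passes, stalls, or runs out of iterations."""
    issues = check(draft)
    for _ in range(MAX_ITERATIONS):
        if not issues:
            return "pass"
        draft = revise(draft, issues)
        new_issues = check(draft)
        # Assumed stall rule: a revision that leaves the issue list unchanged.
        if new_issues and new_issues == issues:
            return "escalate: stalled"
        issues = new_issues
    return "pass" if not issues else "escalate: iteration limit"
```

Both escalation outcomes hand the decision to a human rather than looping further, which is the point of bounding the loop.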
OpenSpec
openspec validate --all --strict --json is a real, CI-runnable validator — of spec structure and schema, not evidence. The README explicitly positions against rigid phase gates; /opsx:verify (added 2026-02) is LLM-driven implementation-vs-spec checking. Its own repo carries openspec/changes/archive/ with dated change folders — the only visible dogfooding trail in this read, and it goes unmarketed. The artifact chain: proposal → spec deltas → tasks → archive.
AWS Kiro
The GA headline is property-based testing: EARS-notation requirements are translated into an executable specification, generating property tests that check the produced code against the spec (kiro.dev/docs/specs/correctness). The most credible executable-verification story in this read — and it verifies code against requirements, not stage progression; stage flow is IDE-guided with implicit human approval. Spec artifacts: requirements.md / design.md / tasks.md → code.
Superpowers
Not read at source level this pass. The row reflects the GitHub API star count and community reporting only: r/ClaudeCode threads treat it as the closest thing the community has to a de facto default, alongside rising complaints about verbosity. Its gate mechanism is unverified here; corrections are especially welcome on this row.
PRP (PRPs-agentic-eng)
/prp-ralph (Geoffrey Huntley’s Ralph technique, stop-hook driven) iterates until all validation commands pass — type-check, lint, tests, build — then emits a completion promise. Genuinely executable validation; what it gates is “loop until the code is green,” not stage progression or document content. The PRP artifact bundles a PRD + codebase intelligence + a runbook. No license file at read time.
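A loop-until-green gate of this kind can be sketched in a few lines. This is not the /prp-ralph implementation: the names `all_green` and `loop_until_green`, the `attempt_fix` callback standing in for the agent's next edit, and the `max_rounds` cap are assumptions added for illustration.

```python
import subprocess
from typing import Callable, Sequence


def all_green(commands: Sequence[Sequence[str]]) -> bool:
    """True only when every validation command exits 0 (stops at the first failure)."""
    return all(subprocess.run(cmd).returncode == 0 for cmd in commands)


def loop_until_green(
    commands: Sequence[Sequence[str]],
    attempt_fix: Callable[[], None],
    max_rounds: int = 10,  # assumed safety cap, not from the source
) -> bool:
    """Alternate validation and fix attempts until all commands pass."""
    for _ in range(max_rounds):
        if all_green(commands):
            return True
        attempt_fix()
    return all_green(commands)
```

The commands would be the project's own type-check, lint, test, and build invocations. What this gates is the code's state, which matches the distinction drawn above: nothing here inspects a document.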
ruflo (Claude Flow) / SPARC
Renamed from claude-flow. The ruflo-sparc plugin runs Specification → Pseudocode → Architecture → Refinement → Completion with “quality gates between each phase” — verified in the plugin README as orchestrator-agent-evaluated criteria (e.g. “covers all ACs”) whose results land in memory tables, not standalone exit-code scripts. ruflo verify is supply-chain verification of the installed bytes, not methodology receipts.
spec-workflow-mcp
The MCP server genuinely blocks the workflow until approval comes through the real-time dashboard (or VSCode extension), with revision rounds and implementation logs. The block is mechanical, not advisory; the check itself is a person. The pipeline: requirements → design → tasks. Licensed GPL-3.0, which limits commercial embedding.
Taskmaster
Parses a PRD into dependency-ordered tasks (tasks.json), with expand/next/show and tagged workstreams; a research command pulls fresh information through a dedicated research model (Perplexity/xAI/OpenRouter) with project context. The research is informational, not evidentiary — no claims-traced-to-source step. Runs inside Cursor / Windsurf / similar editors. License NOASSERTION (MIT-with-Commons-Clause family).
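Dependency ordering of this kind is a topological sort. The sketch below uses Python's standard-library `graphlib`. The task records and their field names (`id`, `title`, `dependencies`) are hypothetical and are not claimed to match the tasks.json schema.

```python
from graphlib import TopologicalSorter

# Hypothetical task records; field names are illustrative only.
tasks = [
    {"id": 1, "title": "Define schema", "dependencies": []},
    {"id": 2, "title": "Build API", "dependencies": [1]},
    {"id": 3, "title": "Write UI", "dependencies": [2]},
    {"id": 4, "title": "Seed data", "dependencies": [1]},
]


def ordered_ids(task_list):
    """Return task ids so that every task comes after its dependencies."""
    graph = {t["id"]: set(t["dependencies"]) for t in task_list}
    # static_order() raises graphlib.CycleError on circular dependencies.
    return list(TopologicalSorter(graph).static_order())
```

Ordering like this decides what comes next; it does not block a task on the quality of the one before it, which is why the row's gate type reads NONE.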
Agent OS
Repositioned in 2026 around standards: Discover Standards (extract conventions from the codebase), Deploy Standards (inject contextually), Shape Spec, Index Standards. No gates of any kind, no artifact pipeline enforcement — that is not the job it is doing, and it does its job without ceremony.
SuperClaude
Its “confidence-based validation” is deep-research quality scoring — LLM-self-assessed source credibility 0.0–1.0 with a 0.6 threshold — not traced citations and not executable. No gates, no artifact pipeline, no self-application. Last push 2026-04-27 at read time; it no longer appears in the top tier of 2026 community comparison tables.
Tessl
Founded by Guy Podjarny (ex-Snyk) and backed by $125M from Index/Accel/GV/boldstart. The Spec Registry (10k+ library specs) is in open beta; the Framework, in closed beta, targets the spec-anchored / spec-as-source end of Fowler’s ladder. Human approval sits between spec and implementation. A different bet from gated pipelines: specs as the source of truth.
Blueprint (this site)
The gates are Node scripts with exit codes: blueprint review fails CI when an artifact’s content — evidence paths, citations, required sections — doesn’t hold, and blueprint doctor exits non-zero on failed checks. Fact-check is a pipeline stage that traces document claims to research sources before deploy. The pipeline: research → prototype → fact-check → docs → deploy → iterate. As of wave 63 (2026-06-11) the consumer harness also carries an opt-in behavioral check: scenario_passes gates done-ness on recorded test evidence (Jest/Vitest/Playwright reporter output), fail-safe by contract — missing, stale, or skipped evidence can never read COMPLIANT. That is a state-derive check in the stamped harness, not one of the reviewers, so the reviewer count below is unchanged. Same honesty as every other row: the executable set is 15 of 18 reviewers (blueprint review --list), the stage skills only run in Claude Code, adoption is one team engagement in flight plus one independent adopter, and a passing gate means the artifact checks out — not that the thinking behind it was right. The gates verify artifacts and copy; they don’t make the work good.
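The fail-safe contract described above (missing, stale, or skipped evidence never reads COMPLIANT) can be sketched as a small state function. This is not Blueprint's code, which the text says is Node: the `Evidence` record, the status strings, the `NON_COMPLIANT` label, and the staleness rule (evidence older than the last source change) are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evidence:
    status: str         # hypothetical values: "passed", "failed", "skipped"
    recorded_at: float  # timestamp of the reporter output


def scenario_state(evidence: Optional[Evidence], source_changed_at: float) -> str:
    """Derive compliance; every doubtful case falls through to NON_COMPLIANT."""
    if evidence is None:
        return "NON_COMPLIANT"  # missing evidence
    if evidence.recorded_at < source_changed_at:
        return "NON_COMPLIANT"  # stale: assumed to mean older than the source
    if evidence.status != "passed":
        return "NON_COMPLIANT"  # failed or skipped
    return "COMPLIANT"
```

The design choice is that COMPLIANT is the only state that has to be earned: it is reachable through exactly one path, so a new or unrecognized status value fails closed.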
What nobody else does (as of this read).
Fowler’s criticism of the field is that LLM-interpreted checklists create “a false sense of control” — there is “no 100% guarantee that they will be respected.” Across the source-read tools above (Superpowers excluded — its row is community data, not a source read), three things remained unclaimed in combination on 2026-06-11:
- Exit-code gates on artifact content and evidence. Spec Kit’s check-prerequisites.sh is a real exit-code gate, on file existence. PRP’s loops are real, on code quality. The unclaimed slot is scripts that fail on what’s in the artifacts: missing evidence paths, uncited claims, stale counts. “Only executable gates” would be false; “executable gates on content, not just existence” is what this read supports.
- A document-level fact-check stage. Kiro verifies code against EARS requirements; GSD’s verifier falsifies the agent’s own summary against the codebase. Both are code-level, and good. Nobody in this read traces document claims back to research sources before shipping.
- Self-application receipts as a feature. OpenSpec’s archive directory is a genuine dogfooding trail — the closest thing found — and it goes unmarketed. This site is Blueprint’s own pipeline output, gates included.
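As an illustration of the first item, a content gate differs from an existence gate in that it reads the artifact. The sketch below is a minimal example under invented conventions, not Blueprint's reviewers: the `CLAIM:` line prefix, the `[src: path]` citation marker, and the function name `content_gate` are all hypothetical.

```python
import re
from pathlib import Path

# Hypothetical citation marker, e.g. "[src: research/notes.md]".
CITATION = re.compile(r"\[src:\s*([^\]]+)\]")


def content_gate(artifact: Path, root: Path) -> int:
    """Return 1 if any claim line is uncited or cites a path that does not exist."""
    failures = []
    lines = artifact.read_text(encoding="utf-8").splitlines()
    for number, line in enumerate(lines, start=1):
        if not line.startswith("CLAIM:"):  # hypothetical claim convention
            continue
        match = CITATION.search(line)
        if match is None:
            failures.append(f"{artifact.name}:{number}: uncited claim")
        elif not (root / match.group(1).strip()).is_file():
            failures.append(f"{artifact.name}:{number}: evidence path not found")
    for failure in failures:
        print(failure)
    return 1 if failures else 0
```

A gate like this still only proves the citation resolves to a file. Whether the source supports the claim is a separate check, in line with the caveat above that a passing gate does not make the thinking right.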
What several do better than Blueprint.
Kiro’s property tests verify the code itself — Blueprint’s gates stop at artifacts and copy. GSD’s adversarial-verifier stance is a stronger distrust posture than Blueprint’s reviewers hold. And community, everywhere: every tool above has more users than Blueprint — GSD reached ~64k stars in about five months; Blueprint’s adoption is one team engagement in flight and one independent adopter.
See the gates run on the demo, read the honest objections, or check the research behind this page: research/03-sdd-landscape-2026-06.md on GitHub.