the landscape

Where this sits.

Every major spec-driven workflow converges on the same staged pipeline — what differs is what the gate between stages actually is.

human approval

A person clicking approve before the next stage runs.

LLM-run checklist

A checklist the LLM grades itself against.

exit-code on existence or schema

A script that fails when files are missing or malformed.

exit-code on content + evidence

A script that checks what’s in them — claims, citations, evidence.

Spec-driven development is a named, crowded category: a landscape tracker counts 30+ frameworks, Martin Fowler has a series on it, and the convergence claim above (research, plan, execute, review, ship) is the conclusion of the community’s own comparison, not this page’s. Blueprint’s seven stages are not the novelty; the question this read asks is which kind of gate sits between each tool’s stages.

This is a source-level read of that question, with sources read on 2026-06-11: gate implementations from the open repos, vendor docs for the commercial tools (Kiro, Tessl), and one row, Superpowers, tabled from community data only and flagged as such. Star counts are approximate and point-in-time. If a row misrepresents a tool, file an issue and it will be fixed.

Tool	Gate type	What its gate actually is	What it does best	Scope
GitHub Spec Kit (~111k stars)	EXIT-CODE: EXISTENCE; HUMAN APPROVAL	Existence script; human approve prompt; LLM checklists	Reach: the most community-adopted SDD tool; 30+ agent integrations, GitHub’s backing	Spec → code pipeline
BMAD-METHOD (~49k stars)	LLM CHECKLIST	LLM-run checklists and agent handoffs	Breadth: the only other tool spanning research, briefs, and PRDs through implementation	Research → code pipeline
GSD (Get Shit Done) (~64k stars)	LLM CHECKLIST; EXIT-CODE: PARTIAL	Formal gate taxonomy, mostly LLM-enforced; a few exit-code scripts	The adversarial-verifier stance: the strongest distrust-the-agent posture in the field	Claude Code workflow
OpenSpec (~54k stars)	EXIT-CODE: SCHEMA	CLI validator on spec structure; explicitly anti phase-gate	Visible dogfooding: its own `changes/archive/` is a receipts trail; strong artifact discipline	Change-artifact pipeline
AWS Kiro (commercial, GA)	HUMAN APPROVAL	Property tests code-vs-requirements; stage flow IDE-guided human approval	Property-based testing: EARS requirements become an executable specification	Commercial IDE
Superpowers (~224k stars)	UNVERIFIED	Not read at source level this pass	Distribution: on the official Claude marketplace since Jan 2026	Claude Code skills
PRP (PRPs-agentic-eng) (~2.2k stars)	CODE-QUALITY LOOP	Loops until green: type-check, lint, tests, build	Executable validation embedded in the artifact; closest philosophical neighbor here	Implementation slices
ruflo (Claude Flow) / SPARC (~59k stars)	AGENT-JUDGED	Agent-evaluated criteria between phases, stored in memory tables	Orchestration scale: swarms, adaptive memory, plugins; gate results stored for traceability	Agent meta-harness
spec-workflow-mcp (~4.2k stars)	HUMAN APPROVAL	Dashboard approval; genuinely blocks until approved	The only other tool here where “blocked until approved” is enforced	MCP server + dashboard
Taskmaster (~27k stars)	NONE	None: task ordering, no stage blocking	Task decomposition: PRD into dependency-ordered tasks, plus a dedicated research model	Task layer in editors
Agent OS (~4.8k stars)	NONE	None: different job	Standards: extracts the conventions already in your codebase and injects them contextually	Context / standards injection
SuperClaude (~23k stars)	NONE	LLM self-scored confidence thresholds; not executable, not a gate	Configuration breadth: commands, cognitive personas, modes, MCP integrations in one framework	Claude Code configuration framework
Tessl (commercial)	HUMAN APPROVAL	Human approval between spec and implementation	The Spec Registry: “npm for specs,” aimed at API hallucination	Spec registry + framework
Blueprint (this site; this repo)	EXIT-CODE: CONTENT	Exit-code Node scripts on artifact content + evidence; fact-check stage	Self-application receipts: the demo and the methodology pages are its own output. Caveats in the footnote	Product pipeline

Also in the field, read but not tabled: Spec Kitty (~1.3k stars, spec → kanban → auto-merge CLI), Augment Intent (commercial “living specs” platform with a pre-PR Verifier), and “Hermes” (landscape chatter; no substantive repo surfaced). Named on the community’s shortlist but not read this pass: gstack (~109k stars). Sources read 2026-06-11.

Row by row, with sources.

GitHub Spec Kit

scripts/bash/check-prerequisites.sh exits non-zero when required artifacts (spec.md, plan.md, tasks.md) are missing — a real exit-code gate, on file existence. The workflow-engine gate step (src/specify_cli/workflows/steps/gate/__init__.py) is an interactive human approve/reject prompt that falls back to PAUSED in CI. /speckit.checklist and /speckit.analyze are LLM-interpreted markdown — “unit tests for English,” in its own docs’ framing. The full pipeline: constitution → specify → clarify → plan → tasks → analyze → implement.
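To make "exit-code gate on file existence" concrete, here is a minimal sketch in Python. It is not Spec Kit's script (that one is bash); the artifact names come from the paragraph above, while the flat directory layout and the function names are assumptions made for illustration.

```python
import sys
import tempfile
from pathlib import Path

# Artifact names from the paragraph above; the flat layout is an assumption.
REQUIRED = ("spec.md", "plan.md", "tasks.md")


def missing_artifacts(feature_dir: Path) -> list[str]:
    """Return the required artifact names that are absent from feature_dir."""
    return [name for name in REQUIRED if not (feature_dir / name).is_file()]


def existence_gate(feature_dir: Path) -> int:
    """Exit-code style result: 0 when every artifact exists, 1 otherwise."""
    missing = missing_artifacts(feature_dir)
    for name in missing:
        print(f"missing: {name}", file=sys.stderr)
    return 1 if missing else 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "spec.md").write_text("")      # an empty file still counts
        (root / "plan.md").write_text("# plan")
        print(existence_gate(root))             # 1: tasks.md is absent
        (root / "tasks.md").write_text("")
        print(existence_gate(root))             # 0: all three files exist
```

Note what the demo shows: an empty spec.md passes. A gate of this kind proves the files are there, not that anything useful is in them.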

BMAD-METHOD

v6 ships 12+ role agents (PM, Architect, Dev, UX), 34+ workflows, and web bundles for upfront research and PRD planning — the only tool in this read with product-pipeline breadth. Handoffs between agents are file-based and convention-enforced; checklists are prompts the agents run. No executable enforcement found in this read. The full pipeline: research → brief → PRD → architecture → stories → code.

GSD (Get Shit Done)

get-shit-done/references/gates.md defines a four-type gate taxonomy (pre-flight artifact checks, revision loops bounded at 3 iterations with stall detection, escalation to human, and abort) with a matrix mapping phases to artifacts. agents/gsd-verifier.md instructs verifiers to work goal-backward and falsify the SUMMARY.md narrative: “Do NOT trust SUMMARY.md claims.” Most gates run LLM-side; a few real Node exit-code scripts exist (bin/lib/ui-safety-gate.cjs documents exit 0/1/2). Verification is code-vs-claims, not document-vs-sources. Scope: requirements → plan → execute → verify, inside Claude Code.
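A revision loop bounded at 3 iterations with stall detection and escalation can be sketched as follows. This is an illustration in Python, not GSD's implementation: the stall rule (the issue count stopped shrinking) and every name here are assumptions.

```python
from enum import Enum
from typing import Callable

MAX_ITERATIONS = 3  # the bound named in the paragraph above


class Outcome(Enum):
    PASSED = "passed"
    ESCALATE = "escalate to human"


def revision_loop(
    draft: str,
    find_issues: Callable[[str], list[str]],
    revise: Callable[[str, list[str]], str],
) -> tuple[Outcome, str]:
    """Revise until no issues remain, the loop stalls, or the bound is hit."""
    previous_count = None
    for _ in range(MAX_ITERATIONS):
        issues = find_issues(draft)
        if not issues:
            return Outcome.PASSED, draft
        # Stall detection (assumed rule): the issue count stopped shrinking.
        if previous_count is not None and len(issues) >= previous_count:
            return Outcome.ESCALATE, draft
        previous_count = len(issues)
        draft = revise(draft, issues)
    # Bound exhausted: pass only if the last revision came back clean.
    clean = not find_issues(draft)
    return (Outcome.PASSED if clean else Outcome.ESCALATE), draft
```

In the setup the paragraph describes, find_issues and revise would be LLM calls, which is why the row is tagged as mostly LLM-enforced: the loop structure is deterministic, but the judgment inside it is not.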

OpenSpec

openspec validate --all --strict --json is a real, CI-runnable validator — of spec structure and schema, not evidence. The README explicitly positions against rigid phase gates; /opsx:verify (added 2026-02) is LLM-driven implementation-vs-spec checking. Its own repo carries openspec/changes/archive/ with dated change folders — the only visible dogfooding trail in this read, and it goes unmarketed. The artifact chain: proposal → spec deltas → tasks → archive.

AWS Kiro

The GA headline is property-based testing: EARS-notation requirements are translated into an executable specification, generating property tests that check the produced code against the spec (kiro.dev/docs/specs/correctness). This is the most credible executable-verification story in this read, but it verifies code against requirements, not stage progression; stage flow is IDE-guided with implicit human approval. Spec artifacts: requirements.md / design.md / tasks.md → code.
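For readers unfamiliar with the idea, here is a minimal hand-rolled property test in Python. It does not show Kiro's generator or output format; the EARS-style requirement, the function under test, and the input ranges are all invented for illustration, and a real setup would more likely use a property-testing library than the plain random module.

```python
import random

# Hypothetical EARS-style requirement, invented for this sketch:
#   "When a discount is applied, the system shall return a total that is
#    never negative and never exceeds the original price."


def apply_discount(price_cents: int, percent: int) -> int:
    """Code under test: integer-cent discount, percent clamped to 0..100."""
    percent = max(0, min(100, percent))
    return price_cents - (price_cents * percent) // 100


def check_property(trials: int = 1000, seed: int = 0) -> list[tuple[int, int]]:
    """Return every random input that violates the requirement."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        price = rng.randint(0, 1_000_000)
        percent = rng.randint(-50, 150)  # deliberately includes invalid values
        total = apply_discount(price, percent)
        if not (0 <= total <= price):
            failures.append((price, percent))
    return failures
```

An empty list from check_property() means no counterexample was found in the sampled inputs. That is evidence about the code, which is the distinction the paragraph draws: nothing here decides whether a pipeline stage may proceed.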

Superpowers

Not read at source level this pass. The row reflects the GitHub API star count and community reporting only: r/ClaudeCode threads treat it as the closest thing the community has to a de facto default, alongside rising complaints about verbosity. Its gate mechanism is unverified here; corrections are especially welcome on this row.

PRP (PRPs-agentic-eng)

/prp-ralph (Geoffrey Huntley’s Ralph technique, stop-hook driven) iterates until all validation commands pass — type-check, lint, tests, build — then emits a completion promise. Genuinely executable validation; what it gates is “loop until the code is green,” not stage progression or document content. The PRP artifact bundles a PRD + codebase intelligence + a runbook. No license file at read time.
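The "loop until the code is green" shape can be sketched in a few lines of Python. This is not the /prp-ralph implementation: the checks are plain callables standing in for validation commands, fix stands in for the agent's next attempt, and the max_rounds cap is an assumption added so the sketch always terminates.

```python
from typing import Callable

# Each check stands in for one validation command
# (type-check, lint, tests, build) and returns True when it passes.
Check = Callable[[], bool]


def loop_until_green(
    checks: dict[str, Check],
    fix: Callable[[list[str]], None],
    max_rounds: int = 10,  # assumed safety cap, not from the source
) -> tuple[bool, int]:
    """Run all checks; on failure call fix() and retry. Returns (green, rounds)."""
    for round_no in range(1, max_rounds + 1):
        failing = [name for name, check in checks.items() if not check()]
        if not failing:
            return True, round_no
        fix(failing)
    return False, max_rounds
```

In practice each check would wrap a real command, for example by testing whether subprocess.run(...).returncode is 0. The gate condition is entirely about the code passing its own tooling, which is why the table files it as a code-quality loop.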

ruflo (Claude Flow) / SPARC

Renamed from claude-flow. The ruflo-sparc plugin runs Specification → Pseudocode → Architecture → Refinement → Completion with “quality gates between each phase” — verified in the plugin README as orchestrator-agent-evaluated criteria (e.g. “covers all ACs”) whose results land in memory tables, not standalone exit-code scripts. ruflo verify is supply-chain verification of the installed bytes, not methodology receipts.

spec-workflow-mcp

The MCP server genuinely blocks the workflow until approval comes through the real-time dashboard (or VSCode extension), with revision rounds and implementation logs. The block is mechanical, not advisory; the check itself is a person. The pipeline: requirements → design → tasks. Licensed GPL-3.0, which limits commercial embedding.

Taskmaster

Parses a PRD into dependency-ordered tasks (tasks.json), with expand/next/show and tagged workstreams; a research command pulls fresh information through a dedicated research model (Perplexity/xAI/OpenRouter) with project context. The research is informational, not evidentiary — no claims-traced-to-source step. Runs inside Cursor / Windsurf / similar editors. License NOASSERTION (MIT-with-Commons-Clause family).
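Dependency ordering of this kind is a topological sort. The sketch below uses Python's standard graphlib; the task records are hypothetical and do not reproduce Taskmaster's actual tasks.json schema.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task records; the real tasks.json schema is not reproduced here.
tasks = [
    {"id": 1, "title": "Define schema", "dependencies": []},
    {"id": 2, "title": "Build API", "dependencies": [1]},
    {"id": 3, "title": "Build UI", "dependencies": [2]},
    {"id": 4, "title": "Write docs", "dependencies": [1]},
]


def execution_order(task_list: list[dict]) -> list[int]:
    """Return task ids so that every task follows all of its dependencies.

    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    graph = {t["id"]: set(t["dependencies"]) for t in task_list}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    try:
        print(execution_order(tasks))
    except CycleError as err:
        print(f"dependency cycle: {err.args[1]}")
```

Ordering constrains which task comes next; it does not block a stage on the quality of the previous one, which is why the table lists the gate type as none.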

Agent OS

Repositioned in 2026 around standards: Discover Standards (extract conventions from the codebase), Deploy Standards (inject contextually), Shape Spec, Index Standards. No gates of any kind, no artifact pipeline enforcement — that is not the job it is doing, and it does its job without ceremony.

SuperClaude

Its “confidence-based validation” is deep-research quality scoring — LLM-self-assessed source credibility 0.0–1.0 with a 0.6 threshold — not traced citations and not executable. No gates, no artifact pipeline, no self-application. Last push 2026-04-27 at read time; it no longer appears in the top tier of 2026 community comparison tables.

Tessl

Founded by Guy Podjarny (ex-Snyk), with $125M from Index/Accel/GV/boldstart. The Spec Registry (10k+ library specs) is in open beta; the Framework, in closed beta, targets the spec-anchored / spec-as-source end of Fowler’s ladder. Human approval sits between spec and implementation. A different bet than gated pipelines: specs as the source of truth.

Blueprint (this site)

The gates are Node scripts with exit codes: blueprint review --target= fails CI when an artifact’s content (evidence paths, citations, required sections) doesn’t hold, and blueprint doctor exits non-zero on failed checks. Fact-check is a pipeline stage that traces document claims to research sources before deploy. The pipeline: research → prototype → fact-check → docs → deploy → iterate. As of wave 63 (2026-06-11) the consumer harness also carries an opt-in behavioral check: scenario_passes gates done-ness on recorded test evidence (Jest/Vitest/Playwright reporter output), fail-safe by contract, so missing, stale, or skipped evidence can never read COMPLIANT. That is a state-derive check in the stamped harness, not one of the reviewers, so the reviewer count below is unchanged. Same honesty as every other row: the executable set is 15 of 18 reviewers (blueprint review --list), the stage skills only run in Claude Code, adoption is one team engagement in flight plus one independent adopter, and a passing gate means the artifact checks out, not that the thinking behind it was right. The gates verify artifacts and copy; they don’t make the work good.
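The fail-safe contract described above (missing, stale, or skipped evidence can never read COMPLIANT) can be illustrated with a small Python sketch. This is not Blueprint's scenario_passes, which runs in a Node harness; the Evidence record, its fields, and the staleness rule are assumptions chosen to show the shape of the contract.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    COMPLIANT = "COMPLIANT"
    NON_COMPLIANT = "NON_COMPLIANT"


@dataclass
class Evidence:
    """Hypothetical record of one recorded test run for a scenario."""
    passed: int
    failed: int
    skipped: int
    recorded_at: float        # epoch seconds when the reporter wrote its output
    source_changed_at: float  # epoch seconds of the last relevant source change


def scenario_status(evidence: Optional[Evidence]) -> Status:
    """Fail-safe: only fresh, fully passing evidence reads COMPLIANT."""
    if evidence is None:                                   # missing
        return Status.NON_COMPLIANT
    if evidence.recorded_at < evidence.source_changed_at:  # stale (assumed rule)
        return Status.NON_COMPLIANT
    if evidence.failed > 0 or evidence.skipped > 0:        # failing or skipped
        return Status.NON_COMPLIANT
    if evidence.passed == 0:                               # nothing actually ran
        return Status.NON_COMPLIANT
    return Status.COMPLIANT
```

The design choice is that COMPLIANT is the last return, reached only after every disqualifying case has been ruled out, so an unanticipated input defaults to failure rather than to a pass.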

What nobody else does (as of this read).

Fowler’s criticism of the field is that LLM-interpreted checklists create “a false sense of control” — there is “no 100% guarantee that they will be respected.” Across the source-read tools above (Superpowers excluded — its row is community data, not a source read), three things remained unclaimed in combination on 2026-06-11:

Exit-code gates on artifact content and evidence. Spec Kit’s check-prerequisites.sh is a real exit-code gate — on file existence. PRP’s loops are real — on code quality. The unclaimed slot is scripts that fail on what’s in the artifacts: missing evidence paths, uncited claims, stale counts. “Only executable gates” would be false; “executable gates on content, not just existence” is what this read supports.
A document-level fact-check stage. Kiro verifies code against EARS requirements; GSD’s verifier falsifies the agent’s own summary against the codebase. Both are code-level — and good. Nobody in this read traces document claims back to research sources before shipping.
Self-application receipts as a feature. OpenSpec’s archive directory is a genuine dogfooding trail — the closest thing found — and it goes unmarketed. This site is Blueprint’s own pipeline output, gates included.
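The difference between a gate on existence and a gate on content can be shown with a short Python sketch. It is not Blueprint's reviewer code (those are Node scripts); the CLAIM: prefix and the [evidence: path] citation syntax are conventions invented here purely for illustration.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical conventions, invented for this sketch: a claim line starts
# with "CLAIM:" and cites its evidence as [evidence: relative/path].
CITATION = re.compile(r"\[evidence:\s*([^\]]+?)\s*\]")


def content_gate(artifact: Path) -> list[str]:
    """Return one message per problem; an empty list means the gate passes."""
    problems = []
    for n, line in enumerate(artifact.read_text().splitlines(), start=1):
        if not line.startswith("CLAIM:"):
            continue
        cited = CITATION.findall(line)
        if not cited:
            problems.append(f"line {n}: uncited claim")
        for rel in cited:
            if not (artifact.parent / rel).is_file():
                problems.append(f"line {n}: missing evidence path {rel}")
    return problems


def exit_code(artifact: Path) -> int:
    """0 when the artifact's claims all trace to existing evidence, else 1."""
    return 1 if content_gate(artifact) else 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        doc = Path(tmp) / "doc.md"
        doc.write_text("CLAIM: Tool Y is popular.\n")
        print(content_gate(doc))  # ['line 1: uncited claim']
        print(exit_code(doc))     # 1
```

The file in the demo exists and is well-formed, so an existence or schema gate would pass it. It fails here only because of what it says, which is the slot the first item above describes.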

What several do better than Blueprint.

Kiro’s property tests verify the code itself — Blueprint’s gates stop at artifacts and copy. GSD’s adversarial-verifier stance is a stronger distrust posture than Blueprint’s reviewers hold. And community, everywhere: every tool above has more users than Blueprint — GSD reached ~64k stars in about five months; Blueprint’s adoption is one team engagement in flight and one independent adopter.

See the gates run on the demo, read the honest objections, or check the research behind this page: research/03-sdd-landscape-2026-06.md on GitHub.