AI-assisted development breaks down once context resets become routine. Production work is interruption-heavy and constraint-heavy. If a workflow cannot survive that reality, it is not production-ready.
Spec-driven development is the opposite of "vibe coding" - the approach where you prompt an AI, accept whatever it generates, and hope for the best. Vibe coding optimizes for the first iteration. Spec-driven development optimizes for the next hundred.
You're reading Part 2 of 5 in the AI-assisted development series. Previous: Part 1: Understanding AI Code Assistants Next: Part 3: Security Boundaries for AI-Assisted Development in ASP.NET Core
This series moves from workflow -> safety -> performance -> publishing, using DAP iQ as the working system.
Common questions this answers
- How do you keep AI output reliable when context resets happen?
- What is the minimum spec that produces reviewable diffs?
- What do you record so the next session starts from facts?
- Why does vibe coding break down in production?
Why this exists
I want AI-assisted development to be repeatable for real systems. That means a workflow that produces diffs you can review, commands you can run, and decisions you can audit.
Default rule
Assume context will reset. Externalize decisions and constraints into small, durable artifacts.
This pattern enforces one critical discipline: read all memory bank files at the start of every task. This is not optional. Skipping it is how context drift starts.
Definition (what this means in practice)
A spec-driven AI workflow is a loop where:
- you write a short spec with constraints and acceptance criteria
- the assistant produces a small diff
- you validate it and checkpoint the result into durable project memory
In practice, this means keeping a small memory-bank/ folder, writing one spec at a time, and updating project memory after every meaningful change.
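A minimal layout might look like this. The specs/ subfolders follow the active/backlog/done states described later in this article, and the active spec filename is only an example:
memory-bank/
  projectbrief.md
  productContext.md
  techContext.md
  systemPatterns.md
  activeContext.md
  progress.md
  specs/
    exempt-cached-feeds.md
    backlog/
    done/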
Terms used
- Memory bank: a small set of files that hold project facts and current intent. The pattern originates from Cline's Memory Bank. Related patterns include CLAUDE.md files and Cursor's .cursorrules.
- Hierarchical flow: files build on each other. projectbrief anchors everything; productContext, systemPatterns, and techContext extend it; activeContext captures current state; progress tracks what works.
- Active spec: the one spec you are implementing right now.
- Checkpoint: a short update that records what changed and what is next.
- Validation ladder: commands and HTTP checks that prove the work.
- Vibe coding: prompting an AI and accepting what it generates without a spec. Fast for prototypes, fragile for production.
Reader contract
This article is for:
- Engineers shipping production systems.
- Anyone who wants AI speed without losing review discipline.
You will leave with:
- A spec template that forces reviewable diffs.
- A minimal "memory bank" structure that survives context resets.
- A validation ladder that keeps AI output grounded.
This is not for:
- Prompt-only workflows.
- Teams that will not run real commands.
Quick start (10 minutes)
If you do nothing else, do this:
Verified on: ASP.NET Core (.NET 10), EF Core 10.
- Create a folder called memory-bank/.
- Add one file called activeContext.md.
- Write down three bullets: current goal, constraints, and what "done" means.
- Before every AI session, paste those bullets into the prompt.
- After every AI session, update that file with what changed and what is next.
This is deliberately small. It is also the difference between "fast today" and "reliable next month".
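If you want a concrete starting point, here is what a first activeContext.md might contain. The goal and constraints are illustrative, borrowed from the worked spec later in this article:
# Active Context
Current goal: exempt cached feed endpoints from rate limiting.
Constraints: ASP.NET Core MVC (.NET 10); no new packages; keep OutputCache on feeds.
Done means: /insights/feed and /sitemap.xml stay 200 under repeated requests.
Recent changes: none yet.
Next step: write the spec.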
The core loop: Plan -> Implement -> Checkpoint
Treat the assistant as a compiler for intent, not a source of intent. The intent lives in a spec. The result is a diff.
- Plan: write a small spec with acceptance criteria.
- Implement: change code or content to match the spec.
- Checkpoint: update project memory, then move the spec to done.
In DAP iQ, the loop is enforced with a small command vocabulary in AGENTS.md.
The important part is not the file name.
It is the invariant: there is one active spec and it moves when finished.
Minimal AGENTS.md example (commands)
This is an intentionally small command vocabulary. It keeps sessions consistent and makes "what happens next" obvious.
# Commands
INIT
STATUS
ASK <question>
PLAN <feature>
IMPLEMENT
BACKLOG <item>
CHECKPOINT
UPDATE MEMORY
# Rules
- Memory first: read memory-bank before work.
- Exactly one active spec in memory-bank/specs/.
- No edits during PLAN.
- Implement only the active spec.
Start-of-day routine (5 minutes)
- Run INIT.
- Read memory-bank/activeContext.md and memory-bank/progress.md.
- Pick one spec to plan or implement.
End-of-session routine (5 minutes)
- Update memory-bank/activeContext.md with what changed.
- Update memory-bank/progress.md with what works / what's left.
- Run CHECKPOINT to archive the spec.
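Concretely, the CHECKPOINT step can be a handful of git commands. A sketch, assuming specs live in memory-bank/specs/ with a done/ subfolder (the spec filename is hypothetical):
# Move the finished spec to done/ and commit the memory updates with it.
git mv memory-bank/specs/exempt-cached-feeds.md memory-bank/specs/done/exempt-cached-feeds.md
git add memory-bank/activeContext.md memory-bank/progress.md
git commit -m "checkpoint: exempt cached feeds from rate limiting; validated with curl"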
Reference implementation
This is the minimum set of artifacts that makes the workflow real.
A spec template you can copy/paste
Use this as the input to the assistant. It forces scope, touched files, and a rollback plan.
# Spec: <short name>
## Summary
<1-2 sentences>
## Constraints
- Stack: <framework/runtime>
- Security: <guardrails>
- SEO: <canonical/meta/structured data rules>
- Performance: <cache/query rules>
## Scope
In scope:
- <bullet>
Out of scope:
- <bullet>
## Files to touch
- <path>
## Acceptance criteria
- <observable check>
## Validation
- Commands:
- <command>
- HTTP checks:
- <url>
## Rollback
<how to revert safely>
Worked example spec (real, not templated)
This is what a small, production-ready spec looks like.
# Spec: Exempt cached feeds from rate limiting
## Summary
Prevent 429s for crawlers by exempting cached, read-only feed/sitemap endpoints from rate limiting.
## Constraints
- Stack: ASP.NET Core MVC (.NET 10)
- Security: do not trust spoofable IP headers
- SEO: preserve canonical URLs and feed/sitemap routes
- Performance: keep OutputCache enabled on feeds and sitemap
## Files to touch
- src/DapIq.Website/Program.cs
- memory-bank/activeContext.md
## Acceptance criteria
- /insights/feed stays 200 under repeated requests (no 429)
- /sitemap.xml stays 200 under repeated requests (no 429)
- Rate limiting still applies to normal MVC routes
## Validation
- Commands:
- dotnet build -c Release
- cd src/DapIq.Website && dotnet run --launch-profile http
- HTTP checks:
- curl -I http://localhost:5000/insights/feed
- curl -I http://localhost:5000/sitemap.xml
## Rollback
Revert the Program.cs change and rebuild.
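For orientation, here is a minimal sketch of what that Program.cs change could look like using ASP.NET Core's built-in rate limiting. The partition keys, limits, and paths are illustrative, not the production DAP iQ configuration:
using System.Threading.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;
    options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(context =>
    {
        // Cached, read-only endpoints get no limiter partition at all.
        if (context.Request.Path.StartsWithSegments("/insights/feed") ||
            context.Request.Path.StartsWithSegments("/sitemap.xml"))
        {
            return RateLimitPartition.GetNoLimiter("cached-read-only");
        }

        // Everything else shares a fixed-window limit (values illustrative).
        return RateLimitPartition.GetFixedWindowLimiter("default", _ => new FixedWindowRateLimiterOptions
        {
            PermitLimit = 100,
            Window = TimeSpan.FromMinutes(1)
        });
    });
});

var app = builder.Build();
app.UseRateLimiter();
Note the partition key is a constant, not a client address: the spec's security constraint rules out trusting spoofable IP headers.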
Memory bank file contracts
Keep it boring. Boring is what survives.
Keep the memory bank short on purpose:
- activeContext.md target: <= 25 lines
- progress.md target: <= 50 lines
- systemPatterns.md target: <= 100 lines
If a file grows past the cap, split it or delete stale content.
| File | Purpose | Must contain |
|---|---|---|
| projectbrief.md | Mission and non-negotiables | Stack, constraints, audience |
| productContext.md | Why the product exists | Problems solved, UX goals |
| techContext.md | How to run/build/deploy | Commands, infra assumptions |
| systemPatterns.md | Architecture decisions | Patterns you do not re-litigate |
| activeContext.md | What we are doing now | Current phase, recent changes, next steps |
| progress.md | What works vs what is left | Checklists, known issues |
When to update memory
The Cline pattern defines four update triggers:
- When you discover a new project pattern worth preserving.
- After implementing significant changes.
- When explicitly asked to update memory.
- When context needs clarification before continuing.
If none of these apply, do not update. Frequent small updates beat occasional large ones.
Checkpoint examples
If you cannot summarize what changed, you did not finish.
Good checkpoint (one line):
- "Added OutputCache to series feeds and exempted from rate limiting; validated with curl."
Bad checkpoint:
- "Did a bunch of caching stuff."
Copy/paste artifact: one-page spec template
Use this as a PR description or as the top of a spec file.
Goal:
Constraints (non-negotiable):
- <stack constraints, invariants, "no new packages", etc>
Files allowed to change:
- <explicit paths, keep this tight>
Acceptance criteria:
- <observable outcomes>
Validation:
- <commands to run, expected outputs>
Notes / decision log:
- <decision: ..., rationale: ...>
Guardrails that keep AI-assisted development grounded
Guardrails are not policies. They are things the repo, the compiler, and your review process can enforce.
Examples from DAP iQ:
- One active spec at a time.
- No repository pattern. AppDbContext is the repository.
- Dark mode only. No theme toggle.
- Validate with real commands, not "looks good".
If you do not constrain the solution space, the assistant will widen it. That creates long diffs and unclear ownership.
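The AppDbContext guardrail, for example, shows up directly in diffs: services depend on the context, not on a repository interface. A minimal sketch, assuming a Post entity with a PublishedUtc property (both names are hypothetical):
using Microsoft.EntityFrameworkCore;

public sealed class InsightsService
{
    private readonly AppDbContext _db;

    public InsightsService(AppDbContext db) => _db = db;

    // Query the context directly; no repository layer to re-litigate.
    public Task<List<Post>> GetRecentPostsAsync(int count) =>
        _db.Posts
            .AsNoTracking()
            .OrderByDescending(p => p.PublishedUtc)
            .Take(count)
            .ToListAsync();
}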
Turn prompts into diffs, not discussions
Ask for outputs that are easy to verify.
For code:
- require a patch
- require a file list
- require "why" in terms of acceptance criteria
For content:
- require consistent headings
- require a nav block
- require ASCII-only output if your pipeline is strict
For DAP iQ, "verify" usually means:
git status -sb
dotnet build
curl -I http://localhost:5000/series/ai-assisted-development
curl -I http://localhost:5000/sitemap.xml
curl -I http://localhost:5000/insights/feed
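Each check should come back 200; a 429 on the feed or sitemap is a regression against the worked spec above. A healthy curl -I response starts like this (remaining headers vary by endpoint):
HTTP/1.1 200 OK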
Surviving context window limits
When your conversation fills the context window, the assistant starts forgetting early context. The memory bank pattern handles this explicitly:
- Request "update memory bank" before the window fills.
- Start a fresh conversation.
- Tell the assistant to read memory bank files first.
The memory bank becomes a handoff document. Everything important survives in files, not chat history.
For teams building deeper integrations, the Model Context Protocol (MCP) provides a standardized way for AI assistants to access external tools and data sources. MCP reduces the "N x M" integration problem to "N + M" by providing a universal interface. Major AI providers (Anthropic, OpenAI, Google) have adopted it. If your workflow needs persistent access to databases, APIs, or file systems across sessions, MCP is the infrastructure layer that enables it.
Common failure modes
- Treating the chat as the spec, then losing it.
- Asking for "best practices" and getting generic advice.
- Allowing scope creep because the assistant can type faster than you can reason.
- Skipping validation because the diff "looks right".
- Letting the context window fill without checkpointing.
Checklist
- Write a spec with acceptance criteria.
- Copy in the constraints that matter (stack, routing, SEO, security).
- Keep the diff small enough to review in one sitting.
- Run dotnet build (or the closest equivalent) before calling it done.
- Update memory so the next AI-assisted development pass starts from facts.
FAQ
What is wrong with vibe coding?
Nothing, if you are prototyping or exploring. Vibe coding - prompting an AI and accepting what it generates - is fast for throwaway work. The problem is production. Vibe-coded features break when requirements change because there is no spec to trace back to. You end up rewriting from scratch. Spec-driven development adds friction upfront but pays off when the codebase lives longer than a week.
Do I need a memory bank, or is a README enough?
If the README stays current, it can work. Most READMEs do not. The memory bank is a forcing function: it is short, explicit, and updated on purpose.
Why "one active spec"?
Because parallel specs create parallel context. The common failure mode: the assistant mixes constraints across specs, and humans forget which constraints applied to which decision.
What if the assistant is "mostly right"?
"Mostly right" is how regressions land. If it matters, require a diff you can review and a command you can run.
What belongs in a spec vs in the memory bank?
Put intent in the spec (goal, constraints, acceptance, validation). Put facts and long-lived decisions in the memory bank (stack, routes, invariants, known tradeoffs).
How do I keep the memory bank from turning into a second wiki?
Cap file sizes and enforce deletion. If something is stale or redundant, remove it.
What if I need to work on multiple things in parallel?
Use a backlog, but keep only one active spec. Parallel work is how context mixes and regressions land.
Should I run validation on every session, even for "docs only"?
Yes. For DAP iQ, publishing Markdown is a real build step.
How strict should the validation ladder be?
As strict as the blast radius. For a content site: build + publish + HTTP checks. For payments: add tests and staged verification.
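When the blast radius justifies tests, the HTTP rungs of the ladder can be automated. A minimal sketch using Microsoft.AspNetCore.Mvc.Testing and xUnit; the endpoint paths come from the worked spec, while the class name and iteration count are illustrative:
using System.Net;
using Microsoft.AspNetCore.Mvc.Testing;
using Xunit;

public class CachedEndpointTests : IClassFixture<WebApplicationFactory<Program>>
{
    private readonly WebApplicationFactory<Program> _factory;

    public CachedEndpointTests(WebApplicationFactory<Program> factory) => _factory = factory;

    [Theory]
    [InlineData("/insights/feed")]
    [InlineData("/sitemap.xml")]
    public async Task CachedEndpoint_Stays200_UnderRepeatedRequests(string url)
    {
        var client = _factory.CreateClient();

        // Repeated requests must never trip the rate limiter on exempt routes.
        for (var i = 0; i < 25; i++)
        {
            var response = await client.GetAsync(url);
            Assert.Equal(HttpStatusCode.OK, response.StatusCode);
        }
    }
}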
How does this differ from the original memory bank pattern?
Cline defines the memory bank structure and read/update discipline. This workflow adds:
- A spec system with active/backlog/done states.
- A command vocabulary (INIT, PLAN, IMPLEMENT, CHECKPOINT).
- Explicit validation ladder requirements.
- The "one active spec" constraint to prevent parallel drift.
The Cline pattern is the foundation. This workflow adds production discipline on top.
How does this compare to GitHub Spec Kit?
GitHub Spec Kit is GitHub's open source toolkit for spec-driven development. It uses a similar philosophy: specifications become executable artifacts that drive implementation.
Spec Kit workflow: Specify -> Plan -> Tasks -> Implement. This workflow: Plan -> Implement -> Checkpoint.
Key differences:
- Spec Kit uses a CLI (specify init, /specify, /plan, /tasks) and generates structured directories per feature. This workflow uses a command vocabulary and a single active spec with backlog/done states.
- Spec Kit has a "constitution" file for immutable architectural principles. This workflow uses systemPatterns.md in the memory bank.
- Spec Kit is agent-agnostic (Copilot, Claude Code, Gemini). This workflow is also agent-agnostic but optimized for session-based work.
Both approaches solve the same problem: making AI output predictable by constraining it with structured specifications. Choose based on your team's tooling preferences.
What to do next
Read Part 3: Security Boundaries for AI-Assisted Development in ASP.NET Core. Browse the AI-assisted development series for the full sequence. If you want a workflow you can sustain, start by making AI-assisted development produce verifiable artifacts. If you want to talk about applying this workflow to your system, reach out via Contact.
References
- Cline Memory Bank - The original memory bank pattern this workflow extends.
- GitHub Spec Kit - GitHub's open source toolkit for spec-driven development with AI.
- Spec-Driven Development with AI (GitHub Blog) - Introduction to spec-driven development philosophy.
- Model Context Protocol - Open standard for AI assistant integrations with external tools and data.
- Vibe Coding is Not AI-Assisted Engineering - Addy Osmani on why structured approaches beat unstructured prompting.
- GitHub Copilot Documentation
- Git Documentation (Reference Manual)
- .NET CLI overview
- dotnet build
- dotnet test
- Configuration providers in .NET (environment variables)
- Safe storage of app secrets in development in ASP.NET Core
- The Twelve-Factor App
Author notes
Decisions:
- Keep exactly one active spec at a time. Rationale: prevents parallel drift and conflicting context.
- Treat commands as part of the workflow. Rationale: "run this" is a better contract than "this should work".
- Prefer constraints over clever prompts. Rationale: constraints reduce hallucinated architecture.
- Reference related approaches (Cline, GitHub Spec Kit, MCP). Rationale: readers benefit from knowing the ecosystem; this workflow is one option, not the only option.
Observations:
- Before: context resets caused repeated debates about patterns and constraints.
- After: memory bank + spec discipline reduced rework and made changes reviewable.
- Observed: validation steps became consistent, which kept regressions visible.
- The Cline memory bank pattern provided a solid foundation; the spec system and command vocabulary were the missing pieces for production use.