AI code assistants are useful. They are also easy to misuse. This article defines AI-assisted development in production terms and sets boundaries that make it survivable.
You're reading Part 1 of 5 in the AI-assisted development series. Previous: none. Next: Part 2: A Spec-Driven AI Workflow That Holds Up in Production.
This series moves from workflow -> safety -> performance -> publishing, using DAP iQ as the working system.
Common questions this answers
- What are AI code assistants actually good at (and bad at)?
- What constraints make AI-assisted development safe in production?
- What validation should you run before you trust the diff?
Definition (what this means in practice)
AI-assisted development is a workflow where you provide explicit constraints and context, and the assistant returns a small diff you can review. It works when each change ends with real validation commands and produces predictable artifacts.
In practice, this means starting every AI session with a written spec, treating the output as a draft diff, and running validation commands before calling it done.
Terms used
- AI-assisted development: spec in, diff out, validate.
- Constraint: a non-negotiable rule (stack, security boundary, SEO invariant).
- Artifact: something reviewable (diff, table, checklist, command output).
- Validation ladder: a short set of commands and checks that prove behavior.
- Blast radius: how many systems can be impacted by the change.
- Scope creep: changes outside the agreed file list and acceptance criteria.
Reader contract
This article is for:
- Engineers who own production outcomes.
- Anyone reviewing AI-generated diffs.
You will leave with:
- A definition of AI-assisted development you can enforce.
- A diff review rubric (with a table).
- Prompt patterns that produce reviewable patches.
- A validation ladder that keeps you honest.
This is not for:
- Prompt roulette.
- "Just ship it" coding.
Copy/paste artifact: validation ladder by blast radius
Use the smallest validation that can fail loudly, then widen it as the blast radius grows.
| Blast radius | When to use | Validation examples |
|---|---|---|
| Single function/module | refactor, small bug fix | unit tests (sketch below), targeted build |
| Single app/service | endpoint change | dotnet build, dotnet test, run app + one route |
| Cross-cutting web behavior | routing, caching, headers | curl/HTTP 200/301 checks, cache headers, forwarded headers checks |
| Data model / migrations | EF model change | migration script review, apply to dev DB, run critical queries |
| Production-like behavior | risky changes | canary-like checks, load/perf smoke tests |
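For the first rung (single function/module), a unit test that fails loudly is usually enough. A minimal sketch, assuming xUnit; the Slug helper below is a stand-in for whatever normalizer the real codebase uses:
using System.Linq;
using Xunit;

// Stand-in normalizer so the sketch is self-contained; the real project has its own.
public static class Slug
{
    public static string Normalize(string value) => value.Trim().TrimEnd('/').ToLowerInvariant();
    public static bool IsValid(string value) =>
        !string.IsNullOrEmpty(value) && value.All(c => char.IsLetterOrDigit(c) || c == '-');
}

public class SlugTests
{
    [Theory]
    [InlineData("AI-Assisted-Development/", "ai-assisted-development")]
    [InlineData("  spec-driven  ", "spec-driven")]
    public void Normalize_produces_a_canonical_slug(string input, string expected)
        => Assert.Equal(expected, Slug.Normalize(input));

    [Fact]
    public void Path_traversal_style_input_is_rejected()
        => Assert.False(Slug.IsValid("../etc/passwd"));
}
Run it with something like dotnet test --filter SlugTests before widening to app-level or HTTP-level checks.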
Why this exists
DAP iQ is a real ASP.NET Core system. The goal is to apply AI-assisted development to real systems without lowering engineering standards.
Default rule
Treat the assistant like a fast implementer that needs supervision and explicit constraints.
Quick start (10 minutes)
If you want immediate value, do this before your next AI-assisted change:
Verified on: ASP.NET Core (.NET 10), EF Core 10.
- Write a 5-bullet spec (goal, constraints, files, acceptance, validation).
- Ask the assistant for a diff, not for "the best approach".
- Refuse diffs that touch files you did not list.
- Run one real command that would fail if the change is wrong.
- Capture the decision in a durable place (not chat).
What AI-assisted development means here
AI-assisted development is a workflow where:
- The input is constraints, context, and a spec.
- The output is a small diff you can review.
- Every change ends in a validation step.
It is not "prompt until it compiles". It is an engineering loop that keeps intent and verification visible.
The five diff types AI is allowed to touch
This is the boundary that makes AI-assisted development safe. If a diff crosses these lines, it needs human ownership.
- Mechanical edits: rename, move, reorder, format (no behavior change).
- Repetitive glue: wiring, mapping, small adapters.
- Local refactors: one module, one responsibility, no new dependencies.
- Testable bug fixes: clear reproduction, clear validation.
- Small feature increments: behind a constraint, with explicit acceptance criteria.
Anything else is a human task:
- architecture decisions
- security boundaries
- performance strategy
- domain model changes
How to keep AI diffs small in large repos
Small diffs are not a style preference. They are how you keep review quality high and avoid hidden scope creep.
Practical heuristics:
- Max files touched: 3-7 for a routine change (more requires an explicit reason).
- Max LOC changed: about 200-400 for a single PR (more requires splitting).
- Max responsibilities: one behavior change at a time.
Diff size thresholds (use these as review gates):
| Size | Rough threshold | Review rule |
|---|---|---|
| Small | <= 200 LOC, <= 7 files | Accept if constraints and validation are explicit |
| Medium | <= 500 LOC | Requires an explicit reason and a tighter spec |
| Large | > 500 LOC | Split into smaller PRs unless there is a hard blocker |
Split the work when any of these are true:
- a diff mixes refactor + behavior change
- a diff changes runtime behavior and also updates docs/content
- a diff crosses a trust boundary (input, output encoding, auth, headers, network)
Safe vs unsafe changes (micro diff examples)
These are intentionally small. They show the kind of changes you should accept or reject.
Example 1: safe (tighten a guardrail)
slug = Slug.Normalize(slug);
if (!Slug.IsValid(slug))
{
    return NotFound();
}
Example 2: unsafe (new trust assumption)
-var clientIp = httpContext.Connection.RemoteIpAddress?.ToString();
+var clientIp = httpContext.Request.Headers["X-Forwarded-For"].ToString();
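The safe counterpart is to stop reading the header by hand and let the forwarded headers middleware resolve the client IP, trusting only proxies you control. A minimal sketch; the proxy address and route are placeholders:
using System.Net;
using Microsoft.AspNetCore.HttpOverrides;

var builder = WebApplication.CreateBuilder(args);

builder.Services.Configure<ForwardedHeadersOptions>(options =>
{
    options.ForwardedHeaders = ForwardedHeaders.XForwardedFor | ForwardedHeaders.XForwardedProto;
    // Only trust X-Forwarded-* values sent by a proxy you actually control.
    options.KnownProxies.Add(IPAddress.Parse("10.0.0.1")); // placeholder reverse proxy address
});

var app = builder.Build();

app.UseForwardedHeaders();

// Downstream code keeps reading the framework-resolved value.
app.MapGet("/whoami", (HttpContext ctx) =>
    Results.Text(ctx.Connection.RemoteIpAddress?.ToString() ?? "unknown"));

app.Run();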
Example 3: unsafe (raw rendering without a boundary)
-@Model.Html
+@Html.Raw(Model.Html)
Where AI code assistants earn their keep
Assistants are at their best on well-scoped tasks where the shape of the change is already known.
- Boilerplate and repetitive glue
- Refactors with clear intent
- Finding call sites and repeated patterns
- Drafting small, reviewable diffs
In DAP iQ, the best wins came from diff-sized work. Examples: tightening a middleware pipeline, shaping an EF Core query, or standardizing cache policies.
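For the query-shaping case, the winning diffs were read-only projections rather than new abstractions. A minimal sketch; the entity, context, and DTO names are hypothetical:
using Microsoft.EntityFrameworkCore;

// Hypothetical entity and context; the query shape is the point.
public class Article
{
    public int Id { get; set; }
    public string Slug { get; set; } = "";
    public string Title { get; set; } = "";
    public string SeriesSlug { get; set; } = "";
    public int PartNumber { get; set; }
    public DateTime PublishedUtc { get; set; }
}

public class SiteDbContext : DbContext
{
    public SiteDbContext(DbContextOptions<SiteDbContext> options) : base(options) { }
    public DbSet<Article> Articles => Set<Article>();
}

public record ArticleListItem(string Slug, string Title, DateTime PublishedUtc);

public static class ArticleQueries
{
    // Diff-sized change: no tracking and a projection instead of loading full entities.
    public static Task<List<ArticleListItem>> GetSeriesPartsAsync(
        SiteDbContext db, string seriesSlug, CancellationToken ct = default) =>
        db.Articles
            .AsNoTracking()
            .Where(a => a.SeriesSlug == seriesSlug)
            .OrderBy(a => a.PartNumber)
            .Select(a => new ArticleListItem(a.Slug, a.Title, a.PublishedUtc))
            .ToListAsync(ct);
}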
Where they fail in real systems
The failures are predictable. They happen where judgment matters.
- Architecture and boundaries
- Security and trust assumptions
- Performance and operational tradeoffs
- Domain logic with missing context
AI-assisted development is safe only when you assume the assistant will be wrong in these areas and you review accordingly.
A review rubric for AI-generated diffs
The rubric is simple: the diff must be reviewable, explainable, and verifiable. Use this table in PR review.
| Check | What you look for | Red flag |
|---|---|---|
| Scope | Touches only intended files | "While I was here" edits |
| Size | Can be reviewed in one sitting | Large refactors without a spec |
| Constraints | Matches stack and patterns | New libraries or patterns slipped in |
| Security | No new trust assumptions | Reads X-Forwarded-For directly, raw HTML rendering |
| Performance | No surprise queries/allocations | New Includes everywhere, no .AsNoTracking() |
| Correctness | Clear acceptance criteria | Vague "should work" language |
| Validation | Command(s) to prove it | No runnable checks |
| Observability | Logs do not leak secrets (sketch below) | Raw headers / PII in logs |
| SEO (for web) | Canonical/meta unchanged unless intentional | URL changes or broken JSON-LD |
| Rollback | Easy to revert | Deep entanglement |
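For the observability row, the difference is concrete. A minimal sketch of the red flag versus the fix; the names are illustrative:
using System.Linq;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Logging;

public static class RequestLogging
{
    // Red flag: serializes every header, which can include cookies and bearer tokens.
    public static void LogEverything(ILogger logger, HttpContext context) =>
        logger.LogWarning("Request failed. Headers: {Headers}",
            string.Join("; ", context.Request.Headers.Select(h => $"{h.Key}={h.Value}")));

    // Better: log only the specific, non-sensitive values the diagnosis needs.
    public static void LogJustEnough(ILogger logger, HttpContext context, string slug) =>
        logger.LogInformation("Slug lookup miss for {Slug} on {Path}", slug, context.Request.Path);
}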
Two DAP iQ examples that matter
Example 1: Markdown is rendered through a pipeline that disables raw HTML. That reduces XSS risk when content is stored as Markdown.
var pipeline = new MarkdownPipelineBuilder()
    .DisableHtml()
    .UseAdvancedExtensions()
    .UseAutoLinks()
    .Build();
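Rendering then goes through that same pipeline instance, so the no-raw-HTML decision is enforced in one place. A one-line usage sketch; the BodyMarkdown property is illustrative:
string html = Markdig.Markdown.ToHtml(article.BodyMarkdown, pipeline);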
Example 2: caching is expressed as named policies. That makes performance intent reviewable.
builder.Services.AddOutputCache(options =>
{
    options.AddPolicy("Default12Hours", b => b.Expire(TimeSpan.FromHours(12)));
    options.AddPolicy("VaryByPage12Hours", b => b.Expire(TimeSpan.FromHours(12)).SetVaryByQuery("page"));
    options.AddPolicy("Detail6Hours", b => b.Expire(TimeSpan.FromHours(6)));
});
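The policies only matter where they are applied. A minimal sketch of wiring a named policy to an endpoint; the route and handler are placeholders:
app.UseOutputCache(); // after routing, before the endpoints that should be cached

// Minimal API endpoint opting into a named policy.
app.MapGet("/insights", () => Results.Ok(new { status = "ok" }))
   .CacheOutput("VaryByPage12Hours");

// Controllers and Razor Pages can use the attribute form instead:
// [OutputCache(PolicyName = "Detail6Hours")]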
The workflow boundary: prompts are not a plan
Prompts are input. They are not durable project memory.
If you want consistent results, write a spec, keep one active task, and checkpoint decisions. That is the difference between "fast today" and "reliable next month".
Prompt patterns that produce reviewable diffs
Prompts are only useful if they result in an artifact you can review. These templates are designed to produce small diffs.
Prompt anti-patterns (what causes real failures)
Avoid prompts that hide intent or invite scope creep:
- "refactor everything" (invites unreviewable diffs)
- "best practices" (invites generic, ungrounded advice)
- "make it scalable" (invites architecture changes without constraints)
- "improve performance" without a measurement goal (invites cargo-cult caching/indexes)
- "fix security" without a threat model surface list (invites random header/toggle changes)
If you cannot name the files and validations up front, you are not ready to ask for implementation.
Template 1: patch-only
You are modifying an existing codebase.
Output only a unified diff.
Files allowed: <list>
Constraints: <bullets>
Acceptance criteria: <bullets>
Validation commands: <bullets>
Template 2: file list first
First: list files you will change and why (1 sentence each).
Then: provide the patch.
Do not touch any other files.
Template 3: smallest change wins
Prefer the smallest set of changes that meets acceptance.
If you want to refactor, propose it separately and do not implement it.
Template 4: security boundary callout
For every change that touches input, auth, headers, or rendering:
call out the trust boundary and how it is enforced.
Template 5: validation ladder
End your answer with a validation ladder (commands + expected outcomes).
If you cannot propose validation, do not propose the change.
Template 6: incremental complexity
Implement the simplest version first.
When that works, list what would need to change for: <specific extension>.
Do not implement the extension unless I ask.
Template 7: existing patterns
Find similar patterns in this codebase first.
Match the existing style. Do not introduce new conventions.
If no similar pattern exists, stop and ask before implementing.
Template 8: dependency audit
Before implementing:
1. List any new NuGet packages required
2. List any new using statements
3. List any new interfaces or abstractions
Do not proceed if you need to add new dependencies unless I approve.
Template 9: error handling explicit
For every code path that can fail:
1. Name the failure mode
2. Show the error handling
3. Explain what the user sees
Do not use generic catch blocks.
Template 10: performance impact
For this change, identify:
1. Any new database queries (with expected row counts)
2. Any new allocations in hot paths
3. Any new network calls
If you cannot estimate impact, flag it.
Template 11: rollback friendly
Design this change so it can be reverted in one commit.
Do not mix migrations with behavior changes.
Do not mix refactors with new features.
Template 12: test first
Before implementing, write the test that will prove this works.
Show me the test. Wait for approval before implementing.
Maintainability of AI-generated code
AI-generated code must be maintained by humans. Code that compiles today becomes technical debt if it cannot be understood, modified, or debugged tomorrow.
Signs of maintainable AI-generated code
| Characteristic | Why it matters | How to check |
|---|---|---|
| Follows existing patterns | Reduces cognitive load | Compare to similar files |
| Uses project conventions | Consistent with codebase | Review naming, structure |
| Minimal dependencies | Easier to update/remove | Check new imports |
| Clear intent | Can understand without context | Read without prompt |
| Testable | Can verify correctness | Unit tests exist |
| No magic numbers | Configurable behavior | Review hardcoded values |
Signs of unmaintainable AI-generated code
// BAD: Generated code that will cause maintenance pain
public async Task<IActionResult> Process(Request r)
{
    // What is 42? Why these specific values?
    if (r.Type == 42 || r.Status == "X" || r.Priority > 7)
    {
        // Why is this special-cased?
        var result = await _svc.DoThing(r.Data, true, false, 3);
        return r.Format == "json" ? Json(result) : View(result);
    }
    // ... 200 more lines of similar code
}

// GOOD: Same logic, maintainable
private const int HighPriorityThreshold = 7;

public async Task<IActionResult> Process(Request request)
{
    if (ShouldFastTrack(request))
    {
        var result = await ProcessHighPriority(request);
        return FormatResponse(request.Format, result);
    }
    // ...
}

private static bool ShouldFastTrack(Request request) =>
    request.Type == RequestType.Urgent ||
    request.Status == RequestStatus.Escalated ||
    request.Priority > HighPriorityThreshold;
Maintainability checklist for AI-generated code
Before merging any AI-generated code, verify:
- Naming: Can you understand what it does from names alone?
- Structure: Does it match existing code organization?
- Constants: Are magic numbers extracted and named?
- Comments: Are complex decisions explained?
- Error handling: Are failures handled explicitly?
- Tests: Does test coverage match the complexity?
- Dependencies: Are new dependencies justified?
Long-term maintenance patterns
Pattern 1: Extract and name
When reviewing AI-generated code, extract:
- Repeated conditions into named methods
- Magic values into constants
- Complex expressions into variables
Pattern 2: Document the "why"
AI explains what. You document why.
Add comments for:
- Business rules that drove decisions
- Edge cases that were considered
- Alternatives that were rejected
Pattern 3: Test coverage proportional to complexity
- Simple CRUD: an integration test is sufficient
- Complex logic: unit tests for each branch (see the sketch below)
- Security-sensitive: multiple test types
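As a concrete version of the "unit tests for each branch" rung, here is a minimal xUnit sketch built on the hypothetical fast-track rule shown earlier; all names are illustrative:
using Xunit;

public enum RequestType { Standard, Urgent }
public enum RequestStatus { New, Escalated }
public record Request(RequestType Type, RequestStatus Status, int Priority);

public static class FastTrackRule
{
    public const int HighPriorityThreshold = 7;

    public static bool ShouldFastTrack(Request request) =>
        request.Type == RequestType.Urgent ||
        request.Status == RequestStatus.Escalated ||
        request.Priority > HighPriorityThreshold;
}

public class FastTrackRuleTests
{
    [Theory]
    [InlineData(RequestType.Urgent, RequestStatus.New, 0, true)]         // type branch
    [InlineData(RequestType.Standard, RequestStatus.Escalated, 0, true)] // status branch
    [InlineData(RequestType.Standard, RequestStatus.New, 8, true)]       // priority branch
    [InlineData(RequestType.Standard, RequestStatus.New, 7, false)]      // no branch fires at the threshold
    public void Covers_each_branch(RequestType type, RequestStatus status, int priority, bool expected)
        => Assert.Equal(expected, FastTrackRule.ShouldFastTrack(new Request(type, status, priority)));
}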
When to reject AI-generated code for maintainability
Reject diffs that:
- Introduce patterns inconsistent with the codebase
- Add complexity without clear benefit
- Cannot be understood without the original prompt
- Mix multiple concerns in one method
- Have no tests for complex branches
A validation ladder that keeps you honest
The validation ladder scales with risk. For DAP iQ-style changes, this is typical:
git status -sb
cd src/DapIq.Publisher && dotnet run -- ../../content
dotnet build DapIq.Website.sln -c Release
Then verify the outputs that matter:
- /series/ai-assisted-development (the touched article route)
- /sitemap.xml
- /insights/feed
Copy/paste (example):
curl -I http://localhost:5000/series/ai-assisted-development
curl -I http://localhost:5000/sitemap.xml
curl -I http://localhost:5000/insights/feed
If you are using https locally, adjust the scheme and port.
Common failure modes
- Treating the chat as the spec, then losing it.
- Accepting a large diff because it compiles.
- Shipping security assumptions you did not review.
- Skipping validation because the assistant sounded confident.
Checklist
- Define AI-assisted development as "spec in, diff out, validate".
- Keep diffs small enough to review.
- Treat security-sensitive edits as high risk.
- Run a real validation step before calling it done.
- Capture decisions so the workflow survives context resets.
FAQ
Are AI code assistants "good" at software engineering?
They are good at producing plausible code. Software engineering is the part where you decide what should exist and why. Treat generation as typing speed, not judgment.
Should I let AI write tests?
Yes, but treat tests as code. Review them like you would review production logic. Bad tests create false confidence.
What is the single best constraint?
"Diff out". If you cannot get a small patch, your prompt is not a plan and the scope is wrong.
How do I prevent dependency creep?
State it explicitly. "No new packages." Then reject diffs that add them.
When should I not use an AI code assistant?
Do not use it for work where you cannot state the constraints, list the files, and name validation. High-risk examples: authentication, authorization, cryptography, payment flows, and anything that crosses trust boundaries.
What should I never paste into a prompt?
Secrets, tokens, private keys, and production connection strings. Also avoid pasting raw customer data or internal incident details.
What is the fastest way to catch scope creep?
Require a file list first, then require a patch. If the diff touches files outside the list, reject it and restate scope.
Are mechanical refactors safe to accept?
Sometimes, but only if they stay mechanical. If a refactor changes behavior and structure at the same time, split it.
What to do next
Read Part 2: A Spec-Driven AI Workflow That Holds Up in Production. Browse the AI-assisted development series for the full sequence. If you want to apply the workflow to your project, reach out via Contact.
References
- ASP.NET Core Middleware Fundamentals
- ASP.NET Core Output Caching Middleware
- Safe storage of app secrets in development in ASP.NET Core
- ASP.NET Core security topics (secure authentication flows)
- Secure coding guidelines (.NET)
- OWASP Code Review Guide
- GitHub Copilot Documentation
Author notes
Decisions:
- Use AI code assistants for diff-sized work, not architecture. Rationale: architecture needs durable context and human judgment.
- Disable raw HTML in Markdown rendering. Rationale: narrower XSS surface for content.
- Use named cache policies. Rationale: makes performance intent reviewable in AI-assisted development diffs.
Observations:
- Before: it was easy to confuse "generated code" with "reviewed code".
- After: treating output as a diff made review and validation consistent.
- Observed: the most reliable gains came from constrained, testable tasks.