This is not a framework. It is a reference. Eight structural forces — covered in depth across Articles 07 through 10 — mapped against all nine phases of the SDLC, with an honest annotation for each intersection: what the impact level is, what specifically happens, and which tools are relevant. Keep it open in another tab.
- Eight forces × nine SDLC phases = 72 annotated intersections: impact level, what happens, and the relevant tools at each.
- Use the sticky phase navigation to jump; each force pair links to its deep-dive article (07–10).
- Leverage concentrates: DDD peaks at design, AI eval at test, AI-aware SRE at monitor. Invest first where your weakest phase meets a high-impact force.
Not every force matters equally at every phase. The leverage of Domain-Driven Design is highest at design and change, not at monitoring. The leverage of AI-Aware SRE is almost entirely at monitor, with modest contributions at design and deploy. Understanding where each force creates its highest value tells you where to invest first.
| Force ↓ / Phase → | ◎ Analyze | ◈ Design | ◇ Develop | ◉ Test | ▣ Build | ▲ Deploy | ◐ Monitor | ◆ Deliver | ↻ Change |
|---|---|---|---|---|---|---|---|---|---|
| F01 · 👥 Stakeholders | |||||||||
| F02 · 🏛️ DDD | |||||||||
| F03 · ⚡ Full-Stack | |||||||||
| F04 · 🗄️ Polyglot Data | |||||||||
| F05 · 🔄 Five-Track CICD | |||||||||
| F06 · 🔗 Middleware | |||||||||
| F07 · 🧪 Test Suite | |||||||||
| F08 · 🔭 AI-Aware SRE | |||||||||
The analysis phase now produces three artifacts from one session: user stories (UX), structured agent specs (implementation), and compliance checklists (reviewers). Most teams produce one. AI can generate all three — but only if the process is designed for it.
Domain discovery during analysis is the foundation for AI agent delegation. Bounded context mapping here prevents the Big Ball of Mud from being AI-generated at scale in development.
Analysis must cover all six tracks: FE, BE, Middleware, Data, AI/ML, Platform. Requirements scoped to one track create integration surprises in development.
Data requirements analysis must identify which data is transactional (relational), document-oriented (NoSQL), semantic (vector), or real-time (streaming). Getting this wrong here means the wrong DB in design.
Identifying which artifact types a feature touches is underrated. "Add semantic search to mobile" is a five-artifact feature. Many teams discover this in build, not analysis.
Cache strategy and queue topology should be identified in analysis — which operations are expensive enough to cache? — but rarely are. The most consistently skipped analysis artifact.
Test strategy as a requirement artifact: defining which of the four test layers applies to each requirement, and what AI eval acceptance criteria look like, is an analysis task few teams do.
Reliability requirements — uptime targets, hallucination rate SLOs, agent audit requirements — are requirements, not engineering footnotes. Rarely captured in analysis, consistently regretted in production.
Use cases must be written for users AND agents. Agents need unambiguous boundary conditions. Reviewers need traceable design decisions. Design docs that serve all three are rare and valuable.
DDD design artifacts — context maps, aggregate definitions, domain event catalogs, ubiquitous language glossaries — become the prompt context for every AI agent in development. The quality of design is the ceiling of AI output quality.
Six-track design produces six artifact sets: component library, API contracts, middleware topology, data models, embedding strategy, and pipeline design. Each artifact enables parallel AI agent execution in development.
The polyglot data architecture decision: what data lives in PostgreSQL vs MongoDB vs Redis vs vector store vs streaming — and how it flows between them. This is now as important as API design. Getting it right enables every AI feature downstream.
CICD pipeline design is a design-phase artifact alongside API design. Which tracks exist? What are their build dependencies? How does a cross-artifact feature coordinate release? Decided in design, not discovered in production.
Cache strategy design: what data is hot (Redis), which LLM responses are semantically cacheable (GPTCache), which operations are async (BullMQ), which workflows need durable execution (Temporal). Deciding these in phase 2 (design) can save weeks in phase 3 (develop).
Test architecture design: what golden datasets exist for AI eval? What performance SLOs are required? What contract tests define bounded context boundaries? Designed upfront, not retrofitted after the first production incident.
Observability as a design artifact: what traces, metrics, and logs does each component emit? What are the AI SLOs (hallucination rate, retrieval precision, cost-per-workflow)? What does the runbook for a qualitative AI failure look like?
Developers implement for user needs, within agent-generated code that must respect domain boundaries, while leaving audit trails that reviewers can trace to original requirements.
Each bounded context is implemented by an AI agent given the domain model as explicit context. Ubiquitous language prevents naming drift. Domain invariants must be made explicit or AI will violate them. In teams I have worked with, clear DDD models cut AI code review by half or more.
Six tracks run in parallel with AI agents on isolated git worktrees. Senior engineers own the interface contracts between tracks. Code review is primarily contract and domain invariant review, not syntax review.
Data engineering development across all four stores: schema design, CDC configuration, embedding pipeline, streaming topology, transformation models. AI agents write dbt models and pipeline code but require explicit data contracts to operate within.
Pipeline-as-code developed alongside application code. Engineers write GitHub Actions workflows, Fastlane configs, and dbt pipeline definitions as first-class development artifacts, not DevOps afterthoughts.
Semantic cache implementation: cache LLM responses by embedding similarity, not exact string match. AI task queue with context-aware retry — failed AI tasks carry the failure context into the next attempt's prompt. The highest-ROI layer for AI cost reduction.
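To make "cache by embedding similarity, not exact string match" concrete, here is a minimal sketch. It assumes an injected embedding function and a fixed cosine-similarity threshold. `SemanticCache`, `toy_embed`, and the 0.9 threshold are hypothetical names and values chosen for illustration, not part of GPTCache or any other library's API.

```python
import math
from typing import Callable, Optional

Vector = list[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Returns a cached response when a new prompt is close enough in embedding space."""

    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.9):
        self.embed = embed          # hypothetical embedding function, injected by the caller
        self.threshold = threshold  # assumed cut-off; tune it against real traffic
        self.entries: list[tuple[Vector, str]] = []

    def get(self, prompt: str) -> Optional[str]:
        query = self.embed(prompt)
        best_score, best_response = 0.0, None
        # Linear scan keeps the sketch short; a real system would use a vector index.
        for vector, response in self.entries:
            score = cosine_similarity(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((self.embed(prompt), response))


def toy_embed(text: str) -> Vector:
    """Toy stand-in for a real embedding model: counts of a few keywords."""
    words = text.lower().split()
    return [float(words.count(k)) for k in ("refund", "order", "password", "reset")]
```

With `toy_embed`, two differently worded password-reset prompts map to the same vector and share one cached answer, while an unrelated refund prompt misses. The threshold is the precision knob: set too low, the cache returns answers to questions nobody asked.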
Test code developed alongside feature code: unit, integration, contract, AI eval, and data quality tests written as part of the development definition of done — not as a separate QA phase.
Instrumentation code — LLM call traces, agent action logs, PII redaction middleware — developed alongside feature code. Platform engineering abstractions reduce the burden on product engineers.
Test cases must validate behavior for human users, correctness for agent consumers, and traceability for compliance reviewers. Three distinct test artifact types from the same test suite.
Domain invariant testing: does this operation violate an aggregate rule? Domain event tests validate that the right events fire for state changes. Contract tests validate bounded context boundaries.
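A domain invariant test and a domain event test might look like the sketch below. The `Order` aggregate, its rules, and the `OrderSubmitted` event are hypothetical examples, not taken from any particular codebase.

```python
from dataclasses import dataclass, field


class InvariantViolation(Exception):
    """Raised when an operation would break an aggregate rule."""


@dataclass
class Order:
    """Hypothetical aggregate: lines may only change while the order is a draft."""
    order_id: str
    status: str = "draft"
    lines: list[tuple[str, int]] = field(default_factory=list)
    events: list[dict] = field(default_factory=list)

    def add_line(self, sku: str, quantity: int) -> None:
        if self.status != "draft":
            raise InvariantViolation("lines are frozen once the order is submitted")
        if quantity <= 0:
            raise InvariantViolation("quantity must be positive")
        self.lines.append((sku, quantity))

    def submit(self) -> None:
        if not self.lines:
            raise InvariantViolation("an empty order cannot be submitted")
        self.status = "submitted"
        self.events.append({"type": "OrderSubmitted", "order_id": self.order_id})


def test_submit_emits_event() -> None:
    # Domain event test: the state change must record exactly one event.
    order = Order("o-1")
    order.add_line("sku-42", 2)
    order.submit()
    assert order.events == [{"type": "OrderSubmitted", "order_id": "o-1"}]


def test_submitted_order_rejects_new_lines() -> None:
    # Invariant test: the aggregate rule must hold after submission.
    order = Order("o-2")
    order.add_line("sku-42", 1)
    order.submit()
    try:
        order.add_line("sku-43", 1)
    except InvariantViolation:
        return
    raise AssertionError("expected InvariantViolation")
```

Both tests are plain functions, so they run under pytest or when called directly. The invariants live in the aggregate itself, which gives both a human reviewer and an AI agent one explicit place to read the rules.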
Each track has its own test primitives: component tests (FE), contract tests (BE), throughput tests (Middleware), data quality tests (Data), eval tests (AI/ML), chaos tests (Platform). All integrated into one CI pipeline.
Data quality tests validate each store: relational integrity, document schema validation, vector index freshness tests, streaming latency SLOs. Failures here cause silent AI quality degradation downstream.
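As one illustration of a vector index freshness test, the sketch below flags documents whose source row changed after their embedding was built. It assumes both timestamps are available per document ID. The function names and the zero-tolerance default are hypothetical.

```python
from datetime import datetime


def stale_embedding_ids(
    source_updated_at: dict[str, datetime],
    embedded_at: dict[str, datetime],
) -> list[str]:
    """IDs whose source changed after the embedding was built, or that have no embedding."""
    stale = []
    for doc_id, updated in source_updated_at.items():
        built = embedded_at.get(doc_id)
        if built is None or built < updated:
            stale.append(doc_id)
    return sorted(stale)


def check_index_fresh(
    source_updated_at: dict[str, datetime],
    embedded_at: dict[str, datetime],
    max_stale_fraction: float = 0.0,  # assumed tolerance; 0.0 means no stale entries allowed
) -> None:
    """Raise AssertionError when too large a share of the index is stale."""
    stale = stale_embedding_ids(source_updated_at, embedded_at)
    fraction = len(stale) / len(source_updated_at) if source_updated_at else 0.0
    if fraction > max_stale_fraction:
        raise AssertionError(f"{len(stale)} stale embeddings, e.g. {stale[:5]}")
```

Run as a CI or scheduled check, a failure here surfaces as a data quality problem rather than as an unexplained drop in answer quality later.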
All four test layers integrated into CICD as mandatory gates. Not separate tools managed by separate teams, but one unified pipeline with functional, non-functional, AI eval, and data quality tests running on every PR.
Cache hit rate under load, semantic cache precision, queue throughput under AI task spike, Temporal workflow reliability under failure — middleware has unique non-functional test requirements.
The core phase. Four layers: Functional (unit, integration, E2E, contract), Non-Functional (load, security, accessibility, chaos), AI Eval (faithfulness, relevance, hallucination rate), Data Quality (schema, freshness, distribution). All mandatory gates.
Chaos engineering tests: what happens when the LLM API is slow? When the vector index is stale? When an agent loops? SRE reliability requires testing failure modes invisible in happy-path tests.
Build artifacts should include reviewer-facing changelogs, agent-facing API specs, and user-facing release notes — generated automatically from the same commit.
Each bounded context is its own independently deployable artifact. The domain model is the build boundary. DDD-structured codebases have cleaner build dependency graphs for AI to reason about.
Five artifact types in one pipeline: Web (Vite), Mobile (Xcode/Gradle via Fastlane), Backend (Docker/Helm), Data Pipeline (dbt Cloud), AI/ML (MLflow). Coordinated release with cross-artifact dependency awareness is the unsolved frontier.
Data pipeline is a build artifact with its own versioning, testing, and deployment. Embedding model versions must be pinned and packaged alongside application artifacts — they're not infrastructure, they're code.
The core phase for this force. Five artifact types plus infrastructure: Web (Vite), Mobile (Fastlane), Backend (Docker/Helm), Data Pipeline (dbt Cloud), AI/ML (MLflow), plus cloud infrastructure as a sixth artifact type, provisioned via Terraform/Terragrunt and applied on merge through Atlantis. Cross-artifact dependency management for coordinated release remains the industry's genuinely unsolved problem.
Middleware configuration is infrastructure-as-code. Cache topology, queue configuration, and workflow definitions are versioned build artifacts with the same rigor as application code.
Test pipeline optimization: parallel execution, smart test selection (only tests affected by changes), result caching, and sharding for long AI eval suites. Running four test layers in CI requires deliberate pipeline engineering to keep feedback loops fast.
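Two of these techniques, smart test selection and sharding, can be sketched in a few lines. The sketch assumes a hand-maintained map from each test to the files it depends on; real tools usually derive that map from coverage data or a build graph. `select_tests` and `shard_for` are hypothetical helper names.

```python
import hashlib


def select_tests(changed_files: set[str], test_deps: dict[str, set[str]]) -> list[str]:
    """Return the tests whose declared dependencies overlap the changed files."""
    return sorted(test for test, deps in test_deps.items() if deps & changed_files)


def shard_for(test_name: str, shard_count: int) -> int:
    """Stable shard index: the same test always lands on the same shard."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    # Python's built-in hash() is salted per process, so use a stable digest instead.
    digest = hashlib.sha256(test_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % shard_count
```

Each CI worker would run only the selected tests whose `shard_for` value equals its own index. Hash-based sharding needs no coordination between workers, but it balances test counts, not durations, so very slow eval cases may still need manual placement.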
Observability collectors, exporters, and sampling configurations are build artifacts deployed as sidecars. Zero-gap telemetry from first deployment — not added after the first production incident.
Staged rollouts serve users. MCP endpoint versioning serves agents. Compliance-gated deploys serve reviewers. Each consumer needs different deployment controls.
Bounded contexts deploy independently. Domain event schema changes require careful versioning — a domain event is a public API for other contexts, not an internal detail.
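One common way to version a domain event without breaking consumers is an upcaster that lifts old payloads to the current schema at read time. The sketch below is illustrative only: the `OrderPlaced` event, its `total` field, and the default currency are hypothetical.

```python
def upcast_order_placed(event: dict) -> dict:
    """Return the event in schema version 2 without mutating the input.

    Hypothetical change: v1 stored `total` as a bare number, v2 stores
    `total` as {"amount": ..., "currency": ...}.
    """
    version = event.get("schema_version", 1)  # events written before versioning count as v1
    if version == 2:
        return dict(event)
    if version != 1:
        raise ValueError(f"unsupported schema_version: {version}")
    upgraded = {key: value for key, value in event.items() if key != "total"}
    upgraded["total"] = {"amount": event["total"], "currency": "USD"}  # assumed default currency
    upgraded["schema_version"] = 2
    return upgraded
```

Consumers then only handle the newest shape, and the publishing context can roll out v2 independently of when each consumer deploys.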
Each track has different deployment mechanics: CDN (Web), App Store (Mobile), Kubernetes (Backend), Airflow (Data), serverless GPU (AI/ML). Feature flags manage the staggered reality of coordinated multi-track release.
Vector index warm-up before traffic switch, CDC replication lag monitoring during DB migrations, streaming topic migration strategies — data deployment has uniquely complex rollout requirements that differ from application deployment.
Five artifact types plus infrastructure: cloud provisioning (Terraform/Terragrunt via Atlantis) completes first — application services deploy into the infrastructure it defines. Each artifact type has its own deployment mechanism and rollback strategy. Feature flags coordinate the staggered reality of multi-track release when App Store review (2–7 days) blocks synchronisation.
Managed middleware services reduce operational burden. Cache warm-up before traffic cutover prevents degradation on new deployments. Queue draining before shutdown prevents task loss.
Smoke tests validate basic functionality post-deploy. Canary deployments with automated rollback triggered by AI eval metric degradation — not just error rate — protect against qualitative regressions traditional health checks miss.
Progressive delivery with AI-aware rollback triggers: if hallucination rate SLO breaches during canary deployment, automated rollback fires — not just if HTTP error rate increases.
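A rollback trigger of this kind reduces to a decision function over the canary's metrics. The sketch below assumes some eval process already labels a sample of canary responses as hallucinated or not. The thresholds, `CanaryWindow`, and `rollback_decision` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanaryWindow:
    """Counts observed for the canary during one evaluation window."""
    requests: int
    http_errors: int
    evaluated_responses: int     # responses scored by an eval (assumed to exist upstream)
    flagged_hallucinations: int  # of those, how many the eval flagged


def rollback_decision(
    window: CanaryWindow,
    max_error_rate: float = 0.01,          # assumed SLO
    max_hallucination_rate: float = 0.02,  # assumed SLO
    min_samples: int = 100,                # avoid acting on tiny samples
) -> tuple[bool, str]:
    """Return (should_roll_back, reason)."""
    if min_samples < 1:
        raise ValueError("min_samples must be at least 1")
    if window.requests >= min_samples:
        if window.http_errors / window.requests > max_error_rate:
            return True, "http_error_rate"
    if window.evaluated_responses >= min_samples:
        rate = window.flagged_hallucinations / window.evaluated_responses
        if rate > max_hallucination_rate:
            return True, "hallucination_rate"
    return False, "within_slo"
```

For example, a window with a 0.2% HTTP error rate but 10 flagged responses out of 200 evaluated returns `(True, "hallucination_rate")`. A health check based on error rate alone would have let that canary proceed.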
Monitor user satisfaction (NPS, error rates), agent reliability (API contract violations, MCP health), and reviewer metrics (compliance coverage, audit trail completeness) as three separate SLO tracks.
Domain event throughput per bounded context provides business-meaningful monitoring metrics beyond infrastructure metrics. Domain-aligned dashboards make on-call faster because incidents map to business concepts.
Six-track monitoring: FE (Core Web Vitals, crash rates), BE (latency, error rates), Middleware (hit rates, queue depth), Data (freshness, quality), AI/ML (eval metrics), Platform (CICD success rate, infra costs).
Data freshness (how old is retrieved data?), embedding drift (are old embeddings diverging from current data?), retrieval precision (is RAG finding the right context?), query latency per store — these data layer SLOs determine AI feature quality.
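Retrieval precision is the most mechanical of these SLOs to compute once a golden set of relevant documents exists for each query. A minimal sketch of precision@k, with hypothetical document IDs in the usage note:

```python
def precision_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids)
    return hits / k


def mean_precision_at_k(golden: list[tuple[list[str], set[str]]], k: int) -> float:
    """Average precision@k over (retrieved, relevant) pairs from a golden dataset."""
    if not golden:
        raise ValueError("golden dataset is empty")
    return sum(precision_at_k(retrieved, relevant, k) for retrieved, relevant in golden) / len(golden)
```

If the retriever returns `["d1", "d7", "d3"]` and the golden set marks `{"d1", "d3", "d9"}` as relevant, precision@3 is 2/3. Tracking the mean over time, rather than a single query, is what makes it usable as an SLO.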
CICD health monitoring: build success rates per track, plus the DORA metrics (deployment frequency, lead time for changes, change failure rate, and time to restore service), applied to all five tracks independently and as a coordinated system.
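Per-track delivery metrics can be derived from a flat log of deployments. The sketch below computes deployment frequency, median lead time, change failure rate, and mean time to restore for each track. The `Deployment` record and its fields are a hypothetical schema, not the output of any specific CI system.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass(frozen=True)
class Deployment:
    track: str                              # e.g. "web", "mobile", "backend"
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # set once a failed deploy is recovered


def hours_between(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600


def delivery_summary(deployments: list[Deployment], window_days: int) -> dict[str, dict[str, float]]:
    """Group deployments by track and compute the four metrics for each."""
    by_track: dict[str, list[Deployment]] = {}
    for d in deployments:
        by_track.setdefault(d.track, []).append(d)

    summary: dict[str, dict[str, float]] = {}
    for track, items in by_track.items():
        lead = [hours_between(d.committed_at, d.deployed_at) for d in items]
        restore = [
            hours_between(d.deployed_at, d.restored_at)
            for d in items
            if d.failed and d.restored_at is not None
        ]
        summary[track] = {
            "deploys_per_day": len(items) / window_days,
            "median_lead_time_hours": median(lead),
            "change_failure_rate": sum(d.failed for d in items) / len(items),
            # 0.0 when nothing failed in the window (or nothing has been restored yet)
            "mean_time_to_restore_hours": sum(restore) / len(restore) if restore else 0.0,
        }
    return summary
```

Computing the same summary over all tracks combined gives the "coordinated system" view. A large gap between the slowest track and the rest is usually the first thing worth investigating.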
Cache hit rate, miss rate, eviction rate, cost-per-hit — the economic metrics for semantic cache ROI. Queue depth, processing time, dead letter rate — the AI task reliability story. These are the metrics that justify AI infrastructure investment.
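The economics of a semantic cache come down to a few ratios. A minimal sketch follows, assuming a flat average cost per LLM call and a known cache infrastructure cost for the same period. Both are simplifying assumptions, and the function name is hypothetical.

```python
def semantic_cache_economics(
    hits: int,
    misses: int,
    llm_cost_per_call: float,  # assumed average cost of one uncached LLM call
    cache_cost: float,         # assumed cache infrastructure cost for the same period
) -> dict[str, float]:
    """Hit rate, cost per hit, and net savings for one reporting period."""
    total = hits + misses
    return {
        "hit_rate": hits / total if total else 0.0,
        "cost_per_hit": cache_cost / hits if hits else float("inf"),
        # Each hit avoids one LLM call; the cache itself is the offsetting cost.
        "net_savings": hits * llm_cost_per_call - cache_cost,
    }
```

With 800 hits, 200 misses, 0.02 per LLM call, and 4.00 of cache cost, the hit rate is 0.8, cost per hit is 0.005, and net savings are about 12.00. Negative net savings means the cache costs more than the calls it avoids.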
Test pass rate trends, flaky test rates, AI eval metric trends over time, and data quality score trends complement production monitoring. Test degradation in CI often predicts production incidents.
The core phase: three monitoring layers, all active. Layer 1, cloud infra and APM: CPU, memory, network, service throughput (Datadog APM, Grafana+Prometheus, CloudWatch/Azure Monitor/GCP Cloud Monitoring). Layer 2, AI observability: LLM traces, retrieval precision, hallucination signals, cost per workflow, prompt version drift (LangFuse, Arize, OpenTelemetry). Layer 3, business value: task completion rate, user satisfaction, AI-assisted resolution rate. AI observability sits on top of APM; it does not replace it. The SRE runbook for qualitative AI failure, where the AI is confident and wrong, remains the gap nobody has filled.
Delivery is complete only when all three stakeholders are satisfied: users have a working feature, agents have stable contracts, reviewers have a compliance trail. Most teams declare done after one.
Delivered features documented in ubiquitous language are understandable to domain experts, improving feedback quality and enabling non-technical reviewers to validate correctness — not just engineers.
Delivered features that work on web but not mobile, or have AI features dependent on stale data, are not done. Six-track delivery criteria are the new definition of feature completeness.
The delivered product's AI features are only as good as the data infrastructure beneath them. Fresh, high-quality polyglot data architecture enables features impossible with single-store approaches: semantic search, personalization, real-time AI context.
Coordinated delivery across five artifact types — ensuring backend API is deployed before the mobile app that depends on it, embedding index is warm before semantic search is enabled — is a delivery orchestration problem.
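The ordering half of this orchestration problem is a topological sort over artifact dependencies. The sketch below uses the standard library's `graphlib`. The artifact names are hypothetical and mirror the two examples in the paragraph above.

```python
from graphlib import CycleError, TopologicalSorter


def release_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return artifacts in an order where every dependency ships before its dependents."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as exc:
        raise ValueError(f"circular release dependency: {exc.args[1]}") from exc


# Hypothetical feature: each key lists what must be live before it can ship.
SEMANTIC_SEARCH_RELEASE = {
    "backend-api": set(),
    "embedding-index": set(),
    "mobile-app": {"backend-api"},
    "semantic-search-flag": {"backend-api", "embedding-index"},
}
```

`release_order(SEMANTIC_SEARCH_RELEASE)` places `backend-api` before `mobile-app` and both `backend-api` and `embedding-index` before `semantic-search-flag`. The relative order of independent artifacts is not guaranteed. A sort like this does not model timing, for example an app store review window, which is where feature flags come in.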
Intelligent middleware enables features that would be unusably slow synchronously: background document processing, batch embedding generation, async RAG pipeline execution. Delivered as async-first APIs.
Test coverage reports, AI eval metric baselines, and data quality scores are delivery artifacts — evidence for reviewers that the feature was validated against all four quality dimensions before shipping.
Delivered alongside the feature: agent audit trails (compliance and debugging), per-workflow cost reports (ROI visibility), and AI SLO dashboards (stakeholder trust). Enterprise buyers ask for all three in procurement now.
Change requests from users, agents, and reviewers must all route through the same structured requirement process — not three separate channels that create three inconsistent requirement artifacts.
DDD provides structured change management: which bounded contexts are affected? Which domain events need versioning? Which consumers need migration? Without DDD, AI-assisted refactoring produces unpredictable cascades.
Large-scale changes spanning multiple tracks — new data model that changes the API that changes the UI that changes the mobile app — can be delegated to parallel AI agents operating on isolated branches with explicit cross-track contracts.
Continuous embedding refresh via CDC keeps vector indices current automatically as source data changes. Schema evolution across four store types requires careful sequencing and rollback planning that traditional migration tooling wasn't designed for.
Pipeline changes are themselves software that goes through the same review and testing as application code. Prompt version changes in AI/ML pipeline trigger automated eval runs before promotion — change is data-driven, not intuition-driven.
Semantic cache invalidation when underlying data or embedding models change is harder than TTL-based expiry. Cache-breaking migrations for AI workloads require careful coordination between data pipeline and cache layer.
Every change to an AI feature requires updating the golden dataset and re-running evals. Prompt changes, model version changes, and embedding model changes all trigger eval pipelines before promotion.
Every prompt change, model version change, or agent workflow change is treated as an experiment with an eval gate. AI SRE owns the promotion criteria: what eval improvement justifies the change? What condition triggers reversion?
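An eval gate of this kind can be expressed as a comparison between baseline and candidate metrics. The sketch below treats any regression beyond a tolerance as blocking promotion. The metric names, the tolerance, and the `promotion_decision` helper are hypothetical, and real promotion criteria would be set by whoever owns the SLOs.

```python
def promotion_decision(
    baseline: dict[str, float],
    candidate: dict[str, float],
    higher_is_better: dict[str, bool],
    tolerance: float = 0.01,  # assumed allowed regression per metric
) -> tuple[bool, list[str]]:
    """Return (promote, regressed_metrics) for a prompt, model, or workflow change."""
    regressions = []
    for metric, better_when_higher in higher_is_better.items():
        delta = candidate[metric] - baseline[metric]
        if not better_when_higher:
            delta = -delta  # normalise so a positive delta always means "improved"
        if delta < -tolerance:
            regressions.append(metric)
    return (not regressions, sorted(regressions))
```

A candidate that raises faithfulness from 0.90 to 0.92 but doubles the hallucination rate from 0.03 to 0.06 is rejected with `["hallucination_rate"]` as the reason. Running the same function in reverse, with the candidate as the new baseline, gives a simple reversion condition.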
High-impact intersections concentrate in three zones: Design (DDD, Full-Stack, Polyglot Data), Test/Build (Five-Track CICD, Test Suite), and Monitor (Polyglot Data, Middleware, AI-Aware SRE). These are the phases where the investment compounds — or where the debt accumulates. Everything else is medium. There are very few genuine lows.