Forces 03 and 04 cover what happens before the commit — how code is structured across
tracks, and how data is stored and retrieved. Forces 05 and 06 start after the commit.
Force 05 governs how five different artifact types move from developer to production
— and names the cross-artifact coordination problem that is genuinely unsolved in 2026.
Force 06 governs the middleware layer carrying AI task payloads — where, in deployments
I have seen, the difference between designing it well and ignoring it can be close to
half the inference bill.
// TL;DR — what you'll take away
- One feature can span five artifact types; coordinating their release order is the genuinely unsolved delivery problem of 2026.
- Infrastructure is the dependency-0 artifact: Terraform, Terragrunt, and Atlantis make it a pipeline, not a console.
- Semantic caching can cut inference costs 40–70%, and queue design for AI payloads — priority lanes, retry-with-context, DLQ instrumentation — is a first-class architectural decision.
// Companion overview
All 8 Forces Reshaping How Software Gets Built — reference card for the full landscape. This article covers Forces 05 and 06.
force 05
For most of the last decade, "shipping" meant one thing: a container image went through
CI, tests passed, and it landed in Kubernetes. One artifact type, one portable pipeline
pattern — a largely solved problem. In 2026 a single feature routinely spans five artifact
types with different build tools, test frameworks, deployment mechanisms, and rollback
strategies — and the coordination between them is where engineering time now disappears.
The Five Artifact Types
Artifact 01
Web App
Next.js · Vercel · CDN
Fastest delivery. Deploy in seconds, roll back in seconds. No external gating.
Artifact 02
Mobile App
Fastlane · App Store · Play
The hard wall. A store review cycle of up to 2–7 days, regardless of how fast everything else moves. It runs on the store's schedule, not yours.
Artifact 03
Backend Services
Docker · k8s · Helm
DDD microservices. Independently deployable but domain-coupled. Contract tests guard inter-service correctness.
Artifact 04
Data Pipelines
dbt Cloud · Airflow · Great Expectations
Schema migrations, transformation jobs, embedding generation. Failures here corrupt AI correctness downstream — silently.
Artifact 05
AI / ML Pipelines
MLflow · DVC · LangSmith
Model updates, vector index rebuilds, RAG pipeline changes. Validity depends on Artifact 04 being valid first.
One Feature, Five Builds
The practical consequence of five artifact types becomes visible the moment a feature
crosses track boundaries — which AI features almost always do. Consider a common
request: "Add semantic search to the product pages."
// Scenario: Add semantic search to the product pages
"Add semantic search to the mobile app and web — powered by our product catalogue."
Web
New search component, API calls to backend semantic endpoint. Build → test → deploy. Done in hours.
Mobile
New search screen, same API. Build → Fastlane → App Store submission.
⚠ 2–7 day review. Cannot ship in lockstep with the production rollout of the other tracks.
Backend
New semantic search service exposing a vector retrieval endpoint consumed by both frontend tracks. Needs to be live before the web build ships.
Data Pipeline
Embedding generation job for the full product catalogue. Must complete and validate before the AI/ML pipeline has anything to serve. The schema contract with the backend must hold.
AI / ML Pipeline
Vector index warm-up, similarity threshold tuning, RAG context configuration. Depends entirely on the Data Pipeline artifact being valid and complete.
Each of these builds has passed its own tests. The failure mode is the dependency ordering:
web ships before the backend is live, the AI pipeline is promoted before the data pipeline validates,
and, with no feature flag in place, web users get the feature a week before mobile users.
All five artifacts were individually correct. The coordinated release was not.
// The unsolved problem
Turborepo handles monorepo coordination within a track. GitHub Actions handles
orchestration between tracks. But the cross-artifact dependency graph for coordinated
release — ensuring the data pipeline artifact is valid before the AI/ML pipeline
artifact is promoted, ensuring the backend is live before the web build ships —
is genuinely unsolved at the tooling level in 2026. Teams that crack this compound
their delivery velocity dramatically. Most are building custom orchestration.
What You Can Control Now
The coordinated release problem is genuinely hard. Three practices compress delivery
today, without waiting for the tooling to catch up:
Define the dependency graph explicitly. Even without tooling that
enforces it, documenting which artifacts must be valid before others are promoted turns
an implicit failure mode into an explicit checklist. A recurring finding in post-incident reviews
is that everyone assumed someone else was checking the dependency order.
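A dependency graph does not need special tooling to be useful. A minimal sketch in Python, using only the standard library's `graphlib`, might encode the semantic search scenario as data and derive release waves from it. The artifact names and edges are hypothetical; in particular, making the backend wait for the AI/ML pipeline is an assumption about this scenario, not a rule.

```python
from graphlib import TopologicalSorter

# Hypothetical artifact names for the "semantic search" scenario.
# Each key maps to the set of artifacts that must be valid and promoted first.
DEPENDS_ON: dict[str, set[str]] = {
    "data_pipeline": set(),
    "ai_ml_pipeline": {"data_pipeline"},  # index is built from validated embeddings
    "backend": {"ai_ml_pipeline"},        # assumption: endpoint goes live only once the index is warm
    "web": {"backend"},
    "mobile": {"backend"},
}


def release_waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group artifacts into waves; everything in one wave may be promoted in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the declared order contradicts itself
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # in a real pipeline: mark done only after validation passes
    return waves


for number, wave in enumerate(release_waves(DEPENDS_ON), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Web and mobile land in the same final wave, but the mobile store review still delays when users actually see the feature, which is the gap a feature flag covers.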
Treat the data pipeline as a first-class artifact. It has its own CI
(dbt Cloud with data lineage), its own tests (Great Expectations, dbt tests), and its
own deployment gate. The AI/ML pipeline should have an automated check: "is the upstream
data pipeline artifact passing?" If not, the AI pipeline does not promote. This single
constraint removes what is typically the largest category of silent AI correctness failures.
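The automated upstream check can be as small as a function the AI/ML pipeline's CI job calls before promotion. A minimal sketch, assuming the data pipeline publishes a validation result (from dbt tests and Great Expectations) tagged with a version identifier; `ValidationResult`, `promotion_gate`, and the snapshot naming are hypothetical, not a dbt or Great Expectations API. It also checks that the AI artifact was built from the same data version that passed, so a later green run cannot vouch for an earlier snapshot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationResult:
    """Latest validation outcome for an upstream artifact (hypothetical record shape)."""
    artifact: str
    version: str  # hypothetical: a dbt run ID or data snapshot tag
    passed: bool


def promotion_gate(upstream: ValidationResult, built_against: str) -> tuple[bool, str]:
    """Decide whether the AI/ML pipeline artifact may be promoted.

    Promotion is allowed only if the exact upstream version the AI artifact was
    built from is the latest one and it passed validation.
    """
    if upstream.version != built_against:
        return False, (f"built against {built_against}, but the latest "
                       f"{upstream.artifact} version is {upstream.version}")
    if not upstream.passed:
        return False, f"{upstream.artifact} {upstream.version} failed validation"
    return True, f"{upstream.artifact} {upstream.version} passing"


ok, reason = promotion_gate(
    ValidationResult("data_pipeline", "snapshot-2026-06-01", passed=False),
    built_against="snapshot-2026-06-01",
)
print(ok, reason)  # a CI step would exit non-zero here to block promotion
```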
Feature flags for coordinated cross-track releases. The mobile review
wall does not go away. But you can ship the backend and AI pipeline first, ship the web
frontend with a flag, submit the mobile build, and flip the flag when the mobile review
clears. The coordinated release is assembled in production rather than in the pipeline.
This is not elegant — it is the pragmatic answer to a constraint that has no tooling
solution yet.
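Assembling the release in production comes down to one flag that both clients read and a rule about when it may flip. A minimal sketch, assuming a single server-side flag; the step names, `CoordinatedRelease`, and `ReleaseNotReady` are hypothetical, and a real setup would use whatever feature flag service the team already runs.

```python
from dataclasses import dataclass, field


class ReleaseNotReady(Exception):
    """Raised when someone tries to flip the flag before every track is ready."""


@dataclass
class CoordinatedRelease:
    """One flag gates the feature on every client; it flips only when all tracks are ready."""
    steps: dict[str, bool] = field(default_factory=lambda: {
        "backend_live": False,
        "ai_pipeline_promoted": False,
        "web_deployed_behind_flag": False,
        "mobile_review_cleared": False,
    })
    flag_on: bool = False

    def mark_done(self, step: str) -> None:
        if step not in self.steps:
            raise KeyError(f"unknown release step: {step}")
        self.steps[step] = True

    def pending(self) -> list[str]:
        return [name for name, done in self.steps.items() if not done]

    def flip_flag(self) -> None:
        if self.pending():
            raise ReleaseNotReady("still waiting on: " + ", ".join(self.pending()))
        self.flag_on = True

    def feature_visible(self) -> bool:
        # Web and mobile both read this single flag, so both audiences get the feature together.
        return self.flag_on
```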
The Foundation Pipeline: Cloud Infrastructure
Before the first container can be deployed and before the first Kafka topic can be created,
cloud infrastructure must exist. Many teams still treat this as a one-time setup done by a senior
engineer clicking through a cloud console. In 2026, infrastructure is a pipeline artifact
with the same CI rights and responsibilities as application code — version-controlled,
peer-reviewed, and applied through automation.
// Infrastructure as Code pipeline — Terraform / Terragrunt / Atlantis
01
Terraform + Terragrunt — provision + DRY multi-environment config
Terraform defines cloud resources declaratively. Terragrunt adds DRY configuration management across dev / staging / prod environments and multiple cloud accounts. Without Terragrunt, Terraform configuration proliferates across environments into the infrastructure equivalent of copy-paste programming: near-identical state backends, provider configs, and variable files repeated per environment and kept in sync by hand, so they drift.
02
PR → terraform plan → Atlantis review → apply
Every infrastructure change goes through version control. terraform plan runs in CI on every PR, showing exactly which cloud resources will change before any engineer approves. Atlantis applies from the PR itself when a reviewer comments atlantis apply (it applies before merge, and can be configured to require approval first); alternatively, a GitHub Actions workflow with OIDC applies on merge. Non-production environments auto-apply; production requires an explicit approval gate. The cloud console becomes read-only. If a resource is not in Git, it should not exist.
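The apply rules can be written down as a small decision function that a CI job consults. A minimal sketch of the policy described above, assuming `pull_request` and `merge` events and an environment named `prod`; `pipeline_action` is hypothetical and not part of Atlantis or GitHub Actions. Requiring an approver other than the author is one reasonable reading of "explicit approval gate".

```python
def pipeline_action(event: str, environment: str, author: str, approvers: set[str]) -> str:
    """Return 'plan', 'apply', or 'await_approval' for an infrastructure change.

    Hypothetical policy: plan on every PR, auto-apply non-production on merge,
    and require an approval from someone other than the author before production applies.
    """
    if event == "pull_request":
        return "plan"  # show exactly which resources will change; change nothing
    if event != "merge":
        raise ValueError(f"unsupported event: {event}")
    if environment != "prod":
        return "apply"  # non-production auto-applies
    independent_approvers = approvers - {author}
    return "apply" if independent_approvers else "await_approval"
```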
03
Cloud-native IaC alternatives per provider
AWS CloudFormation / CDK (TypeScript-first infrastructure code, native integration with IAM and service limits), Azure Bicep (ARM replacement, cleaner declarative syntax, first-class AzureAD integration), GCP Deployment Manager / Config Connector (Kubernetes-native resource management for GCP). The tooling choice matters less than the principle: every cloud resource is a code change that goes through CI.
→
Infrastructure is the dependency-0 artifact
EKS clusters, RDS instances, MSK Kafka brokers, VPC configurations, and IAM roles must be provisioned before any other pipeline artifact can deploy to them. Including the infrastructure pipeline in the Force 05 coordinated release model resolves a specific category of failure: application pipelines succeed, container images are built and tested, but there is nothing in the cloud to deploy to — or the resource exists but with the wrong configuration, discovered at 2am.
// The AI-generated pipeline config opportunity
One place Force 05 does benefit from AI directly: pipeline configuration generation.
GitHub Actions workflows, Helm chart scaffolding, Terraform module composition —
these are high-structure, low-context tasks where AI produces reliable output.
The caution: AI-generated pipeline config should be reviewed by
someone who understands what it does, not just whether it runs. A pipeline that
deploys in the wrong order or skips a contract test silently is worse than one that
fails loudly.
// Force 05 tools · 2026
GitHub Actions
Turborepo / Nx
Fastlane
dbt Cloud
MLflow / DVC
Buildkite
ArgoCD
Terraform
Terragrunt
Atlantis
force 06
Force 06 is the most underrated force in the series — less discussed than vector
databases and agent orchestration, and rarely on the architecture diagram in early
design sessions. Yet at meaningful scale, the middleware design decision is worth
more than any model optimisation you will do.
The 40–70% Lever
40–70%
The range of LLM inference cost reduction reported in enterprise semantic-caching
deployments — and consistent with what I have seen. The exact number varies
by workload; the mechanism does not. At the volume of queries a mid-size
enterprise generates — support tickets, internal search, document summarisation,
code review assistance — a significant percentage of queries are semantically equivalent
even when the literal text differs. Semantic caching deduplicates them by meaning,
not exact string match. Most teams are not doing this. They are paying per call.
TTL Cache vs Semantic Cache
// What most teams do
TTL-Based Caching
Cache LLM responses by exact request hash. Expire after N minutes. Requests for "summarise this contract" and "give me a summary of this contract" become two separate LLM calls, cached separately, billed separately.
Cache hit requires: exact same string → exact same cache key. Hit rate at production query diversity: low.
// What winning teams do
Semantic Caching
Embed the query. Retrieve semantically similar cached responses above a similarity threshold. "Summarise this contract" and "give me a summary of this contract" resolve to the same cached response when their similarity clears the threshold.
Cache hit requires: semantic similarity above threshold → reuse cached response. Hit rate at production query diversity: in well-tuned deployments, 40–70% of calls never reach the model.
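The mechanism fits in a few dozen lines. A minimal sketch, assuming an injected `embed` function (any sentence embedding model) and a linear scan where a production cache such as Redis or GPTCache would use a vector index; `SemanticCache` is a hypothetical name and the 0.9 threshold is a hypothetical starting point, not a recommendation.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional

Vector = list[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class CacheEntry:
    embedding: Vector
    response: str


class SemanticCache:
    """Reuse a cached LLM response when a new query is close enough in meaning."""

    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.9):
        self.embed = embed          # any embedding model; injected so it can be swapped
        self.threshold = threshold  # hypothetical default; tune per workload
        self.entries: list[CacheEntry] = []

    def _best_match(self, query_vec: Vector) -> Optional[CacheEntry]:
        # Linear scan for clarity; production caches use a vector index instead.
        best, best_score = None, -1.0
        for entry in self.entries:
            score = cosine_similarity(query_vec, entry.embedding)
            if score > best_score:
                best, best_score = entry, score
        return best if best_score >= self.threshold else None

    def get_or_call(self, query: str, llm: Callable[[str], str]) -> str:
        query_vec = self.embed(query)
        hit = self._best_match(query_vec)
        if hit is not None:
            return hit.response  # cache hit: no model call, no token cost
        response = llm(query)    # cache miss: pay for one call, then remember it
        self.entries.append(CacheEntry(query_vec, response))
        return response
```

The threshold is the tuning knob: too low and different questions share an answer, too high and the hit rate collapses toward exact matching.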
The invalidation challenge is real. A semantic cache entry is not stale based on
time — it is stale when the underlying data that informed the response changes.
For static content (documentation, policies, product specs), invalidating on document
update is enough. For dynamic content, the invalidation logic is part of the cache
design and must be defined upfront — teams that defer it consistently get it wrong
in production.
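For static content, invalidation on document update means tracking which documents each cached response was grounded on. A minimal sketch, assuming the retrieval step reports source document IDs and that the ingestion path calls `on_document_updated`; `DocumentKeyedCache` and its methods are hypothetical, and similarity lookup is omitted to keep the focus on invalidation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CachedResponse:
    response: str
    source_docs: frozenset[str]  # IDs of the documents the response was grounded on


class DocumentKeyedCache:
    """Cache whose entries expire when a source document changes, not when a timer runs out."""

    def __init__(self) -> None:
        self.entries: dict[str, CachedResponse] = {}
        self.by_doc: defaultdict[str, set[str]] = defaultdict(set)

    def _drop(self, entry_id: str) -> bool:
        entry = self.entries.pop(entry_id, None)
        if entry is None:
            return False
        for doc_id in entry.source_docs:
            ids = self.by_doc.get(doc_id)
            if ids is not None:
                ids.discard(entry_id)
                if not ids:
                    del self.by_doc[doc_id]
        return True

    def put(self, entry_id: str, response: str, source_docs: Iterable[str]) -> None:
        self._drop(entry_id)  # replacing an entry must also clear its old document links
        docs = frozenset(source_docs)
        self.entries[entry_id] = CachedResponse(response, docs)
        for doc_id in docs:
            self.by_doc[doc_id].add(entry_id)

    def get(self, entry_id: str) -> Optional[str]:
        entry = self.entries.get(entry_id)
        return entry.response if entry is not None else None

    def on_document_updated(self, doc_id: str) -> int:
        """Call from the ingestion path when a document changes; returns entries evicted."""
        stale = list(self.by_doc.get(doc_id, ()))
        return sum(self._drop(entry_id) for entry_id in stale)
```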
Queue Design for AI Workloads
The second half of Force 06 is less visible but equally structural. When queues carry
AI task payloads — not just simple message bodies, but agent instructions, conversation
context, tool call sequences, and multi-step workflow state — the standard queue design
assumptions break down.
Payload
AI tasks carry context, not just data. The payload includes the task instruction, relevant prior conversation state, tool permissions, and cost budget. Standard job queues were designed for lightweight serialised objects — not 4KB+ context payloads with structured metadata.
Priority
Real-time and batch AI tasks have fundamentally different latency requirements and cost profiles. A user-facing summarisation request (sub-2 second expectation) and a nightly document re-embedding job should not compete for the same queue depth. Priority lanes with separate consumer pools are a design requirement, not an optimisation.
Retry logic
Standard retry: wait and retry the same message. AI task retry: the failure itself often points to a better way to retry. A failed agent task that timed out on a complex step should retry with a simplified instruction, not an identical one. Retry-with-context is a design pattern that most queue implementations need custom middleware to support.
Result cache
Completed AI task results should be cached at the queue level and checked before a consumer picks up a new task. If two different upstream services enqueue semantically equivalent tasks within a short window, only one LLM call should be made. This is queue-level deduplication by semantic similarity: the intersection of Force 06's two ideas.
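Two of these requirements, priority lanes and retry-with-context, can be sketched with the standard library. A minimal sketch, assuming two lanes (`realtime`, `batch`) and a simplification strategy that just appends guidance to the instruction; `AITask`, `LaneRouter`, and `retry_with_context` are hypothetical names, and in production the lanes would be separate queues in BullMQ, RabbitMQ, or Kafka.

```python
from collections import deque
from dataclasses import dataclass, replace
from typing import Optional

LANES = ("realtime", "batch")


@dataclass(frozen=True)
class AITask:
    """An AI task payload: the instruction plus the context it needs, not just a data blob."""
    task_id: str
    instruction: str
    context: tuple[str, ...] = ()                   # trimmed prior conversation state
    tool_permissions: frozenset[str] = frozenset()
    cost_budget_usd: float = 0.05                   # hypothetical per-task budget
    lane: str = "realtime"
    attempt: int = 1
    failure_notes: tuple[str, ...] = ()


class LaneRouter:
    """One queue per lane, each drained by its own consumer pool."""

    def __init__(self) -> None:
        self.queues: dict[str, deque] = {lane: deque() for lane in LANES}

    def enqueue(self, task: AITask) -> None:
        if task.lane not in self.queues:
            raise ValueError(f"unknown lane: {task.lane}")
        self.queues[task.lane].append(task)

    def next_for(self, lane: str) -> Optional[AITask]:
        # A realtime consumer never waits behind nightly batch work.
        queue = self.queues[lane]
        return queue.popleft() if queue else None


def retry_with_context(task: AITask, failure: str, max_attempts: int = 3) -> Optional[AITask]:
    """Build the next attempt from the failure instead of resending the same message."""
    if task.attempt >= max_attempts:
        return None  # stop retrying; route to the dead letter queue instead
    simplified = (
        f"{task.instruction}\n"
        f"Note: a previous attempt failed ({failure}). "
        "Split the work into smaller steps and complete only the first one."
    )  # hypothetical simplification strategy
    return replace(
        task,
        instruction=simplified,
        attempt=task.attempt + 1,
        failure_notes=task.failure_notes + (failure,),
    )
```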
// The dead letter queue insight
Standard DLQ thinking: failed messages are a problem to resolve. For AI tasks, the
failure context is often more valuable than the retry. The context that caused
the failure — the specific task instruction, the context window state, the tool call
that timed out — directly informs how the next attempt should be structured.
In my experience, teams that instrument their DLQs for AI tasks and surface the failure context to the
next attempt see noticeably higher resolution rates than teams that retry identically.
The DLQ is not a dead end. It is a prompt engineering dataset.
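Instrumenting a DLQ means storing what the failed attempt looked like, then reading it back in two ways: aggregated, as failure patterns, and per task, as guidance for the next attempt. A minimal sketch; `DeadLetter`, `InstrumentedDLQ`, the `hint_for` wording, and the 8,000-token threshold are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeadLetter:
    """What the failed attempt looked like, not just that it failed."""
    task_id: str
    instruction: str
    context_tokens: int
    failed_tool: Optional[str]  # the tool call that timed out or errored, if any
    error: str


class InstrumentedDLQ:
    def __init__(self, large_context_tokens: int = 8000) -> None:
        self.records: list[DeadLetter] = []
        self.large_context_tokens = large_context_tokens  # hypothetical threshold

    def record(self, letter: DeadLetter) -> None:
        self.records.append(letter)

    def failure_patterns(self, top: int = 3) -> list[tuple[tuple[Optional[str], str], int]]:
        """Most common (tool, error) pairs: the raw material for prompt fixes."""
        return Counter((r.failed_tool, r.error) for r in self.records).most_common(top)

    def hint_for(self, task_id: str) -> str:
        """Turn the latest failure of this task into guidance for its next attempt."""
        history = [r for r in self.records if r.task_id == task_id]
        if not history:
            return ""
        last = history[-1]
        hints = [f"A previous attempt failed with: {last.error}."]
        if last.failed_tool:
            hints.append(f"The call to '{last.failed_tool}' did not complete; do not repeat it unchanged.")
        if last.context_tokens > self.large_context_tokens:
            hints.append("The context was very large; summarise earlier turns before continuing.")
        return " ".join(hints)
```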
Temporal for Durable Workflows
Multi-step agent workflows add a problem neither standard queues nor async/await
patterns handle well: surviving infrastructure failures mid-execution. If an agent is
partway through a 12-step document processing task and the worker crashes, the default
outcome is re-running from step one — paying again for every step already completed.
Temporal addresses this directly: it provides durable workflow execution with
guaranteed progress, activity retries with configurable policies, workflow versioning
for long-running tasks, and visibility into workflow state at any point. For AI agent
orchestration that involves expensive LLM calls, external API calls, and human review
gates, Temporal is the infrastructure choice that makes multi-step reliability possible
without building it from scratch.
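The core idea, recording each completed step so a restart resumes rather than repeats, can be illustrated without Temporal. This is a hand-rolled stdlib sketch, not the Temporal SDK and not how Temporal works internally (Temporal persists an event history on its server and replays workflow code against it); `run_durably` and the JSON checkpoint file are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Callable

Step = tuple[str, Callable[[dict[str, Any]], Any]]


def run_durably(workflow_id: str, steps: list[Step], store_dir: Path) -> dict[str, Any]:
    """Run steps in order, persisting each result so a restart skips finished work.

    Step results must be JSON-serialisable. Each step receives the results so far.
    """
    checkpoint = store_dir / f"{workflow_id}.json"
    results: dict[str, Any] = json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    for name, step in steps:
        if name in results:
            continue  # already paid for before the crash: reuse the recorded result
        results[name] = step(results)
        tmp = checkpoint.with_suffix(".tmp")
        tmp.write_text(json.dumps(results))
        tmp.replace(checkpoint)  # rename over the old file so a crash never leaves a partial checkpoint
    return results
```

What this sketch leaves out is exactly what Temporal provides: retry policies per activity, timeouts, versioning of in-flight workflows, and visibility into where each workflow stands.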
// The design implication
Cache strategy and queue design are first-class architectural decisions in AI systems.
Most teams still treat them as infrastructure footnotes, something the platform team
handles. The gap between those two positions can be close to half your inference bill,
plus the full cost of every multi-step agent workflow that fails mid-execution and restarts from step one.
// Force 06 tools · 2026
Redis + GPTCache
BullMQ
Kafka
Temporal
Dapr
Upstash
RabbitMQ
the connection
Force 05 and Force 06 Are the Same Layer
Both forces govern what happens between the developer and the user: Force 05 the
delivery path from commit to production, Force 06 the execution path — how AI task
payloads move, how responses are cached, how failures are handled.
They interact directly. A Force 05 coordinated release of a new AI pipeline artifact
requires the semantic cache to be invalidated for affected response types — otherwise
the new model behaviour is masked by cached responses from the previous version.
A Force 06 queue priority lane design requires knowing which artifact types
are user-facing (real-time) and which are background (batch) — information that
comes from the Force 05 delivery model.
Teams that design Force 06 in isolation from Force 05 — that add semantic caching as
a performance optimisation without connecting it to the deployment pipeline — consistently
discover this the first time they ship a model update and users receive cached responses
from the old version for hours.
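One way to wire the two forces together is to put the promoted AI/ML artifact version into the cache key and have the release pipeline bump it on promotion. A minimal sketch, assuming the release pipeline knows which response types an artifact affects; `VersionAwareCache` and `query_key` (whatever the semantic lookup resolves a query to) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VersionAwareCache:
    """Cache keys include the AI pipeline artifact version that produced the response."""
    versions: dict[str, str]  # response type -> currently promoted artifact version
    _store: dict[tuple[str, str, str], str] = field(default_factory=dict)

    def _key(self, response_type: str, query_key: str) -> tuple[str, str, str]:
        return (response_type, self.versions[response_type], query_key)

    def get(self, response_type: str, query_key: str) -> Optional[str]:
        return self._store.get(self._key(response_type, query_key))

    def put(self, response_type: str, query_key: str, response: str) -> None:
        self._store[self._key(response_type, query_key)] = response

    def on_promotion(self, new_version: str, affected_types: set[str]) -> int:
        """Call from the release pipeline when a new AI/ML artifact is promoted."""
        for response_type in affected_types:
            self.versions[response_type] = new_version
        # Evict entries produced by the previous version so they cannot mask the new behaviour.
        stale = [k for k in self._store if k[0] in affected_types and k[1] != new_version]
        for key in stale:
            del self._store[key]
        return len(stale)
```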
Forces 05 and 06 bring the delivery and runtime layers into the architecture review.
Both forces ask the same underlying question: what does the system need to do between
the moment code is written and the moment a user receives a response? The final two
forces — 07 and 08 — ask the inverse: how do you know if the answer the user received
was correct, and what happens to the system when it is not?
// tool references last reviewed · June 2026