The Build and the Store: Forces 03–04

Forces 01 and 02 determine the quality ceiling before a line of code is written. Forces 03 and 04 operate during the build itself — in how code is structured across parallel workstreams, and in how the data beneath those workstreams is stored, retrieved, and made available to the AI systems running on top of it. Both forces represent the same underlying shift: the senior engineer is no longer primarily an implementor, and the database is no longer a single decision.

// TL;DR — what you'll take away
  • Full-stack in 2026 is six parallel tracks, and the senior engineer's job shifts from implementor to contract owner across them.
  • AI features need four data layers — relational, document, vector, streaming — and the design question is how data flows between them, not which one to pick.
  • Embedding freshness is the most common RAG production failure; a CDC pipeline fixes it structurally, not as a scheduled job.
// Companion overview
All 8 Forces Reshaping How Software Gets Built — reference card for the full landscape. This article covers Forces 03 and 04.
Force 03 / 08 Full-Stack Development

Six-Track Full-Stack AI Orchestration

Full-stack in 2026 is six parallel tracks. Most teams run two well and a third partially, and treat the rest as "someone else's problem" or something to get to later. The teams compressing feature delivery from weeks to days are the ones where AI agents across all six tracks know about each other's contracts.

The term "full-stack developer" was always a compression — a way of saying "someone who can touch the whole system." In 2026, the whole system is considerably wider than it was in 2018. A modern production application does not have a frontend and a backend. It has six distinct technical tracks, each with its own toolchain, its own delivery cadence, and its own category of AI-assisted work.

The Six Tracks

Track 01
Frontend — Web
React · Next.js · TypeScript
UI components, routing, SSR, performance. AI generates components from design specs and interface contracts. Fastest iteration of any track.
Track 02
Frontend — Mobile
React Native · Swift · Kotlin
The hard wall. App store review can take anywhere from under a day to about a week, and it does not accelerate regardless of how fast AI generates the code. The bottleneck is external.
Track 03
Backend — APIs + DDD
FastAPI · Go · domain services
Domain-bounded services with explicit API contracts. AI operates well here when bounded contexts are defined. This is where Force 02 pays its dividend.
Track 04
Data Engineering
dbt · Airflow · Spark
The unsexy blocker. Data engineering quality is the ceiling of every AI feature. Skipping it means your AI/ML track will eventually produce impressive garbage.
Track 05
AI / ML Track
LangChain · LlamaIndex · embeddings
RAG pipelines, agent orchestration, model integration. Only as good as the data engineering track that feeds it. The most visible track. Rarely the bottleneck.
Track 06
Platform — CI/CD + SRE
GitHub Actions · Terraform · k8s
Infrastructure-as-Code (Terraform/Terragrunt), CI/CD pipelines, reliability engineering. IaC is version-controlled and applied via PR-approved workflows (Atlantis). Cloud provisioning is the dependency-0 artifact: all other tracks deploy into the infrastructure this track defines.

Most teams in 2026 have Track 01, Track 03, and a partial Track 06. They treat Track 02 as a specialist concern, Track 04 as something they will "get to later," and Track 05 as the exciting new thing they are adding without Track 04 beneath it. This produces the classic AI feature outcome: something that looks impressive in a demo and degrades under real data.

// The coordination finding
Nearly all the AI dev tool adoption I have seen, across enterprises, startups, and my own teams, happens at the individual level. The teams compressing delivery go further: their agents across tracks know about each other's contracts, so the benefit reaches beyond any one developer's productivity. The gain is not in faster typing. It is in parallel execution with shared context.

The Senior Engineer's New Job

The structural shift in Force 03 is a role change. When AI agents can implement within a bounded context reliably, the senior engineer's time is no longer primarily spent implementing. It is spent defining the boundaries and contracts the agent implements within.

// 2022 role
Implementor
Writes the code. Owns the logic. Reviews pull requests for correctness. Mentors junior developers on how to implement features. Spends most of the sprint writing.
// 2026 role
Contract Owner
Defines interface contracts between tracks. Sets the bounded context for agent tasks. Reviews code for domain invariant violations and contract compliance. Approves AI output at consequential gates. Spends most of the sprint designing what agents implement.

This is not a demotion — it is an amplification. A senior engineer who owns contracts across six tracks is accountable for more of the system than any single implementor could be. The skill required is different: systems thinking, interface design, and the ability to detect subtle domain violations in AI-generated code that looks correct but is architecturally wrong.
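To make "contract compliance" concrete, here is a minimal sketch of an interface contract expressed as data, with a check that a reviewer or CI step could run against agent-produced payloads. The contract name, the field names, and the `violations` helper are all hypothetical, chosen for illustration; a real team would more likely reach for a schema tool it already uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    name: str
    type_: type
    required: bool = True


@dataclass(frozen=True)
class Contract:
    """A versioned description of what one track promises another."""
    name: str
    version: int
    fields: tuple[FieldSpec, ...]

    def violations(self, payload: dict) -> list[str]:
        """Return human-readable problems; an empty list means compliant."""
        problems = []
        for spec in self.fields:
            if spec.name not in payload:
                if spec.required:
                    problems.append(f"missing required field: {spec.name}")
                continue
            if not isinstance(payload[spec.name], spec.type_):
                problems.append(
                    f"wrong type for {spec.name}: expected {spec.type_.__name__}"
                )
        # Fields the contract never declared are flagged too, so drift is visible.
        known = {spec.name for spec in self.fields}
        for extra in sorted(set(payload) - known):
            problems.append(f"undeclared field: {extra}")
        return problems


# Hypothetical contract between the backend track and the data engineering track.
PRODUCT_CHANGED = Contract(
    name="product_changed",
    version=1,
    fields=(
        FieldSpec("product_id", str),
        FieldSpec("title", str),
        FieldSpec("price_cents", int),
        FieldSpec("description", str, required=False),
    ),
)
```

The point of the sketch is that the contract lives in one place and both sides are checked against it. The agent on either track can change its implementation freely as long as `violations` stays empty.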

The practical tool for parallel multi-track agent orchestration in 2026 is Claude Code with git worktrees — which allows multiple agent sessions to work on separate tracks simultaneously without context collision. Each agent operates within its track's bounded context, and the contract between tracks is the coordination point the senior engineer owns.
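As a rough illustration of the worktree setup, the sketch below builds one `git worktree add -b <branch> <path>` command per track, so each agent session gets its own checkout and branch. The track names, the branch naming scheme, and the `../worktrees` directory are assumptions for the example, not a convention from any tool. The function only builds the commands; running them is left to the caller.

```python
import shlex

# Hypothetical short names for the six tracks.
TRACKS = ["web", "mobile", "backend", "data", "ml", "platform"]


def worktree_commands(feature, tracks=TRACKS, root="../worktrees"):
    """Build one `git worktree add` command per track for a feature."""
    commands = []
    for track in tracks:
        branch = f"{feature}/{track}"        # e.g. catalogue-search/web
        path = f"{root}/{feature}-{track}"   # separate directory per agent session
        commands.append(["git", "worktree", "add", "-b", branch, path])
    return commands


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for cmd in worktree_commands("catalogue-search"):
        print(shlex.join(cmd))
```

Because every track works on its own branch in its own directory, the only shared surface is the contract, which is where the senior engineer's review effort goes.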

// Force 03 tools · 2026
Claude Code + worktrees · GitHub Copilot Workspace · Cursor · React + Next.js · FastAPI + DDD · dbt + Airflow · LangChain / LlamaIndex · GitHub Actions
Force 04 / 08 Data Architecture

Polyglot Data: SQL + Document + Vector + Streaming

The database decision is now four decisions. Relational for transactional consistency. Document for schema flexibility. Vector for semantic retrieval. Streaming for freshness. Most teams pick one and build AI features on top of it. The results are predictable.

The most common reason an AI feature fails in production is not the model. It is the data layer beneath it. The model is doing exactly what it was asked to do — it is retrieving from stale embeddings, hallucinating over gaps in a schema that was never designed for semantic retrieval, or returning results from a vector index that drifted from the source database three days ago.

These are data architecture problems, not AI problems. They are invisible until after the AI feature ships, at which point they are extremely expensive to fix because the schema, the retrieval logic, and the embedding pipeline are all entangled.

The Four Data Layers

SQL
Relational — Transactional Source of Truth
PostgreSQL (+ pgvector extension)
ACID transactions, complex joins, referential integrity. Where your business-critical data lives — orders, users, payments, domain aggregates. The authoritative source every other layer derives from.
// pgvector turns PostgreSQL into a lightweight vector store — good for <10M embeddings, avoids managing a separate vector service
☁ Managed: AWS RDS / Aurora PostgreSQL · Azure Database for PostgreSQL · GCP Cloud SQL / Cloud Spanner
DOC
Document — Flexible Schema Evolution
MongoDB
Nested structures, schema-free content, rapid iteration on data shapes. Where your product catalogue, user-generated content, configuration, and anything with variable structure lives. Does not fight you when the schema changes.
// AI features that generate structured-but-variable output (recommendations, summaries, metadata) store naturally in document form
☁ Managed: AWS DocumentDB (MongoDB-compatible) · Azure Cosmos DB (MongoDB API) · GCP Firestore / MongoDB Atlas
VEC
Vector — Semantic Retrieval for RAG
pgvector · Qdrant · Pinecone
Stores embeddings (numerical representations of meaning) and retrieves by semantic similarity rather than exact match. This is the layer that makes RAG possible: when a user asks a question, the vector store finds the most relevant context to inject into the prompt. Without this layer, the model can draw only on its training data and whatever exact-match lookups you wire in.
// Embedding freshness is the most common RAG failure. Vectors that don't reflect current data produce authoritative-sounding wrong answers
☁ Managed: AWS OpenSearch Serverless (k-NN) / Bedrock Knowledge Bases · Azure AI Search (vector) · GCP Vertex AI Vector Search
STR
Streaming — Real-Time Freshness via CDC
Kafka · RisingWave · Debezium
Change Data Capture (CDC) streams database changes in real time to downstream consumers. In AI systems, this keeps vector indexes and materialized views current without batch reindexing jobs. When a record changes in PostgreSQL, the event streams to Kafka, the embedding pipeline regenerates the vector, and the RAG layer is current within seconds — not hours.
// Without CDC, your RAG results are as stale as your last reindex batch. With CDC, freshness becomes a pipeline property, not a scheduled job
☁ Managed: AWS MSK (Kafka) / Kinesis · Azure Event Hubs (Kafka API) · GCP Pub/Sub / Datastream (CDC)
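To show what "retrieves by semantic similarity" means for the vector layer, here is a toy sketch of top-k retrieval by cosine similarity over an in-memory index. The three-dimensional vectors and document ids are made up for illustration. Real embeddings have hundreds or thousands of dimensions, and a real vector store would use an approximate index, not a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Return the ids of the k vectors most similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), doc_id)
        for doc_id, vec in index.items()
    ]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical index: document id -> embedding.
INDEX = {
    "returns-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.0],
    "gift-cards": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    # A query vector that points mostly in the "returns" direction.
    print(top_k([0.8, 0.2, 0.0], INDEX))
```

Note that the retrieval step never looks at the source database. If the vector for `returns-policy` was computed from last week's text, it is last week's text that reaches the prompt.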

The Design Question Is Flow, Not Choice

The single most important architectural insight for Force 04 is this: the decision is not which database to use. It is how data flows between all four layers, and what transformation logic sits between them.

// Data flow: source → transformation → retrieval
PostgreSQL: source of truth
→ CDC (Debezium): change capture
→ Kafka: event stream
→ Embedding pipeline: vectorise changed records
→ Qdrant / pgvector: fresh semantic index
→ RAG retrieval: agent context
dbt sits across this flow as the transformation layer — normalising, enriching, and materialising views that feed multiple downstream consumers. It is the connective tissue that ensures each layer's input is clean.
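The embedding step in this flow can be sketched as a function that consumes one change event and updates the index. The event shape below is a simplified version of the Debezium envelope (`op` of `c`, `u`, `d`, or `r`, with `before` and `after` row images). The column names `id`, `description`, and `version`, the in-memory dict standing in for the vector store, and the `fake_embed` function are all assumptions for illustration. `fake_embed` is a deterministic stand-in, not a semantic model.

```python
import hashlib


def fake_embed(text, dims=4):
    """Stand-in for a real embedding model: deterministic, not semantic."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255 for byte in digest[:dims]]


def apply_change_event(event, index, embed=fake_embed):
    """Apply one CDC event to the vector index (a plain dict here)."""
    op = event["op"]
    if op == "d":
        # Deletes carry the old row in `before`; drop its vector.
        index.pop(event["before"]["id"], None)
        return
    if op in ("c", "u", "r"):
        # Creates, updates, and snapshot reads carry the new row in `after`.
        row = event["after"]
        index[row["id"]] = {
            "vector": embed(row["description"]),
            "source_version": row["version"],  # lets you audit freshness later
        }


if __name__ == "__main__":
    index = {}
    apply_change_event(
        {"op": "c", "before": None,
         "after": {"id": 1, "description": "blue mug", "version": 1}},
        index,
    )
    apply_change_event(
        {"op": "u",
         "before": {"id": 1, "description": "blue mug", "version": 1},
         "after": {"id": 1, "description": "red mug", "version": 2}},
        index,
    )
    print(index[1]["source_version"])
```

In a real pipeline this function would sit inside a Kafka consumer loop and write to Qdrant or pgvector. Storing the source version next to the vector is a design choice that makes drift between the two layers measurable.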

Most teams encounter this architecture after the fact. They build the AI feature on PostgreSQL alone, discover that semantic search requires a vector layer, add pgvector, discover that their embeddings go stale, add a nightly batch reindex, discover that the RAG layer answers confidently from data that is up to a day old, and then, finally, design the CDC pipeline they should have started with.

// A common discovery conversation

"Our RAG feature works in testing. In production it gives wrong answers." The first question: when were the embeddings last updated? "We regenerate them nightly." And the data it is retrieving from — how often does that change? "Constantly. It is a product catalogue." The conversation ends with a CDC pipeline design and a three-week rebuild of the data layer that could have been designed correctly in the first sprint.

↳ see also · Article 10 — The Eval and the Runbook — stale embeddings fail silently in production; this is how you monitor and catch them.
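The first question in that discovery conversation ("when were the embeddings last updated?") can be turned into a check. The sketch below compares when each source record last changed with when its embedding was last generated, and reports records whose change has been waiting longer than a tolerated lag. The two timestamp maps, the record ids, and the five-minute tolerance are hypothetical; in practice they would come from queries against the source table and the vector store's metadata.

```python
from datetime import datetime, timedelta, timezone


def stale_ids(source_updated_at, embedded_at, now, max_lag):
    """Ids whose embedding is behind the source for longer than max_lag."""
    stale = []
    for record_id, updated in source_updated_at.items():
        embedded = embedded_at.get(record_id)
        # Behind: never embedded, or embedded before the latest source change.
        behind = embedded is None or embedded < updated
        if behind and now - updated > max_lag:
            stale.append(record_id)
    return sorted(stale)


if __name__ == "__main__":
    def utc(hour, minute=0):
        return datetime(2026, 6, 1, hour, minute, tzinfo=timezone.utc)

    source = {"a": utc(9), "b": utc(1), "c": utc(11, 58), "d": utc(10)}
    embedded = {"a": utc(2), "b": utc(2), "c": utc(2)}  # nightly batch ran at 02:00
    print(stale_ids(source, embedded, now=utc(12), max_lag=timedelta(minutes=5)))
```

With a nightly batch, every record edited during the day shows up in this list until the next run. With a CDC pipeline, the list should stay empty apart from changes still in flight.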

When to Add Each Layer

Relational
Day one. Always. PostgreSQL is the non-negotiable foundation — your transactional data, your domain aggregates, your source of truth. Adding pgvector later is low-friction.
Document
When your schema is genuinely variable and evolves rapidly. Not as a default — as a deliberate choice for specific data types that fight relational structure. Avoid using it to escape data modelling discipline.
Vector
As soon as semantic search or RAG enters the roadmap. Start with pgvector if data volume is manageable (<10M vectors). Migrate to Qdrant or Pinecone when scale, performance, or metadata filtering requires it. Do not wait until it is a production incident.
Streaming
When data freshness becomes a correctness requirement. If users will notice stale RAG results, or if your AI features depend on real-time data, design the CDC pipeline in the same sprint as the feature — not the sprint after you discover the problem.
// Force 04 tools · 2026
PostgreSQL + pgvector · MongoDB · Qdrant · Pinecone · Redis (semantic cache) · Kafka + Debezium · RisingWave · dbt
the connection

Why Force 03 and Force 04 Are the Same Problem

The six-track model and the polyglot data architecture are not separate concerns. They are the same architectural decision at different levels. The Data Engineering track (Track 04) is the layer that feeds the AI/ML track (Track 05). The streaming layer (Force 04) is what makes the data engineering track real-time. And the contract that the senior engineer owns across tracks (Force 03) includes the data contracts between the relational layer and the vector layer.

Teams that get Force 03 right but ignore Force 04 build fast and retrieve stale. Teams that get Force 04 right but ignore Force 03 have great data infrastructure serving agents that are not coordinated. Getting both right means the AI/ML track has fresh, semantically retrievable data, operating under contracts that the agent can respect, and a senior engineer who owns the interface between them.

Forces 03 and 04 are where the architectural decisions of Forces 01 and 02 get tested. A good domain model (Force 02) maps cleanly to bounded backend services (Force 03). A well-specified requirement (Force 01) produces a backend service that generates the right events for the data pipeline (Force 04). The next two forces — 05 and 06 — move from building and storing to shipping and routing: how AI is changing CI/CD pipelines and the middleware layer that connects everything in between.
// tool references last reviewed · June 2026