In 2017, a team at Google published a paper that most developers never read. Eight authors. Nine pages. A title that sounds like philosophy: "Attention Is All You Need." That paper retired the architecture that had dominated natural language processing for a decade — and replaced it with the transformer. If you have used a language model in the last three years, you have been living inside the consequences of that paper.
- Every core AI concept has a direct search-engine ancestor — tokenisation, attention, vector retrieval, RAG. You already know these problems under different names.
- The four-layer AI stack (LLM → RAG → MCP → Agents) and the three-role model that keeps it governable.
- A seven-item requirements checklist for delegating work to agents — and five things never to delegate.
But here is the part almost no one says when explaining AI to developers: you already understand the core concepts. You learned them in a different context, under different names. Search engine development — especially the decade between 2005 and 2015 — was applied machine learning at scale. The vocabulary was different. The problems were identical. And if you can connect the two, the entire AI stack stops being magic and starts being engineering.
The Year the Architecture Changed
Before 2017, machine learning systems processed language sequentially — reading a sentence word by word, carrying a kind of rolling memory forward. The approach worked. It did not scale. Long documents, complex instructions, multi-step reasoning: all degraded predictably as context grew. The model read everything, but attended to almost nothing.
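The transformer replaced sequential reading with self-attention. Every token in a passage is scored against every other token at once, and the model learns how much weight each connection deserves. Because those comparisons run in parallel rather than one word at a time, the architecture makes full use of modern GPUs. The result: better language understanding, larger context windows, generalisation across domains. Almost every language model you call "AI" today (GPT, Claude, Gemini, Mistral, Llama, Codex) runs on some variant of this architecture. The transformer is the Intel 8086 of this era: not the final word, but the foundation everything else is built on.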
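Here is a minimal sketch of scaled dot-product self-attention in plain Python. It assumes each token already has a vector. For brevity, it reuses those vectors directly as queries, keys and values. A real transformer learns separate projection matrices for each and runs many attention heads in parallel. The function names are illustrative.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating, for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(queries, keys, values):
    """Scaled dot-product attention over a whole sequence at once."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Score this token against every token, then normalise to weights.
        row = softmax([dot(q, k) / math.sqrt(d_k) for k in keys])
        weights.append(row)
        # Each output is a weighted average of all value vectors.
        outputs.append([sum(w * v[i] for w, v in zip(row, values))
                        for i in range(len(values[0]))])
    return outputs, weights

# Toy 2-d vectors for a three-token sequence; a real model learns separate
# query/key/value projections instead of reusing the same vectors.
tokens = ["the", "cat", "sat"]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = self_attention(x, x, x)
for tok, row in zip(tokens, weights):
    print(tok, [round(w, 2) for w in row])
```

The loop over queries is written one token at a time for readability. In practice, it is a single matrix multiplication, and that is what lets the whole sequence be processed in parallel.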
Search Was Your First Teacher
The search engine problems solved between 2005 and 2015 were not different problems from the ones AI solves now. They were the same problems, solved with less compute and simpler models. What changed in 2017 was the scale of the solution — not the shape of the problem.
Walk through the concepts side by side. If you built on search infrastructure, or just thought carefully about how search worked, you already have the mental model:
| Search (2005–2015) | What it solved | AI equivalent (2026) |
|---|---|---|
| Query tokenisation | Split raw text into units the system can operate on | LLM tokenisation |
| TF-IDF / BM25 ranking | Score documents by relevance to a query | Attention weights |
| Inverted index | Map terms to documents for fast retrieval | Vector database / embedding index |
| Query expansion | Retrieve related concepts, not just exact matches | RAG / context retrieval |
| PageRank / link graph | Score nodes by their connections, not just their content | Graph RAG / knowledge graphs |
| Personalised ranking | Adapt results to user context and history | Fine-tuning / RLHF / agent memory |
| Autocomplete | Predict the most likely next token given prior input | Next-token prediction (LLM core) |
| Session context | Maintain relevance across a multi-query session | Conversation context / in-context memory |
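Here is a minimal sketch of the search side of that table: whitespace tokenisation, an inverted index, and BM25 ranking over a made-up three-document corpus. The values k1 = 1.2 and b = 0.75 are common defaults, not values from any particular engine. The IDF term uses the smoothed form found in Lucene-style implementations.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus.
docs = {
    "d1": "transformers use attention over tokens",
    "d2": "search engines rank documents with bm25",
    "d3": "an inverted index maps terms to documents",
}

def tokenise(text):
    # Query tokenisation, search-engine style: lowercase, split on whitespace.
    return text.lower().split()

# Inverted index: term -> {doc_id: term frequency in that doc}.
index = defaultdict(dict)
doc_len = {}
for doc_id, text in docs.items():
    terms = tokenise(text)
    doc_len[doc_id] = len(terms)
    for term, tf in Counter(terms).items():
        index[term][doc_id] = tf

N = len(docs)
avgdl = sum(doc_len.values()) / N

def bm25(query, k1=1.2, b=0.75):
    scores = defaultdict(float)
    for term in tokenise(query):
        postings = index.get(term, {})  # only docs containing the term are touched
        n = len(postings)
        if n == 0:
            continue
        # Rarer terms earn a higher inverse document frequency.
        idf = math.log(1 + (N - n + 0.5) / (n + 0.5))
        for doc_id, tf in postings.items():
            # Term frequency saturates via k1; long documents are penalised via b.
            norm = tf + k1 * (1 - b + b * doc_len[doc_id] / avgdl)
            scores[doc_id] += idf * tf * (k1 + 1) / norm
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranking = bm25("inverted index documents")
print(ranking)
```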
If you understand why inverted indexes exist, you understand why vector databases exist. If you understand what query expansion solved, you understand what RAG solves. If you understand PageRank, you understand why GraphRAG retrieves better than flat keyword search for connected data. The problem space did not change. The architecture did. The scale changed dramatically. But the instinct is the same.
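The AI-side counterpart can be sketched the same way. In this toy version, the "embeddings" are hand-written three-dimensional vectors. They stand in for the output of a real embedding model. Retrieval is a brute-force cosine-similarity scan, and the RAG step simply pastes the top results into the prompt. The query shares no words with "automobile repair guide" yet still retrieves it, which is the problem query expansion used to solve.

```python
import math

# Hypothetical 3-d "embeddings"; a real system would get these from an
# embedding model, with hundreds of dimensions.
doc_vectors = {
    "automobile repair guide": [0.9, 0.1, 0.0],
    "vehicle maintenance schedule": [0.8, 0.2, 0.1],
    "sourdough baking tips": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vector, k=2):
    # Brute-force nearest-neighbour search. Vector databases replace this scan
    # with an approximate index, much as an inverted index replaces a full scan.
    ranked = sorted(doc_vectors,
                    key=lambda d: cosine(query_vector, doc_vectors[d]),
                    reverse=True)
    return ranked[:k]

def build_prompt(question, query_vector):
    # RAG: place the retrieved text in the model's context before the question.
    context = "\n".join(f"- {doc}" for doc in retrieve(query_vector))
    return f"Context:\n{context}\n\nQuestion: {question}"

query_vec = [0.85, 0.15, 0.05]  # pretend embedding of "How do I fix my car?"
top = retrieve(query_vec)
prompt = build_prompt("How do I fix my car?", query_vec)
print(prompt)
```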
Concepts are one thing; the journey is another. Here is what actually happens between a keystroke and a result — seven steps, each one a named piece of technology you can point at:
Most of what gets explained as AI magic is search engineering with better compute and a different name. Start there, and the rest becomes learnable.
The Four Layers Built on 2017
The transformer paper opened a capability gap, and the industry spent the next six years filling it. The result is a four-layer stack that almost every AI deployment in 2026 sits inside: the LLM, retrieval-augmented generation (RAG), the Model Context Protocol (MCP), and agents. Each layer adds exactly one thing. Understand what each adds, and you understand the architecture. Treat any layer as a black box, and you cannot debug it when it fails. And it will fail.
That four-layer stack is a static picture. Here it is in motion — the same kind of lifecycle as search, from your prompt to the answer, with each component doing its one job:
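Here is a minimal sketch of a prompt-to-answer flow, with one function per layer. Everything here is hypothetical. `llm` is a scripted stub rather than a model, and retrieval is a toy keyword match. The `TOOLS` dictionary only gestures at what MCP standardises (how a model discovers and calls tools); it is not the MCP SDK.

```python
from typing import Callable

# Layer 1: the LLM. A scripted stub stands in for a real model call.
def llm(prompt: str) -> str:
    # Pretend the model requests a tool until the order status is in context.
    if "status:" not in prompt:
        return "CALL order_status 42"
    return "FINAL Your order has shipped."

# Layer 2: RAG. Retrieval adds relevant text to the prompt (toy keyword match).
KNOWLEDGE = ["Refunds take 5 business days.", "Orders ship within 48 hours."]

def retrieve(question: str) -> list[str]:
    words = set(question.lower().strip("?").split())
    return [doc for doc in KNOWLEDGE if words & set(doc.lower().split())]

# Layer 3: tools. A plain dict as a hypothetical stand-in for MCP-exposed tools.
TOOLS: dict[str, Callable[[str], str]] = {
    "order_status": lambda order_id: f"status: order {order_id} shipped",
}

# Layer 4: the agent. A bounded loop that lets the model call tools until it answers.
def run_agent(question: str, max_steps: int = 5) -> str:
    context = retrieve(question)
    for _ in range(max_steps):
        reply = llm("\n".join(context + [question]))
        if reply.startswith("FINAL "):
            return reply.removeprefix("FINAL ")
        _, tool_name, arg = reply.split(" ", 2)
        context.append(TOOLS[tool_name](arg))
    return "FAILED: step limit reached"

answer = run_agent("when will my order ship?")
print(answer)
```

Each layer only adds one thing to the one below it: retrieval adds context, tools add actions, and the agent loop adds repetition with a stopping rule.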
Three Roles. Not Two.
Most teams structure AI deployment around two roles: the AI system and the human using it. That model fails in production, and it fails in a predictable place. No one decided who reviews what until it mattered. By then, the cost is already paid.
There are three roles, and all three apply whether you are deploying a single Copilot instance or orchestrating ten autonomous coding agents across a sprint.
Domain-Driven Design and Agent Scope
The question that kills most agent deployments is not "which model should I use?" It is: "what exactly is this agent supposed to do?" Vague task assignment produces vague output. An agent asked to "improve the onboarding experience" has no way to succeed — the task has no defined input, no bounded context, no measurable completion criteria, and no clear failure mode. It will do something. That something will not be what you meant.
Domain-Driven Design provides the natural unit of agent work: the bounded context. A bounded context defines a domain area with its own language, its own data, and its own rules. It has explicit inputs and outputs. It has ownership. It has defined edge cases. These are exactly the properties an agent needs to operate reliably.
Applied to task assignment, the bounded context gives four quick tests:
- If the task can be described in one sentence with clear inputs, outputs, and a success condition → it is agent-ready.
- If completing the task requires crossing multiple domain boundaries → it needs human orchestration before agents can handle any part of it.
- If the failure mode cannot be defined in advance → it is not ready for agent execution, regardless of how sophisticated the model is.
- If the agent's output would require organisational context an LLM does not have (cultural norms, relationship history, unwritten constraints) → a human must be in the decision path.
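The four tests above can be encoded as a triage function. The `TaskSpec` fields and the order the checks run in are assumptions made for illustration. The point is that each rule becomes a check you can run before assigning work, rather than a judgement made after the agent has already started.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TaskSpec:
    # Hypothetical task description; field names are illustrative only.
    summary: str
    inputs: list[str]
    outputs: list[str]
    success_condition: Optional[str]
    failure_mode: Optional[str]
    domains: set[str] = field(default_factory=set)
    needs_org_context: bool = False

def triage(task: TaskSpec) -> str:
    # A failure mode nobody can define rules out agent execution outright.
    if task.failure_mode is None:
        return "not ready: failure mode undefined"
    # Crossing bounded contexts needs a human to orchestrate first.
    if len(task.domains) > 1:
        return "human orchestration: crosses domain boundaries"
    # Unwritten organisational context keeps a human in the decision path.
    if task.needs_org_context:
        return "human in decision path: needs organisational context"
    # One sentence, clear inputs and outputs, and a success condition.
    if task.summary and task.inputs and task.outputs and task.success_condition:
        return "agent-ready"
    return "not ready: inputs, outputs or success condition missing"

invoice_task = TaskSpec(
    summary="Generate a PDF invoice for a closed order",
    inputs=["order_id"],
    outputs=["invoice.pdf"],
    success_condition="PDF totals match the order ledger",
    failure_mode="order not found or totals mismatch",
    domains={"billing"},
)
vague_task = TaskSpec(
    summary="Improve the onboarding experience",
    inputs=[], outputs=[], success_condition=None, failure_mode=None,
    domains={"product", "auth", "billing"},
)
print(triage(invoice_task))
print(triage(vague_task))
```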
DDD was designed to manage complexity in large software systems by keeping domain concerns separate and explicit. The complexity DDD manages is exactly the complexity that defeats autonomous agents. Suppose a billing agent touches user authentication data because no one defined its bounded context. That does not produce a billing bug. It produces an incident. The fix is the same in both cases: draw the boundary first.
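Drawing the boundary can be as literal as a scope check that runs before every agent action. This is a minimal sketch under assumed names. `BoundedContext`, `guard`, `ScopeViolation` and the resource names are all hypothetical. In practice, the same rule would also be enforced by the credentials the agent is given, not only by application code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BoundedContext:
    # Hypothetical scope definition: what the agent may touch and do.
    name: str
    readable: frozenset[str]
    actions: frozenset[str]

class ScopeViolation(Exception):
    pass

def guard(ctx: BoundedContext, action: str, resource: str) -> None:
    # Refuse anything outside the declared context before it happens.
    if action not in ctx.actions:
        raise ScopeViolation(f"{ctx.name}: action '{action}' is out of scope")
    if resource not in ctx.readable:
        raise ScopeViolation(f"{ctx.name}: resource '{resource}' is out of scope")

billing = BoundedContext(
    name="billing",
    readable=frozenset({"invoices", "payments"}),
    actions=frozenset({"read", "draft_invoice"}),
)

guard(billing, "read", "invoices")  # inside the boundary: passes silently

blocked = None
try:
    guard(billing, "read", "user_credentials")  # authentication data: outside it
except ScopeViolation as err:
    blocked = str(err)
print(blocked)
```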
What Agents Actually Need
The following is not a list of nice-to-haves. These are the minimum requirements for an AI agent operating in a real system. If any of these are absent, the deployment will eventually fail — the only variable is when, and what the cost is.
- 01 Bounded Context: The agent's scope must be defined before deployment. What domain does it operate in? What data can it access? What actions can it take? What is explicitly out of scope? Without this, the agent is free to interpret the task. It will, incorrectly, and in ways you will not predict.
- 02 Defined Output Format: Do not leave presentation to the agent. The expected output must be specified: structured JSON, a markdown document, a file change, or an API call. Ambiguous output requirements produce outputs that are technically correct and practically useless.
- 03 Least-Privilege Access: The agent should have access only to what it needs for the assigned task. Over-permissioned agents are not just a security risk; they are a reliability risk. Access to more data means more surface area for hallucination and unintended side effects.
- 04 Rollback or Dry-Run Capability: Any agent that writes to production systems must have a mechanism to preview or undo its actions. A dry run is not optional engineering polish. It is the difference between a recoverable mistake and an incident. No exceptions for agents touching live data.
- 05 Human Review Gate: A human reviewer must be in the loop at every consequential action: write operations, communications, deployments, and financial transactions. Define the review checkpoints before deployment, not after the first incident forces you to.
- 06 Cost Awareness Per Run: Every agent call runs on a meter. Multi-step agents multiply token costs across every tool call, every retrieval, and every intermediate generation. Know the expected cost per run before deployment. Set a ceiling. Monitor it. Agents with unbounded loops and no cost cap are how $500/month tools become $50,000 incidents.
- 07 Explicit Failure Definition: "Done" must be defined. So must "failed." An agent without a clear failure state can run indefinitely. It keeps generating costs, producing output, and reporting success with equal confidence, regardless of what it actually produced. Define the exit condition. Both of them.
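Several of these requirements (02, 04, 05, 06 and 07) can be enforced mechanically in the harness that runs the agent, rather than left to the agent itself. The sketch below assumes a hypothetical agent that yields `(tokens_used, raw_output)` pairs. It also assumes an illustrative flat price per 1,000 tokens. Real pricing varies by model and usually differs for input and output tokens.

```python
import json
from enum import Enum

class Outcome(Enum):
    DONE = "done"      # 07: success is an explicit state...
    FAILED = "failed"  # ...and so is failure

class CostCeilingExceeded(Exception):
    pass

class Meter:
    """06: track spend per run and stop at a hard ceiling."""
    def __init__(self, ceiling_usd: float, usd_per_1k_tokens: float):
        self.ceiling = ceiling_usd
        self.rate = usd_per_1k_tokens
        self.spent = 0.0

    def charge(self, tokens: int) -> None:
        self.spent += tokens / 1000 * self.rate
        if self.spent > self.ceiling:
            raise CostCeilingExceeded(f"spent ${self.spent:.2f}, ceiling ${self.ceiling:.2f}")

def validate_output(raw: str) -> dict:
    """02: accept only a JSON object with the agreed (hypothetical) keys."""
    data = json.loads(raw)  # raises json.JSONDecodeError, a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = {"order_id", "amount"} - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data

def run(attempts, meter, dry_run=True, approve=lambda result: False, max_steps=5):
    for step, (tokens, raw) in enumerate(attempts, start=1):
        if step > max_steps:
            return Outcome.FAILED, "step limit reached"
        try:
            meter.charge(tokens)
        except CostCeilingExceeded as err:
            return Outcome.FAILED, str(err)
        try:
            result = validate_output(raw)
        except ValueError:
            continue  # malformed output: retry, but the attempt was still billed
        if dry_run:  # 04: preview the write instead of performing it
            return Outcome.DONE, f"dry run, would write {result}"
        if not approve(result):  # 05: a human gate before any real write
            return Outcome.FAILED, "rejected at human review gate"
        return Outcome.DONE, f"wrote {result}"
    return Outcome.FAILED, "no valid output produced"

attempts = [(2000, "not json"), (2000, '{"order_id": 42, "amount": 19.99}')]
outcome, detail = run(attempts, Meter(0.05, 0.01))
runaway, reason = run([(2000, "oops")] * 10, Meter(0.05, 0.01))
live, gate = run(attempts, Meter(0.05, 0.01), dry_run=False)
print(outcome, detail)
print(runaway, reason)
print(live, gate)
```

The defaults lean safe: dry run is on, and the review gate rejects unless a reviewer explicitly approves. A runaway loop stops at the ceiling rather than at the end of the month.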
What Not to Delegate to Agents
The items below are not AI limitations. They are operational realities that apply to any autonomous system. Model capability is rarely the limiting factor; whether the task structure supports reliable autonomous execution almost always is.
- Anything without a definition of done. If you cannot describe success in advance, the agent cannot reach it. It will produce output — it will not produce the right output.
- Decisions requiring organisational context an LLM does not have. Culture, relationship history, political dynamics, unwritten constraints — these are not in the training data. They cannot be retrieved. They must be held by a human in the loop.
- Actions that compound on failure. Database migrations, bulk record updates, mass communications, production deployments without a review gate. One step in the wrong direction multiplied across ten thousand rows is not a bug. It is a crisis.
- Any task where the only reviewer is the agent itself. Agent-side self-review is a pre-check. It reduces noise. It does not replace external review. Self-certified output reaching production without human approval is a process failure, not an AI capability.
- Anything you have never done manually and fully understood. Agents accelerate processes you already control. They do not substitute for understanding a process you have never owned.
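Two of these rules can be written as a policy check that sits between an agent's proposal and its execution: no self-review, and no unreviewed compounding actions. The action kinds, the `agent:`/`human:` naming convention and the 100-row threshold are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical action kinds whose failures compound across rows, recipients or environments.
COMPOUNDING_KINDS = {"db_migration", "bulk_update", "mass_email", "prod_deploy"}

@dataclass
class ProposedAction:
    kind: str
    rows_affected: int
    author: str              # e.g. "agent:billing"
    reviewer: Optional[str]  # e.g. "human:alice", "agent:qa", or None

def may_execute(action: ProposedAction, bulk_threshold: int = 100) -> tuple[bool, str]:
    # Self-review is a pre-check; it never counts as approval.
    if action.reviewer is None or action.reviewer == action.author:
        return False, "needs an external reviewer"
    compounding = action.kind in COMPOUNDING_KINDS or action.rows_affected > bulk_threshold
    # One wrong step multiplied across many rows needs a human sign-off.
    if compounding and not action.reviewer.startswith("human:"):
        return False, "compounding action needs a human reviewer"
    return True, "approved"

self_checked = ProposedAction("bulk_update", 10_000, "agent:billing", "agent:billing")
agent_checked = ProposedAction("bulk_update", 10_000, "agent:billing", "agent:qa")
human_checked = ProposedAction("bulk_update", 10_000, "agent:billing", "human:alice")
for action in (self_checked, agent_checked, human_checked):
    print(action.reviewer, may_execute(action))
```

This sketch isolates the two rules above. In a full harness, the human review gate from item 05 would also apply to smaller consequential writes.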
The most expensive AI failures in 2025–2026 share a pattern: autonomous agents with broad permissions, no human review gate, and unclear completion criteria. Some teams discovered this after burning through investor runway at rates that would have been unthinkable before AI tooling made it technically possible to run ten agents simultaneously. The model was not the problem. The requirements were absent.