You Already Know AI —
You Just Called It Search

In 2017, a team at Google published a paper that most developers never read. Eight authors. Nine pages. A title that sounds like philosophy: "Attention Is All You Need." That paper retired the architecture that had dominated natural language processing for a decade — and replaced it with the transformer. If you have used a language model in the last three years, you have been living inside the consequences of that paper.

// TL;DR — what you'll take away
  • Every core AI concept has a direct search-engine ancestor — tokenisation, attention, vector retrieval, RAG. You already know these problems under different names.
  • The four-layer AI stack (LLM → RAG → MCP → Agents) and the three-role model that keeps it governable.
  • A seven-item requirements checklist for delegating work to agents — and five things never to delegate.

But here is the part almost no one says when explaining AI to developers: you already understand the core concepts. You learned them in a different context, under different names. Search engine development — especially the decade between 2005 and 2015 — was applied machine learning at scale. The vocabulary was different. The problems were identical. And if you can connect the two, the entire AI stack stops being magic and starts being engineering.

before & after

The Year the Architecture Changed

Before 2017, machine learning systems processed language sequentially — reading a sentence word by word, carrying a kind of rolling memory forward. The approach worked. It did not scale. Long documents, complex instructions, multi-step reasoning: all degraded predictably as context grew. The model read everything, but attended to almost nothing.

2017
// The inflection point
The transformer replaced sequential processing with parallel processing and introduced attention — a mechanism that learns which parts of an input to weight more heavily when producing an output. The model does not read left to right. It considers all tokens simultaneously and decides what matters.

The result: better language understanding, larger context windows, generalisation across domains. Everything you call "AI" today — GPT, Claude, Gemini, Mistral, Llama, Codex — runs on some variant of this architecture. The transformer is the Intel 8086 of this era: not the final word, but the foundation everything else is built on.
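The attention mechanism described above can be sketched in a few lines. This is a minimal illustration, not the paper's full architecture: it assumes made-up two-dimensional vectors and uses the input vectors directly as queries, keys, and values, where a real transformer would first apply learned projection matrices. The names `self_attention` and `toy_vectors` are hypothetical.

```python
import math

def softmax(xs):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = inputs.

    Assumption: learned projection matrices are omitted to keep the sketch short.
    """
    d = len(vectors[0])
    outputs, all_weights = [], []
    for q in vectors:
        # Score this token against every token, itself included.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights = softmax(scores)
        # The output is a weighted mix of all token vectors.
        mixed = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(d)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

tokens = ["the", "cat", "sat"]
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]  # invented embeddings
outputs, weights = self_attention(toy_vectors)
```

Each row of `weights` sums to 1 and says how much one token draws from every other token. Nothing in the loop depends on reading order, which is why the computation can run in parallel.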

the lineage

Search Was Your First Teacher

The search engine problems solved between 2005 and 2015 were not different problems from the ones AI solves now. They were the same problems, solved with less compute and simpler models. What changed in 2017 was the scale of the solution — not the shape of the problem.

Walk through the concepts side by side. If you built on search infrastructure, or just thought carefully about how search worked, you already have the mental model:

// concept lineage: search → AI
| Search (2005–2015) | What it solved | AI equivalent (2026) |
|---|---|---|
| Query tokenisation | Split raw text into units the system can operate on | LLM tokenisation |
| TF-IDF / BM25 ranking | Score documents by relevance to a query | Attention weights |
| Inverted index | Map terms to documents for fast retrieval | Vector database / embedding index |
| Query expansion | Retrieve related concepts, not just exact matches | RAG / context retrieval |
| PageRank / link graph | Score nodes by their connections, not just their content | Graph RAG / knowledge graphs |
| Personalised ranking | Adapt results to user context and history | Fine-tuning / RLHF / agent memory |
| Autocomplete | Predict the most likely next token given prior input | Next-token prediction (LLM core) |
| Session context | Maintain relevance across a multi-query session | Conversation context / in-context memory |

If you understand why inverted indexes exist, you understand why vector databases exist. If you understand what query expansion solved, you understand what RAG solves. If you understand PageRank, you understand why GraphRAG retrieves better than flat keyword search for connected data. The problem space did not change. The architecture did. The scale changed dramatically. But the instinct is the same.
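To make the inverted-index and vector-database comparison concrete, here is a rough side-by-side sketch. It assumes three invented documents and hand-made three-dimensional vectors standing in for real embeddings. A production vector database would use an approximate nearest-neighbour index, not the brute-force scan shown here.

```python
import math
from collections import defaultdict

docs = {
    "d1": "reset your password from the login page",
    "d2": "rotate api keys every ninety days",
    "d3": "forgot password recovery email",
}

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return index

def keyword_lookup(index, query):
    """Return ids of documents containing every query term."""
    term_sets = [index.get(term, set()) for term in query.split()]
    return set.intersection(*term_sets) if term_sets else set()

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Invented vectors; axes loosely mean (account access, security policy, email).
doc_vectors = {
    "d1": [0.9, 0.1, 0.0],
    "d2": [0.1, 0.9, 0.0],
    "d3": [0.8, 0.0, 0.6],
}

def nearest(doc_vectors, query_vector, k=2):
    """Brute-force nearest neighbours by cosine similarity."""
    ranked = sorted(doc_vectors,
                    key=lambda d: cosine(doc_vectors[d], query_vector),
                    reverse=True)
    return ranked[:k]

index = build_inverted_index(docs)
exact = keyword_lookup(index, "password")          # term match
missed = keyword_lookup(index, "cannot sign in")   # no shared terms
similar = nearest(doc_vectors, [1.0, 0.0, 0.0])    # an "account access" query
```

The keyword lookup finds nothing for "cannot sign in" because no document shares those terms. The vector lookup still ranks the two account-access documents first. Both structures answer the same question: which documents are worth reading for this query.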

Concepts are one thing; the journey is another. Here is what actually happens between a keystroke and a result — seven steps, each one a named piece of technology you can point at:

// the honest version

Most of what gets explained as AI magic is search engineering with better compute and a different name. Start there, and the rest becomes learnable.

the stack

The Four Layers Built on 2017

The transformer paper opened a capability gap. The industry spent the next six years filling it. The result is a four-layer stack that almost every AI deployment in 2026 sits inside. Each layer adds exactly one thing. Understand what each adds, and you understand the architecture. Treat any layer as a black box, and you cannot debug it when it fails — and it will fail.

01
LLM — Large Language Model
Adds: text → meaning → text
The reasoning engine. Takes text in; produces text out. Understands context, follows instructions, generates content. Claude, GPT-4o, Gemini 2.5, and Llama 3 are all LLMs. This is the core capability from the 2017 architecture shift. Alone, it is powerful and stateless: beyond its training data, it knows only what you put in the prompt.
02
RAG — Retrieval-Augmented Generation
Adds: static knowledge → live retrieval
The memory layer. A standalone LLM only knows what it was trained on. RAG adds real-time retrieval: when a prompt arrives, relevant documents are fetched from a knowledge base — your codebase, your docs, your database — and injected into context. The model responds with current, specific information instead of generalised training data.
03
MCP — Model Context Protocol
Adds: isolated → tool-connected
The integration layer. Connects the LLM to external tools and systems — databases, APIs, file systems, version control, communication platforms. Without MCP (or an equivalent tool-calling layer), the model can only generate text about the world. With it, the model can act on the world. That distinction is the line between a chatbot and an agent.
04
Agents — Autonomous Task Execution
Adds: single-call → multi-step orchestration
The execution layer. Instead of a single prompt-response cycle, agents operate across multiple steps — planning a task, calling tools, evaluating intermediate outputs, correcting course, and continuing until a defined goal is met. An agent is not a smarter chatbot. It is a controlled task-execution loop. In 2026 the tools are Claude Code, GitHub Copilot Workspace, Cursor, Amazon Q Developer, and Gemini Code Assist. The underlying model barely matters — the orchestration does.
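The tool-connected layer can be pictured as a registry plus a dispatcher. The sketch below is an assumption-heavy illustration: it does not use the MCP SDK or its wire format, the request shape `{"tool": ..., "arguments": ...}` is invented for this example, and both tools are hypothetical stand-ins that touch nothing real.

```python
def read_file(path):
    """Hypothetical tool: returns canned contents instead of touching disk."""
    fake_fs = {"README.md": "# Demo project"}
    return fake_fs.get(path, "")

def add(a, b):
    """Hypothetical tool: trivial arithmetic."""
    return a + b

# The registry is the full list of actions the model may request.
TOOLS = {"read_file": read_file, "add": add}

def dispatch(request):
    """Run one tool request of the form {"tool": name, "arguments": {...}}."""
    name = request.get("tool")
    if name not in TOOLS:
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        result = TOOLS[name](**request.get("arguments", {}))
    except TypeError as exc:  # wrong or missing arguments
        return {"ok": False, "error": str(exc)}
    return {"ok": True, "result": result}

good = dispatch({"tool": "add", "arguments": {"a": 2, "b": 3}})
unknown = dispatch({"tool": "drop_database", "arguments": {}})
malformed = dispatch({"tool": "add", "arguments": {"a": 2}})
```

The model only ever emits a request. Whether anything runs is decided by the registry, so the registry is also where access control naturally lives.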

That four-layer stack is a static picture. Here it is in motion — the same kind of lifecycle as search, from your prompt to the answer, with each component doing its one job:

// AI lifecycle · prompt → answer
1
You write a prompt
Your message — plus the system instructions and chat history — becomes the model's entire starting point. Nothing is being looked up in a database of stored answers.
context window
2
Tokenize
The text is split into tokens: word-pieces drawn from a fixed vocabulary (understanding → under + standing).
subword tokenizer · BPE
3
Embed
Each token becomes a vector — a list of numbers — so meaning turns into math: similar ideas land near each other (king near queen). The same primitive that powers semantic search.
embeddings
4
Retrieve context · RAG
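The prompt is encoded as a single vector (typically by a separate embedding model, not the per-token vectors from stage 3) and matched against a vector database to pull in the most relevant documents: your docs, your code, your knowledge base. Those documents are injected into the context. This is search's retrieval step, reborn.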
vector search · RAG
5
Attend
The model weighs every token against every other, all at once, to work out what refers to what and what matters. This is the 2017 transformer doing its work.
transformer · self-attention
6
Generate, token by token
It predicts the most likely next token, picks one, appends it, and runs again — building the answer one piece at a time. A very large, very capable autocomplete.
next-token prediction · autoregressive loop
7
Act with tools · MCP
When the task needs more than text, the model calls tools — query a database, hit an API, edit a file. This is the line between a chatbot and an agent.
tool-calling · MCP
8
Review & answer
For multi-step work, a planner → worker → reviewer loop iterates, and a human approves anything consequential before it ships. Then the answer streams to your screen, token by token.
agent orchestration · human-in-the-loop
Same shape as search — retrieve, then produce — with one genuinely new step: stage 6 generates new text, where search only ever ranks pages that already exist.
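Stages 2 and 6 of the lifecycle above are small enough to sketch directly. Both halves are deliberate simplifications. The tokenizer uses greedy longest-match against an invented vocabulary, where real BPE applies learned merge rules. The generator looks only at the previous token in a hand-written probability table, where a real model conditions on the whole context. `VOCAB` and `NEXT` are made up for illustration.

```python
VOCAB = {"under", "stand", "standing", "ing", "un", "search"}  # invented vocabulary

def tokenize(word, vocab=VOCAB):
    """Greedy longest-match subword split (a stand-in for real BPE)."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest piece first
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"no token covers {word[i]!r}")
    return tokens

# Invented next-token probabilities, keyed by the previous token only.
NEXT = {
    "<start>": {"search": 0.6, "the": 0.4},
    "search": {"is": 0.9, "<end>": 0.1},
    "is": {"retrieval": 0.7, "ranking": 0.3},
    "retrieval": {"<end>": 1.0},
    "ranking": {"<end>": 1.0},
    "the": {"<end>": 1.0},
}

def generate(max_tokens=10):
    """Autoregressive loop: predict, pick the likeliest token, append, repeat."""
    out, current = [], "<start>"
    for _ in range(max_tokens):
        candidates = NEXT[current]
        current = max(candidates, key=candidates.get)  # greedy choice
        if current == "<end>":
            break
        out.append(current)
    return out

pieces = tokenize("understanding")
sentence = generate()
```

The `max_tokens` cap matters even in a toy: a generation loop needs a stopping rule other than the model deciding it is finished.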
↳ see also · Article 06 — The Build and the Store — the polyglot data layer behind that RAG step: vector, document, streaming, kept fresh.
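Stage 4 of the lifecycle can be sketched as retrieve-then-inject. The example assumes a three-sentence invented knowledge base and uses bag-of-words counts as a stand-in for a learned embedding model. The function names and the prompt template are hypothetical.

```python
import math
from collections import Counter

KNOWLEDGE_BASE = [
    "Refunds are processed within five business days.",
    "API keys must be rotated every ninety days.",
    "Support is available on weekdays from nine to five.",
]

def embed(text):
    """Stand-in embedding: word counts. Real systems use a learned model."""
    words = text.lower().replace(".", "").replace("?", "").split()
    return Counter(words)

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(question, k=1):
    """Return the k knowledge-base entries most similar to the question."""
    q = embed(question)
    ranked = sorted(KNOWLEDGE_BASE,
                    key=lambda doc: cosine(q, embed(doc)),
                    reverse=True)
    return ranked[:k]

def build_prompt(question, k=1):
    """Inject the retrieved text into the context the model will see."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

question = "How often must API keys be rotated?"
prompt = build_prompt(question)
```

The model itself never appears in this code. RAG is a step that runs before generation and changes what the prompt contains, which makes it debuggable with ordinary search tooling.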

the model

Three Roles. Not Two.

Most teams structure AI deployment around two roles: the AI system and the human using it. That model fails in production at a predictable rate. The failure point is almost always the same: no one defined who reviews what before it mattered. By the time it matters, the cost is already paid.

There are three roles, and all three apply whether you are deploying a single Copilot instance or orchestrating ten autonomous coding agents across a sprint.

// the three-role model · mandatory in production
Role 01 · End Users
Initiate intent and evaluate final output. They describe what they need. They approve or reject results. They do not write specifications for agents. They do not debug agent chains. Their only job is to be clear about desired outcomes — and to evaluate whether the output actually meets them.
Role 02 · Agents (three sub-roles)
Execute the work. Three sub-roles within any non-trivial agent system:
Planner — Receives user intent and decomposes it into a sequence of steps with defined inputs and outputs. This is where most agent failures originate: if the plan is underspecified, every downstream step amplifies the error.
Worker — Executes individual steps. Calls tools, retrieves data, generates outputs. Operates inside a bounded context. Does not make architectural decisions.
Reviewer (agent-side) — Checks the worker's output against the step's success criteria before handing off. This is not the human reviewer — it is a pre-check, not an approval gate. It reduces noise reaching humans, not risk.
Role 03 · Human Reviewers
The mandatory oversight layer. They approve, reject, or correct agent output at defined checkpoints. In a production system, a human reviewer must be in the loop at every point where the agent's action has external consequences — write operations, communication, financial transactions, deployments. The agent reviewer sub-role can be automated. The human reviewer cannot be replaced. Any architecture that removes human review from consequential actions is a prototype, not a system.
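The three roles can be wired together in a few functions. This is a sketch under loud assumptions: the plan is hard-coded, the worker returns canned text instead of calling a model, and the human reviewer is represented by a callback named `human_approves`. All of these are hypothetical names.

```python
def planner(intent):
    """Decompose intent into steps with explicit success criteria (hard-coded)."""
    return [
        {"task": "draft summary", "must_contain": "summary"},
        {"task": "draft changelog", "must_contain": "changelog"},
    ]

def worker(step):
    """Execute one step. A real worker would call a model or tools."""
    return f"{step['must_contain']} text for: {step['task']}"

def agent_reviewer(step, output):
    """Pre-check only: filters obvious misses and approves nothing."""
    return step["must_contain"] in output

def run(intent, human_approves):
    """Planner, worker, agent pre-check, then the human gate for each step."""
    results = []
    for step in planner(intent):
        output = worker(step)
        if not agent_reviewer(step, output):
            results.append((step["task"], "rejected by pre-check"))
            continue
        status = "approved" if human_approves(step, output) else "rejected by human"
        results.append((step["task"], status))
    return results

# The lambda stands in for a person making the final call.
outcome = run("prepare release notes",
              human_approves=lambda step, out: "changelog" in out)
```

In this toy run the first step passes the agent pre-check and is still rejected by the human. The pre-check reduces noise, and the approval decision stays outside the agent.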

boundaries

Domain-Driven Design and Agent Scope

The question that kills most agent deployments is not "which model should I use?" It is: "what exactly is this agent supposed to do?" Vague task assignment produces vague output. An agent asked to "improve the onboarding experience" has no way to succeed — the task has no defined input, no bounded context, no measurable completion criteria, and no clear failure mode. It will do something. That something will not be what you meant.

Domain-Driven Design provides the natural unit of agent work: the bounded context. A bounded context defines a domain area with its own language, its own data, and its own rules. It has explicit inputs and outputs. It has ownership. It has defined edge cases. These are exactly the properties an agent needs to operate reliably.

// bounded context test — is this task agent-ready?

If the task can be described in one sentence with clear inputs, outputs, and a success condition → it is agent-ready.

If completing the task requires crossing multiple domain boundaries → it needs human orchestration before agents can handle any part of it.

If the failure mode cannot be defined in advance → it is not ready for agent execution, regardless of how sophisticated the model is.

If the agent's output would require organisational context an LLM does not have (cultural norms, relationship history, unwritten constraints) → a human must be in the decision path.
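The four checks above translate into a small readiness function. This is one possible encoding, not a standard: the `Task` fields are hypothetical, and the "one sentence" criterion is approximated by requiring inputs, outputs, and a success condition to be filled in.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    description: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)
    success_condition: str = ""
    failure_mode: str = ""
    domains: list = field(default_factory=list)
    needs_org_context: bool = False

def agent_readiness(task):
    """Apply the four checks in order and return (ready, reasons)."""
    reasons = []
    if not (task.inputs and task.outputs and task.success_condition):
        reasons.append("missing inputs, outputs, or success condition")
    if len(task.domains) > 1:
        reasons.append("crosses domain boundaries: needs human orchestration")
    if not task.failure_mode:
        reasons.append("failure mode not defined in advance")
    if task.needs_org_context:
        reasons.append("requires organisational context: keep a human in the decision path")
    return (not reasons, reasons)

vague = Task("improve the onboarding experience")
scoped = Task(
    "generate the monthly invoice PDF for one customer",
    inputs=["customer_id", "month"],
    outputs=["invoice.pdf"],
    success_condition="PDF totals match the ledger",
    failure_mode="ledger mismatch or unknown customer",
    domains=["billing"],
)
vague_result = agent_readiness(vague)
scoped_result = agent_readiness(scoped)
```

The vague task fails two checks before any model is involved. Filling in the `Task` fields is the real work, and the function only makes the gaps visible.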

DDD was designed to manage complexity in large software systems by keeping domain concerns separate and explicit. The complexity that DDD manages is the same complexity that defeats autonomous agents. A billing agent that accidentally touches user authentication data because the bounded context was not defined does not produce a billing bug. It produces an incident. The solution is the same in both cases: draw the boundary first.

requirements

What Agents Actually Need

The following is not a list of nice-to-haves. These are the minimum requirements for an AI agent operating in a real system. If any of these are absent, the deployment will eventually fail — the only variable is when, and what the cost is.

  1. Bounded Context
    The agent's scope must be defined before deployment. What domain does it operate in? What data can it access? What actions can it take? What is explicitly out of scope? Without this, the agent is free to interpret the task — which means it will, incorrectly, in ways you will not predict.
  2. Defined Output Format
    Agents do not improvise presentation. The expected output — structured JSON, a markdown document, a file change, an API call — must be specified. Ambiguous output requirements produce outputs that are technically correct and practically useless.
  3. Least-Privilege Access
    The agent should have access only to what it needs for the assigned task. Over-permissioned agents are not just a security risk — they are a reliability risk. Access to more data means more surface area for hallucination and unintended side effects.
  4. Rollback or Dry-Run Capability
    Any agent that writes to production systems must have a mechanism to preview or undo its actions. A dry run is not optional engineering polish. It is the difference between a recoverable mistake and an incident. No exceptions for agents touching live data.
  5. Human Review Gate
    At every consequential action — write operations, communications, deployments, financial transactions — a human reviewer must be in the loop. Define the review checkpoints before deployment, not after the first incident forces you to.
  6. Cost Awareness Per Run
    Every agent call runs on a meter. Multi-step agents multiply token costs across every tool call, every retrieval, every intermediate generation. Know the expected cost per run before deployment. Set a ceiling. Monitor it. Agents with unbounded loops and no cost cap are how $500/month tools become $50,000 incidents.
  7. Explicit Failure Definition
    "Done" must be defined. So must "failed." An agent without a clear failure state will run indefinitely, generating costs, producing output, and reporting success with equal confidence regardless of what it actually produced. Define the exit conditions. Both of them.

the other list

What Not to Delegate to Agents

The items on this list are not AI limitations. They are operational realities that apply to any autonomous system. Model capability is rarely the limiting factor. Whether the task structure supports reliable autonomous execution almost always is.

// do not delegate these to agents
  • Anything without a definition of done. If you cannot describe success in advance, the agent cannot reach it. It will produce output — it will not produce the right output.
  • Decisions requiring organisational context an LLM does not have. Culture, relationship history, political dynamics, unwritten constraints — these are not in the training data. They cannot be retrieved. They must be held by a human in the loop.
  • Actions that compound on failure. Database migrations, bulk record updates, mass communications, production deployments without a review gate. One step in the wrong direction multiplied across ten thousand rows is not a bug. It is a crisis.
  • Any task where the only reviewer is the agent itself. Agent-side self-review is a pre-check. It reduces noise. It does not replace external review. Self-certified output reaching production without human approval is a process failure, not a model failure.
  • Anything you have never done manually and fully understood. Agents accelerate processes you already control. They do not substitute for understanding a process you have never owned.
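For the "actions that compound on failure" item in the list above, one common safeguard is to separate planning a bulk change from applying it. The sketch below uses invented in-memory rows and a hypothetical `approved_by` parameter as the review gate. A real system would add transactions and an audit trail.

```python
def plan_bulk_update(rows, new_status):
    """Dry run: list the changes that would be made, without making them."""
    return [(row["id"], row["status"], new_status)
            for row in rows if row["status"] != new_status]

def apply_bulk_update(rows, new_status, approved_by=None):
    """Apply the change only after a named human has approved the plan."""
    plan = plan_bulk_update(rows, new_status)
    if approved_by is None:
        # No approval yet: report what would happen and change nothing.
        return {"applied": 0, "pending": plan}
    for row in rows:
        row["status"] = new_status
    return {"applied": len(plan), "approved_by": approved_by}

rows = [
    {"id": 1, "status": "active"},
    {"id": 2, "status": "archived"},
    {"id": 3, "status": "active"},
]
preview = apply_bulk_update(rows, "archived")
statuses_after_preview = [row["status"] for row in rows]
applied = apply_bulk_update(rows, "archived", approved_by="reviewer@example.com")
```

The preview returns the exact rows that would change, which gives the human reviewer something concrete to approve. The default path is the safe one: omitting the approval changes nothing.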

The most expensive AI failures in 2025–2026 share a pattern: autonomous agents with broad permissions, no human review gate, and unclear completion criteria. Some teams discovered this after burning through investor runway at rates that would have been unthinkable before AI tooling made it technically possible to run ten agents simultaneously. The model was not the problem. The requirements were absent.

The next articles in this series move through the structural forces reshaping software development in 2026 — not as a trend report, but as a practitioner's map of where AI is genuinely useful, where it creates new risk, and what each phase of the SDLC actually needs from these tools. The foundation is here. The application is next.