The Language Model, Visualised

The four moving parts of a language model, pulled apart and made playable. Three of them you already met on the search side. One is genuinely new.

The essay made the argument; the search companion let you play with the ancestors. Now the machine itself. A prompt enters as text and leaves as an answer, and these are the four core stages in between, each one a small interactive you can step through. Press play, watch the numbers move, and the “magic” quietly becomes arithmetic.

// you are in the companion · read the essay first: You Already Know AI – You Just Called It Search

// 01 · prompt → answer

The AI Lifecycle

Start with the whole journey. A prompt enters as text and leaves as an answer – here are the eight stages in between, the same shape as the search pipeline, with one genuinely new step. Press play and watch “explain attention simply” travel from your keyboard to a reply, then meet the four core parts below.

// interactive · step through prompt → answer

// 02 · text becomes numbers

Tokenisation

A model never sees letters. Your text is first broken into tokens, subword pieces drawn from a fixed vocabulary built with Byte-Pair Encoding, and each token is swapped for an integer ID. Common words stay whole; rarer ones split into reusable pieces. Watch a short sentence become the sequence of numbers the model actually reads.
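To make the split concrete, here is a minimal sketch of Byte-Pair Encoding in Python. It is not any production tokeniser: the tiny corpus, the merge count of 7, and the helper names (learn_merges, merge_pair, tokenise, build_vocab) are all assumptions chosen for illustration. Real vocabularies are learned from far more text and usually operate on bytes rather than characters.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_merges(words, num_merges):
    """Learn BPE merges: repeatedly fuse the most frequent adjacent pair."""
    corpus = Counter(tuple(w) for w in words)  # each word starts as characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the pair seen first
        merges.append(best)
        merged = Counter()
        for symbols, freq in corpus.items():
            merged[merge_pair(symbols, best)] += freq
        corpus = merged
    return merges


def tokenise(word, merges):
    """Split a word into characters, then apply the merges in learned order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


def build_vocab(words, merges):
    """Fixed vocabulary: single characters first, then each merged piece."""
    chars = sorted({ch for w in words for ch in w})
    pieces = [a + b for a, b in merges]
    return {tok: i for i, tok in enumerate(dict.fromkeys(chars + pieces))}


# Toy corpus (an assumption): "search" is common, "searching" and "ranking" rarer.
corpus = ["search"] * 5 + ["searching"] * 2 + ["ranking"] * 2
merges = learn_merges(corpus, num_merges=7)
vocab = build_vocab(corpus, merges)

tokens = tokenise("searching", merges)
ids = [vocab[t] for t in tokens]
print(tokens, ids)  # ['search', 'ing'] [14, 16]
```

With this toy corpus the frequent word search ends up as a single token, while searching splits into search plus the reusable piece ing, which ranking also shares.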

// interactive · step through a BPE split

// 03 · meaning becomes geometry

Embeddings

Each token ID is looked up in an embedding table and becomes a vector, a list of numbers that places it in space, where nearness means similarity. king lands near queen; the relationship between them is a direction you can do arithmetic with. It is the same kind of vector space that powers semantic search and the retrieval step in RAG.
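The lookup and the word-math can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the five-word vocabulary, the hand-written 4-dimensional vectors, and the helper names (embed, cosine, nearest). Real embedding tables are learned during training and have hundreds or thousands of dimensions.

```python
import math

# Hypothetical vocabulary: token -> integer ID.
VOCAB = {"king": 0, "queen": 1, "man": 2, "woman": 3, "apple": 4}

# Hypothetical embedding table: row i is the vector for token ID i.
# The hand-picked dimensions loosely mean (royal, masculine, feminine, fruit).
TABLE = [
    [0.9, 0.8, 0.1, 0.0],  # king
    [0.9, 0.1, 0.8, 0.0],  # queen
    [0.1, 0.9, 0.1, 0.0],  # man
    [0.1, 0.1, 0.9, 0.0],  # woman
    [0.0, 0.1, 0.1, 0.9],  # apple
]


def embed(word):
    """Token -> ID -> row of the embedding table."""
    return TABLE[VOCAB[word]]


def cosine(u, v):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(vector, exclude=()):
    """Return the vocabulary word whose vector is closest to `vector`."""
    candidates = [w for w in VOCAB if w not in exclude]
    return max(candidates, key=lambda w: cosine(vector, embed(w)))


# Nearness means similarity: king sits closer to queen than to apple.
print(cosine(embed("king"), embed("queen")) > cosine(embed("king"), embed("apple")))

# Word-math: king - man + woman points at queen.
target = [k - m + w for k, m, w in zip(embed("king"), embed("man"), embed("woman"))]
print(nearest(target, exclude={"king", "man", "woman"}))  # queen
```

Semantic search uses the same nearest-neighbour step, only with a query vector in place of the word-math target.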

// interactive · map words, do word-math

// 04 · context, computed

Attention

A word means nothing alone. Self-attention — the heart of the 2017 transformer — lets every token look across the whole sentence and decide which others to listen to. Watch bank resolve to a riverbank when it attends to river, and to a financial institution when the neighbours change. It is relevance scoring, the search engineer’s craft, turned inward on the sentence.
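A minimal sketch of scaled dot-product self-attention shows the bank example as arithmetic. It makes simplifying assumptions that a real transformer does not: the 2-dimensional vectors (nature, finance) are hand-written, and queries, keys and values are the raw embeddings rather than learned projections of them. The names EMB, softmax and self_attention are illustrative only.

```python
import math

# Hypothetical 2-d embeddings: (nature, finance). "bank" starts out ambiguous.
EMB = {
    "the": [0.1, 0.1],
    "river": [2.0, 0.0],
    "money": [0.0, 2.0],
    "bank": [1.0, 1.0],
}


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Every token scores every token, then takes a weighted mix of them.

    Simplification: query = key = value = the input vector itself.
    """
    d = len(vectors[0])
    all_weights, outputs = [], []
    for query in vectors:
        # Relevance score of each key for this query, scaled by sqrt(d).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in vectors
        ]
        weights = softmax(scores)
        # Contextual vector: attention-weighted sum of the value vectors.
        mixed = [
            sum(w * value[i] for w, value in zip(weights, vectors))
            for i in range(d)
        ]
        all_weights.append(weights)
        outputs.append(mixed)
    return all_weights, outputs


def bank_in(sentence):
    """Attention weights and contextual vector for the last word, 'bank'."""
    weights, outputs = self_attention([EMB[w] for w in sentence])
    return weights[-1], outputs[-1]


river_weights, river_bank = bank_in(["the", "river", "bank"])
money_weights, money_bank = bank_in(["the", "money", "bank"])

print(river_bank)  # nature component is larger than finance
print(money_bank)  # finance component is larger than nature
```

The same starting vector for bank comes out leaning towards nature beside river and towards finance beside money, because the neighbour it attends to is mixed into its output.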

// interactive · watch the weights resolve a word

// 05 · writing, one piece at a time

Next-Token Generation

Finally the model writes. It produces a probability distribution over every token in its vocabulary, picks one, appends it, and runs again. It is an autoregressive loop, a very large autocomplete. This is the one stage with no search twin: a search engine ranks pages that already exist; this loop generates text that did not.
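The loop itself is short enough to sketch in Python. The "model" below is a stand-in, not a transformer: LOGITS is a hypothetical hand-written table whose scores depend only on the previous token, and the names next_token_probs and generate are assumptions for illustration. A real model would compute its scores from the whole sequence so far.

```python
import math
import random

# Hypothetical toy "model": next-token scores keyed by the last token only.
LOGITS = {
    "<s>": {"attention": 2.0, "search": 1.0},
    "attention": {"is": 3.0, "</s>": 0.5},
    "is": {"relevance": 2.0, "search": 1.5},
    "relevance": {"scoring": 3.0, "</s>": 0.5},
    "scoring": {"</s>": 3.0},
    "search": {"</s>": 2.0, "is": 1.0},
}


def next_token_probs(tokens, temperature=1.0):
    """Softmax over the scores for the next token (temperature must be > 0)."""
    logits = LOGITS[tokens[-1]]
    top = max(logits.values())
    exps = {t: math.exp((s - top) / temperature) for t, s in logits.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}


def generate(prompt, max_new_tokens=10, greedy=True, temperature=1.0, seed=0):
    """Predict, pick, append, repeat until the end token or the length cap."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens, temperature)
        if greedy:
            choice = max(probs, key=probs.get)  # always the likeliest token
        else:
            # Sample in proportion to probability for more varied text.
            choice = rng.choices(list(probs), weights=list(probs.values()))[0]
        if choice == "</s>":  # the end-of-sequence token stops the loop
            break
        tokens.append(choice)
    return tokens


greedy_text = generate(["<s>"])
sampled_text = generate(["<s>"], greedy=False, temperature=1.5)
print(greedy_text)  # ['<s>', 'attention', 'is', 'relevance', 'scoring']
```

Greedy decoding always takes the most likely token, so it returns the same text every time; sampling trades that determinism for variety, which is why the sampled run can differ.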

// interactive · predict, decode, append, repeat

// the point

Same Machine, One New Trick

Tokenise, embed, attend — three stages, three direct descendants of search: query tokenisation, dense vector retrieval, relevance ranking. The fourth, generation, is the genuinely new step the 2017 architecture unlocked. That is the whole lineage in four playable parts: mostly the search engine you already understood, plus one thing it could never do. The vocabulary changed. The instincts did not.