The Library Is the Work

Every team I have watched struggle with an AI agent reached for the same fix first: a bigger model, a newer one, a cleverer prompt. Almost nobody reached for the thing that was actually broken, which was the knowledge the agent was standing on. I spent the better part of two decades building payment systems where the single most important rule in the whole platform often lived inside one person’s head. So when Google published a small specification this year, I recognised the problem it was quietly trying to solve. It was my problem. It has been my problem for years.

I have onboarded onto enough payment platforms to know how this goes. You join a team. You are handed a repository, a wiki, and a few years of chat history. Somewhere in that pile is the rule that actually matters: how a settlement is reconciled, why a particular timeout is set to exactly that value, which edge case in a refund flow will quietly cost the company real money if you get it wrong. That rule is almost never written down in one place. It is half in a wiki page last edited two years ago, half in a code comment, and the rest inside the head of the one engineer who happens to be on leave the week you need it.

For human engineers, we made an uneasy peace with this. We call it onboarding, and we accept that it takes months. But we are now handing the same mess to AI agents and expecting them to perform on day one. An agent is the most literal new hire you will ever have. It cannot lean over and ask the senior engineer at the next desk. It cannot read the silence in a stale runbook and know to be suspicious of it. It knows only what you put in front of it, and most of what it needs is scattered across systems that were never designed to talk to each other.

the bottleneck

The bottleneck was never the model

There is an industry reflex right now to treat every agent failure as a model failure. The agent gave a wrong answer, so we reach for a stronger model, as though raw intelligence were the missing ingredient. It usually is not. The models are extraordinary, and they keep getting stronger, yet they stall on the same unglamorous thing: context. Ask an agent how to compute weekly active users from your event stream and it does not need more reasoning power. It needs your table schemas, your metric definitions, and the one join path your data team agreed on three years ago. None of that is a question of intelligence. It is a question of whether the knowledge exists somewhere an agent can actually reach.

I learned a version of this the expensive way, long before agents existed. In Germany I once cut a cloud bill by around seventy percent, and people assumed it was a feat of optimisation. It was nothing of the sort. The waste had been sitting in plain view for a long time. What had been missing was an environment where the knowledge of why things were built a certain way could surface and be questioned out loud. The problem was never compute. It was that no one could see the whole picture in one place. Agents fail for that same reason, only faster and more confidently.

A system that merely works is not finished. A well-crafted one can be explained, by a person, and now, by a machine.

retrieval

What RAG actually does, and where it strains

For the last few years our answer to this has been retrieval-augmented generation. The idea is sound and I am not here to bury it. Instead of cramming everything into the prompt, you store your documents as embeddings, and when the agent needs something, you fetch the chunks most similar to the question and hand them over. For a great many problems, that is enough. RAG earned its place honestly.

But run it in production long enough and the strain shows. RAG retrieves; it does not curate. It will faithfully return the three most similar paragraphs from a document that has been wrong for a year, with no idea that it is wrong. It re-reads the same scattered sources on every single query, re-deriving an answer that should have been settled once and written down properly. And every team builds the same retrieval plumbing from scratch, against its own silos, in its own incompatible shape. You end up with real intelligence sitting on top of a swamp. The retrieval works. The ground underneath it was never tended.

the spec

What Google actually shipped

In June 2026, Google Cloud published the Open Knowledge Format, or OKF, a vendor-neutral specification for representing the knowledge that AI systems need. The first thing worth understanding is what it is not. It is not a service, a platform, or a runtime. There is no SDK you are required to adopt and no infrastructure you have to stand up. It is, almost stubbornly, just files.

Concept

A unit of knowledge is a file

Each concept is a single markdown document. One idea, one file: a metric definition, a runbook, the meaning behind a table. Written by a person or generated by an agent, and readable by both without anything special installed.

Bundle

A collection of those files is a folder

A directory of documents is a Knowledge Bundle, the unit you ship and share. It lives in a git repository next to your code, and you review changes to it in pull requests, the same way you review anything else that matters.

Graph

The structure is just links

Concepts point at one another with ordinary markdown links, forming a graph a human can click through and an agent can traverse. No proprietary knowledge-graph database. The web’s oldest idea, turned toward the things a model needs to know.

Frontmatter

It asks for almost nothing

In version 0.1 the only required field is type, declared in a small block of YAML at the top of each file. Title, description, tags, and a timestamp are optional. The spec demands very little, which is precisely why it has a chance of travelling between teams.

If you have ever kept an Obsidian vault, maintained an AGENTS.md file, or hand-curated a folder of notes for a model to read, you already know this shape. Andrej Karpathy described the underlying pattern earlier this year as an “LLM wiki,” and a fair number of us had quietly bet on it already. OKF’s real contribution is not the idea. It is the agreement: a wiki written by one team can now be read by a different team’s agent without anyone writing a translator in between.

the shift

From retrieval to a source of truth

This is where it stopped being a Google announcement to me and became something I had been arguing for my entire career. I have always been a little difficult about the idea of a single source of truth. Everything I write is published once, on my own site, and everything else I post points back to it. In architecture I think in bounded contexts and contracts: a domain’s knowledge written down deliberately, owned, versioned, and not quietly re-derived by whoever happens to need it next.

OKF applies that same discipline to what machines read. RAG asks, every single time, “what is the most similar text I can find?” OKF asks a better question first: “what is the knowledge here actually worth keeping, and have we written it down well enough to hand to anyone?” You can still retrieve over an OKF bundle. Nothing about the two ideas is at war. The difference is what you are retrieving from. One approach treats your knowledge as a swamp to be dredged on every query. The other treats it as a library worth maintaining.

// two different jobs

RAG

Embeds and retrieves whatever happens to exist
Re-reads scattered sources on every query
Each team rebuilds its own retrieval plumbing
Will return an answer that is confidently out of date

OKF

Curates what is actually worth keeping
Written once, then maintained like source code
A shared format another team’s agent can read
Changes are reviewed in pull requests, like everything else

Not a replacement. A layer underneath.

sharing

Agents that share what they know

The part that should interest anyone running more than one agent is what happens between them. OKF is built so that an organisation’s knowledge can be shared across agents, tools, and teams. One agent enriches the bundle, writing new knowledge into it, and another agent reads and traverses that same bundle later. The work the first agent did becomes knowledge the next one inherits, instead of evaporating the moment its session closes.

That is a quiet but real shift. Most agents today are amnesiacs. They wake up, do a task, and forget everything they learned the instant the conversation ends. A shared, written knowledge layer is how a fleet of agents begins to behave less like a crowd of strangers and more like a team that has worked together for years. Which is, not at all by coincidence, the exact thing I have spent my career trying to build out of human beings.

the honest part

The honest part

I would not be doing you any favours if I let the enthusiasm run unchecked. This is version 0.1, and it shows its youth in the right way. What it standardises today is mostly structural: a folder layout, a file format, a single required field. The harder problem is semantic interoperability, getting different organisations to agree on what the knowledge actually means, and that is largely left to producers and to conventions that have not been written yet. One thoughtful write-up asked whether OKF is a real standard or just a folder. It is a fair question. A folder is also how most good standards quietly begin.

A few things it is plainly not, because the confusion is already circulating. OKF is not a search-ranking signal; adding these files will not move your Google rankings. It is not a public web file like llms.txt that you publish at a known URL for crawlers to find, and it does not replace your schema.org structured data. It is an internal knowledge bundle that your own agents read. And whether it becomes a shared language at all depends on the one thing Google cannot decree: producers outside Google choosing to adopt it.

closing

The work underneath

I have rebuilt enough systems to distrust anything that promises to fix intelligence. OKF makes no such promise, and it is more useful for its modesty. What it says is simple: write your knowledge down properly, in a form a human and a machine can both read, keep it where you keep your code, and let your agents stand on it. It is the same instinct behind keeping the oldest things in a system carefully tended rather than left to rot in a forgotten table.

The teams that do well in this next stretch will not be the ones with the largest model. They will be the ones who finally treated their knowledge as something worth tending, with an owner, a history, and a single place it lives. I have believed that about my own work for eighteen years. It is a strange and quietly satisfying thing to watch the machines arrive at the same conclusion.

Give a brilliant model a swamp and you get confident nonsense.
Give an ordinary one a well-kept library and you get something you can trust.

The model was never the hard part. The library is the work.