The previous article ended with a deliberate omission: it told you what AI costs and said nothing about where the capability curve goes next. This article answers that question in the only honest way I know. Not with my predictions. With the work of people who publish theirs in advance, attach probabilities, and then grade themselves in public when reality arrives.
- How to read an AI forecast without being misled: medians are not promises, modal years are not medians, and public self-correction is a feature – not an embarrassment.
- What the three most credible signals say: METR's doubling time-horizon metric, the AI Futures Project's self-graded scenario work (reality ran at ~65% of their predicted pace), and the labs' own operational numbers.
- The practical posture: five no-regret moves that make sense whether the median lands in 2030, 2035, or never.
Fable 5 was released on June 9. I have used it for two days. For the first time in my career, I feel that something is more intelligent than I am in my own domain – software development and cloud architecture. Not faster at typing. Not better at recall. More intelligent, in the part of the work I considered mine: the architecture call, the trade-off, the failure mode I would have caught.
I am not alone in this reading. Stripe described it compressing "months of engineering into days" on a codebase migration. GitHub called the agentic coding results "the strongest of any Claude model we've had the opportunity to test." I cite those not as marketing but because they match what I watched happen on my own screen, in my own codebase, this week.
That moment changes how you read forecasts. They stop being entertainment. So the question becomes urgent and practical: whose forecasts deserve your attention, and what do the credible ones actually say?
How to Read a Forecast Without Being Misled
Most AI timeline discourse fails before it starts, because most readers – and most journalists – read forecasts wrong. Three distinctions do most of the work:
A median is not a promise. When a forecaster says "median 2030," they are saying: half my probability mass is before this date, half after. It is a statement about a distribution, not an appointment. The AI Futures Project – the team behind the AI-2027 scenario – wrote an entire clarification post correcting coverage that read their scenario as a confident claim that AGI arrives in 2027. Their words: they "certainly cannot confidently predict a specific year."
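A minimal Python sketch makes the distinction concrete. It uses an invented year-by-year distribution that belongs to no real forecaster, and shows that the modal year, the median year, and the probability that a milestone has arrived by a given date are three different numbers. The function names are illustrative, and 2036 is a hypothetical catch-all bucket for "2036 or later, including never".

```python
# Invented distribution: probability that the milestone first arrives in each year.
# 2036 is a catch-all bucket for "2036 or later, including never".
FORECAST = {
    2027: 0.20, 2028: 0.14, 2029: 0.12, 2030: 0.10, 2031: 0.08,
    2032: 0.07, 2033: 0.06, 2034: 0.05, 2035: 0.04, 2036: 0.14,
}

def modal_year(dist):
    """Single most likely year."""
    return max(dist, key=dist.get)

def median_year(dist):
    """First year by which cumulative probability reaches 50%."""
    cumulative = 0.0
    for year in sorted(dist):
        cumulative += dist[year]
        if cumulative >= 0.5:
            return year
    raise ValueError("probabilities sum to less than 0.5")

def prob_by(dist, year):
    """Probability that the milestone has arrived by the end of `year`."""
    return sum(p for y, p in dist.items() if y <= year)

mode = modal_year(FORECAST)
print("mode:  ", mode)
print("median:", median_year(FORECAST))
print(f"P(arrived by end of {mode}): {prob_by(FORECAST, mode):.2f}")
```

In this toy distribution the most likely single year is 2027, yet the chance of arrival by the end of 2027 is only 20%, and the median lands three years later. Reading a modal year as a median, or a median as an appointment, turns a distribution into a headline.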
Updating in public is the credential. The strongest signal that a forecaster deserves attention is not that they were right – it is that they show you their corrections. Watch how the medians moved as evidence arrived:
Read that strip carefully, because it contains the whole epistemic lesson. One forecaster moved his median three years later as evidence came in slower than his scenario projected. Another moved his twenty-five years earlier. Both publish the moves, with reasons. Compare that with the people in your feed who alternate between announcing AGI next year and a permanent AI winter without ever scoring a past claim.
The Forecasters Who Grade Their Own Work
In late 2025, the AI Futures Project did the thing almost nobody in this field does: they went back through the AI-2027 scenario's checkable 2025 predictions and graded themselves in public. The headline result: reality ran at roughly 65% of their predicted pace on quantitative metrics. Directionally right, consistently too fast.
| Prediction (for 2025) | What happened | Verdict |
|---|---|---|
| SWE-bench ~85% by mid-2025 | Best actual score: 74.5% | Slower than predicted |
| OpenAI revenue ~$18B annualised | ~$20B annualised | Slightly ahead |
| Frontier lead of 3–9 months | Top US labs separated by 0–2 months | Closer race than modelled |
| Agents adopted into workflows | Happened – adoption real but narrower than scenario | Directionally right |
| AI R&D self-acceleration uplift | Materialised slower than modelled | Slower than predicted |
Their method, in their own words: "Make a detailed, concrete trajectory… wait a while… check if things are roughly on track… adjust your guess about how the future will go, to be correspondingly faster or slower." That is not prophecy; it is engineering discipline applied to the future. The same discipline makes their separate forecaster survey worth knowing: the 2025 aggregate of 413 forecasters was "about right on benchmarks," underestimated revenue, and overestimated public salience. Capability and money moved faster than attention. Keep that asymmetry in mind; it describes 2026 as well.
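That check-and-adjust loop can be written down as arithmetic. The sketch below assumes the simplest possible version: a single pace ratio, measured as observed progress divided by predicted progress from a shared baseline, then applied uniformly to stretch or compress a scenario's later dates. The benchmark values, dates, and function names are hypothetical, and this is not a reconstruction of the AI Futures Project's own adjustment.

```python
def pace_ratio(baseline, predicted, observed):
    """Observed progress as a fraction of predicted progress, from a shared baseline."""
    if predicted == baseline:
        raise ValueError("the prediction must differ from the baseline")
    return (observed - baseline) / (predicted - baseline)

def rescale_date(baseline_year, scenario_year, pace):
    """Naively stretch (pace < 1) or compress (pace > 1) a scenario date,
    assuming progress continues at a constant fraction `pace` of scenario speed."""
    if pace <= 0:
        raise ValueError("pace must be positive")
    return baseline_year + (scenario_year - baseline_year) / pace

# Hypothetical benchmark: 50% at baseline, 90% predicted, 76% observed.
pace = pace_ratio(baseline=50.0, predicted=90.0, observed=76.0)
print(f"pace: {pace:.2f}")  # 0.65
# A milestone the scenario places two years after a 2025.0 baseline.
print(f"scenario 2027.0 -> adjusted {rescale_date(2025.0, 2027.0, pace):.1f}")  # 2028.1
```

A pace of 0.65 stretches a two-year scenario gap to a little over three years. The constant-ratio assumption is the weakest part of the sketch: real progress can speed up or stall, which is why the method re-checks on a schedule instead of extrapolating once.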
The AI-2027 scenario itself – the team's detailed, quarter-by-quarter narrative – sketches where the curve points if it does not bend. Its early-2027 frame:
Hold both numbers at once: the scenario's scale, and the measured 65% pace correction. A vivid future, arriving slower than written – that is the honest summary of the best scenario work in the field.
METR: The One Metric Worth Tracking Quarterly
If you track a single capability number, make it METR's time horizon: the length of task – measured in expert-human completion time – that an AI agent can finish autonomously at a given reliability. A 50% horizon of one hour means the model succeeds half the time at tasks that take a skilled human an hour. It is the closest thing the field has to a speedometer, because it measures the thing that actually matters for work: how long you can leave it alone.
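The horizon itself comes out of a curve fit. As METR describes its method, it fits a logistic curve of agent success against the logarithm of human task time and reads off the task length at which predicted success crosses the chosen reliability. The sketch below is a simplified, dependency-free version under stated assumptions: per-bucket success rates instead of individual runs, plain gradient ascent instead of a statistics library, invented data, and hypothetical function names (`fit_logistic`, `time_horizon`).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, steps=5000):
    """Fit p = sigmoid(a + b * (x - x_mean)) by gradient ascent on the
    Bernoulli log-likelihood; ys may be success rates in [0, 1]."""
    n = len(xs)
    x_mean = sum(xs) / n
    xc = [x - x_mean for x in xs]  # centring keeps gradient ascent well conditioned
    a = b = 0.0
    for _ in range(steps):
        residuals = [y - sigmoid(a + b * x) for x, y in zip(xc, ys)]
        a += lr * sum(residuals) / n
        b += lr * sum(r * x for r, x in zip(residuals, xc)) / n
    return a, b, x_mean

def time_horizon(a, b, x_mean, reliability):
    """Human task length (minutes) at which predicted success equals `reliability`."""
    logit = math.log(reliability / (1 - reliability))
    return 2 ** (x_mean + (logit - a) / b)

# Invented results: success rate of one agent on task buckets by human time.
minutes = [4, 16, 64, 256, 1024]
success_rate = [0.98, 0.88, 0.50, 0.12, 0.02]
xs = [math.log2(m) for m in minutes]

a, b, x_mean = fit_logistic(xs, success_rate)
h50 = time_horizon(a, b, x_mean, 0.5)
h80 = time_horizon(a, b, x_mean, 0.8)
print(f"50% horizon: {h50:.0f} min")  # 64
print(f"80% horizon: {h80:.0f} min")
```

With these symmetric made-up rates the 50% horizon lands at 64 minutes, while the 80% horizon comes out well under half of that. That gap is why a 50% horizon should not be read as the length of task you can safely delegate.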
Three facts about it. First, METR's 2025 research found the 50% horizon doubling roughly every seven months across six years of models – with the recent subset trending faster. Second, exponential curves fit this data better than linear or flattening ones; that is a measured result, not an assumption. Third – and this is the detail I find most telling – METR notes that measurements above 16 hours are currently unreliable, because their task suite was not built for tasks that long. The frontier models of mid-2026, Fable 5 among them, are outgrowing the ruler. The constraint on measuring AI capability is now the cost of hiring humans who can do week-long tasks for comparison.
If the doubling holds – an "if" that the 65% grading result tells you to treat with respect – tasks measured in days fall within the planning horizon of anyone reading this, and tasks measured in weeks follow on a schedule you can roughly compute yourself. That is the entire forecast, stated plainly. No date. A slope, and an honest error bar.
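The schedule really is something you can compute yourself. If the horizon doubles every d months, growing from the current horizon to a target takes d × log2(target / current) months. The sketch assumes, purely for illustration, a current 50% horizon of 8 hours and tries three doubling times; none of these inputs is a measured value, `months_until` is a hypothetical helper, and the output is an extrapolation, not a forecast.

```python
import math

def months_until(current_hours, target_hours, doubling_months):
    """Months for the horizon to grow from current to target, assuming a
    constant doubling time (a pure extrapolation of the trend)."""
    if current_hours <= 0 or target_hours <= 0:
        raise ValueError("horizons must be positive")
    return doubling_months * math.log2(target_hours / current_hours)

CURRENT_HOURS = 8  # hypothetical starting horizon, not a measured value
TARGETS = {"40-hour work week": 40, "160-hour work month": 160}

for doubling in (4, 7, 10):  # hypothetical doubling times, in months
    for label, hours in TARGETS.items():
        months = months_until(CURRENT_HOURS, hours, doubling)
        print(f"doubling every {doubling:>2} months: {label} in ~{months:.0f} months")
```

The spread across doubling times is the point: the same formula gives answers more than a year apart for the same target, which is the error bar the slope comes with.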
What the Builders Report from Inside
The third signal source is the labs themselves. They have an obvious incentive to talk the curve upward – so I weight their operational numbers, the ones describing their own engineering reality, above their predictions. Anthropic's published material on recursive self-improvement reports, as of May 2026:
The same page sketches three futures: trends stall (which they rate least likely), efficiency keeps compounding under human direction (their most probable near-term path), or AI systems begin building their successors. On the third, their language is deliberately uncomfortable: it "is not inevitable" but "could come sooner than most institutions are prepared for." Dario Amodei's January 2026 essay The Adolescence of Technology puts his own range for powerful AI – Nobel-level capability across fields, millions of parallel instances – at one to two years, while acknowledging the uncertainty. He pairs it with the prediction practitioners should sit with longest: serious displacement pressure on entry-level white-collar work within one to five years.
Discount all of this for incentive if you like – I do. But note what happens when you put the three sources side by side. The independent measurement (METR), the self-graded scenario team (AI Futures), and the most safety-vocal lab all describe the same shape: steep, compounding, and slower than the most vivid scenarios – but not by much, and not slowing to a stop.
Where the Credible Views Agree – and Where I'm Not Sure
The agreement zone is narrower than the headlines and wider than the scepticism. Every credible source I track expects continued rapid capability growth through at least 2027–2028, expects software engineering to stay the leading edge, and expects autonomous task length to keep extending. The disagreement is about timing and ceiling: medians for "better than humans at every cognitive task" run from 2030 (survey median, Kokotajlo) to 2035 (Lifland) – and a serious tail of probability extends well past both.
- Whether the doubling trend bends at long horizons. Week-long tasks involve context, judgment, and recovery-from-ambiguity that 16-hour tasks do not. The data cannot yet say; neither can I.
- Whether my Fable 5 moment generalises. "More intelligent than me in my domain" is one practitioner's reading after two days. It is a data point, not a measurement – I labelled it accordingly.
- Whether the chips-and-power constraints from Article 03 act as a brake the forecasts underweight. Capability curves assume the compute arrives. The HBM supply data says that assumption is at least worth a question mark.
- Whether economic diffusion keeps lagging capability. The 2025 forecasters overestimated public salience – society noticing less than expected is itself a forecast-relevant fact.
The Two Sources I Would Send Anyone To
Do not take the summary above on faith – that would defeat the entire point of this article. The two most credible starting points in AI forecasting are free, public, and written to be checked:
What to Actually Do With a Forecast
- Track the slope, not the headlines. Check METR's time horizons quarterly. One number, one trend line – it will tell you more than a year of launch-day coverage.
- Plan in ranges, not years. The credible medians span 2030–2035 with wide error bars. Any career or architecture decision that only works under one of those dates is a bet, not a plan.
- Keep filling the compounding column. Article 01's distinction holds under every scenario: fundamentals, judgment, and domain depth appreciate precisely because execution is getting cheap.
- Design processes for all three lab scenarios. The practices in Articles 06–10 – bounded contexts, review gates, evals, cost ceilings – are exactly the infrastructure that pays off whether trends stall, compound, or close the loop.
- Take the displacement forecast personally – and early. If entry-level cognitive work comes under pressure within one to five years, the move up the judgment ladder starts now, not at the median date.
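The list above amounts to a robustness test that can be checked mechanically: does a move pay off under every scenario, or only under one? The toy sketch below scores hypothetical moves against the three lab scenarios, keeps the moves that never lose, and ranks all of them by worst-case regret (the most you could give up versus the best move in hindsight). Only the scenario names come from the text; the moves, payoffs, and function names are invented, and worst-case regret is a standard robust-decision criterion swapped in here for illustration.

```python
SCENARIOS = ("trends stall", "efficiency compounds", "AI builds successors")

# Invented payoffs (arbitrary units), one per scenario, in SCENARIOS order.
MOVES = {
    "add evals and review gates": (1, 3, 4),
    "deepen domain judgment": (2, 2, 1),
    "bet the roadmap on a 2027 date": (-3, -1, 4),
    "ignore AI entirely": (1, -2, -4),
}

def no_regret_moves(moves):
    """Moves whose payoff is non-negative in every scenario."""
    return [name for name, payoffs in moves.items() if min(payoffs) >= 0]

def worst_case_regret(moves):
    """Per move: largest shortfall versus the best move in any single scenario."""
    best = [max(p[i] for p in moves.values()) for i in range(len(SCENARIOS))]
    return {
        name: max(best[i] - payoffs[i] for i in range(len(SCENARIOS)))
        for name, payoffs in moves.items()
    }

print("no-regret:", no_regret_moves(MOVES))
for name, regret in sorted(worst_case_regret(MOVES).items(), key=lambda kv: kv[1]):
    print(f"{regret:>2}  {name}")
```

The date-specific bet wins big in one scenario and carries the second-largest worst-case regret; the moves that hold up everywhere rise to the top without anyone having to pick a year.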
A forecast is not a promise about the future. It is a discipline for being less wrong about it. The people worth reading publish their predictions, show their corrections, and put error bars on their own conviction – everyone else is doing marketing with dates.