kurt.news

Clean, fast AI news without the hype or doom.

Token Costs Are Scaling Faster Than Anyone Planned

AI model releases in November 2025 changed the economics of software development. Claude Opus 4.5, GPT-5.1, and Gemini 3 Pro each brought significantly better agentic capabilities. They also multiplied token consumption in ways most engineering budgets did not anticipate.

The Numbers

Per-developer AI token consumption rose 18.6x in nine months, according to Jellyfish, an engineering management platform. That is not a typo.
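To put the Jellyfish figure in more familiar terms, the short sketch below converts 18.6x over nine months into an implied monthly growth rate and doubling time. It assumes growth compounds at a constant monthly rate, which the source does not claim; the result works out to roughly 38% a month, or a doubling about every two months.

```python
import math

# Figure reported in the article (Jellyfish): 18.6x growth over nine months.
GROWTH_FACTOR = 18.6
MONTHS = 9

# Assumption (not from the source): growth compounds at a constant monthly rate.
monthly_factor = GROWTH_FACTOR ** (1 / MONTHS)
monthly_rate_pct = (monthly_factor - 1) * 100

# Months needed for consumption to double at that constant rate.
doubling_months = math.log(2) / math.log(monthly_factor)

print(f"Implied monthly growth: {monthly_rate_pct:.1f}%")
print(f"Implied doubling time: {doubling_months:.1f} months")
```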

Goldman Sachs projects global token usage will multiply 24x by 2030. That projection now looks conservative given the current trajectory.

For comparison: tracking token costs at enterprise scale is described as a trillions-of-rows-per-month data problem. Cloud cost tracking, which already strains most FinOps teams, runs at hundreds of millions of rows per month. That is a gap of roughly four orders of magnitude.

Productive, But at What Cost

Jellyfish found that engineers with the highest token usage were about twice as productive as their lower-usage peers. Those same engineers consumed 10x the tokens.

A 2x productivity gain from a 10x spend increase is a business case question, not a technical one. Different organizations will answer it differently.
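The arithmetic behind that business case can be sketched in a few lines. The 10x and 2x multiples come from the Jellyfish finding; the function names and the dollar figures in the usage example are hypothetical, chosen only to show how the answer flips depending on what tokens cost relative to what the output is worth.

```python
def tokens_per_unit_output(token_multiple: float, output_multiple: float) -> float:
    """Token intensity relative to a baseline of 1.0 (tokens used per unit of output)."""
    return token_multiple / output_multiple


def extra_spend_pays_off(
    baseline_token_cost: float,
    baseline_output_value: float,
    token_multiple: float = 10.0,   # from the article: heavy users consumed 10x the tokens
    output_multiple: float = 2.0,   # from the article: and were about twice as productive
) -> bool:
    """True if the added output value exceeds the added token cost."""
    extra_cost = baseline_token_cost * (token_multiple - 1)
    extra_value = baseline_output_value * (output_multiple - 1)
    return extra_value > extra_cost


# Each unit of output from a heavy user takes 5x the tokens of a baseline user.
print(tokens_per_unit_output(10.0, 2.0))

# Hypothetical monthly figures per developer, in dollars (not from the source).
print(extra_spend_pays_off(baseline_token_cost=200, baseline_output_value=15_000))
print(extra_spend_pays_off(baseline_token_cost=2_000, baseline_output_value=15_000))
```

With a small baseline token bill the extra spend is easily covered; with a bill ten times larger, the same productivity gain no longer pays for itself.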

Faros AI released a two-year study of 20,000 developers in April 2026. Output rose. So did bugs and rewrites. The relationship between token spend and code quality is not as clean as the productivity numbers suggest.

The Sub-Model Problem

There is a separate issue sitting underneath the cost curve: frontier model providers are routing queries to cheaper sub-models even when enterprise customers call the flagship model. Sonnet or Haiku gets served when Opus is billed. This shows up in enterprise Claude billing.

This is not a hypothetical concern. It is already happening. How widespread it is remains unclear, but the incentive structure for providers is obvious.
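One way a customer might measure how often this happens is to compare the model that served each call against the model that was billed. The sketch below assumes an organization already logs both values per call; the record fields and model labels are hypothetical, and the source does not say whether providers expose the served model reliably.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    """One logged API call. All field names are hypothetical."""
    requested_model: str
    served_model: str
    billed_model: str


def find_mismatches(records: list[CallRecord]) -> list[CallRecord]:
    """Return calls where the model that served the request differs from the one billed."""
    return [r for r in records if r.served_model != r.billed_model]


def mismatch_rate(records: list[CallRecord]) -> float:
    """Fraction of calls with a served/billed mismatch (0.0 for an empty log)."""
    if not records:
        return 0.0
    return len(find_mismatches(records)) / len(records)


# Placeholder log entries for illustration only, not real billing data.
log = [
    CallRecord("flagship", "flagship", "flagship"),
    CallRecord("flagship", "mid-tier", "flagship"),
    CallRecord("mid-tier", "mid-tier", "mid-tier"),
    CallRecord("flagship", "small", "flagship"),
]

print(f"Mismatch rate: {mismatch_rate(log):.0%}")
```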

Industry Responses

Factory, an AI agents startup focused on enterprises, launched a model router in response to the cost problem. The system automatically selects the optimal model for each task rather than defaulting to the most capable (and most expensive) option for everything.
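The general idea of a model router can be illustrated with a minimal sketch: pick the cheapest model that clears a task's capability bar. This is not Factory's implementation, which the source does not describe; the catalog, capability scores, and prices below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    capability: int                  # hypothetical score from 1 to 10
    cost_per_million_tokens: float   # hypothetical price in USD


# Hypothetical catalog; names and numbers are placeholders, not real pricing.
CATALOG = [
    ModelOption("small", capability=3, cost_per_million_tokens=1.0),
    ModelOption("medium", capability=6, cost_per_million_tokens=5.0),
    ModelOption("flagship", capability=9, cost_per_million_tokens=25.0),
]


def route(required_capability: int, catalog: list[ModelOption] = CATALOG) -> ModelOption:
    """Pick the cheapest model whose capability meets the requirement."""
    eligible = [m for m in catalog if m.capability >= required_capability]
    if not eligible:
        # Nothing clears the bar, so fall back to the most capable model available.
        return max(catalog, key=lambda m: m.capability)
    return min(eligible, key=lambda m: m.cost_per_million_tokens)


for need in (2, 5, 9):
    print(need, "->", route(need).name)
```

In practice the hard part is estimating the capability a task requires before running it; this sketch takes that estimate as a given input.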

At the standards level, the Linux Foundation unveiled plans for the Tokenomics Foundation, a new body focused on AI token usage definitions, billing standards, and metrics including cost-per-intelligence and tokens-per-watt. Formal launch is planned for July 2026.

Cost-per-intelligence is an interesting metric to standardize. It implies the industry is ready to acknowledge that not all tokens produce equivalent value. That is a more honest framing than the current one.

What Comes Next

The token bill is real and it is growing. The companies building tooling around cost management (Factory's router, the Tokenomics Foundation's standards) are betting that optimization becomes a primary concern rather than an afterthought. Given the numbers, that bet looks reasonable.

Whether better cost tracking actually changes model provider behavior on sub-model routing is a different question. Standards bodies set definitions. Enforcement is someone else's problem.

Source: TechCrunch