In early 2026, Uber’s CTO Praveen Neppalli Naga disclosed something that should have landed harder than it did: his engineering organisation had burned through its entire annual AI budget before the calendar reached May. The culprit was not a rogue data science project or an experimental LLM fine-tune. It was a coding assistant — specifically, Anthropic’s Claude Code — running at roughly $2,000 per engineer per month across a team large enough to make that number catastrophic at scale. This is the first instalment of an eight-part series on the 2026 AI Token Economy: what it costs, why costs escape, and how to govern them before your own budget evaporates.
## The Numbers Behind the Headline
Uber is not a small shop experimenting with AI on the margins. It runs one of the world’s most complex real-time logistics platforms, and its engineering headcount is measured in thousands. When the CTO says the AI budget is gone by May, that means the annualised per-seat cost of developer tooling alone was underestimated by a factor that most finance organisations do not have a framework to reason about.
The mechanics are straightforward. Claude Code operates on Anthropic’s API pricing model. At the time of Uber’s overrun, the published rates across Anthropic’s model tiers were approximately:
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Claude 3 Opus | $15.00 | $75.00 |
| Claude 3 Haiku | $0.25 | $1.25 |
A developer using Claude Code for active coding sessions generates far more tokens than most cost models anticipate. A single “refactor this module” instruction can expand into thousands of input tokens (the full file context, preceding conversation, system prompt) and hundreds of output tokens. Multiply by the frequency of interactions during an eight-hour workday, and the daily token load per engineer can reach 2–4 million tokens without any anomalous behaviour.
The baseline cost formula for any API-based AI tooling is:
\[C = T_{in} \cdot P_{in} + T_{out} \cdot P_{out}\]

Where:
- \(C\) is the total cost per session
- \(T_{in}\) is the number of input tokens
- \(T_{out}\) is the number of output tokens
- \(P_{in}\) and \(P_{out}\) are the respective per-token prices
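As a sketch, the formula maps directly to code. The prices below are the Claude 3.5 Sonnet rates from the table above, quoted per million tokens, so the sum is divided by 1,000,000; the example token counts are illustrative:

```python
# Per-million-token prices for Claude 3.5 Sonnet (from the table above).
SONNET_INPUT_PER_M = 3.00
SONNET_OUTPUT_PER_M = 15.00

def session_cost(input_tokens: int, output_tokens: int,
                 price_in: float = SONNET_INPUT_PER_M,
                 price_out: float = SONNET_OUTPUT_PER_M) -> float:
    """C = T_in * P_in + T_out * P_out, with prices quoted per 1M tokens."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# A single "refactor this module" interaction: roughly 8,000 input tokens
# of file context, conversation, and system prompt, plus ~500 output tokens.
print(round(session_cost(8_000, 500), 4))  # → 0.0315
```

Three cents per interaction sounds trivial; the rest of this post is about why it is not.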
At $2,000 per engineer per month, and assuming 20 working days, that is $100 per engineer per day. Using Sonnet pricing, $100 buys roughly 33 million input tokens, or 6.7 million output tokens, or — more realistically — a blend such as 25 million input and 1.7 million output. Those billed figures dwarf the 2–4 million tokens an engineer nominally generates because the tool re-sends the full accumulated context with every call, so an engineer who treats the assistant as a primary interface rather than an occasional tool can plausibly reach them.
## Why Budget Models Fail for AI Tooling
Traditional software licensing is a known quantity. A SaaS seat costs $X per month, negotiated annually, capitalised as a fixed OpEx line. AI API consumption breaks every assumption that model makes.
First, usage is elastic and unbounded. A GitHub Copilot seat has a flat fee regardless of how many completions fire. Claude Code on the API has no ceiling — the more an engineer relies on it, the more it costs, and the correlation between productivity and cost is positive rather than inverse. The tools that work best cost the most.
Second, context windows amplify spend non-linearly. Claude’s 200K context window is a genuine capability advantage, but every token loaded into context is a billable token. An engineer who opens a 5,000-line codebase in context before asking a question has already spent more on input tokens than many legacy SaaS tools cost per day. This is the context tax — the subject of the next post in this series.
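The order-of-magnitude arithmetic is worth making explicit. Assuming roughly 12 tokens per line of source code — a rough heuristic, not a measured figure — and the Sonnet input rate from the table above:

```python
lines = 5_000
tokens_per_line = 12                         # rough heuristic for source code
context_tokens = lines * tokens_per_line     # 60,000 tokens of billable context

# At Sonnet's $3.00 per million input tokens, one fully-loaded question:
cost_per_question = context_tokens * 3.00 / 1_000_000
print(f"${cost_per_question:.2f} per question")      # → $0.18 per question

# Re-sent on every turn of a 40-turn working session:
print(f"${cost_per_question * 40:.2f} per session")  # → $7.20 per session
```

Eighteen cents per question looks harmless until it is multiplied by every turn, every engineer, every day.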
Third, agentic workflows multiply calls. Claude Code does not make a single API call per developer interaction. A single “fix this bug” instruction can trigger a chain of tool calls — read file, search codebase, write fix, run tests, evaluate result — each of which is a separate API round-trip with its own input and output token charge. An eight-step agentic loop on a Sonnet-class model can cost more than a hundred manual completions.
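The amplification is easy to model. In the sketch below, each step of a hypothetical loop re-sends the accumulated context, which grows as prior outputs and tool results are folded back in — so billed input tokens grow linearly per step and quadratically over the loop. Step counts and token sizes are illustrative; prices are the Sonnet rates from the table above:

```python
def agentic_loop_cost(steps: int, base_context: int, tokens_added_per_step: int,
                      output_per_step: int, price_in: float = 3.00,
                      price_out: float = 15.00) -> float:
    """Cost of an agentic chain where every step re-sends the full context.

    Prices are per million tokens; the context grows each step by the
    previous output plus new tool results (file reads, test logs, etc.).
    """
    total = 0.0
    context = base_context
    for _ in range(steps):
        total += (context * price_in + output_per_step * price_out) / 1_000_000
        context += output_per_step + tokens_added_per_step
    return total

# Eight steps, 20K starting context, 3K of tool output folded back in per
# step, 800 tokens generated per step: ~$0.90 — versus ~$0.03 for the
# single manual completion sketched earlier.
print(round(agentic_loop_cost(8, 20_000, 3_000, 800), 2))
```

The point of the sketch is the shape, not the exact figure: because context compounds, doubling the number of steps more than doubles the cost.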
## What Uber’s Overrun Reveals About Industry Norms
The Uber case is notable not because it is unique, but because it was disclosed. Most organisations in the same position absorb the overage silently, reclassify it across budget lines, or discover it too late to act. The disclosure from a CTO of Neppalli Naga’s stature signals that the problem has become too large to manage through informal budget adjustments.
The underlying issue is a governance gap. AI API costs behave more like cloud egress or database query costs than like software licences — they require the same instrumentation, alerting, and quota management that mature engineering organisations apply to AWS spend. Most companies that adopted AI tooling in 2024–2025 did so under the assumption that costs would follow the flat-fee SaaS model. They did not budget for metered consumption, did not instrument token usage, and did not set per-team or per-engineer spending limits.
The result is a class of budget overrun that is structurally different from traditional IT overspend. It is not caused by a failed project or a licensing mistake. It is caused by the tools working exactly as intended, at a cost that was never correctly modelled.
## The Governance Primitives That Were Missing
A responsible AI tooling deployment requires at minimum:
- Token metering at the team level — every API call attributed to a cost centre, with daily visibility into spend velocity.
- Per-engineer monthly caps — hard limits enforced at the API gateway layer, not managed by individual discipline.
- Model tier policies — clear rules about which model class is authorised for which workload. Opus for exploratory research; Sonnet for production coding; Haiku for high-frequency, low-complexity tasks.
- Context hygiene guidelines — developer training on what to include in context windows and what to exclude. A 200K context capability does not mean a 200K context requirement.
- Agentic loop budgets — explicit token budgets assigned to each automated workflow, with circuit-breaker logic to halt runaway chains.
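A minimal sketch of the last two primitives, assuming a gateway-side counter that every API call passes through. The class name, cap, and per-call token counts are illustrative, not any vendor’s actual API:

```python
class TokenBudget:
    """Per-engineer or per-workflow token budget with circuit-breaker
    semantics: once the cap is reached, further calls are refused
    rather than silently billed."""

    def __init__(self, cap_tokens: int):
        self.cap = cap_tokens
        self.used = 0

    def charge(self, input_tokens: int, output_tokens: int) -> None:
        spend = input_tokens + output_tokens
        if self.used + spend > self.cap:
            raise RuntimeError(
                f"budget exhausted: {self.used + spend} > cap {self.cap}")
        self.used += spend

# A workflow given a 50K-token budget trips the breaker partway through
# a runaway loop instead of running all 100 iterations to completion.
budget = TokenBudget(cap_tokens=50_000)
for step in range(100):
    try:
        budget.charge(input_tokens=8_000, output_tokens=500)
    except RuntimeError:
        print(f"halted at step {step}, {budget.used} tokens used")
        break
```

The essential design choice is that the limit is enforced in code at the gateway, where it cannot be forgotten, rather than left to individual discipline.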
None of these are exotic engineering challenges. They are standard operational practices applied to a new cost surface. The engineering teams that implement them will find that AI tooling delivers its full productivity benefit at a fraction of the unconstrained cost.
## What This Means for Your 2026 Planning Cycle
If your organisation is mid-year and has not yet instrumented AI API spend, the Uber case is a preview of what your Q3 finance review may reveal. The corrective action is not to restrict AI tooling — the productivity gains are real and measurable. The corrective action is to apply the same operational rigour to AI spend that you apply to cloud infrastructure.
The organisations that get this right in 2026 will enter 2027 with a structural cost advantage over competitors who are still learning that the budget model they inherited from the SaaS era does not apply to metered inference.
Next in the series: The Context Tax: Quadratic Cost Scaling and the $6M Healthcare RAG Overrun — why every token you load into a long-context model costs more than the price sheet implies, and what that means for retrieval-augmented architectures at scale.
