A software bug that causes infinite recursion terminates with a stack overflow. A token bug — an agentic AI loop that recurses without a termination condition — terminates with a billing invoice. In 2026, as autonomous agents displaced simple chat completions as the primary AI interaction pattern, organisations discovered that the economics of agentic systems are fundamentally different from those of single-shot inference. Anthropic’s decision to move Claude agents onto metered billing across all subscription tiers was not a product update. It was a signal that the industry has reached an inflection point where agent economics require the same governance discipline as cloud infrastructure.
What Makes an Agent Loop Different from a Completion
A standard LLM completion is a bounded transaction: a prompt goes in, a response comes out, and the API call closes. Cost is predictable at the time of the request: the input token count is known, and the output is capped by the maximum generation length.
An agentic loop is structurally different. It is a control flow in which the model’s output determines the next action, which in turn generates new input, which feeds another model call. The loop terminates when the model decides it has completed its objective — or when an external circuit breaker intervenes. If neither condition is met cleanly, the loop continues.
The cost of a single agent task is therefore not the cost of one API call. It is the sum of all API calls across the entire task execution:
\[C_{agent} = \sum_{k=1}^{K} \left( T_{in}^{(k)} \cdot P_{in} + T_{out}^{(k)} \cdot P_{out} \right)\]

where \(K\) is the number of steps the agent takes to complete (or fail to complete) the task, \(T_{in}^{(k)}\) and \(T_{out}^{(k)}\) are the input and output tokens consumed at step \(k\), and \(P_{in}\) and \(P_{out}\) are the per-token prices. In a well-designed agent, \(K\) is bounded. In a poorly designed one, \(K\) is determined by the model’s own assessment of task completion, which can be confused, misdirected, or pathologically optimistic.
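To make the summation concrete, it can be computed directly from a per-step token log. The sketch below is a minimal, framework-agnostic illustration; the prices and the step counts are placeholder values, not actual list prices.

```python
from dataclasses import dataclass

# Placeholder per-token prices in dollars -- illustrative only, not real list prices.
PRICE_IN = 3.00 / 1_000_000    # dollars per input token
PRICE_OUT = 15.00 / 1_000_000  # dollars per output token


@dataclass
class Step:
    """Token usage recorded for one model call in the agent loop."""
    tokens_in: int
    tokens_out: int


def agent_cost(steps: list[Step], price_in: float = PRICE_IN,
               price_out: float = PRICE_OUT) -> float:
    """C_agent = sum over steps of (T_in * P_in + T_out * P_out)."""
    return sum(s.tokens_in * price_in + s.tokens_out * price_out for s in steps)


# A hypothetical six-step run: note how input tokens grow as context accumulates.
run = [Step(1_200, 300), Step(2_100, 450), Step(3_400, 500),
       Step(4_800, 620), Step(6_300, 700), Step(7_900, 1_100)]
print(f"Total run cost: ${agent_cost(run):.4f}")
```

Because each step re-sends the accumulated context, the input term typically grows with every iteration, which is why unbounded loops become expensive so quickly.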
The Anatomy of a Runaway Loop
Consider an agent tasked with “audit the security posture of this repository and generate a remediation plan.” The intended execution path might be:
- Read repository structure
- Identify configuration files
- Check each file against a security policy
- Generate findings
- Draft remediation plan
- Return result
In practice, an under-specified agent with access to search, read, and web-browsing tools may:
- Read repository structure (1 API call)
- Identify 47 configuration files (1 API call)
- Check file 1 — finds a reference to an external library, decides to research the library’s CVE history (3 API calls)
- CVE research yields references to related libraries — agent expands scope (8 more API calls)
- Each new library reference expands the research scope further
- After 200+ API calls spanning three hours of wall-clock time, the agent produces a report that references 340 libraries, most of which are irrelevant
This is not a theoretical failure mode. It is a documented pattern in every agentic framework that allows models to self-direct their tool use without explicit step budgets. The model is behaving rationally from its own perspective — gathering more information to produce a more comprehensive output. The operator never specified a limit, so the model imposed none.
Anthropic’s Response: Metered Agent Billing
Anthropic’s 2026 shift to metered agent billing across its subscription tiers reflects a recognition that flat-rate pricing for agentic workloads is structurally unsustainable, both for the operator running the agent and for Anthropic managing server capacity.
Under the new model, agent runs within Claude’s built-in tools (computer use, web search, file operations) are billed against a token meter that tracks both input and output across the entire agent session. Subscription tiers receive a monthly token allocation; overages are billed at standard API rates. This billing structure creates natural pressure toward efficient agent design: organisations that burn their allocation on runaway loops face real financial consequences, not just degraded performance.
The implications for teams building on the Claude API rather than the consumer product are equally significant. The metered model validates a set of engineering practices that had previously been regarded as optional:
| Practice | Pre-Metering Status | Post-Metering Status |
|---|---|---|
| Step count limits per agent run | Optional best practice | Financial necessity |
| Token budget per task type | Rarely implemented | Standard requirement |
| Agent run cost attribution | Difficult to instrument | Critical for billing accuracy |
| Circuit breakers on tool calls | Advanced implementations only | Baseline engineering requirement |
| Task scope specification in prompts | Output quality technique | Output quality and cost control |
The Three Failure Modes
Agentic cost overruns concentrate around three patterns:
1. Scope creep without bounds. The agent is given access to tools that allow it to expand its own task definition. Web search is the most common vector — a task that starts as “summarise this document” becomes “research all claims in this document” becomes “verify every cited source” becomes a multi-hour research engagement. Mitigation: define tool access narrowly, and include explicit scope boundaries in system prompts (“do not follow external links; work only from the provided document”).
2. Retry loops on tool failure. When a tool call fails — a web page returns a 404, an API is rate-limited, a file cannot be parsed — a poorly calibrated agent will retry. Without a maximum retry count, transient failures become infinite loops. Mitigation: implement explicit retry budgets (maximum 3 retries per tool call) and instruct the model to report failures rather than loop on them; a minimal wrapper implementing this budget is sketched after this list.
3. Verification spirals. Some models, when instructed to be thorough, enter a pattern of self-verification — generating a result, evaluating it, finding it inadequate, regenerating, re-evaluating. Each iteration is a full generation cycle. Mitigation: separate generation from evaluation into distinct agent steps with independent token budgets, and cap the number of evaluation passes.
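The retry-budget mitigation in the second failure mode can be enforced with a thin wrapper around tool calls, independent of which framework drives the loop. This is a minimal sketch under assumed interfaces: `tool_fn` and the structured failure format are hypothetical stand-ins, not any framework’s API.

```python
import time


def call_with_retry_budget(tool_fn, *args, max_retries: int = 3, backoff_s: float = 1.0):
    """Invoke a tool at most (1 + max_retries) times, then report the failure
    as data the model can reason about instead of looping on it."""
    last_error = None
    for attempt in range(1 + max_retries):
        try:
            return {"status": "ok", "result": tool_fn(*args)}
        except Exception as exc:  # in practice, catch the tool layer's specific error types
            last_error = exc
            time.sleep(backoff_s * (2 ** attempt))  # exponential backoff between attempts
    # Surfacing the failure as a structured observation lets the agent move on
    # (or report the problem) rather than burning its budget on a dead tool.
    return {"status": "failed", "error": str(last_error), "attempts": 1 + max_retries}
```

Returning the failure as an observation, rather than raising, is the point: the model sees an explicit signal that the tool is not going to work and can fall back or report, instead of re-issuing the same call.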
Designing Cost-Bounded Agents
The engineering pattern that prevents runaway loops is the explicit budget constraint, embedded at three levels:
Prompt-level constraints. The system prompt specifies maximum steps, maximum tool calls, and acceptable scope boundaries. Example: “Complete this task in no more than 10 steps. If you cannot complete it within 10 steps, return a partial result with an explanation of what remains.”
Framework-level guardrails. Agent orchestration frameworks (LangGraph, AutoGen, CrewAI) support maximum iteration counts and token budgets as first-class configuration. These should always be set explicitly rather than left at framework defaults, which are often permissive enough to allow long runaway runs.
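Exact configuration keys differ across these frameworks, so rather than reproduce any one API, the sketch below shows the loop such guardrails enforce: a hard step cap and a cumulative token budget checked on every iteration. All names are illustrative; `agent_step` stands in for whatever executes one model call plus its tool use.

```python
def run_bounded_agent(agent_step, task, max_iterations: int = 10, token_budget: int = 200_000):
    """Drive an agent loop that stops on completion, step cap, or token budget,
    whichever comes first. `agent_step` is assumed to return (result_or_None, tokens_used)."""
    tokens_spent = 0
    state = {"task": task, "history": []}
    for step in range(max_iterations):
        result, tokens_used = agent_step(state)
        tokens_spent += tokens_used
        if result is not None:                    # the model signalled completion
            return {"status": "complete", "result": result,
                    "steps": step + 1, "tokens": tokens_spent}
        if tokens_spent >= token_budget:          # budget exhausted mid-task
            return {"status": "budget_exceeded",
                    "steps": step + 1, "tokens": tokens_spent}
    return {"status": "step_limit_reached",
            "steps": max_iterations, "tokens": tokens_spent}
```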
Infrastructure-level circuit breakers. API gateway or proxy layers can enforce hard token limits per agent session, killing runs that exceed the budget regardless of model state. This is the safety net when prompt-level and framework-level constraints fail.
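At the infrastructure level, the same discipline can live in a gateway that keeps a per-session token ledger and refuses to forward requests once the budget is spent, regardless of what the agent believes about its own progress. A minimal in-memory sketch with hypothetical names; a production version would keep the ledger in shared storage so every gateway replica sees the same count.

```python
class SessionTokenBreaker:
    """Per-session token ledger enforced outside the agent's own control flow."""

    def __init__(self, hard_limit: int = 500_000):
        self.hard_limit = hard_limit
        self.spent: dict[str, int] = {}  # session_id -> cumulative tokens

    def allow(self, session_id: str, estimated_tokens: int) -> bool:
        """Check before forwarding a request; False means the run should be killed."""
        return self.spent.get(session_id, 0) + estimated_tokens <= self.hard_limit

    def record(self, session_id: str, tokens_used: int) -> None:
        """Record actual usage from the API response after each call."""
        self.spent[session_id] = self.spent.get(session_id, 0) + tokens_used


# Inside a hypothetical proxy handler:
# if not breaker.allow(session_id, estimated_tokens):
#     reject(session_id, reason="agent session token budget exhausted")
```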
The cost model for a well-governed agent run is:
\[C_{bounded} = \min\left( C_{agent},\ K_{max} \cdot \bar{C}_{step} \right)\]

where \(K_{max}\) is the maximum permitted step count and \(\bar{C}_{step}\) is the average cost per agent step, a number that can be calibrated empirically during development and used to set \(K_{max}\) against a per-task budget constraint.
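In practice, \(K_{max}\) falls out of a per-task budget and the measured average step cost. A one-line calibration, using assumed numbers purely for illustration:

```python
# Calibrate the step cap from a per-task budget and a measured average step cost.
# Both figures are assumptions for illustration, not measured values.
avg_step_cost = 0.042   # dollars per agent step, averaged over development runs
task_budget = 0.50      # dollars this task type is allowed to spend

k_max = int(task_budget // avg_step_cost)
print(k_max)  # 11 steps for these example numbers
```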
What the Metered Shift Means for Architecture
Anthropic’s move to metered agent billing is not an isolated vendor decision. It reflects a broader industry direction. As agents become the primary interface through which organisations interact with AI — replacing both chat UIs and batch pipelines — the cost per unit of value delivered will increasingly be measured in agent steps rather than tokens.
This changes the economic calculus for every AI architecture decision. The question is no longer only “which model produces the best output?” It becomes “which model produces sufficient output at the lowest step count?” A model that completes a task in four steps is economically superior to one that needs twelve to reach the same outcome, even if the four-step model’s per-token price is higher.
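The arithmetic is worth spelling out. The numbers below are invented purely to show the structure of the comparison: a model with a higher per-token price can still win on total cost if it finishes in fewer steps.

```python
# Invented prices and token counts -- only the shape of the calculation matters.
def task_cost(steps: int, tokens_per_step: int, price_per_token: float) -> float:
    """Total cost of a run, assuming roughly constant tokens per step."""
    return steps * tokens_per_step * price_per_token

four_step = task_cost(steps=4, tokens_per_step=6_000, price_per_token=10e-6)    # $0.240
twelve_step = task_cost(steps=12, tokens_per_step=6_000, price_per_token=4e-6)  # $0.288
print(four_step < twelve_step)  # True: fewer steps beat the cheaper per-token rate here
```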
The organisations that tune their agents for step efficiency now will have a durable cost advantage as the agentic paradigm matures. Those that optimise only for output quality, without constraining step count, will find that the meter runs whether or not the agent produces useful work.
Next in the series: Beyond the Token: Google’s Per-Minute Pricing and What It Means for the Economics of Real-Time AI — how Gemini Live API’s $0.005/minute rate is disrupting the token-per-call pricing model, and when it makes economic sense to pay for time rather than tokens.
