Guide 1

How to Track AI Spend

Most teams can see total model spend. That is not enough. If you want to control cost, improve performance, or connect AI to business outcomes, you need to break spend down into the units the business actually runs on.

Start with the workflow, not the model

Do not begin by logging total spend by provider or model family alone. Begin with the repeatable workflow the system is trying to complete. That is the unit that eventually maps to business outcomes.

Core workflow dimensions
workflow_id
workflow_type
request_id
started_at
ended_at
status

If a request cannot be tied to a workflow, it will be hard to evaluate later. Workflow is the unit of analysis. Everything else hangs off it.
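If it helps to pin the shape down, the core dimensions above can be sketched as a typed record. The class name and the field types here are illustrative assumptions, not a prescribed schema:

```python
from typing import TypedDict

class WorkflowEvent(TypedDict):
    """Core workflow dimensions from the list above.
    Types are illustrative assumptions."""
    workflow_id: str
    workflow_type: str
    request_id: str
    started_at: str  # ISO 8601 timestamp
    ended_at: str    # ISO 8601 timestamp
    status: str      # e.g. "success" or "failure"
```

A typed record like this keeps every emitter in the system agreeing on field names before any analysis is built on top of them.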

Instrument one event envelope per request

Every request should emit one structured event with a stable shape. This is the base object for analysis, benchmarking, and later optimization.

Example event envelope
{
  "request_id": "req_123",
  "workflow_id": "wf_8f2a",
  "workflow_type": "customer_support_reply",
  "agent_id": "support-agent-v2",
  "model": "claude-sonnet",
  "started_at": "2026-04-07T18:00:00Z",
  "ended_at": "2026-04-07T18:00:04Z",
  "latency_ms": 4120,
  "input_tokens": 8200,
  "output_tokens": 640,
  "payload_bytes": 58211,
  "tool_calls": 3,
  "retry_count": 1,
  "status": "success",
  "estimated_cost_usd": 0.092
}
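One way to keep the envelope's shape stable is to build it in a single helper. The field names below mirror the example envelope; the helper itself (its name and signature) is an illustrative sketch, not a fixed API:

```python
import uuid
from datetime import datetime

def build_event(*, workflow_id, workflow_type, agent_id, model,
                started_at, ended_at, input_tokens, output_tokens,
                payload_bytes, tool_calls, retry_count, status,
                estimated_cost_usd):
    """Assemble one structured event per request.
    Field names mirror the example envelope above; this helper
    is an illustrative assumption, not a prescribed API."""
    return {
        "request_id": f"req_{uuid.uuid4().hex[:6]}",
        "workflow_id": workflow_id,
        "workflow_type": workflow_type,
        "agent_id": agent_id,
        "model": model,
        "started_at": started_at.isoformat(),
        "ended_at": ended_at.isoformat(),
        # Derive latency from the timestamps so it cannot drift from them.
        "latency_ms": int((ended_at - started_at).total_seconds() * 1000),
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "payload_bytes": payload_bytes,
        "tool_calls": tool_calls,
        "retry_count": retry_count,
        "status": status,
        "estimated_cost_usd": estimated_cost_usd,
    }
```

Centralizing construction this way means a renamed or missing field fails once, in one place, rather than silently producing events with inconsistent shapes.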

Track the dimensions that actually drive cost

The point of instrumentation is not more logging. It is to identify the dimensions that explain why spend accumulates, and to show where it can actually be reduced.

Minimum useful dimensions
1. Workflow
2. Agent
3. Model
4. Tool calls
5. Payload size
6. Retries
7. Latency
8. Estimated cost
9. Success state

These dimensions let you answer: which workflows are most expensive, which agents retry most, which payloads are bloated, and which requests cost the most without producing a successful result.

Track by agent, tool path, and payload

A workflow may involve multiple agents, models, tools, and retrieval patterns. If you do not separate these, cost stays too aggregated to improve.

Useful instrumentation cuts
By agent:
  Which agent consumes the most tokens?
  Which agent retries most?

By tool path:
  Which tools create extra calls?
  Which tool sequences create latency?

By payload:
  Which workflows inject oversized context?
  Which requests carry data the model never uses?
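The cuts above are just group-bys over the event envelope. A minimal stdlib sketch, assuming events shaped like the example envelope (in practice this would run in your analytics store):

```python
from collections import defaultdict

def cut_by(events, key):
    """Group event envelopes by one dimension (e.g. agent_id)
    and sum tokens, retries, and request counts."""
    totals = defaultdict(lambda: {"input_tokens": 0, "retries": 0, "requests": 0})
    for e in events:
        bucket = totals[e[key]]
        bucket["input_tokens"] += e["input_tokens"]
        bucket["retries"] += e["retry_count"]
        bucket["requests"] += 1
    return dict(totals)

# Hypothetical events, trimmed to the fields this cut needs.
events = [
    {"agent_id": "support-agent-v2", "input_tokens": 8200, "retry_count": 1},
    {"agent_id": "router-agent",     "input_tokens": 900,  "retry_count": 0},
    {"agent_id": "support-agent-v2", "input_tokens": 7600, "retry_count": 2},
]
by_agent = cut_by(events, "agent_id")
```

The same function answers the tool-path and payload questions by switching the grouping key (e.g. a tool identifier or a payload-size band) once those fields are on the event.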

Define success before cost per successful completion

Average request cost is not enough. You want to know what it costs to produce a usable result. That requires defining success first.

Cost per successful completion
cost_per_successful_completion =
  total_workflow_cost / successful_completions

Example:
  1,000 runs
  $4,000 total cost
  640 successful completions

  Cost per successful completion = $6.25

This metric is much more useful than average request cost because it tells you what it takes to get a working outcome, not just what you spent.
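The formula above is a one-liner, but the zero-success case deserves an explicit guard. A minimal sketch:

```python
def cost_per_successful_completion(total_workflow_cost, successful_completions):
    """total_workflow_cost / successful_completions.
    Undefined when nothing succeeded: all spend was waste,
    so raise rather than return a misleading number."""
    if successful_completions == 0:
        raise ValueError("no successful completions; metric is undefined")
    return total_workflow_cost / successful_completions

# Worked example from above: 1,000 runs, $4,000 total, 640 successes.
cost = cost_per_successful_completion(4000, 640)  # → 6.25
```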

Build the right rollups

Once the event envelope exists, create reporting by workflow, by agent, by model, by tool, and by payload band. Those views tell you where to look first.

Recommended rollups
By workflow:
  total cost
  avg cost
  success rate
  cost per successful completion
  avg latency
  retry rate

By agent:
  total cost
  avg input/output tokens
  retry rate
  avg latency

By tool:
  call volume
  failure rate
  latency
  workflows touched
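The by-workflow rollup above can be sketched directly over the event envelope. This is a stdlib illustration assuming events shaped like the example; a real system would compute the same aggregates in its warehouse:

```python
from collections import defaultdict

def rollup_by_workflow(events):
    """Per-workflow rollup: total cost, avg cost, success rate,
    cost per successful completion, avg latency, retry rate."""
    groups = defaultdict(list)
    for e in events:
        groups[e["workflow_type"]].append(e)
    out = {}
    for wf, evs in groups.items():
        n = len(evs)
        total = sum(e["estimated_cost_usd"] for e in evs)
        wins = sum(1 for e in evs if e["status"] == "success")
        out[wf] = {
            "total_cost": round(total, 6),
            "avg_cost": total / n,
            "success_rate": wins / n,
            # None when nothing succeeded: the metric is undefined.
            "cost_per_success": total / wins if wins else None,
            "avg_latency_ms": sum(e["latency_ms"] for e in evs) / n,
            "retry_rate": sum(e["retry_count"] for e in evs) / n,
        }
    return out
```

The by-agent and by-tool rollups follow the same pattern with a different grouping key and aggregate set.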

What good looks like

A well-instrumented AI system can answer where spend is happening, which workflows are worth it, and which parts of the instruction layer are creating waste.

Visibility (by workflow): cost becomes attributable, not aggregated.

Efficiency (by success): measure what it costs to get a usable result.

Actionability (by driver): see payload, tool, retry, and model patterns clearly.
