Guide 1
How to Track AI Spend
Most teams can see total model spend. That is not enough. If you want to control cost, improve performance, or connect AI to business outcomes, you need to break spend down into the units the business actually runs on.
Start with the workflow, not the model
Do not begin by logging total spend by provider or model family alone. Begin with the repeatable workflow the system is trying to complete. That is the unit that eventually maps to business outcomes.
At minimum, tag every request with: workflow_id, workflow_type, request_id, started_at, ended_at, status.
If a request cannot be tied to a workflow, it will be hard to evaluate later. Workflow is the unit of analysis. Everything else hangs off it.
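As a minimal sketch of that record, the workflow-level fields above can be carried as a small typed object. The class name and types here are illustrative assumptions, not a fixed schema:

```python
from dataclasses import dataclass

@dataclass
class WorkflowRequest:
    workflow_id: str    # the repeatable workflow this request belongs to
    workflow_type: str  # e.g. "customer_support_reply"
    request_id: str
    started_at: str     # ISO 8601 timestamp
    ended_at: str
    status: str         # e.g. "success" or "error"

req = WorkflowRequest(
    workflow_id="wf_8f2a",
    workflow_type="customer_support_reply",
    request_id="req_123",
    started_at="2026-04-07T18:00:00Z",
    ended_at="2026-04-07T18:00:04Z",
    status="success",
)
```

Any request that cannot populate workflow_id at emit time is a signal that the workflow boundary is unclear.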
Instrument one event envelope per request
Every request should emit one structured event with a stable shape. This is the base object for analysis, benchmarking, and later optimization.
{
"request_id": "req_123",
"workflow_id": "wf_8f2a",
"workflow_type": "customer_support_reply",
"agent_id": "support-agent-v2",
"model": "claude-sonnet",
"started_at": "2026-04-07T18:00:00Z",
"ended_at": "2026-04-07T18:00:04Z",
"latency_ms": 4120,
"input_tokens": 8200,
"output_tokens": 640,
"payload_bytes": 58211,
"tool_calls": 3,
"retry_count": 1,
"status": "success",
"estimated_cost_usd": 0.092
}

Track the dimensions that actually drive cost
The point of instrumentation is not more logging. It is to identify the dimensions that explain why spend accumulates and where it becomes actionable.
1. Workflow
2. Agent
3. Model
4. Tool calls
5. Payload size
6. Retries
7. Latency
8. Estimated cost
9. Success state
These dimensions let you answer: which workflows are most expensive, which agents retry most, which payloads are bloated, and which requests cost the most without producing a successful result.
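To make the first of those questions concrete, here is a sketch that totals spend per workflow type from a list of event envelopes shaped like the example above, and separately totals spend on requests that did not succeed. The function and variable names are illustrative assumptions:

```python
from collections import defaultdict

def spend_by_workflow(events):
    """Total and 'wasted' (non-success) cost per workflow_type."""
    totals = defaultdict(float)
    wasted = defaultdict(float)
    for e in events:
        cost = e["estimated_cost_usd"]
        totals[e["workflow_type"]] += cost
        if e["status"] != "success":
            wasted[e["workflow_type"]] += cost
    return dict(totals), dict(wasted)

events = [
    {"workflow_type": "customer_support_reply",
     "estimated_cost_usd": 0.09, "status": "success"},
    {"workflow_type": "customer_support_reply",
     "estimated_cost_usd": 0.12, "status": "error"},
]
totals, wasted = spend_by_workflow(events)
```

The same loop extends naturally to the other dimensions (retries, tool calls, payload size) by keying on those fields instead.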
Track by agent, tool path, and payload
A workflow may involve multiple agents, models, tools, and retrieval patterns. If you do not separate these, cost stays too aggregated to improve.
By agent: Which agent consumes the most tokens? Which agent retries most?
By tool path: Which tools create extra calls? Which tool sequences create latency?
By payload: Which workflows inject oversized context? Which requests carry data the model never uses?
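The per-agent question can be answered directly from the event envelope, since agent_id and retry_count are already on every request. A minimal sketch (names are assumptions):

```python
from collections import defaultdict

def retry_rate_by_agent(events):
    """Average retries per request, keyed by agent_id."""
    counts = defaultdict(lambda: [0, 0])  # agent_id -> [requests, retries]
    for e in events:
        c = counts[e["agent_id"]]
        c[0] += 1
        c[1] += e["retry_count"]
    return {agent: retries / requests
            for agent, (requests, retries) in counts.items()}
```

An agent with a high retry rate is often a cheaper fix than a model swap: the instructions or tool contracts are usually what need tightening.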
Define success before cost per successful completion
Average request cost is not enough. You want to know what it costs to produce a usable result. That requires defining success first.
cost_per_successful_completion = total_workflow_cost / successful_completions

Example: 1,000 runs, $4,000 total cost, 640 successful completions.
Cost per successful completion = $4,000 / 640 = $6.25
This metric is much more useful than average request cost because it tells you what it takes to get a working outcome, not just what you spent.
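The metric is a one-line calculation once success is defined. This sketch reproduces the worked example above; the zero-success guard is an assumption about how you would want undefined cases handled:

```python
def cost_per_successful_completion(total_workflow_cost, successful_completions):
    # A workflow that never succeeds has no meaningful unit cost;
    # returning infinity makes it sort to the top of any report.
    if successful_completions == 0:
        return float("inf")
    return total_workflow_cost / successful_completions

# Worked example from above: 1,000 runs, $4,000 total, 640 successes.
assert cost_per_successful_completion(4000, 640) == 6.25
```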
Build the right rollups
Once the event envelope exists, create reporting by workflow, by agent, by model, by tool, and by payload band. Those views tell you where to look first.
By workflow: total cost, avg cost, success rate, cost per successful completion, avg latency, retry rate
By agent: total cost, avg input/output tokens, retry rate, avg latency
By tool: call volume, failure rate, latency, workflows touched
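The payload-band view needs one small helper: a function that buckets payload_bytes into a handful of bands so oversized-context workflows stand out in a groupby. The band edges and labels below are illustrative assumptions, not a standard:

```python
def payload_band(payload_bytes):
    """Bucket a request's payload size into a coarse reporting band."""
    if payload_bytes < 10_000:
        return "small (<10KB)"
    if payload_bytes < 100_000:
        return "medium (10-100KB)"
    return "large (>=100KB)"
```

Rolling up cost and success rate per band quickly shows whether large payloads are buying better outcomes or just burning input tokens.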
What good looks like
A well-instrumented AI system can answer where spend is happening, which workflows are worth it, and which parts of the instruction layer are creating waste.
Visibility (by workflow): cost becomes attributable, not aggregated.
Efficiency (by success): measure what it costs to get a usable result.
Actionability (by driver): see payload, tool, retry, and model patterns clearly.