Guide 3
How to Improve AI Performance
Use the instruction layer to reduce waste, increase consistency, and improve workflow performance where it counts.
Do not optimize blindly
The biggest mistake is optimizing the loudest workflow instead of the most important one. Once spend and outcomes are visible, performance work should follow leverage, not intuition.
1. Track AI spend by workflow
2. Tie each workflow to an outcome
3. Rank workflows by leverage
4. Optimize the instruction layer behind them
5. Re-measure the same workflow metrics
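Steps 1 through 3 can be sketched in a few lines. This is a minimal illustration, not a prescribed scoring method: the workflow names, spend figures, and the leverage formula (spend weighted by outcome value) are all invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Workflow:
    name: str
    monthly_spend: float   # USD spent on AI calls for this workflow
    outcome_value: float   # relative value of the outcome it drives (1-10, illustrative)

    @property
    def leverage(self) -> float:
        # One possible score: high spend against a high-value outcome
        # means the most to gain from optimizing.
        return self.monthly_spend * self.outcome_value

workflows = [
    Workflow("internal chatbot", monthly_spend=2500.0, outcome_value=1.0),
    Workflow("support triage", monthly_spend=1200.0, outcome_value=5.0),
    Workflow("planning digest", monthly_spend=400.0, outcome_value=9.0),
]

# Rank by leverage, not by raw spend: the loudest workflow
# (the chatbot) is not the most important one.
ranked = sorted(workflows, key=lambda w: w.leverage, reverse=True)
for wf in ranked:
    print(f"{wf.name}: spend=${wf.monthly_spend:.0f}/mo, leverage={wf.leverage:.0f}")
```

Note that the highest-spend workflow lands last once outcomes are factored in, which is the point of ranking by leverage rather than by the raw bill.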
Start with the instruction layer
AI performance problems often look like model problems from a distance. In practice, they are usually instruction-layer problems: bloated prompts, oversized payloads, vague tool descriptions, or the wrong model doing the wrong task.
Prompts: unclear rules, conflicting instructions, poor routing
Context: oversized records, irrelevant fields, over-retrieval
Tools: wrong tool selection, retries, unnecessary call chains
Models: expensive models on low-complexity tasks
Optimize the layers that drive the outcome
Do not optimize all four layers equally. Use the workflow and spend data to decide where the waste actually is.
If cost is high because payloads are huge: optimize context first
If latency is high because tool chains sprawl: optimize the MCP/tool layer first
If quality is inconsistent but context is clean: optimize prompts first
If the task is simple but spend is high: optimize model selection first
Use the four layers as the operating levers
The four layers are the concrete levers you can pull once you know what matters. Each has its own methodology and test loop.
Layer 1
How to Optimize Prompts
Reduce conflicting instructions, routing errors, and unnecessary token load at the prompt layer.
Layer 2
How to Optimize Context Payloads
Trim oversized payloads so the model gets only the context it actually needs on each request.
Layer 3
How to Optimize MCP Tools
Improve tool routing, cut retries, and reduce waste across complex agent and workflow chains.
Layer 4
How to Optimize Model Routing
Use the right model for each task instead of paying one premium rate for every request.
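The Layer 4 lever can be sketched as a tier map plus a routing rule. The tier names and the keyword heuristic below are purely illustrative; a real router would use task type, token counts, or a trained classifier rather than string matching.

```python
# Hypothetical model tiers; the names are placeholders, not real model identifiers.
MODEL_TIERS = {
    "low": "small-model",
    "medium": "mid-model",
    "high": "frontier-model",
}

def route_model(task: str) -> str:
    """Pick a model tier from a crude complexity heuristic.

    Simple extraction/classification tasks go to the cheap tier,
    short open-ended tasks to the middle tier, and long or complex
    tasks to the premium tier.
    """
    simple_verbs = ("classify", "extract", "tag")
    if any(v in task.lower() for v in simple_verbs):
        return MODEL_TIERS["low"]
    if len(task) < 500:
        return MODEL_TIERS["medium"]
    return MODEL_TIERS["high"]

print(route_model("Classify this ticket as billing or technical"))
```

Even a crude router like this avoids the failure mode the guide names: paying one premium rate for every request regardless of task complexity.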
Re-measure the same workflow, not a new one
Performance work only counts if it improves the same workflow against the same outcome. Re-run the same success checks, cost metrics, and latency measurements after the change.
Workflow: planning digest
Before: cost/run $0.94, success rate 61%, avg latency 9.2s
After: cost/run $0.41, success rate 82%, avg latency 4.8s
Result: lower spend, higher reliability, faster workflow completion
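Re-measurement is just comparing the same metrics before and after. A small sketch using the planning-digest numbers above:

```python
# Before/after metrics for the planning-digest example.
before = {"cost_per_run": 0.94, "success_rate": 0.61, "avg_latency_s": 9.2}
after = {"cost_per_run": 0.41, "success_rate": 0.82, "avg_latency_s": 4.8}

# Re-run the *same* metrics on the *same* workflow and report the change.
deltas = {
    metric: round((after[metric] - before[metric]) / before[metric] * 100)
    for metric in before
}
for metric, pct in deltas.items():
    print(f"{metric}: {before[metric]} -> {after[metric]} ({pct:+d}%)")
```

Keeping the metric keys identical on both sides is the whole discipline: a new metric on a new workflow proves nothing about the change you shipped.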
Prioritization: by leverage. Optimize the workflows tied to the highest-value outcomes.
Method: by layer. Use prompts, context, tools, and models as distinct levers.
Proof: by re-measurement. Ship only what improves the workflow that matters.