Guide 3
How to Improve AI Performance
Use the instruction layer to reduce waste, increase consistency, and improve workflow performance where it counts.
Do not optimize blindly
The biggest mistake is optimizing the loudest workflow instead of the most important one. Once spend and outcomes are visible, performance work should follow leverage, not intuition.
1. Track AI spend by workflow
2. Tie each workflow to an outcome
3. Rank workflows by leverage
4. Optimize the instruction layer behind them
5. Re-measure the same workflow metrics
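Steps 1 through 3 can be sketched in a few lines. This is a minimal illustration, not a prescribed scoring method: the workflow names, spend figures, and the leverage formula (spend weighted by outcome value) are all invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Workflow:
    name: str
    monthly_spend: float   # USD spent on AI calls for this workflow
    outcome_value: float   # relative value of the outcome it drives (1-10, illustrative)

    @property
    def leverage(self) -> float:
        # One possible score: high spend against a high-value outcome
        # means the most to gain from optimizing.
        return self.monthly_spend * self.outcome_value

workflows = [
    Workflow("internal chatbot", monthly_spend=2500.0, outcome_value=1.0),
    Workflow("support triage", monthly_spend=1200.0, outcome_value=5.0),
    Workflow("planning digest", monthly_spend=400.0, outcome_value=9.0),
]

# Rank by leverage, not by raw spend: the loudest workflow
# (the chatbot) is not the most important one.
ranked = sorted(workflows, key=lambda w: w.leverage, reverse=True)
for wf in ranked:
    print(f"{wf.name}: spend=${wf.monthly_spend:.0f}/mo, leverage={wf.leverage:.0f}")
```

Note that the highest-spend workflow lands last once outcomes are factored in, which is the point of ranking by leverage rather than by the raw bill.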
Start with the instruction layer
AI performance problems often look like model problems from a distance. In practice, they are usually instruction-layer problems: bloated prompts, oversized payloads, vague tool descriptions, or the wrong model doing the wrong task.
Prompts: unclear rules, conflicting instructions, poor routing
Context: oversized records, irrelevant fields, over-retrieval
Tools: wrong tool selection, retries, unnecessary call chains
Models: expensive models on low-complexity tasks
Optimize the layers that drive the outcome
Do not optimize all four layers equally. Use the workflow and spend data to decide where the waste actually is.
If cost is high because payloads are huge: optimize context first
If latency is high because tool chains sprawl: optimize the MCP/tool layer first
If quality is inconsistent but context is clean: optimize prompts first
If the task is simple but spend is high: optimize model selection first
Use the four layers as the operating levers
The four layers are the concrete levers you can pull once you know what matters. Each has its own methodology and test loop.
Layer 1
How to Optimize Prompts
Reduce conflicting instructions, routing errors, and unnecessary token load at the prompt layer.
Layer 2
How to Optimize Context Payloads
Trim oversized payloads so the model gets only the context it actually needs on each request.
Layer 3
How to Optimize MCP Tools
Improve tool routing, cut retries, and reduce waste across complex agent and workflow chains.
Layer 4
How to Optimize Model Routing
Use the right model for each task instead of paying one premium rate for every request.
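The Layer 4 lever can be sketched as a tier map plus a routing rule. The tier names and the keyword heuristic below are purely illustrative; a real router would use task type, token counts, or a trained classifier rather than string matching.

```python
# Hypothetical model tiers; the names are placeholders, not real model identifiers.
MODEL_TIERS = {
    "low": "small-model",
    "medium": "mid-model",
    "high": "frontier-model",
}

def route_model(task: str) -> str:
    """Pick a model tier from a crude complexity heuristic.

    Simple extraction/classification tasks go to the cheap tier,
    short open-ended tasks to the middle tier, and long or complex
    tasks to the premium tier.
    """
    simple_verbs = ("classify", "extract", "tag")
    if any(v in task.lower() for v in simple_verbs):
        return MODEL_TIERS["low"]
    if len(task) < 500:
        return MODEL_TIERS["medium"]
    return MODEL_TIERS["high"]

print(route_model("Classify this ticket as billing or technical"))
```

Even a crude router like this avoids the failure mode the guide names: paying one premium rate for every request regardless of task complexity.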
Re-measure the same workflow, not a new one
Performance work only counts if it improves the same workflow against the same outcome. Re-run the same success checks, cost metrics, and latency measurements after the change.
Workflow: planning digest
Before: cost/run $0.94, success rate 61%, avg latency 9.2s
After: cost/run $0.41, success rate 82%, avg latency 4.8s
Result: lower spend, higher reliability, faster workflow completion
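Re-measurement is just comparing the same metrics before and after. A small sketch using the planning-digest numbers above:

```python
# Before/after metrics for the planning-digest example.
before = {"cost_per_run": 0.94, "success_rate": 0.61, "avg_latency_s": 9.2}
after = {"cost_per_run": 0.41, "success_rate": 0.82, "avg_latency_s": 4.8}

# Re-run the *same* metrics on the *same* workflow and report the change.
deltas = {
    metric: round((after[metric] - before[metric]) / before[metric] * 100)
    for metric in before
}
for metric, pct in deltas.items():
    print(f"{metric}: {before[metric]} -> {after[metric]} ({pct:+d}%)")
```

Keeping the metric keys identical on both sides is the whole discipline: a new metric on a new workflow proves nothing about the change you shipped.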
Prioritization: by leverage. Optimize the workflows tied to the highest-value outcomes.
Method: by layer. Use prompts, context, tools, and models as distinct levers.
Proof: by re-measurement. Ship only what improves the workflow that matters.