Layer 2: Context Payloads

How to Optimize Context Payloads

Most teams inject full data objects on every request without measuring what the model actually uses. JSON records, RAG chunks, conversation history, user profiles. The model ignores most of it, but you pay for every token. Here's a repeatable process to trim it down.

Step 1: Define your success signal

Context payload optimization has two success signals to measure:

Two layers of success
1. Field utilization: which fields does the model actually reference?
   Run a set of real queries against the current payload.
   For each response, trace which injected fields influenced the output.
   Any field never referenced across 50+ queries is a candidate for removal.

2. Output quality: does removing fields degrade the response?
   Take a representative query set. Run it with the full payload,
   then with the trimmed payload. Compare outputs.
   If the answer is the same or better, the removed fields were noise.

The goal is not to remove data the model needs. It is to stop paying for data the model ignores. Most teams discover 70 to 90 percent of injected fields are never used.
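One lightweight way to trace field influence is to check whether a field's injected value actually shows up in the model's response. The sketch below is a first-pass heuristic, not a complete solution (the function name and payload fields are illustrative): it misses paraphrased values and can flag coincidental matches, so borderline fields deserve a manual spot-check.

```python
def trace_referenced_fields(payload: dict, response: str) -> set[str]:
    """Heuristic: a field counts as 'referenced' if its injected value
    appears verbatim in the response. Misses paraphrases, can flag
    coincidental substring matches; treat as a first pass."""
    referenced = set()
    for field, value in payload.items():
        if str(value) and str(value).lower() in response.lower():
            referenced.add(field)
    return referenced

payload = {"user_id": "u_481", "plan": "pro", "profile_image": "avatar.png"}
response = "You're on the pro plan, so the rate limit is 600 requests/min."
print(trace_referenced_fields(payload, response))  # {'plan'}
```

Log the returned set for every production query; the utilization percentages in Step 3 fall out of these per-query traces.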

Step 2: Generate test cases

Pull real queries from production that hit this context payload. Cover the range of what users actually ask, including queries that need deep context and queries that need almost none.

Example: customer support agent with user context
Needs full context:
  "Why was I charged twice last month?"        → needs billing_history
  "Which integrations do I have set up?"       → needs integrations
  "Add my coworker to the team"                → needs team_members

Needs minimal context:
  "How do I reset my password?"                → needs user_id only
  "What's the API rate limit on my plan?"      → needs plan tier only
  "Where are my notification settings?"        → needs nothing from payload

Edge cases:
  "Show me everything about my account"        → ambiguous scope
  "Compare my usage to last quarter"           → needs historical data
  "Why is my dashboard different from my teammate's?" → needs preferences + team

Aim for 30 to 50 test queries. Weight by real-world frequency. Most support agents handle password resets and plan questions far more often than billing disputes.
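The frequency weighting can live directly in the test set, so a regression on a rare query (billing dispute) does not drown out a regression on a common one (password reset). A minimal sketch, with weights and names as illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class TestQuery:
    text: str
    weight: float  # relative real-world frequency (assumed values)

test_set = [
    TestQuery("How do I reset my password?", weight=0.30),
    TestQuery("What's the API rate limit on my plan?", weight=0.25),
    TestQuery("Why was I charged twice last month?", weight=0.10),
    TestQuery("Add my coworker to the team", weight=0.05),
]

def weighted_quality(scores: list[float], queries: list[TestQuery]) -> float:
    """Frequency-weighted quality score: common queries dominate."""
    total_w = sum(q.weight for q in queries)
    return sum(s * q.weight for s, q in zip(scores, queries)) / total_w

print(round(weighted_quality([0.9, 0.8, 0.7, 0.6], test_set), 3))  # 0.814
```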

Step 3: Benchmark the baseline

Run every test query against the current payload. For each response, record which fields were referenced, the output quality, and the token count. This tells you what the model actually uses versus what you are sending.

Baseline: user context payload
Test queries:    48
Payload size:    2,847 tokens (user_context.json)

Field utilization:
  user_id            48/48 queries (100%)
  plan               31/48 queries (64.6%)
  open_tickets       18/48 queries (37.5%)
  billing_history     6/48 queries (12.5%)
  integrations        4/48 queries (8.3%)
  team_members        3/48 queries (6.3%)
  preferences         2/48 queries (4.2%)
  feature_flags       0/48 queries (0%)
  profile_image       0/48 queries (0%)
  dashboard_layout    0/48 queries (0%)

Output quality score: 82% (rubric: relevance, accuracy, completeness)

Avg tokens/call: 2,847 (context) + 1,200 (prompt) + 480 (completion)
Cost per call:   $0.0136

This is your floor. Notice that feature_flags, profile_image, and dashboard_layout were never referenced. Preferences was used twice. That is over 1,800 tokens of noise on every single call.
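Aggregating per-query traces into a utilization table like the one above is a one-liner with a counter. A sketch, assuming each query's referenced fields were logged as a set (function name is illustrative):

```python
from collections import Counter

def utilization_report(traces: list[set], all_fields: list[str]) -> dict:
    """Fraction of queries that referenced each injected field,
    including fields that were never referenced at all."""
    counts = Counter(f for trace in traces for f in trace)
    n = len(traces)
    return {field: counts[field] / n for field in all_fields}

# Toy traces for 4 queries over 3 fields.
traces = [{"user_id", "plan"}, {"user_id"}, {"user_id", "plan"}, {"user_id"}]
report = utilization_report(traces, ["user_id", "plan", "feature_flags"])
print(report)  # {'user_id': 1.0, 'plan': 0.5, 'feature_flags': 0.0}
```

Passing `all_fields` explicitly matters: fields with zero references, like feature_flags above, are exactly the ones you want surfaced.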

Step 4: Generate optimization candidates

Use agents to propose payload restructuring. The same three approaches work here:

Multi-agent consensus
Spawn 10 agents with the field utilization data.
Each proposes a trimmed payload structure.
Consensus: which fields to keep inline vs move to tool-level.
Divergence: fields used 5-15% of the time. Human decides.

Agent debate
Spawn 3 agents:
  Minimalist: cut everything under 10% utilization
  Completionist: keep anything that prevents a follow-up call
  Architect: restructure into tiers (always, on-demand, never)
Three rounds. Converge on a tiered injection strategy.

Single model iteration
Pass the payload + utilization data to a single model.
"Here's my user context object. Here's how often each field
is referenced. Propose a trimmed version that keeps the fields
used in >20% of queries inline, and moves the rest to
tool-level injection."

The key optimization for context payloads is not just removing fields. It is tiered injection: always-present fields stay inline, sometimes-needed fields move to tool calls, never-used fields get dropped entirely.
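The tiering decision can be mechanized straight from the utilization report. In this sketch the 20% inline threshold matches the example prompt above, but both thresholds are assumptions to tune per workflow:

```python
def tier_fields(utilization: dict, inline_threshold: float = 0.20) -> dict:
    """Partition fields into three tiers by utilization rate:
    always-inline, tool-level on-demand, and dropped entirely."""
    tiers = {"inline": [], "tool": [], "drop": []}
    for field, rate in utilization.items():
        if rate >= inline_threshold:
            tiers["inline"].append(field)
        elif rate > 0:
            tiers["tool"].append(field)   # loaded only when a query needs it
        else:
            tiers["drop"].append(field)   # never referenced; stop sending it
    return tiers

utilization = {"user_id": 1.0, "plan": 0.646, "open_tickets": 0.375,
               "billing_history": 0.125, "feature_flags": 0.0}
print(tier_fields(utilization))
```

The output of this partition becomes the trimmed payload spec: inline fields stay in every request, tool fields sit behind a fetch call, dropped fields leave the pipeline.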

Step 5: Test candidates against the same baseline

Run the exact same 48 test queries against the trimmed payload. Compare output quality to the baseline. If quality holds or improves, the optimization ships.

After optimization: same 48 test queries
Test queries:    48
Payload size:    312 tokens (trimmed user_context.json)

Always inline (every call):
  user_id, plan, open_tickets, last_billing_issue, active_integrations

On-demand (tool-level, loaded when needed):
  billing_history, team_members, integrations config

Dropped entirely:
  feature_flags, profile_image, dashboard_layout, preferences.theme,
  preferences.notifications, preferences.dashboard_layout

Output quality score: 91% (up from 82%. Less noise, better signal.)
Queries needing tool fallback: 8/48 (16.7%, handled automatically)

Avg tokens/call: 312 (context) + 1,200 (prompt) + 420 (completion)
Cost per call:   $0.0058
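The ship/no-ship decision reduces to a simple gate: quality must hold and tokens must shrink. A sketch, with the quality tolerance as an assumption (zero here, meaning no regression is acceptable):

```python
def should_ship(baseline_quality: float, candidate_quality: float,
                baseline_tokens: int, candidate_tokens: int,
                quality_tolerance: float = 0.0) -> bool:
    """Ship the trimmed payload only if quality holds (or improves)
    and the per-call context token count drops."""
    quality_ok = candidate_quality >= baseline_quality - quality_tolerance
    cheaper = candidate_tokens < baseline_tokens
    return quality_ok and cheaper

# Numbers from the baseline and optimized runs above.
print(should_ship(0.82, 0.91, 2847, 312))  # True
```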

Quality   +9pts   82% → 91%. Less noise, better signal.
Tokens    -89%    2,847 → 312 per request
Cost      -57%    $0.0136 → $0.0058 per call

Step 6: Map to business outcomes

Context payloads are often the largest token cost per call because they repeat on every request. A 2,500 token reduction across 40,000 calls per month adds up fast.

Token-cost-to-outcome per workflow
Workflow              Calls/mo   Before      After       Savings/mo
────────────────────────────────────────────────────────────────────
Support agent         42,000     $571        $244        $327
Account dashboard     28,000     $380        $162        $218
Onboarding flow       15,000     $204        $87         $117
Search assistant       8,500     $116        $49         $67

Total monthly savings: $729
Annual savings:        $8,748

Context payload optimization often delivers the largest absolute cost savings because the waste compounds on every call. A bloated prompt is bad. A bloated payload attached to a bloated prompt is worse.
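The arithmetic behind the table is just per-call cost delta times call volume. A sketch, checked against the support agent row:

```python
def monthly_savings(calls_per_month: int,
                    cost_before: float, cost_after: float) -> float:
    """Monthly savings = call volume x per-call cost delta."""
    return calls_per_month * (cost_before - cost_after)

# Support agent: $0.0136 -> $0.0058 per call at 42,000 calls/month.
print(round(monthly_savings(42_000, 0.0136, 0.0058), 1))  # 327.6
```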

Then do it again

Payloads drift just like prompts. New fields get added to data models, new integrations get wired up, nobody audits what is actually being sent. The loop runs continuously:

1. Define success signal (field utilization + output quality)
2. Generate test queries from production traffic
3. Benchmark field utilization, output quality, and token cost
4. Generate optimization candidates (consensus, debate, or single model)
5. Test candidates: keep if quality holds at lower token count
6. Map to business outcomes: prioritize by call volume × savings
7. Re-audit quarterly, or when data models change

The same discipline applies to every type of injected context. RAG chunks, conversation history, tool results, memory files. If it gets injected per call, it should be measured per call.


Benchmark an AI workflow.

We'll benchmark one of your AI workflows and show where the biggest gains are in cost, quality, and speed.

Fixed-scope benchmark. You keep everything.