xml-pipeline/xml_pipeline/primitives/budget_warning.py
dullfig e6697f0ea2 Add BudgetWarning system alerts for token budget thresholds
- Create BudgetWarning primitive payload (75%, 90%, 95% thresholds)
- Add threshold tracking to ThreadBudget with triggered_thresholds set
- Change consume() to return (budget, crossed_thresholds) tuple
- Wire warning injection in LLM router when thresholds crossed
- Add 15 new tests for threshold detection and warning injection

Agents now receive BudgetWarning messages when approaching their token limit,
allowing them to design contingencies (summarize, escalate, save state).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-27 21:41:34 -08:00

66 lines
2 KiB
Python

"""
BudgetWarning — System alerts for token budget thresholds.
When a thread approaches its token budget limit, the system injects
BudgetWarning messages to give agents a chance to wrap up gracefully.
Thresholds:
- 75%: Early warning - consider wrapping up
- 90%: Critical warning - finish current task
- 95%: Final warning - immediate action required
Agents can design contingencies:
- Summarize progress and respond early
- Escalate to a supervisor agent
- Save state for continuation
- Request budget increase (if supported)
Example handler pattern:
async def my_agent(payload, metadata):
if isinstance(payload, BudgetWarning):
if payload.severity == "critical":
# Wrap up immediately
return HandlerResponse.respond(
payload=Summary(progress="Reached 90% budget, stopping here...")
)
# Otherwise note it and continue
...
"""
# Note: Do NOT use `from __future__ import annotations` here
# as it breaks the xmlify decorator which needs concrete types
from dataclasses import dataclass
from third_party.xmlable import xmlify
@xmlify
@dataclass
class BudgetWarning:
"""
System warning about token budget consumption.
Sent to agents when their thread crosses budget thresholds.
Attributes:
percent_used: Current percentage of budget consumed (0-100)
tokens_used: Total tokens consumed so far
tokens_remaining: Tokens remaining before exhaustion
max_tokens: Total budget for this thread
severity: Warning level (warning, critical, final)
message: Human-readable description
"""
percent_used: float
tokens_used: int
tokens_remaining: int
max_tokens: int
severity: str # "warning" (75%), "critical" (90%), "final" (95%)
message: str
# Default thresholds (can be configured)
DEFAULT_THRESHOLDS = {
75: "warning", # 75% - early warning
90: "critical", # 90% - wrap up soon
95: "final", # 95% - last chance
}