- Create BudgetWarning primitive payload (75%, 90%, 95% thresholds)
- Add threshold tracking to ThreadBudget with triggered_thresholds set
- Change consume() to return (budget, crossed_thresholds) tuple
- Wire warning injection in LLM router when thresholds crossed
- Add 15 new tests for threshold detection and warning injection
Agents now receive BudgetWarning messages when approaching their token limit,
allowing them to design contingencies (summarize, escalate, save state).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Endpoints:
- GET /api/v1/usage - Overview with totals, per-agent, per-model breakdown
- GET /api/v1/usage/threads - List all thread budgets sorted by usage
- GET /api/v1/usage/threads/{id} - Single thread budget details
- GET /api/v1/usage/agents/{id} - Usage totals for specific agent
- GET /api/v1/usage/models/{model} - Usage totals for specific model
- POST /api/v1/usage/reset - Reset all usage tracking
Models:
- UsageTotals, UsageOverview, UsageResponse
- ThreadBudgetInfo, ThreadBudgetListResponse
- AgentUsageInfo, ModelUsageInfo
Also adds has_budget() method to ThreadBudgetRegistry for checking
if a thread exists without auto-creating it.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>