- Create BudgetWarning primitive payload (75%, 90%, 95% thresholds)
- Add threshold tracking to ThreadBudget with triggered_thresholds set
- Change consume() to return (budget, crossed_thresholds) tuple
- Wire warning injection into the LLM router when thresholds are crossed
- Add 15 new tests for threshold detection and warning injection
Agents now receive BudgetWarning messages when approaching their token limit,
allowing them to design contingencies (summarize, escalate, save state).
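
A minimal sketch of how the threshold tracking might look, not the actual
implementation. Only BudgetWarning, ThreadBudget, consume(),
triggered_thresholds and the 75/90/95% thresholds come from this change;
the field names (max_tokens, used_tokens, threshold_percent) and the
warnings_for() helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Thresholds from the commit message, as integer percentages to avoid
# floating-point comparisons.
THRESHOLDS = (75, 90, 95)


@dataclass(frozen=True)
class BudgetWarning:
    # Hypothetical payload fields; the real payload may differ.
    threshold_percent: int
    used_tokens: int
    max_tokens: int


@dataclass
class ThreadBudget:
    max_tokens: int
    used_tokens: int = 0
    triggered_thresholds: Set[int] = field(default_factory=set)

    def consume(self, tokens: int) -> Tuple["ThreadBudget", List[int]]:
        """Record usage and return (budget, crossed_thresholds)."""
        self.used_tokens += tokens
        # A threshold is reported only the first time usage reaches it.
        crossed = [
            pct
            for pct in THRESHOLDS
            if pct not in self.triggered_thresholds
            and self.used_tokens * 100 >= pct * self.max_tokens
        ]
        self.triggered_thresholds.update(crossed)
        return self, crossed


def warnings_for(budget: ThreadBudget, crossed: List[int]) -> List[BudgetWarning]:
    """Hypothetical helper: build one warning per newly crossed threshold."""
    return [
        BudgetWarning(pct, budget.used_tokens, budget.max_tokens)
        for pct in crossed
    ]
```

Under these assumptions, a router would call consume() after each LLM
response and inject warnings_for(budget, crossed) into the thread whenever
crossed is non-empty. A single large call can cross several thresholds at
once, which is why consume() returns a list.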
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When threads terminate (handler returns None or chain exhausted),
the pump now calls budget_registry.cleanup_thread() to:
- Free memory for completed threads
- Return final budget for logging/billing
- Log token usage at debug level
This ensures budgets don't accumulate for completed conversations.
Also adds:
- has_budget() method to check whether a thread has a budget without creating one
- Tests for cleanup behavior
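
A rough sketch of the cleanup behavior described above, under stated
assumptions. cleanup_thread() and has_budget() are named in this change;
the BudgetRegistry class name, the get_budget() accessor, the dict-based
storage and the ThreadBudget fields are hypothetical.

```python
import logging
from dataclasses import dataclass
from typing import Dict, Optional

logger = logging.getLogger(__name__)


@dataclass
class ThreadBudget:
    # Hypothetical fields, reduced to what the sketch needs.
    max_tokens: int
    used_tokens: int = 0


class BudgetRegistry:
    def __init__(self, default_max_tokens: int = 100_000) -> None:
        self._budgets: Dict[str, ThreadBudget] = {}
        self._default_max_tokens = default_max_tokens

    def get_budget(self, thread_id: str) -> ThreadBudget:
        """Hypothetical accessor that creates a budget on first use."""
        if thread_id not in self._budgets:
            self._budgets[thread_id] = ThreadBudget(self._default_max_tokens)
        return self._budgets[thread_id]

    def has_budget(self, thread_id: str) -> bool:
        """Check for a budget without creating one as a side effect."""
        return thread_id in self._budgets

    def cleanup_thread(self, thread_id: str) -> Optional[ThreadBudget]:
        """Remove the thread's budget and return it for logging/billing."""
        budget = self._budgets.pop(thread_id, None)
        if budget is not None:
            logger.debug(
                "thread %s used %d/%d tokens",
                thread_id, budget.used_tokens, budget.max_tokens,
            )
        return budget
```

In this sketch cleanup_thread() returns None for an unknown thread, so the
pump could call it unconditionally on termination without first checking
has_budget().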
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>