xml-pipeline/xml_pipeline/llm
dullfig 8b11323a8b Add token budget enforcement and usage tracking
Token Budget System:
- ThreadBudgetRegistry tracks per-thread token usage with configurable limits
- BudgetExhaustedError is raised when a thread exceeds max_tokens_per_thread
- Integrates with LLMRouter to block LLM calls once a thread's budget is exhausted
- Automatic cleanup when threads are pruned
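
A minimal sketch of how the per-thread budget described above could work. Only the names ThreadBudgetRegistry, BudgetExhaustedError, and max_tokens_per_thread come from this commit message. The method names (check, record, used, forget), the lock, and the choice to block once usage reaches the limit are assumptions made for illustration, not the actual implementation.

```python
import threading


class BudgetExhaustedError(Exception):
    """Raised when a thread has used up its token budget."""


class ThreadBudgetRegistry:
    """Tracks token usage per thread (method names here are hypothetical)."""

    def __init__(self, max_tokens_per_thread: int = 100_000) -> None:
        self.max_tokens_per_thread = max_tokens_per_thread
        self._used: dict[str, int] = {}
        self._lock = threading.Lock()

    def used(self, thread_id: str) -> int:
        """Return the tokens recorded so far for this thread (0 if unknown)."""
        with self._lock:
            return self._used.get(thread_id, 0)

    def check(self, thread_id: str) -> None:
        """Call before an LLM request; raises if the budget is spent.

        Assumption: a thread is blocked once usage reaches the limit.
        """
        spent = self.used(thread_id)
        if spent >= self.max_tokens_per_thread:
            raise BudgetExhaustedError(
                f"thread {thread_id!r} used {spent} of "
                f"{self.max_tokens_per_thread} tokens"
            )

    def record(self, thread_id: str, tokens: int) -> int:
        """Call after a completion; returns the thread's new running total."""
        with self._lock:
            total = self._used.get(thread_id, 0) + tokens
            self._used[thread_id] = total
            return total

    def forget(self, thread_id: str) -> None:
        """Drop the counter, e.g. when the thread is pruned."""
        with self._lock:
            self._used.pop(thread_id, None)
```

In this sketch a router would call check() before contacting the backend and record() with the reported token count afterwards, so a single oversized completion can overshoot the limit but the next call is refused.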

Usage Tracking (for production billing):
- UsageTracker emits events after each LLM completion
- Subscribers receive UsageEvent with tokens, latency, estimated cost
- Cost estimation for common models (Grok, Claude, GPT, etc.)
- Aggregate stats by agent, model, and totals
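
The event flow above can be sketched roughly as follows. UsageTracker and UsageEvent are named in this commit message. The field names, the subscribe/record methods, and the price table are assumptions, and the prices are placeholders rather than real rates for any model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class UsageEvent:
    """One LLM completion (field names are assumptions)."""
    agent: str
    model: str
    prompt_tokens: int
    completion_tokens: int
    latency_ms: float
    estimated_cost: float


# Placeholder prices: (input, output) in USD per 1M tokens. Not real rates.
EXAMPLE_PRICES = {"example-model": (1.0, 2.0)}


class UsageTracker:
    """Emits a UsageEvent per completion and keeps running aggregates."""

    def __init__(self, prices: dict[str, tuple[float, float]]) -> None:
        self._prices = prices
        self._subscribers: list[Callable[[UsageEvent], None]] = []
        self.tokens_by_agent: dict[str, int] = defaultdict(int)
        self.tokens_by_model: dict[str, int] = defaultdict(int)
        self.total_tokens = 0
        self.total_cost = 0.0

    def subscribe(self, callback: Callable[[UsageEvent], None]) -> None:
        """Register a callback that receives every future event."""
        self._subscribers.append(callback)

    def record(self, agent: str, model: str, prompt_tokens: int,
               completion_tokens: int, latency_ms: float) -> UsageEvent:
        """Build the event, update aggregates, then notify subscribers."""
        # Models missing from the table are costed at zero in this sketch.
        in_price, out_price = self._prices.get(model, (0.0, 0.0))
        cost = (prompt_tokens * in_price
                + completion_tokens * out_price) / 1_000_000
        event = UsageEvent(agent, model, prompt_tokens,
                           completion_tokens, latency_ms, cost)
        tokens = prompt_tokens + completion_tokens
        self.tokens_by_agent[agent] += tokens
        self.tokens_by_model[model] += tokens
        self.total_tokens += tokens
        self.total_cost += cost
        for callback in self._subscribers:
            callback(event)
        return event
```

A billing consumer would subscribe once at startup, for example with tracker.subscribe(events.append), and read the aggregate dictionaries for reporting.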

Configuration:
- max_tokens_per_thread in organism.yaml (default: 100k tokens)
- LLMRouter.complete() accepts thread_id and metadata parameters
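
To illustrate the two configuration points above, here is a hedged sketch. It assumes max_tokens_per_thread is a top-level key in the parsed organism.yaml mapping, which the commit message does not state. SketchRouter is a hypothetical stand-in for LLMRouter: only the thread_id and metadata parameter names come from the source, while the prompt argument, the synchronous call, and the return value are invented for the example.

```python
from typing import Any, Optional

# "default 100k" from the commit message, read here as 100,000 tokens.
DEFAULT_MAX_TOKENS_PER_THREAD = 100_000


def resolve_max_tokens(config: dict[str, Any]) -> int:
    """Read the limit from an already-parsed organism.yaml mapping.

    Assumption: the key lives at the top level of the file.
    """
    value = config.get("max_tokens_per_thread", DEFAULT_MAX_TOKENS_PER_THREAD)
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(
            f"max_tokens_per_thread must be a positive integer, got {value!r}"
        )
    return value


class SketchRouter:
    """Hypothetical stand-in for LLMRouter, not the real class."""

    def __init__(self, max_tokens_per_thread: int) -> None:
        self.max_tokens_per_thread = max_tokens_per_thread
        self.calls: list[dict[str, Any]] = []

    def complete(self, prompt: str, thread_id: Optional[str] = None,
                 metadata: Optional[dict[str, Any]] = None) -> str:
        """Record which thread and metadata a call was attributed to."""
        self.calls.append(
            {"thread_id": thread_id, "metadata": dict(metadata or {})}
        )
        return f"(stub reply to {prompt!r})"


router = SketchRouter(resolve_max_tokens({"max_tokens_per_thread": 50_000}))
router.complete("hello", thread_id="thread-1", metadata={"agent": "greeter"})
```

Passing thread_id lets the call be counted against that thread's budget, and metadata gives usage events something to aggregate by, such as the calling agent.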

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-27 21:07:43 -08:00
File              Last commit                                              Date
__init__.py       Add token budget enforcement and usage tracking          2026-01-27 21:07:43 -08:00
backend.py        Rename agentserver to xml_pipeline, add console example  2026-01-19 21:41:19 -08:00
router.py         Add token budget enforcement and usage tracking          2026-01-27 21:07:43 -08:00
token_bucket.py   Rename agentserver to xml_pipeline, add console example  2026-01-19 21:41:19 -08:00
usage_tracker.py  Add token budget enforcement and usage tracking          2026-01-27 21:07:43 -08:00