Document token budget and usage tracking in CLAUDE.md

- Add ThreadBudgetRegistry and UsageTracker to key modules table
- Add new section explaining three-layer token tracking architecture
- Document thread budget enforcement with code examples
- Document usage tracking subscription pattern for billing
- Update organism.yaml example with max_tokens_per_thread setting

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
dullfig 2026-01-27 21:13:35 -08:00
parent 8b11323a8b
commit 31d799fafe


@ -127,6 +127,8 @@ AgentServer implements a stream-based message pump where all communication flows
| ThreadRegistry | `xml_pipeline/message_bus/thread_registry.py` | Maps opaque UUIDs to call chains |
| SystemPipeline | `xml_pipeline/message_bus/system_pipeline.py` | External message injection (console, webhooks) |
| LLMRouter | `xml_pipeline/llm/router.py` | Multi-backend LLM routing with failover |
| ThreadBudgetRegistry | `xml_pipeline/message_bus/budget_registry.py` | Per-thread token limits and enforcement |
| UsageTracker | `xml_pipeline/llm/usage_tracker.py` | Production billing and gas usage metering |
| PromptRegistry | `xml_pipeline/platform/prompt_registry.py` | Immutable system prompt storage |
| ContextBuffer | `xml_pipeline/memory/context_buffer.py` | Conversation history per thread |
@ -235,6 +237,7 @@ See @docs/configuration.md for full reference.
organism:
  name: my-organism
  port: 8765
  max_tokens_per_thread: 100000  # Token budget per thread
llm:
  strategy: failover
@ -261,6 +264,65 @@ listeners:
- **Envelope Injection:** `<from>`, `<thread>`, `<to>` always set by system, never by handlers
- **OOB Channel:** Privileged commands use separate localhost-only channel
## Token Budget & Usage Tracking

The platform provides three layers of token tracking:

| Layer | Module | Purpose |
|-------|--------|---------|
| Per-agent | `LLMRouter._agent_usage` | Internal token tracking per agent |
| Per-thread | `ThreadBudgetRegistry` | Enforcement limits (blocks LLM calls) |
| Platform | `UsageTracker` | Production billing and gas metering |

### Thread Budget Enforcement

Each thread has a token budget (default: 100,000 tokens). Once the budget is exhausted, further LLM calls on that thread are blocked:
```python
from xml_pipeline.message_bus import get_budget_registry, BudgetExhaustedError

registry = get_budget_registry()

# Check before LLM call (automatic in router)
try:
    registry.check_budget(thread_id, estimated_tokens=1000)
except BudgetExhaustedError as e:
    print(f"Thread {e.thread_id} exhausted: {e.used}/{e.max_tokens}")
```
Configure via `organism.yaml`:
```yaml
organism:
  name: my-organism
  max_tokens_per_thread: 100000  # Default
```
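
To make the enforcement behavior concrete, here is a minimal standalone sketch of how a per-thread budget check could work. It is not the actual `ThreadBudgetRegistry` implementation: the class name `SketchBudgetRegistry`, the `consume` method, and the rule "raise when used plus estimated tokens would exceed the limit" are assumptions made for illustration. Only `check_budget`, `BudgetExhaustedError`, and its `thread_id`, `used`, and `max_tokens` attributes come from the example above.

```python
class BudgetExhaustedError(Exception):
    """Stand-in for the real exception, with the same three attributes."""

    def __init__(self, thread_id: str, used: int, max_tokens: int):
        super().__init__(f"Thread {thread_id} exhausted: {used}/{max_tokens}")
        self.thread_id = thread_id
        self.used = used
        self.max_tokens = max_tokens


class SketchBudgetRegistry:
    """Hypothetical in-memory registry: one running token total per thread."""

    def __init__(self, max_tokens_per_thread: int = 100_000):
        self.max_tokens = max_tokens_per_thread
        self._used: dict[str, int] = {}

    def check_budget(self, thread_id: str, estimated_tokens: int = 0) -> None:
        # Assumed rule: reject the call if it would push the thread over its limit.
        used = self._used.get(thread_id, 0)
        if used + estimated_tokens > self.max_tokens:
            raise BudgetExhaustedError(thread_id, used, self.max_tokens)

    def consume(self, thread_id: str, tokens: int) -> int:
        # Hypothetical method: record tokens actually spent, return the new total.
        self._used[thread_id] = self._used.get(thread_id, 0) + tokens
        return self._used[thread_id]


registry = SketchBudgetRegistry(max_tokens_per_thread=2_000)
registry.check_budget("thread-a", estimated_tokens=1_000)  # within budget
registry.consume("thread-a", 1_500)
try:
    registry.check_budget("thread-a", estimated_tokens=1_000)  # would exceed 2,000
except BudgetExhaustedError as e:
    print(f"Thread {e.thread_id} exhausted: {e.used}/{e.max_tokens}")
```

Budgets in this sketch are tracked independently per thread, so exhausting one thread does not affect calls on another.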
### Usage Tracking (Billing)
Subscribe to usage events for production billing:
```python
from xml_pipeline.llm import get_usage_tracker

tracker = get_usage_tracker()

# Subscribe to events (for billing webhook, database, etc.)
def record_usage(event):
    # billing_db is a placeholder for your own persistence layer
    billing_db.record(
        org_id=event.metadata.get("org_id"),
        tokens=event.total_tokens,
        cost=event.estimated_cost,  # USD estimate
    )

tracker.subscribe(record_usage)

# Query totals
totals = tracker.get_totals()
print(f"Total tokens: {totals['total_tokens']}")
print(f"Total cost: ${totals['total_cost']}")
```
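
The subscription pattern above can be illustrated with a small standalone sketch. This is not the real `UsageTracker`: the names `SketchUsageTracker`, `UsageEvent`, and the `record` method are hypothetical, and the event is assumed to carry only the three fields that `record_usage` reads (`total_tokens`, `estimated_cost`, `metadata`).

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class UsageEvent:
    """Hypothetical event shape, limited to the fields used in the example above."""
    total_tokens: int
    estimated_cost: float  # USD estimate
    metadata: dict = field(default_factory=dict)


class SketchUsageTracker:
    """Hypothetical tracker: keeps running totals and notifies subscribers."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[UsageEvent], None]] = []
        self._total_tokens = 0
        self._total_cost = 0.0

    def subscribe(self, callback: Callable[[UsageEvent], None]) -> None:
        self._subscribers.append(callback)

    def record(self, event: UsageEvent) -> None:
        # Assumed entry point: update totals, then fan the event out in order.
        self._total_tokens += event.total_tokens
        self._total_cost += event.estimated_cost
        for callback in self._subscribers:
            callback(event)

    def get_totals(self) -> dict:
        return {"total_tokens": self._total_tokens, "total_cost": self._total_cost}


tracker = SketchUsageTracker()
billed = []  # stands in for a billing database or webhook
tracker.subscribe(billed.append)
tracker.record(UsageEvent(total_tokens=100, estimated_cost=0.25, metadata={"org_id": "acme"}))
tracker.record(UsageEvent(total_tokens=50, estimated_cost=0.5))
print(tracker.get_totals())
```

Subscribers run synchronously in this sketch, so a slow billing callback would delay the caller. Whether the real tracker dispatches synchronously or asynchronously is not stated in this section.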
## Message Envelope Format
All messages use the universal envelope with namespace `https://xml-pipeline.org/ns/envelope/v1`: