Document token budget and usage tracking in CLAUDE.md
- Add ThreadBudgetRegistry and UsageTracker to key modules table
- Add new section explaining three-layer token tracking architecture
- Document thread budget enforcement with code examples
- Document usage tracking subscription pattern for billing
- Update organism.yaml example with max_tokens_per_thread setting

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
parent 8b11323a8b
commit 31d799fafe
1 changed file with 62 additions and 0 deletions
62
CLAUDE.md
@@ -127,6 +127,8 @@ AgentServer implements a stream-based message pump where all communication flows
| ThreadRegistry | `xml_pipeline/message_bus/thread_registry.py` | Maps opaque UUIDs to call chains |
| SystemPipeline | `xml_pipeline/message_bus/system_pipeline.py` | External message injection (console, webhooks) |
| LLMRouter | `xml_pipeline/llm/router.py` | Multi-backend LLM routing with failover |
| ThreadBudgetRegistry | `xml_pipeline/message_bus/budget_registry.py` | Per-thread token limits and enforcement |
| UsageTracker | `xml_pipeline/llm/usage_tracker.py` | Production billing and gas usage metering |
| PromptRegistry | `xml_pipeline/platform/prompt_registry.py` | Immutable system prompt storage |
| ContextBuffer | `xml_pipeline/memory/context_buffer.py` | Conversation history per thread |

@@ -235,6 +237,7 @@ See @docs/configuration.md for full reference.
organism:
  name: my-organism
  port: 8765
  max_tokens_per_thread: 100000  # Token budget per thread

llm:
  strategy: failover
@@ -261,6 +264,65 @@ listeners:
- **Envelope Injection:** `<from>`, `<thread>`, `<to>` always set by system, never by handlers
- **OOB Channel:** Privileged commands use separate localhost-only channel

## Token Budget & Usage Tracking

The platform provides three layers of token tracking:

| Layer | Module | Purpose |
|-------|--------|---------|
| Per-agent | `LLMRouter._agent_usage` | Internal token tracking per agent |
| Per-thread | `ThreadBudgetRegistry` | Enforcement limits (blocks LLM calls) |
| Platform | `UsageTracker` | Production billing and gas metering |
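
To illustrate how the three layers differ, here is a minimal, self-contained sketch of one LLM call being accounted for at each layer. It does not use the real `LLMRouter`, `ThreadBudgetRegistry`, or `UsageTracker` classes: the function `account_for_call`, the dictionaries, and the `usage_subscribers` list are hypothetical stand-ins, and the exact comparison used to decide when a thread is over budget is an assumption.

```python
from collections import defaultdict

# Hypothetical stand-ins for the three layers (not the real xml_pipeline classes).
agent_usage = defaultdict(int)    # per-agent: plain counters, never blocks a call
thread_usage = defaultdict(int)   # per-thread: compared against a hard limit
usage_subscribers = []            # platform: callbacks that receive every usage event

MAX_TOKENS_PER_THREAD = 100_000   # mirrors the documented default


def account_for_call(agent, thread_id, tokens):
    """Record one LLM call at all three layers; return False if the thread is over budget."""
    # Layer 2 (per-thread) is the only layer that can refuse the call.
    if thread_usage[thread_id] + tokens > MAX_TOKENS_PER_THREAD:
        return False
    thread_usage[thread_id] += tokens
    # Layer 1 (per-agent) only accumulates.
    agent_usage[agent] += tokens
    # Layer 3 (platform) fans the event out to billing subscribers.
    event = {"agent": agent, "thread_id": thread_id, "total_tokens": tokens}
    for callback in usage_subscribers:
        callback(event)
    return True


billed = []
usage_subscribers.append(billed.append)

accepted = account_for_call("greeter", "thread-1", 60_000)
rejected = account_for_call("greeter", "thread-1", 60_000)  # would exceed the limit
```

In this sketch the second call is refused before any counter changes, so the per-agent total and the billing events only reflect calls that were actually allowed.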

### Thread Budget Enforcement

Each thread has a token budget (default: 100,000 tokens). Once a thread's budget is exhausted, further LLM calls on that thread are blocked:

```python
from xml_pipeline.message_bus import get_budget_registry, BudgetExhaustedError

registry = get_budget_registry()

# Check before the LLM call (the router does this automatically).
# thread_id is the ID of the thread the call belongs to.
try:
    registry.check_budget(thread_id, estimated_tokens=1000)
except BudgetExhaustedError as e:
    print(f"Thread {e.thread_id} exhausted: {e.used}/{e.max_tokens}")
```

Configure via `organism.yaml`:

```yaml
organism:
  name: my-organism
  max_tokens_per_thread: 100000  # Default
```
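
To make the enforcement behaviour concrete, the following is a toy in-memory registry, not the real `ThreadBudgetRegistry`. The exception attributes (`thread_id`, `used`, `max_tokens`) and the `check_budget` call shape follow the example above; the class name `ToyBudgetRegistry`, the `consume` method, and the "used plus estimate exceeds the limit" rule are assumptions made for illustration.

```python
class BudgetExhaustedError(Exception):
    """Toy version of the exception; attribute names follow the example above."""

    def __init__(self, thread_id, used, max_tokens):
        super().__init__(f"Thread {thread_id} exhausted: {used}/{max_tokens}")
        self.thread_id = thread_id
        self.used = used
        self.max_tokens = max_tokens


class ToyBudgetRegistry:
    """Hypothetical in-memory registry; not the real ThreadBudgetRegistry."""

    def __init__(self, max_tokens_per_thread=100_000):
        self.max_tokens = max_tokens_per_thread
        self._used = {}  # thread_id -> tokens spent so far

    def check_budget(self, thread_id, estimated_tokens=0):
        """Raise if the estimated call would push the thread past its limit."""
        used = self._used.get(thread_id, 0)
        if used + estimated_tokens > self.max_tokens:
            raise BudgetExhaustedError(thread_id, used, self.max_tokens)

    def consume(self, thread_id, tokens):
        """Record tokens actually spent (assumed method name); return the remaining budget."""
        self._used[thread_id] = self._used.get(thread_id, 0) + tokens
        return self.max_tokens - self._used[thread_id]


registry = ToyBudgetRegistry(max_tokens_per_thread=2_000)
registry.check_budget("thread-1", estimated_tokens=1_000)  # passes: 0 + 1000 <= 2000
remaining = registry.consume("thread-1", 1_500)            # 500 tokens left

try:
    registry.check_budget("thread-1", estimated_tokens=1_000)
    blocked = False
except BudgetExhaustedError as e:
    blocked = True
    message = str(e)
```

Separating the check (before the call, using an estimate) from the consumption (after the call, using the real count) is one plausible design; whether the real registry works this way is not stated in this document.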

### Usage Tracking (Billing)

Subscribe to usage events for production billing:

```python
from xml_pipeline.llm import get_usage_tracker

tracker = get_usage_tracker()

# Subscribe to events (for billing webhook, database, etc.)
# billing_db is a placeholder for your own billing store.
def record_usage(event):
    billing_db.record(
        org_id=event.metadata.get("org_id"),
        tokens=event.total_tokens,
        cost=event.estimated_cost,  # USD estimate
    )

tracker.subscribe(record_usage)

# Query totals
totals = tracker.get_totals()
print(f"Total tokens: {totals['total_tokens']}")
print(f"Total cost: ${totals['total_cost']}")
```
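
The subscription pattern above can be sketched end to end with a toy tracker. This is not the real `UsageTracker`: the event fields (`total_tokens`, `estimated_cost`, `metadata`) and the `subscribe` / `get_totals` names follow the example above, while `UsageEvent`, `ToyUsageTracker`, and the `record` method are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class UsageEvent:
    """Toy event; field names follow the attributes used in the example above."""
    total_tokens: int
    estimated_cost: float
    metadata: dict = field(default_factory=dict)


class ToyUsageTracker:
    """Hypothetical in-memory tracker; not the real UsageTracker."""

    def __init__(self):
        self._subscribers = []
        self._totals = {"total_tokens": 0, "total_cost": 0.0}

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def record(self, event):
        """Update running totals, then notify every subscriber (assumed method name)."""
        self._totals["total_tokens"] += event.total_tokens
        self._totals["total_cost"] += event.estimated_cost
        for callback in self._subscribers:
            try:
                callback(event)
            except Exception:
                # This sketch drops subscriber errors so a failing billing hook
                # cannot break the caller; real code should at least log them.
                pass

    def get_totals(self):
        return dict(self._totals)  # copy, so callers cannot mutate internal state


tracker = ToyUsageTracker()
per_org = {}  # stands in for a billing database


def record_usage(event):
    org_id = event.metadata.get("org_id")
    per_org[org_id] = per_org.get(org_id, 0) + event.total_tokens


tracker.subscribe(record_usage)
tracker.record(UsageEvent(1200, 0.25, {"org_id": "acme"}))
tracker.record(UsageEvent(800, 0.5, {"org_id": "acme"}))
totals = tracker.get_totals()
```

Here the subscriber aggregates tokens per organisation while the tracker keeps platform-wide totals, which is the split the billing example relies on.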

## Message Envelope Format

All messages use the universal envelope with namespace `https://xml-pipeline.org/ns/envelope/v1`:
