From 31d799fafe7949d9e704f603738e2c7ce95ac300 Mon Sep 17 00:00:00 2001
From: dullfig <dan.u@grmcompany.com>
Date: Tue, 27 Jan 2026 21:13:35 -0800
Subject: [PATCH] Document token budget and usage tracking in CLAUDE.md

- Add ThreadBudgetRegistry and UsageTracker to key modules table
- Add new section explaining three-layer token tracking architecture
- Document thread budget enforcement with code examples
- Document usage tracking subscription pattern for billing
- Update organism.yaml example with max_tokens_per_thread setting

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---
 CLAUDE.md | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)
diff --git a/CLAUDE.md b/CLAUDE.md
index 6df8ea0..6ecc49f 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -127,6 +127,8 @@ AgentServer implements a stream-based message pump where all communication flows
 | ThreadRegistry | `xml_pipeline/message_bus/thread_registry.py` | Maps opaque UUIDs to call chains |
 | SystemPipeline | `xml_pipeline/message_bus/system_pipeline.py` | External message injection (console, webhooks) |
 | LLMRouter | `xml_pipeline/llm/router.py` | Multi-backend LLM routing with failover |
+| ThreadBudgetRegistry | `xml_pipeline/message_bus/budget_registry.py` | Per-thread token limits and enforcement |
+| UsageTracker | `xml_pipeline/llm/usage_tracker.py` | Production billing and gas usage metering |
 | PromptRegistry | `xml_pipeline/platform/prompt_registry.py` | Immutable system prompt storage |
 | ContextBuffer | `xml_pipeline/memory/context_buffer.py` | Conversation history per thread |
 
@@ -235,6 +237,7 @@ See @docs/configuration.md for full reference.
 organism:
   name: my-organism
   port: 8765
+  max_tokens_per_thread: 100000  # Token budget per thread
 
 llm:
   strategy: failover
@@ -261,6 +264,65 @@ listeners:
 - **Envelope Injection:** `<from>`, `<thread>`, `<to>` always set by system, never by handlers
 - **OOB Channel:** Privileged commands use separate localhost-only channel
 
+## Token Budget & Usage Tracking
+
+The platform provides three layers of token tracking:
+
+| Layer | Module | Purpose |
+|-------|--------|---------|
+| Per-agent | `LLMRouter._agent_usage` | Internal token tracking per agent |
+| Per-thread | `ThreadBudgetRegistry` | Enforces per-thread limits (blocks LLM calls once exhausted) |
+| Platform | `UsageTracker` | Production billing and gas metering |
+
+### Thread Budget Enforcement
+
+Each thread has a token budget (default: 100,000 tokens). Once a thread's budget is exhausted, further LLM calls on that thread are blocked:
+
+```python
+from xml_pipeline.message_bus import get_budget_registry, BudgetExhaustedError
+
+registry = get_budget_registry()
+
+# Check the budget before an LLM call (the router does this automatically)
+try:
+    registry.check_budget(thread_id, estimated_tokens=1000)
+except BudgetExhaustedError as e:
+    print(f"Thread {e.thread_id} exhausted: {e.used}/{e.max_tokens}")
+```
+
+Configure via `organism.yaml`:
+
+```yaml
+organism:
+  name: my-organism
+  max_tokens_per_thread: 100000  # Default
+```
+
+### Usage Tracking (Billing)
+
+Subscribe to usage events for production billing:
+
+```python
+from xml_pipeline.llm import get_usage_tracker
+
+tracker = get_usage_tracker()
+
+# Subscribe to events (billing webhook, database, etc.); billing_db stands in for your own storage client
+def record_usage(event):
+    billing_db.record(
+        org_id=event.metadata.get("org_id"),
+        tokens=event.total_tokens,
+        cost=event.estimated_cost,  # USD estimate
+    )
+
+tracker.subscribe(record_usage)
+
+# Query totals
+totals = tracker.get_totals()
+print(f"Total tokens: {totals['total_tokens']}")
+print(f"Total cost: ${totals['total_cost']}")
+```
+
 ## Message Envelope Format
 
 All messages use the universal envelope with namespace `https://xml-pipeline.org/ns/envelope/v1`: