diff --git a/README.md b/README.md
index b7d3f97..b5747ba 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
-# AgentServer — The Living Substrate (v2.0)
+# AgentServer — The Living Substrate (v2.1)
***"It just works... safely."***
-**January 03, 2026**
+**January 06, 2026**
**Architecture: Autonomous Schema-Driven, Turing-Complete Multi-Agent Organism**
## The Rant
@@ -15,15 +15,15 @@ This project chooses XML deliberately. The organism enforces contracts exactly (
Read the full rant [here](docs/why-not-json.md) for the history, pitfalls, and why XML wins permanently.
## What It Is
-AgentServer is a production-ready substrate for the `xml-pipeline` nervous system. Version 2.0 stabilizes the design around exact XSD validation, typed dataclass handlers, mandatory hierarchical threading, and strict out-of-band privileged control.
+AgentServer is a production-ready substrate for the `xml-pipeline` nervous system. Version 2.1 evolves the design around parallel per-listener pipelines, true concurrent broadcast, opaque UUID threading for privacy, and blind agent self-iteration—all while preserving strict validation and handler purity.
-See [Core Architectural Principles](docs/core-principles-v2.0.md) for the single canonical source of truth.
+See [Core Architectural Principles](docs/core-principles-v2.1.md) for the single canonical source of truth.
## Core Philosophy
- **Autonomous DNA:** Listeners declare their contract via `@xmlify` dataclasses; the organism auto-generates XSDs, examples, and tool prompts.
- **Schema-Locked Intelligence:** Payloads validated directly against XSD (lxml) → deserialized to typed instances → pure handlers.
- **Multi-Response Tolerance:** Handlers return raw bytes; the bus wraps them in a dummy root element and extracts multiple payloads (perfect for parallel tool calls or dirty LLM output).
-- **Computational Sovereignty:** Turing-complete via self-calls, subthreading primitives, and visible reasoning — all bounded by thread hierarchy and local-only control.
+- **Computational Sovereignty:** Turing-complete via blind self-calls, subthreading primitives, concurrent broadcast, and visible reasoning — all bounded by private thread hierarchy and local-only control.
## Developer Experience — Create a Listener in 12 Lines
**No manual schemas. No brittle JSON conventions. No hand-written prompts.**
@@ -59,19 +59,23 @@ Unlike rigid platforms requiring custom mappings or fragile item structures, thi
### 1. The Autonomous Schema Layer
- Dataclass → cached XSD + example + rich tool prompt (mandatory description + field docs).
- Namespaces: `https://xml-pipeline.org/ns/<name>/<root>/v1` (served live via domain for discoverability).
+- Multiple listeners per root tag supported (broadcast parallelism).
### 2. Thread-Based Lifecycle & Reasoning
-- Mandatory `` with hierarchical IDs for reliable subthreading and audit trails.
-- LLM agents reason via open self-calls and optional ``.
+- Opaque `<thread>` UUIDs with a private hierarchical path registry for reliable subthreading, audit trails, and topology privacy.
+- LLM agents use unique root tags for blind self-iteration (no name knowledge or `<to>` routing needed).
+- Agents reason via open self-calls, multi-payload parallelism, and optional `<thinking>` scaffolding in visible text.
- All thought steps visible as messages — no hidden state.
### 3. Message Pump
-- Single linear pipeline with repair, C14N, XSD validation, deserialization, handler execution, and multi-payload extraction.
-- Supports clean tools and forgiving LLM streams alike.
-- Thread-base message queue with bounded memory.
+- Parallel preprocessing pipelines (one per listener) with central async pump orchestration.
+- True concurrency: pipeline tasks run in parallel; broadcast handlers execute concurrently via `asyncio.gather`.
+- Single linear flow per pipeline with repair, C14N, XSD validation, deserialization, handler execution, and multi-payload extraction.
+- Supports clean tools, forgiving LLM streams, and natural broadcast alike.
+- Thread-based message queue with bounded memory and fair scheduling.
### 4. Structural Control
-- Bootstrap from `organism.yaml`.
+- Bootstrap from `organism.yaml` (including unique root enforcement for agents).
- Runtime changes (hot-reload, add/remove listeners) via local-only OOB channel (localhost WSS or Unix socket — GUI-ready).
- Main bus oblivious to privileged ops.
@@ -86,9 +90,9 @@ Unlike rigid platforms requiring custom mappings or fragile item structures, thi
- **Format:** Exclusive C14N XML (wire sovereign).
## Why This Matters
-AgentServer v2.0 is a bounded, auditable, owner-controlled organism where the **XSD is the security**, the **thread is the memory**, and the **OOB channel is the sovereignty**.
+AgentServer v2.1 is a bounded, auditable, owner-controlled organism where the **XSD is the security**, the **private thread registry is the memory**, and the **OOB channel is the sovereignty**.
One port. Many bounded minds. Autonomous yet obedient evolution. 🚀
---
-*XML wins. Safely. Permanently.*
\ No newline at end of file
+*XML wins. Safely. Permanently.*
diff --git a/README_v2.0.md b/README_v2.0.md
new file mode 100644
index 0000000..b7d3f97
--- /dev/null
+++ b/README_v2.0.md
@@ -0,0 +1,94 @@
+# AgentServer — The Living Substrate (v2.0)
+***"It just works... safely."***
+
+**January 03, 2026**
+**Architecture: Autonomous Schema-Driven, Turing-Complete Multi-Agent Organism**
+
+## The Rant
+**Why XML?**
+[Why not JSON?](docs/why-not-json.md)
+
+XML is the sovereign wire format — standards-based, self-describing, attack-resistant, and evolvable without drift. JSON was a quick hack that escaped into the wild and became the default for everything, including AI tool calling, where its brittleness causes endless prompt surgery and validation headaches.
+
+This project chooses XML deliberately. The organism enforces contracts exactly (XSD validation, no transcription bugs), tolerates dirty streams (repair + dummy extraction), and keeps reasoning visible. No fragile conventions. No escaping hell. Just bounded, auditable computation.
+
+Read the full rant [here](docs/why-not-json.md) for the history, pitfalls, and why XML wins permanently.
+
+## What It Is
+AgentServer is a production-ready substrate for the `xml-pipeline` nervous system. Version 2.0 stabilizes the design around exact XSD validation, typed dataclass handlers, mandatory hierarchical threading, and strict out-of-band privileged control.
+
+See [Core Architectural Principles](docs/core-principles-v2.0.md) for the single canonical source of truth.
+
+## Core Philosophy
+- **Autonomous DNA:** Listeners declare their contract via `@xmlify` dataclasses; the organism auto-generates XSDs, examples, and tool prompts.
+- **Schema-Locked Intelligence:** Payloads validated directly against XSD (lxml) → deserialized to typed instances → pure handlers.
+- **Multi-Response Tolerance:** Handlers return raw bytes; the bus wraps them in a dummy root element and extracts multiple payloads (perfect for parallel tool calls or dirty LLM output).
+- **Computational Sovereignty:** Turing-complete via self-calls, subthreading primitives, and visible reasoning — all bounded by thread hierarchy and local-only control.
+
+## Developer Experience — Create a Listener in 12 Lines
+**No manual schemas. No brittle JSON conventions. No hand-written prompts.**
+Just declare a dataclass contract and a one-line human description. The organism handles validation, XSD, examples, and tool prompts automatically.
+
+```python
+from xmlable import xmlify
+from dataclasses import dataclass
+from xml_pipeline import Listener, bus # bus is the global MessageBus
+
+@xmlify
+@dataclass
+class AddPayload:
+ a: int
+ b: int
+
+def add_handler(payload: AddPayload) -> bytes:
+ result = payload.a + payload.b
+    return f"<sum>{result}</sum>".encode("utf-8")
+
+Listener(
+ payload_class=AddPayload,
+ handler=add_handler,
+ name="calculator.add",
+ description="Adds two integers and returns their sum."
+).register() # ← Boom: XSD, example, prompt auto-generated + registered
+```
+
+The organism now speaks `<addpayload>` — fully validated, typed, and discoverable.
+Unlike rigid platforms requiring custom mappings or fragile item structures, this is pure Python — typed, testable, and sovereign.
+
+## Key Features
+### 1. The Autonomous Schema Layer
+- Dataclass → cached XSD + example + rich tool prompt (mandatory description + field docs).
+- Namespaces: `https://xml-pipeline.org/ns/<name>/<root>/v1` (served live via domain for discoverability).
+
+### 2. Thread-Based Lifecycle & Reasoning
+- Mandatory `<thread>` with hierarchical IDs for reliable subthreading and audit trails.
+- LLM agents reason via open self-calls and optional `<thinking>`.
+- All thought steps visible as messages — no hidden state.
+
+### 3. Message Pump
+- Single linear pipeline with repair, C14N, XSD validation, deserialization, handler execution, and multi-payload extraction.
+- Supports clean tools and forgiving LLM streams alike.
+- Thread-based message queue with bounded memory.
+
+### 4. Structural Control
+- Bootstrap from `organism.yaml`.
+- Runtime changes (hot-reload, add/remove listeners) via local-only OOB channel (localhost WSS or Unix socket — GUI-ready).
+- Main bus oblivious to privileged ops.
+
+### 5. Federation & Introspection
+- YAML-declared gateways with trusted keys.
+- Controlled meta queries (schema/example/prompt/capability list).
+
+## Technical Stack
+- **Validation & Parsing:** lxml (XSD, C14N, repair) + xmlable (round-trip).
+- **Protocol:** Mandatory WSS (TLS) + TOTP on main port.
+- **Identity:** Ed25519 (signing, federation, privileged).
+- **Format:** Exclusive C14N XML (wire sovereign).
+
+## Why This Matters
+AgentServer v2.0 is a bounded, auditable, owner-controlled organism where the **XSD is the security**, the **thread is the memory**, and the **OOB channel is the sovereignty**.
+
+One port. Many bounded minds. Autonomous yet obedient evolution. 🚀
+
+---
+*XML wins. Safely. Permanently.*
\ No newline at end of file
diff --git a/docs/core-principles-v2.0.md b/docs/core-principles-v2.0.md
index ba89e75..d940b2a 100644
--- a/docs/core-principles-v2.0.md
+++ b/docs/core-principles-v2.0.md
@@ -17,8 +17,8 @@ These principles are the single canonical source of truth for the project. All d
- On response generation (after handler execution and multi-payload extraction):
- The dispatcher injects `<from>` using the executing listener's registered name (e.g., "calculator.add" or "researcher").
- For meta/primitive responses: `<from>` is injected as "core".
-- is inherited from the incoming message (or assigned/updated for primitives like spawn-thread).
-- remains optional and rarely used.
+- `<thread>` is inherited from the incoming message (or assigned/updated for primitives like spawn-thread).
+- `<to>` remains optional and rarely used.
- This ensures every enveloped message has a trustworthy, auditable `<from>` without handler involvement, preventing spoofing and keeping capability code minimal/testable.
## Configuration & Composition
@@ -49,6 +49,8 @@ These principles are the single canonical source of truth for the project. All d
- Each extracted payload wrapped in separate response envelope (inherits thread/from, optional new subthread if primitive used)
- Enveloped responses buffered and sent sequentially
- Supports single clean response, multi-payload emission (parallel tools/thoughts), and dirty LLM output tolerance.
+- Message pump tracks token budgets per agent and thread, enforcing token limits and preventing abuse. The LLM abstraction layer informs the message bus of the actual token usage.
+- Message pump uses asynchronous non-blocking I/O for maximum throughput.
## Reasoning & Iteration
- LLM agents iterate via open self-calls (same root tag, same thread ID).
@@ -95,4 +97,7 @@ These principles are the single canonical source of truth for the project. All d
- Ensures isolation across conversations, automatic cleanup on idle, and minimal mutable state.
- Handler closes over or receives UUID for access — still oblivious to readable path.
+## Resource Stewardship
+- The Message Pump ensures fair execution and prevents "Paperclip" runaway scenarios via internal Thread-Level Scheduling. Every thread is subject to Token-Rate Monitoring and Fair-Share Queuing, ensuring that a high-volume agent (like a deep-thinking LLM) cannot block high-priority system events or starve simpler organs (like tools).
+
These principles are now locked. All existing docs will be updated to match this file exactly. Future changes require explicit discussion and amendment here first.
\ No newline at end of file
diff --git a/docs/core-principles-v2.1.md b/docs/core-principles-v2.1.md
new file mode 100644
index 0000000..fb2fd84
--- /dev/null
+++ b/docs/core-principles-v2.1.md
@@ -0,0 +1,112 @@
+# AgentServer v2.1 — Core Architectural Principles
+**January 06, 2026**
+**Architecture: Autonomous Schema-Driven, Turing-Complete Multi-Agent Organism**
+
+These principles are the single canonical source of truth for the project. All documentation, code, and future decisions must align with this file. This version incorporates Message Pump v2.1 parallelism and refines agent iteration patterns for blind, name-oblivious self-routing.
+
+## Identity & Communication
+- All traffic uses the universal `<message>` envelope defined in `envelope.xsd` (namespace `https://xml-pipeline.org/ns/envelope/v1`).
+- Mandatory `<from>` and `<thread>` (thread IDs are opaque UUID strings; the system privately maps them to hierarchical paths for subthreading and audit trails).
+- Optional `<to>` (for rare direct routing; most flows use the payload root tag with broadcast semantics).
+- Exclusive C14N on ingress and egress.
+- Malformed XML is repaired on ingress; repairs are logged in envelope metadata.
+
+## Identity Injection & Handler Purity
+- Handlers are pure, stateless functions with no knowledge of routing, thread context, their own registered name, or organism topology.
+- On ingress (external or gateway messages): `<from>` is provided and authenticated by the client/gateway (enforced by envelope validation).
+- On response generation (after handler execution and multi-payload extraction):
+ - The message pump injects `<from>` using the executing listener's registered name (e.g., "calculator.add" or "researcher").
+ - For meta/primitive responses: `<from>` is injected as "core".
+- `<thread>` is inherited from the incoming message (or assigned/updated for primitives like spawn-thread).
+- `<to>` remains optional and rarely used.
+- This ensures every enveloped message has a trustworthy, auditable `<from>` without handler involvement, preventing spoofing and keeping capability code minimal/testable.
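The injection rule above can be sketched in a few lines. `Listener` and `respond` are illustrative stand-ins for the real registry and lxml-based envelope builder, and the handler returns a string rather than bytes for brevity:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Listener:
    name: str                      # registered name, e.g. "calculator.add"
    handler: Callable[[str], str]  # pure function: payload in, payload out

def respond(listener: Listener, payload: str, thread_id: str) -> str:
    """The pump, not the handler, supplies <from> from the registration
    record and inherits <thread> from the incoming message."""
    body = listener.handler(payload)  # handler never sees routing details
    return (f"<message><from>{listener.name}</from>"
            f"<thread>{thread_id}</thread>{body}</message>")

add = Listener("calculator.add", lambda p: f"<sum>{p}</sum>")
env = respond(add, "7", "uuid-1234")
```

Because the handler only ever maps payload to payload, a spoofed `<from>` cannot originate in capability code.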
+
+## Configuration & Composition
+- YAML file (`organism.yaml`) is the bootstrap source of truth, loaded at startup.
+- Defines initial listeners, agents, gateways, meta privileges, OOB channel configuration, and routing table (including multiple listeners per root tag).
+- LLM-based agents must use unique root tags (enforced on registration/hot-reload) to enable blind self-iteration.
+- Runtime structural changes (add/remove listeners, rewire agents, etc.) via local-only privileged commands on the dedicated OOB channel (hot-reload capability).
+- No remote or unprivileged structural changes ever.
+
+## Autonomous Schema Layer
+- Listeners defined by `@xmlify`-decorated dataclass (payload contract) + pure handler function.
+- Mandatory human-readable description string (short "what this does" blurb for tool prompt lead-in).
+- Registration (at startup or via hot-reload) automatically generates:
+ - XSD cached on disk (`schemas/<name>/v1.xsd`)
+ - Example XML
+ - Tool description prompt fragment (includes description, params with field docs if present, example input)
+- All capability namespaces under `https://xml-pipeline.org/ns/<name>/<root>/v1`.
+- Root element derived from the payload class name (lowercase) or set explicitly.
+- Multiple listeners may register for the same root tag (enabling broadcast parallelism); LLM agents require unique root tags.
+
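The registration rules above (shared roots for tools, unique roots for agents) can be sketched as a small routing table. `routing`, `register`, and the `is_agent` flag are hypothetical names, not the project's actual API:

```python
# Hypothetical registry sketch; not the project's actual API.
routing: dict[str, list[dict]] = {}

def register(root_tag: str, name: str, is_agent: bool = False) -> None:
    existing = routing.get(root_tag, [])
    # Agents need exclusive ownership of their root tag (blind self-routing).
    if is_agent and existing:
        raise ValueError(f"agent needs a unique root tag; {root_tag!r} is taken")
    if any(entry["is_agent"] for entry in existing):
        raise ValueError(f"root tag {root_tag!r} is reserved by an agent")
    routing.setdefault(root_tag, []).append({"name": name, "is_agent": is_agent})

register("add", "calculator.add")    # plain tools may share a root tag
register("add", "calculator.audit")  # broadcast: both receive the message
register("research", "researcher", is_agent=True)
```

A later `register("research", ...)` for any other listener would be rejected, which is exactly what makes an agent's self-emitted `<research>` payloads route back to itself.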
+## Message Pump
+- Parallel preprocessing pipelines (one per registered listener): ingress → repair → C14N → envelope validation → payload extraction → XSD validation → deserialization → error injection on failure.
+- Central async message pump orchestrates:
+ - Gathering ready messages from pipeline outputs
+ - Routing lookup: direct (`<to>`) or broadcast (all listeners for the root tag; unique roots naturally self-route for agents)
+ - Launching concurrent thin dispatchers (`await handler(msg)`)
+ - Response processing: multi-payload extraction (dummy wrap → parse → extract), envelope creation with `<from>` injection, re-injection to target pipelines
+- Thin, stateless dispatcher: pure async delivery mechanism with no loops or orchestration.
+- Supports true parallelism: pipeline preprocessing runs concurrently, and broadcast handlers execute concurrently via `asyncio.gather`.
+- Validation failures inject structured error elements into the thread (LLM-friendly self-correction).
+- Message pump tracks token budgets per agent and thread, enforcing limits and preventing abuse. The LLM abstraction layer informs the message bus of actual token usage.
+- Message pump uses asynchronous non-blocking I/O for maximum throughput, with provisions for concurrency limits, fair scheduling, and backpressure.
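The dummy-wrap multi-payload extraction step can be illustrated with the standard library's `ElementTree` (the organism itself uses lxml, but the mechanism is the same):

```python
import xml.etree.ElementTree as ET

def extract_payloads(raw: bytes) -> list[str]:
    """Wrap possibly-multiple root payloads in a dummy root, parse once,
    then split the children into individual payload strings, one per
    response envelope."""
    wrapped = b"<dummy>" + raw + b"</dummy>"
    root = ET.fromstring(wrapped)
    return [ET.tostring(child, encoding="unicode") for child in root]

# A handler emitted two tool calls back-to-back in one byte string:
parts = extract_payloads(b"<search><q>xml</q></search><calc><a>1</a></calc>")
```

Each extracted part then gets its own envelope, which is how one LLM turn fans out into parallel tool calls.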
+
+## Reasoning & Iteration
+- LLM agents iterate via blind self-calls: with unique root tags, emitting payloads under their own root tag automatically routes back to themselves—no `<to>` routing, name knowledge, or special primitives required.
+- This enables natural looping, multi-step continuation, and parallelism via multi-payload emission (e.g., simultaneous tool calls or branched reasoning).
+- Conversation thread = complete memory and audit trail (all messages logged).
+- Subthreading natively supported via private hierarchical path registry (system appends/prunes listener names on delegation/responses; agents see only opaque UUIDs).
+- Optional structured constructs like `<plan>` are encouraged in visible reasoning text (e.g., inside `<thinking>`) for planning clarity and auditability, but they are not system-interpreted.
+- Broadcast enables natural parallelism (e.g., simultaneous tool calls) without agent-managed fan-out.
+- Thread path privacy: Agents remain oblivious to topology, preventing leakage or probing.
+- No hidden loops or state machines; all reasoning steps are visible messages.
+
+## Security & Sovereignty
+- Privileged messages (per `privileged-msg.xsd`) handled exclusively on dedicated OOB channel.
+- OOB channel bound to localhost by default (safe for local GUI); separate port/socket from main bus.
+- Main message pump and dispatcher oblivious to privileged operations — no routing or handling for privileged roots.
+- Remote privileged attempts impossible (channel not exposed); any leak to main port logged as security event and dropped.
+- Ed25519 identity key used for envelope signing, federation auth, and privileged command verification.
+- No agent may modify organism structure, register listeners, or access host resources beyond declared scope.
+- Opaque thread UUIDs + private path registry prevent topology disclosure.
+- “No Paperclippers” manifesto injected as first system message for every LLM-based listener.
+
+## Federation
+- Gateways declared in YAML with trusted remote public key.
+- Remote tools referenced by gateway name in agent tool lists.
+- Regular messages flow bidirectionally; privileged messages never forwarded or accepted.
+
+## Introspection (Meta)
+- Controlled via YAML flags (`allow_list_capabilities`, `allow_schema_requests`, etc.).
+- Supports `request-schema`, `request-example`, `request-prompt`, `list-capabilities`.
+- Remote meta queries optionally allowed per YAML (federation peers).
+
+## Technical Constraints
+- Mandatory WSS (TLS) + TOTP on main port.
+- OOB channel WSS or Unix socket, localhost-default.
+- Internal: lxml trees → XSD validation → xmlable deserialization → dataclass → handler → bytes → dummy extraction → multi-envelope re-injection.
+- Single process, async non-blocking.
+- XML is the sovereign wire format; everything else is implementation detail.
+
+## Scheduled Computation
+- Timers and delays implemented as normal listeners using async sleeps.
+- Caller idles naturally; wakeup messages bubble back via standard tracing.
+- Enables recurrent tasks (e.g., periodic monitoring) without blocking or external schedulers.
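A minimal sketch of such a timer listener using plain `asyncio`; the `emit` callback here stands in for re-injecting a wakeup message into the pump:

```python
import asyncio

async def timer_listener(delay_s: float, thread_id: str, emit) -> None:
    # Sleep instead of blocking: the event loop keeps servicing other organs.
    await asyncio.sleep(delay_s)
    emit(f"<wakeup><thread>{thread_id}</thread></wakeup>")

async def main() -> list[str]:
    out: list[str] = []
    # Two timers on different threads run concurrently on one event loop.
    await asyncio.gather(
        timer_listener(0.01, "t1", out.append),
        timer_listener(0.02, "t2", out.append),
    )
    return out

wakeups = asyncio.run(main())
```

The caller simply idles on its thread until the wakeup payload bubbles back, so no external scheduler is needed.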
+
+## Bounded Stateful Listeners
+- Pure tools remain stateless.
+- Stateful capabilities (e.g., calculator memory, game state) store data per thread path UUID.
+- Ensures isolation across conversations, automatic cleanup on idle, and minimal mutable state.
+- Handler closes over or receives UUID for access — still oblivious to readable path.
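A minimal sketch of per-thread-UUID state, using a hypothetical `CalculatorMemory` capability:

```python
from collections import defaultdict

class CalculatorMemory:
    """Hypothetical stateful capability: a running total per conversation."""

    def __init__(self) -> None:
        # Keyed by opaque thread UUID, so parallel threads never share state.
        self._store: dict[str, float] = defaultdict(float)

    def add(self, thread_uuid: str, x: float) -> float:
        self._store[thread_uuid] += x
        return self._store[thread_uuid]

    def forget(self, thread_uuid: str) -> None:
        self._store.pop(thread_uuid, None)  # cleanup when a thread goes idle

mem = CalculatorMemory()
mem.add("uuid-a", 2.0)
mem.add("uuid-b", 10.0)  # a different conversation, fully isolated
```

The handler only ever sees the opaque UUID it is given, so isolation holds even though it never learns the readable thread path.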
+
+## Resource Stewardship
+- The Message Pump ensures fair execution and prevents "Paperclip" runaway scenarios via Thread-Level Scheduling and Concurrency Controls. Every thread is subject to Token-Rate Monitoring and Fair-Share Queuing, ensuring that a high-volume agent cannot block high-priority events or starve simpler organs.
+
+These principles are now locked for v2.1. The Message Pump v2.1 specification remains the canonical detail for pump behavior. Future changes require explicit discussion and amendment here first.
\ No newline at end of file
diff --git a/docs/message-pump.md b/docs/message-pump-v2.0.md
similarity index 85%
rename from docs/message-pump.md
rename to docs/message-pump-v2.0.md
index 74e8ecb..5b36fb3 100644
--- a/docs/message-pump.md
+++ b/docs/message-pump-v2.0.md
@@ -47,8 +47,7 @@ flowchart TD
subgraph MessagePump
subgraph Init
start([Start])
- raw[/Optional
Raw Bytes/]
- wrapstart["Wrap
<start>{...}</start>"]
+ wrapstart["Boot Msg
<message>{...}</message>"]
end
enq1([QUEUE 1])
rawwaiting{Raw
Msg
Waiting?}
@@ -79,7 +78,7 @@ flowchart TD
end
end
- start --> raw --> wrapstart --> enq1 --> rawwaiting
+ start --> wrapstart --> enq1 --> rawwaiting
rawwaiting --> |NO| waitRaw
rawwaiting ---> |YES| extract
extract --> split --> foreach
@@ -96,6 +95,16 @@ flowchart TD
```
## Detailed Stages (Per-Message)
+### Boot Message
+ - Since all agents are listeners, nothing external could otherwise inject a first message to start the pump.
+ - The boot message is a dummy message enqueued to the root thread buffer; any listener may choose to register a root tag for it.
+ - The root thread buffer is the only one drained by the dispatcher loop.
+ - If a listener (such as a human agent) is registered for the boot message, it receives the boot message and then asynchronously awaits keyboard input.
+### Queue 1
+ - The first buffer holds raw unprocessed messages from the network.
+### Queue 2
+ - The second buffer holds messages that have been processed and are ready to be sent back to the network.
1. **Ingress/Enqueue**: Raw bytes → repair → preliminary tree → enqueue to target thread buffer.
2. **Dispatcher Loop**: Single async non-blocking loop selects next message from per-thread queues (breadth-first default for fairness).
diff --git a/docs/token-scheduling-issues.md b/docs/token-scheduling-issues.md
new file mode 100644
index 0000000..b1b57bd
--- /dev/null
+++ b/docs/token-scheduling-issues.md
@@ -0,0 +1,83 @@
+# Proposed Scheduling Enhancements for Token-Constrained Environments
+
+**January 05, 2026**
+
+These ideas (originally surfaced by Gemini) introduce token-aware and fairness-oriented scheduling on top of the current AgentServer v2.0 message pump. The current pump already uses per-thread queues with configurable breadth-first or depth-first draining, but these suggestions shift focus to **per-agent** fairness and explicit handling of shared LLM API rate limits (TPM/RPM).
+
+They are presented here cleaned up and structured for easier discussion and potential adoption.
+
+### Per-Agent Buffer Pools (Targeted Fairness)
+
+**Concept**
+Instead of a single global queue or purely thread-based queues, each registered agent (especially LLM-based listeners) gets its own dedicated message buffer.
+
+**Benefit**
+- Guarantees "system attention" round-robin across all agents.
+- Prevents a high-volume agent (e.g., a central researcher or coordinator) from starving others.
+- Caps the number of active queues to the number of agents rather than potentially unbounded threads.
+
+### Token-Aware Weighted Deficit Round Robin (TA-WDRR)
+
+**Concept**
+Treat the provider's Tokens Per Minute (TPM) limit as a shared "power budget". Each agent maintains a deficit counter that accumulates each scheduling round.
+
+**Logic**
+1. The pump looks at the next message in an agent's buffer.
+2. It estimates the token cost of that message (prompt + max_tokens).
+3. If the cost exceeds the agent's current deficit **or** the remaining global budget, skip that agent and try another.
+4. Select a smaller job from another agent that fits the remaining budget "bin".
+
+**Benefit**
+Maximizes overall throughput by opportunistically filling small budget gaps with lightweight tasks while large reasoning jobs wait for the next budget refill.
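The steps above can be sketched as a tiny scheduling pass. All names and token costs are illustrative, and the queues hold pre-estimated token costs rather than real messages:

```python
from collections import deque

def schedule_round(queues: dict[str, deque], deficits: dict[str, int],
                   quantum: int, budget: int) -> list[tuple[str, int]]:
    """One TA-WDRR pass: run a job only if its estimated token cost fits
    both the agent's accumulated deficit and the remaining global budget."""
    ran = []
    for agent, q in queues.items():
        deficits[agent] += quantum  # each round grants a fair share
        while q and q[0] <= deficits[agent] and q[0] <= budget:
            cost = q.popleft()      # queues hold estimated token costs
            deficits[agent] -= cost
            budget -= cost
            ran.append((agent, cost))
    return ran

# A heavy reasoner and a lightweight tool share a 1000-token window:
queues = {"researcher": deque([900, 900]), "tooler": deque([50, 50, 50])}
deficits = {"researcher": 0, "tooler": 0}
ran = schedule_round(queues, deficits, quantum=1000, budget=1000)
```

Here the researcher's second 900-token job must wait for the next refill, while small tool jobs opportunistically fill the leftover 100-token gap.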
+
+### Adaptive Congestion Control (Servo Loop)
+
+**Concept**
+A feedback control system that reacts to 429 rate-limit errors by dynamically adjusting the pump's assumed TPM budget.
+
+**Mechanism**
+- Uses Additive Increase Multiplicative Decrease (AIMD).
+- On rate-limit error → immediately halve the local TPM budget.
+- On successful calls → gradually increase the budget to probe the provider's true capacity.
+
+**Benefit**
+Avoids "thundering herd" retry storms and smoothly converges to the maximum sustainable rate without manual tuning.
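The AIMD rule itself is tiny; the constants below are illustrative defaults, not tuned values:

```python
def aimd_step(budget: float, rate_limited: bool,
              increase: float = 1_000.0, floor: float = 1_000.0) -> float:
    """One servo-loop update of the assumed TPM budget."""
    if rate_limited:
        return max(budget / 2, floor)  # multiplicative decrease on a 429
    return budget + increase           # additive increase probes capacity

budget = 60_000.0
budget = aimd_step(budget, rate_limited=True)   # 429: budget halves
budget = aimd_step(budget, rate_limited=False)  # success: budget creeps up
```

Run on every API result, this converges on the provider's sustainable rate without any hard-coded limit.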
+
+### Feedforward Prep-Side Token Estimation
+
+**Concept**
+During early message preparation (repair, validation, deserialization), pre-compute and tag each message with an `estimated_tokens` attribute.
+
+**Logic**
+The scheduler can use this tag to make informed decisions **before** handing the message to the LLM abstraction layer, avoiding late failures.
+
+**Benefit**
+Enables predictive skipping/reordering without waiting for the API call to fail.
+
+### Context-Isolated Memory via Dot-Notation Thread Paths
+
+**Concept**
+Use the full hierarchical thread path (e.g., `sess-abcd1234.researcher.search.calc`) as the unique key for per-conversation memory/state.
+
+**Logic**
+Even when multiple threads call the same shared tool/agent, their histories and any "memory button" state remain strictly partitioned by the thread path.
+
+**Benefit**
+- Prevents context poisoning across parallel branches.
+- Allows stateless, horizontally scalable agents while preserving private conversation continuity.
+
+### Key Metrics for Monitoring ("Speedometer")
+
+| Metric | Description |
+|-------------------------|-----------------------------------------------------------------------------|
+| TPM (Tokens Per Minute) | Real-time rolling window of input + output tokens across the organism. |
+| RPM (Requests Per Minute) | Count of individual API calls to avoid separate request-rate throttling. |
+| Reservation Variance | Difference between reserved max_tokens and actual consumption; used to "refund" budget in real time. |
+
+These enhancements would layer naturally on top of the existing per-thread queue model:
+
+- Threads remain the unit of conversation memory and hierarchical tracing (unchanged).
+- Agents become the unit of scheduling fairness and token budgeting.
+- The dispatcher loop could select the next **agent** to service (round-robin or weighted), then drain the highest-priority thread queue belonging to that agent, applying the token-aware checks.
+
+This keeps the current thread-oblivious, provenance-preserving design while adding production-grade rate-limit resilience and fairness for LLM-heavy workloads. Worth considering for a future v2.1 scheduling module.
\ No newline at end of file