fixing docs
This commit is contained in:
parent ed66b16821
commit 2e8ed88e42
8 changed files with 735 additions and 18 deletions

README.md (28 changes)
@@ -1,7 +1,7 @@
# AgentServer — The Living Substrate (v2.0)
# AgentServer — The Living Substrate (v2.1)
***"It just works... safely."***

**January 03, 2026**
**January 06, 2026**
**Architecture: Autonomous Schema-Driven, Turing-Complete Multi-Agent Organism**

## The Rant
@@ -15,15 +15,15 @@ This project chooses XML deliberately. The organism enforces contracts exactly (
Read the full rant [here](docs/why-not-json.md) for the history, pitfalls, and why XML wins permanently.

## What It Is
AgentServer is a production-ready substrate for the `xml-pipeline` nervous system. Version 2.0 stabilizes the design around exact XSD validation, typed dataclass handlers, mandatory hierarchical threading, and strict out-of-band privileged control.
AgentServer is a production-ready substrate for the `xml-pipeline` nervous system. Version 2.1 evolves the design around parallel per-listener pipelines, true concurrent broadcast, opaque UUID threading for privacy, and blind agent self-iteration—all while preserving strict validation and handler purity.

See [Core Architectural Principles](docs/core-principles-v2.0.md) for the single canonical source of truth.
See [Core Architectural Principles](docs/core-principles-v2.1.md) for the single canonical source of truth.

## Core Philosophy
- **Autonomous DNA:** Listeners declare their contract via `@xmlify` dataclasses; the organism auto-generates XSDs, examples, and tool prompts.
- **Schema-Locked Intelligence:** Payloads validated directly against XSD (lxml) → deserialized to typed instances → pure handlers.
- **Multi-Response Tolerance:** Handlers return raw bytes; bus wraps in `<dummy></dummy>` and extracts multiple payloads (perfect for parallel tool calls or dirty LLM output).
- **Computational Sovereignty:** Turing-complete via self-calls, subthreading primitives, and visible reasoning — all bounded by thread hierarchy and local-only control.
- **Computational Sovereignty:** Turing-complete via blind self-calls, subthreading primitives, concurrent broadcast, and visible reasoning — all bounded by private thread hierarchy and local-only control.

## Developer Experience — Create a Listener in 12 Lines
**No manual schemas. No brittle JSON conventions. No hand-written prompts.**
@@ -59,19 +59,23 @@ Unlike rigid platforms requiring custom mappings or fragile item structures, thi
### 1. The Autonomous Schema Layer
- Dataclass → cached XSD + example + rich tool prompt (mandatory description + field docs).
- Namespaces: `https://xml-pipeline.org/ns/<category>/<name>/v1` (served live via domain for discoverability).
- Multiple listeners per root tag supported (broadcast parallelism).

### 2. Thread-Based Lifecycle & Reasoning
- Mandatory `<thread/>` with hierarchical IDs for reliable subthreading and audit trails.
- LLM agents reason via open self-calls and optional `<todo-until/>`.
- Opaque `<thread/>` UUIDs with private hierarchical path registry for reliable subthreading, audit trails, and topology privacy.
- LLM agents use unique root tags for blind self-iteration (no name knowledge or `<to/>` needed).
- Agents reason via open self-calls, multi-payload parallelism, and optional `<todo-until/>` scaffolding in visible text.
- All thought steps visible as messages — no hidden state.
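To illustrate the blind self-iteration bullet above, here is a minimal sketch of an agent handler that loops by re-emitting its own unique root tag. The payload fields and the `researcher-step` tag name are hypothetical, not from the spec:

```python
from dataclasses import dataclass

@dataclass
class ResearcherStep:
    remaining_steps: int

def researcher_handler(payload: ResearcherStep) -> bytes:
    # Emitting our own (unique) root tag routes the payload straight
    # back to this handler — no <to/> or name knowledge required.
    if payload.remaining_steps > 0:
        nxt = payload.remaining_steps - 1
        return (f"<researcher-step><remaining_steps>{nxt}"
                f"</remaining_steps></researcher-step>").encode()
    return b"<answer>done</answer>"
```

Each emitted payload becomes a visible message in the thread, so the loop itself is part of the audit trail.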
### 3. Message Pump
- Single linear pipeline with repair, C14N, XSD validation, deserialization, handler execution, and multi-payload extraction.
- Supports clean tools and forgiving LLM streams alike.
- Thread-based message queue with bounded memory.
- Parallel preprocessing pipelines (one per listener) with central async pump orchestration.
- True concurrency: pipeline tasks parallel, broadcast handlers via `asyncio.gather`.
- Single linear flow per pipeline with repair, C14N, XSD validation, deserialization, handler execution, and multi-payload extraction.
- Supports clean tools, forgiving LLM streams, and natural broadcast alike.
- Thread-based message queue with bounded memory and fair scheduling.
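The dummy-wrap multi-payload extraction can be sketched in a few lines. This sketch uses the stdlib `ElementTree` for illustration, whereas the project specifies lxml's repairing parser:

```python
import xml.etree.ElementTree as ET

def extract_payloads(raw: bytes) -> list[bytes]:
    # Wrapping in <dummy> lets several top-level elements from one
    # handler response parse as a single document.
    root = ET.fromstring(b"<dummy>" + raw + b"</dummy>")
    return [ET.tostring(child) for child in root]
```

Each extracted payload is then enveloped separately and re-injected, which is what makes parallel tool calls from a single LLM response possible.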
### 4. Structural Control
- Bootstrap from `organism.yaml`.
- Bootstrap from `organism.yaml` (including unique root enforcement for agents).
- Runtime changes (hot-reload, add/remove listeners) via local-only OOB channel (localhost WSS or Unix socket — GUI-ready).
- Main bus oblivious to privileged ops.
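A hypothetical `organism.yaml` sketch of the bootstrap described above — every key name here is illustrative, since the docs do not fix the file's schema:

```yaml
listeners:
  - name: calculator.add          # tool listener; shared root tags allowed
    module: tools.calculator
agents:
  - name: researcher
    root_tag: researcher-step     # must be unique (blind self-iteration)
oob:
  transport: unix                 # or WSS bound to localhost
  path: /run/agentserver/oob.sock
```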
@@ -86,7 +90,7 @@ Unlike rigid platforms requiring custom mappings or fragile item structures, thi
- **Format:** Exclusive C14N XML (wire sovereign).

## Why This Matters
AgentServer v2.0 is a bounded, auditable, owner-controlled organism where the **XSD is the security**, the **thread is the memory**, and the **OOB channel is the sovereignty**.
AgentServer v2.1 is a bounded, auditable, owner-controlled organism where the **XSD is the security**, the **private thread registry is the memory**, and the **OOB channel is the sovereignty**.

One port. Many bounded minds. Autonomous yet obedient evolution. 🚀
README_v2.0.md (new file, 94 lines)
@@ -0,0 +1,94 @@
# AgentServer — The Living Substrate (v2.0)
***"It just works... safely."***

**January 03, 2026**
**Architecture: Autonomous Schema-Driven, Turing-Complete Multi-Agent Organism**

## The Rant
**Why XML?**
[Why not JSON?](docs/why-not-json.md)

XML is the sovereign wire format — standards-based, self-describing, attack-resistant, and evolvable without drift. JSON was a quick hack that escaped into the wild and became the default for everything, including AI tool calling, where its brittleness causes endless prompt surgery and validation headaches.

This project chooses XML deliberately. The organism enforces contracts exactly (XSD validation, no transcription bugs), tolerates dirty streams (repair + dummy extraction), and keeps reasoning visible. No fragile conventions. No escaping hell. Just bounded, auditable computation.

Read the full rant [here](docs/why-not-json.md) for the history, pitfalls, and why XML wins permanently.

## What It Is
AgentServer is a production-ready substrate for the `xml-pipeline` nervous system. Version 2.0 stabilizes the design around exact XSD validation, typed dataclass handlers, mandatory hierarchical threading, and strict out-of-band privileged control.

See [Core Architectural Principles](docs/core-principles-v2.0.md) for the single canonical source of truth.

## Core Philosophy
- **Autonomous DNA:** Listeners declare their contract via `@xmlify` dataclasses; the organism auto-generates XSDs, examples, and tool prompts.
- **Schema-Locked Intelligence:** Payloads validated directly against XSD (lxml) → deserialized to typed instances → pure handlers.
- **Multi-Response Tolerance:** Handlers return raw bytes; bus wraps in `<dummy></dummy>` and extracts multiple payloads (perfect for parallel tool calls or dirty LLM output).
- **Computational Sovereignty:** Turing-complete via self-calls, subthreading primitives, and visible reasoning — all bounded by thread hierarchy and local-only control.

## Developer Experience — Create a Listener in 12 Lines
**No manual schemas. No brittle JSON conventions. No hand-written prompts.**
Just declare a dataclass contract and a one-line human description. The organism handles validation, XSD, examples, and tool prompts automatically.
```python
from xmlable import xmlify
from dataclasses import dataclass
from xml_pipeline import Listener, bus  # bus is the global MessageBus

@xmlify
@dataclass
class AddPayload:
    a: int
    b: int

def add_handler(payload: AddPayload) -> bytes:
    result = payload.a + payload.b
    return f"<result>{result}</result>".encode("utf-8")

Listener(
    payload_class=AddPayload,
    handler=add_handler,
    name="calculator.add",
    description="Adds two integers and returns their sum."
).register()  # ← Boom: XSD, example, prompt auto-generated + registered
```

The organism now speaks `<add>` — fully validated, typed, and discoverable.<br/>
Unlike rigid platforms requiring custom mappings or fragile item structures, this is pure Python — typed, testable, and sovereign.

## Key Features
### 1. The Autonomous Schema Layer
- Dataclass → cached XSD + example + rich tool prompt (mandatory description + field docs).
- Namespaces: `https://xml-pipeline.org/ns/<category>/<name>/v1` (served live via domain for discoverability).

### 2. Thread-Based Lifecycle & Reasoning
- Mandatory `<thread/>` with hierarchical IDs for reliable subthreading and audit trails.
- LLM agents reason via open self-calls and optional `<todo-until/>`.
- All thought steps visible as messages — no hidden state.

### 3. Message Pump
- Single linear pipeline with repair, C14N, XSD validation, deserialization, handler execution, and multi-payload extraction.
- Supports clean tools and forgiving LLM streams alike.
- Thread-based message queue with bounded memory.

### 4. Structural Control
- Bootstrap from `organism.yaml`.
- Runtime changes (hot-reload, add/remove listeners) via local-only OOB channel (localhost WSS or Unix socket — GUI-ready).
- Main bus oblivious to privileged ops.

### 5. Federation & Introspection
- YAML-declared gateways with trusted keys.
- Controlled meta queries (schema/example/prompt/capability list).

## Technical Stack
- **Validation & Parsing:** lxml (XSD, C14N, repair) + xmlable (round-trip).
- **Protocol:** Mandatory WSS (TLS) + TOTP on main port.
- **Identity:** Ed25519 (signing, federation, privileged).
- **Format:** Exclusive C14N XML (wire sovereign).

## Why This Matters
AgentServer v2.0 is a bounded, auditable, owner-controlled organism where the **XSD is the security**, the **thread is the memory**, and the **OOB channel is the sovereignty**.

One port. Many bounded minds. Autonomous yet obedient evolution. 🚀

---
*XML wins. Safely. Permanently.*
@@ -17,8 +17,8 @@ These principles are the single canonical source of truth for the project. All d
- On response generation (after handler execution and multi-payload extraction):
  - The dispatcher injects `<from>` using the executing listener's registered name (e.g., "calculator.add" or "researcher").
  - For meta/primitive responses: `<from>` is injected as "core".
  - `<thread>` is inherited from the incoming message (or assigned/updated for primitives like spawn-thread).
  - `<to>` remains optional and rarely used.
  - `<thread>` is inherited from the incoming message (or assigned/updated for primitives like spawn-thread).
  - `<to>` remains optional and rarely used.
- This ensures every enveloped message has a trustworthy, auditable `<from>` without handler involvement, preventing spoofing and keeping capability code minimal/testable.

## Configuration & Composition
@@ -49,6 +49,8 @@ These principles are the single canonical source of truth for the project. All d
- Each extracted payload wrapped in separate response envelope (inherits thread/from, optional new subthread if primitive used)
- Enveloped responses buffered and sent sequentially
- Supports single clean response, multi-payload emission (parallel tools/thoughts), and dirty LLM output tolerance.
- Message pump tracks token budgets per agent and thread, enforcing token limits and preventing abuse. The LLM abstraction layer informs the message bus of the actual token usage.
- Message pump uses asynchronous non-blocking I/O for maximum throughput.
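A minimal sketch of the per-agent token accounting described above; the class and method names are assumptions, not from the spec:

```python
class TokenBudget:
    """Tracks per-agent token usage against a hard limit (sketch)."""

    def __init__(self, limit: int):
        self.limit = limit
        self.used: dict[str, int] = {}

    def charge(self, agent: str, tokens: int) -> bool:
        # The LLM abstraction layer reports actual usage after each call;
        # the pump refuses further work once the budget is exhausted.
        total = self.used.get(agent, 0) + tokens
        if total > self.limit:
            return False
        self.used[agent] = total
        return True
```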

## Reasoning & Iteration
- LLM agents iterate via open self-calls (same root tag, same thread ID).
@@ -95,4 +97,7 @@ These principles are the single canonical source of truth for the project. All d
- Ensures isolation across conversations, automatic cleanup on idle, and minimal mutable state.
- Handler closes over or receives UUID for access — still oblivious to readable path.

## Resource Stewardship
- The Message Pump ensures fair execution and prevents "Paperclip" runaway scenarios via internal Thread-Level Scheduling. Every thread is subject to Token-Rate Monitoring and Fair-Share Queuing, ensuring that a high-volume agent (like a deep-thinking LLM) cannot block high-priority system events or starve simpler organs (like tools).
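The fair-share queuing idea can be sketched as a round-robin selector over per-thread queues — a toy illustration under the assumption that the real scheduler also weighs token rates:

```python
from collections import deque

class FairScheduler:
    """Round-robin over per-thread queues (fair-share sketch)."""

    def __init__(self):
        self.queues: dict[str, deque] = {}
        self.order: deque = deque()

    def enqueue(self, thread_id: str, msg) -> None:
        if thread_id not in self.queues:
            self.queues[thread_id] = deque()
            self.order.append(thread_id)
        self.queues[thread_id].append(msg)

    def next(self):
        # Rotate so no single busy thread starves the others.
        for _ in range(len(self.order)):
            tid = self.order[0]
            self.order.rotate(-1)
            if self.queues[tid]:
                return tid, self.queues[tid].popleft()
        return None
```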

These principles are now locked. All existing docs will be updated to match this file exactly. Future changes require explicit discussion and amendment here first.
docs/core-principles-v2.1.md (new file, 112 lines)
@@ -0,0 +1,112 @@
# AgentServer v2.1 — Core Architectural Principles
**January 06, 2026**
**Architecture: Autonomous Schema-Driven, Turing-Complete Multi-Agent Organism**

These principles are the single canonical source of truth for the project. All documentation, code, and future decisions must align with this file. This version incorporates Message Pump v2.1 parallelism and refines agent iteration patterns for blind, name-oblivious self-routing.

## Identity & Communication
- All traffic uses the universal `<message>` envelope defined in `envelope.xsd` (namespace `https://xml-pipeline.org/ns/envelope/v1`).
- Mandatory `<from/>` and `<thread/>` (opaque UUID strings; system privately maps to hierarchical paths for subthreading and audit trails).
- Optional `<to/>` (for rare direct routing; most flows use payload root tag with broadcast semantics).
- Exclusive C14N on ingress and egress.
- Malformed XML repaired on ingress; repairs logged in `<huh/>` metadata.

## Identity Injection & Handler Purity
- Handlers are pure, stateless functions with no knowledge of routing, thread context, their own registered name, or organism topology.
- On ingress (external or gateway messages): `<from>` is provided and authenticated by the client/gateway (enforced by envelope validation).
- On response generation (after handler execution and multi-payload extraction):
  - The message pump injects `<from>` using the executing listener's registered name (e.g., "calculator.add" or "researcher").
  - For meta/primitive responses: `<from>` is injected as "core".
  - `<thread>` is inherited from the incoming message (or assigned/updated for primitives like spawn-thread).
  - `<to>` remains optional and rarely used.
- This ensures every enveloped message has a trustworthy, auditable `<from>` without handler involvement, preventing spoofing and keeping capability code minimal/testable.

## Configuration & Composition
- YAML file (`organism.yaml`) is the bootstrap source of truth, loaded at startup.
- Defines initial listeners, agents, gateways, meta privileges, OOB channel configuration, and routing table (including multiple listeners per root tag).
- LLM-based agents must use unique root tags (enforced on registration/hot-reload) to enable blind self-iteration.
- Runtime structural changes (add/remove listeners, rewire agents, etc.) via local-only privileged commands on the dedicated OOB channel (hot-reload capability).
- No remote or unprivileged structural changes ever.

## Autonomous Schema Layer
- Listeners defined by `@xmlify`-decorated dataclass (payload contract) + pure handler function.
- Mandatory human-readable description string (short "what this does" blurb for tool prompt lead-in).
- Registration (at startup or via hot-reload) automatically generates:
  - XSD cached on disk (`schemas/<name>/v1.xsd`)
  - Example XML
  - Tool description prompt fragment (includes description, params with field docs if present, example input)
- All capability namespaces under `https://xml-pipeline.org/ns/<category>/<name>/v1`.
- Root element derived from payload class name (lowercase) or explicit.
- Multiple listeners may register for the same root tag (enabling broadcast parallelism); LLM agents require unique root tags.

## Message Pump
- Parallel preprocessing pipelines (one per registered listener): ingress → repair → C14N → envelope validation → payload extraction → XSD validation → deserialization → error injection on failure.
- Central async message pump orchestrates:
  - Gathering ready messages from pipeline outputs
  - Routing lookup: direct (`<to/>`) or broadcast (all listeners for root tag; unique roots naturally self-route for agents)
  - Launching concurrent thin dispatchers (`await handler(msg)`)
  - Response processing: multi-payload extraction (dummy wrap → parse → extract), envelope creation with `<from>` injection, re-injection to target pipelines
- Thin, stateless dispatcher: pure async delivery mechanism with no loops or orchestration.
- Supports true parallelism: pipeline preprocessing concurrent, broadcast handlers concurrent via `asyncio.gather`.
- Validation failures inject `<huh>` error elements (LLM-friendly self-correction).
- Message pump tracks token budgets per agent and thread, enforcing limits and preventing abuse. The LLM abstraction layer informs the message bus of actual token usage.
- Message pump uses asynchronous non-blocking I/O for maximum throughput, with provisions for concurrency limits, fair scheduling, and backpressure.

## Reasoning & Iteration
- LLM agents iterate via blind self-calls: with unique root tags, emitting payloads using their own root tag automatically routes back to themselves—no `<to/>`, name knowledge, or special primitives required.
- This enables natural looping, multi-step continuation, and parallelism via multi-payload emission (e.g., simultaneous tool calls or branched reasoning).
- Conversation thread = complete memory and audit trail (all messages logged).
- Subthreading natively supported via private hierarchical path registry (system appends/prunes listener names on delegation/responses; agents see only opaque UUIDs).
- Optional structured constructs like `<todo-until/>` are encouraged in visible reasoning text (e.g., inside `<answer>`) for planning clarity and auditability but are not system-interpreted.
- Broadcast enables natural parallelism (e.g., simultaneous tool calls) without agent-managed fan-out.
- Thread path privacy: agents remain oblivious to topology, preventing leakage or probing.
- No hidden loops or state machines; all reasoning steps are visible messages.

## Security & Sovereignty
- Privileged messages (per `privileged-msg.xsd`) handled exclusively on dedicated OOB channel.
- OOB channel bound to localhost by default (safe for local GUI); separate port/socket from main bus.
- Main message pump and dispatcher oblivious to privileged operations — no routing or handling for privileged roots.
- Remote privileged attempts impossible (channel not exposed); any leak to main port logged as security event and dropped.
- Ed25519 identity key used for envelope signing, federation auth, and privileged command verification.
- No agent may modify organism structure, register listeners, or access host resources beyond declared scope.
- Opaque thread UUIDs + private path registry prevent topology disclosure.
- "No Paperclippers" manifesto injected as first system message for every LLM-based listener.

## Federation
- Gateways declared in YAML with trusted remote public key.
- Remote tools referenced by gateway name in agent tool lists.
- Regular messages flow bidirectionally; privileged messages never forwarded or accepted.

## Introspection (Meta)
- Controlled via YAML flags (`allow_list_capabilities`, `allow_schema_requests`, etc.).
- Supports `request-schema`, `request-example`, `request-prompt`, `list-capabilities`.
- Remote meta queries optionally allowed per YAML (federation peers).

## Technical Constraints
- Mandatory WSS (TLS) + TOTP on main port.
- OOB channel WSS or Unix socket, localhost-default.
- Internal: lxml trees → XSD validation → xmlable deserialization → dataclass → handler → bytes → dummy extraction → multi-envelope re-injection.
- Single process, async non-blocking.
- XML is the sovereign wire format; everything else is implementation detail.

## Scheduled Computation
- Timers and delays implemented as normal listeners using async sleeps.
- Caller idles naturally; wakeup messages bubble back via standard tracing.
- Enables recurrent tasks (e.g., periodic monitoring) without blocking or external schedulers.
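A timer listener as described above might look like this minimal sketch; the payload shape and `<wakeup/>` response tag are assumptions:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Wait:
    seconds: float

async def timer_handler(payload: Wait) -> bytes:
    # A timer is just a listener whose handler sleeps, then responds;
    # the wakeup bubbles back through the normal message flow.
    await asyncio.sleep(payload.seconds)
    return b"<wakeup/>"
```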

## Bounded Stateful Listeners
- Pure tools remain stateless.
- Stateful capabilities (e.g., calculator memory, game state) store data per thread path UUID.
- Ensures isolation across conversations, automatic cleanup on idle, and minimal mutable state.
- Handler closes over or receives UUID for access — still oblivious to readable path.
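Per-thread state keyed by the opaque UUID can be sketched as follows, using the calculator-memory capability mentioned above as a hypothetical example:

```python
class CalculatorMemory:
    """Running total per thread UUID — isolated across conversations."""

    def __init__(self):
        self._totals: dict[str, float] = {}

    def add(self, thread_id: str, value: float) -> float:
        # The handler receives only the opaque UUID, never the path.
        self._totals[thread_id] = self._totals.get(thread_id, 0.0) + value
        return self._totals[thread_id]

    def drop_idle(self, thread_id: str) -> None:
        # Cleanup hook for idle threads (sketch).
        self._totals.pop(thread_id, None)
```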

## Resource Stewardship
- The Message Pump ensures fair execution and prevents "Paperclip" runaway scenarios via Thread-Level Scheduling and Concurrency Controls. Every thread is subject to Token-Rate Monitoring and Fair-Share Queuing, ensuring that a high-volume agent cannot block high-priority events or starve simpler organs.

These principles are now locked for v2.1. The Message Pump v2.1 specification remains the canonical detail for pump behavior. Future changes require explicit discussion and amendment here first.
docs/message-pump-v1.0.md (new file, 60 lines)
@@ -0,0 +1,60 @@
# Message Pump — End-to-End Flow (v2.0)

The AgentServer message pump processes individual messages through a single, linear, attack-resistant pipeline. The outer dispatcher runs a continuous async loop, draining per-thread message buffers (queues) until empty — enabling persistent, branched reasoning without artificial limits.

```mermaid
flowchart TD
    A["WebSocket Ingress\n(enqueue to thread buffer)"] --> B["Dispatcher Loop:\nSelect next message\n(per thread_scheduling strategy)"]
    B --> C["Repair + Exclusive C14N"]
    C --> D["Envelope Validation (lxml)"]
    D --> E["Extract Payload Tree"]
    E --> F{"Payload Namespace?"}
    F -->|meta/v1| G["Core Meta Handler\n(introspection & reserved primitives)"]
    F -->|capability| H["Route by (namespace, root)"]
    H --> I["Validate Payload vs Listener XSD (lxml)"]
    I --> J["Deserialize to Dataclass Instance (xmlable)"]
    J --> K["Call handler(instance) → raw bytes"]
    K --> L["Wrap bytes in <dummy></dummy>"]
    L --> M["Repair/Parse → Extract all top-level payloads"]
    M --> N["Wrap each payload in separate envelope\n(enqueue to target thread buffers)"]
    G --> N
    N --> O["Exclusive C14N + Sign"]
    O --> P["WebSocket Egress\n(sequential per connection)"]
    P --> B["Continue dispatcher loop if buffers non-empty"]
```

## Detailed Stages (Per-Message)

1. **Ingress/Enqueue**: Raw bytes → repair → preliminary tree → enqueue to target thread buffer.

2. **Dispatcher Loop**: Single async non-blocking loop selects next message from per-thread queues (breadth-first default for fairness).

3. **Processing**:
   - Full repair + C14N.
   - Envelope validation.
   - Routing decision:
     - **Meta Branch** (`https://xml-pipeline.org/ns/meta/v1` namespace): Handled directly by privileged core handler (no listener lookup or XSD validation needed).
       - Purpose: Introspection and reserved organism primitives.
       - Examples:
         - `request-schema`, `request-example`, `request-prompt`, `list-capabilities` (returns XSD bytes, example XML, prompt fragment, or capability list).
         - Thread primitives like `spawn-thread`, `clear-context`.
       - Privileged: Controlled via YAML `meta` flags (e.g., `allow_schema_requests: "admin"` or "none"). Remote queries optional.
       - Why separate: Faster, safer (no user listener involved), topology privacy preserved.
     - Capability namespace → normal listener route (XSD validation + deserialization).
   - Typed handler call → raw bytes.

4. **Response Handling**:
   - Dummy wrap → extract multi-payloads.
   - Each enqueued as new message(s) in appropriate thread buffer(s).

5. **Egress**: Dequeue → C14N/sign → send.

## Key Properties
- Continuous looping until all thread buffers empty — natural iteration/subthreading without one-shot constraints.
- Multi-payload enqueues enable parallel branches/thoughts.
- Scheduling balances deep dives vs fair exploration.
- Attack-resistant at every step.

XML in → queued → processed → multi-out → re-queued. Loops forever if needed. Safely. Permanently.
@@ -47,8 +47,7 @@ flowchart TD
subgraph MessagePump
subgraph Init
start([Start])
raw[/Optional<br>Raw Bytes/]
wrapstart["Wrap<br><start>{...}</start>"]
wrapstart["Boot Msg<br><message>{...}</message>"]
end
enq1([QUEUE 1])
rawwaiting{Raw<br>Msg<br>Waiting?}

@@ -79,7 +78,7 @@ flowchart TD
end
end

start --> raw --> wrapstart --> enq1 --> rawwaiting
start --> wrapstart --> enq1 --> rawwaiting
rawwaiting --> |NO| waitRaw
rawwaiting ---> |YES| extract
extract --> split --> foreach
@@ -96,6 +95,16 @@ flowchart TD
```

## Detailed Stages (Per-Message)

### Boot Message
- Since all agents are listeners, there would be no way for a client to initiate the message pump.
- The boot message is a dummy message that is enqueued to the root thread buffer. Any listener may choose to register a root tag for it.
- The root thread buffer is the only one that is drained by the dispatcher loop.
- If a listener (like a human agent) is registered for the boot message, it receives the boot message and then asynchronously awaits keyboard input.
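The human/keyboard case above could look roughly like this minimal sketch; the `<boot/>` tag follows the text, while the `<say>` response tag is invented for illustration:

```python
import asyncio

async def human_listener(msg: bytes) -> bytes:
    # On <boot/>, block on keyboard input off the event loop,
    # then emit the typed line as the first real message.
    if msg.strip() == b"<boot/>":
        line = await asyncio.to_thread(input, "> ")
        return f"<say>{line}</say>".encode()
    return b""
```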
### Queue 1
- The first buffer holds raw unprocessed messages from the network.
### Queue 2
- The second buffer holds messages that have been processed and are ready to be sent back to the network.

1. **Ingress/Enqueue**: Raw bytes → repair → preliminary tree → enqueue to target thread buffer.

2. **Dispatcher Loop**: Single async non-blocking loop selects next message from per-thread queues (breadth-first default for fairness).
docs/message-pump-v2.1.md (new file, 350 lines)
@@ -0,0 +1,350 @@
# Message Pump Architecture v2.1
**January 06, 2026**
**AgentServer: Pipeline-per-Listener + Dispatcher Pattern**

This document is the canonical specification for the AgentServer message pump. All implementation must conform to this architecture.

---

## Core Pattern: Dictionary of Pipelines → Message Pump → Dispatcher

The message pump implements a three-stage architecture:

1. **Pipeline Stage**: Parallel preprocessing pipelines (one per registered listener) that sanitize, validate, and prepare messages
2. **Message Pump**: Async event loop that orchestrates concurrent message processing, manages scheduling and backpressure
3. **Dispatcher**: Simple async function that delivers messages to handlers and awaits responses

```
Raw Message Ingress
        ↓
Pipeline Lookup & Assignment
        ↓
[Pipeline 1] [Pipeline 2] [Pipeline N]   (parallel preprocessing)
     ↓            ↓            ↓
Pipeline Output Queues (processed messages ready for dispatch)
        ↓
Message Pump Event Loop
  - Gathers ready messages
  - Launches concurrent dispatcher(msg, handler) invocations
  - Manages concurrency/scheduling/backpressure
        ↓
[dispatcher()] [dispatcher()] [dispatcher()]   (concurrent, async)
     ↓              ↓              ↓
Handler Execution → await Response
        ↓
Message Pump Response Processing
  - Extract multi-payloads (dummy wrap → parse → extract)
  - Create envelopes with <from> injection
  - Re-inject to appropriate pipelines
        ↓
Pipeline Re-injection (cycle continues)
```

---

## Pipeline Architecture

### Pipeline Registration

At boot (or hot-reload), each listener registration creates:
- Dedicated preprocessing pipeline instance
- Entry in routing table: `Dict[root_tag, Dict[listener_name, Pipeline]]`
- Cached XSD schema (derived from `@xmlify` dataclass)
- Example XML and tool description fragments

**Example Registration**:
```python
@xmlify
@dataclass
class CalculatorAdd:
    """Add two numbers and return the sum."""
    a: float
    b: float

# Creates:
# - Pipeline instance for "calculator/add"
# - XSD cached at schemas/calculator/add/v1.xsd
# - Routing entry: pipelines["add"]["calculator"] = pipeline_instance
```

### Pipeline Structure

Each pipeline is identical in structure but operates on messages bound for its specific listener. A pipeline consists of an ordered array of processing tasks:

**Standard Task Sequence**:
1. **Repair**: Fix malformed XML (lxml recovery mode)
2. **Canonicalization (C14N)**: Normalize whitespace, attributes, namespaces
3. **Envelope Validation**: Verify against `envelope.xsd`
4. **Payload Extraction**: Extract payload from `<message>` wrapper
5. **XSD Validation**: Validate payload against listener's cached schema
6. **Deserialization**: Convert XML to typed `@dataclass` instance via `xmlable.from_xml`
7. **Error Injection**: On failure, inject `<huh>` error tag instead of discarding

**Error Handling Philosophy**:
- Early stages (repair, C14N): may discard truly corrupt messages
- Later stages (validation): inject `<huh>error description</huh>` into the response
- LLMs see their errors and can self-correct
- Prevents silent failures while maintaining flow
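
The standard task sequence above can be sketched as an ordered list of callables with an index threshold separating "may discard" from "must inject `<huh>`". A minimal illustration, not the organism's implementation; the `Pipeline` class and `validate` task here are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional
from xml.sax.saxutils import escape

@dataclass
class Pipeline:
    """Ordered task sequence; late-stage failures inject <huh> instead of discarding."""
    tasks: List[Callable[[bytes], bytes]]
    discard_threshold: int = 2  # tasks before this index (repair, C14N) may discard

    def run(self, msg: bytes) -> Optional[bytes]:
        for i, task in enumerate(self.tasks):
            try:
                msg = task(msg)
            except Exception as exc:
                if i < self.discard_threshold:
                    return None  # truly corrupt: early stages may drop the message
                # later stages surface the error so the LLM can self-correct
                return f"<huh>{escape(str(exc))}</huh>".encode()
        return msg

def validate(msg: bytes) -> bytes:
    if b"<add>" not in msg:
        raise ValueError("payload does not match the registered schema")
    return msg

pipe = Pipeline(tasks=[lambda m: m, lambda m: m, validate])
print(pipe.run(b"<subtract/>"))  # b'<huh>payload does not match the registered schema</huh>'
```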

### System Pipeline

A special system pipeline handles messages not bound for user listeners:
- Processes `<boot/>` messages (startup trigger for human/keyboard listeners)
- Handles system-generated error responses
- Uses the same task sequence, but skips the XSD validation step

---

## Dispatcher Architecture

### Dispatcher Responsibilities

The dispatcher is a **simple async function** that delivers a message to a handler and awaits the response:

```python
async def dispatcher(msg, handler):
    """Thin async routing layer - delivers message and awaits response"""
    response = await handler(msg)
    return response
```

**Critical Property**: The dispatcher itself has no loop, no queue management, no concurrency control. It's a pure async delivery mechanism. All orchestration happens in the message pump.

### Routing Logic

**Lookup Key**: `(root_tag, listener_name)` from pipeline's registered listener

**Delivery Rules**:
- **`<to/>` present**: Direct delivery to specific listener at `root_tag/listener_name`
- **`<to/>` absent**: Broadcast to ALL listeners registered for `root_tag`

**Broadcast Semantics**:
- All handlers for a given root tag execute concurrently (via concurrent task launch).
- Responses are processed progressively as each handler completes (streaming/as-completed semantics).
- Each response is fully handled independently (multi-payload extraction, enveloping, re-injection).
- Responses bubble up in completion order (nondeterministic); no waiting for the full group.
- Ideal for racing parallel tools; agents handle any needed synchronization.

**Example**: Message with root tag `<search>` and no `<to/>`:
```
Pump sees: root_tag="search", to=None
Lookup: pipelines["search"] → {"google": pipeline_1, "bing": pipeline_2}
Execute:
  - Launch concurrent dispatchers for all handlers
  - Monitor tasks via asyncio.as_completed
  - As each completes: extract payloads, envelope, re-inject immediately
  - No batch wait; fast responses bubble first
```
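
The delivery rules reduce to one lookup over the nested routing table from the registration section. A sketch with string stand-ins for pipeline instances:

```python
from typing import Dict, List, Optional

# root_tag → {listener_name → pipeline}; strings stand in for pipeline objects
pipelines: Dict[str, Dict[str, str]] = {
    "search": {"google": "google_pipeline", "bing": "bing_pipeline"},
    "add": {"calculator": "calculator_pipeline"},
}

def lookup(root_tag: str, to: Optional[str]) -> List[str]:
    """<to/> present → direct delivery; <to/> absent → broadcast to all listeners."""
    listeners = pipelines.get(root_tag, {})
    if to is not None:
        return [listeners[to]] if to in listeners else []
    return list(listeners.values())  # broadcast in registration order

print(lookup("search", None))      # ['google_pipeline', 'bing_pipeline']
print(lookup("search", "google"))  # ['google_pipeline']
```

An unknown root tag yields an empty list, which is where `<huh>` injection would take over.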

---

## Message Pump Event Loop

The message pump is the orchestration layer that manages concurrency, scheduling, and message flow:

```python
async def message_pump():
    """Main event loop - orchestrates concurrent message processing"""
    while True:
        # Gather all ready messages from pipeline outputs
        ready_messages = await gather_ready_messages_from_pipelines()

        # For each message, lookup handler(s) and launch dispatcher(s)
        tasks = []
        for msg in ready_messages:
            handlers = lookup_handlers(msg)  # may return multiple for broadcast
            for handler in handlers:
                task = asyncio.create_task(dispatcher(msg, handler))
                tasks.append(task)

        # Process responses as they complete (streaming)
        for completed_task in asyncio.as_completed(tasks):
            response = await completed_task
            # Extract multi-payloads (dummy wrap → parse → extract)
            payloads = extract_payloads(response)

            # Wrap each in envelope, inject <from>, re-route to pipelines
            for payload in payloads:
                enveloped = create_envelope(payload, response.context)
                await send_to_pipeline(enveloped)
```

**Key Responsibilities**:
1. **Concurrency Control**: Decides how many dispatchers to launch simultaneously
2. **Fair Scheduling**: Can implement priority queues, round-robin, or other fairness policies
3. **Backpressure**: Monitors pipeline queue depths, throttles if needed
4. **Response Handling**: Extracts multi-payloads and re-injects each response as soon as its handler completes (progressive streaming for broadcasts)

**Concurrency Model**: Unbounded concurrent dispatchers; responses stream independently. Future enhancements include per-listener semaphores, global limits, and token-rate throttling.
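
The "dummy wrap → parse → extract" step in the pump can be sketched with the stdlib parser (the production pipeline uses lxml, and its recovery mode is omitted here):

```python
import xml.etree.ElementTree as ET
from typing import List

def extract_payloads(raw: bytes) -> List[bytes]:
    """Wrap raw handler output in <dummy> so sibling payloads parse as one tree."""
    root = ET.fromstring(b"<dummy>" + raw + b"</dummy>")
    return [ET.tostring(child) for child in root]

# a handler emitting two parallel tool calls in a single response:
raw = b"<search query='weather'/><add><a>1</a><b>2</b></add>"
payloads = extract_payloads(raw)
for p in payloads:
    print(p)
```

Each extracted payload is then enveloped and re-injected independently.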

---

## Message Flow Example: Complete Cycle (Broadcast `<search>`)

1. **Ingress**: External `<message><thread>root</thread><search query="weather today"/></message>`
2. **Pipeline Assignment**: Root tag "search" → multiple pipelines (google, bing)
3. **Pipeline Processing** (parallel per listener): Repair/C14N/validation/deserialization
4. **Message Pump**: Gathers ready messages, launches concurrent dispatchers
5. **Concurrent Handler Execution**:
   - google_handler completes first (500ms): `<search-result>Sunny, 72°F</search-result>` → processed/bubbled immediately
   - bing_handler completes second (700ms): `<search-result>Clear skies, 70°F</search-result>` → processed/bubbled next
   - No waiting; the receiver sees results as they arrive
6. **Response Processing** (progressive): As each completes, extract, envelope with `<from>`, re-inject to target pipeline
7. **Response Bubbling**: Results route back to parent (e.g., researcher/user) in completion order

---

## Boot Sequence

1. On startup, system generates: `<message><from>system</from><thread>root</thread><boot/></message>`
2. Sent to system pipeline
3. Dispatched to ALL listeners registered for `<boot/>` root tag
4. Human listener can register for `<boot/>` to:
   - Display welcome message
   - Await keyboard input
   - Initiate first real conversation

**Example Human Listener**:
```python
@xmlify
@dataclass
class Boot:
    """System boot notification"""
    pass

async def human_boot_handler(msg: Boot) -> bytes:
    print("System ready. Type your message:")
    user_input = await async_input()
    return f"<message>{user_input}</message>".encode()
```

---

## Out-of-Band (OOB) Privileged Messages

### Separation of Concerns

Privileged operations (defined in `privileged-msg.xsd`) operate on a completely separate channel:
- Dedicated websocket port (or Unix socket)
- Bound to localhost by default
- Uses Ed25519 signature verification

**The message pump dispatcher has NO knowledge of privileged messages**:
- The main dispatcher only routes messages with user/capability payloads
- Privileged messages like `<add-listener>`, `<remove-listener>`, `<hot-reload>` are handled by a separate OOB handler
- No possibility of privilege escalation via the main message flow

**Security Guarantee**: Remote clients cannot send privileged messages (the channel is not exposed). Even if one leaked to the main port, the dispatcher would fail the routing lookup (no pipeline is registered for privileged root tags).
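
The second half of that guarantee is just the routing table at work: privileged root tags are never registered, so a leaked privileged message fails lookup like any unknown tag. A sketch (the `route` helper is hypothetical):

```python
from typing import Dict

# user/capability listeners only; privileged tags are never registered here
pipelines: Dict[str, Dict[str, object]] = {
    "add": {"calculator": object()},
}

def route(root_tag: str):
    handlers = pipelines.get(root_tag)
    if not handlers:
        # unknown root tags (including privileged ones) reach no handler
        return f"<huh>no listener registered for root tag '{root_tag}'</huh>".encode()
    return list(handlers.values())

print(route("hot-reload"))  # b"<huh>no listener registered for root tag 'hot-reload'</huh>"
```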

---

## Pipeline Optimization & Scheduling

### Pipeline Parallelism

Pipelines process independently and in parallel:
- Each listener's pipeline can execute simultaneously
- No shared state between pipelines (XSD schemas are cached read-only)
- Enables high throughput for multi-listener broadcasts

### Future: Token-Rate Monitoring

Currently not implemented, but architecture supports:
- Each pipeline tracks tokens processed per minute
- Dispatcher can throttle high-volume agents
- Fair-share scheduling to prevent LLM monopolization

**Placeholder**: Token counting will be integrated once LLM abstraction layer is defined.
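
The hook the architecture leaves open is small: a per-pipeline rolling counter the dispatcher can consult before launching. A sketch with illustrative numbers (the 60-second window and the limit are assumptions):

```python
import time
from collections import deque
from typing import Optional

class TokenRateMonitor:
    """Rolling tokens-per-minute counter a dispatcher could consult before launching."""
    def __init__(self, tpm_limit: int, window_s: float = 60.0):
        self.tpm_limit = tpm_limit
        self.window_s = window_s
        self.events: deque = deque()  # (timestamp, tokens)

    def record(self, tokens: int, now: Optional[float] = None) -> None:
        self.events.append((time.monotonic() if now is None else now, tokens))

    def current_tpm(self, now: Optional[float] = None) -> int:
        now = time.monotonic() if now is None else now
        while self.events and now - self.events[0][0] > self.window_s:
            self.events.popleft()  # drop events that fell out of the window
        return sum(tokens for _, tokens in self.events)

    def should_throttle(self, now: Optional[float] = None) -> bool:
        return self.current_tpm(now) >= self.tpm_limit

mon = TokenRateMonitor(tpm_limit=1000)
mon.record(600, now=0.0)
mon.record(500, now=10.0)
print(mon.should_throttle(now=10.0))  # True: 1100 tokens inside the window
print(mon.should_throttle(now=70.0))  # False: the 600-token event has expired
```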

---

## Configuration & Wiring

### YAML Bootstrap (`organism.yaml`)

Defines initial swarm topology:
```yaml
listeners:
  - name: calculator
    capability: calculator.add
    root_tag: add
    namespace: https://xml-pipeline.org/ns/tools/calculator/v1

  - name: researcher
    capability: llm.researcher
    root_tag: research-query
    namespace: https://xml-pipeline.org/ns/agents/researcher/v1
    tools:
      - calculator  # researcher can see/call calculator
      - websearch

  - name: websearch
    capability: tools.google_search
    root_tag: search
    namespace: https://xml-pipeline.org/ns/tools/websearch/v1

agents:
  - name: researcher
    type: llm
    model: claude-sonnet-4
    system_prompt: "You are a research assistant..."
    visible_tools:  # restricts which listeners this agent can call
      - calculator
      - websearch

meta:
  allow_list_capabilities: admin  # or "all", "none"
  allow_schema_requests: admin
```

**Key Properties**:
- Defines initial routing table (`root_tag → listener_name`)
- Controls visibility (agent A may not know agent B exists)
- Meta introspection privileges
- All structural changes require OOB privileged commands (hot-reload)
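
Building the routing table from this file is mechanical once the YAML is parsed. A sketch starting from an equivalent dict (as `yaml.safe_load` would return); `build_routing_table` is a hypothetical helper:

```python
from typing import Dict

config = {  # equivalent to yaml.safe_load(open("organism.yaml"))
    "listeners": [
        {"name": "calculator", "capability": "calculator.add", "root_tag": "add"},
        {"name": "researcher", "capability": "llm.researcher", "root_tag": "research-query"},
        {"name": "websearch", "capability": "tools.google_search", "root_tag": "search"},
    ],
}

def build_routing_table(cfg: Dict) -> Dict[str, Dict[str, str]]:
    """root_tag → {listener_name → capability}; real code registers pipelines here."""
    table: Dict[str, Dict[str, str]] = {}
    for listener in cfg["listeners"]:
        table.setdefault(listener["root_tag"], {})[listener["name"]] = listener["capability"]
    return table

routes = build_routing_table(config)
print(routes["add"])  # {'calculator': 'calculator.add'}
```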

---

## Summary: Critical Invariants

1. **Pipeline-per-Listener**: Each registered listener has dedicated preprocessing pipeline
2. **Async Concurrency**: Message pump launches concurrent dispatcher invocations; handlers run in parallel via asyncio
3. **Stateless Dispatcher**: Dispatcher is a simple async function `(msg, handler) → response`, no loop or state
4. **Pump Orchestrates**: Message pump event loop controls concurrency, scheduling, backpressure, and response handling
5. **UUID Privacy**: Thread paths are opaque UUIDs; system maintains actual tree privately
6. **Error Injection**: Validation failures inject `<huh>` instead of silent discard
7. **Multi-Payload Extraction**: Handlers may emit multiple payloads; pump extracts, envelopes, and re-injects each
8. **Broadcast = Streaming Concurrent**: Multiple listeners execute in parallel; responses processed and bubbled as they complete (no group wait)
9. **OOB Isolation**: Privileged messages never touch main message pump or dispatcher
10. **Boot Message**: System-generated `<boot/>` enables listener-only architecture
11. **Stateless Handlers**: All routing, thread context, and identity is managed externally; handlers remain pure
12. **Parallel Everything**: Pipelines preprocess concurrently, pump launches dispatchers concurrently, responses stream progressively

---

## Next Steps

This document establishes the foundational architecture. Implementation priorities:

1. **Immediate (Echo Chamber Milestone)**:
   - Implement basic pipeline task sequence (repair → C14N → validate)
   - Implement sequential dispatcher with simple routing
   - Basic `<huh>` error injection on validation failure
   - Boot message generation

2. **Near-Term**:
   - Multi-payload extraction and re-injection
   - UUID path registry and privacy enforcement
   - YAML-driven listener registration
   - Pipeline parallelism

3. **Future**:
   - Token-rate monitoring per pipeline
   - Fair-share dispatcher scheduling
   - Advanced error recovery strategies
   - Hot-reload capability via OOB

---

**Status**: This document is now the single source of truth for message pump architecture. All code, diagrams, and decisions must align with this specification.

83
docs/token-scheduling-issues.md
Normal file

# Proposed Scheduling Enhancements for Token-Constrained Environments

**January 05, 2026**

These ideas (originally surfaced by Gemini) introduce token-aware and fairness-oriented scheduling on top of the current AgentServer v2.0 message pump. The current pump already uses per-thread queues with configurable breadth-first or depth-first draining, but these suggestions shift focus to **per-agent** fairness and explicit handling of shared LLM API rate limits (TPM/RPM).

They are presented here cleaned up and structured for easier discussion and potential adoption.

### Per-Agent Buffer Pools (Targeted Fairness)

**Concept**
Instead of a single global queue or purely thread-based queues, each registered agent (especially LLM-based listeners) gets its own dedicated message buffer.

**Benefit**
- Guarantees "system attention" round-robin across all agents.
- Prevents a high-volume agent (e.g., a central researcher or coordinator) from starving others.
- Caps the number of active queues to the number of agents rather than potentially unbounded threads.
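
A minimal sketch of the pool: one deque per agent, serviced round-robin so a chatty agent cannot starve a quiet one (class and method names are illustrative):

```python
from collections import deque
from typing import Dict, List, Optional

class AgentBufferPool:
    """One buffer per agent; next() services agents round-robin."""
    def __init__(self, agents: List[str]):
        self.buffers: Dict[str, deque] = {a: deque() for a in agents}
        self.order = deque(agents)

    def put(self, agent: str, msg: str) -> None:
        self.buffers[agent].append(msg)

    def next(self) -> Optional[str]:
        for _ in range(len(self.order)):
            agent = self.order[0]
            self.order.rotate(-1)  # serviced agent moves to the back
            if self.buffers[agent]:
                return self.buffers[agent].popleft()
        return None  # every buffer is empty

pool = AgentBufferPool(["researcher", "coder"])
for i in range(3):
    pool.put("researcher", f"r{i}")
pool.put("coder", "c0")
serviced = [pool.next() for _ in range(4)]
print(serviced)  # ['r0', 'c0', 'r1', 'r2']
```

Even with three queued researcher messages, the lone coder message is serviced second rather than fourth.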

### Token-Aware Weighted Deficit Round Robin (TA-WDRR)

**Concept**
Treat the provider's Tokens Per Minute (TPM) limit as a shared "power budget". Each agent maintains a deficit counter that accumulates credit (its quantum) each scheduling round.

**Logic**
1. The pump looks at the next message in an agent's buffer.
2. It estimates the token cost of that message (prompt + max_tokens).
3. If the cost exceeds the agent's current deficit **or** the remaining global budget, skip that agent and try another.
4. Select a smaller job from another agent that fits the remaining budget "bin".

**Benefit**
Maximizes overall throughput by opportunistically filling small budget gaps with lightweight tasks while large reasoning jobs wait for the next budget refill.
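
One scheduling round of the logic above, sketched (quantum, costs, and budget are illustrative numbers, and cost estimation is assumed done upstream):

```python
from collections import deque
from typing import Dict, List, Tuple

def wdrr_round(buffers: Dict[str, deque], deficits: Dict[str, int],
               quantum: int, global_budget: int) -> Tuple[List[str], int]:
    """One TA-WDRR pass: each agent earns `quantum` credit; a job runs only if it
    fits both the agent's deficit and the remaining global token budget."""
    dispatched: List[str] = []
    for agent, buf in buffers.items():
        deficits[agent] += quantum  # accumulate credit this round
        while buf:
            cost, job = buf[0]  # (estimated_tokens, message)
            if cost > deficits[agent] or cost > global_budget:
                break  # too big for this agent's credit or the shared budget: skip
            buf.popleft()
            deficits[agent] -= cost
            global_budget -= cost
            dispatched.append(job)
    return dispatched, global_budget

buffers = {"researcher": deque([(900, "big-reasoning")]),
           "calc": deque([(50, "small-add"), (60, "small-mul")])}
deficits = {"researcher": 0, "calc": 0}
ran, remaining = wdrr_round(buffers, deficits, quantum=500, global_budget=600)
print(ran, remaining)  # ['small-add', 'small-mul'] 490
```

The big reasoning job is skipped this round but keeps its accumulated deficit of 500, so it runs once the budget refills rather than starving.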

### Adaptive Congestion Control (Servo Loop)

**Concept**
A feedback control system that reacts to 429 rate-limit errors by dynamically adjusting the pump's assumed TPM budget.

**Mechanism**
- Uses Additive Increase Multiplicative Decrease (AIMD).
- On rate-limit error → immediately halve the local TPM budget.
- On successful calls → gradually increase the budget to probe the provider's true capacity.

**Benefit**
Avoids "thundering herd" retry storms and smoothly converges to the maximum sustainable rate without manual tuning.
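
The servo fits in a few lines; the floor, ceiling, and step size below are illustrative:

```python
class AimdBudget:
    """AIMD servo for the assumed TPM budget: halve on 429, creep up on success."""
    def __init__(self, initial_tpm: int, floor: int = 1_000,
                 ceiling: int = 1_000_000, step: int = 500):
        self.tpm = initial_tpm
        self.floor = floor
        self.ceiling = ceiling
        self.step = step

    def on_rate_limit(self) -> None:
        self.tpm = max(self.floor, self.tpm // 2)  # multiplicative decrease

    def on_success(self) -> None:
        self.tpm = min(self.ceiling, self.tpm + self.step)  # additive increase

budget = AimdBudget(initial_tpm=80_000)
budget.on_rate_limit()
print(budget.tpm)  # 40000
budget.on_success()
print(budget.tpm)  # 40500
```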

### Feedforward Prep-Side Token Estimation

**Concept**
During early message preparation (repair, validation, deserialization), pre-compute and tag each message with an `estimated_tokens` attribute.

**Logic**
The scheduler can use this tag to make informed decisions **before** handing the message to the LLM abstraction layer, avoiding late failures.

**Benefit**
Enables predictive skipping/reordering without waiting for the API call to fail.
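
The estimator can be as crude as a characters-per-token heuristic plus the completion reservation (the 4-chars-per-token ratio is a common rough rule, not a measured value):

```python
def estimate_tokens(payload: bytes, max_tokens: int, chars_per_token: float = 4.0) -> int:
    """Tag a message with estimated_tokens = rough prompt cost + reserved completion."""
    prompt_estimate = int(len(payload) / chars_per_token) + 1
    return prompt_estimate + max_tokens

msg = b"<research-query>summarize the latest findings</research-query>"
print(estimate_tokens(msg, max_tokens=1024))
```

The estimate is deliberately pessimistic; the Reservation Variance metric below would refund the unused portion once the call completes.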

### Context-Isolated Memory via Dot-Notation Thread Paths

**Concept**
Use the full hierarchical thread path (e.g., `sess-abcd1234.researcher.search.calc`) as the unique key for per-conversation memory/state.

**Logic**
Even when multiple threads call the same shared tool/agent, their histories and any "memory button" state remain strictly partitioned by the thread path.

**Benefit**
- Prevents context poisoning across parallel branches.
- Allows stateless, horizontally scalable agents while preserving private conversation continuity.
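
The partitioning needs nothing more than keying state by the full path (a minimal sketch; the organism would hold this registry outside the stateless handlers):

```python
from collections import defaultdict
from typing import Dict, List

class ThreadMemory:
    """Per-conversation history keyed by the full dot-notation thread path."""
    def __init__(self) -> None:
        self.histories: Dict[str, List[str]] = defaultdict(list)

    def append(self, thread_path: str, entry: str) -> None:
        self.histories[thread_path].append(entry)

    def context(self, thread_path: str) -> List[str]:
        return list(self.histories[thread_path])

mem = ThreadMemory()
# two parallel branches call the same shared calc tool; histories stay partitioned
mem.append("sess-abcd1234.researcher.search.calc", "add 1 2")
mem.append("sess-abcd1234.coder.calc", "mul 3 4")
print(mem.context("sess-abcd1234.researcher.search.calc"))  # ['add 1 2']
```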

### Key Metrics for Monitoring ("Speedometer")

| Metric | Description |
|---------------------------|--------------------------------------------------------------------------|
| TPM (Tokens Per Minute) | Real-time rolling window of input + output tokens across the organism. |
| RPM (Requests Per Minute) | Count of individual API calls to avoid separate request-rate throttling. |
| Reservation Variance | Difference between reserved max_tokens and actual consumption; used to "refund" budget in real time. |

These enhancements would layer naturally on top of the existing per-thread queue model:

- Threads remain the unit of conversation memory and hierarchical tracing (unchanged).
- Agents become the unit of scheduling fairness and token budgeting.
- The dispatcher loop could select the next **agent** to service (round-robin or weighted), then drain the highest-priority thread queue belonging to that agent, applying the token-aware checks.

This keeps the current thread-oblivious, provenance-preserving design while adding production-grade rate-limit resilience and fairness for LLM-heavy workloads. Worth considering for a future scheduling module.