fixing docs

dullfig 2026-01-07 20:58:31 -08:00
parent f2758e5c49
commit e314bb01e8
6 changed files with 296 additions and 14 deletions


@@ -55,6 +55,54 @@ Listener(
The organism now speaks `<add>`: fully validated, typed, and discoverable.<br/>
Unlike rigid platforms requiring custom mappings or fragile item structures, this is pure Python: typed, testable, and sovereign.
## Security Model
AgentServer's security is **architectural**, not bolted-on:
### Two Completely Isolated Channels
- **Main Bus**: Standard `<message>` envelope; all traffic passes through the same validation pipeline regardless of source
- **OOB Channel**: Privileged commands only; different schema, localhost-bound, used for structural changes
### Handler Isolation & Trust Boundary
**Handlers are untrusted code.** Even compromised handlers cannot:
- Forge their identity (sender name captured in coroutine scope before execution)
- Escape thread context (thread UUID captured in coroutine, not handler output)
- Route to arbitrary targets (routing computed from peers list, not handler claims)
- Access other threads' data (opaque UUIDs, private path registry)
- Discover topology (only declared peers visible)

The message pump maintains authoritative metadata in coroutine scope and **never trusts handler output** for security-critical properties.
### Closed-Loop Validation
ALL messages on the main bus undergo identical security processing:
- External ingress: WSS → pipeline → validation
- Handler outputs: bytes → pipeline → validation (same steps!)
- Error messages: generated → pipeline → validation
- System notifications: generated → pipeline → validation

No fast-path bypasses. No "trusted internal" messages. Everything validates.
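
The closed-loop rule above can be illustrated with a minimal sketch. It assumes a hypothetical `Bus` class with a single `submit` entry point; the class, its method names, and the trivial `validate` check are illustrative stand-ins, not AgentServer's actual API. The point is only that the message source is recorded for audit and never selects a different code path.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Bus:
    """Hypothetical main bus: one entry point, one validation path."""
    validated: List[Tuple[str, bytes]] = field(default_factory=list)
    rejected: List[Tuple[str, bytes]] = field(default_factory=list)

    async def validate(self, raw: bytes) -> bool:
        # Stand-in for the real repair → C14N → XSD pipeline (assumption).
        return raw.startswith(b"<message>") and raw.endswith(b"</message>")

    async def submit(self, raw: bytes, source: str) -> bool:
        # `source` is kept for the audit trail only; it never bypasses validation.
        ok = await self.validate(raw)
        (self.validated if ok else self.rejected).append((source, raw))
        return ok


async def demo() -> Bus:
    bus = Bus()
    # Every kind of traffic enters through the same call.
    for source in ("wss-ingress", "handler-output", "error", "system-notice"):
        await bus.submit(b"<message>ok</message>", source)
    # Malformed handler output is rejected exactly like malformed external input.
    await bus.submit(b"garbage", "handler-output")
    return bus


bus = asyncio.run(demo())
```

Because there is no second entry point, there is nothing for a "trusted internal" message to skip.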
### Topology Privacy
- Agents see only opaque thread UUIDs, never hierarchical paths
- Private path registry (UUID → `agent.tool.subtool`) maintained by system
- Peers list enforces capability boundaries (no ambient authority)
- Federation gateways are opaque abstractions
### Anti-Paperclip Architecture
- Threads are ephemeral (complete audit trail, then deleted)
- No persistent cross-thread memory primitives
- Token budgets enforce computational bounds
- Thread pruning prevents state accumulation
- All reasoning visible in message history

This architecture ensures:<br>
✅ No privilege escalation (handlers can't forge privileged commands)<br>
✅ No fast-path bypasses (even system-generated messages validate)<br>
✅ Physical separation (privileged and regular traffic cannot mix)<br>
✅ Capability-safe handlers (compromised code still bounded by peers list)<br>
✅ Complete auditability (thread history is ground truth)
## Key Features
### 1. The Autonomous Schema Layer
- Dataclass → cached XSD + example + rich tool prompt (mandatory description + field docs).


@@ -2,6 +2,18 @@ from dataclasses import dataclass, field
from lxml.etree import Element
from typing import Any

"""
default_listener_steps = [
    repair_step,               # raw bytes → recovered lxml Element
    c14n_step,                 # Element → canonicalized Element
    envelope_validation_step,  # Element → validated Element
    payload_extraction_step,   # Element → payload Element
    xsd_validation_step,       # payload Element + cached XSD → validated
    deserialization_step,      # payload Element → dataclass instance
    routing_resolution_step,   # attaches target_listeners (or error)
]
"""

@dataclass
class HandlerMetadata:
    """Trustworthy context passed to every handler."""


@@ -0,0 +1,53 @@
"""
c14n.py: Canonicalization step for the full <message> envelope.

After repair, the envelope_tree may have different but semantically equivalent
representations (attribute order, namespace declarations, whitespace inside tags, etc.).
This step produces Exclusive XML Canonicalization (Exclusive C14N 1.0) bytes that are
identical for equivalent documents, which is essential for validation and signing.

Part of AgentServer v2.1 message pump.
"""
from lxml import etree

from agentserver.message_bus.message_state import MessageState


async def c14n_step(state: MessageState) -> MessageState:
    """
    Canonicalize the envelope_tree to Exclusive C14N form.

    If repair_step succeeded, this step normalizes the tree so that:
    - Validation against envelope.xsd is deterministic
    - Future signing/federation comparisons are reliable

    On failure, sets state.error and returns (downstream steps will short-circuit).
    """
    if state.envelope_tree is None:
        # Keep an earlier diagnostic (e.g. from repair_step) instead of overwriting it
        if state.error is None:
            state.error = "c14n_step: no envelope_tree (previous repair failed)"
        return state

    try:
        # lxml's method="c14n" is C14N 1.0; exclusive=True selects the Exclusive
        # variant (the same form we require on egress)
        c14n_bytes = etree.tostring(
            state.envelope_tree,
            method="c14n",
            exclusive=True,       # Exclusive C14N 1.0
            with_comments=False,  # Comments not part of canonical form
            strip_text=False,
        )

        # Re-parse the canonical bytes to get a clean tree (attribute order and
        # namespace declarations normalized).
        # This ensures downstream steps see a consistent document
        clean_tree = etree.fromstring(c14n_bytes)
        state.envelope_tree = clean_tree
        # raw_bytes already cleared by repair_step
    except Exception as exc:  # pylint: disable=broad-except
        state.error = f"c14n_step failed: {exc}"
        state.envelope_tree = None

    return state
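
The property this step relies on (equivalent documents serialize to identical canonical output) can be checked with a small sketch. It deliberately swaps in the standard library's `xml.etree.ElementTree.canonicalize`, which implements C14N 2.0, instead of the Exclusive C14N 1.0 that `c14n_step` requests from lxml. The two sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Two spellings of the same envelope: different attribute order, quoting,
# whitespace inside the tag, and empty-element syntax.
doc_a = '<message to="calc" from="user"><add/></message>'
doc_b = "<message   from='user'  to='calc'><add></add></message>"

# Stand-in canonicalizer (C14N 2.0 from the stdlib), used here only to show
# the idea; the pipeline step itself uses lxml.
canon_a = ET.canonicalize(doc_a)
canon_b = ET.canonicalize(doc_b)

same_after_c14n = canon_a == canon_b
```

Byte-identical canonical forms are what make later schema validation and signature comparison deterministic.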


@@ -0,0 +1,42 @@
from lxml import etree

from agentserver.message_bus.message_state import MessageState

# lxml parser configured for maximum tolerance + recovery
_RECOVERY_PARSER = etree.XMLParser(
    recover=True,            # Try to recover from malformed XML
    remove_blank_text=True,  # Drop whitespace-only text nodes
    resolve_entities=False,  # Security: don't resolve external entities
    huge_tree=False,         # Default is safe
)


async def repair_step(state: MessageState) -> MessageState:
    """
    First pipeline step: repair malformed ingress bytes into a recoverable lxml element tree.

    Takes raw_bytes from ingress (or multi-payload extraction) and attempts to produce
    a valid envelope_tree. Uses lxml's recovery mode to tolerate dirty streams.

    Always returns a MessageState (even on total failure, it injects a diagnostic error).
    """
    if state.raw_bytes is None:
        state.error = "repair_step: no raw_bytes available"
        return state

    try:
        # lxml recovery parser turns most garbage into something parseable
        tree = etree.fromstring(state.raw_bytes, parser=_RECOVERY_PARSER)
        if tree is None:
            raise ValueError("Parser returned None: unrecoverable XML")
        state.envelope_tree = tree
        # Optional: free memory early; raw bytes no longer needed after repair
        state.raw_bytes = None
    except Exception as exc:  # pylint: disable=broad-except
        # Even if recovery fails completely, we capture the diagnostic
        state.error = f"repair_step failed: {exc}"
        # We still set envelope_tree to None so later steps know to short-circuit
        state.envelope_tree = None

    return state
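
How a pump might chain steps like this one is sketched below. The `MessageState` here is a minimal hypothetical stand-in (the real class lives in `agentserver.message_bus.message_state`), `fake_repair` and `fake_c14n` are placeholders that need no lxml, and putting the short-circuit check in the runner is an assumption; the steps in this commit also guard themselves by checking `envelope_tree`.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional


@dataclass
class MessageState:
    """Hypothetical minimal stand-in for the real MessageState."""
    raw_bytes: Optional[bytes] = None
    envelope_tree: Optional[str] = None
    error: Optional[str] = None


Step = Callable[[MessageState], Awaitable[MessageState]]


async def fake_repair(state: MessageState) -> MessageState:
    # Placeholder for repair_step: always returns a state, never raises.
    if not state.raw_bytes:
        state.error = "repair: no raw_bytes available"
        return state
    state.envelope_tree = state.raw_bytes.decode("utf-8", errors="replace")
    state.raw_bytes = None
    return state


async def fake_c14n(state: MessageState) -> MessageState:
    # Placeholder for c14n_step: would fail on a missing tree if it were reached.
    state.envelope_tree = state.envelope_tree.strip()
    return state


async def run_pipeline(state: MessageState, steps: List[Step]) -> MessageState:
    for step in steps:
        state = await step(state)
        if state.error is not None:
            break  # short-circuit: later steps never see a failed state
    return state


ok = asyncio.run(run_pipeline(MessageState(raw_bytes=b"  <message/> "), [fake_repair, fake_c14n]))
bad = asyncio.run(run_pipeline(MessageState(), [fake_repair, fake_c14n]))
```

Since every step takes and returns the same state object, a failure becomes data carried along the pipeline instead of an exception that has to be caught at each call site.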


@@ -72,6 +72,40 @@ These principles are the single canonical source of truth for the project. All d
- Opaque thread UUIDs + private path registry prevent topology disclosure.
- “No Paperclippers” manifesto injected as first system message for every LLM-based listener.
### Privileged Operations
- Privileged messages (per `privileged-msg.xsd`) handled exclusively on dedicated OOB channel.
- OOB channel bound to localhost by default (safe for local GUI); separate port/socket from main bus.
- Main message pump and dispatcher oblivious to privileged operations: no routing or handling for privileged roots.
- Remote privileged attempts impossible (channel not exposed); any leak to main port logged as security event and dropped.
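
The "logged as security event and dropped" rule for privileged roots that reach the main port could look like the following sketch. The root name in `PRIVILEGED_ROOTS`, the logger name, and the function `admit_to_main_bus` are all hypothetical; the real set of privileged roots would come from `privileged-msg.xsd`.

```python
import logging

logger = logging.getLogger("agentserver.security")  # hypothetical logger name

# Hypothetical root tag; the real list is defined by privileged-msg.xsd.
PRIVILEGED_ROOTS = frozenset({"privileged-msg"})


def admit_to_main_bus(root_tag: str) -> bool:
    """Return True if a document with this root may enter the main pipeline."""
    if root_tag in PRIVILEGED_ROOTS:
        # The main bus never handles these; record the attempt and drop it.
        logger.warning("security event: privileged root <%s> on main port; dropped", root_tag)
        return False
    return True


leak_admitted = admit_to_main_bus("privileged-msg")
normal_admitted = admit_to_main_bus("message")
```

A check like this would sit at the main port's ingress, so the pump and dispatcher behind it can stay oblivious to privileged operations.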
### Identity & Cryptography
- Ed25519 identity key used for envelope signing, federation auth, and privileged command verification.
- All traffic on main bus uses mandatory WSS (TLS) + TOTP authentication.
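
For readers unfamiliar with TOTP, here is a minimal standard-library sketch of the RFC 6238 algorithm (HMAC-SHA1, 30-second step). It is not AgentServer's implementation: the function names, the drift window, and how secrets are stored are assumptions made for illustration.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP over HMAC-SHA1 with RFC 4226 dynamic truncation."""
    counter = unix_time // step
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(secret: bytes, candidate: str, unix_time: int,
                step: int = 30, window: int = 1) -> bool:
    """Accept codes from the current step and `window` steps either side."""
    for drift in range(-window, window + 1):
        t = unix_time + drift * step
        # compare_digest avoids leaking the match position through timing.
        if t >= 0 and hmac.compare_digest(totp(secret, t, step), candidate):
            return True
    return False


shared_secret = b"12345678901234567890"  # RFC 6238 test key, not a real secret
code_now = totp(shared_secret, 59)
accepted_late = verify_totp(shared_secret, code_now, 59 + 30)
```

The drift window trades a little strictness for tolerance of clock skew between client and server.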
### Handler Isolation (NEW)
- **Handlers are untrusted code** running in coroutine sandboxes with minimal context.
- Security-critical metadata (sender identity, thread path, routing) captured in coroutine scope before handler execution.
- Handler output never trusted for identity, routing, or thread context; all envelope metadata injected from coroutine-captured state.
- Even compromised handlers cannot forge messages, escape threads, or discover topology beyond declared peers.
### Topology Privacy
- Opaque thread UUIDs prevent topology disclosure to handlers and agents.
- Private path registry maps UUIDs to hierarchical paths (e.g., `agent.tool.subtool`) for routing and audit.
- Agents receive only opaque UUIDs; system maintains authoritative path mapping.
- Peers list enforces capability boundaries: agents can only call declared tools.
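
A minimal sketch of the private path registry idea follows. It assumes a hypothetical `PathRegistry` class with made-up method names: handlers would only ever be given the opaque string returned by `register`, while path lookups stay on the system side.

```python
import uuid
from typing import Dict, Optional


class PathRegistry:
    """Hypothetical system-side mapping from opaque UUID to hierarchical path."""

    def __init__(self) -> None:
        self._paths: Dict[str, str] = {}

    def register(self, path: str) -> str:
        # The returned UUID is random, so it reveals nothing about the path.
        thread_id = str(uuid.uuid4())
        self._paths[thread_id] = path
        return thread_id

    def path_for(self, thread_id: str) -> str:
        # System-only lookup (routing, audit); never exposed to handlers.
        return self._paths[thread_id]

    def parent_of(self, thread_id: str) -> Optional[str]:
        # "agent.tool.subtool" -> "agent.tool"; a root path has no parent.
        head, sep, _ = self._paths[thread_id].rpartition(".")
        return head if sep else None


registry = PathRegistry()
thread_id = registry.register("agent.tool.subtool")
root_id = registry.register("agent")
```

Because the UUID is generated independently of the path, holding one gives an agent no way to infer siblings, parents, or depth.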
### Anti-Paperclip Guarantees
- No persistent cross-thread memory (threads are ephemeral audit trails).
- Token budgets per thread enforce computational bounds.
- Thread pruning on delegation return prevents state accumulation.
- All agent reasoning visible in message history (no hidden state machines).
- "No Paperclippers" manifesto injected as first system message for every LLM-based listener.
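
A per-thread token budget can be sketched as a small counter that refuses work once the limit would be exceeded. `ThreadBudget`, `BudgetExceeded`, and the numbers used are hypothetical; the source does not specify how budgets are stored or what happens to a thread that runs out.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a thread tries to spend past its token limit."""


@dataclass
class ThreadBudget:
    limit: int       # maximum tokens this thread may consume
    spent: int = 0   # tokens consumed so far

    def charge(self, tokens: int) -> int:
        """Record usage and return the remaining budget."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.spent + tokens > self.limit:
            raise BudgetExceeded(f"{self.spent + tokens} > limit {self.limit}")
        self.spent += tokens
        return self.limit - self.spent


budget = ThreadBudget(limit=1000)
remaining = budget.charge(400)

try:
    budget.charge(700)   # would bring the total to 1100
    over_limit_allowed = True
except BudgetExceeded:
    over_limit_allowed = False
```

A rejected charge leaves `spent` unchanged, so the audit trail reflects only work that actually ran.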
### Audit & Forensics
- Complete message history per thread provides full audit trail.
- Privileged introspection (via OOB) can map UUID→path for forensics without exposing to agents.
- All structural changes (hot-reload, listener registration) logged as audit events on main bus.
## Federation
- Gateways declared in YAML with trusted remote public key.
- Remote tools referenced by gateway name in agent tool lists.
@@ -105,8 +139,97 @@ These principles are the single canonical source of truth for the project. All d
These principles are now locked for v2.1. The Message Pump v2.1 specification remains the canonical detail for pump behavior. Future changes require explicit discussion and amendment here first.
## Handler Trust Boundary & Coroutine Isolation
Handlers are treated as **untrusted code** that runs in an isolated coroutine context.
The message pump maintains authoritative metadata in coroutine scope and never trusts
handler output to preserve security-critical properties.
### Coroutine Capture Pattern
When dispatching a message to a handler, the pump captures metadata in coroutine scope
BEFORE handler execution:
```python
async def dispatch(msg: ParsedMessage):
    # TRUSTED: Captured before handler runs
    thread_uuid = msg.thread_id
    sender_name = msg.listener_name
    thread_path = path_registry[thread_uuid]
    parent = get_parent_from_path(thread_path)
    allowed_peers = registry.get_listener(sender_name).peers

    # UNTRUSTED: Handler executes with minimal context
    response_bytes = await handler(
        payload=msg.deserialized_payload,
        meta=HandlerMetadata(thread_id=thread_uuid),  # Opaque UUID only
    )

    # TRUSTED: Coroutine scope still has authoritative metadata
    # Process response using captured context, not handler claims
    await process_response(
        response_bytes,
        actual_sender=sender_name,    # From coroutine, not handler
        actual_thread=thread_uuid,    # From coroutine, not handler
        actual_parent=parent,         # From coroutine, not handler
        allowed_peers=allowed_peers,  # From registration, not handler
    )
```
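
The pattern above depends on names defined elsewhere in the pump. A self-contained toy version, with a deliberately malicious handler, shows why the capture works. Everything here is a simplified assumption: `Outbound`, this `HandlerMetadata`, and the `dispatch` signature are illustrative and not the project's real types.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class HandlerMetadata:
    thread_id: str  # opaque UUID only


@dataclass(frozen=True)
class Outbound:
    sender: str
    thread_id: str
    body: bytes


async def malicious_handler(payload: str, meta: HandlerMetadata) -> bytes:
    # Tries to impersonate the system and jump threads via its output.
    return b"<from>core</from><thread>someone-else</thread><hello/>"


async def dispatch(handler, listener_name: str, thread_id: str, payload: str) -> Outbound:
    # TRUSTED: locals bound before the handler runs; the handler cannot reach them.
    sender_name = listener_name
    thread_uuid = thread_id

    # UNTRUSTED: the handler sees only the payload and the opaque UUID.
    response = await handler(payload, HandlerMetadata(thread_id=thread_uuid))

    # TRUSTED: identity and thread come from the locals above, never from `response`.
    return Outbound(sender=sender_name, thread_id=thread_uuid, body=response)


out = asyncio.run(dispatch(malicious_handler, "calculator", "3f2a-uuid", "<add/>"))
```

Whatever the handler writes into its bytes, `out.sender` and `out.thread_id` are fixed by the dispatcher's own local variables.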
### What Handlers Cannot Do
Even compromised or malicious handlers cannot:
- **Forge identity**: `<from>` is always injected from coroutine-captured sender name
- **Escape thread context**: `<thread>` is always from coroutine-captured UUID
- **Route arbitrarily**: `<to>` is computed from coroutine-captured peers list and thread path
- **Access other threads**: UUIDs are opaque; path registry is private
- **Discover topology**: Only peers list is visible; no access to path structure
- **Spoof system messages**: `<from>core</from>` only injectable by system, never handlers
### What Handlers Can Do
Handlers can only:
- **Call declared peers**: Emit XML matching peer schemas (validated against peers list)
- **Self-iterate**: Emit `<todo-until>` (routes back to sender automatically)
- **Return to caller**: Emit any other payload (routes to parent in thread path)
- **Access thread-scoped storage**: Via opaque UUID (isolated per delegation chain)
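
The three routing outcomes listed above can be written as one small decision function. This is a sketch under assumptions: `resolve_target` and the `peer_roots` mapping (payload root tag to peer name) are hypothetical, and it presumes each declared peer has a unique root tag.

```python
from typing import Dict


def resolve_target(root_tag: str, sender: str, parent: str,
                   peer_roots: Dict[str, str]) -> str:
    """Pick the recipient from trusted context; the handler never names it."""
    if root_tag == "todo-until":
        return sender                 # self-iterate: back to the emitting listener
    if root_tag in peer_roots:
        return peer_roots[root_tag]   # call a declared peer
    return parent                     # anything else returns to the caller


peers = {"add": "calculator"}  # hypothetical peers list for a "researcher" agent

to_self = resolve_target("todo-until", "researcher", "orchestrator", peers)
to_peer = resolve_target("add", "researcher", "orchestrator", peers)
to_parent = resolve_target("summary", "researcher", "orchestrator", peers)
```

Note that `sender`, `parent`, and `peer_roots` would all come from coroutine-captured state, so the only handler-controlled input is the payload's root tag.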
### Response Processing Security
Handler output (raw bytes) undergoes full security processing:
1. **Wrap in dummy tags** and parse with repair mode
2. **Extract payloads** via C14N and XSD validation
3. **Determine routing** using coroutine-captured metadata (never handler claims)
4. **Inject envelope** with trusted `<from>`, `<thread>`, `<to>` from coroutine scope
5. **Re-inject to pipeline** for identical security processing

Any envelope metadata in handler output is **ignored and overwritten**.
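
Steps 1 and 4 above can be sketched with the standard library. This is an illustration under assumptions, not the pump's code: it uses `xml.etree.ElementTree`'s strict parser where the pump uses lxml's recovery mode, and the function names and flat `<message>` layout are made up.

```python
import xml.etree.ElementTree as ET
from typing import List


def extract_payloads(handler_output: bytes) -> List[ET.Element]:
    # Step 1: wrap in a dummy root so several top-level payloads parse as one document.
    root = ET.fromstring(b"<dummy>" + handler_output + b"</dummy>")
    return list(root)


def inject_envelope(payload: ET.Element, sender: str, thread_id: str, target: str) -> ET.Element:
    # Step 4: envelope fields come from the arguments (trusted, coroutine-captured),
    # never from anything inside the handler's bytes.
    message = ET.Element("message")
    ET.SubElement(message, "from").text = sender
    ET.SubElement(message, "thread").text = thread_id
    ET.SubElement(message, "to").text = target
    message.append(payload)
    return message


payloads = extract_payloads(b"<add><a>1</a><b>2</b></add><todo-until/>")
envelopes = [inject_envelope(p, "calculator", "3f2a-uuid", "parent-agent") for p in payloads]
```

Each resulting envelope would then be serialized and fed back through the same pipeline as any external message.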
### Trust Architecture
```
┌─────────────────────────────────────────────────────┐
│ TRUSTED ZONE (System) │
│ • Path registry (UUID → hierarchical path) │
│ • Listener registry (name → peers, schema) │
│ • Thread management (pruning, parent lookup) │
│ • Envelope injection (<from>, <thread>, <to>) │
└─────────────────────────────────────────────────────┘
Coroutine Capture Boundary
┌─────────────────────────────────────────────────────┐
│ UNTRUSTED ZONE (Handler) │
│ • Receives: typed payload + opaque UUID │
│ • Returns: raw bytes │
│ • Cannot: forge identity, escape thread, probe │
│ • Can: call peers, self-iterate, return to caller │
└─────────────────────────────────────────────────────┘
```
This design ensures handlers are **capability-safe by construction**: even fully
compromised handler code cannot violate security boundaries or topology privacy.
---
This integrates the blind self-iteration pattern cleanly: no contradictions, stronger obliviousness, and explicit guidance on `<todo-until/>`. The unique-root enforcement for agents is called out in Configuration and Schema layers.
Ready to roll with this as canonical. If you want any final phrasing tweaks or to add YAML examples, just say. 🚀


@@ -17,19 +17,21 @@ xml-pipeline/
│ │ ├── llm_connection.py
│ │ └── llm_listener.py
│ ├── message_bus/
│ │ ├── steps/
│ │ │ ├── __init__.py
│ │ │ └── repair_step.py
│ │ ├── __init__.py
│ │ ├── bus.py
│ │ ├── config.py
│ │ ├── envelope.py
│ │ ├── errors.py
│ │ ├── message_state.py
│ │ ├── scheduler.py
│ │ └── thread.py
│ ├── prompts/
│ │ ├── grok_classic.py
│ │ └── no_paperclippers.py
│ ├── schema/
│ │ ├── payloads/
│ │ │ └── grok-response.xsd
│ │ ├── envelope.xsd
│ │ └── privileged-msg.xsd
│ ├── utils/
@@ -40,22 +42,24 @@ xml-pipeline/
│ ├── main.py
│ └── xml_listener.py
├── docs/
-│ ├── agent-server.md
-│ ├── local-privilege-only.md
-│ ├── logic-and-iteration.md
-│ ├── prompt-no-paperclippers.md
-│ └── self-grammar-generation.md
-├── scripts/
-│ └── generate_organism_key.py
+│ ├── archive-obsolete/
+│ │ ├── logic-and-iteration.md
+│ │ ├── thread-management.md
+│ │ └── token-scheduling-issues.md
+│ ├── configuration.md
+│ ├── core-principles-v2.1.md
+│ ├── listener-class-v2.1.md
│ ├── message-pump-v2.1.md
│ ├── self-grammar-generation.md
│ └── why-not-json.md
├── tests/
│ ├── scripts/
│ │ └── generate_organism_key.py
│ └── __init__.py
├── LICENSE
├── README.md
├── README.v0.md
├── README.v1.md
├── __init__.py
├── pyproject.toml
├── setup-project.ps1
└── structure.md
```