xml-pipeline/docs/wiki/architecture/Overview.md
dullfig 3a128d4d1f Fix line endings in wiki docs
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 22:16:26 -08:00

Architecture Overview

xml-pipeline implements a stream-based message pump where all communication flows through validated XML envelopes. The architecture enforces strict isolation between handlers (untrusted code) and the system (trusted zone).

High-Level Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                        TRUSTED ZONE (System)                        │
│  • Thread registry (UUID ↔ call chain mapping)                      │
│  • Listener registry (name → peers, schema)                         │
│  • Envelope injection (<from>, <thread>, <to>)                      │
│  • Peer constraint enforcement                                      │
└─────────────────────────────────────────────────────────────────────┘
                               ↕
                    Coroutine Capture Boundary
                               ↕
┌─────────────────────────────────────────────────────────────────────┐
│                      UNTRUSTED ZONE (Handlers)                      │
│  • Receive typed payload + metadata                                 │
│  • Return HandlerResponse or None                                   │
│  • Cannot forge identity, escape thread, or probe topology          │
└─────────────────────────────────────────────────────────────────────┘

Core Components

Message Pump (StreamPump)

The central orchestrator that:

  1. Receives raw XML bytes
  2. Runs each message through the preprocessing pipeline
  3. Routes it to the appropriate handler(s)
  4. Processes handler responses and re-injects them into the pipeline

See Message Pump for details.
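
The four responsibilities above can be pictured as a small loop over a queue. This is only a sketch under stated assumptions: the names `pump`, `route`, and `shout` are hypothetical, the preprocessing pipeline is reduced to a comment, and the real StreamPump may be structured quite differently.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical handler shape: takes raw bytes, returns reply bytes or None.
Handler = Callable[[bytes], Awaitable[Optional[bytes]]]


async def pump(queue: asyncio.Queue, route: Callable[[bytes], Handler]) -> list[bytes]:
    """Drain the queue, dispatching each message and re-injecting replies."""
    seen: list[bytes] = []
    while not queue.empty():
        raw = await queue.get()      # 1. receive raw bytes
        seen.append(raw)             # 2. preprocessing would run here
        handler = route(raw)         # 3. pick a handler for the message
        reply = await handler(raw)
        if reply is not None:        # 4. re-inject the response
            await queue.put(reply)
    return seen


async def shout(raw: bytes) -> Optional[bytes]:
    """Toy handler: reply with the upper-cased message, or stay silent."""
    upper = raw.upper()
    return None if upper == raw else upper
```

Because a reply goes back onto the same queue, it passes through the same stages as an external message, which is the property the re-injection step relies on.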

Pipeline Steps

Messages flow through ordered processing stages:

Raw Bytes
    │
    ▼
┌─────────────────┐
│  repair_step    │  Fix malformed XML (lxml recover mode)
└────────┬────────┘
         ▼
┌─────────────────┐
│   c14n_step     │  Canonicalize XML (Exclusive C14N)
└────────┬────────┘
         ▼
┌─────────────────┐
│ envelope_valid  │  Validate against envelope.xsd
└────────┬────────┘
         ▼
┌─────────────────┐
│ payload_extract │  Extract payload from envelope
└────────┬────────┘
         ▼
┌─────────────────┐
│ thread_assign   │  Assign or inherit thread UUID
└────────┬────────┘
         ▼
┌─────────────────┐
│  xsd_validate   │  Validate against listener's XSD
└────────┬────────┘
         ▼
┌─────────────────┐
│ deserialize     │  XML → @xmlify dataclass
└────────┬────────┘
         ▼
┌─────────────────┐
│    routing      │  Match to listener(s)
└────────┬────────┘
         ▼
    Handler

Thread Registry

Maps opaque UUIDs to call chains:

UUID: 550e8400-e29b-41d4-...
Chain: system.organism.console.greeter.calculator
        │       │        │       │        │
        │       │        │       │        └─ Current handler
        │       │        │       └─ Previous hop
        │       │        └─ Entry point
        │       └─ Organism name
        └─ Root

Handlers only see the UUID. The actual chain is private to the system.

See Thread Registry for details.
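
A minimal in-memory sketch of the UUID-to-chain mapping follows. The class and method names are hypothetical, and it assumes each distinct chain gets its own UUID when a hop is appended; the actual registry may handle this differently.

```python
import uuid


class ThreadRegistry:
    """Bidirectional map between opaque UUIDs and dotted call chains."""

    def __init__(self) -> None:
        self._chain_by_id: dict[str, str] = {}
        self._id_by_chain: dict[str, str] = {}

    def id_for(self, chain: str) -> str:
        """Return the UUID for a chain, minting one the first time it is seen."""
        if chain not in self._id_by_chain:
            thread_id = str(uuid.uuid4())
            self._id_by_chain[chain] = thread_id
            self._chain_by_id[thread_id] = chain
        return self._id_by_chain[chain]

    def extend(self, thread_id: str, hop: str) -> str:
        """Append a hop to a known chain and return the UUID of the longer chain."""
        return self.id_for(f"{self._chain_by_id[thread_id]}.{hop}")

    def chain_for(self, thread_id: str) -> str:
        """System-only lookup; handlers are never given this value."""
        return self._chain_by_id[thread_id]
```

Only `id_for` and `extend` results would ever cross into handler code, so a handler holding a UUID learns nothing about the hops behind it.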

Listener Registry

Tracks registered listeners:

name: "greeter"
  ├── payload_class: Greeting
  ├── handler: handle_greeting
  ├── description: "Friendly greeting handler"
  ├── agent: true
  ├── peers: [shouter, calculator]
  └── schema: schemas/greeter/v1.xsd
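
The record above maps naturally onto a frozen dataclass held in a dictionary keyed by name. The sketch below is illustrative only: `Listener` and `ListenerRegistry` are hypothetical names, and the field types are guesses based on the example values.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Listener:
    """One registered listener; field names mirror the record above."""
    name: str
    payload_class: type
    handler: Callable[..., Any]
    description: str = ""
    agent: bool = False
    peers: tuple[str, ...] = ()
    schema: str = ""


class ListenerRegistry:
    """Name-keyed store of listeners; rejects duplicate names."""

    def __init__(self) -> None:
        self._by_name: dict[str, Listener] = {}

    def register(self, listener: Listener) -> None:
        if listener.name in self._by_name:
            raise ValueError(f"duplicate listener: {listener.name}")
        self._by_name[listener.name] = listener

    def get(self, name: str) -> Listener:
        return self._by_name[name]
```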

Context Buffer

Stores message history per thread:

Thread: uuid-123
  ├── Slot 0: Greeting(name="Alice") from console
  ├── Slot 1: GreetingResponse(message="Hello!") from greeter
  └── Slot 2: ShoutResponse(text="HELLO!") from shouter

Slots are append-only and immutable. They are garbage-collected automatically when their thread is pruned.
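
One way to get append-only, immutable slots in plain Python is a frozen dataclass per slot plus a buffer that exposes only `append`, a read-only view, and `prune`. The names below are hypothetical and the storage is an in-process dictionary, not the shared backend described later.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Slot:
    """One immutable history entry."""
    index: int
    payload: Any
    from_id: str


class ContextBuffer:
    """Per-thread message history with no update or delete-one operation."""

    def __init__(self) -> None:
        self._threads: dict[str, list[Slot]] = {}

    def append(self, thread_id: str, payload: Any, from_id: str) -> Slot:
        slots = self._threads.setdefault(thread_id, [])
        slot = Slot(len(slots), payload, from_id)
        slots.append(slot)
        return slot

    def history(self, thread_id: str) -> tuple[Slot, ...]:
        """Return a tuple so callers cannot mutate the underlying list."""
        return tuple(self._threads.get(thread_id, ()))

    def prune(self, thread_id: str) -> None:
        """Drop the whole thread; its slots become garbage."""
        self._threads.pop(thread_id, None)
```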

Message Flow

1. Message Arrival

External message arrives (console, WebSocket, etc.):

<message xmlns="https://xml-pipeline.org/ns/envelope/v1">
  <meta>
    <from>console</from>
    <to>greeter</to>
  </meta>
  <greeting>
    <name>Alice</name>
  </greeting>
</message>
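
To make the envelope layout concrete, the sketch below splits such a message into sender, target, and payload element. It uses the standard library parser as a stand-in for lxml, and `split_envelope` is a hypothetical helper, not a function from the project.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the example envelope above.
ENVELOPE_NS = "https://xml-pipeline.org/ns/envelope/v1"


def split_envelope(raw: bytes) -> tuple[str, str, ET.Element]:
    """Return (from, to, payload element) for a single-payload envelope."""
    root = ET.fromstring(raw)
    ns = {"env": ENVELOPE_NS}
    meta = root.find("env:meta", ns)
    if root.tag != f"{{{ENVELOPE_NS}}}message" or meta is None:
        raise ValueError("not an envelope")
    sender = meta.findtext("env:from", namespaces=ns)
    target = meta.findtext("env:to", namespaces=ns)
    # Everything under <message> other than <meta> is treated as payload.
    payloads = [child for child in root if child is not meta]
    if sender is None or target is None or len(payloads) != 1:
        raise ValueError("envelope needs <from>, <to>, and exactly one payload")
    return sender, target, payloads[0]
```

Note that in the example the payload inherits the envelope's default namespace, so the parsed tag is namespace-qualified.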

2. Pipeline Processing

The message flows through the pipeline steps. Each step transforms a MessageState:

from dataclasses import dataclass
from typing import Any

@dataclass
class MessageState:
    raw_bytes: bytes | None          # Input
    envelope_tree: Element | None    # After repair
    payload_tree: Element | None     # After extraction
    payload: Any | None              # After deserialization
    thread_id: str | None            # After assignment
    from_id: str | None              # Sender
    target_listeners: list | None    # After routing
    error: str | None                # If step fails
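
The idea of steps that each transform the state can be sketched as a list of functions applied in order. This is a simplified illustration: `State` is a trimmed stand-in for MessageState, the two steps are hypothetical, and stopping at the first error is an assumption about how the `error` field is used.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class State:
    """Trimmed stand-in for MessageState with just three fields."""
    raw_bytes: Optional[bytes] = None
    thread_id: Optional[str] = None
    error: Optional[str] = None


Step = Callable[[State], State]


def require_bytes(state: State) -> State:
    """Fail the message if there is nothing to process."""
    if not state.raw_bytes:
        return replace(state, error="empty message")
    return state


def assign_thread(state: State) -> State:
    """Keep an inherited thread id, otherwise assign a placeholder one."""
    return replace(state, thread_id=state.thread_id or "uuid-123")


def run_pipeline(state: State, steps: list[Step]) -> State:
    """Apply steps in order; stop at the first step that sets an error."""
    for step in steps:
        state = step(state)
        if state.error is not None:
            break
    return state
```

Returning a new state from each step, rather than mutating in place, keeps every stage easy to test in isolation.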

3. Handler Dispatch

Handler receives typed payload + metadata:

async def handle_greeting(payload: Greeting, metadata: HandlerMetadata):
    # payload.name == "Alice"
    # metadata.thread_id == "uuid-123"
    # metadata.from_id == "console"
    ...  # handler body; returns HandlerResponse or None

4. Response Processing

Handler returns HandlerResponse:

return HandlerResponse(
    payload=GreetingResponse(message="Hello, Alice!"),
    to="shouter",
)

System:

  1. Validates the requested to target against the responding listener's peer list
  2. Serializes payload to XML
  3. Creates new envelope with injected <from>
  4. Re-injects into pipeline
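
The first three steps can be sketched with the standard library as follows. Everything here is an assumption made for illustration: `build_envelope` and `to_element` are hypothetical helpers, the serializer is deliberately naive, and placing <thread> inside <meta> is a guess, since the example envelope only shows <from> and <to>.

```python
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass

ENVELOPE_NS = "https://xml-pipeline.org/ns/envelope/v1"


@dataclass
class GreetingResponse:
    message: str


def to_element(payload) -> ET.Element:
    """Naive serializer: class name becomes the tag, each field a child element."""
    elem = ET.Element(f"{{{ENVELOPE_NS}}}{type(payload).__name__.lower()}")
    for key, value in asdict(payload).items():
        ET.SubElement(elem, f"{{{ENVELOPE_NS}}}{key}").text = str(value)
    return elem


def build_envelope(listener_name: str, peers: list[str], thread_id: str,
                   to: str, payload) -> bytes:
    """Validate the target, then wrap the payload in a system-built envelope."""
    if to not in peers:
        raise PermissionError(f"{listener_name} may not send to {to}")
    message = ET.Element(f"{{{ENVELOPE_NS}}}message")
    meta = ET.SubElement(message, f"{{{ENVELOPE_NS}}}meta")
    # <from> comes from the registry entry, never from handler-supplied data.
    ET.SubElement(meta, f"{{{ENVELOPE_NS}}}from").text = listener_name
    ET.SubElement(meta, f"{{{ENVELOPE_NS}}}thread").text = thread_id
    ET.SubElement(meta, f"{{{ENVELOPE_NS}}}to").text = to
    message.append(to_element(payload))
    return ET.tostring(message)
```

The handler contributes only `to` and `payload`; the sender name and thread id are arguments the system fills in, which is what prevents a handler from forging either.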

Trust Boundaries

What the System Controls

| Aspect             | System Responsibility               |
|--------------------|-------------------------------------|
| <from>             | Always injected from listener.name  |
| <thread>           | Managed by thread registry          |
| <to> validation    | Checked against peers list          |
| Schema enforcement | XSD validation on every message     |
| Call chain         | Private, never exposed to handlers  |

What Handlers Control

| Aspect               | Handler Capability                 |
|----------------------|------------------------------------|
| Payload content      | Full control                       |
| Target selection     | Via HandlerResponse.to (validated) |
| Response/no response | Return value                       |
| Self-iteration       | Call own name                      |

What Handlers Cannot Do

  • Forge sender identity
  • Access other threads
  • Discover topology
  • Route to undeclared peers
  • Modify message history
  • Access other handlers' state

Multiprocess Architecture

For CPU-bound handlers:

┌─────────────────────────────────────────────────────────────────┐
│  Main Process (StreamPump)                                      │
│  - Ingress pipeline                                             │
│  - Routing decisions                                            │
│  - Response re-injection                                        │
└───────────────────────────┬─────────────────────────────────────┘
                            │ UUID + handler_path (minimal IPC)
              ┌─────────────┼─────────────┐
              ▼             ▼             ▼
┌─────────────────┐ ┌─────────────┐ ┌─────────────────┐
│ Python Async    │ │ ProcessPool │ │ (Future: WASM)  │
│ (main process)  │ │ (N workers) │ │                 │
│ - Default mode  │ │ - cpu_bound │ │                 │
└────────┬────────┘ └──────┬──────┘ └────────┬────────┘
         │                 │                  │
         └─────────────────┼──────────────────┘
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│  Shared Backend (Redis / Manager / Memory)                      │
│  - Context buffer slots                                         │
│  - Thread registry mappings                                     │
└─────────────────────────────────────────────────────────────────┘

See Shared Backend for details.
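
The "UUID + handler_path" hand-off in the diagram can be sketched with `concurrent.futures` and `asyncio`. This is an assumption-heavy illustration: the `"module:attr"` path format, `run_by_path`, and `dispatch` are all hypothetical, and `builtins:len` merely stands in for a real CPU-bound handler. A real worker would presumably load the payload from the shared backend using the UUID rather than receive it over IPC.

```python
import asyncio
import concurrent.futures
import importlib
import uuid


def run_by_path(handler_path: str, thread_id: str):
    """Worker entry point: import the handler by "module:attr" path and call it."""
    module_name, _, attr = handler_path.partition(":")
    handler = getattr(importlib.import_module(module_name), attr)
    # Only two short strings cross the process boundary.
    return handler(thread_id)


async def dispatch(executor: concurrent.futures.Executor,
                   handler_path: str, thread_id: str):
    """Run the handler in the given executor without blocking the event loop."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, run_by_path, handler_path, thread_id)


async def main() -> int:
    """Example wiring with a small process pool (call under a __main__ guard)."""
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as pool:
        return await dispatch(pool, "builtins:len", str(uuid.uuid4()))
```

Passing a path instead of a function object keeps the message small and avoids pickling handler code; the worker resolves the handler itself after import.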

See Also