# Parallelism by Topology
**Status:** Architectural Concept
**Author:** Dan & Donna
**Date:** 2026-01-26
## Overview
In xml-pipeline, **parallelism is a wiring decision, not a configuration option**.
Unlike traditional workflow tools where you toggle "parallel execution" checkboxes or set concurrency limits in config files, xml-pipeline uses the flow topology itself to determine whether work is processed sequentially or in parallel.
The key insight: **a buffer node acts as a parallelism primitive**.
## The Problem with Traditional Approaches
Most workflow tools treat parallelism as an afterthought:
```yaml
# n8n / Zapier style
node:
  type: process_video
  concurrency: 3   # Magic number: what if you want 1 sometimes and 10 other times?
  batch_size: 5    # More config to maintain
```
This leads to:
- Config sprawl
- Hidden behavior (why did this run in parallel?)
- Hard-to-debug race conditions
- One-size-fits-all concurrency that doesn't adapt to workload
## The xml-pipeline Way
Parallelism emerges from how you wire your flow.
### Sequential Execution (Direct Wiring)
```
┌──────────┐      ┌──────────┐      ┌──────────┐
│ Trigger  │─────▶│ Process  │─────▶│ Output   │
│ (files)  │      │ Video    │      │          │
└──────────┘      └──────────┘      └──────────┘
```
When files are dropped:
1. Trigger fires for File 1 → Process Video receives it
2. Trigger fires for File 2 → Process Video is busy, message queues
3. Trigger fires for File 3 → Also queues
4. Files process **one at a time**, in order
**Use when:** Order matters, resources are limited, or downstream systems can't handle concurrency.
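To make the queueing concrete, here is a minimal asyncio sketch of the direct-wiring semantics. It is an illustration only, not xml-pipeline's actual runtime; `process_video` and the queue are stand-ins:
```python
import asyncio

async def process_video(msg: str) -> None:
    await asyncio.sleep(0.1)  # stand-in for real work
    print(f"processed {msg}")

async def node_worker(inbox: asyncio.Queue) -> None:
    # A directly wired node is a single consumer: one message at a time, in arrival order.
    while True:
        msg = await inbox.get()
        await process_video(msg)
        inbox.task_done()

async def main() -> None:
    inbox: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(node_worker(inbox))
    for f in ["file1.mp4", "file2.mp4", "file3.mp4"]:
        inbox.put_nowait(f)  # trigger fires; messages queue behind the busy node
    await inbox.join()       # files complete strictly in order
    worker.cancel()

asyncio.run(main())
```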
### Parallel Execution (Buffer Wiring)
```
┌──────────┐      ┌──────────┐      ┌──────────┐      ┌──────────┐
│ Trigger  │─────▶│ Buffer   │─────▶│ Process  │─────▶│ Output   │
│ (files)  │      │          │      │ Video    │      │          │
└──────────┘      └──────────┘      └──────────┘      └──────────┘
```
When files are dropped:
1. Trigger fires for File 1 → Buffer spawns **Thread A**, forwards to Process Video
2. Trigger fires for File 2 → Buffer spawns **Thread B**, forwards to Process Video
3. Trigger fires for File 3 → Buffer spawns **Thread C**, forwards to Process Video
4. Three independent threads are now executing in parallel!
**Use when:** Tasks are independent, you want maximum throughput, or tasks have variable duration.
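The same workload through a buffer, again as an illustrative asyncio sketch (the real implementation uses worker processes, per "Multiprocess Nodes" below; the names here are hypothetical):
```python
import asyncio
import uuid

async def process_video(msg: str, thread_id: str) -> None:
    await asyncio.sleep(0.1)  # stand-in for real work
    print(f"[{thread_id[:8]}] processed {msg}")

def buffer_node(msg: str) -> asyncio.Task:
    # The buffer mints a fresh thread UUID and forwards without waiting.
    thread_id = str(uuid.uuid4())
    return asyncio.create_task(process_video(msg, thread_id))

async def main() -> None:
    # Each trigger firing becomes an independent, concurrently running thread.
    tasks = [buffer_node(f) for f in ["file1.mp4", "file2.mp4", "file3.mp4"]]
    await asyncio.gather(*tasks)

asyncio.run(main())
```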
## Why This Works
### Thread Isolation
Each thread gets (a code sketch follows this list):
- **Unique UUID** — Opaque identifier, can't be guessed or forged
- **Independent call chain** — `trigger.buffer.process` per thread
- **Isolated context** — Thread A can't see Thread B's state
- **Separate lifecycle** — Thread A completing doesn't affect Thread B
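An illustrative model of that per-thread state, with hypothetical field names (not the actual registry types):
```python
import uuid
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ThreadContext:
    # Opaque identity: random UUIDs can't be guessed or forged.
    thread_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Each thread carries its own call chain...
    call_chain: str = "trigger.buffer.process"
    # ...and its own state; nothing is shared between instances.
    state: dict = field(default_factory=dict)

a, b = ThreadContext(), ThreadContext()
assert a.thread_id != b.thread_id  # independent identity and lifecycle
assert a.state is not b.state      # isolated context
```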
### Multiprocess Nodes
Every node runs in its own process (a rough sketch follows this list). When multiple threads arrive:
- Node spawns worker processes as needed
- Each worker handles one thread
- WASM sandboxing ensures isolation
- No shared state, no race conditions
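A rough sketch of the process-per-thread shape using only the standard library; the WASM sandboxing layer is beyond what a few lines can show:
```python
from multiprocessing import Process

def worker(thread_id: str, payload: str) -> None:
    # Runs in its own OS process: separate memory space, no shared state.
    print(f"worker {thread_id} handling {payload}")

if __name__ == "__main__":
    threads = [("uuid-2", "file1.mp4"), ("uuid-3", "file2.mp4"), ("uuid-4", "file3.mp4")]
    procs = [Process(target=worker, args=t) for t in threads]
    for p in procs:
        p.start()  # one worker per incoming thread
    for p in procs:
        p.join()   # joining here is just so the demo exits cleanly
```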
### Natural Load Balancing
The buffer doesn't just fan out — it **decouples** the trigger from processing:
- Fast tasks (2-minute shorts) complete quickly
- Slow tasks (2-hour features) chug along
- No artificial batching or waiting
- System naturally adapts to workload
## Real-World Example: Video Processing Pipeline
Imagine a content pipeline that receives video requirements:
```
Requirements folder watched by trigger:
├── short_ad_1.json (30 sec video, ~2 min to process)
├── short_ad_2.json (30 sec video, ~2 min to process)
└── feature_film.json (2 hour video, ~4 hours to process)
```
### Without Buffer (Sequential)
```
Timeline:
├─ 0:00 Start short_ad_1
├─ 0:02 Finish short_ad_1, start short_ad_2
├─ 0:04 Finish short_ad_2, start feature_film
└─ 4:04 Finish feature_film
Total time: 4 hours 4 minutes
Shorts delivered: After 2-4 minutes
```
### With Buffer (Parallel)
```
Timeline:
├─ 0:00 Start short_ad_1 (Thread A)
├─ 0:00 Start short_ad_2 (Thread B)
├─ 0:00 Start feature_film (Thread C)
├─ 0:02 Finish short_ad_1 ✓
├─ 0:02 Finish short_ad_2 ✓
└─ 4:00 Finish feature_film ✓
Total time: 4 hours (wall clock, same as longest task)
Shorts delivered: After 2 minutes!
```
The shorts are ready in **2 minutes** instead of waiting behind the feature film.
## Advanced Patterns
### Controlled Parallelism
Want parallelism but with limits? Use a **throttled buffer**:
```
┌──────────┐      ┌──────────────┐      ┌──────────┐
│ Trigger  │─────▶│ Buffer       │─────▶│ Process  │
│          │      │ (max_threads │      │          │
│          │      │  = 3)        │      │          │
└──────────┘      └──────────────┘      └──────────┘
```
Buffer spawns at most 3 threads. Additional items queue until a slot opens.
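One way to model a throttled buffer is a semaphore around the fan-out. A sketch under that assumption (`max_threads` is mirrored by the semaphore size):
```python
import asyncio

async def process(item: str) -> None:
    await asyncio.sleep(0.1)  # stand-in for real work
    print("done:", item)

async def throttled_buffer(item: str, slots: asyncio.Semaphore) -> None:
    # At most max_threads items are in flight; the rest wait for a free slot.
    async with slots:
        await process(item)

async def main() -> None:
    slots = asyncio.Semaphore(3)  # max_threads = 3
    items = [f"file{i}.mp4" for i in range(10)]
    await asyncio.gather(*(throttled_buffer(i, slots) for i in items))

asyncio.run(main())
```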
### Fan-Out / Fan-In
Process items in parallel, then aggregate results:
```
                       ┌──────────┐
                  ┌──▶│ Worker A │──┐
┌────────┐    ┌────┴─┐ └──────────┘  │   ┌───────────┐    ┌────────┐
│Trigger │───▶│Buffer│               ├──▶│ Collector │───▶│ Output │
└────────┘    └────┬─┘ ┌──────────┐  │   └───────────┘    └────────┘
                  └──▶│ Worker B │──┘
                       └──────────┘
```
Collector waits for all threads in its context to complete, then aggregates.
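In plain asyncio terms, the collector is a gather point. A minimal sketch (worker names and payloads are hypothetical):
```python
import asyncio

async def worker(name: str, item: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for real work
    return f"{name}:{item}"

async def collector(results: list[str]) -> None:
    # Fires only once every thread in its context has completed.
    print("aggregated:", results)

async def main() -> None:
    # The buffer fans the item out to both workers in parallel...
    results = await asyncio.gather(
        worker("A", "file1.mp4"),
        worker("B", "file1.mp4"),
    )
    # ...and the collector fans the results back in.
    await collector(results)

asyncio.run(main())
```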
### Conditional Parallelism
Use a **router** before the buffer to decide:
```
┌──────────┐      ┌──────────┐      ┌──────────┐      ┌──────────┐
│ Trigger  │─────▶│ Router   │─────▶│ Buffer   │─────▶│ Process  │
│          │      │ (if big  │      │          │      │          │
│          │      │  file)   │      └──────────┘      │          │
│          │      │          │                        │          │
│          │      │ (else)   │───────────────────────▶│          │
└──────────┘      └──────────┘                        └──────────┘
```
Small files go direct (sequential among themselves), large files get buffered (parallel).
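A sketch of the routing decision with a hypothetical size threshold: small files share one sequential worker, while each big file gets its own task:
```python
import asyncio

BIG_FILE_BYTES = 100_000_000  # hypothetical threshold for "big"

async def process(item: dict) -> None:
    await asyncio.sleep(0.1)  # stand-in for real work
    print("processed", item["name"])

async def sequential_worker(inbox: asyncio.Queue) -> None:
    # The direct wire: small files drain one at a time, in order.
    while True:
        item = await inbox.get()
        await process(item)
        inbox.task_done()

async def main() -> None:
    inbox: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(sequential_worker(inbox))
    big_tasks = []
    for item in [{"name": "short.mp4", "size": 10_000},
                 {"name": "feature.mp4", "size": 900_000_000}]:
        if item["size"] > BIG_FILE_BYTES:
            big_tasks.append(asyncio.create_task(process(item)))  # buffered: parallel
        else:
            await inbox.put(item)  # direct: sequential
    await inbox.join()
    await asyncio.gather(*big_tasks)
    worker.cancel()

asyncio.run(main())
```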
## Implementation Notes
### Buffer Node Contract
A buffer node (see the sketch after this list):
1. Accepts incoming messages
2. For each message, generates a **new thread UUID**
3. Forwards message with new thread to downstream node
4. Does NOT wait for downstream completion (fire-and-forget)
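The contract above as a minimal Python sketch, assuming a queue-based downstream and a dict-shaped message (both hypothetical):
```python
import asyncio
import uuid

async def buffer_node(message: dict, downstream: asyncio.Queue) -> None:
    # 1. Accept the incoming message.
    # 2. Generate a brand-new thread UUID for it.
    spawned = {**message, "thread_id": str(uuid.uuid4())}
    # 3. Forward it downstream under the new thread.
    await downstream.put(spawned)
    # 4. Return immediately: fire-and-forget, no waiting on downstream completion.
```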
### Thread Registry Behavior
When buffer spawns a new thread:
```
Incoming thread:  trigger.buffer          (uuid-1)
Buffer creates:   trigger.buffer.process  (uuid-2) ─▶ Thread A
                  trigger.buffer.process  (uuid-3) ─▶ Thread B
                  trigger.buffer.process  (uuid-4) ─▶ Thread C
```
Each UUID is independent. Responses from Process don't need to reconverge unless explicitly routed to a collector.
### No Hidden Magic
The behavior is **entirely determined by the visual flow**:
- See a direct wire? Sequential.
- See a buffer? Parallel.
- No config files to check.
- No runtime surprises.
## Comparison with Other Tools
| Feature | n8n/Zapier | Temporal | xml-pipeline |
|---------|------------|----------|--------------|
| Parallelism control | Config flags | Code annotations | **Topology** |
| Visibility | Hidden in settings | Hidden in code | **Visual in canvas** |
| Flexibility | Fixed at deploy | Fixed at deploy | **Changeable by rewiring** |
| Learning curve | Read docs | Read code | **Look at flow** |
## Summary
> "If you want sequential, wire direct. If you want parallel, add a buffer."
This single principle replaces pages of concurrency documentation. Users learn it once, apply it everywhere, and can see their concurrency decisions directly in the flow canvas.
---
*Parallelism should be obvious, not hidden.*