Comprehensive documentation set for XWiki: - Home, Installation, Quick Start guides - Writing Handlers and LLM Router guides - Architecture docs (Overview, Message Pump, Thread Registry, Shared Backend) - Reference docs (Configuration, Handler Contract, CLI) - Hello World tutorial - Why XML rationale - Pandoc conversion scripts (bash + PowerShell) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Why XML?
XML is the right format for a sovereign, attack-resistant message bus in a multi-agent system. JSON is not.
The Short Answer
| Feature | XML | JSON |
|---|---|---|
| Schema validation | XSD (W3C standard, precise) | JSON Schema (optional, lossy) |
| Namespaces | Native support | None |
| Canonicalization | C14N (W3C standard) | No widely adopted standard |
| Repair tolerance | lxml recover mode | Parser fails |
| Comments | Supported | Forbidden |
| Mixed content | Native | Fragile |
JSON's Origins
JSON (JavaScript Object Notation) was specified in the early 2000s as a subset of JavaScript literal syntax for simple data exchange in web browsers. It was never designed as a general-purpose format, just a quick way to serialize objects for Ajax calls.
It became popular because:
- Simple for JavaScript developers
- Human-readable
- Web API boom (REST over SOAP)
- Low barrier to entry
Why JSON Fails for Multi-Agent Systems
No Schema Enforcement
JSON Schema exists but is:
- Optional (rarely enforced on wire)
- Lossy (can't express all constraints)
- Inconsistently implemented
Result: Messages accepted without validation, bugs discovered at runtime.
No Namespaces
Can't safely mix vocabularies:
{
"name": "Alice", // User name? Product name?
"type": "admin" // User type? Message type?
}
No Canonicalization
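No widely adopted way to normalize for signing (RFC 8785, the JSON Canonicalization Scheme, is an informational RFC, not something serializers apply by default):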
{"a": 1, "b": 2}
{"b": 2, "a": 1}
Same data? Different bytes. Can't sign reliably.
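A minimal sketch of the problem, using only the Python standard library. The `digest` helper is a stand-in for whatever a real signing step would hash; it is an assumption for illustration, not part of any XWiki API.

```python
import hashlib
import json

# Two serializations of the same logical object, differing only in key order.
doc_a = '{"a": 1, "b": 2}'
doc_b = '{"b": 2, "a": 1}'

def digest(text: str) -> str:
    """SHA-256 over the raw bytes, which is what a signature would cover."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

same_data = json.loads(doc_a) == json.loads(doc_b)   # parsed values are equal
same_bytes = digest(doc_a) == digest(doc_b)          # hashed bytes are not
print(same_data, same_bytes)
```

The parsed objects compare equal while the digests differ, so a signature made over one serialization will not verify against the other.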
No Repair Tolerance
One syntax error → entire payload rejected:
{"name": "Alice",} // Trailing comma → FAIL
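As a quick check of that behavior, here is a sketch with Python's standard `json` module, which is a strict parser. The `try_parse` helper is hypothetical and exists only to make the two outcomes easy to compare.

```python
import json

def try_parse(text: str):
    """Return the parsed object, or None if the strict parser rejects the text."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None

valid = try_parse('{"name": "Alice"}')
trailing = try_parse('{"name": "Alice",}')   # one stray comma
print(valid, trailing)
```

The second call yields nothing at all: the parser does not hand back the part of the payload that was well formed.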
Escaping Hell
Strings with special characters are fragile:
{"message": "She said \"hello\""} // Manual escaping
Easy to break, and a common source of security vulnerabilities.
Why JSON Fails for LLM Integration
Hallucination Fragility
LLMs routinely produce invalid JSON:
- Trailing commas
- Missing quotes
- Wrong nesting
- Comments (forbidden!)
Result: Massive prompt bloat ("You MUST output valid JSON, NO trailing commas EVER...") and post-processing parsers.
No Graceful Degradation
One parse error → entire response lost. No partial recovery.
Injection Attacks
User input in strings can break JSON structure:
{"user_input": "Alice", "role": "admin"}
If the payload is assembled by string concatenation and the user supplies Alice", "role": "admin as their name → injection.
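The sketch below reproduces that case with the standard `json` module. Both builder functions are hypothetical: `build_naive` stands for a handler that assembles JSON text by hand, and `build_safe` shows the same payload produced by a serializer for comparison.

```python
import json

def build_naive(name: str) -> str:
    """Hypothetical handler that assembles JSON by string concatenation."""
    return '{"user_input": "' + name + '"}'

def build_safe(name: str) -> str:
    """Same payload produced by a serializer, which escapes the quotes."""
    return json.dumps({"user_input": name})

# Attacker-controlled "name" that closes the string and adds a key.
hostile = 'Alice", "role": "admin'

naive = json.loads(build_naive(hostile))
safe = json.loads(build_safe(hostile))
print(naive)
print(safe)
```

In the naive case the parsed object gains a `role` key the handler never intended to emit; in the serialized case the hostile text stays inside the `user_input` string.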
Why XML Succeeds
Schema as Contract
XSD enforces exact structure on the wire:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="greeting">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
Every message validated before processing. No ambiguity.
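A minimal sketch of that validation step with lxml, assuming the greeting schema is available as a complete `xs:schema` document. The `is_valid` helper and the sample payloads are illustrative, not the project's actual handler code.

```python
from lxml import etree

# The greeting schema from the text, as a complete xs:schema document.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="greeting">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(XSD))

def is_valid(payload: bytes) -> bool:
    """Parse the payload and check it against the compiled schema."""
    return schema.validate(etree.fromstring(payload))

ok = is_valid(b"<greeting><name>Alice</name></greeting>")
missing = is_valid(b"<greeting/>")  # required <name> is absent
extra = is_valid(b"<greeting><name>Alice</name><role>admin</role></greeting>")
print(ok, missing, extra)
```

Only the first payload passes. After a failed `validate` call, `schema.error_log` holds the reasons, which is useful when the sender needs feedback.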
Namespaces
Safe vocabulary mixing:
<message xmlns="https://xml-pipeline.org/ns/envelope/v1">
  <user:profile xmlns:user="https://example.org/user">
    <user:name>Alice</user:name>
  </user:profile>
</message>
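To show how a consumer tells the two vocabularies apart, here is a small sketch that reads the document above. It uses the standard library's `xml.etree.ElementTree` rather than lxml so that it runs without extra dependencies. The prefixes in `NS` are arbitrary local choices.

```python
import xml.etree.ElementTree as ET

DOC = """<message xmlns="https://xml-pipeline.org/ns/envelope/v1">
  <user:profile xmlns:user="https://example.org/user">
    <user:name>Alice</user:name>
  </user:profile>
</message>"""

# Prefixes here are local to this lookup; only the URIs have to match the document.
NS = {
    "env": "https://xml-pipeline.org/ns/envelope/v1",
    "u": "https://example.org/user",
}

root = ET.fromstring(DOC)
user_name = root.findtext("u:profile/u:name", namespaces=NS)

# An unqualified lookup does not match the namespaced element.
unqualified = root.find("profile")
print(root.tag, user_name, unqualified)
```

Element identity is the pair of namespace URI and local name, so a `name` element from another vocabulary cannot be mistaken for `user:name`.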
Canonicalization (C14N)
Deterministic representation for signing:
from lxml import etree
# tree is a parsed lxml document; sign() stands in for your signing routine
c14n_bytes = etree.tostring(tree, method='c14n')
signature = sign(c14n_bytes)
Same logical content → same bytes → verifiable signatures.
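The following sketch illustrates that property. As an assumption for portability it uses the standard library's `xml.etree.ElementTree.canonicalize` (C14N 2.0, Python 3.8+) instead of the lxml call; the `canonical_digest` helper is hypothetical and hashes where a real system would sign.

```python
import hashlib
import xml.etree.ElementTree as ET

# Same logical document: different attribute order, quoting, and tag spacing.
doc_a = '<greeting lang="en" id="1"><name>Alice</name></greeting>'
doc_b = "<greeting id='1'   lang='en'><name>Alice</name></greeting >"

def canonical_digest(xml_text: str) -> str:
    """Canonicalize first, then hash the canonical bytes."""
    canonical = ET.canonicalize(xml_text)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

raw_equal = doc_a == doc_b
canonical_equal = canonical_digest(doc_a) == canonical_digest(doc_b)
print(raw_equal, canonical_equal)
```

Canonicalization normalizes serialization details such as attribute order and quote style. Whitespace inside element content is still treated as data, so it must match on both sides.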
Repair Tolerance
lxml recover mode fixes common issues:
from lxml import etree
parser = etree.XMLParser(recover=True)
tree = etree.fromstring(broken_xml, parser)
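Unclosed tags, truncated documents, and some encoding problems → recovered on a best-effort basis. Validate the recovered tree against the schema before trusting it.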
Self-Describing
Elements carry meaning:
<greeting>
  <name>Alice</name>
</greeting>
vs JSON:
["Alice"] // What is this?
LLM + XML = Reliable
Natural Streaming
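XML can be parsed incrementally: event-based parsers emit each element as soon as its closing tag arrives, so a handler can start work before the document is complete.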
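A minimal sketch of incremental parsing, using the standard library's `XMLPullParser` (lxml offers a similar interface). The chunk boundaries and the `batch` wrapper element are made up for illustration; they stand in for data arriving from a socket or a token stream.

```python
import xml.etree.ElementTree as ET

# Chunks as they might arrive from a socket or a token stream (hypothetical data).
chunks = [
    "<batch><greeting><na",
    "me>Alice</name></greeting><greeting>",
    "<name>Bob</name>",
    "</greeting></batch>",
]

parser = ET.XMLPullParser(events=("end",))
seen = []  # (chunk index, name) pairs, recorded as soon as each greeting closes

for index, chunk in enumerate(chunks):
    parser.feed(chunk)
    for _event, element in parser.read_events():
        if element.tag == "greeting":
            seen.append((index, element.findtext("name")))

parser.close()
print(seen)
```

The first greeting is handed to the loop before the final chunk has been fed, which is the property a streaming handler relies on.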
Repair on Output
LLM produces broken XML? lxml can often repair it:
from lxml import etree
parser = etree.XMLParser(recover=True)
tree = etree.fromstring(llm_output, parser)
# Works even with minor errors
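For a concrete case, the sketch below feeds recover mode a document whose `<name>` element is never closed. The `parse_lenient` helper and the sample `llm_output` string are hypothetical, and the exact repairs depend on the libxml2 version behind lxml.

```python
from lxml import etree

def parse_lenient(text: str):
    """Parse with lxml's recover mode (best effort, not a guarantee)."""
    parser = etree.XMLParser(recover=True)
    return etree.fromstring(text.encode("utf-8"), parser)

# Hypothetical model output whose <name> element is never closed.
llm_output = "<greeting><name>Alice</greeting>"

tree = parse_lenient(llm_output)
repaired = etree.tostring(tree).decode("utf-8")
print(repaired)
```

Recovery only makes the document parseable. Whether the repaired tree is acceptable is still a question for schema validation.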
Schema-Guided Generation
XSD tells LLM exactly what to produce:
Generate XML matching this schema:
<greeting><name>string</name></greeting>
Clear contract, fewer hallucinations.
Graceful Validation
Validation errors become helpful feedback:
<huh>
  <error>Element 'greeting' missing required element 'name'</error>
</huh>
LLM can self-correct.
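One way such a reply could be assembled is sketched here with the standard library. The `build_huh` function is hypothetical; it assumes the validator's messages are already available as plain strings and that the reply has the `<huh>`/`<error>` shape shown above.

```python
import xml.etree.ElementTree as ET

def build_huh(errors: list[str]) -> str:
    """Wrap validator messages in a <huh> element, one <error> child per message."""
    huh = ET.Element("huh")
    for message in errors:
        # Assigning .text lets the serializer escape markup characters.
        ET.SubElement(huh, "error").text = message
    return ET.tostring(huh, encoding="unicode")

reply = build_huh(["Element 'greeting' missing required element 'name'"])
escaped = build_huh(["expected <name> before </greeting>"])
print(reply)
print(escaped)
```

Building the reply through an element tree rather than string formatting keeps the feedback well formed even when an error message itself contains angle brackets.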
The Trade-Offs
XML is More Verbose
<greeting><name>Alice</name></greeting>
vs
{"name": "Alice"}
But: Compression removes most of this overhead on the wire. And verbosity aids debugging.
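A rough way to check the size claim on your own traffic is sketched below with the standard `zlib` module. The payload of 200 made-up agent names is an assumption; real ratios depend on message content and on the compressor in use.

```python
import json
import zlib

names = [f"agent-{i}" for i in range(200)]  # hypothetical payload

xml_doc = "".join(f"<greeting><name>{n}</name></greeting>" for n in names)
json_doc = json.dumps([{"name": n} for n in names])

def sizes(text: str) -> tuple[int, int]:
    """Return (raw byte count, zlib-compressed byte count)."""
    raw = text.encode("utf-8")
    return len(raw), len(zlib.compress(raw))

xml_raw, xml_packed = sizes(xml_doc)
json_raw, json_packed = sizes(json_doc)
print("XML :", xml_raw, "->", xml_packed)
print("JSON:", json_raw, "->", json_packed)
```

Repeated tag names are exactly the kind of redundancy a dictionary-based compressor removes, so the gap between the two formats shrinks considerably once both are compressed.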
XML Parsing is Slower
Typically microseconds more per small message than JSON parsing.
But: Network latency dominates. And lxml is a binding over the C library libxml2, which is highly optimized.
XML is "Old"
True. Also mature, battle-tested, standards-based.
Conclusion
JSON won the web because it was "good enough" for stateless HTTP requests.
XML wins for multi-agent systems because:
- Security requires schema enforcement
- Signing requires canonicalization
- LLMs require repair tolerance
- Complexity requires namespaces
JSON won the web. XML wins the swarm.