# Why XML?

XML is the right format for a sovereign, attack-resistant message bus in a multi-agent system. JSON is not.
## The Short Answer

| Feature | XML | JSON |
|---------|-----|------|
| Schema validation | XSD (W3C standard, precise) | JSON Schema (optional, lossy) |
| Namespaces | Native support | None |
| Canonicalization | C14N standard | No widely adopted standard |
| Repair tolerance | lxml recover mode | Parser fails |
| Comments | Supported | Forbidden |
| Mixed content | Native | Fragile |
## JSON's Origins

JSON (JavaScript Object Notation) was invented in the early 2000s as a subset of JavaScript literal syntax for simple data exchange in web browsers. It was never designed as a general-purpose format, just a quick way to serialize objects for Ajax calls.

It became popular because:

- Simple for JavaScript developers
- Human-readable
- Web API boom (REST over SOAP)
- Low barrier to entry
## Why JSON Fails for Multi-Agent Systems

### No Schema Enforcement

JSON Schema exists but is:

- Optional (rarely enforced on the wire)
- Lossy (can't express all constraints)
- Inconsistently implemented

Result: messages are accepted without validation, and bugs are discovered at runtime.
### No Namespaces

Without namespaces, keys from different vocabularies collide, so they can't be mixed safely:

```jsonc
{
  "name": "Alice",  // User name? Product name?
  "type": "admin"   // User type? Message type?
}
```
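### No Canonicalization

No widely adopted way to normalize JSON for signing. RFC 8785 (JSON Canonicalization Scheme) exists, but it is an Informational RFC, and it is not what common serializers such as Python's `json` module produce by default:

```json
{"a": 1, "b": 2}
{"b": 2, "a": 1}
```

Same data? Different bytes. Can't sign reliably.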
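To see this concretely with Python's standard `json` module: the two objects compare equal once parsed, yet their default serializations differ, and so does any digest computed over those bytes. SHA-256 is only a stand-in here for whatever a real signing scheme would hash.

```python
import hashlib
import json

a = json.loads('{"a": 1, "b": 2}')
b = json.loads('{"b": 2, "a": 1}')

# Parsed, the two payloads are the same data...
same_data = a == b

# ...but serializing them back preserves key insertion order, so the bytes differ.
bytes_a = json.dumps(a).encode("utf-8")
bytes_b = json.dumps(b).encode("utf-8")

digest_a = hashlib.sha256(bytes_a).hexdigest()
digest_b = hashlib.sha256(bytes_b).hexdigest()

print(same_data)             # True
print(bytes_a == bytes_b)    # False
print(digest_a == digest_b)  # False
```

Options such as `sort_keys=True` fix this particular case, but only if every producer and verifier agrees on the same settings; that is a convention, not a shared canonical form.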
### No Repair Tolerance

One syntax error → entire payload rejected. A single trailing comma is enough:

```json
{"name": "Alice",}
```
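Python's standard `json` parser shows the all-or-nothing behavior: with one trailing comma, `json.loads` raises and hands back no partial data. A minimal sketch; `parse_or_none` is a hypothetical helper name, not part of any library.

```python
import json

def parse_or_none(text):
    """Return the parsed object, or None if the parser rejects the text."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # The whole payload is lost; only an error message and position survive.
        print(f"rejected: {err.msg} at char {err.pos}")
        return None

print(parse_or_none('{"name": "Alice"}'))   # {'name': 'Alice'}
print(parse_or_none('{"name": "Alice",}'))  # prints "rejected: ..." and then None
```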
### Escaping Hell

Strings with special characters are fragile; every embedded quote must be escaped:

```json
{"message": "She said \"hello\""}
```

Easy to break when payloads are built by hand, and a security vulnerability vector.
## Why JSON Fails for LLM Integration

### Hallucination Fragility

LLMs routinely produce invalid JSON:

- Trailing commas
- Missing quotes
- Wrong nesting
- Comments (forbidden!)

Result: Massive prompt bloat ("You MUST output valid JSON, NO trailing commas EVER...") and post-processing parsers.
### No Graceful Degradation

One parse error → entire response lost. No partial recovery.
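### Injection Attacks

When a payload is assembled by string concatenation, user input can break out of its string and change the JSON structure:

```json
{"user_input": "Alice", "role": "admin"}
```

If the template is `{"user_input": "<name>"}` and a user enters `Alice", "role": "admin` as their name, the result above is what gets parsed: an injected `role` field.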
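A minimal sketch of the attack with Python's `json` module, assuming a hypothetical template function that splices the name into the payload as text. For contrast, building the same payload with `json.dumps` escapes the embedded quotes, so the input stays inside its string.

```python
import json

def build_payload_unsafely(name):
    # Hypothetical template: user input is pasted straight into the JSON text.
    return '{"user_input": "' + name + '"}'

malicious_name = 'Alice", "role": "admin'

injected = json.loads(build_payload_unsafely(malicious_name))
print(injected)  # {'user_input': 'Alice', 'role': 'admin'}

# For contrast: a real serializer escapes the quotes, so no new field appears.
escaped = json.loads(json.dumps({"user_input": malicious_name}))
print(escaped)   # {'user_input': 'Alice", "role": "admin'}
```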
## Why XML Succeeds

### Schema as Contract

XSD enforces exact structure on the wire:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="greeting">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Every message is validated before processing. No ambiguity.
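One way to enforce that contract in Python, assuming lxml is installed: compile the schema once with `etree.XMLSchema`, then check each incoming message before any handler sees it. `accept_message` is a hypothetical name, not part of lxml or of this project's API.

```python
from lxml import etree

XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="greeting">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# Compile the schema once and reuse it for every message.
SCHEMA = etree.XMLSchema(etree.fromstring(XSD))

def accept_message(raw: bytes) -> bool:
    """Return True only if the message is well-formed and matches the schema."""
    try:
        doc = etree.fromstring(raw)
    except etree.XMLSyntaxError:
        return False
    return SCHEMA.validate(doc)

print(accept_message(b"<greeting><name>Alice</name></greeting>"))  # True
print(accept_message(b"<greeting/>"))                               # False: 'name' missing
print(accept_message(b"<greeting><age>3</age></greeting>"))         # False: unexpected element
```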
### Namespaces

Safe vocabulary mixing:

```xml
<message xmlns="https://xml-pipeline.org/ns/envelope/v1">
  <user:profile xmlns:user="https://example.org/user">
    <user:name>Alice</user:name>
  </user:profile>
</message>
```
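Namespace-aware parsers resolve each prefix to its URI, so a consumer asks for `user:name` by namespace rather than by bare tag name. A sketch using Python's built-in `xml.etree.ElementTree`; the `NS` prefix mapping is chosen by the reader, not by the document, which is why `u` works even though the message says `user`.

```python
import xml.etree.ElementTree as ET

message = """<message xmlns="https://xml-pipeline.org/ns/envelope/v1">
  <user:profile xmlns:user="https://example.org/user">
    <user:name>Alice</user:name>
  </user:profile>
</message>"""

root = ET.fromstring(message)

# ElementTree stores qualified names in {namespace-uri}local form.
print(root.tag)  # {https://xml-pipeline.org/ns/envelope/v1}message

# The consumer picks its own prefix; only the URI has to match.
NS = {"u": "https://example.org/user"}
name = root.find("u:profile/u:name", NS)
print(name.text)  # Alice

# A bare lookup does not match, because the elements are namespaced.
print(root.find("profile/name"))  # None
```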
### Canonicalization (C14N)

Deterministic representation for signing:

```python
from lxml import etree
tree = etree.fromstring(b"<greeting><name>Alice</name></greeting>")
c14n_bytes = etree.tostring(tree, method="c14n")  # canonical serialization
signature = sign(c14n_bytes)  # sign(): placeholder for your signing routine
```

Same logical content → same bytes → verifiable signatures.
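The effect can be checked end to end. This sketch swaps in the standard library's `xml.etree.ElementTree.canonicalize` (which implements C14N 2.0) so it runs without lxml, and uses HMAC-SHA256 with a made-up shared key as the signature; a production bus might well use asymmetric signatures instead. Two serializations that differ only in attribute order, quoting, and whitespace inside the tag canonicalize to identical bytes, so their signatures match.

```python
import hashlib
import hmac
import xml.etree.ElementTree as ET

KEY = b"demo-shared-secret"  # hypothetical key, for illustration only

def canonical_bytes(xml_text: str) -> bytes:
    # C14N sorts attributes, normalizes quotes and whitespace inside tags, and
    # expands empty elements, so equivalent documents serialize identically.
    return ET.canonicalize(xml_data=xml_text).encode("utf-8")

def sign(xml_text: str) -> str:
    return hmac.new(KEY, canonical_bytes(xml_text), hashlib.sha256).hexdigest()

doc_a = '<greeting b="2" a="1"><name>Alice</name></greeting>'
doc_b = "<greeting a='1'   b='2'><name>Alice</name></greeting>"

print(doc_a == doc_b)                                 # False: different raw text
print(canonical_bytes(doc_a))                         # b'<greeting a="1" b="2"><name>Alice</name></greeting>'
print(hmac.compare_digest(sign(doc_a), sign(doc_b)))  # True
```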
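### Repair Tolerance

lxml's recover mode fixes common issues:

```python
from lxml import etree

parser = etree.XMLParser(recover=True)  # repair malformed input instead of raising
tree = etree.fromstring(broken_xml, parser)  # broken_xml: the raw payload as bytes
```

Partial documents, encoding issues, missing tags → usually recovered. Recovery is best-effort, so validate the result before trusting it.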
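What recovery buys in practice, assuming lxml is installed: a truncated message that the default strict parser rejects still yields a usable tree in recover mode. The truncated input below is a made-up example.

```python
from lxml import etree

truncated = b"<greeting><name>Alice</name>"  # closing </greeting> never arrived

# The default parser is strict: it raises on the malformed input.
try:
    etree.fromstring(truncated)
    strict_ok = True
except etree.XMLSyntaxError:
    strict_ok = False

# recover=True asks libxml2 to repair what it can instead of raising.
parser = etree.XMLParser(recover=True)
root = etree.fromstring(truncated, parser)

print(strict_ok)              # False
print(root.tag)               # greeting
print(root.findtext("name"))  # Alice
```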
### Self-Describing

Elements carry meaning:

```xml
<greeting>
  <name>Alice</name>
</greeting>
```

vs JSON, where a bare value says nothing about what it is:

```json
["Alice"]
```
## LLM + XML = Reliable

### Natural Streaming

XML streams naturally: a pull parser reports each element as soon as its end tag arrives, so processing can start before the document is complete.
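A sketch with the standard library's `xml.etree.ElementTree.XMLPullParser`, feeding a response in chunks the way tokens might arrive from an LLM. The `<name>` element is available as soon as its end tag has been fed, while `<greeting>` is still open. The chunk boundaries are made up for illustration and fall on tag boundaries.

```python
import xml.etree.ElementTree as ET

# Chunks as they might arrive from a streaming LLM response (made up).
chunks = ["<greeting>", "<name>Alice</name>", "</greeting>"]

parser = ET.XMLPullParser(events=("end",))
seen_after_chunk = []  # tags of elements completed after each chunk was fed

for chunk in chunks:
    parser.feed(chunk)
    # read_events() yields only elements the parser has fully seen so far.
    seen_after_chunk.append([elem.tag for _event, elem in parser.read_events()])

parser.close()

for chunk, tags in zip(chunks, seen_after_chunk):
    print(repr(chunk), "->", tags)
# '<greeting>' -> []
# '<name>Alice</name>' -> ['name']
# '</greeting>' -> ['greeting']
```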
### Repair on Output

LLM produces broken XML? lxml can usually repair it:

```python
from lxml import etree

parser = etree.XMLParser(recover=True)
tree = etree.fromstring(llm_output, parser)  # llm_output: the model's raw reply
# Works even with minor errors
```
### Schema-Guided Generation

The XSD tells the LLM exactly what to produce, here as a compact template derived from it:

```text
Generate XML matching this schema:
<greeting><name>string</name></greeting>
```

Clear contract, fewer hallucinations.
### Graceful Validation

Validation errors become helpful feedback:

```xml
<huh>
  <error>Element 'greeting' missing required element 'name'</error>
</huh>
```

The LLM can read this feedback and self-correct.
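A sketch of producing that kind of feedback, assuming lxml: validate, and on failure wrap each entry of the validator's `error_log` in an `<error>` element under `<huh>`. The messages carry libxml2's own wording, which differs from the paraphrase above; `validate_or_huh` is a hypothetical helper, not this project's API.

```python
from lxml import etree

SCHEMA = etree.XMLSchema(etree.fromstring(b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="greeting">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""))

def validate_or_huh(raw: bytes):
    """Return (True, None) for a valid message, else (False, <huh> element)."""
    doc = etree.fromstring(raw)
    if SCHEMA.validate(doc):
        return True, None
    huh = etree.Element("huh")
    for entry in SCHEMA.error_log:
        # Each log entry carries libxml2's message; pass it back verbatim.
        etree.SubElement(huh, "error").text = entry.message
    return False, huh

ok, feedback = validate_or_huh(b"<greeting/>")
print(ok)  # False
print(etree.tostring(feedback, pretty_print=True).decode())
```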
## The Trade-Offs

### XML is More Verbose

```xml
<greeting><name>Alice</name></greeting>
```

vs

```json
{"name": "Alice"}
```

**But:** Compression largely offsets this on the wire, since repeated tag names compress well. And verbosity aids debugging.
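A rough way to check the compression claim with the standard library's `zlib`, on a made-up batch of 1,000 greeting messages with distinct names. It compares the XML-to-JSON size ratio before and after compression rather than quoting byte counts, since those depend on the data and the compression level.

```python
import json
import zlib

names = [f"user{i}" for i in range(1000)]  # made-up sample data

xml_batch = "\n".join(f"<greeting><name>{n}</name></greeting>" for n in names).encode()
json_batch = "\n".join(json.dumps({"name": n}) for n in names).encode()

# Size of XML relative to JSON, uncompressed and after deflate at level 9.
raw_ratio = len(xml_batch) / len(json_batch)
packed_ratio = len(zlib.compress(xml_batch, 9)) / len(zlib.compress(json_batch, 9))

print(f"raw:        XML is {raw_ratio:.2f}x the size of JSON")
print(f"compressed: XML is {packed_ratio:.2f}x the size of JSON")
```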
### XML Parsing is Slower

Typically microseconds more per message than JSON parsing.

**But:** Network latency usually dominates. And lxml is highly optimized.

### XML is "Old"

True. Also mature, battle-tested, standards-based.
## Conclusion

JSON won the web because it was "good enough" for stateless HTTP requests.

XML wins for multi-agent systems because:

- Security requires schema enforcement
- Signing requires canonicalization
- LLMs require repair tolerance
- Complexity requires namespaces

**JSON won the web. XML wins the swarm.**
## Further Reading

- [W3C XML Schema](https://www.w3.org/XML/Schema)
- [Exclusive XML Canonicalization](https://www.w3.org/TR/xml-exc-c14n/)
- [lxml Documentation](https://lxml.de/)