# Why XML?
XML is the right format for a sovereign, attack-resistant message bus in a multi-agent system. JSON is not.
## The Short Answer
| Feature | XML | JSON |
|---------|-----|------|
| Schema validation | XSD (W3C standard, precise) | JSON Schema (optional, lossy) |
| Namespaces | Native support | None |
| Canonicalization | C14N standard | No widely adopted standard (RFC 8785 exists) |
| Repair tolerance | lxml recover mode | Parser fails |
| Comments | Supported | Forbidden |
| Mixed content | Native | Fragile |
## JSON's Origins
JSON (JavaScript Object Notation) was invented in the early 2000s as a subset of JavaScript literal syntax for simple data exchange in web browsers. It was never designed as a general-purpose format—just a quick way to serialize objects for Ajax calls.
It became popular because:
- Simple for JavaScript developers
- Human-readable
- Web API boom (REST over SOAP)
- Low barrier to entry
## Why JSON Fails for Multi-Agent Systems
### No Schema Enforcement
JSON Schema exists but is:
- Optional (rarely enforced on wire)
- Lossy (can't express all constraints)
- Inconsistently implemented
Result: Messages accepted without validation, bugs discovered at runtime.
### No Namespaces
Without namespaces, keys from different vocabularies collide:
```json
{
  "name": "Alice",  // User name? Product name?
  "type": "admin"   // User type? Message type?
}
```
### No Canonicalization
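No widely adopted way to normalize for signing (RFC 8785, the JSON Canonicalization Scheme, is an informational RFC with comparatively limited library support):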
```json
{"a": 1, "b": 2}
{"b": 2, "a": 1}
```
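Same data, different bytes. A signature computed over one serialization will not verify against the other unless both sides agree on a canonical form.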
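A quick check with Python's standard `json` module: two dicts with the same content but different key insertion order serialize to different bytes. Passing `sort_keys=True` with fixed separators removes this particular difference, but it is only an ad hoc normalization, not an implementation of RFC 8785 (which also specifies how numbers and strings are serialized). The `normalize` helper is illustrative.

```python
import json

a = {"a": 1, "b": 2}
b = {"b": 2, "a": 1}

# Same logical content, but json.dumps preserves insertion order
raw_a = json.dumps(a).encode()
raw_b = json.dumps(b).encode()
print(a == b, raw_a == raw_b)  # True False

# Ad hoc normalization (not RFC 8785): sorted keys, compact separators
def normalize(obj):
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()

print(normalize(a) == normalize(b))  # True
print(normalize(a))  # b'{"a":1,"b":2}'
```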
### No Repair Tolerance
One syntax error → entire payload rejected:
```json
{"name": "Alice",} // Trailing comma → FAIL
```
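With Python's standard `json` module, for example, the trailing comma is rejected outright: the caller gets an exception (with a position) instead of any data. The exact error message varies between Python versions, so it is printed rather than assumed.

```python
import json

payload = '{"name": "Alice",}'  # trailing comma

try:
    data = json.loads(payload)
except json.JSONDecodeError as err:
    # The error says where parsing failed, but no partial object is returned
    data = None
    print(f"rejected: {err.msg} at char {err.pos}")
```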
### Escaping Hell
When JSON is built by hand (string templates or concatenation) rather than by a serializer, quotes and backslashes must be escaped manually:
```json
{"message": "She said \"hello\""} // Manual escaping
```
A single missed escape breaks the payload, and unescaped user input becomes an injection vector.
## Why JSON Fails for LLM Integration
### Hallucination Fragility
LLMs routinely produce invalid JSON:
- Trailing commas
- Missing quotes
- Wrong nesting
- Comments (forbidden!)
Result: Massive prompt bloat ("You MUST output valid JSON, NO trailing commas EVER...") and post-processing parsers.
### No Graceful Degradation
One parse error → entire response lost. No partial recovery.
### Injection Attacks
User input in strings can break JSON structure:
```json
{"user_input": "Alice", "role": "admin"}
```
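If the payload is built by string concatenation from a template such as `{"user_input": "NAME"}` and the user enters `Alice", "role": "admin` as their name, the result is the document above, with an attacker-controlled `role` key.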
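A minimal sketch of the difference in Python, using the `user_input` key and malicious name from the example above (variable names are illustrative): concatenation lets the quote in the input close the string early, while `json.dumps` escapes it.

```python
import json

malicious_name = 'Alice", "role": "admin'

# Unsafe: string concatenation lets the embedded quote terminate the JSON string
unsafe = '{"user_input": "' + malicious_name + '"}'
print(json.loads(unsafe))  # {'user_input': 'Alice', 'role': 'admin'}

# Safe: the serializer escapes the embedded quotes, so no new key appears
safe = json.dumps({"user_input": malicious_name})
print(json.loads(safe))  # {'user_input': 'Alice", "role": "admin'}
```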
## Why XML Succeeds
### Schema as Contract
XSD enforces exact structure on the wire:
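```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="greeting">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```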
Every message is validated against the schema before processing, and anything that does not match is rejected. No ambiguity.
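A minimal sketch of wire-level validation with lxml's `etree.XMLSchema`, assuming a hypothetical `greeting` message with one required `name` child (the same shape as the validation error shown later in this document). `accept` is an illustrative name: invalid messages are rejected before any handler sees them, and the schema's error log supplies the reasons.

```python
from lxml import etree

# Hypothetical schema: <greeting> must contain exactly one <name> string
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="greeting">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(XSD))

def accept(message: bytes) -> bool:
    """Parse and validate a message; return False instead of processing it if invalid."""
    try:
        doc = etree.fromstring(message)
    except etree.XMLSyntaxError:
        return False  # not even well-formed
    if not schema.validate(doc):
        # schema.error_log holds the reasons, e.g. for feedback to the sender
        for error in schema.error_log:
            print(error.message)
        return False
    return True

print(accept(b"<greeting><name>Alice</name></greeting>"))  # True
print(accept(b"<greeting/>"))  # False (missing <name>)
```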
### Namespaces
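Each element is qualified by a namespace URI, so a user's `name` cannot be confused with a product's `name`:
```xml
<user:name xmlns:user="https://example.com/ns/user">Alice</user:name>
```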
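Namespace-aware parsers expand each prefix to its full URI, so code matches on the URI rather than on whatever prefix a sender chose. A sketch with Python's standard `xml.etree.ElementTree`; the namespace URIs and the `msg`/`product:name` elements are placeholders.

```python
import xml.etree.ElementTree as ET

doc = """<msg xmlns:user="https://example.com/ns/user"
              xmlns:product="https://example.com/ns/product">
  <user:name>Alice</user:name>
  <product:name>Widget</product:name>
</msg>"""

root = ET.fromstring(doc)
# Local prefixes for lookup; they need not match the sender's prefixes
ns = {"u": "https://example.com/ns/user", "p": "https://example.com/ns/product"}

# Same local name "name", but distinct qualified names {uri}name
print(root.find("u:name", ns).text)  # Alice
print(root.find("p:name", ns).text)  # Widget
print([child.tag for child in root])
# ['{https://example.com/ns/user}name', '{https://example.com/ns/product}name']
```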
### Canonicalization (C14N)
Deterministic representation for signing:
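```python
import hashlib
import hmac
from lxml import etree
tree = etree.fromstring(b'<msg to="bob" from="alice"><body/></msg>')
c14n_bytes = etree.tostring(tree, method="c14n")  # canonical serialization
signature = hmac.new(b"illustrative-key", c14n_bytes, hashlib.sha256).digest()  # HMAC stands in for signing
```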
Same logical content → same bytes → verifiable signatures.
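Python's standard library also ships a canonicalizer, `xml.etree.ElementTree.canonicalize` (Python 3.8+). Note that it implements C14N 2.0, not the Exclusive C14N specification listed under Further Reading. The sketch below shows two serializations that differ in attribute order, quote style, and empty-element syntax canonicalizing to identical bytes, so an HMAC over them matches; the key and the HMAC-as-signature choice are illustrative assumptions.

```python
import hashlib
import hmac
import xml.etree.ElementTree as ET

# Same logical message, serialized two different ways
doc1 = '<msg to="bob" from="alice"><body/></msg>'
doc2 = "<msg  from='alice' to='bob'><body></body></msg>"

c1 = ET.canonicalize(doc1).encode("utf-8")
c2 = ET.canonicalize(doc2).encode("utf-8")
print(doc1 == doc2, c1 == c2)  # False True
print(c1.decode())  # <msg from="alice" to="bob"><body></body></msg>

# Illustrative key; HMAC-SHA256 stands in for a real signature scheme
KEY = b"illustrative-shared-key"
sig1 = hmac.new(KEY, c1, hashlib.sha256).hexdigest()
sig2 = hmac.new(KEY, c2, hashlib.sha256).hexdigest()
print(sig1 == sig2)  # True
```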
### Repair Tolerance
lxml recover mode fixes common issues:
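```python
from lxml import etree

broken_xml = b"<greeting><name>Alice</greeting>"  # <name> is never closed
# recover=True makes libxml2 build a best-effort tree instead of raising
parser = etree.XMLParser(recover=True)
tree = etree.fromstring(broken_xml, parser)
```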
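Unclosed tags, truncated documents, and some encoding problems can often be recovered into a best-effort tree. Recovery can guess wrong or drop content, so the result should still be validated.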
### Self-Describing
Elements carry meaning:
```xml
<name>Alice</name>
```
vs JSON:
```json
["Alice"] // What is this?
```
## LLM + XML = Reliable
### Natural Streaming
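XML can be parsed incrementally: pull and SAX-style parsers report each element as soon as its end tag arrives, so a consumer can start processing before the document is complete.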
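A sketch with the standard library's `xml.etree.ElementTree.XMLPullParser`: chunks are fed as they arrive, and each completed `<msg>` is handed over once its end tag has been parsed, without waiting for the closing root tag. The chunk boundaries and the `messages`/`msg` element names are illustrative; the underlying parser may buffer some input, so events are drained once more after `close()`.

```python
import xml.etree.ElementTree as ET

# Illustrative chunks, as they might arrive from a socket or an LLM token stream
chunks = ["<messages><msg>hel", "lo</msg><msg>wor", "ld</msg>", "</messages>"]

parser = ET.XMLPullParser(events=("end",))
completed = []

def drain():
    """Hand over every <msg> whose end tag has been parsed so far."""
    for _event, elem in parser.read_events():
        if elem.tag == "msg":
            completed.append(elem.text)

for chunk in chunks:
    parser.feed(chunk)
    drain()  # typically yields "hello" after the second chunk, before </messages>
parser.close()
drain()  # pick up any events the parser held back until close()

print(completed)  # ['hello', 'world']
```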
### Repair on Output
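LLM produces broken XML? lxml can often repair it:
```python
from lxml import etree

llm_output = "<greeting><name>Alice</name>"  # truncated: missing </greeting>
parser = etree.XMLParser(recover=True)
tree = etree.fromstring(llm_output, parser)
# Works even with minor errors; parser.error_log lists what was repaired
```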
### Schema-Guided Generation
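Including the schema (or a template derived from it) in the prompt tells the LLM exactly what to produce:
```
Generate XML matching this schema:
<greeting>
  <name>string</name>
</greeting>
```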
Clear contract, fewer hallucinations.
### Graceful Validation
Validation errors become helpful feedback:
```text
Element 'greeting' missing required element 'name'
```
Fed back into the next prompt, these messages let the LLM self-correct.
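A sketch of the validate-and-retry loop under stated assumptions: `call_llm` is a hypothetical stand-in for a real model client (here a stub that fixes its output once it sees feedback), and a hand-written check for the `greeting`/`name` structure stands in for full XSD validation so the example runs on the standard library alone. `SCHEMA_HINT`, `validate`, and `generate_valid` are illustrative names.

```python
import xml.etree.ElementTree as ET

SCHEMA_HINT = "<greeting>\n  <name>string</name>\n</greeting>"

def call_llm(prompt: str) -> str:
    """Hypothetical stub: returns a broken answer until the prompt contains feedback."""
    if "Error:" in prompt:
        return "<greeting><name>Alice</name></greeting>"
    return "<greeting></greeting>"

def validate(xml_text: str) -> list[str]:
    """Stand-in for XSD validation: well-formedness plus one required child."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"Parse error: {err}"]
    if root.tag != "greeting":
        return [f"Unexpected root element '{root.tag}'"]
    if root.find("name") is None:
        return ["Element 'greeting' missing required element 'name'"]
    return []

def generate_valid(task: str, max_attempts: int = 3) -> str:
    prompt = f"{task}\nGenerate XML matching this schema:\n{SCHEMA_HINT}"
    for _ in range(max_attempts):
        output = call_llm(prompt)
        errors = validate(output)
        if not errors:
            return output
        # Feed the validator's messages back so the model can self-correct
        prompt += "\nYour previous output:\n" + output
        prompt += "\n" + "\n".join(f"Error: {e}" for e in errors)
    raise ValueError(f"no valid output after {max_attempts} attempts")

print(generate_valid("Greet Alice."))  # <greeting><name>Alice</name></greeting>
```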
## The Trade-Offs
### XML is More Verbose
```xml
<name>Alice</name>
```
vs
```json
{"name": "Alice"}
```
**But:** Compression (e.g. gzip) removes most of this overhead on the wire, because repeated tag names compress well. And verbosity aids debugging.
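To check the compression claim against your own traffic, compare gzip-compressed sizes of equivalent payloads. A sketch with made-up records (50 users with illustrative `name`/`role` fields); real ratios depend heavily on payload shape and size, so no numbers are claimed here.

```python
import gzip
import json

# Made-up, repetitive records standing in for real messages
records = [{"name": f"user{i}", "role": "member"} for i in range(50)]

as_json = json.dumps(records).encode()
as_xml = (
    "<users>"
    + "".join(f"<user><name>{r['name']}</name><role>{r['role']}</role></user>" for r in records)
    + "</users>"
).encode()

for label, raw in (("json", as_json), ("xml", as_xml)):
    packed = gzip.compress(raw)
    print(f"{label}: {len(raw)} bytes raw, {len(packed)} bytes gzipped")
```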
### XML Parsing is Slower
For small messages, typically on the order of microseconds more per message than JSON parsing.
**But:** Network latency dominates. And lxml is highly optimized.
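A rough way to measure the gap on your own messages with `timeit`. This sketch uses the standard library's `xml.etree.ElementTree` to avoid a dependency; lxml timings will differ, and the tiny `greeting` payloads are illustrative, so treat the printed numbers as machine-specific.

```python
import json
import timeit
import xml.etree.ElementTree as ET

json_msg = '{"greeting": {"name": "Alice"}}'
xml_msg = "<greeting><name>Alice</name></greeting>"

n = 2_000
# Average seconds per parse over n runs
t_json = timeit.timeit(lambda: json.loads(json_msg), number=n) / n
t_xml = timeit.timeit(lambda: ET.fromstring(xml_msg), number=n) / n
print(f"json.loads:    {t_json * 1e6:.2f} µs per message")
print(f"ET.fromstring: {t_xml * 1e6:.2f} µs per message")
```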
### XML is "Old"
True. Also mature, battle-tested, standards-based.
## Conclusion
JSON won the web because it was "good enough" for stateless HTTP requests.
XML wins for multi-agent systems because:
- Security requires schema enforcement
- Signing requires canonicalization
- LLMs require repair tolerance
- Complexity requires namespaces
**JSON won the web. XML wins the swarm.**
## Further Reading
- [W3C XML Schema](https://www.w3.org/XML/Schema)
- [Exclusive XML Canonicalization](https://www.w3.org/TR/xml-exc-c14n/)
- [lxml Documentation](https://lxml.de/)