66 lines
No EOL
4.6 KiB
Markdown
66 lines
No EOL
4.6 KiB
Markdown
# So Why XML and Not JSON?
|
|
|
|
XML is the right format for a sovereign, attack-resistant message bus in a multi-agent organism. JSON is not — it's a lightweight data interchange hack that exploded in popularity but carries fundamental flaws that make it brittle, insecure, and unsuitable for precise, evolvable contracts.
|
|
|
|
This project uses Exclusive C14N XML on the wire, XSD for exact validation, and repair for tolerance. The result: no drift, no escaping nightmares, no hidden parsing ambiguities. Contracts are enforced by standards, not convention.
|
|
|
|
## Where Did JSON Come From?
|
|
|
|
JSON (JavaScript Object Notation) was invented in the early 2000s by Douglas Crockford as a subset of JavaScript literal syntax for simple data exchange in web browsers. It was never designed as a general-purpose format — just a quick way to serialize objects for Ajax calls without eval() dangers.
|
|
|
|
It started as "fat-free XML" for dynamic web pages, stripping tags for brevity.
|
|
|
|
## What Made It Popular?
|
|
|
|
- **Simplicity for JS devs**: No schema, human-readable, easy to generate/parse in browsers.
|
|
- **Web API boom**: REST APIs adopted it over XML (SOAP was verbose/bloated).
|
|
- **Ecosystem inertia**: Libraries everywhere, low barrier for startups.
|
|
- **Perceived lightness**: Smaller payloads than verbose XML.
|
|
|
|
It spread because it was "good enough" for stateless HTTP requests in the 2010s web era.
|
|
|
|
## How It Carried Over to AI
|
|
|
|
LLM tool calling adopted JSON because:
|
|
- OpenAI's function calling API used JSON schemas.
|
|
- Everyone copied the leader — brittle but "standard".
|
|
- Prompt engineers learned to coerce models into valid JSON with endless instructions ("always output valid JSON, no trailing commas...").
|
|
|
|
Result: Massive prompt bloat, hallucinated formats, post-processing parsers, and constant fixes for escaping/order issues.
|
|
|
|
## What Makes It the Wrong Format
|
|
|
|
JSON lacks:
|
|
- **Namespaces** — no way to mix vocabularies safely.
|
|
- **Schemas as first-class contracts** — JSON Schema is optional, lossy, and rarely enforced on wire.
|
|
- **Canonicalization** — no standard way to normalize for signing/comparison.
|
|
- **Comments** — forbidden, forcing side channels.
|
|
- **Mixed content** — text + structure fragile.
|
|
|
|
It's order-sensitive (objects) but unordered by spec, leading to bugs.
|
|
|
|
## All the (Really) Bad Things That JSON Brings
|
|
|
|
- **Escaping hell**: Strings with quotes/newlines/tabs require manual escaping; easy to break.
|
|
- **No validation on wire**: Servers trust client JSON — injection attacks common.
|
|
- **Order ambiguity**: Objects unordered, arrays ordered — inconsistent, parsing surprises.
|
|
- **No self-description**: Types inferred, no built-in schema reference.
|
|
- **Brittle AI wiring**: LLMs hallucinate invalid JSON (trailing commas, wrong types) → endless retry loops or custom parsers.
|
|
- **Drift prone**: No standard evolution path — schemas change, old clients break silently.
|
|
|
|
## Red Team Horror Stories from JSON in AI
|
|
|
|
JSON's flaws turn into nightmares when LLMs are involved — brittle formats meet hallucination-prone models:
|
|
|
|
- **Hallucinated Invalid JSON**: LLMs routinely add trailing commas, forget quotes, or nest wrong — forcing massive prompt bloat ("You MUST output valid JSON, no trailing commas EVER") and post-processing parsers. One missing brace → entire tool call dropped, reasoning loop broken.
|
|
- **Escaping Injection Hell**: User input with quotes/newlines in strings? Escaping fails → malformed JSON → parser crashes or silent data loss. Red teams exploit this for prompt injection (embed control sequences in "safe" strings).
|
|
- **Order & Type Ambiguity**: Objects unordered → tools break on key order assumptions. Numbers as strings → type confusion attacks. Arrays of mixed types → validation impossible without custom code.
|
|
- **No Tolerance for Dirt**: LLM streams comments or extra text? JSON parsers choke. No repair — whole response rejected, forcing retries and token waste.
|
|
- **Real-World Breaks**: Early OpenAI function calling — endless "invalid JSON" errors until prompts became novels. Projects add custom "JSON repair" libraries — admitting the format's fragility.
|
|
|
|
XML + XSD + repair avoids this entirely: exact contracts enforced on wire, dirty streams tolerated via dummy extraction, no escaping quagmires. The organism stays sovereign.
|
|
|
|
JSON lost the AI war before it started.
|
|
XML (with XSD + C14N) solves these: exact contracts, namespaces, repair tolerance, signing, comments if needed. It's heavier on disk but sovereign on wire — perfect for a bounded organism where security and auditability matter more than minimal bytes.
|
|
|
|
JSON won the web. XML wins the swarm. Permanently. |