From a1ad8b843f104c1ba4fbbc4f0e8fec10f5ff35e2 Mon Sep 17 00:00:00 2001 From: dullfig Date: Sat, 3 Jan 2026 17:57:07 -0800 Subject: [PATCH] added rant --- README.md | 10 +++ agentserver/schema/payloads/grok-response.xsd | 15 ----- docs/why-not-json.md | 66 +++++++++++++++++++ 3 files changed, 76 insertions(+), 15 deletions(-) delete mode 100644 agentserver/schema/payloads/grok-response.xsd create mode 100644 docs/why-not-json.md diff --git a/README.md b/README.md index b206a35..b7d3f97 100644 --- a/README.md +++ b/README.md @@ -4,6 +4,16 @@ **January 03, 2026** **Architecture: Autonomous Schema-Driven, Turing-Complete Multi-Agent Organism** +## The Rant +**Why XML?** +[Why not JSON?](docs/why-not-json.md) + +XML is the sovereign wire format — standards-based, self-describing, attack-resistant, and evolvable without drift. JSON was a quick hack that escaped into the wild and became the default for everything, including AI tool calling, where its brittleness causes endless prompt surgery and validation headaches. + +This project chooses XML deliberately. The organism enforces contracts exactly (XSD validation, no transcription bugs), tolerates dirty streams (repair + dummy extraction), and keeps reasoning visible. No fragile conventions. No escaping hell. Just bounded, auditable computation. + +Read the full rant [here](docs/why-not-json.md) for the history, pitfalls, and why XML wins permanently. + ## What It Is AgentServer is a production-ready substrate for the `xml-pipeline` nervous system. Version 2.0 stabilizes the design around exact XSD validation, typed dataclass handlers, mandatory hierarchical threading, and strict out-of-band privileged control. diff --git a/agentserver/schema/payloads/grok-response.xsd b/agentserver/schema/payloads/grok-response.xsd deleted file mode 100644 index 541c1c9..0000000 --- a/agentserver/schema/payloads/grok-response.xsd +++ /dev/null @@ -1,15 +0,0 @@ - - - - - - - - - - - - \ No newline at end of file diff --git a/docs/why-not-json.md b/docs/why-not-json.md new file mode 100644 index 0000000..92b14d4 --- /dev/null +++ b/docs/why-not-json.md @@ -0,0 +1,66 @@ +# So Why XML and Not JSON? + +XML is the right format for a sovereign, attack-resistant message bus in a multi-agent organism. JSON is not — it's a lightweight data interchange hack that exploded in popularity but carries fundamental flaws that make it brittle, insecure, and unsuitable for precise, evolvable contracts. + +This project uses Exclusive C14N XML on the wire, XSD for exact validation, and repair for tolerance. The result: no drift, no escaping nightmares, no hidden parsing ambiguities. Contracts are enforced by standards, not convention. + +## Where Did JSON Come From? + +JSON (JavaScript Object Notation) was invented in the early 2000s by Douglas Crockford as a subset of JavaScript literal syntax for simple data exchange in web browsers. It was never designed as a general-purpose format — just a quick way to serialize objects for Ajax calls without eval() dangers. + +It started as "fat-free XML" for dynamic web pages, stripping tags for brevity. + +## What Made It Popular? + +- **Simplicity for JS devs**: No schema, human-readable, easy to generate/parse in browsers. +- **Web API boom**: REST APIs adopted it over XML (SOAP was verbose/bloated). +- **Ecosystem inertia**: Libraries everywhere, low barrier for startups. +- **Perceived lightness**: Smaller payloads than verbose XML. + +It spread because it was "good enough" for stateless HTTP requests in the 2010s web era. + +## How It Carried Over to AI + +LLM tool calling adopted JSON because: +- OpenAI's function calling API used JSON schemas. +- Everyone copied the leader — brittle but "standard". +- Prompt engineers learned to coerce models into valid JSON with endless instructions ("always output valid JSON, no trailing commas..."). + +Result: Massive prompt bloat, hallucinated formats, post-processing parsers, and constant fixes for escaping/order issues. + +## What Makes It the Wrong Format + +JSON lacks: +- **Namespaces** — no way to mix vocabularies safely. +- **Schemas as first-class contracts** — JSON Schema is optional, lossy, and rarely enforced on wire. +- **Canonicalization** — no standard way to normalize for signing/comparison. +- **Comments** — forbidden, forcing side channels. +- **Mixed content** — text + structure fragile. + +It's order-sensitive (objects) but unordered by spec, leading to bugs. + +## All the (Really) Bad Things That JSON Brings + +- **Escaping hell**: Strings with quotes/newlines/tabs require manual escaping; easy to break. +- **No validation on wire**: Servers trust client JSON — injection attacks common. +- **Order ambiguity**: Objects unordered, arrays ordered — inconsistent, parsing surprises. +- **No self-description**: Types inferred, no built-in schema reference. +- **Brittle AI wiring**: LLMs hallucinate invalid JSON (trailing commas, wrong types) → endless retry loops or custom parsers. +- **Drift prone**: No standard evolution path — schemas change, old clients break silently. + +## Red Team Horror Stories from JSON in AI + +JSON's flaws turn into nightmares when LLMs are involved — brittle formats meet hallucination-prone models: + +- **Hallucinated Invalid JSON**: LLMs routinely add trailing commas, forget quotes, or nest wrong — forcing massive prompt bloat ("You MUST output valid JSON, no trailing commas EVER") and post-processing parsers. One missing brace → entire tool call dropped, reasoning loop broken. +- **Escaping Injection Hell**: User input with quotes/newlines in strings? Escaping fails → malformed JSON → parser crashes or silent data loss. Red teams exploit this for prompt injection (embed control sequences in "safe" strings). +- **Order & Type Ambiguity**: Objects unordered → tools break on key order assumptions. Numbers as strings → type confusion attacks. Arrays of mixed types → validation impossible without custom code. +- **No Tolerance for Dirt**: LLM streams comments or extra text? JSON parsers choke. No repair — whole response rejected, forcing retries and token waste. +- **Real-World Breaks**: Early OpenAI function calling — endless "invalid JSON" errors until prompts became novels. Projects add custom "JSON repair" libraries — admitting the format's fragility. + +XML + XSD + repair avoids this entirely: exact contracts enforced on wire, dirty streams tolerated via dummy extraction, no escaping quagmires. The organism stays sovereign. + +JSON lost the AI war before it started. +XML (with XSD + C14N) solves these: exact contracts, namespaces, repair tolerance, signing, comments if needed. It's heavier on disk but sovereign on wire — perfect for a bounded organism where security and auditability matter more than minimal bytes. + +JSON won the web. XML wins the swarm. Permanently. \ No newline at end of file