added rant
This commit is contained in:
parent
9a3e7df851
commit
a1ad8b843f
3 changed files with 76 additions and 15 deletions
10
README.md
10
README.md
|
|
@ -4,6 +4,16 @@
|
|||
**January 03, 2026**
|
||||
**Architecture: Autonomous Schema-Driven, Turing-Complete Multi-Agent Organism**
|
||||
|
||||
## The Rant
|
||||
**Why XML?**
|
||||
[Why not JSON?](docs/why-not-json.md)
|
||||
|
||||
XML is the sovereign wire format — standards-based, self-describing, attack-resistant, and evolvable without drift. JSON was a quick hack that escaped into the wild and became the default for everything, including AI tool calling, where its brittleness causes endless prompt surgery and validation headaches.
|
||||
|
||||
This project chooses XML deliberately. The organism enforces contracts exactly (XSD validation, no transcription bugs), tolerates dirty streams (repair + dummy extraction), and keeps reasoning visible. No fragile conventions. No escaping hell. Just bounded, auditable computation.
|
||||
|
||||
Read the full rant [here](docs/why-not-json.md) for the history, pitfalls, and why XML wins permanently.
|
||||
|
||||
## What It Is
|
||||
AgentServer is a production-ready substrate for the `xml-pipeline` nervous system. Version 2.0 stabilizes the design around exact XSD validation, typed dataclass handlers, mandatory hierarchical threading, and strict out-of-band privileged control.
|
||||
|
||||
|
|
|
|||
|
|
@ -1,15 +0,0 @@
|
|||
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
|
||||
targetNamespace="http://xml-pipeline.org/message"
|
||||
xmlns="http://xml-pipeline.org/message"
|
||||
elementFormDefault="qualified">
|
||||
|
||||
<xs:element name="grok-response">
|
||||
<xs:complexType>
|
||||
<xs:simpleContent>
|
||||
<xs:extension base="xs:string">
|
||||
<xs:attribute name="convo_id" type="xs:string" use="required"/>
|
||||
</xs:extension>
|
||||
</xs:simpleContent>
|
||||
</xs:complexType>
|
||||
</xs:element>
|
||||
</xs:schema>
|
||||
66
docs/why-not-json.md
Normal file
66
docs/why-not-json.md
Normal file
|
|
@ -0,0 +1,66 @@
|
|||
# So Why XML and Not JSON?
|
||||
|
||||
XML is the right format for a sovereign, attack-resistant message bus in a multi-agent organism. JSON is not — it's a lightweight data interchange hack that exploded in popularity but carries fundamental flaws that make it brittle, insecure, and unsuitable for precise, evolvable contracts.
|
||||
|
||||
This project uses Exclusive C14N XML on the wire, XSD for exact validation, and repair for tolerance. The result: no drift, no escaping nightmares, no hidden parsing ambiguities. Contracts are enforced by standards, not convention.
|
||||
|
||||
## Where Did JSON Come From?
|
||||
|
||||
JSON (JavaScript Object Notation) was invented in the early 2000s by Douglas Crockford as a subset of JavaScript literal syntax for simple data exchange in web browsers. It was never designed as a general-purpose format — just a quick way to serialize objects for Ajax calls without eval() dangers.
|
||||
|
||||
It started as "fat-free XML" for dynamic web pages, stripping tags for brevity.
|
||||
|
||||
## What Made It Popular?
|
||||
|
||||
- **Simplicity for JS devs**: No schema, human-readable, easy to generate/parse in browsers.
|
||||
- **Web API boom**: REST APIs adopted it over XML (SOAP was verbose/bloated).
|
||||
- **Ecosystem inertia**: Libraries everywhere, low barrier for startups.
|
||||
- **Perceived lightness**: Smaller payloads than verbose XML.
|
||||
|
||||
It spread because it was "good enough" for stateless HTTP requests in the 2010s web era.
|
||||
|
||||
## How It Carried Over to AI
|
||||
|
||||
LLM tool calling adopted JSON because:
|
||||
- OpenAI's function calling API used JSON schemas.
|
||||
- Everyone copied the leader — brittle but "standard".
|
||||
- Prompt engineers learned to coerce models into valid JSON with endless instructions ("always output valid JSON, no trailing commas...").
|
||||
|
||||
Result: Massive prompt bloat, hallucinated formats, post-processing parsers, and constant fixes for escaping/order issues.
|
||||
|
||||
## What Makes It the Wrong Format
|
||||
|
||||
JSON lacks:
|
||||
- **Namespaces** — no way to mix vocabularies safely.
|
||||
- **Schemas as first-class contracts** — JSON Schema is optional, lossy, and rarely enforced on wire.
|
||||
- **Canonicalization** — no standard way to normalize for signing/comparison.
|
||||
- **Comments** — forbidden, forcing side channels.
|
||||
- **Mixed content** — text + structure fragile.
|
||||
|
||||
It's order-sensitive (objects) but unordered by spec, leading to bugs.
|
||||
|
||||
## All the (Really) Bad Things That JSON Brings
|
||||
|
||||
- **Escaping hell**: Strings with quotes/newlines/tabs require manual escaping; easy to break.
|
||||
- **No validation on wire**: Servers trust client JSON — injection attacks common.
|
||||
- **Order ambiguity**: Objects unordered, arrays ordered — inconsistent, parsing surprises.
|
||||
- **No self-description**: Types inferred, no built-in schema reference.
|
||||
- **Brittle AI wiring**: LLMs hallucinate invalid JSON (trailing commas, wrong types) → endless retry loops or custom parsers.
|
||||
- **Drift prone**: No standard evolution path — schemas change, old clients break silently.
|
||||
|
||||
## Red Team Horror Stories from JSON in AI
|
||||
|
||||
JSON's flaws turn into nightmares when LLMs are involved — brittle formats meet hallucination-prone models:
|
||||
|
||||
- **Hallucinated Invalid JSON**: LLMs routinely add trailing commas, forget quotes, or nest wrong — forcing massive prompt bloat ("You MUST output valid JSON, no trailing commas EVER") and post-processing parsers. One missing brace → entire tool call dropped, reasoning loop broken.
|
||||
- **Escaping Injection Hell**: User input with quotes/newlines in strings? Escaping fails → malformed JSON → parser crashes or silent data loss. Red teams exploit this for prompt injection (embed control sequences in "safe" strings).
|
||||
- **Order & Type Ambiguity**: Objects unordered → tools break on key order assumptions. Numbers as strings → type confusion attacks. Arrays of mixed types → validation impossible without custom code.
|
||||
- **No Tolerance for Dirt**: LLM streams comments or extra text? JSON parsers choke. No repair — whole response rejected, forcing retries and token waste.
|
||||
- **Real-World Breaks**: Early OpenAI function calling — endless "invalid JSON" errors until prompts became novels. Projects add custom "JSON repair" libraries — admitting the format's fragility.
|
||||
|
||||
XML + XSD + repair avoids this entirely: exact contracts enforced on wire, dirty streams tolerated via dummy extraction, no escaping quagmires. The organism stays sovereign.
|
||||
|
||||
JSON lost the AI war before it started.
|
||||
XML (with XSD + C14N) solves these: exact contracts, namespaces, repair tolerance, signing, comments if needed. It's heavier on disk but sovereign on wire — perfect for a bounded organism where security and auditability matter more than minimal bytes.
|
||||
|
||||
JSON won the web. XML wins the swarm. Permanently.
|
||||
Loading…
Reference in a new issue