# Librarian Architecture — RLM-Powered Document Intelligence

**Status:** Design

**Date:** January 2026

## Overview

The Librarian is an agent that ingests, indexes, and queries large document collections using the **Recursive Language Model (RLM)** pattern. It can handle codebases, documentation, and structured data at scales far beyond LLM context windows (10M+ tokens).

Key insight from [MIT RLM research](https://arxiv.org/abs/...): Long contexts should be loaded as **variables in a REPL environment**, not fed directly to the neural network. The LLM writes code to examine, decompose, and recursively query chunks.
## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                      RLM-Powered Librarian                      │
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                    Ingestion Pipeline                     │  │
│  │                                                           │  │
│  │   Source → Detect Type → Select Chunker → Index → Store   │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                Query Engine (RLM Pattern)                 │  │
│  │                                                           │  │
│  │  Query → Search → Filter → Recursive Sub-Query → Answer   │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                       Storage Layer                       │  │
│  │                                                           │  │
│  │   eXist-db (XML) + Vector Embeddings + Dependency Graph   │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
```
## The RLM Pattern

Traditional LLM usage stuffs entire documents into the prompt. This fails at scale:

- Context windows have hard limits (128K-1M tokens)
- Performance degrades with context length ("context rot")
- Cost scales linearly with input size

**RLM approach:**

1. **Load as Variable**: Documents become references, not inline content
2. **Programmatic Access**: LLM writes code to peek into chunks
3. **Recursive Sub-Queries**: `llm_query(chunk, question)` for focused analysis
4. **Aggregation**: Combine sub-query results into final answer
```python
# RLM-style pseudocode
async def handle_query(query: str, codebase: CodebaseRef):
    # 1. Search index for relevant chunks (not full content)
    hits = await search_index(codebase, query)

    # 2. Filter if too many results
    if len(hits) > 10:
        hits = await llm_filter(hits, query)  # LLM picks most relevant

    # 3. Recursive sub-queries on each chunk
    findings = []
    for hit in hits:
        chunk = await load_chunk(hit)
        result = await llm_query(
            f"Analyze this for: {query}\n\n{chunk}"
        )
        findings.append(result)

    # 4. Aggregate into final answer
    return await llm_synthesize(findings, query)
```
## Hybrid Chunking Architecture

Chunking is domain-specific. A C++ class should stay together; a legal clause shouldn't be split mid-sentence. We use a hybrid approach:

### Built-in Chunkers (Fast Path)

| Chunker | File Types | Strategy | Implementation |
|---------|------------|----------|----------------|
| **Code** | .c, .cpp, .py, .js, .rs, ... | AST-aware splitting | tree-sitter |
| **Markdown/Docs** | .md, .rst, .txt | Heading hierarchy | Custom parser |
| **Structured Data** | .json, .xml, .yaml | Schema-aware | lxml + json |
| **Plain Text** | emails, logs, notes | Semantic paragraphs | Sentence boundaries |

These cover ~90% of use cases with optimized, predictable behavior.
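
To make the heading-hierarchy strategy concrete, here is a minimal sketch of a Markdown chunker that splits on headings and records each chunk's enclosing sections. The names (`Chunk`, `chunk_markdown`) are illustrative, not the shipped implementation:

```python
import re
from dataclasses import dataclass, field

@dataclass
class Chunk:
    heading: str          # heading text of this section
    level: int            # heading level (1 = '#', 2 = '##', ...)
    content: str          # body text under the heading
    parents: list[str] = field(default_factory=list)  # enclosing headings

def chunk_markdown(text: str) -> list[Chunk]:
    """Split Markdown into one chunk per heading, tracking hierarchy."""
    chunks: list[Chunk] = []
    stack: list[tuple[int, str]] = []  # (level, heading) of open sections
    heading_re = re.compile(r"^(#{1,6})\s+(.*)$")
    current: Chunk | None = None
    for line in text.splitlines():
        m = heading_re.match(line)
        if m:
            level, title = len(m.group(1)), m.group(2).strip()
            # close sections at the same or deeper level
            while stack and stack[-1][0] >= level:
                stack.pop()
            current = Chunk(title, level, "", [h for _, h in stack])
            stack.append((level, title))
            chunks.append(current)
        elif current is not None:
            current.content += line + "\n"
    return chunks
```

A real implementation would also enforce `max_chunk_size` and carry file metadata, but the parent/child bookkeeping above is the essential part.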

### WASM Factory (Fallback for Unknown Types)

For novel formats, the AI generates a custom chunker:
```
User uploads proprietary format
               │
               ▼
┌───────────────────────────────────────────────────────────┐
│  Step 1: Sample Analysis                                  │
│                                                           │
│  AI examines sample files:                                │
│    - Structure patterns                                   │
│    - Record boundaries                                    │
│    - Semantic units                                       │
└───────────────────────────────────────────────────────────┘
               │
               ▼
┌───────────────────────────────────────────────────────────┐
│  Step 2: Generate Chunker (Rust → WASM)                   │
│                                                           │
│  AI writes Rust code implementing the chunker interface   │
└───────────────────────────────────────────────────────────┘
               │
               ▼
┌───────────────────────────────────────────────────────────┐
│  Step 3: Compile & Validate                               │
│                                                           │
│  cargo build --target wasm32-wasi                         │
│  Test on sample files                                     │
│  AI reviews output quality                                │
└───────────────────────────────────────────────────────────┘
               │
               ▼
┌───────────────────────────────────────────────────────────┐
│  Step 4: Deploy                                           │
│                                                           │
│  Store in user's WASM modules                             │
│  Optional: publish to marketplace                         │
└───────────────────────────────────────────────────────────┘
```
### WASM Chunker Interface (WIT)

```wit
// chunker.wit
interface chunker {
    record chunk {
        id: string,
        content: string,
        metadata: list<tuple<string, string>>,
        parent-id: option<string>,
        children: list<string>,
    }

    record chunker-config {
        file-type: string,
        max-chunk-size: u32,
        preserve-context: bool,
        custom-params: list<tuple<string, string>>,
    }

    // Analyze sample data, return chunking config
    analyze: func(sample: string, file-type: string) -> chunker-config;

    // Chunk a file using the config
    chunk-file: func(content: string, config: chunker-config) -> list<chunk>;
}
```

## Ingestion Pipeline

### Step 1: Source Acquisition

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class IngestionSource:
    type: Literal["git", "upload", "url", "s3"]
    location: str
    filter: str | None = None  # e.g., "*.cpp", "docs/**/*.md"
```

Supported sources:

- **Git repository**: Clone and track branches
- **File upload**: Direct upload via UI
- **URL**: Fetch remote documents
- **S3/Cloud storage**: Enterprise integrations

### Step 2: Type Detection

```python
from pathlib import Path

def detect_type(file_path: str, content: bytes) -> FileType:
    # 1. Check extension
    ext = Path(file_path).suffix.lower()
    if ext in CODE_EXTENSIONS:
        return FileType.CODE

    # 2. Check magic bytes
    if content.startswith(b'%PDF'):
        return FileType.PDF

    # 3. Content analysis
    if looks_like_markdown(content):
        return FileType.MARKDOWN

    return FileType.PLAIN_TEXT
```

### Step 3: Chunking

```python
def select_chunker(file_type: FileType, user_config: ChunkerConfig) -> Chunker:
    # User override
    if user_config.custom_wasm:
        return WasmChunker(user_config.custom_wasm)

    # Built-in chunkers
    match file_type:
        case FileType.CODE:
            return TreeSitterChunker(language=detect_language(file_type))
        case FileType.MARKDOWN:
            return MarkdownChunker()
        case FileType.JSON | FileType.XML | FileType.YAML:
            return StructuredDataChunker()
        case _:
            return PlainTextChunker()
```

### Step 4: Indexing

Each chunk is indexed in multiple ways:

| Index Type | Purpose | Implementation |
|------------|---------|----------------|
| **Full-text** | Keyword search | eXist-db Lucene |
| **Vector** | Semantic similarity | Embeddings (OpenAI/local) |
| **Graph** | Relationships | Class hierarchy, imports, references |
| **Metadata** | Filtering | File path, type, timestamp |
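
At query time these indexes each return their own ranking, which must be merged. Reciprocal rank fusion is one common, score-free way to do that; a sketch under the assumption that each index yields an ordered list of chunk ids (not the shipped merger):

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of chunk ids into one.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    so items ranked highly by several indexes float to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```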

### Step 5: Storage

```xml
<!-- Chunk stored in eXist-db -->
<chunk xmlns="https://xml-pipeline.org/ns/librarian/v1">
  <id>opencascade:BRepBuilderAPI_MakeEdge:constructor_1</id>
  <source>
    <repo>opencascade</repo>
    <path>src/BRepBuilderAPI/BRepBuilderAPI_MakeEdge.cxx</path>
    <lines start="42" end="87"/>
  </source>
  <type>function</type>
  <metadata>
    <class>BRepBuilderAPI_MakeEdge</class>
    <visibility>public</visibility>
    <params>const TopoDS_Vertex&amp;, const TopoDS_Vertex&amp;</params>
  </metadata>
  <content><![CDATA[
    BRepBuilderAPI_MakeEdge::BRepBuilderAPI_MakeEdge(
        const TopoDS_Vertex& V1,
        const TopoDS_Vertex& V2)
    {
      // ... implementation
    }
  ]]></content>
  <embedding>[0.023, -0.041, 0.089, ...]</embedding>
</chunk>
```
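
Reading a stored chunk back is plain namespace-aware XML work. A sketch using the namespace above (the element layout mirrors the example, not a frozen schema, and `load_chunk` is an illustrative helper):

```python
import xml.etree.ElementTree as ET

NS = {"lib": "https://xml-pipeline.org/ns/librarian/v1"}

CHUNK_XML = """\
<chunk xmlns="https://xml-pipeline.org/ns/librarian/v1">
  <id>demo:Foo:method_1</id>
  <source>
    <repo>demo</repo>
    <path>src/foo.cxx</path>
    <lines start="42" end="87"/>
  </source>
  <type>function</type>
  <content><![CDATA[void Foo::bar() {}]]></content>
</chunk>
"""

def load_chunk(xml_text: str) -> dict:
    """Extract the fields a query handler needs from a stored chunk."""
    root = ET.fromstring(xml_text)
    lines = root.find("lib:source/lib:lines", NS)
    return {
        "id": root.findtext("lib:id", namespaces=NS),
        "path": root.findtext("lib:source/lib:path", namespaces=NS),
        "start": int(lines.get("start")),
        "end": int(lines.get("end")),
        "content": root.findtext("lib:content", namespaces=NS),
    }
```

In production this retrieval would go through eXist-db's XQuery API rather than parsing raw strings, but the namespace handling is the same.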

## Query Engine

### Query Flow
```
User: "How does BRepBuilderAPI_MakeEdge handle degenerate curves?"
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  Step 1: Search                                             │
│                                                             │
│  - Vector search: find semantically similar chunks          │
│  - Keyword search: "BRepBuilderAPI_MakeEdge" + "degenerate" │
│  - Graph traversal: class hierarchy, method calls           │
│                                                             │
│  Result: 47 potentially relevant chunks                     │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  Step 2: Filter (LLM-assisted)                              │
│                                                             │
│  Too many chunks for direct analysis.                       │
│  LLM reviews summaries, picks top 8 most relevant.          │
│                                                             │
│  Selected:                                                  │
│  - BRepBuilderAPI_MakeEdge constructors (3 chunks)          │
│  - Edge validation methods (2 chunks)                       │
│  - Degenerate curve handling (2 chunks)                     │
│  - Error reporting (1 chunk)                                │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  Step 3: Recursive Sub-Queries                              │
│                                                             │
│  For each chunk, focused LLM query:                         │
│                                                             │
│  llm_query(chunk_1, "How does this handle degenerate...")   │
│  llm_query(chunk_2, "What validation happens here...")      │
│  llm_query(chunk_3, "What errors are raised for...")        │
│  ...                                                        │
│                                                             │
│  8 parallel sub-queries → 8 focused findings                │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  Step 4: Synthesize                                         │
│                                                             │
│  LLM combines findings into coherent answer:                │
│                                                             │
│  "BRepBuilderAPI_MakeEdge handles degenerate curves by:     │
│   1. Checking curve bounds in the constructor...            │
│   2. Calling BRepCheck_Edge for validation...               │
│   3. Setting myError to BRepBuilderAPI_CurveTooSmall..."    │
└─────────────────────────────────────────────────────────────┘
```
### Handler Implementation

```python
import asyncio

@xmlify
@dataclass
class LibrarianQuery:
    """Query the librarian for information."""
    collection: str              # Which indexed collection
    question: str                # Natural language question
    max_chunks: int = 10         # Limit for recursive queries
    include_sources: bool = True

@xmlify
@dataclass
class LibrarianResponse:
    """Response from librarian with sources."""
    answer: str
    sources: list[SourceReference]
    confidence: float

async def handle_librarian_query(
    payload: LibrarianQuery,
    metadata: HandlerMetadata
) -> HandlerResponse:
    """RLM-style query handler."""

    # 1. Search for relevant chunks
    hits = await search_collection(
        payload.collection,
        payload.question,
        limit=50  # Cast a wide net
    )

    # 2. Filter if needed
    if len(hits) > payload.max_chunks:
        hits = await llm_filter_chunks(
            hits,
            payload.question,
            limit=payload.max_chunks
        )

    # 3. Recursive sub-queries
    findings = await asyncio.gather(*[
        llm_analyze_chunk(chunk, payload.question)
        for chunk in hits
    ])

    # 4. Synthesize answer
    answer = await llm_synthesize(findings, payload.question)

    # 5. Build response
    sources = [
        SourceReference(
            path=hit.source_path,
            lines=(hit.start_line, hit.end_line),
            relevance=hit.score
        )
        for hit in hits
    ]

    return HandlerResponse.respond(
        payload=LibrarianResponse(
            answer=answer,
            sources=sources if payload.include_sources else [],
            confidence=calculate_confidence(findings)
        )
    )
```
## Storage Layer

### eXist-db (Primary Store)

XML-native database for chunk storage and XQuery retrieval.

**Why eXist-db:**

- Native XQuery for complex queries
- Full-text search with Lucene
- XML validation against schemas
- Transactional updates

**Collections structure:**

```
/db/librarian/
├── collections/
│   ├── {user_id}/
│   │   ├── {collection_id}/
│   │   │   ├── metadata.xml
│   │   │   ├── chunks/
│   │   │   │   ├── chunk_001.xml
│   │   │   │   ├── chunk_002.xml
│   │   │   │   └── ...
│   │   │   └── index/
│   │   │       └── embeddings.bin
```

### Vector Embeddings

For semantic search, chunks are embedded using:

- OpenAI `text-embedding-3-small` (cloud)
- Sentence Transformers (local/self-hosted)

Embeddings are stored alongside chunks, or in a dedicated vector DB (Qdrant/Pinecone) at larger scales.

### Dependency Graph

For code collections, track relationships:

- **Class hierarchy**: inheritance, interfaces
- **Imports**: file dependencies
- **Call graph**: function → function references

Stored in eXist-db as XML, or in an external graph DB for complex traversals.
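
Many graph queries the Librarian needs ("what can this constructor reach within two calls?") are a bounded breadth-first search over an adjacency list, which eXist-db-stored XML can feed directly. A sketch with hypothetical function names:

```python
from collections import deque

def reachable(graph: dict[str, list[str]],
              start: str, max_depth: int = 2) -> set[str]:
    """Functions reachable from `start` within max_depth call edges.

    `graph` maps each function to the functions it calls.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # don't expand past the depth budget
        for callee in graph.get(node, []):
            if callee not in seen:
                seen.add(callee)
                queue.append((callee, depth + 1))
    return seen - {start}
```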

## Configuration

### organism.yaml

```yaml
listeners:
  - name: librarian
    handler: xml_pipeline.tools.librarian.handle_librarian_query
    payload_class: xml_pipeline.tools.librarian.LibrarianQuery
    description: Query indexed document collections
    agent: true
    peers: []  # Terminal handler
    config:
      exist_db:
        url: "http://localhost:8080/exist"
        user_env: EXIST_USER
        password_env: EXIST_PASSWORD
      embeddings:
        provider: openai  # or "local"
        model: text-embedding-3-small
      chunkers:
        code:
          max_chunk_size: 2000
          overlap: 200
        markdown:
          split_on_headings: true
          min_heading_level: 2
```

### Ingestion API

```python
# Ingest a git repository
await librarian.ingest(
    source=GitSource(
        url="https://github.com/Open-Cascade-SAS/OCCT",
        branch="master",
        filter="src/**/*.cxx"
    ),
    collection="opencascade",
    chunker_config=CodeChunkerConfig(
        language="cpp",
        max_chunk_size=2000
    )
)

# Query the collection
response = await librarian.query(
    collection="opencascade",
    question="How does BRepBuilderAPI_MakeEdge handle curves?"
)
```

## Scaling Considerations

| Scale | Storage | Search | Compute |
|-------|---------|--------|---------|
| Small (<10K chunks) | eXist-db local | In-DB Lucene | Single node |
| Medium (10K-1M) | eXist-db cluster | + Vector DB | Multi-worker |
| Large (1M+) | Sharded storage | Distributed search | GPU embeddings |

## Security

- **Collection isolation**: Users can only query their own collections
- **WASM sandbox**: Custom chunkers run in an isolated WASM runtime
- **Rate limiting**: Prevent abuse of recursive queries
- **Audit logging**: Track all queries for compliance
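
Rate limiting matters here because one question fans out into many LLM sub-queries. A token bucket is one standard way to cap that fan-out per user; a sketch (the class and its parameters are illustrative, not the shipped limiter):

```python
import time

class TokenBucket:
    """Per-user rate limiter: each sub-query costs one token.

    Tokens refill continuously at `rate` per second, up to `capacity`.
    The clock is injectable for testing.
    """
    def __init__(self, capacity: float, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A request whose `allow()` returns False would be rejected or queued before any LLM call is made.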

## Future Enhancements

1. **Incremental updates**: Re-index only changed files
2. **Cross-collection queries**: Search across multiple codebases
3. **Collaborative collections**: Shared team libraries
4. **Query caching**: Cache common sub-queries
5. **Streaming ingestion**: Real-time updates from git webhooks
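
For enhancement 1, the core mechanism is content hashing: store a digest per indexed file and re-chunk only what changed. A sketch (helper names are hypothetical):

```python
import hashlib

def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def files_to_reindex(previous: dict[str, str],
                     current: dict[str, bytes]) -> tuple[list[str], list[str]]:
    """Compare stored hashes against current file contents.

    `previous` maps path -> stored hash; `current` maps path -> bytes.
    Returns (changed_or_new, deleted); only the first list needs
    re-chunking and re-embedding.
    """
    changed = [path for path, data in current.items()
               if previous.get(path) != content_hash(data)]
    deleted = [path for path in previous if path not in current]
    return changed, deleted
```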

---

## References

- [Recursive Language Models (MIT)](docs/mit-paper.pdf) — Foundational research on the RLM pattern
- [tree-sitter](https://tree-sitter.github.io/) — AST-aware code parsing
- [eXist-db](http://exist-db.org/) — XML-native database
- [Core Principles](core-principles-v2.1.md) — Architecture overview