Add BloxServer API scaffold + architecture docs

BloxServer API (FastAPI + SQLAlchemy async):
- Database models: users, flows, triggers, executions, usage tracking
- Clerk JWT auth with dev mode bypass for local testing
- SQLite support for local dev, PostgreSQL for production
- CRUD routes for flows, triggers, executions
- Public webhook endpoint with token auth
- Health/readiness endpoints
- Pydantic schemas with camelCase aliases for frontend
- Docker + docker-compose setup
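
The camelCase aliases follow the usual snake_case-to-camelCase mapping between ORM field names and the JSON keys the frontend consumes. A minimal sketch of that conversion (in Pydantic this is typically wired up once via an `alias_generator` on the schema config; the field names below are illustrative):

```python
def to_camel(field: str) -> str:
    """Convert a snake_case field name to the camelCase alias the frontend expects."""
    head, *rest = field.split("_")
    return head + "".join(part.capitalize() for part in rest)

# Snake_case ORM columns map to camelCase JSON keys:
print(to_camel("created_at"))   # createdAt
print(to_camel("webhook_url"))  # webhookUrl
```

Responses then serialize with `model_dump(by_alias=True)`, as the exception handlers in `main.py` do.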

Architecture documentation:
- Librarian architecture with RLM-powered query engine
- Stripe billing integration (usage-based, trials, webhooks)
- LLM abstraction layer (rate limiting, semantic cache, failover)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Author: dullfig
Date: 2026-01-22 22:04:25 -08:00
Commit: a5c00c1e90 (parent d184d22c60)
23 changed files with 4681 additions and 0 deletions

.gitignore (vendored, +4 lines)

@@ -33,3 +33,7 @@ xml_pipeline/config/*.signed.xml
# OS
Thumbs.db
.DS_Store
# BloxServer local dev
bloxserver.db
bloxserver/.env

bloxserver/.env.example (new file, +54 lines)

@@ -0,0 +1,54 @@
# BloxServer API Environment Variables
# Copy this file to .env and fill in the values
# =============================================================================
# Environment
# =============================================================================
ENV=development
# ENV=production
# =============================================================================
# Database (PostgreSQL)
# =============================================================================
DATABASE_URL=postgresql+asyncpg://postgres:postgres@localhost:5432/bloxserver
# Set to true to auto-create tables on startup (disable in production)
AUTO_CREATE_TABLES=true
# =============================================================================
# Clerk Authentication
# =============================================================================
CLERK_ISSUER=https://your-clerk-instance.clerk.accounts.dev
CLERK_AUDIENCE=your-clerk-audience
# =============================================================================
# Stripe Billing
# =============================================================================
STRIPE_SECRET_KEY=sk_test_...
STRIPE_WEBHOOK_SECRET=whsec_...
# =============================================================================
# API Key Encryption
# =============================================================================
# Generate with: python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
API_KEY_ENCRYPTION_KEY=your-fernet-key-here
# =============================================================================
# CORS
# =============================================================================
CORS_ORIGINS=http://localhost:3000,https://app.openblox.ai
# =============================================================================
# Webhooks
# =============================================================================
WEBHOOK_BASE_URL=https://api.openblox.ai/webhooks
# =============================================================================
# Redis (optional, for caching/rate limiting)
# =============================================================================
# REDIS_URL=redis://localhost:6379
# =============================================================================
# Docs
# =============================================================================
ENABLE_DOCS=true

bloxserver/Dockerfile (new file, +58 lines)

@@ -0,0 +1,58 @@
# BloxServer API Dockerfile
# Multi-stage build for smaller production image
# =============================================================================
# Build stage
# =============================================================================
FROM python:3.12-slim AS builder
WORKDIR /app
# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
&& rm -rf /var/lib/apt/lists/*
# Copy requirements first for layer caching
COPY requirements.txt .
RUN pip wheel --no-cache-dir --wheel-dir /app/wheels -r requirements.txt
# =============================================================================
# Production stage
# =============================================================================
FROM python:3.12-slim AS production
WORKDIR /app
# Create non-root user
RUN groupadd --gid 1000 bloxserver \
&& useradd --uid 1000 --gid bloxserver --shell /bin/bash --create-home bloxserver
# Install runtime dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
curl \
&& rm -rf /var/lib/apt/lists/*
# Copy wheels from builder and install
COPY --from=builder /app/wheels /wheels
RUN pip install --no-cache-dir /wheels/* && rm -rf /wheels
# Copy application code
COPY --chown=bloxserver:bloxserver . /app/bloxserver
# Set Python path
ENV PYTHONPATH=/app
ENV PYTHONUNBUFFERED=1
# Switch to non-root user
USER bloxserver
# Health check
HEALTHCHECK --interval=30s --timeout=5s --start-period=5s --retries=3 \
CMD curl -f http://localhost:8000/health/live || exit 1
# Expose port
EXPOSE 8000
# Run with uvicorn
CMD ["uvicorn", "bloxserver.api.main:app", "--host", "0.0.0.0", "--port", "8000"]

bloxserver/README.md (new file, +203 lines)

@@ -0,0 +1,203 @@
# BloxServer API
Backend API for BloxServer (OpenBlox.ai), a visual AI agent workflow builder.
## Quick Start
### With Docker Compose (Recommended)
```bash
cd bloxserver
# Start PostgreSQL, Redis, and API
docker-compose up -d
# Check logs
docker-compose logs -f api
# API available at http://localhost:8000
# Docs at http://localhost:8000/docs
```
### Local Development
```bash
cd bloxserver
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # Linux/macOS
# .venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Copy environment variables
cp .env.example .env
# Edit .env with your settings
# Start PostgreSQL and Redis (or use Docker)
docker-compose up -d postgres redis
# Run the API
python -m bloxserver.api.main
# Or with uvicorn directly:
uvicorn bloxserver.api.main:app --reload
```
## API Endpoints
### Health
- `GET /health` - Basic health check
- `GET /health/ready` - Readiness check (includes DB)
- `GET /health/live` - Liveness check
### Flows
- `GET /api/v1/flows` - List flows
- `POST /api/v1/flows` - Create flow
- `GET /api/v1/flows/{id}` - Get flow
- `PATCH /api/v1/flows/{id}` - Update flow
- `DELETE /api/v1/flows/{id}` - Delete flow
- `POST /api/v1/flows/{id}/start` - Start flow
- `POST /api/v1/flows/{id}/stop` - Stop flow
### Triggers
- `GET /api/v1/flows/{flow_id}/triggers` - List triggers
- `POST /api/v1/flows/{flow_id}/triggers` - Create trigger
- `GET /api/v1/flows/{flow_id}/triggers/{id}` - Get trigger
- `DELETE /api/v1/flows/{flow_id}/triggers/{id}` - Delete trigger
- `POST /api/v1/flows/{flow_id}/triggers/{id}/regenerate-token` - Regenerate webhook token
### Executions
- `GET /api/v1/flows/{flow_id}/executions` - List executions
- `GET /api/v1/flows/{flow_id}/executions/{id}` - Get execution
- `POST /api/v1/flows/{flow_id}/executions/run` - Manual trigger
- `GET /api/v1/flows/{flow_id}/executions/stats` - Get stats
### Webhooks
- `POST /webhooks/{token}` - Trigger flow via webhook
- `GET /webhooks/{token}/test` - Test webhook token
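
Webhook authentication is purely token-based: the secret is embedded in the URL itself. A sketch of how such a token might be generated and checked (the helper names are illustrative; the actual logic lives in `routes/webhooks.py`, not shown in this listing):

```python
import secrets

def generate_webhook_token() -> str:
    """URL-safe random token; 43 chars, fits the 64-char webhook_token column."""
    return secrets.token_urlsafe(32)

def tokens_match(supplied: str, stored: str) -> bool:
    """Constant-time comparison to avoid leaking token prefixes via timing."""
    return secrets.compare_digest(supplied, stored)

token = generate_webhook_token()
# The trigger's public URL would then be: {WEBHOOK_BASE_URL}/{token}
print(len(token), tokens_match(token, token))  # 43 True
```

Regenerating a trigger's token (the `regenerate-token` route above) simply replaces the stored value, invalidating the old URL.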
## Project Structure
```
bloxserver/
├── api/
│ ├── __init__.py
│ ├── main.py # FastAPI app entry point
│ ├── dependencies.py # Auth, DB session dependencies
│ ├── schemas.py # Pydantic request/response models
│ ├── models/
│ │ ├── __init__.py
│ │ ├── database.py # SQLAlchemy engine/session
│ │ └── tables.py # ORM table definitions
│ └── routes/
│ ├── __init__.py
│ ├── flows.py # Flow CRUD
│ ├── triggers.py # Trigger CRUD
│ ├── executions.py # Execution history
│ ├── webhooks.py # Webhook handler
│ └── health.py # Health checks
├── requirements.txt
├── Dockerfile
├── docker-compose.yml
├── .env.example
└── README.md
```
## Authentication
Uses Clerk for JWT authentication. All `/api/v1/*` endpoints require a valid JWT.
```bash
curl -H "Authorization: Bearer <clerk-jwt>" \
http://localhost:8000/api/v1/flows
```
## Environment Variables
See `.env.example` for all configuration options.
Key variables:
- `DATABASE_URL` - PostgreSQL connection string
- `CLERK_ISSUER` - Clerk JWT issuer URL
- `STRIPE_SECRET_KEY` - Stripe API key
- `API_KEY_ENCRYPTION_KEY` - Fernet key for encrypting user API keys
## Database Migrations
Alembic will be used for schema migrations (not yet set up):
```bash
# Initialize (first time)
alembic init alembic
# Create migration
alembic revision --autogenerate -m "description"
# Apply migrations
alembic upgrade head
```
## Testing
```bash
# Install test dependencies
pip install pytest pytest-asyncio httpx
# Run tests
pytest tests/ -v
```
## Deployment
### Railway / Render / Fly.io
1. Connect your repo
2. Set environment variables
3. Deploy
### Kubernetes
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bloxserver-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: bloxserver-api
  template:
    metadata:
      labels:
        app: bloxserver-api
    spec:
      containers:
        - name: api
          image: your-registry/bloxserver-api:latest
          ports:
            - containerPort: 8000
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: bloxserver-secrets
                  key: database-url
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8000
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8000
```
## Next Steps
- [ ] Alembic migrations setup
- [ ] Stripe webhook handlers
- [ ] Redis rate limiting
- [ ] Container orchestration integration
- [ ] WebSocket for real-time logs

bloxserver/__init__.py (new file, +7 lines)

@@ -0,0 +1,7 @@
"""
BloxServer - Visual AI Agent Workflow Builder
SaaS backend for OpenBlox.ai
"""
__version__ = "0.1.0"


@@ -0,0 +1 @@
"""BloxServer API package."""


@@ -0,0 +1,236 @@
"""
FastAPI dependencies for authentication and database access.
Uses Clerk for JWT validation.
"""
from __future__ import annotations
import os
from typing import Annotated
from uuid import UUID
import httpx
from fastapi import Depends, HTTPException, Request, status
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from bloxserver.api.models.database import get_db
from bloxserver.api.models.tables import UserRecord
# Dev mode - skip auth for local testing
DEV_MODE = os.getenv("ENV", "development") == "development" and not os.getenv("CLERK_ISSUER")
# Clerk configuration
CLERK_ISSUER = os.getenv("CLERK_ISSUER", "")
CLERK_JWKS_URL = f"{CLERK_ISSUER}/.well-known/jwks.json" if CLERK_ISSUER else ""
# Security scheme
security = HTTPBearer(auto_error=False)
# =============================================================================
# JWT Validation (Clerk)
# =============================================================================
async def get_clerk_jwks() -> dict:
"""Fetch Clerk's JWKS for JWT validation."""
async with httpx.AsyncClient() as client:
response = await client.get(CLERK_JWKS_URL)
response.raise_for_status()
return response.json()
async def validate_clerk_token(token: str) -> dict:
"""
Validate a Clerk JWT token and return the payload.
In production, use a proper JWT library with caching.
This is a simplified version for the scaffold.
"""
import jwt
from jwt import PyJWKClient
try:
# Get signing key from Clerk's JWKS
jwks_client = PyJWKClient(CLERK_JWKS_URL)
signing_key = jwks_client.get_signing_key_from_jwt(token)
# Decode and validate
payload = jwt.decode(
token,
signing_key.key,
algorithms=["RS256"],
audience=os.getenv("CLERK_AUDIENCE"),
issuer=CLERK_ISSUER,
)
return payload
except jwt.ExpiredSignatureError:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Token has expired",
)
except jwt.InvalidTokenError as e:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail=f"Invalid token: {e}",
)
# =============================================================================
# Current User Dependency
# =============================================================================
class CurrentUser:
"""Authenticated user context."""
def __init__(self, user: UserRecord, clerk_payload: dict):
self.user = user
self.clerk_payload = clerk_payload
@property
def id(self) -> UUID:
return self.user.id
@property
def clerk_id(self) -> str:
return self.user.clerk_id
@property
def email(self) -> str:
return self.user.email
@property
def tier(self) -> str:
return self.user.tier.value
async def get_current_user(
request: Request,
credentials: Annotated[HTTPAuthorizationCredentials | None, Depends(security)],
db: Annotated[AsyncSession, Depends(get_db)],
) -> CurrentUser:
"""
Dependency that validates the JWT and returns the current user.
Creates the user record if this is their first request (synced from Clerk).
In DEV_MODE without Clerk configured, returns a test user.
"""
# Dev mode - create/return a test user without auth
if DEV_MODE:
dev_clerk_id = "dev_user_001"
result = await db.execute(
select(UserRecord).where(UserRecord.clerk_id == dev_clerk_id)
)
user = result.scalar_one_or_none()
if not user:
from bloxserver.api.models.tables import Tier
user = UserRecord(
clerk_id=dev_clerk_id,
email="dev@localhost",
name="Dev User",
tier=Tier.PRO, # Give dev user Pro access
)
db.add(user)
await db.flush()
return CurrentUser(user=user, clerk_payload={"sub": dev_clerk_id, "dev": True})
# Production mode - require Clerk auth
if not credentials:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Missing authentication token",
headers={"WWW-Authenticate": "Bearer"},
)
# Validate JWT
payload = await validate_clerk_token(credentials.credentials)
clerk_id = payload.get("sub")
if not clerk_id:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Invalid token: missing subject",
)
# Look up or create user
result = await db.execute(
select(UserRecord).where(UserRecord.clerk_id == clerk_id)
)
user = result.scalar_one_or_none()
if not user:
# First login - create user record from Clerk data
user = UserRecord(
clerk_id=clerk_id,
email=payload.get("email", f"{clerk_id}@unknown"),
name=payload.get("name"),
avatar_url=payload.get("image_url"),
)
db.add(user)
await db.flush() # Get the ID without committing
return CurrentUser(user=user, clerk_payload=payload)
# Type alias for cleaner route signatures
AuthenticatedUser = Annotated[CurrentUser, Depends(get_current_user)]
DbSession = Annotated[AsyncSession, Depends(get_db)]
# =============================================================================
# Optional Auth (for public endpoints)
# =============================================================================
async def get_optional_user(
request: Request,
credentials: Annotated[HTTPAuthorizationCredentials | None, Depends(security)],
db: Annotated[AsyncSession, Depends(get_db)],
) -> CurrentUser | None:
"""
Like get_current_user, but returns None instead of raising if not authenticated.
"""
if not credentials:
return None
try:
return await get_current_user(request, credentials, db)
except HTTPException:
return None
OptionalUser = Annotated[CurrentUser | None, Depends(get_optional_user)]
# =============================================================================
# Tier Checks
# =============================================================================
def require_tier(*allowed_tiers: str):
"""
Dependency factory that requires the user to be on one of the allowed tiers.
Usage:
@router.post("/wasm", dependencies=[Depends(require_tier("pro", "enterprise"))])
"""
async def check_tier(user: AuthenticatedUser) -> None:
if user.tier not in allowed_tiers:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail=f"This feature requires one of: {', '.join(allowed_tiers)}",
)
return check_tier
RequirePro = Depends(require_tier("pro", "enterprise", "high_frequency"))
RequireEnterprise = Depends(require_tier("enterprise", "high_frequency"))

bloxserver/api/main.py (new file, +166 lines)

@@ -0,0 +1,166 @@
"""
BloxServer API - FastAPI Application
Main entry point for the BloxServer backend API.
"""
from __future__ import annotations
import os
from contextlib import asynccontextmanager
from typing import AsyncGenerator
from fastapi import FastAPI, Request, status
from fastapi.exceptions import RequestValidationError
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse
from bloxserver.api.models.database import init_db
from bloxserver.api.routes import executions, flows, health, triggers, webhooks
from bloxserver.api.schemas import ApiError
@asynccontextmanager
async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
"""Application lifespan - startup and shutdown events."""
# Startup
print("Starting BloxServer API...")
# Initialize database tables
if os.getenv("AUTO_CREATE_TABLES", "true").lower() == "true":
await init_db()
print("Database tables initialized")
yield
# Shutdown
print("Shutting down BloxServer API...")
# Create FastAPI app
app = FastAPI(
title="BloxServer API",
description="Backend API for BloxServer - Visual AI Agent Workflow Builder",
version="0.1.0",
lifespan=lifespan,
docs_url="/docs" if os.getenv("ENABLE_DOCS", "true").lower() == "true" else None,
redoc_url="/redoc" if os.getenv("ENABLE_DOCS", "true").lower() == "true" else None,
)
# =============================================================================
# CORS Middleware
# =============================================================================
# Allowed origins (configure via environment)
CORS_ORIGINS = os.getenv(
"CORS_ORIGINS",
"http://localhost:3000,https://app.openblox.ai",
).split(",")
app.add_middleware(
CORSMiddleware,
allow_origins=CORS_ORIGINS,
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
# =============================================================================
# Exception Handlers
# =============================================================================
@app.exception_handler(RequestValidationError)
async def validation_exception_handler(
request: Request, exc: RequestValidationError
) -> JSONResponse:
"""Convert validation errors to standard API error format."""
errors = exc.errors()
details = {
".".join(str(loc) for loc in err["loc"]): err["msg"]
for err in errors
}
return JSONResponse(
status_code=status.HTTP_422_UNPROCESSABLE_ENTITY,
content=ApiError(
code="validation_error",
message="Request validation failed",
details=details,
).model_dump(by_alias=True),
)
@app.exception_handler(Exception)
async def general_exception_handler(
request: Request, exc: Exception
) -> JSONResponse:
"""Catch-all exception handler."""
# In production, don't expose internal errors
if os.getenv("ENV", "development") == "production":
return JSONResponse(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
content=ApiError(
code="internal_error",
message="An unexpected error occurred",
).model_dump(by_alias=True),
)
# In development, include error details
return JSONResponse(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
content=ApiError(
code="internal_error",
message=str(exc),
details={"type": type(exc).__name__},
).model_dump(by_alias=True),
)
# =============================================================================
# Routes
# =============================================================================
# Health checks (no auth)
app.include_router(health.router)
# Webhook endpoint (token-based auth)
app.include_router(webhooks.router)
# Protected API routes
app.include_router(flows.router, prefix="/api/v1")
app.include_router(triggers.router, prefix="/api/v1")
app.include_router(executions.router, prefix="/api/v1")
# =============================================================================
# Root endpoint
# =============================================================================
@app.get("/")
async def root() -> dict:
"""Root endpoint - API info."""
return {
"name": "BloxServer API",
"version": "0.1.0",
"docs": "/docs",
"health": "/health",
}
# =============================================================================
# Run with uvicorn
# =============================================================================
if __name__ == "__main__":
import uvicorn
uvicorn.run(
"bloxserver.api.main:app",
host=os.getenv("HOST", "0.0.0.0"),
port=int(os.getenv("PORT", "8000")),
reload=os.getenv("ENV", "development") == "development",
)


@@ -0,0 +1,23 @@
"""Database and Pydantic models."""
from bloxserver.api.models.database import Base, get_db, init_db
from bloxserver.api.models.tables import (
ExecutionRecord,
FlowRecord,
TriggerRecord,
UserApiKeyRecord,
UserRecord,
UsageRecord,
)
__all__ = [
"Base",
"get_db",
"init_db",
"UserRecord",
"FlowRecord",
"TriggerRecord",
"ExecutionRecord",
"UserApiKeyRecord",
"UsageRecord",
]


@@ -0,0 +1,84 @@
"""
Database connection and session management.
Uses SQLAlchemy async with PostgreSQL.
"""
from __future__ import annotations
import os
from collections.abc import AsyncGenerator
from contextlib import asynccontextmanager
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
from sqlalchemy.orm import DeclarativeBase
class Base(DeclarativeBase):
"""Base class for all ORM models."""
pass
# Database URL from environment
# Supports both PostgreSQL and SQLite (for local testing)
DATABASE_URL = os.getenv(
"DATABASE_URL",
"sqlite+aiosqlite:///./bloxserver.db", # SQLite default for easy local testing
)
# Create async engine with appropriate settings
_is_sqlite = DATABASE_URL.startswith("sqlite")
if _is_sqlite:
# SQLite doesn't support pool settings
engine = create_async_engine(
DATABASE_URL,
echo=os.getenv("SQL_ECHO", "false").lower() == "true",
connect_args={"check_same_thread": False},
)
else:
# PostgreSQL with connection pooling
engine = create_async_engine(
DATABASE_URL,
echo=os.getenv("SQL_ECHO", "false").lower() == "true",
pool_pre_ping=True,
pool_size=10,
max_overflow=20,
)
# Session factory
async_session_maker = async_sessionmaker(
engine,
class_=AsyncSession,
expire_on_commit=False,
)
async def init_db() -> None:
"""Create all tables. Call once at startup."""
async with engine.begin() as conn:
await conn.run_sync(Base.metadata.create_all)
async def get_db() -> AsyncGenerator[AsyncSession, None]:
"""Dependency for FastAPI routes. Yields a database session."""
async with async_session_maker() as session:
try:
yield session
await session.commit()
except Exception:
await session.rollback()
raise
@asynccontextmanager
async def get_db_context() -> AsyncGenerator[AsyncSession, None]:
"""Context manager for use outside of FastAPI routes."""
async with async_session_maker() as session:
try:
yield session
await session.commit()
except Exception:
await session.rollback()
raise


@@ -0,0 +1,381 @@
"""
SQLAlchemy ORM models for BloxServer.
These map to the Pydantic models in schemas.py and TypeScript types in types.ts.
"""
from __future__ import annotations
import enum
from datetime import datetime
from typing import Any
from uuid import uuid4
from sqlalchemy import (
JSON,
Boolean,
DateTime,
Enum,
ForeignKey,
Index,
Integer,
LargeBinary,
Numeric,
String,
Text,
func,
)
from sqlalchemy import Uuid as UUID  # portable UUID type (works on PostgreSQL and SQLite)
from sqlalchemy.orm import Mapped, mapped_column, relationship
from bloxserver.api.models.database import Base
# =============================================================================
# Enums
# =============================================================================
class Tier(str, enum.Enum):
"""User subscription tier."""
FREE = "free"
PRO = "pro"
ENTERPRISE = "enterprise"
HIGH_FREQUENCY = "high_frequency"
class BillingStatus(str, enum.Enum):
"""Subscription billing status."""
ACTIVE = "active"
TRIALING = "trialing"
PAST_DUE = "past_due"
CANCELED = "canceled"
CANCELING = "canceling"
class FlowStatus(str, enum.Enum):
"""Flow runtime status."""
STOPPED = "stopped"
STARTING = "starting"
RUNNING = "running"
STOPPING = "stopping"
ERROR = "error"
class TriggerType(str, enum.Enum):
"""How a flow can be triggered."""
WEBHOOK = "webhook"
SCHEDULE = "schedule"
MANUAL = "manual"
class ExecutionStatus(str, enum.Enum):
"""Status of a flow execution."""
RUNNING = "running"
SUCCESS = "success"
ERROR = "error"
TIMEOUT = "timeout"
# =============================================================================
# Users (synced from Clerk)
# =============================================================================
class UserRecord(Base):
"""User account, synced from Clerk."""
__tablename__ = "users"
id: Mapped[UUID] = mapped_column(
UUID(as_uuid=True), primary_key=True, default=uuid4
)
clerk_id: Mapped[str] = mapped_column(String(255), unique=True, nullable=False)
email: Mapped[str] = mapped_column(String(255), nullable=False)
name: Mapped[str | None] = mapped_column(String(255))
avatar_url: Mapped[str | None] = mapped_column(Text)
# Stripe integration
stripe_customer_id: Mapped[str | None] = mapped_column(String(255), unique=True)
stripe_subscription_id: Mapped[str | None] = mapped_column(String(255))
stripe_subscription_item_id: Mapped[str | None] = mapped_column(String(255))
# Billing state (cached from Stripe)
tier: Mapped[Tier] = mapped_column(Enum(Tier), default=Tier.FREE)
billing_status: Mapped[BillingStatus] = mapped_column(
Enum(BillingStatus), default=BillingStatus.ACTIVE
)
trial_ends_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
current_period_start: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
current_period_end: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
# Timestamps
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), server_default=func.now()
)
updated_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), server_default=func.now(), onupdate=func.now()
)
# Relationships
flows: Mapped[list[FlowRecord]] = relationship(back_populates="user", cascade="all, delete-orphan")
api_keys: Mapped[list[UserApiKeyRecord]] = relationship(back_populates="user", cascade="all, delete-orphan")
usage_records: Mapped[list[UsageRecord]] = relationship(back_populates="user", cascade="all, delete-orphan")
__table_args__ = (
Index("idx_users_clerk_id", "clerk_id"),
Index("idx_users_stripe_customer", "stripe_customer_id"),
)
# =============================================================================
# Flows
# =============================================================================
class FlowRecord(Base):
"""A user's workflow/flow."""
__tablename__ = "flows"
id: Mapped[UUID] = mapped_column(
UUID(as_uuid=True), primary_key=True, default=uuid4
)
user_id: Mapped[UUID] = mapped_column(
UUID(as_uuid=True), ForeignKey("users.id", ondelete="CASCADE"), nullable=False
)
name: Mapped[str] = mapped_column(String(100), nullable=False)
description: Mapped[str | None] = mapped_column(String(500))
# The actual workflow definition
organism_yaml: Mapped[str] = mapped_column(Text, nullable=False, default="")
# React Flow canvas state (JSON)
canvas_state: Mapped[dict[str, Any] | None] = mapped_column(JSON)
# Runtime state
status: Mapped[FlowStatus] = mapped_column(Enum(FlowStatus), default=FlowStatus.STOPPED)
container_id: Mapped[str | None] = mapped_column(String(255))
error_message: Mapped[str | None] = mapped_column(Text)
# Timestamps
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), server_default=func.now()
)
updated_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), server_default=func.now(), onupdate=func.now()
)
# Relationships
user: Mapped[UserRecord] = relationship(back_populates="flows")
triggers: Mapped[list[TriggerRecord]] = relationship(back_populates="flow", cascade="all, delete-orphan")
executions: Mapped[list[ExecutionRecord]] = relationship(back_populates="flow", cascade="all, delete-orphan")
__table_args__ = (
Index("idx_flows_user_id", "user_id"),
Index("idx_flows_status", "status"),
)
# =============================================================================
# Triggers
# =============================================================================
class TriggerRecord(Base):
"""A trigger that can start a flow."""
__tablename__ = "triggers"
id: Mapped[UUID] = mapped_column(
UUID(as_uuid=True), primary_key=True, default=uuid4
)
flow_id: Mapped[UUID] = mapped_column(
UUID(as_uuid=True), ForeignKey("flows.id", ondelete="CASCADE"), nullable=False
)
type: Mapped[TriggerType] = mapped_column(Enum(TriggerType), nullable=False)
name: Mapped[str] = mapped_column(String(100), nullable=False)
# Trigger configuration (JSON)
config: Mapped[dict[str, Any]] = mapped_column(JSON, nullable=False, default=dict)
# Webhook-specific fields
webhook_token: Mapped[str | None] = mapped_column(String(64), unique=True)
webhook_url: Mapped[str | None] = mapped_column(Text)
# Timestamps
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), server_default=func.now()
)
# Relationships
flow: Mapped[FlowRecord] = relationship(back_populates="triggers")
executions: Mapped[list[ExecutionRecord]] = relationship(back_populates="trigger")
__table_args__ = (
Index("idx_triggers_flow_id", "flow_id"),
Index("idx_triggers_webhook_token", "webhook_token"),
)
# =============================================================================
# Executions
# =============================================================================
class ExecutionRecord(Base):
"""A single execution/run of a flow."""
__tablename__ = "executions"
id: Mapped[UUID] = mapped_column(
UUID(as_uuid=True), primary_key=True, default=uuid4
)
flow_id: Mapped[UUID] = mapped_column(
UUID(as_uuid=True), ForeignKey("flows.id", ondelete="CASCADE"), nullable=False
)
trigger_id: Mapped[UUID | None] = mapped_column(
UUID(as_uuid=True), ForeignKey("triggers.id", ondelete="SET NULL")
)
trigger_type: Mapped[TriggerType] = mapped_column(Enum(TriggerType), nullable=False)
# Execution state
status: Mapped[ExecutionStatus] = mapped_column(
Enum(ExecutionStatus), default=ExecutionStatus.RUNNING
)
error_message: Mapped[str | None] = mapped_column(Text)
# Payloads (JSON strings for flexibility)
input_payload: Mapped[str | None] = mapped_column(Text)
output_payload: Mapped[str | None] = mapped_column(Text)
# Timing
started_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), server_default=func.now()
)
completed_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
duration_ms: Mapped[int | None] = mapped_column(Integer)
# Relationships
flow: Mapped[FlowRecord] = relationship(back_populates="executions")
trigger: Mapped[TriggerRecord | None] = relationship(back_populates="executions")
__table_args__ = (
Index("idx_executions_flow_id", "flow_id"),
Index("idx_executions_started_at", "started_at"),
Index("idx_executions_status", "status"),
)
# =============================================================================
# User API Keys (BYOK)
# =============================================================================
class UserApiKeyRecord(Base):
"""User's own API keys for BYOK (Bring Your Own Key)."""
__tablename__ = "user_api_keys"
id: Mapped[UUID] = mapped_column(
UUID(as_uuid=True), primary_key=True, default=uuid4
)
user_id: Mapped[UUID] = mapped_column(
UUID(as_uuid=True), ForeignKey("users.id", ondelete="CASCADE"), nullable=False
)
provider: Mapped[str] = mapped_column(String(50), nullable=False)
# Encrypted API key
encrypted_key: Mapped[bytes] = mapped_column(LargeBinary, nullable=False)
key_hint: Mapped[str | None] = mapped_column(String(20)) # Last few chars for display
# Validation state
is_valid: Mapped[bool] = mapped_column(Boolean, default=True)
last_error: Mapped[str | None] = mapped_column(String(255))
last_used_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
# Timestamps
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), server_default=func.now()
)
# Relationships
user: Mapped[UserRecord] = relationship(back_populates="api_keys")
__table_args__ = (
Index("idx_user_api_keys_user_provider", "user_id", "provider", unique=True),
)
# =============================================================================
# Usage Tracking
# =============================================================================
class UsageRecord(Base):
"""Usage tracking for billing."""
__tablename__ = "usage_records"
id: Mapped[UUID] = mapped_column(
UUID(as_uuid=True), primary_key=True, default=uuid4
)
user_id: Mapped[UUID] = mapped_column(
UUID(as_uuid=True), ForeignKey("users.id", ondelete="CASCADE"), nullable=False
)
period_start: Mapped[datetime] = mapped_column(
DateTime(timezone=True), nullable=False
)
# Metrics
workflow_runs: Mapped[int] = mapped_column(Integer, default=0)
llm_tokens_in: Mapped[int] = mapped_column(Integer, default=0)
llm_tokens_out: Mapped[int] = mapped_column(Integer, default=0)
    # asdecimal=False so the driver returns float, matching the Mapped[float] annotation
    wasm_cpu_seconds: Mapped[float] = mapped_column(Numeric(10, 2, asdecimal=False), default=0)
    storage_gb_hours: Mapped[float] = mapped_column(Numeric(10, 2, asdecimal=False), default=0)
# Stripe sync state
last_synced_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
last_synced_runs: Mapped[int] = mapped_column(Integer, default=0)
# Timestamps
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), server_default=func.now()
)
updated_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), server_default=func.now(), onupdate=func.now()
)
# Relationships
user: Mapped[UserRecord] = relationship(back_populates="usage_records")
__table_args__ = (
Index("idx_usage_user_period", "user_id", "period_start", unique=True),
)
# =============================================================================
# Stripe Events (Idempotency)
# =============================================================================
class StripeEventRecord(Base):
"""Processed Stripe webhook events for idempotency."""
__tablename__ = "stripe_events"
event_id: Mapped[str] = mapped_column(String(255), primary_key=True)
event_type: Mapped[str] = mapped_column(String(100), nullable=False)
processed_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), server_default=func.now()
)
payload: Mapped[dict[str, Any] | None] = mapped_column(JSON)
__table_args__ = (
Index("idx_stripe_events_processed", "processed_at"),
)

View file

@ -0,0 +1 @@
"""API route modules."""

View file

@ -0,0 +1,204 @@
"""
Execution history and manual trigger endpoints.
Executions are immutable records of flow runs.
"""
from __future__ import annotations
from uuid import UUID
from fastapi import APIRouter, HTTPException, status
from sqlalchemy import func, select
from bloxserver.api.dependencies import AuthenticatedUser, DbSession
from bloxserver.api.models.tables import (
ExecutionRecord,
ExecutionStatus,
FlowRecord,
TriggerType,
)
from bloxserver.api.schemas import Execution, ExecutionSummary, PaginatedResponse
router = APIRouter(prefix="/flows/{flow_id}/executions", tags=["executions"])
@router.get("", response_model=PaginatedResponse[ExecutionSummary])
async def list_executions(
flow_id: UUID,
user: AuthenticatedUser,
db: DbSession,
page: int = 1,
page_size: int = 50,
status_filter: ExecutionStatus | None = None,
) -> PaginatedResponse[ExecutionSummary]:
"""List execution history for a flow."""
# Verify flow ownership
flow_query = select(FlowRecord).where(
FlowRecord.id == flow_id,
FlowRecord.user_id == user.id,
)
flow = (await db.execute(flow_query)).scalar_one_or_none()
if not flow:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Flow not found",
)
offset = (page - 1) * page_size
# Build query
base_query = select(ExecutionRecord).where(ExecutionRecord.flow_id == flow_id)
if status_filter:
base_query = base_query.where(ExecutionRecord.status == status_filter)
# Get total count
count_query = select(func.count()).select_from(base_query.subquery())
total = (await db.execute(count_query)).scalar() or 0
# Get page
query = base_query.order_by(ExecutionRecord.started_at.desc()).offset(offset).limit(page_size)
result = await db.execute(query)
executions = result.scalars().all()
return PaginatedResponse(
items=[ExecutionSummary.model_validate(e) for e in executions],
total=total,
page=page,
page_size=page_size,
has_more=offset + len(executions) < total,
)
# =============================================================================
# Stats endpoint
# =============================================================================
@router.get("/stats", response_model=dict)
async def get_execution_stats(
    flow_id: UUID,
    user: AuthenticatedUser,
    db: DbSession,
) -> dict:
    """
    Get execution statistics for a flow.

    NOTE: this route must be registered before the dynamic "/{execution_id}"
    route below; otherwise GET .../executions/stats would be captured by that
    route and rejected as an invalid UUID.
    """
    # Verify flow ownership
    flow_query = select(FlowRecord).where(
        FlowRecord.id == flow_id,
        FlowRecord.user_id == user.id,
    )
    flow = (await db.execute(flow_query)).scalar_one_or_none()
    if not flow:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail="Flow not found",
        )
    # Calculate stats
    stats_query = select(
        func.count().label("total"),
        func.count().filter(ExecutionRecord.status == ExecutionStatus.SUCCESS).label("success"),
        func.count().filter(ExecutionRecord.status == ExecutionStatus.ERROR).label("error"),
        func.avg(ExecutionRecord.duration_ms).label("avg_duration_ms"),
        func.max(ExecutionRecord.started_at).label("last_executed_at"),
    ).where(ExecutionRecord.flow_id == flow_id)
    result = await db.execute(stats_query)
    row = result.one()
    return {
        "flowId": str(flow_id),
        "executionsTotal": row.total or 0,
        "executionsSuccess": row.success or 0,
        "executionsError": row.error or 0,
        "avgDurationMs": float(row.avg_duration_ms) if row.avg_duration_ms else 0,
        "lastExecutedAt": row.last_executed_at.isoformat() if row.last_executed_at else None,
    }


@router.get("/{execution_id}", response_model=Execution)
async def get_execution(
    flow_id: UUID,
    execution_id: UUID,
    user: AuthenticatedUser,
    db: DbSession,
) -> Execution:
    """Get details of a single execution."""
    # Verify flow ownership
    flow_query = select(FlowRecord).where(
        FlowRecord.id == flow_id,
        FlowRecord.user_id == user.id,
    )
    flow = (await db.execute(flow_query)).scalar_one_or_none()
    if not flow:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail="Flow not found",
        )
    # Get execution
    query = select(ExecutionRecord).where(
        ExecutionRecord.id == execution_id,
        ExecutionRecord.flow_id == flow_id,
    )
    result = await db.execute(query)
    execution = result.scalar_one_or_none()
    if not execution:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail="Execution not found",
        )
    return Execution.model_validate(execution)


@router.post("/run", response_model=Execution, status_code=status.HTTP_201_CREATED)
async def run_flow_manually(
    flow_id: UUID,
    user: AuthenticatedUser,
    db: DbSession,
    input_payload: str | None = None,
) -> Execution:
    """
    Manually trigger a flow execution.
    The flow must be in 'running' state.
    """
    # Verify flow ownership
    flow_query = select(FlowRecord).where(
        FlowRecord.id == flow_id,
        FlowRecord.user_id == user.id,
    )
    flow = (await db.execute(flow_query)).scalar_one_or_none()
    if not flow:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail="Flow not found",
        )
    if flow.status != "running":
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=f"Flow must be running to execute (current: {flow.status})",
        )
    # Create execution record
    execution = ExecutionRecord(
        flow_id=flow_id,
        trigger_type=TriggerType.MANUAL,
        status=ExecutionStatus.RUNNING,
        input_payload=input_payload,
    )
    db.add(execution)
    await db.flush()
    # TODO: Actually dispatch to the running container
    # For now, just return the execution record
    return Execution.model_validate(execution)

View file

@ -0,0 +1,269 @@
"""
Flow CRUD endpoints.
Flows are the core entity - a user's workflow definition.
"""
from __future__ import annotations
from uuid import UUID
from fastapi import APIRouter, HTTPException, status
from sqlalchemy import func, select
from bloxserver.api.dependencies import AuthenticatedUser, DbSession
from bloxserver.api.models.tables import FlowRecord, Tier
from bloxserver.api.schemas import (
CreateFlowRequest,
Flow,
FlowSummary,
PaginatedResponse,
UpdateFlowRequest,
)
router = APIRouter(prefix="/flows", tags=["flows"])
# Default organism.yaml template for new flows
DEFAULT_ORGANISM_YAML = """organism:
name: my-flow
listeners:
- name: greeter
payload_class: handlers.hello.Greeting
handler: handlers.hello.handle_greeting
description: A friendly greeter agent
agent: true
peers: []
"""
# Tier limits
TIER_FLOW_LIMITS = {
Tier.FREE: 1,
Tier.PRO: 100, # Effectively unlimited for most users
Tier.ENTERPRISE: 1000,
Tier.HIGH_FREQUENCY: 1000,
}
@router.get("", response_model=PaginatedResponse[FlowSummary])
async def list_flows(
user: AuthenticatedUser,
db: DbSession,
page: int = 1,
page_size: int = 20,
) -> PaginatedResponse[FlowSummary]:
"""List all flows for the current user."""
offset = (page - 1) * page_size
# Get total count
count_query = select(func.count()).select_from(FlowRecord).where(
FlowRecord.user_id == user.id
)
total = (await db.execute(count_query)).scalar() or 0
# Get page of flows
query = (
select(FlowRecord)
.where(FlowRecord.user_id == user.id)
.order_by(FlowRecord.updated_at.desc())
.offset(offset)
.limit(page_size)
)
result = await db.execute(query)
flows = result.scalars().all()
return PaginatedResponse(
items=[FlowSummary.model_validate(f) for f in flows],
total=total,
page=page,
page_size=page_size,
has_more=offset + len(flows) < total,
)
@router.post("", response_model=Flow, status_code=status.HTTP_201_CREATED)
async def create_flow(
user: AuthenticatedUser,
db: DbSession,
request: CreateFlowRequest,
) -> Flow:
"""Create a new flow."""
# Check tier limits
count_query = select(func.count()).select_from(FlowRecord).where(
FlowRecord.user_id == user.id
)
current_count = (await db.execute(count_query)).scalar() or 0
limit = TIER_FLOW_LIMITS.get(user.user.tier, 1)
if current_count >= limit:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail=f"Flow limit reached ({limit}). Upgrade to create more flows.",
)
# Create flow
flow = FlowRecord(
user_id=user.id,
name=request.name,
description=request.description,
organism_yaml=request.organism_yaml or DEFAULT_ORGANISM_YAML,
)
db.add(flow)
await db.flush()
return Flow.model_validate(flow)
@router.get("/{flow_id}", response_model=Flow)
async def get_flow(
flow_id: UUID,
user: AuthenticatedUser,
db: DbSession,
) -> Flow:
"""Get a single flow by ID."""
query = select(FlowRecord).where(
FlowRecord.id == flow_id,
FlowRecord.user_id == user.id,
)
result = await db.execute(query)
flow = result.scalar_one_or_none()
if not flow:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Flow not found",
)
return Flow.model_validate(flow)
@router.patch("/{flow_id}", response_model=Flow)
async def update_flow(
flow_id: UUID,
user: AuthenticatedUser,
db: DbSession,
request: UpdateFlowRequest,
) -> Flow:
"""Update a flow."""
query = select(FlowRecord).where(
FlowRecord.id == flow_id,
FlowRecord.user_id == user.id,
)
result = await db.execute(query)
flow = result.scalar_one_or_none()
if not flow:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Flow not found",
)
# Update fields that were provided
if request.name is not None:
flow.name = request.name
if request.description is not None:
flow.description = request.description
if request.organism_yaml is not None:
flow.organism_yaml = request.organism_yaml
if request.canvas_state is not None:
flow.canvas_state = request.canvas_state.model_dump()
await db.flush()
return Flow.model_validate(flow)
@router.delete("/{flow_id}", status_code=status.HTTP_204_NO_CONTENT)
async def delete_flow(
flow_id: UUID,
user: AuthenticatedUser,
db: DbSession,
) -> None:
"""Delete a flow."""
query = select(FlowRecord).where(
FlowRecord.id == flow_id,
FlowRecord.user_id == user.id,
)
result = await db.execute(query)
flow = result.scalar_one_or_none()
if not flow:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Flow not found",
)
await db.delete(flow)
# =============================================================================
# Flow Actions (Start/Stop)
# =============================================================================
@router.post("/{flow_id}/start", response_model=Flow)
async def start_flow(
flow_id: UUID,
user: AuthenticatedUser,
db: DbSession,
) -> Flow:
"""Start a flow (deploy container)."""
query = select(FlowRecord).where(
FlowRecord.id == flow_id,
FlowRecord.user_id == user.id,
)
result = await db.execute(query)
flow = result.scalar_one_or_none()
if not flow:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Flow not found",
)
if flow.status not in ("stopped", "error"):
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail=f"Cannot start flow in {flow.status} state",
)
# TODO: Actually start the container
# This is where we'd call the container orchestration layer
# For now, just update the status
flow.status = "starting"
flow.error_message = None
await db.flush()
return Flow.model_validate(flow)
@router.post("/{flow_id}/stop", response_model=Flow)
async def stop_flow(
flow_id: UUID,
user: AuthenticatedUser,
db: DbSession,
) -> Flow:
"""Stop a running flow."""
query = select(FlowRecord).where(
FlowRecord.id == flow_id,
FlowRecord.user_id == user.id,
)
result = await db.execute(query)
flow = result.scalar_one_or_none()
if not flow:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Flow not found",
)
if flow.status not in ("running", "starting", "error"):
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail=f"Cannot stop flow in {flow.status} state",
)
# TODO: Actually stop the container
flow.status = "stopping"
await db.flush()
return Flow.model_validate(flow)

View file

@ -0,0 +1,77 @@
"""
Health check and status endpoints.
"""
from __future__ import annotations
from datetime import datetime, timezone
from fastapi import APIRouter
from sqlalchemy import text
from bloxserver.api.models.database import async_session_maker
router = APIRouter(tags=["health"])


def _utc_now() -> str:
    """ISO-8601 UTC timestamp (datetime.utcnow() is deprecated since Python 3.12)."""
    return datetime.now(timezone.utc).isoformat()


@router.get("/health")
async def health_check() -> dict:
    """
    Basic health check.
    Returns 200 if the service is running.
    """
    return {
        "status": "healthy",
        "timestamp": _utc_now(),
        "service": "bloxserver-api",
    }


@router.get("/health/ready")
async def readiness_check() -> dict:
    """
    Readiness check - verifies database connectivity.
    Used by Kubernetes/load balancers to determine if the service
    is ready to receive traffic.
    """
    errors = []
    # Check database
    try:
        async with async_session_maker() as session:
            await session.execute(text("SELECT 1"))
    except Exception as e:
        errors.append(f"database: {e}")
    # TODO: Check Redis
    # TODO: Check other dependencies
    if errors:
        return {
            "status": "unhealthy",
            "timestamp": _utc_now(),
            "errors": errors,
        }
    return {
        "status": "ready",
        "timestamp": _utc_now(),
        "checks": {
            "database": "ok",
        },
    }


@router.get("/health/live")
async def liveness_check() -> dict:
    """
    Liveness check - just confirms the process is running.
    If this fails, Kubernetes should restart the pod.
    """
    return {
        "status": "alive",
        "timestamp": _utc_now(),
    }

View file

@ -0,0 +1,221 @@
"""
Trigger CRUD endpoints.
Triggers define how flows are started: webhook, schedule, or manual.
"""
from __future__ import annotations
import os
import secrets
from uuid import UUID
from fastapi import APIRouter, HTTPException, status
from sqlalchemy import select
from bloxserver.api.dependencies import AuthenticatedUser, DbSession
from bloxserver.api.models.tables import FlowRecord, TriggerRecord, TriggerType
from bloxserver.api.schemas import CreateTriggerRequest, Trigger
router = APIRouter(prefix="/flows/{flow_id}/triggers", tags=["triggers"])
# Base URL for webhooks (configured via environment)
WEBHOOK_BASE_URL = os.getenv("WEBHOOK_BASE_URL", "https://api.openblox.ai/webhooks")
def generate_webhook_token() -> str:
"""Generate a secure random token for webhook URLs."""
return secrets.token_urlsafe(32)
@router.get("", response_model=list[Trigger])
async def list_triggers(
flow_id: UUID,
user: AuthenticatedUser,
db: DbSession,
) -> list[Trigger]:
"""List all triggers for a flow."""
# Verify flow ownership
flow_query = select(FlowRecord).where(
FlowRecord.id == flow_id,
FlowRecord.user_id == user.id,
)
flow = (await db.execute(flow_query)).scalar_one_or_none()
if not flow:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Flow not found",
)
# Get triggers
query = select(TriggerRecord).where(TriggerRecord.flow_id == flow_id)
result = await db.execute(query)
triggers = result.scalars().all()
return [Trigger.model_validate(t) for t in triggers]
@router.post("", response_model=Trigger, status_code=status.HTTP_201_CREATED)
async def create_trigger(
flow_id: UUID,
user: AuthenticatedUser,
db: DbSession,
request: CreateTriggerRequest,
) -> Trigger:
"""Create a new trigger for a flow."""
# Verify flow ownership
flow_query = select(FlowRecord).where(
FlowRecord.id == flow_id,
FlowRecord.user_id == user.id,
)
flow = (await db.execute(flow_query)).scalar_one_or_none()
if not flow:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Flow not found",
)
# Create trigger
trigger = TriggerRecord(
flow_id=flow_id,
type=TriggerType(request.type.value),
name=request.name,
config=request.config,
)
# Generate webhook URL for webhook triggers
if request.type == TriggerType.WEBHOOK:
trigger.webhook_token = generate_webhook_token()
trigger.webhook_url = f"{WEBHOOK_BASE_URL}/{trigger.webhook_token}"
db.add(trigger)
await db.flush()
return Trigger.model_validate(trigger)
@router.get("/{trigger_id}", response_model=Trigger)
async def get_trigger(
flow_id: UUID,
trigger_id: UUID,
user: AuthenticatedUser,
db: DbSession,
) -> Trigger:
"""Get a single trigger by ID."""
# Verify flow ownership
flow_query = select(FlowRecord).where(
FlowRecord.id == flow_id,
FlowRecord.user_id == user.id,
)
flow = (await db.execute(flow_query)).scalar_one_or_none()
if not flow:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Flow not found",
)
# Get trigger
query = select(TriggerRecord).where(
TriggerRecord.id == trigger_id,
TriggerRecord.flow_id == flow_id,
)
result = await db.execute(query)
trigger = result.scalar_one_or_none()
if not trigger:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Trigger not found",
)
return Trigger.model_validate(trigger)
@router.delete("/{trigger_id}", status_code=status.HTTP_204_NO_CONTENT)
async def delete_trigger(
flow_id: UUID,
trigger_id: UUID,
user: AuthenticatedUser,
db: DbSession,
) -> None:
"""Delete a trigger."""
# Verify flow ownership
flow_query = select(FlowRecord).where(
FlowRecord.id == flow_id,
FlowRecord.user_id == user.id,
)
flow = (await db.execute(flow_query)).scalar_one_or_none()
if not flow:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Flow not found",
)
# Get and delete trigger
query = select(TriggerRecord).where(
TriggerRecord.id == trigger_id,
TriggerRecord.flow_id == flow_id,
)
result = await db.execute(query)
trigger = result.scalar_one_or_none()
if not trigger:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Trigger not found",
)
await db.delete(trigger)
@router.post("/{trigger_id}/regenerate-token", response_model=Trigger)
async def regenerate_webhook_token(
flow_id: UUID,
trigger_id: UUID,
user: AuthenticatedUser,
db: DbSession,
) -> Trigger:
"""Regenerate the webhook token for a webhook trigger."""
# Verify flow ownership
flow_query = select(FlowRecord).where(
FlowRecord.id == flow_id,
FlowRecord.user_id == user.id,
)
flow = (await db.execute(flow_query)).scalar_one_or_none()
if not flow:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Flow not found",
)
# Get trigger
query = select(TriggerRecord).where(
TriggerRecord.id == trigger_id,
TriggerRecord.flow_id == flow_id,
)
result = await db.execute(query)
trigger = result.scalar_one_or_none()
if not trigger:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Trigger not found",
)
if trigger.type != TriggerType.WEBHOOK:
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Can only regenerate token for webhook triggers",
)
# Regenerate
trigger.webhook_token = generate_webhook_token()
trigger.webhook_url = f"{WEBHOOK_BASE_URL}/{trigger.webhook_token}"
await db.flush()
return Trigger.model_validate(trigger)

View file

@ -0,0 +1,125 @@
"""
Webhook trigger endpoint.
This handles incoming webhook requests that trigger flows.
"""
from __future__ import annotations
from fastapi import APIRouter, HTTPException, Request, status
from sqlalchemy import select
from bloxserver.api.models.database import get_db_context
from bloxserver.api.models.tables import (
ExecutionRecord,
ExecutionStatus,
FlowRecord,
TriggerRecord,
TriggerType,
)
router = APIRouter(prefix="/webhooks", tags=["webhooks"])
@router.post("/{webhook_token}")
async def handle_webhook(
webhook_token: str,
request: Request,
) -> dict:
"""
Handle incoming webhook request.
This endpoint is public (no auth) - the token IS the authentication.
"""
async with get_db_context() as db:
# Look up trigger by token
query = select(TriggerRecord).where(
TriggerRecord.webhook_token == webhook_token,
TriggerRecord.type == TriggerType.WEBHOOK,
)
result = await db.execute(query)
trigger = result.scalar_one_or_none()
if not trigger:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Webhook not found",
)
# Get the flow
flow_query = select(FlowRecord).where(FlowRecord.id == trigger.flow_id)
flow = (await db.execute(flow_query)).scalar_one_or_none()
if not flow:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Flow not found",
)
if flow.status != "running":
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail=f"Flow is not running (status: {flow.status})",
)
# Get request body
try:
body = await request.body()
input_payload = body.decode("utf-8") if body else None
except Exception:
input_payload = None
# Create execution record
execution = ExecutionRecord(
flow_id=flow.id,
trigger_id=trigger.id,
trigger_type=TriggerType.WEBHOOK,
status=ExecutionStatus.RUNNING,
input_payload=input_payload,
)
db.add(execution)
await db.commit()
# TODO: Actually dispatch to the running container
# This would send the payload to the flow's container
return {
"status": "accepted",
"executionId": str(execution.id),
"message": "Webhook received and execution started",
}
@router.get("/{webhook_token}/test")
async def test_webhook(webhook_token: str) -> dict:
"""
Test that a webhook token is valid.
Returns info about the trigger without actually executing.
"""
async with get_db_context() as db:
query = select(TriggerRecord).where(
TriggerRecord.webhook_token == webhook_token,
TriggerRecord.type == TriggerType.WEBHOOK,
)
result = await db.execute(query)
trigger = result.scalar_one_or_none()
if not trigger:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Webhook not found",
)
# Get the flow
flow_query = select(FlowRecord).where(FlowRecord.id == trigger.flow_id)
flow = (await db.execute(flow_query)).scalar_one_or_none()
        return {
            "valid": True,
            "triggerName": trigger.name,
            "flowName": flow.name if flow else None,
            # flow.status is handled as a plain string elsewhere in this API
            # (flows are assigned "starting"/"stopping" literals), so no .value here
            "flowStatus": flow.status if flow else None,
        }

322
bloxserver/api/schemas.py Normal file
View file

@ -0,0 +1,322 @@
"""
Pydantic schemas for API request/response validation.
These match the TypeScript types in types.ts for frontend compatibility.
Uses camelCase aliases for JSON serialization.
"""
from __future__ import annotations
from datetime import datetime
from enum import Enum
from typing import Any, Generic, Literal, TypeVar
from uuid import UUID
from pydantic import BaseModel, ConfigDict, Field
# =============================================================================
# Config for camelCase serialization
# =============================================================================
def to_camel(string: str) -> str:
"""Convert snake_case to camelCase."""
components = string.split("_")
return components[0] + "".join(x.title() for x in components[1:])
class CamelModel(BaseModel):
"""Base model with camelCase JSON serialization."""
model_config = ConfigDict(
alias_generator=to_camel,
populate_by_name=True,
from_attributes=True,
)
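# Illustration (not part of the API surface): with the alias generator above,
# snake_case fields round-trip through camelCase JSON keys, and
# populate_by_name lets callers use either form:
#
#     class _Page(CamelModel):
#         page_size: int
#
#     _Page(page_size=5).model_dump(by_alias=True)     # -> {"pageSize": 5}
#     _Page.model_validate({"pageSize": 5}).page_size  # -> 5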
# =============================================================================
# Common Types
# =============================================================================
T = TypeVar("T")
class PaginatedResponse(CamelModel, Generic[T]):
"""Paginated list response."""
items: list[T]
total: int
page: int
page_size: int
has_more: bool
class ApiError(CamelModel):
"""API error response."""
code: str
message: str
details: dict[str, Any] | None = None
# =============================================================================
# Enums
# =============================================================================
class Tier(str, Enum):
FREE = "free"
PRO = "pro"
ENTERPRISE = "enterprise"
HIGH_FREQUENCY = "high_frequency"
class FlowStatus(str, Enum):
STOPPED = "stopped"
STARTING = "starting"
RUNNING = "running"
STOPPING = "stopping"
ERROR = "error"
class TriggerType(str, Enum):
WEBHOOK = "webhook"
SCHEDULE = "schedule"
MANUAL = "manual"
class ExecutionStatus(str, Enum):
RUNNING = "running"
SUCCESS = "success"
ERROR = "error"
TIMEOUT = "timeout"
# =============================================================================
# User
# =============================================================================
class User(CamelModel):
"""User account (synced from Clerk)."""
id: UUID
clerk_id: str
email: str
name: str | None = None
avatar_url: str | None = None
tier: Tier = Tier.FREE
created_at: datetime
# =============================================================================
# Canvas State (React Flow)
# =============================================================================
class CanvasNode(CamelModel):
"""A node in the React Flow canvas."""
id: str
type: str
position: dict[str, float]
data: dict[str, Any]
class CanvasEdge(CamelModel):
"""An edge connecting nodes in the canvas."""
id: str
source: str
target: str
source_handle: str | None = None
target_handle: str | None = None
class CanvasState(CamelModel):
"""React Flow canvas state."""
nodes: list[CanvasNode]
edges: list[CanvasEdge]
viewport: dict[str, float]
# =============================================================================
# Flows
# =============================================================================
class Flow(CamelModel):
"""A user's workflow/flow."""
id: UUID
user_id: UUID
name: str
description: str | None = None
organism_yaml: str
canvas_state: CanvasState | None = None
status: FlowStatus = FlowStatus.STOPPED
container_id: str | None = None
error_message: str | None = None
created_at: datetime
updated_at: datetime
class FlowSummary(CamelModel):
"""Abbreviated flow for list views."""
id: UUID
name: str
description: str | None = None
status: FlowStatus
updated_at: datetime
class CreateFlowRequest(CamelModel):
"""Request to create a new flow."""
name: str = Field(min_length=1, max_length=100)
description: str | None = Field(default=None, max_length=500)
organism_yaml: str | None = None
class UpdateFlowRequest(CamelModel):
"""Request to update a flow."""
name: str | None = Field(default=None, min_length=1, max_length=100)
description: str | None = Field(default=None, max_length=500)
organism_yaml: str | None = None
canvas_state: CanvasState | None = None
# =============================================================================
# Triggers
# =============================================================================
class WebhookTriggerConfig(CamelModel):
"""Config for webhook triggers."""
type: Literal["webhook"] = "webhook"
class ScheduleTriggerConfig(CamelModel):
"""Config for scheduled triggers."""
type: Literal["schedule"] = "schedule"
cron: str = Field(description="Cron expression")
timezone: str = "UTC"
class ManualTriggerConfig(CamelModel):
"""Config for manual triggers."""
type: Literal["manual"] = "manual"
TriggerConfig = WebhookTriggerConfig | ScheduleTriggerConfig | ManualTriggerConfig
class Trigger(CamelModel):
"""A trigger that can start a flow."""
id: UUID
flow_id: UUID
type: TriggerType
name: str
config: dict[str, Any]
webhook_token: str | None = None
webhook_url: str | None = None
created_at: datetime
class CreateTriggerRequest(CamelModel):
"""Request to create a trigger."""
type: TriggerType
name: str = Field(min_length=1, max_length=100)
config: dict[str, Any]
# =============================================================================
# Executions
# =============================================================================
class Execution(CamelModel):
"""A single execution/run of a flow."""
id: UUID
flow_id: UUID
trigger_id: UUID | None = None
trigger_type: TriggerType
status: ExecutionStatus
started_at: datetime
completed_at: datetime | None = None
duration_ms: int | None = None
error_message: str | None = None
input_payload: str | None = None
output_payload: str | None = None
class ExecutionSummary(CamelModel):
"""Abbreviated execution for list views."""
id: UUID
status: ExecutionStatus
trigger_type: TriggerType
started_at: datetime
duration_ms: int | None = None
# =============================================================================
# Usage & Stats
# =============================================================================
class UsageDashboard(CamelModel):
"""Current usage for user dashboard."""
period_start: datetime
period_end: datetime | None
runs_used: int
runs_limit: int
runs_percentage: float
tokens_used: int
estimated_overage: float
days_remaining: int
class FlowStats(CamelModel):
"""Statistics for a single flow."""
flow_id: UUID
executions_total: int
executions_success: int
executions_error: int
avg_duration_ms: float
last_executed_at: datetime | None = None
# =============================================================================
# API Keys (BYOK)
# =============================================================================
class ApiKeyInfo(CamelModel):
"""Info about a stored API key (never exposes the key itself)."""
provider: str
key_hint: str | None # Last few chars: "...abc123"
is_valid: bool
last_used_at: datetime | None
created_at: datetime
class AddApiKeyRequest(CamelModel):
"""Request to add a user's API key."""
provider: str = Field(description="Provider name: openai, anthropic, xai")
api_key: str = Field(min_length=10, description="The API key")

View file

@ -0,0 +1,72 @@
# BloxServer Development Docker Compose
# Run with: docker-compose up -d
version: '3.8'
services:
# ==========================================================================
# PostgreSQL Database
# ==========================================================================
postgres:
image: postgres:16-alpine
container_name: bloxserver-postgres
environment:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
POSTGRES_DB: bloxserver
ports:
- "5432:5432"
volumes:
- postgres_data:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 10s
timeout: 5s
retries: 5
# ==========================================================================
# Redis (for caching, rate limiting, queues)
# ==========================================================================
redis:
image: redis:7-alpine
container_name: bloxserver-redis
ports:
- "6379:6379"
volumes:
- redis_data:/data
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 10s
timeout: 5s
retries: 5
# ==========================================================================
# BloxServer API
# ==========================================================================
api:
build:
context: .
dockerfile: Dockerfile
container_name: bloxserver-api
ports:
- "8000:8000"
environment:
- ENV=development
- DATABASE_URL=postgresql+asyncpg://postgres:postgres@postgres:5432/bloxserver
- REDIS_URL=redis://redis:6379
- AUTO_CREATE_TABLES=true
- ENABLE_DOCS=true
- CORS_ORIGINS=http://localhost:3000
depends_on:
postgres:
condition: service_healthy
redis:
condition: service_healthy
volumes:
# Mount source for hot reload in development
- .:/app/bloxserver:ro
command: uvicorn bloxserver.api.main:app --host 0.0.0.0 --port 8000 --reload
volumes:
postgres_data:
redis_data:

View file

@ -0,0 +1,31 @@
# BloxServer API Dependencies
# Web framework
fastapi>=0.109.0
uvicorn[standard]>=0.27.0
# Database
sqlalchemy[asyncio]>=2.0.0
asyncpg>=0.29.0
alembic>=1.13.0
# Authentication (Clerk JWT validation)
pyjwt[crypto]>=2.8.0
httpx>=0.27.0
# Validation & serialization
pydantic>=2.5.0
pydantic-settings>=2.1.0
# Utilities
python-dotenv>=1.0.0
pyhumps>=3.8.0  # camelCase conversion (imported as `humps`)
# Stripe billing
stripe>=8.0.0
# Redis (for caching/rate limiting)
redis>=5.0.0
# Cryptography (for API key encryption)
cryptography>=42.0.0

668
docs/bloxserver-billing.md Normal file
View file

@ -0,0 +1,668 @@
# BloxServer Billing Integration — Stripe
**Status:** Design
**Date:** January 2026
## Overview
BloxServer uses Stripe for subscription management, usage-based billing, and payment processing. This document specifies the integration architecture, webhook handlers, and usage tracking system.
## Pricing Tiers
| Tier | Price | Runs/Month | Features |
|------|-------|------------|----------|
| **Free** | $0 | 1,000 | 1 workflow, built-in tools, community support |
| **Pro** | $29 | 100,000 | Unlimited workflows, marketplace, WASM, project memory, priority support |
| **Enterprise** | Custom | Unlimited | SSO/SAML, SLA, dedicated support, private marketplace |
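Several code samples later in this document call `get_tier_runs_limit` without defining it; a minimal sketch of what it could look like under these allowances (the tier strings match the `users.tier` column, and treating Enterprise as effectively unlimited via a large sentinel is an assumption):

```python
def get_tier_runs_limit(tier: str) -> int:
    """Included workflow runs per month for a tier."""
    return {
        "free": 1_000,
        "pro": 100_000,
        "enterprise": 10**12,  # Sentinel: effectively unlimited
    }.get(tier, 1_000)  # Unknown tiers fall back to Free limits
```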
### Overage Pricing (Pro)
| Metric | Included | Overage Rate |
|--------|----------|--------------|
| Workflow runs | 100K/mo | $0.50 per 1K |
| Storage | 10 GB | $0.10 per GB |
| WASM execution | 1000 CPU-sec | $0.01 per CPU-sec |
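The usage dashboard code further down references a `calculate_overage_cost` helper that is never shown; against the runs rate above it might be sketched as follows (runs metric only; storage and WASM would get analogous helpers):

```python
def calculate_overage_cost(runs: int, tier_limit: int, rate_per_1k: float = 0.50) -> float:
    """Estimated overage charge in dollars for runs beyond the included limit."""
    overage = max(0, runs - tier_limit)
    # $0.50 per 1,000 runs, prorated per run
    return round(overage * rate_per_1k / 1000, 2)
```

For example, a Pro user at 112,000 runs would see an estimated $6.00 of overage.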
## Stripe Product Structure
```
Products:
├── bloxserver_free
│ └── price_free_monthly ($0/month, metered runs)
├── bloxserver_pro
│ ├── price_pro_monthly ($29/month base)
│ ├── price_pro_runs_overage (metered, $0.50/1K)
│ └── price_pro_storage_overage (metered, $0.10/GB)
└── bloxserver_enterprise
└── price_enterprise_custom (quoted per customer)
```
### Stripe Configuration
```python
# One-time setup (or via Stripe Dashboard)
# Free tier product
free_product = stripe.Product.create(
name="BloxServer Free",
description="Build AI agent swarms, visually",
)
free_price = stripe.Price.create(
product=free_product.id,
unit_amount=0,
currency="usd",
recurring={"interval": "month"},
metadata={"tier": "free", "runs_included": "1000"}
)
# Pro tier product
pro_product = stripe.Product.create(
name="BloxServer Pro",
description="Unlimited workflows, marketplace access, custom WASM",
)
pro_base_price = stripe.Price.create(
product=pro_product.id,
unit_amount=2900, # $29.00
currency="usd",
recurring={"interval": "month"},
metadata={"tier": "pro", "runs_included": "100000"}
)
pro_runs_overage = stripe.Price.create(
product=pro_product.id,
currency="usd",
recurring={
"interval": "month",
"usage_type": "metered",
"aggregate_usage": "sum",
},
unit_amount_decimal="0.05", # $0.0005 per run = $0.50 per 1K
metadata={"type": "runs_overage"}
)
```
## Database Schema
```sql
-- Users table (synced from Clerk + Stripe)
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
clerk_id VARCHAR(255) UNIQUE NOT NULL,
email VARCHAR(255) NOT NULL,
name VARCHAR(255),
-- Stripe fields
stripe_customer_id VARCHAR(255) UNIQUE,
stripe_subscription_id VARCHAR(255),
stripe_subscription_item_id VARCHAR(255), -- For usage reporting
-- Billing state (cached from Stripe)
tier VARCHAR(50) DEFAULT 'free', -- free, pro, enterprise
billing_status VARCHAR(50) DEFAULT 'active', -- active, past_due, canceled
trial_ends_at TIMESTAMPTZ,
current_period_start TIMESTAMPTZ,
current_period_end TIMESTAMPTZ,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- Usage tracking (local, for dashboard + Stripe sync)
CREATE TABLE usage_records (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID REFERENCES users(id),
period_start DATE NOT NULL, -- Billing period start
-- Metrics
workflow_runs INT DEFAULT 0,
llm_tokens_in INT DEFAULT 0,
llm_tokens_out INT DEFAULT 0,
wasm_cpu_seconds DECIMAL(10,2) DEFAULT 0,
storage_gb_hours DECIMAL(10,2) DEFAULT 0,
-- Stripe sync state
last_synced_at TIMESTAMPTZ,
last_synced_runs INT DEFAULT 0,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(user_id, period_start)
);
-- Stripe webhook events (idempotency)
CREATE TABLE stripe_events (
event_id VARCHAR(255) PRIMARY KEY,
event_type VARCHAR(100) NOT NULL,
processed_at TIMESTAMPTZ DEFAULT NOW(),
payload JSONB
);
-- Index for cleanup
CREATE INDEX idx_stripe_events_processed ON stripe_events(processed_at);
```
## Usage Tracking
### Real-Time Counting (Redis)
```python
# On every workflow execution
async def record_workflow_run(user_id: str):
"""Increment run counter in Redis."""
key = f"usage:{user_id}:runs:{get_current_period()}"
await redis.incr(key)
await redis.expire(key, 86400 * 35) # 35 days TTL
# Track users with usage for batch sync
await redis.sadd("users:with_usage", user_id)
async def record_llm_tokens(user_id: str, tokens_in: int, tokens_out: int):
"""Track LLM token usage."""
period = get_current_period()
await redis.incrby(f"usage:{user_id}:tokens_in:{period}", tokens_in)
await redis.incrby(f"usage:{user_id}:tokens_out:{period}", tokens_out)
```
### Periodic Sync to Stripe (Hourly)
```python
async def sync_usage_to_stripe():
"""Hourly job: push usage increments to Stripe."""
user_ids = await redis.smembers("users:with_usage")
for user_id in user_ids:
user = await get_user(user_id)
if not user.stripe_subscription_item_id:
continue # Free tier without Stripe subscription
# Get usage since last sync
period = get_current_period()
runs_key = f"usage:{user_id}:runs:{period}"
current_runs = int(await redis.get(runs_key) or 0)
last_synced = await get_last_synced_runs(user_id, period)
delta = current_runs - last_synced
if delta <= 0:
continue
# Check if over included limit
tier_limit = get_tier_runs_limit(user.tier) # 1000 or 100000
if current_runs <= tier_limit:
# Still within included runs, just track locally
await update_last_synced(user_id, period, current_runs)
continue
# Calculate overage to report
overage_start = max(last_synced, tier_limit)
overage_runs = current_runs - overage_start
if overage_runs > 0:
# Report to Stripe
await stripe.SubscriptionItem.create_usage_record(
user.stripe_subscription_item_id,
quantity=overage_runs,
timestamp=int(time.time()),
action='increment'
)
await update_last_synced(user_id, period, current_runs)
# Clear the tracking set (will rebuild next hour)
await redis.delete("users:with_usage")
```
### Dashboard Query
```python
async def get_usage_dashboard(user_id: str) -> UsageDashboard:
"""Get current usage for user dashboard."""
user = await get_user(user_id)
period = get_current_period()
# Get real-time counts from Redis
runs = int(await redis.get(f"usage:{user_id}:runs:{period}") or 0)
tokens_in = int(await redis.get(f"usage:{user_id}:tokens_in:{period}") or 0)
tokens_out = int(await redis.get(f"usage:{user_id}:tokens_out:{period}") or 0)
tier_limit = get_tier_runs_limit(user.tier)
return UsageDashboard(
period_start=period,
period_end=user.current_period_end,
runs_used=runs,
runs_limit=tier_limit,
runs_percentage=min(100, (runs / tier_limit) * 100),
tokens_used=tokens_in + tokens_out,
estimated_overage=calculate_overage_cost(runs, tier_limit),
days_remaining=(user.current_period_end - datetime.now()).days,
)
```
## Subscription Lifecycle
### Signup Flow
```
User clicks "Start Free Trial"
┌───────────────────────────────────────────────────────────┐
│ 1. Create Stripe Customer │
│ │
│ customer = stripe.Customer.create( │
│ email=user.email, │
│ metadata={"clerk_id": user.clerk_id} │
│ ) │
└───────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────┐
│ 2. Create Checkout Session (hosted payment page) │
│ │
│ session = stripe.checkout.Session.create( │
│ customer=customer.id, │
│ mode='subscription', │
│ line_items=[{ │
│ 'price': 'price_pro_monthly', │
│ 'quantity': 1 │
│ }, { │
│ 'price': 'price_pro_runs_overage', # metered │
│ }], │
│ subscription_data={ │
│ 'trial_period_days': 14, │
│ }, │
│ success_url='https://app.openblox.ai/welcome', │
│ cancel_url='https://app.openblox.ai/pricing', │
│ ) │
│ │
│ → Redirect user to session.url │
└───────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────┐
│ 3. User enters payment details on Stripe Checkout │
│ │
│ Card validated but NOT charged (trial) │
└───────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────┐
│ 4. Webhook: checkout.session.completed │
│ │
│ → Update user with stripe_customer_id │
│ → Update user with stripe_subscription_id │
│ → Set tier = 'pro' │
│ → Set trial_ends_at │
└───────────────────────────────────────────────────────────┘
```
### Trial End
```
Day 11 of 14-day trial
┌───────────────────────────────────────────────────────────┐
│ Scheduled job: Trial ending soon emails │
│ │
│ SELECT * FROM users │
│ WHERE trial_ends_at BETWEEN NOW() AND NOW() + INTERVAL '3d'│
│ AND billing_status = 'trialing' │
│ │
│ → Send "Your trial ends in 3 days" email │
└───────────────────────────────────────────────────────────┘
Day 14: Trial ends
┌───────────────────────────────────────────────────────────┐
│ Stripe automatically: │
│ 1. Charges the card on file │
│ 2. Sends invoice.payment_succeeded webhook │
│ │
│ Our webhook handler: │
│ → Update billing_status = 'active' │
│ → Send "Welcome to Pro!" email │
└───────────────────────────────────────────────────────────┘
```
### Cancellation
```python
# User clicks "Cancel subscription" in Customer Portal
# Stripe sends webhook
@webhook("customer.subscription.updated")
async def handle_subscription_updated(event):
subscription = event.data.object
user = await get_user_by_stripe_subscription(subscription.id)
if subscription.cancel_at_period_end:
# User requested cancellation (takes effect at period end)
await send_email(user, "subscription_canceled", {
"effective_date": subscription.current_period_end
})
await db.execute("""
UPDATE users
SET billing_status = 'canceling',
updated_at = NOW()
WHERE id = $1
""", user.id)
@webhook("customer.subscription.deleted")
async def handle_subscription_deleted(event):
subscription = event.data.object
user = await get_user_by_stripe_subscription(subscription.id)
# Subscription actually ended
await db.execute("""
UPDATE users
SET tier = 'free',
billing_status = 'canceled',
stripe_subscription_id = NULL,
stripe_subscription_item_id = NULL,
updated_at = NOW()
WHERE id = $1
""", user.id)
await send_email(user, "downgraded_to_free")
```
## Webhook Handlers
### Endpoint Setup
```python
from fastapi import FastAPI, Request, HTTPException
import stripe
app = FastAPI()
@app.post("/webhooks/stripe")
async def stripe_webhook(request: Request):
payload = await request.body()
sig_header = request.headers.get("stripe-signature")
try:
event = stripe.Webhook.construct_event(
payload, sig_header, settings.STRIPE_WEBHOOK_SECRET
)
except ValueError:
raise HTTPException(400, "Invalid payload")
except stripe.error.SignatureVerificationError:
raise HTTPException(400, "Invalid signature")
# Idempotency check
if await is_event_processed(event.id):
return {"status": "already_processed"}
# Route to handler
handler = WEBHOOK_HANDLERS.get(event.type)
if handler:
await handler(event)
else:
logger.info(f"Unhandled webhook: {event.type}")
# Mark processed
await mark_event_processed(event)
return {"status": "success"}
```
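The endpoint relies on `is_event_processed` / `mark_event_processed`, which are not shown; a sketch against the `stripe_events` table from the schema (here taking an explicit asyncpg-style `db` handle, where the handlers above presumably close over a shared pool):

```python
import json

async def is_event_processed(db, event_id: str) -> bool:
    """True if this Stripe event was already handled."""
    row = await db.fetchrow(
        "SELECT 1 FROM stripe_events WHERE event_id = $1", event_id
    )
    return row is not None

async def mark_event_processed(db, event) -> None:
    """Record the event; ON CONFLICT makes webhook replays harmless."""
    await db.execute(
        """
        INSERT INTO stripe_events (event_id, event_type, payload)
        VALUES ($1, $2, $3)
        ON CONFLICT (event_id) DO NOTHING
        """,
        event.id, event.type, json.dumps(event.data),
    )
```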
### Handler Registry
```python
WEBHOOK_HANDLERS = {
# Checkout
"checkout.session.completed": handle_checkout_completed,
# Subscriptions
"customer.subscription.created": handle_subscription_created,
"customer.subscription.updated": handle_subscription_updated,
"customer.subscription.deleted": handle_subscription_deleted,
"customer.subscription.trial_will_end": handle_trial_ending,
# Payments
"invoice.payment_succeeded": handle_payment_succeeded,
"invoice.payment_failed": handle_payment_failed,
"invoice.upcoming": handle_invoice_upcoming,
# Customer
"customer.updated": handle_customer_updated,
}
```
### Key Handlers
```python
@webhook("checkout.session.completed")
async def handle_checkout_completed(event):
"""User completed checkout - provision their account."""
session = event.data.object
# Get or create user
user = await get_user_by_clerk_id(session.client_reference_id)
# Update with Stripe IDs
subscription = await stripe.Subscription.retrieve(session.subscription)
await db.execute("""
UPDATE users SET
stripe_customer_id = $1,
stripe_subscription_id = $2,
stripe_subscription_item_id = $3,
tier = $4,
billing_status = $5,
trial_ends_at = $6,
current_period_start = $7,
current_period_end = $8,
updated_at = NOW()
WHERE id = $9
""",
session.customer,
subscription.id,
next(i.id for i in subscription["items"].data if i.price.recurring.usage_type == "metered"),  # The metered item, for usage reporting
'pro',
subscription.status, # 'trialing' or 'active'
datetime.fromtimestamp(subscription.trial_end) if subscription.trial_end else None,
datetime.fromtimestamp(subscription.current_period_start),
datetime.fromtimestamp(subscription.current_period_end),
user.id
)
@webhook("invoice.payment_failed")
async def handle_payment_failed(event):
"""Payment failed - notify user, potentially downgrade."""
invoice = event.data.object
user = await get_user_by_stripe_customer(invoice.customer)
attempt_count = invoice.attempt_count
if attempt_count == 1:
# First failure - soft warning
await send_email(user, "payment_failed_soft", {
"amount": invoice.amount_due / 100,
"update_url": await get_customer_portal_url(user)
})
elif attempt_count == 2:
# Second failure - stronger warning
await send_email(user, "payment_failed_warning", {
"amount": invoice.amount_due / 100,
"days_until_downgrade": 3
})
else:
# Final failure - downgrade
await db.execute("""
UPDATE users SET
tier = 'free',
billing_status = 'past_due',
updated_at = NOW()
WHERE id = $1
""", user.id)
await send_email(user, "downgraded_payment_failed")
@webhook("customer.subscription.trial_will_end")
async def handle_trial_ending(event):
"""Trial ending in 3 days - Stripe sends this automatically."""
subscription = event.data.object
user = await get_user_by_stripe_subscription(subscription.id)
await send_email(user, "trial_ending", {
"trial_end_date": datetime.fromtimestamp(subscription.trial_end),
"amount": 29.00, # Pro price
"manage_url": await get_customer_portal_url(user)
})
```
## Customer Portal
Stripe's hosted portal for self-service billing management.
```python
async def get_customer_portal_url(user: User) -> str:
"""Generate a portal session URL for the user."""
session = await stripe.billing_portal.Session.create(
customer=user.stripe_customer_id,
return_url="https://app.openblox.ai/settings/billing"
)
return session.url
```
**Portal capabilities:**
- Update payment method
- View invoices and receipts
- Cancel subscription
- Upgrade/downgrade plan (if configured)
## Email Templates
| Trigger | Template | Content |
|---------|----------|---------|
| Trial started | `trial_started` | Welcome, trial ends on X |
| Trial ending (3 days) | `trial_ending` | Your trial ends soon, card will be charged |
| Trial converted | `trial_converted` | Welcome to Pro! |
| Payment succeeded | `payment_succeeded` | Receipt attached |
| Payment failed (1st) | `payment_failed_soft` | Please update your card |
| Payment failed (2nd) | `payment_failed_warning` | Service will be interrupted |
| Payment failed (final) | `downgraded_payment_failed` | You've been downgraded |
| Subscription canceled | `subscription_canceled` | Access until period end |
| Downgraded | `downgraded_to_free` | You're now on Free |
## Rate Limiting & Abuse Prevention
### Soft Limits (Warning)
```python
async def check_usage_limits(user_id: str) -> UsageLimitResult:
"""Check if user is approaching limits."""
usage = await get_current_usage(user_id)
user = await get_user(user_id)
tier_limit = get_tier_runs_limit(user.tier)
percentage = (usage.runs / tier_limit) * 100
if percentage >= 100:
return UsageLimitResult(
allowed=True, # Still allow, but warn
warning="You've exceeded your included runs. Overage charges apply.",
overage_rate="$0.50 per 1,000 runs"
)
elif percentage >= 80:
return UsageLimitResult(
allowed=True,
warning=f"You've used {percentage:.0f}% of your monthly runs."
)
return UsageLimitResult(allowed=True)
```
### Hard Limits (Free Tier)
```python
async def enforce_free_tier_limits(user_id: str) -> bool:
"""Free tier has hard limits - no overage allowed."""
user = await get_user(user_id)
if user.tier != "free":
return True # Paid tiers have soft limits
usage = await get_current_usage(user_id)
if usage.runs >= 1000:
raise UsageLimitExceeded(
"You've reached the Free tier limit of 1,000 runs/month. "
"Upgrade to Pro for unlimited workflows."
)
return True
```
## Testing
### Test Mode
Stripe provides test mode with test API keys and test card numbers.
```python
# .env
STRIPE_SECRET_KEY=sk_test_... # Test mode
STRIPE_WEBHOOK_SECRET=whsec_...
# Test cards
# 4242424242424242 - Succeeds
# 4000000000000002 - Declined
# 4000002500003155 - Requires 3D Secure
```
### Webhook Testing
```bash
# Use Stripe CLI to forward webhooks locally
stripe listen --forward-to localhost:8000/webhooks/stripe
# Trigger test events
stripe trigger invoice.payment_succeeded
stripe trigger customer.subscription.trial_will_end
```
## Monitoring & Alerts
| Metric | Alert Threshold |
|--------|-----------------|
| Webhook processing time | > 5 seconds |
| Webhook failure rate | > 1% |
| Payment failure rate | > 5% |
| Usage sync lag | > 2 hours |
| Stripe API errors | Any 5xx |
## Security Checklist
- [ ] Webhook signature verification
- [ ] Idempotent event processing
- [ ] API keys in environment variables (never in code)
- [ ] Customer portal for sensitive operations (not custom UI)
- [ ] PCI compliance via Stripe Checkout (no card data touches our servers)
- [ ] Audit log for billing events
---
## References
- [Stripe Billing](https://stripe.com/docs/billing)
- [Stripe Webhooks](https://stripe.com/docs/webhooks)
- [Stripe Checkout](https://stripe.com/docs/payments/checkout)
- [Stripe Customer Portal](https://stripe.com/docs/billing/subscriptions/customer-portal)
- [Metered Billing](https://stripe.com/docs/billing/subscriptions/metered-billing)

View file

@ -0,0 +1,961 @@
# BloxServer LLM Abstraction Layer — Resilient Multi-Provider Architecture
**Status:** Design
**Date:** January 2026
## Overview
The LLM abstraction layer is the critical path for all AI operations in BloxServer. It must handle:
- **Viral growth**: 100 → 10,000 users overnight
- **Provider outages**: Single provider down ≠ platform down
- **Fair access**: Paid users prioritized, free users served fairly
- **Cost control**: Platform keys vs BYOK (Bring Your Own Key)
- **Low latency**: Sub-second for simple calls, reasonable for complex
This document specifies the defense-in-depth architecture that survives success.
## Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ LLM Abstraction Layer │
│ │
│ Request → [Rate Limit] → [Cache Check] → [Queue] → [Dispatch] │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ Per-user Semantic Priority Provider │
│ per-tier cache queues pool + │
│ limits (30%+ hits) (by tier) failover │
│ │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ BYOK (Bring Your Own Key) ││
│ │ Pro+ users with own API keys bypass platform limits ││
│ └─────────────────────────────────────────────────────────────┘│
│ │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ High Frequency Tier ││
│ │ Dedicated capacity, custom SLA — contact sales ││
│ └─────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────┘
```
## Tier Limits
| Tier | Price | Requests/min | Tokens/min | Concurrent | Latency SLA |
|------|-------|--------------|------------|------------|-------------|
| **Free** | $0 | 10 | 10,000 | 2 | Best effort |
| **Pro** | $29/mo | 60 | 100,000 | 10 | < 30s P95 |
| **Enterprise** | Custom | 300 | 500,000 | 50 | < 10s P95 |
| **High Frequency** | Custom | Custom | Custom | Dedicated | Custom SLA |
| **BYOK** (any tier) | — | Unlimited* | Unlimited* | 20 | User's provider |
*BYOK users are limited only by their own provider's rate limits.
### High Frequency Tier
For users requiring:
- **Low latency**: Sub-second response times
- **High throughput**: Thousands of requests per minute
- **Guaranteed capacity**: Dedicated provider allocations
- **Custom models**: Fine-tuned or private deployments
**Use cases:**
- Real-time trading signals
- Live customer support at scale
- High-volume content generation
- Latency-sensitive applications
**Pricing:** Custom — based on capacity reservation, SLA requirements, and volume.
**Landing page CTA:**
```
┌─────────────────────────────────────────────────────────────┐
│ │
│ Need High Frequency? │
│ │
│ Building something that needs thousands of requests per │
│ minute with sub-second latency? Let's talk dedicated │
│ capacity and custom SLAs. │
│ │
│ [Contact Sales →] │
│ │
└─────────────────────────────────────────────────────────────┘
```
## Layer 1: Intake Rate Limiting
First line of defense. Rejects requests before they consume resources.
### Implementation
```python
from dataclasses import dataclass
from enum import Enum
import time
class Tier(Enum):
FREE = "free"
PRO = "pro"
ENTERPRISE = "enterprise"
HIGH_FREQUENCY = "high_frequency"
@dataclass
class TierLimits:
requests_per_minute: int
tokens_per_minute: int
max_concurrent: int
TIER_LIMITS = {
Tier.FREE: TierLimits(10, 10_000, 2),
Tier.PRO: TierLimits(60, 100_000, 10),
Tier.ENTERPRISE: TierLimits(300, 500_000, 50),
Tier.HIGH_FREQUENCY: TierLimits(10_000, 10_000_000, 500), # Custom per customer
}
@dataclass
class RateLimitResult:
allowed: bool
use_user_key: bool = False
retry_after: int | None = None
reason: str | None = None
concurrent_key: str | None = None
async def rate_limit_check(user: User, request: LLMRequest) -> RateLimitResult:
"""Check if user can make this request."""
# BYOK users bypass platform limits
if user.has_own_api_key(request.provider):
return RateLimitResult(allowed=True, use_user_key=True)
limits = TIER_LIMITS[user.tier]
# Check requests per minute (sliding window)
rpm_key = f"ratelimit:{user.id}:rpm"
now = time.time()
window_start = now - 60
# Remove old entries, add new one, count
pipe = redis.pipeline()
pipe.zremrangebyscore(rpm_key, 0, window_start)
pipe.zadd(rpm_key, {str(now): now})
pipe.zcard(rpm_key)
pipe.expire(rpm_key, 120)
_, _, current_rpm, _ = await pipe.execute()
if current_rpm > limits.requests_per_minute:
# Retry once the oldest entry in the sliding window ages out
# (now - window_start is always exactly 60, so it can't drive retry_after)
oldest = await redis.zrange(rpm_key, 0, 0, withscores=True)
retry_after = max(1, int(oldest[0][1] + 60 - now)) if oldest else 1
return RateLimitResult(
allowed=False,
retry_after=retry_after,
reason=f"Rate limit: {limits.requests_per_minute} requests/minute"
)
# Check concurrent requests
concurrent_key = f"ratelimit:{user.id}:concurrent"
current_concurrent = await redis.incr(concurrent_key)
await redis.expire(concurrent_key, 300) # 5 min TTL as safety
if current_concurrent > limits.max_concurrent:
await redis.decr(concurrent_key)
return RateLimitResult(
allowed=False,
retry_after=1,
reason=f"Max concurrent: {limits.max_concurrent} requests"
)
return RateLimitResult(allowed=True, concurrent_key=concurrent_key)
async def release_concurrent(concurrent_key: str):
"""Release concurrent slot after request completes."""
if concurrent_key:
await redis.decr(concurrent_key)
```
### Rate Limit Headers
Return standard headers so clients can self-regulate:
```python
async def rate_limit_headers(user: User) -> dict:
limits = TIER_LIMITS[user.tier]
current = await get_current_usage(user.id)
return {
"X-RateLimit-Limit": str(limits.requests_per_minute),
"X-RateLimit-Remaining": str(max(0, limits.requests_per_minute - current.rpm)),
"X-RateLimit-Reset": str(int(time.time()) + 60),
}
```
## Layer 2: Semantic Cache
Identical requests return cached responses, reducing load and cost. (As specified here it is an exact-match cache keyed on a normalized request hash; true semantic-similarity matching, e.g. over embeddings, could be layered on later.)
### Cache Key Generation
```python
import hashlib
import json
def hash_request(request: LLMRequest) -> str:
"""Generate deterministic cache key for request."""
# Include all parameters that affect output
cache_input = {
"model": request.model,
"messages": [
{"role": m.role, "content": m.content}
for m in request.messages
],
"temperature": request.temperature,
"max_tokens": request.max_tokens,
"tools": request.tools, # Tool definitions matter
# Exclude: user_id, timestamps, request_id
}
serialized = json.dumps(cache_input, sort_keys=True)
return hashlib.sha256(serialized.encode()).hexdigest()[:32]
```
### Cache Logic
```python
@dataclass
class CachedResponse:
response: LLMResponse
cached_at: float
hit_count: int
async def check_semantic_cache(request: LLMRequest) -> LLMResponse | None:
"""Check if we've seen this exact request before."""
cache_key = f"llmcache:{hash_request(request)}"
cached = await redis.get(cache_key)
if cached:
data = json.loads(cached)
# Update hit count for analytics
await redis.hincrby("llmcache:stats", "hits", 1)
return LLMResponse(
content=data["content"],
model=data["model"],
usage=data["usage"],
cached=True,
)
await redis.hincrby("llmcache:stats", "misses", 1)
return None
async def cache_response(request: LLMRequest, response: LLMResponse):
"""Cache response with TTL based on determinism."""
# Don't cache errors or empty responses
if response.error or not response.content:
return
cache_key = f"llmcache:{hash_request(request)}"
# TTL based on temperature (determinism)
if request.temperature == 0:
ttl = 86400 # 24 hours for deterministic
elif request.temperature < 0.3:
ttl = 3600 # 1 hour
elif request.temperature < 0.7:
ttl = 300 # 5 minutes
else:
return # Don't cache high-temperature responses
cache_data = {
"content": response.content,
"model": response.model,
"usage": response.usage,
"cached_at": time.time(),
}
await redis.setex(cache_key, ttl, json.dumps(cache_data))
```
### Expected Cache Performance
| Use Case | Temperature | Expected Hit Rate |
|----------|-------------|-------------------|
| Tool calls (same inputs) | 0 | 70-90% |
| Structured extraction | 0-0.3 | 50-70% |
| Agent reasoning | 0.5-0.7 | 20-40% |
| Creative content | 0.8-1.0 | ~0% |
**Aggregate impact:** 30-40% reduction in API calls for typical workloads.
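The `llmcache:stats` hash maintained by the cache code makes the realized hit rate easy to surface; a small reporting helper (fed from, say, `redis.hgetall("llmcache:stats")`):

```python
def cache_hit_rate(hits: int, misses: int) -> float:
    """Observed cache hit rate as a percentage; 0.0 when there is no traffic yet."""
    total = hits + misses
    return round(100 * hits / total, 1) if total else 0.0
```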
## Layer 3: Priority Queues
Paid users get priority. Free users are served fairly but can be shed under load.
### Queue Structure
```python
# Redis sorted set with composite score
# Score = (priority * 1B) + timestamp
# Lower score = higher priority + earlier arrival
QUEUE_PRIORITIES = {
Tier.HIGH_FREQUENCY: 0, # Highest priority (dedicated customers)
Tier.ENTERPRISE: 1,
Tier.PRO: 2,
"trial": 2, # Trials get Pro priority (first impression)
Tier.FREE: 3, # Lowest priority
}
@dataclass
class QueuedRequest:
ticket_id: str
user_id: str
tier: str
request: LLMRequest
enqueued_at: float
use_user_key: bool = False
async def enqueue_request(user: User, request: LLMRequest, use_user_key: bool) -> str:
"""Add request to priority queue, return ticket ID."""
ticket_id = f"ticket:{uuid.uuid4().hex}"
priority = QUEUE_PRIORITIES.get(user.tier, 3)
# Composite score: priority (billions) + timestamp (seconds)
score = priority * 1_000_000_000 + time.time()
queued = QueuedRequest(
ticket_id=ticket_id,
user_id=str(user.id),
tier=user.tier,
request=request,
enqueued_at=time.time(),
use_user_key=use_user_key,
)
await redis.zadd("llm:queue", {json.dumps(asdict(queued)): score})
# Set a result placeholder
await redis.setex(f"llm:result:{ticket_id}", 300, "pending")
return ticket_id
```
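The composite score can be sanity-checked by hand: any higher tier outranks any lower tier regardless of arrival order, and ties within a tier fall back to arrival time. A toy example with made-up timestamps:

```python
PRIORITY = {"enterprise": 1, "pro": 2, "free": 3}

def score(tier: str, enqueued_at: float) -> float:
    """Composite queue score: lower dequeues first."""
    return PRIORITY[tier] * 1_000_000_000 + enqueued_at

# A Free request enqueued earlier still dequeues after a later Pro request
free_early = score("free", 1_000.0)
pro_late = score("pro", 2_000.0)
assert pro_late < free_early

# Within a tier, earlier arrival wins
assert score("pro", 1_000.0) < score("pro", 2_000.0)
```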
### Queue Workers
```python
async def queue_worker():
"""Process requests from the queue."""
while True:
# Get highest priority item (lowest score)
items = await redis.zpopmin("llm:queue", count=1)
if not items:
await asyncio.sleep(0.1) # Brief pause if queue empty
continue
data, score = items[0]
queued = QueuedRequest(**json.loads(data))
try:
# Select provider and execute
response = await execute_llm_request(queued)
# Store result
await redis.setex(
f"llm:result:{queued.ticket_id}",
300,
json.dumps({"status": "success", "response": asdict(response)})
)
except Exception as e:
await redis.setex(
f"llm:result:{queued.ticket_id}",
300,
json.dumps({"status": "error", "error": str(e)})
)
async def wait_for_result(ticket_id: str, timeout: float = 120) -> LLMResponse:
"""Wait for queued request to complete."""
deadline = time.time() + timeout
while time.time() < deadline:
result = await redis.get(f"llm:result:{ticket_id}")
if result and result != "pending":
data = json.loads(result)
if data["status"] == "success":
return LLMResponse(**data["response"])
else:
raise LLMError(data["error"])
await asyncio.sleep(0.1)
raise RequestTimeout("Request timed out")
```
### Queue Health Monitoring
```python
@dataclass
class QueueHealth:
size: int
oldest_wait_seconds: float
by_tier: dict[str, int]
status: str # healthy, degraded, critical
async def get_queue_health() -> QueueHealth:
"""Get queue metrics for monitoring and load shedding."""
queue_size = await redis.zcard("llm:queue")
# Oldest item: read the enqueue time from the payload rather than the score,
# since score = priority * 1B + unix timestamp and the timestamp itself
# exceeds 1B, so `score % 1_000_000_000` would mangle it
oldest = await redis.zrange("llm:queue", 0, 0)
if oldest:
wait_time = time.time() - json.loads(oldest[0])["enqueued_at"]
else:
wait_time = 0
# Count by tier
all_items = await redis.zrange("llm:queue", 0, -1)
by_tier = {}
for item in all_items:
data = json.loads(item)
tier = data.get("tier", "unknown")
by_tier[tier] = by_tier.get(tier, 0) + 1
# Determine status
if queue_size < 500:
status = "healthy"
elif queue_size < 2000:
status = "degraded"
else:
status = "critical"
return QueueHealth(
size=queue_size,
oldest_wait_seconds=wait_time,
by_tier=by_tier,
status=status,
)
```
## Layer 4: Multi-Provider Pool with Circuit Breakers
Never depend on a single provider.
### Provider Configuration
```python
@dataclass
class ProviderConfig:
name: str
base_url: str
api_key_env: str
models: list[str]
max_concurrent: int
priority: int # Lower = preferred
timeout: float = 60.0
PROVIDERS = {
"anthropic": ProviderConfig(
name="anthropic",
base_url="https://api.anthropic.com/v1",
api_key_env="ANTHROPIC_API_KEY",
models=["claude-sonnet-4-20250514", "claude-opus-4-20250514", "claude-haiku-3"],
max_concurrent=100,
priority=1,
),
"openai": ProviderConfig(
name="openai",
base_url="https://api.openai.com/v1",
api_key_env="OPENAI_API_KEY",
models=["gpt-4o", "gpt-4o-mini", "o1", "o3-mini"],
max_concurrent=50,
priority=2,
),
"xai": ProviderConfig(
name="xai",
base_url="https://api.x.ai/v1",
api_key_env="XAI_API_KEY",
models=["grok-3", "grok-3-mini"],
max_concurrent=50,
priority=1,
),
"together": ProviderConfig(
name="together",
base_url="https://api.together.xyz/v1",
api_key_env="TOGETHER_API_KEY",
models=["llama-3-70b", "mixtral-8x7b"],
max_concurrent=100,
priority=3, # Fallback
),
}
```
### Circuit Breaker State
```python
@dataclass
class CircuitState:
provider: str
healthy: bool = True
failures: int = 0
successes: int = 0
last_failure: float = 0
circuit_open_until: float = 0
current_load: int = 0
# In-memory state (could be Redis for distributed)
CIRCUIT_STATES: dict[str, CircuitState] = {
name: CircuitState(provider=name)
for name in PROVIDERS
}
CIRCUIT_CONFIG = {
"failure_threshold": 5, # Failures before opening
"success_threshold": 3, # Successes before closing
"open_duration": 30, # Seconds circuit stays open
"half_open_requests": 1, # Requests allowed in half-open state
}
async def record_success(provider: str):
"""Record successful request."""
state = CIRCUIT_STATES[provider]
state.successes += 1
state.failures = 0
if not state.healthy and state.successes >= CIRCUIT_CONFIG["success_threshold"]:
state.healthy = True
logger.info(f"Circuit closed for {provider}")
async def record_failure(provider: str, error: Exception):
"""Record failed request, potentially open circuit."""
state = CIRCUIT_STATES[provider]
state.failures += 1
state.successes = 0
state.last_failure = time.time()
if state.failures >= CIRCUIT_CONFIG["failure_threshold"]:
state.healthy = False
state.circuit_open_until = time.time() + CIRCUIT_CONFIG["open_duration"]
logger.error(f"Circuit opened for {provider}: {error}")
await alert_ops(f"LLM provider {provider} circuit opened")
def is_provider_available(provider: str) -> bool:
"""Check if provider can accept requests."""
state = CIRCUIT_STATES[provider]
config = PROVIDERS[provider]
    # Circuit open?
    if not state.healthy:
        if time.time() < state.circuit_open_until:
            return False
        # Open window elapsed: half-open, so allow a limited probe through
        # (see half_open_requests); record_success/record_failure then
        # closes or re-opens the circuit
# At capacity?
if state.current_load >= config.max_concurrent:
return False
return True
```
### Provider Selection
```python
def get_providers_for_model(model: str) -> list[str]:
"""Get providers that support this model."""
return [
name for name, config in PROVIDERS.items()
if model in config.models or any(model.startswith(m.split("-")[0]) for m in config.models)
]
async def select_provider(request: LLMRequest, user_key: str | None = None) -> tuple[str, str]:
"""Select best available provider, return (provider_name, api_key)."""
candidates = get_providers_for_model(request.model)
if not candidates:
raise UnsupportedModel(f"No provider supports model: {request.model}")
# Filter to available providers
available = [p for p in candidates if is_provider_available(p)]
if not available:
raise NoProvidersAvailable(
"All providers for this model are currently unavailable. "
"Please try again in a few seconds."
)
# Sort by priority, then by current load
available.sort(key=lambda p: (
PROVIDERS[p].priority,
CIRCUIT_STATES[p].current_load / PROVIDERS[p].max_concurrent
))
selected = available[0]
# Determine API key
if user_key:
api_key = user_key
else:
api_key = os.environ[PROVIDERS[selected].api_key_env]
return selected, api_key
```
## Layer 5: BYOK (Bring Your Own Key)
Pro+ users can add their own API keys to bypass platform limits.
### Database Schema
```sql
CREATE TABLE user_api_keys (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID REFERENCES users(id) ON DELETE CASCADE,
provider VARCHAR(50) NOT NULL,
encrypted_key BYTEA NOT NULL,
    key_hint VARCHAR(20), -- Last 6 chars for display: "...abc123"
is_valid BOOLEAN DEFAULT true,
last_used_at TIMESTAMPTZ,
last_error VARCHAR(255),
created_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(user_id, provider)
);
CREATE INDEX idx_user_api_keys_user ON user_api_keys(user_id);
```
### Key Encryption
```python
from cryptography.fernet import Fernet
# Platform encryption key (from environment, rotated periodically)
FERNET = Fernet(os.environ["API_KEY_ENCRYPTION_KEY"])
def encrypt_api_key(key: str) -> bytes:
    """Encrypt user's API key for storage."""
    return FERNET.encrypt(key.encode())
def decrypt_api_key(encrypted: bytes) -> str:
    """Decrypt user's API key for use."""
    return FERNET.decrypt(encrypted).decode()
async def store_user_api_key(user_id: str, provider: str, api_key: str):
"""Store encrypted API key for user."""
# Validate key format
if not validate_key_format(provider, api_key):
raise InvalidAPIKey(f"Invalid {provider} API key format")
# Test the key
if not await test_api_key(provider, api_key):
raise InvalidAPIKey(f"API key validation failed for {provider}")
encrypted = encrypt_api_key(api_key)
key_hint = f"...{api_key[-6:]}"
await db.execute("""
INSERT INTO user_api_keys (user_id, provider, encrypted_key, key_hint)
VALUES ($1, $2, $3, $4)
ON CONFLICT (user_id, provider)
DO UPDATE SET encrypted_key = $3, key_hint = $4, is_valid = true, last_error = NULL
""", user_id, provider, encrypted, key_hint)
async def get_user_api_key(user_id: str, provider: str) -> str | None:
"""Get decrypted API key for user, if they have one."""
row = await db.fetchrow("""
SELECT encrypted_key, is_valid
FROM user_api_keys
WHERE user_id = $1 AND provider = $2
""", user_id, provider)
if not row or not row["is_valid"]:
return None
return decrypt_api_key(row["encrypted_key"])
```
### BYOK Request Flow
```python
async def execute_with_byok(user: User, request: LLMRequest) -> LLMResponse:
    """Execute request, preferring user's own key if available."""
    # Check for user's key on the preferred provider for this model
    provider = min(
        get_providers_for_model(request.model),
        key=lambda p: PROVIDERS[p].priority,
    )
    user_key = await get_user_api_key(user.id, provider)
    if user_key:
        # Use user's key - bypass platform rate limits
        try:
            response = await call_provider_direct(request, user_key)
            # Update last used
            await db.execute("""
                UPDATE user_api_keys
                SET last_used_at = NOW(), last_error = NULL
                WHERE user_id = $1 AND provider = $2
            """, user.id, provider)
            return response
        except AuthenticationError:
            # Key is invalid - mark it and fall back to platform
            await db.execute("""
                UPDATE user_api_keys
                SET is_valid = false, last_error = 'Authentication failed'
                WHERE user_id = $1 AND provider = $2
            """, user.id, provider)
            # Notify user
            await send_notification(user, "api_key_invalid", {
                "provider": provider
            })
            # Fall through to platform key
    # Use platform key (with rate limiting)
    return await execute_with_platform_key(user, request)
```
## Layer 6: Backpressure & Graceful Degradation
When overwhelmed, fail gracefully and prioritize paid users.
### Load Shedding
```python
async def should_shed_load(user: User, queue_health: QueueHealth) -> bool:
    """Determine if this request should be rejected to protect the system."""
    # High Frequency and Enterprise never shed
    if user.tier in [Tier.HIGH_FREQUENCY, Tier.ENTERPRISE]:
        return False
    # Pro sheds only in critical
    if user.tier == Tier.PRO and queue_health.status != "critical":
        return False
    # Free tier sheds in degraded or critical
    if user.tier == Tier.FREE and queue_health.status not in ["degraded", "critical"]:
        return False
    # Probabilistic shedding based on queue size
    shed_probability = min(0.9, (queue_health.size - 500) / 2000)
    return random.random() < shed_probability
```
### Graceful Error Messages
```python
class ServiceDegraded(Exception):
"""Raised when load shedding rejects a request."""
    def __init__(self, tier: Tier, queue_health: QueueHealth):
if tier == Tier.FREE:
message = (
"We're experiencing high demand. Free tier requests are "
"temporarily paused. Upgrade to Pro for priority access, "
"or try again in a few minutes."
)
retry_after = 60
else:
message = (
"High demand is causing delays. Your request has been queued. "
"Expected wait time: ~{} seconds."
).format(int(queue_health.oldest_wait_seconds * 1.5))
retry_after = 30
self.message = message
self.retry_after = retry_after
super().__init__(message)
```
### Timeout Handling
```python
async def execute_with_timeout(request: LLMRequest, provider: str, api_key: str) -> LLMResponse:
"""Execute request with appropriate timeout."""
# Timeout based on expected response size
if request.max_tokens and request.max_tokens > 2000:
timeout = 120 # Long responses need more time
else:
timeout = 60
try:
async with asyncio.timeout(timeout):
return await call_provider(request, provider, api_key)
except asyncio.TimeoutError:
await record_failure(provider, TimeoutError("Request timed out"))
raise RequestTimeout(
f"Request timed out after {timeout}s. "
"Try reducing max_tokens or simplifying the prompt."
)
```
## Main Entry Point
```python
async def handle_llm_request(user: User, request: LLMRequest) -> LLMResponse:
    """
    Main entry point for all LLM requests.
    Implements the full defense-in-depth stack.
    """
    concurrent_key = None
    try:
        # Step 1: Rate limiting
        rate_result = await rate_limit_check(user, request)
        if not rate_result.allowed:
            raise RateLimitExceeded(
                message=rate_result.reason,
                retry_after=rate_result.retry_after
            )
        concurrent_key = rate_result.concurrent_key
        # Step 2: Semantic cache
        cached = await check_semantic_cache(request)
        if cached:
            return cached
        # Step 3: Check queue health for load shedding
        queue_health = await get_queue_health()
        if await should_shed_load(user, queue_health):
            raise ServiceDegraded(user.tier, queue_health)
        # Step 4: Enqueue with priority
        ticket_id = await enqueue_request(user, request, rate_result.use_user_key)
        # Step 5: Wait for result
        response = await wait_for_result(ticket_id, timeout=120)
        # Step 6: Cache successful response
        await cache_response(request, response)
        return response
    finally:
        # Always release the concurrent slot
        if concurrent_key:
            await release_concurrent(concurrent_key)
```
## Monitoring & Alerts
### Key Metrics
| Metric | Source | Warning | Critical |
|--------|--------|---------|----------|
| Queue depth | Redis ZCARD | > 500 | > 2000 |
| P50 latency | Request timing | > 10s | > 30s |
| P99 latency | Request timing | > 60s | > 120s |
| Cache hit rate | Redis stats | < 25% | < 10% |
| Provider error rate | Circuit state | > 5% | > 20% |
| Circuit breaker open | Circuit state | Any | Multiple |
| Free tier rejection rate | Load shedding | > 20% | > 50% |
### Alerting
```python
# PagerDuty / Slack alerts
ALERTS = {
"queue_critical": {
"condition": lambda h: h.size > 2000,
"severity": "critical",
"message": "LLM queue depth critical: {size} requests backed up"
},
"provider_down": {
"condition": lambda p: not p.healthy,
"severity": "warning",
"message": "Provider {name} circuit breaker open"
},
"all_providers_down": {
"condition": lambda: all(not s.healthy for s in CIRCUIT_STATES.values()),
"severity": "critical",
"message": "ALL LLM providers are down!"
},
}
```
### Dashboard Queries
```sql
-- Requests per minute by tier
SELECT
date_trunc('minute', created_at) as minute,
tier,
COUNT(*) as requests
FROM llm_requests
WHERE created_at > NOW() - INTERVAL '1 hour'
GROUP BY 1, 2
ORDER BY 1 DESC;
-- Error rate by provider
SELECT
provider,
COUNT(*) FILTER (WHERE status = 'error') * 100.0 / COUNT(*) as error_rate
FROM llm_requests
WHERE created_at > NOW() - INTERVAL '1 hour'
GROUP BY provider;
-- BYOK adoption
SELECT
tier,
COUNT(*) FILTER (WHERE used_user_key) * 100.0 / COUNT(*) as byok_percentage
FROM llm_requests
WHERE created_at > NOW() - INTERVAL '24 hours'
GROUP BY tier;
```
## Viral Day Playbook
What to do when that tweet hits:
### Hour 0-1: Detection
- Alert: Queue depth > 500
- Action: Monitor, no intervention needed
### Hour 1-2: Escalation
- Alert: Queue depth > 1000, latency spiking
- Action:
- Verify all provider circuits are healthy
- Check cache hit rate (should be climbing)
- Prepare to enable aggressive load shedding
### Hour 2-4: Peak
- Alert: Queue depth > 2000, free tier rejections > 30%
- Action:
- Enable aggressive load shedding for free tier
- Send "high demand" email to free users with upgrade CTA
- Monitor Pro/Enterprise latency (must stay < 30s)
- Tweet acknowledgment: "We're experiencing high demand due to [reason]. Pro users unaffected."
### Hour 4-8: Stabilization
- Queue draining as cache warms and load shedding works
- Many users convert to Pro or add BYOK keys
- Circuits recovering as providers stabilize
### Post-Mortem
- Review metrics: peak queue, rejection rate, conversion rate
- Adjust tier limits if needed
- Consider adding provider capacity for sustained growth
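The "aggressive load shedding" knob used in the playbook can be as small as an ops-settable multiplier on the Layer 6 shed probability. A sketch, assuming a hypothetical runtime flag (flipped via an admin endpoint or Redis):

```python
# Hypothetical ops toggle; in production this lives in Redis so all
# workers see the flip immediately
AGGRESSIVE_SHEDDING = False

def shed_probability(queue_size: int, aggressive: bool = False) -> float:
    """Probability of rejecting a free-tier request at this queue depth."""
    # Same curve as should_shed_load: ramps from 0 at depth 500,
    # capped at 90%
    base = min(0.9, max(0.0, (queue_size - 500) / 2000))
    if aggressive:
        # Shed twice as eagerly during a spike, still capped at 90%
        return min(0.9, base * 2)
    return base
```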
---
## References
- [Stripe-style rate limiting](https://stripe.com/docs/rate-limits)
- [Circuit breaker pattern](https://martinfowler.com/bliki/CircuitBreaker.html)
- [Token bucket algorithm](https://en.wikipedia.org/wiki/Token_bucket)
- [BloxServer Billing](bloxserver-billing.md) — Tier definitions and pricing

# Librarian Architecture — RLM-Powered Document Intelligence
**Status:** Design
**Date:** January 2026
## Overview
The Librarian is an agent that ingests, indexes, and queries large document collections using the **Recursive Language Model (RLM)** pattern. It can handle codebases, documentation, and structured data at scales far beyond LLM context windows (10M+ tokens).
Key insight from [MIT RLM research](https://arxiv.org/abs/...): Long contexts should be loaded as **variables in a REPL environment**, not fed directly to the neural network. The LLM writes code to examine, decompose, and recursively query chunks.
## Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ RLM-Powered Librarian │
│ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Ingestion Pipeline │ │
│ │ │ │
│ │ Source → Detect Type → Select Chunker → Index → Store │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Query Engine (RLM Pattern) │ │
│ │ │ │
│ │ Query → Search → Filter → Recursive Sub-Query → Answer │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Storage Layer │ │
│ │ │ │
│ │ eXist-db (XML) + Vector Embeddings + Dependency Graph │ │
│ └───────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```
## The RLM Pattern
Traditional LLM usage stuffs entire documents into the prompt. This fails at scale:
- Context windows have hard limits (128K-1M tokens)
- Performance degrades with context length ("context rot")
- Cost scales linearly with input size
**RLM approach:**
1. **Load as Variable**: Documents become references, not inline content
2. **Programmatic Access**: LLM writes code to peek into chunks
3. **Recursive Sub-Queries**: `llm_query(chunk, question)` for focused analysis
4. **Aggregation**: Combine sub-query results into final answer
```python
# RLM-style pseudocode
async def handle_query(query: str, codebase: CodebaseRef):
# 1. Search index for relevant chunks (not full content)
hits = await search_index(codebase, query)
# 2. Filter if too many results
if len(hits) > 10:
hits = await llm_filter(hits, query) # LLM picks most relevant
# 3. Recursive sub-queries on each chunk
findings = []
for hit in hits:
chunk = await load_chunk(hit)
        result = await llm_query(chunk, f"Analyze this for: {query}")
findings.append(result)
# 4. Aggregate into final answer
return await llm_synthesize(findings, query)
```
## Hybrid Chunking Architecture
Chunking is domain-specific. A C++ class should stay together; a legal clause shouldn't be split mid-sentence. We use a hybrid approach:
### Built-in Chunkers (Fast Path)
| Chunker | File Types | Strategy | Implementation |
|---------|------------|----------|----------------|
| **Code** | .c, .cpp, .py, .js, .rs, ... | AST-aware splitting | tree-sitter |
| **Markdown/Docs** | .md, .rst, .txt | Heading hierarchy | Custom parser |
| **Structured Data** | .json, .xml, .yaml | Schema-aware | lxml + json |
| **Plain Text** | emails, logs, notes | Semantic paragraphs | Sentence boundaries |
These cover ~90% of use cases with optimized, predictable behavior.
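The heading-hierarchy strategy for the Markdown chunker can be sketched in a few lines. This is a simplified version: the real parser would also handle fenced code blocks, nesting, overlap, and size limits, and the function name is illustrative.

```python
def chunk_markdown(text: str, min_level: int = 2) -> list[dict]:
    """Split markdown into chunks at headings of min_level or deeper.

    Text before the first qualifying heading becomes a preamble chunk
    with heading=None.
    """
    chunks: list[dict] = []
    current: dict = {"heading": None, "lines": []}
    for line in text.splitlines():
        if line.startswith("#"):
            level = len(line) - len(line.lstrip("#"))
            if level >= min_level:
                # Close out the previous chunk, start a new one
                if current["lines"] or current["heading"]:
                    chunks.append(current)
                current = {"heading": line.lstrip("# ").strip(), "lines": []}
                continue
        current["lines"].append(line)
    if current["lines"] or current["heading"]:
        chunks.append(current)
    return chunks
```

Headings shallower than `min_level` (e.g. the document title) stay attached to the surrounding chunk, which preserves context for retrieval.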
### WASM Factory (Fallback for Unknown Types)
For novel formats, the AI generates a custom chunker:
```
User uploads proprietary format
┌───────────────────────────────────────────────────────────┐
│ Step 1: Sample Analysis │
│ │
│ AI examines sample files: │
│ - Structure patterns │
│ - Record boundaries │
│ - Semantic units │
└───────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────┐
│ Step 2: Generate Chunker (Rust → WASM) │
│ │
│ AI writes Rust code implementing the chunker interface │
└───────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────┐
│ Step 3: Compile & Validate │
│ │
│ cargo build --target wasm32-wasi │
│ Test on sample files │
│ AI reviews output quality │
└───────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────┐
│ Step 4: Deploy │
│ │
│ Store in user's WASM modules │
│ Optional: publish to marketplace │
└───────────────────────────────────────────────────────────┘
```
### WASM Chunker Interface (WIT)
```wit
// chunker.wit
interface chunker {
record chunk {
id: string,
content: string,
metadata: list<tuple<string, string>>,
parent-id: option<string>,
children: list<string>,
}
record chunker-config {
file-type: string,
max-chunk-size: u32,
preserve-context: bool,
custom-params: list<tuple<string, string>>,
}
// Analyze sample data, return chunking config
analyze: func(sample: string, file-type: string) -> chunker-config
// Chunk a file using the config
chunk-file: func(content: string, config: chunker-config) -> list<chunk>
}
```
## Ingestion Pipeline
### Step 1: Source Acquisition
```python
@dataclass
class IngestionSource:
type: Literal["git", "upload", "url", "s3"]
location: str
filter: str | None = None # e.g., "*.cpp", "docs/**/*.md"
```
Supported sources:
- **Git repository**: Clone and track branches
- **File upload**: Direct upload via UI
- **URL**: Fetch remote documents
- **S3/Cloud storage**: Enterprise integrations
### Step 2: Type Detection
```python
def detect_type(file_path: str, content: bytes) -> FileType:
# 1. Check extension
ext = Path(file_path).suffix.lower()
if ext in CODE_EXTENSIONS:
return FileType.CODE
# 2. Check magic bytes
if content.startswith(b'%PDF'):
return FileType.PDF
# 3. Content analysis
if looks_like_markdown(content):
return FileType.MARKDOWN
return FileType.PLAIN_TEXT
```
### Step 3: Chunking
```python
def select_chunker(file_type: FileType, user_config: ChunkerConfig) -> Chunker:
# User override
if user_config.custom_wasm:
return WasmChunker(user_config.custom_wasm)
# Built-in chunkers
match file_type:
case FileType.CODE:
return TreeSitterChunker(language=detect_language(file_type))
case FileType.MARKDOWN:
return MarkdownChunker()
case FileType.JSON | FileType.XML | FileType.YAML:
return StructuredDataChunker()
case _:
return PlainTextChunker()
```
### Step 4: Indexing
Each chunk is indexed in multiple ways:
| Index Type | Purpose | Implementation |
|------------|---------|----------------|
| **Full-text** | Keyword search | eXist-db Lucene |
| **Vector** | Semantic similarity | Embeddings (OpenAI/local) |
| **Graph** | Relationships | Class hierarchy, imports, references |
| **Metadata** | Filtering | File path, type, timestamp |
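The fan-out to all four indexes can be illustrated with in-memory dicts standing in for eXist-db Lucene, the vector store, the graph, and the metadata table (all names here are illustrative):

```python
# In-memory stand-ins for the four indexes; real implementations are
# eXist-db Lucene, a vector store, a graph store, and a metadata table
FULLTEXT: dict[str, str] = {}
VECTORS: dict[str, list[float]] = {}
GRAPH: dict[str, list[str]] = {}
METADATA: dict[str, dict] = {}

def index_chunk(chunk_id: str, content: str, embedding: list[float],
                references: list[str], meta: dict) -> None:
    """Fan a chunk out to every index so any retrieval path can find it."""
    FULLTEXT[chunk_id] = content        # keyword search
    VECTORS[chunk_id] = embedding       # semantic similarity
    GRAPH[chunk_id] = references        # relationship traversal
    METADATA[chunk_id] = meta           # filtering
```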
### Step 5: Storage
```xml
<!-- Chunk stored in eXist-db -->
<chunk xmlns="https://bloxserver.io/ns/librarian/v1">
<id>opencascade:BRepBuilderAPI_MakeEdge:constructor_1</id>
<source>
<repo>opencascade</repo>
<path>src/BRepBuilderAPI/BRepBuilderAPI_MakeEdge.cxx</path>
<lines start="42" end="87"/>
</source>
<type>function</type>
<metadata>
<class>BRepBuilderAPI_MakeEdge</class>
<visibility>public</visibility>
<params>const TopoDS_Vertex&amp;, const TopoDS_Vertex&amp;</params>
</metadata>
<content><![CDATA[
BRepBuilderAPI_MakeEdge::BRepBuilderAPI_MakeEdge(
const TopoDS_Vertex& V1,
const TopoDS_Vertex& V2)
{
// ... implementation
}
]]></content>
<embedding>[0.023, -0.041, 0.089, ...]</embedding>
</chunk>
```
## Query Engine
### Query Flow
```
User: "How does BRepBuilderAPI_MakeEdge handle degenerate curves?"
┌───────────────────────────────────────────────────────────┐
│ Step 1: Search │
│ │
│ - Vector search: find semantically similar chunks │
│ - Keyword search: "BRepBuilderAPI_MakeEdge" + "degenerate"│
│ - Graph traversal: class hierarchy, method calls │
│ │
│ Result: 47 potentially relevant chunks │
└───────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────┐
│ Step 2: Filter (LLM-assisted) │
│ │
│ Too many chunks for direct analysis. │
│ LLM reviews summaries, picks top 8 most relevant. │
│ │
│ Selected: │
│ - BRepBuilderAPI_MakeEdge constructors (3 chunks) │
│ - Edge validation methods (2 chunks) │
│ - Degenerate curve handling (2 chunks) │
│ - Error reporting (1 chunk) │
└───────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────┐
│ Step 3: Recursive Sub-Queries │
│ │
│ For each chunk, focused LLM query: │
│ │
│ llm_query(chunk_1, "How does this handle degenerate...") │
│ llm_query(chunk_2, "What validation happens here...") │
│ llm_query(chunk_3, "What errors are raised for...") │
│ ... │
│ │
│ 8 parallel sub-queries → 8 focused findings │
└───────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────┐
│ Step 4: Synthesize │
│ │
│ LLM combines findings into coherent answer: │
│ │
│ "BRepBuilderAPI_MakeEdge handles degenerate curves by: │
│ 1. Checking curve bounds in the constructor... │
│ 2. Calling BRepCheck_Edge for validation... │
│ 3. Setting myError to BRepBuilderAPI_CurveTooSmall..." │
└───────────────────────────────────────────────────────────┘
```
### Handler Implementation
```python
@xmlify
@dataclass
class LibrarianQuery:
"""Query the librarian for information."""
collection: str # Which indexed collection
question: str # Natural language question
max_chunks: int = 10 # Limit for recursive queries
include_sources: bool = True
@xmlify
@dataclass
class LibrarianResponse:
"""Response from librarian with sources."""
answer: str
sources: list[SourceReference]
confidence: float
async def handle_librarian_query(
payload: LibrarianQuery,
metadata: HandlerMetadata
) -> HandlerResponse:
"""RLM-style query handler."""
# 1. Search for relevant chunks
hits = await search_collection(
payload.collection,
payload.question,
limit=50 # Cast wide net
)
# 2. Filter if needed
if len(hits) > payload.max_chunks:
hits = await llm_filter_chunks(
hits,
payload.question,
limit=payload.max_chunks
)
# 3. Recursive sub-queries
findings = await asyncio.gather(*[
llm_analyze_chunk(chunk, payload.question)
for chunk in hits
])
# 4. Synthesize answer
answer = await llm_synthesize(findings, payload.question)
# 5. Build response
sources = [
SourceReference(
path=hit.source_path,
lines=(hit.start_line, hit.end_line),
relevance=hit.score
)
for hit in hits
]
return HandlerResponse.respond(
payload=LibrarianResponse(
answer=answer,
sources=sources if payload.include_sources else [],
confidence=calculate_confidence(findings)
)
)
```
## Storage Layer
### eXist-db (Primary Store)
XML-native database for chunk storage and XQuery retrieval.
**Why eXist-db:**
- Native XQuery for complex queries
- Full-text search with Lucene
- XML validation against schemas
- Transactional updates
**Collections structure:**
```
/db/librarian/
├── collections/
│ ├── {user_id}/
│ │ ├── {collection_id}/
│ │ │ ├── metadata.xml
│ │ │ ├── chunks/
│ │ │ │ ├── chunk_001.xml
│ │ │ │ ├── chunk_002.xml
│ │ │ │ └── ...
│ │ │ └── index/
│ │ │ └── embeddings.bin
```
### Vector Embeddings
For semantic search, chunks are embedded using:
- OpenAI `text-embedding-3-small` (cloud)
- Sentence Transformers (local/self-hosted)
Embeddings stored alongside chunks or in dedicated vector DB (Qdrant/Pinecone for scale).
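At small scale, semantic search over embeddings stored alongside chunks is just a brute-force cosine scan; a vector DB replaces this once collections grow. A sketch, with plain Python lists in place of real embedding vectors:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query: list[float],
          chunks: list[tuple[str, list[float]]],
          k: int = 5) -> list[str]:
    """Return ids of the k chunks whose embeddings are closest to the query."""
    ranked = sorted(chunks, key=lambda c: cosine(query, c[1]), reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:k]]
```

An O(n) scan per query is fine below ~100K chunks; beyond that, approximate nearest-neighbor indexes (HNSW and similar) are what Qdrant/Pinecone provide.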
### Dependency Graph
For code collections, track relationships:
- **Class hierarchy**: inheritance, interfaces
- **Imports**: file dependencies
- **Call graph**: function → function references
Stored in eXist-db as XML or external graph DB for complex traversals.
## Configuration
### organism.yaml
```yaml
listeners:
- name: librarian
handler: xml_pipeline.tools.librarian.handle_librarian_query
payload_class: xml_pipeline.tools.librarian.LibrarianQuery
description: Query indexed document collections
agent: true
peers: [] # Terminal handler
config:
exist_db:
url: "http://localhost:8080/exist"
user_env: EXIST_USER
password_env: EXIST_PASSWORD
embeddings:
provider: openai # or "local"
model: text-embedding-3-small
chunkers:
code:
max_chunk_size: 2000
overlap: 200
markdown:
split_on_headings: true
min_heading_level: 2
```
### Ingestion API
```python
# Ingest a git repository
await librarian.ingest(
source=GitSource(
url="https://github.com/Open-Cascade-SAS/OCCT",
branch="master",
filter="src/**/*.cxx"
),
collection="opencascade",
chunker_config=CodeChunkerConfig(
language="cpp",
max_chunk_size=2000
)
)
# Query the collection
response = await librarian.query(
collection="opencascade",
question="How does BRepBuilderAPI_MakeEdge handle curves?"
)
```
## Scaling Considerations
| Scale | Storage | Search | Compute |
|-------|---------|--------|---------|
| Small (<10K chunks) | eXist-db local | In-DB Lucene | Single node |
| Medium (10K-1M) | eXist-db cluster | + Vector DB | Multi-worker |
| Large (1M+) | Sharded storage | Distributed search | GPU embeddings |
## Security
- **Collection isolation**: Users can only query their own collections
- **WASM sandbox**: Custom chunkers run in isolated WASM runtime
- **Rate limiting**: Prevent abuse of recursive queries
- **Audit logging**: Track all queries for compliance
## Future Enhancements
1. **Incremental updates**: Re-index only changed files
2. **Cross-collection queries**: Search across multiple codebases
3. **Collaborative collections**: Shared team libraries
4. **Query caching**: Cache common sub-queries
5. **Streaming ingestion**: Real-time updates from git webhooks
---
## References
- [Recursive Language Models (MIT)](docs/mit-paper.pdf) — Foundational research on RLM pattern
- [tree-sitter](https://tree-sitter.github.io/) — AST-aware code parsing
- [eXist-db](http://exist-db.org/) — XML-native database
- [BloxServer Architecture](bloxserver-architecture.md) — Platform overview