Ingest into Atomic Evidence Nodes
Documents are parsed into atomic semantic units called nodes. Each node preserves the exact source text, location (page, row, cell), and metadata. Nodes are immutable and versioned. Spreadsheets create hierarchical nodes: summaries, columns, rows, and cells. PDFs create nodes per page or semantic region. Every node carries full provenance: source file, parser version, extraction timestamp, and content hash.
Embed + Deterministic Retrieval
Embeddings are generated exclusively from the deterministic, versioned text field (projection.text). This ensures semantic meaning is preserved correctly and embeddings remain stable across versions. Retrieval uses cosine similarity over L2-normalized vectors. The system enforces exact fact preservation: numbers, entities, and identifiers must match between source and summary. This prevents LLM hallucination by forcing the model to use exact facts from the evidence layer.
Reflection Cycles: Clusters, Summaries, Entropy + Drift
Semantically similar nodes are clustered using HDBSCAN or KMeans. Each cluster generates a truth-preserving summary that copies all facts exactly from source nodes. The system measures cluster coherence (semantic consistency) and stability (drift over time). Reflection cycles reduce entropy by organizing knowledge hierarchically. Clusters become reflection nodes that can themselves be clustered at higher levels, creating a recursive knowledge structure. The system detects fixpoints: stable knowledge states where further reflection produces no change.
Checkpoints / Versioned Knowledge State
The entire knowledge state can be checkpointed: all nodes, clusters, embeddings, and metadata. Checkpoints enable rollback, audit trails, and compliance. You can diff between checkpoints to see what changed: new nodes, modified clusters, entropy shifts. This versioning makes Amnesis suitable for regulated environments where auditability is required. Knowledge state is deterministic and reproducible.
Why Amnesis Is Not Commodity RAG
Typical RAG is probabilistic: similar chunks land in a prompt and the model may still lean on pre-training priors—neither stable RAG nor truthful RAG for sign-off. Amnesis is an embeddings-backed memory plane that injects your nodes and vectors into inference so answers are conditioned on preserved, attributable evidence, with truth preservation enforced in the system. That is better RAG (right material), stable RAG (versioned checkpoints, measured drift), and truthful RAG (provable chain to sources)—perfect recall of what you governed, while the base LLM stays unchanged when facts and policy evolve.
Enterprise Use Cases
Legal document analysis with exact citation preservation. Financial reporting with auditable fact chains. Medical records with versioned knowledge states. Regulatory compliance with full provenance tracking. Research documentation with deterministic retrieval. Contract analysis with truth-preserving summaries. Knowledge bases that must maintain accuracy over time without silent degradation. Any use case where "good enough" retrieval is insufficient and exact fact preservation is required.