Under the Hood

How This Works

A complete retrieval pipeline with no servers, no API keys, and no tracking. Your question never leaves your browser. Here is the full engineering story.

§1 The Pipeline

1. Chunk

The Declaration, the Constitution, the Bill of Rights, and Federalist Nos. 10, 51 & 78 are parsed into 91 citation-ready passages along the documents' own structure (more on why below).

2. Embed

Each passage is encoded offline into a 384-dimensional vector with all-MiniLM-L6-v2 (mean pooling, L2-normalized). The whole searchable index ships as one static file under half a megabyte. At this scale there is no need for a vector database, and that is a feature, not a shortcut.
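To make "mean pooling, L2-normalized" concrete, here is a minimal sketch in plain Python. It is not the site's build code: the function names, the toy 4-dimension token vectors, and the attention mask are all assumptions for illustration. The real model produces 384 values per token.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask is 1, ignoring padding tokens."""
    dim = len(token_vectors[0])
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention mask excludes every token")
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def l2_normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]

# Toy input (hypothetical): three token vectors, the last one is padding.
tokens = [
    [1.0, 0.0, 2.0, 0.0],
    [3.0, 0.0, 2.0, 0.0],
    [9.0, 9.0, 9.0, 9.0],
]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)       # [2.0, 0.0, 2.0, 0.0]
embedding = l2_normalize(pooled)       # unit-length passage vector
```

Normalizing once at build time is what lets the search step skip the division that a full cosine similarity would otherwise need.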

3. Search

Your question is embedded in this browser tab by the same model, running as quantized ONNX via Transformers.js. Because all vectors are normalized, cosine similarity reduces to a dot product: 91 × 384 = 34,944 multiplications, done in under a millisecond. The same query embedding is also matched against a set of curated questions; when it lands close to one, you get a human-written short answer above the passages.
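The search step can be sketched as a brute-force ranking by dot product, plus a threshold check against the curated questions. This is an illustration in Python rather than the browser code the site runs. The `Entry` type, the 3-dimension toy vectors, the labels, and the 0.85 threshold are all hypothetical; the source does not state what cutoff it uses.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    label: str
    vector: list  # assumed to be L2-normalized already

def dot(a, b):
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))

def top_k(query, entries, k=3):
    """Score every entry against the query and return the k best (score, label) pairs."""
    scored = [(dot(query, e.vector), e.label) for e in entries]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

def match_curated(query, curated, threshold=0.85):
    """Return the closest curated question's label, or None if it is not close enough."""
    best = top_k(query, curated, k=1)
    if best and best[0][0] >= threshold:
        return best[0][1]
    return None

# Toy index (hypothetical vectors; real ones have 384 dimensions).
passages = [
    Entry("passage-a", [1.0, 0.0, 0.0]),
    Entry("passage-b", [0.0, 1.0, 0.0]),
    Entry("passage-c", [0.6, 0.8, 0.0]),
]
curated = [
    Entry("curated-question-1", [1.0, 0.0, 0.0]),
    Entry("curated-question-2", [0.6, 0.8, 0.0]),
]

query = [0.8, 0.6, 0.0]                  # unit-length query embedding
results = top_k(query, passages, k=2)    # passage-c first, then passage-a
answer = match_curated(query, curated)   # "curated-question-2"
```

With only 91 passages, a full scan like this is simpler than any approximate index and always returns the exact nearest neighbors.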

4. Cite

The top passages are returned with their source, section, and year. No generation step means no hallucination: every quoted word comes from a text dated 1776, 1787, 1788, or 1791.

§2 Why 91 Passages, Not 500-Token Windows

The default way to chunk documents for retrieval is mechanical: slice the text into fixed windows of a few hundred tokens, with some overlap so nothing falls between the cracks. It works, and it is also why so many retrieval systems cite "chunk 47" and hand you a passage that starts mid-sentence.

These documents deserve better, and they make it easy. The founding documents carry their own semantic boundaries: each grievance in the Declaration is one complete complaint against the king. Each section of the Constitution is one power or one limit. Each amendment is one right. Chunk along those seams and every unit of retrieval is a complete thought with a name a human already understands.

That is why a result here says Article I, Section 8 or Grievance 17 instead of a byte offset. A journalist can quote it. A teacher can assign it. A lawyer would recognize the citation. The retrieval system speaks the same language as the document it searches.

The transferable rule: before reaching for a token splitter, ask what the document's own atomic unit is. Contracts have clauses. API docs have endpoints. Runbooks have steps. Structure-aware chunking costs one afternoon of parsing and pays out on every single query.
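As a rough illustration of chunking along a document's own seams, the sketch below splits text at heading lines and keeps each heading as the citation. The heading pattern, the `Passage` type, and the sample layout are assumptions made for this example; the project's actual parser may work differently.

```python
import re
from dataclasses import dataclass

@dataclass
class Passage:
    citation: str  # human-readable name, e.g. "Amendment X"
    text: str

# Assumed heading format for this sketch: a heading sits alone on its line.
HEADING = re.compile(r"^(Article [IVX]+, Section \d+|Amendment [IVX]+)$")

def chunk_by_structure(raw):
    """Split text at heading lines; each heading names the passage that follows it."""
    passages = []
    citation, body = None, []
    for line in raw.splitlines():
        line = line.strip()
        if HEADING.match(line):
            if citation and body:
                passages.append(Passage(citation, " ".join(body)))
            citation, body = line, []
        elif line and citation:
            body.append(line)
    if citation and body:
        passages.append(Passage(citation, " ".join(body)))
    return passages

SAMPLE = """\
Article I, Section 1
All legislative Powers herein granted shall be vested in a Congress
of the United States, which shall consist of a Senate and House of
Representatives.

Amendment X
The powers not delegated to the United States by the Constitution,
nor prohibited by it to the States, are reserved to the States
respectively, or to the people.
"""

passages = chunk_by_structure(SAMPLE)
for p in passages:
    print(f"{p.citation}: {p.text[:40]}...")
```

Every passage starts and ends on a sentence boundary, and its label is a citation a reader can look up, which a fixed-window splitter cannot guarantee.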

§3 The Economics of Zero

First, precision: this is not a "local LLM." It is a local embedding model, about 25 MB of quantized ONNX, running in your browser tab. There is no text generation anywhere in the system, which is why it cannot hallucinate and why the numbers below look the way they do.

| Cost line | Typical hosted RAG | This site |
| --- | --- | --- |
| Query embedding | API call, metered per token | $0 (computed in your browser) |
| Vector search | Hosted vector DB, monthly fee | $0 (a dot product over a static file) |
| Answer generation | LLM call, cents per query | $0 (curated text, written once) |
| Keys & rate limits | Keys to protect, quotas to hit | None exist |
| Cost if it goes viral | Scales with every visitor | Flat. CDN serves static files |

The only real cost is the one-time model download on a visitor's first search, cached by the browser afterward. That is the honest tradeoff, and for a public demo it is the right one: a demo that costs money per query dies the day it goes viral. This one cannot, because its marginal cost per query is zero at ten visitors and zero at ten million.

§4 When to Use This Pattern, and When Not To

Use it for small, public, read-heavy corpora: documentation sites, legal and policy texts, product manuals, FAQ knowledge bases. Anywhere under a few thousand chunks where the content is not secret and the questions repeat.

Do not use it for private data (the whole index ships to every visitor), for large corpora (the index download outgrows its welcome), or when users need generated prose rather than retrieved passages. Engineering maturity is matching the architecture to the problem, not to the hype cycle.
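A back-of-the-envelope estimate shows where the index download starts to hurt. The sketch assumes vectors are stored as unquantized 32-bit floats and counts only the raw vector data, not passage text or metadata. The site's actual storage format is not stated, so treat the numbers as an upper-bound illustration.

```python
def index_size_bytes(num_chunks, dims=384, bytes_per_value=4):
    """Raw vector payload: one value per dimension per chunk (float32 assumed)."""
    return num_chunks * dims * bytes_per_value

for n in (91, 5_000, 100_000):
    size = index_size_bytes(n)
    print(f"{n:>7} chunks -> {size / 1_000_000:.2f} MB of raw vectors")

# Output:
#      91 chunks -> 0.14 MB of raw vectors
#    5000 chunks -> 7.68 MB of raw vectors
#  100000 chunks -> 153.60 MB of raw vectors
```

At 91 chunks the vectors are a rounding error next to the model download. At a hundred thousand chunks, every visitor would be pulling an index larger than most web pages should ever send.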

§5 What Is Curated vs. Computed

Honesty about the seams: the AI here does retrieval and question-matching, nothing else. The plain-words explainers under each passage, and the short answers for common questions, were written by a person at build time and shipped as static text. The founders' words are quoted exactly from the public-domain Project Gutenberg editions. Each layer is labeled in the interface, so you always know who is talking: 1776, or 2026.

✦ ✦ ✦

Retrieval that citizens can quote, at a cost that virality cannot kill. The architecture is the product decision.

The entire build is open source: github.com/swapniltamse/ask-the-declaration