Three Architectures for Grounded Retrieval

Extract-First, Retrieve-First, Look-Up-First

A working paper from ARAMAI · April 2026

—

1. The moment we're in

Within the span of a few months, the GraphRAG conversation has gone from a niche pattern to a category. Microsoft shipped GraphRAG. Neo4j released its GraphRAG Python package. Tencent's Youtu Lab published a "vertically unified" GraphRAG agent paradigm. Samyama announced a unified graph-vector engine with agentic enrichment. Cardiff University's Wassim Jabi added a GraphRAG class to TopologicPy. Qdrant has documented hybrid GraphRAG patterns. Cloudflare and Anthropic are shipping primitives that — explicitly or implicitly — assume something like graph-backed grounding underneath.

The consensus is real. The disagreement is buried.

Underneath the converging language, three distinct architectural stances are visible. They are not three implementations of the same idea. They are three different answers to a deeper question: where does authority live in an AI system? And because hallucination is at root an authority problem, the answer determines how the system fails — not whether.

This paper names the three stances, draws them, and proposes a frame for choosing between them. The names we'll use are:

Extract-first — the LLM constructs the graph, then the system retrieves from it.
Retrieve-first — the graph (or vector store) exists already; the system retrieves chunks and the LLM synthesizes.
Look-up-first — schema authority is queried first, and a measured sufficiency threshold gates whether the LLM fires at all.

The three are not substitutes. They are complementary, but only when the system knows which one is governing at any given moment. We'll close with that synthesis.

—

2. Why "where authority lives" is the right frame

Most architectural debates in this space proceed in the wrong currency. They argue about retrieval mechanics — vector vs. graph, dense vs. sparse, hybrid weighting, reranker choice. These matter, but they're downstream. The upstream question is older and simpler: when the system commits to what it considers true, who or what made that call?

Three candidates are on offer:

The model itself. The system trusts what the LLM produced — at extraction time, at synthesis time, or both. Authority is constructed by inference.
The substrate. The system retrieves something previously stored and lets the LLM use it as advisory context. Authority exists but doesn't constrain generation.
The schema. The system queries a formal authority before the LLM is invited into the loop, and only invites it when authority is measurably insufficient. Authority governs generation.

The choice between these isn't aesthetic. It determines where a deployed system can be wrong without knowing it. Plausibility is not verifiability. A confidently rendered answer that is structurally ungrounded fails differently — and worse — than a refusal grounded in declared insufficiency.

This is the heart of the principle our work has been organized around: look up before you make up. The principle is older than ARAMAI, but most of the field treats it as advice. We've been treating it as architecture.

—

3. Architecture 1: Extract-First

Pattern in plain terms: Take a corpus of unstructured text. Send it to an LLM. Ask the LLM to identify entities and relationships. Store what it produces as a graph. At query time, retrieve a subgraph and send it back to an LLM (the same one or a different one) for synthesis.
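The ingestion half of this pattern can be sketched in a few lines. This is an illustration under stated assumptions, not any vendor's implementation: `fake_llm_extract` is a hypothetical stand-in for an LLM extraction call, and the corpus is invented. What it is meant to show is that whatever the extractor emits becomes the graph, with nothing upstream to contradict it.

```python
from collections import defaultdict

Triple = tuple[str, str, str]  # (subject, relation, object)


def fake_llm_extract(sentence: str) -> list[Triple]:
    """Hypothetical stand-in for an LLM extraction call.

    Splits 'A <verb> B' sentences on the first relation word it recognizes.
    A real extractor is a model call and can be wrong in subtler ways.
    """
    for relation in ("inhibits", "treats", "causes"):
        marker = f" {relation} "
        if marker in sentence:
            subject, obj = sentence.split(marker, 1)
            return [(subject.strip(), relation, obj.strip(" ."))]
    return []


def build_graph(corpus: list[str]) -> dict[str, list[tuple[str, str]]]:
    """Store every extracted triple as an edge. Nothing is checked."""
    graph = defaultdict(list)
    for sentence in corpus:
        for subject, relation, obj in fake_llm_extract(sentence):
            graph[subject].append((relation, obj))
    return dict(graph)


def retrieve_subgraph(graph, entity: str) -> list[Triple]:
    """Query-time retrieval: return the edges leaving one entity."""
    return [(entity, rel, obj) for rel, obj in graph.get(entity, [])]


corpus = [
    "Aspirin inhibits COX-1.",
    "Aspirin treats headache.",
    "It is not true that aspirin causes insomnia.",  # negated claim
]
graph = build_graph(corpus)
print(retrieve_subgraph(graph, "Aspirin"))
print(graph["It is not true that aspirin"])
```

In this toy run the negated third sentence is stored as a positive `causes` edge under a malformed subject. The graph now holds it as a fact, and no later stage has grounds to question it.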

The reference implementations are well known. Microsoft's GraphRAG popularized the pattern at scale: entity extraction, community detection, hierarchical summarization, then retrieval at multiple levels of abstraction. TopologicPy's v0.9.22 GraphRAG class offers a more deliberate variant, with node-by-node construction and optional human-in-the-loop review. Tencent's Youtu-GraphRAG generalizes the approach with a "seed schema" that bounds what the extraction agent looks for, and a retrieval agent that interprets the same schema at query time. The variations differ in care, but the load-bearing assumption is shared: the LLM is the structuring agent.

Where authority lives: constructed by the LLM during ingestion. The graph is not consulted as a prior — it is a precipitate of inference, and the system has no independent reason to disagree with what it produced.

What this architecture is good at. Discovery in domains with no prior structure. Sense-making across previously unindexed corpora. Generating a "first-pass" graph over a body of literature where no formal schema exists.

Failure modes that don't go away. Three are structural: (1) extraction errors crystallize into "facts" the system cannot disagree with; (2) the loop is closed, so the system never disagrees with the authority it constructed; (3) human-in-the-loop review helps but doesn't scale, and the reviewer is checking the LLM's output, not ground truth.

Where it belongs. First-pass discovery in genuinely unstructured domains. Treat the output as draft, not as authority.

—

4. Architecture 2: Retrieve-First

Pattern in plain terms: A graph or vector store already exists. At query time, embed or parse the query, retrieve relevant chunks or subgraphs, and send the retrieved context to an LLM along with the original question. The LLM synthesizes the answer.
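A minimal sketch of that flow, assuming a toy word-overlap scorer in place of real embeddings. The function names, the scorer, and the sample chunks are all hypothetical. The point of the sketch is the shape of the pipeline: a prompt is assembled whether or not retrieval found anything relevant.

```python
def tokenize(text: str) -> set[str]:
    """Lowercase word set with trailing punctuation stripped."""
    return {word.strip(".,?").lower() for word in text.split()}


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Rank chunks by Jaccard overlap with the query.

    Toy stand-in for embedding similarity; returns (score, chunk) pairs.
    """
    q = tokenize(query)
    scored = []
    for chunk in chunks:
        c = tokenize(chunk)
        union = q | c
        score = len(q & c) / len(union) if union else 0.0
        scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(query: str, retrieved: list[tuple[float, str]]) -> str:
    """Assemble the synthesis prompt. Note that scores are not consulted."""
    context = "\n".join(chunk for _, chunk in retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}"


chunks = [
    "The refund window is 30 days.",
    "Shipping takes five business days.",
]

on_topic = retrieve("What is the refund window?", chunks)
off_topic = retrieve("Who founded Carthage?", chunks)

print(on_topic[0])                                   # best match for a covered question
print(build_prompt("Who founded Carthage?", off_topic))  # still a well-formed prompt
```

The off-topic query scores zero against every chunk, yet the prompt that reaches the model looks the same as a well-grounded one. Nothing in the pipeline forces the model, or the user, to notice the difference.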

This is the architectural workhorse of contemporary RAG. Classic vector RAG sits here. Most production "GraphRAG" deployments — including Neo4j's GraphRAG Python package retrieval flows and the various Qdrant + graph hybrids — are retrieve-first.

Where authority lives: in the substrate, but not at the point of generation. The retrieved context is advisory. The LLM remains free to ignore it, partially use it, or extrapolate beyond it.

Failure modes that don't go away. (1) A production accuracy ceiling of 68% on schema-dependent workloads. The Sage Bionetworks deployment quantified the gap: moving off chunked retrieval cut errors by 87%. (2) No principled "I don't know": an LLM given retrieved context will always synthesize something, and the user cannot distinguish a grounded answer from one reconstructed when retrieval was thin.

A subtler failure mode is what the SCR formal specification calls the component versus chunk problem. Algorithmic chunk boundaries destroy semantic structure regardless of retrieval quality.
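To make the component versus chunk distinction concrete, here is a small sketch. The records, the chunk size, and both function names are invented for illustration; the SCR specification is not quoted here. It compares cutting text every N characters with cutting on record boundaries.

```python
def fixed_chunks(text: str, size: int) -> list[str]:
    """Algorithmic chunking: cut every `size` characters, ignoring structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def component_chunks(text: str) -> list[str]:
    """Component chunking: one unit per record (here, one record per line)."""
    return [line for line in text.splitlines() if line]


# Hypothetical structured records, one per line.
records = [
    "part=A-100; fits=Model-X; max_torque_nm=45",
    "part=B-200; fits=Model-Y; max_torque_nm=60",
]
text = "\n".join(records)

chunks = fixed_chunks(text, size=32)
components = component_chunks(text)

for chunk in chunks:
    print(repr(chunk))
```

With a 32-character window, neither record survives whole. The middle chunk carries the tail of the first record's torque value next to the start of the second record, so a retriever that returns it, however well ranked, hands the model a pairing that never existed in the source.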

Where it belongs. Augmentation tasks. Treat retrieve-first as a useful tool with a known accuracy ceiling, not as a verification architecture.

—

5. Architecture 3: Look-Up-First

Pattern in plain terms: A query arrives. The system first consults schema and graph authority. A formal sufficiency check determines whether the authority is enough to answer the question deterministically. If it is, the answer comes from authority directly, with provenance. If it isn't, the system declares the deficit explicitly and either refuses or invokes the LLM with the deficit declared, verifying the LLM's output back against the schema before returning it.
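The control flow described above can be sketched as follows. This is a simplified illustration, not the SCR or SAFE implementation: `look_up_first`, `Result`, the dictionary-based authority, and the predicate-based schema are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    status: str                                    # "answered", "refused", or "llm_verified"
    values: dict = field(default_factory=dict)
    provenance: dict = field(default_factory=dict)
    deficit: list = field(default_factory=list)    # fields authority could not supply


def look_up_first(required, authority, schema, llm=None) -> Result:
    """Consult authority first; involve the LLM only for a declared deficit.

    required:  list of field names the question needs
    authority: field -> (value, source)
    schema:    field -> predicate that validates a value
    llm:       optional callable, deficit list -> {field: proposed value}
    """
    values = {k: authority[k][0] for k in required if k in authority}
    provenance = {k: authority[k][1] for k in required if k in authority}
    deficit = [k for k in required if k not in authority]

    if not deficit:
        return Result("answered", values, provenance)
    if llm is None:
        return Result("refused", values, provenance, deficit)

    # The LLM is invoked with the deficit declared; its output is checked
    # against the schema before anything is returned.
    proposed = llm(deficit)
    if not all(k in proposed and schema[k](proposed[k]) for k in deficit):
        return Result("refused", values, provenance, deficit)
    for k in deficit:
        values[k] = proposed[k]
        provenance[k] = "llm (schema-verified)"
    return Result("llm_verified", values, provenance, deficit)


# Hypothetical authority and schema for illustration.
authority = {"voltage": (230, "spec-sheet#3.1")}
schema = {
    "voltage": lambda v: isinstance(v, int),
    "frequency_hz": lambda v: v in (50, 60),
}

print(look_up_first(["voltage"], authority, schema))
print(look_up_first(["voltage", "frequency_hz"], authority, schema))
print(look_up_first(["voltage", "frequency_hz"], authority, schema,
                    llm=lambda deficit: {"frequency_hz": 55}))
```

The third call shows the verification step doing its job: the stand-in model proposes a value the schema does not allow, so the system refuses and reports the deficit rather than returning it.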

This is the architecture our work has been organized around. Its component pieces are SCR (Structured Context Retrieval), ROSETTA (schema orchestration), SAFE (schema-in-the-loop), EDFL/ISR (mathematical sufficiency), and BNS Tripartite Verification (L1 SHACL + L2 ISR/EDFL + L3 SMT/Z3). The governing principle is FP-002: Look Up Before You Make Up.

Where authority lives: prior, governing, and measurable. Schema authority is the structural harness that makes self-invalidation non-optional (Forest Mars formulation, March 2026).

Production evidence. Mayo Clinic, PayPal, Microsoft, Adobe, Cisco, UnitedHealthcare, Eli Lilly: 92% accuracy versus the 68% retrieve-first ceiling, and a 90% reduction in hallucinations.

Costs. (1) Requires structured authority — cannot start where no schema exists. (2) Demands product-level honesty about insufficiency: calibrated refusal as a feature, not a bug.

Where it belongs. Anywhere "the system was confident" is not enough. Anywhere provenance is required. Anywhere verifiability is the threshold property.

—

6. The unifying view

The three architectures are most cleanly seen along a single axis: when does authority enter the loop?

Extract-first places authority after generation.
Retrieve-first places authority around generation.
Look-up-first places authority before generation.

The choice is not academic. It determines, structurally, where the system can be wrong without knowing it. Extract-first systems are wrong about the structure they constructed and have no way to detect it. Retrieve-first systems are wrong about the synthesis the LLM produced from the retrieval. Look-up-first systems are wrong only when the schema is wrong — which is auditable, governable, and improvable as a first-class concern.

—

7. ISR — the measurement that makes look-up-first rigorous

ISR = (K_available + K_fetched) / K_required

When ISR ≥ 1.0, the schema-grounded retrieval is sufficient and the system can answer deterministically. When ISR < 1.0, the system has measured its own deficit and must either enrich (fetch more) or refuse. There is no "make something up to fill the gap" path, because the gap is named.
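As a worked illustration of the ratio and the gate, the sketch below assumes the three K terms are simple counts of knowledge items (the paper does not fix their units), and the names `isr` and `gate` are hypothetical.

```python
def isr(k_available: int, k_fetched: int, k_required: int) -> float:
    """ISR = (K_available + K_fetched) / K_required."""
    if k_required <= 0:
        raise ValueError("k_required must be positive")
    return (k_available + k_fetched) / k_required


def gate(k_available: int, k_fetched: int, k_required: int,
         can_fetch_more: bool) -> str:
    """Return the only actions the text allows: answer, enrich, or refuse."""
    if isr(k_available, k_fetched, k_required) >= 1.0:
        return "answer"
    return "enrich" if can_fetch_more else "refuse"


# A question needs 4 items; 3 are on hand and 1 was fetched: (3 + 1) / 4.
print(isr(3, 1, 4), gate(3, 1, 4, can_fetch_more=False))

# Only 2 on hand and 1 fetched: (2 + 1) / 4, a measured deficit.
print(isr(2, 1, 4), gate(2, 1, 4, can_fetch_more=True))
print(isr(2, 1, 4), gate(2, 1, 4, can_fetch_more=False))
```

Note that `gate` has no branch that returns a generated answer below the threshold. The deficit is a number, so the choice reduces to fetching more or refusing.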

ISR is one layer in a broader verification stack, BNS Tripartite Verification: L1 Symbolic (SHACL / ShEx) validates that the shape is structurally correct; L2 Bayesian (ISR / EDFL) validates that the pattern is statistically plausible given priors; L3 Formal (SMT / Z3) validates that the full chain is logically consistent.
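The layering can be pictured as a sequence of checks that a candidate answer must pass in order. The sketch below is only a structural illustration: each layer is a plain-Python stand-in (no SHACL engine, no EDFL computation, no Z3 solver is invoked), the fail-fast ordering is our assumption, and every name in it is hypothetical.

```python
from typing import Callable, NamedTuple, Optional


class Verdict(NamedTuple):
    passed: bool
    failed_layer: Optional[str]
    reason: str


Check = Callable[[dict], tuple[bool, str]]


def verify(candidate: dict, layers: list[tuple[str, Check]]) -> Verdict:
    """Run layers in order and stop at the first failure (assumed policy)."""
    for name, check in layers:
        ok, reason = check(candidate)
        if not ok:
            return Verdict(False, name, reason)
    return Verdict(True, None, "all layers passed")


def l1_shape(c: dict) -> tuple[bool, str]:
    # Stand-in for a SHACL / ShEx shape check: are the fields well-typed?
    ok = isinstance(c.get("start_year"), int) and isinstance(c.get("end_year"), int)
    return ok, "start_year and end_year must be integers"


def l2_sufficiency(c: dict) -> tuple[bool, str]:
    # Stand-in for the ISR / EDFL layer: a bare threshold on a supplied score.
    return c.get("isr", 0.0) >= 1.0, "ISR below 1.0"


def l3_consistency(c: dict) -> tuple[bool, str]:
    # Stand-in for an SMT / Z3 check: one hand-written logical constraint.
    return c["start_year"] <= c["end_year"], "start_year must not exceed end_year"


LAYERS = [
    ("L1 symbolic", l1_shape),
    ("L2 sufficiency", l2_sufficiency),
    ("L3 formal", l3_consistency),
]

print(verify({"start_year": 2020, "end_year": 2024, "isr": 1.0}, LAYERS))
print(verify({"start_year": 2025, "end_year": 2024, "isr": 1.0}, LAYERS))
```

The second candidate is well-shaped and sufficient but logically inconsistent, so it fails only at the third layer. Each layer catches a class of error the others would pass.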

The look-up-first architecture requires ISR. Without it, "schema-first" collapses back into "schema-flavored" — schema as decoration rather than as authority.

—

8. The deeper synthesis

The three architectures map onto a deeper triad: K-ANABASIS-TRIAD — Look Up / Learn Up / Make Up.

A mature system uses all three. They are not in competition. More learning creates more authority to look up. More looking up creates behavioral data to learn from. Less making-up means higher accuracy, more trust, more adoption, more learning.

Extract-first over-indexes on Learn Up + Make Up — closed loop, no genuine prior.
Retrieve-first over-indexes on Make Up — authority unenforced.
Look-up-first, when paired with ANABASIS as a learning layer, realizes the full triad — authority governs, grows, and yields to inference only when measurably insufficient.

The mistake the field is making is presenting one architecture as the answer rather than as a part of an answer. The discipline is to know which architecture is governing at any given moment, and to ensure the highest-stakes paths are governed by the most authority-respecting one.

—

9. Practical implications

High-stakes, regulated, or verifiable domain? Look-up-first is required.
Greenfield with no schema? Extract-first as first pass, with intent to graduate.
Augmentation where users verify by inspection? Retrieve-first acceptable. Be honest about its ceiling.
Mature system spanning all three? Use them in their proper layer. Always know which is governing.

T-SchemaContext frames our position: foundational to context retrieval tools, not competitive with them. The market is converging on context retrieval. The unsolved problem is context governance.

—

10. Closing

"Look up before you make up" is not a slogan. It is an architectural commitment about where authority lives in an AI system. The next decade of AI will be defined less by model capability than by how systems answer the authority question — at scale, in production, under regulatory and reputational pressure.

The principle is older than ARAMAI. The practice of operationalizing it is what we've been doing the longest.