Beyond RAG: The Fix Isn't a Better Prompt

Adam Rutkowski
April 14, 2026
8 min read
ai · rag · rag-alternative · document-search · entity-extraction · beyond-rag

You read the first article, and maybe the second. Somewhere around the middle of the second one, you were waiting for me to give you the solution. Maybe you wondered whether I had a better prompt template, or had discovered a setting nobody ever turns on, or knew of one weird tool that makes it all work.

In truth, there is no prompt trick, special setting, or weird tool. My plan was to work through the issues around this topic across multiple articles. But a LinkedIn reader nudged me, kindly but pointedly, to get to the big reveal. That reader was right, and this is an important problem that very few people have actually solved. So let's finish the discovery together.

The reason your AI can’t find your documents, and can’t cite its sources, and gets worse the more files you give it, is not a problem you can fix at the prompt. It is not a problem you can fix with a bigger context window, or an agentic loop, or a smarter chunking strategy, or a reranker bolted onto the end of the pipeline. All of those are patches. They get you from “broken” to “broken in a different way”. None of them gets you to “problem solved”.

What it takes to actually work is a different architecture. Not a different prompt, or a different model. Not a different vector database. What is needed is a different foundation. And the reason you probably haven’t seen it yet is that building it takes years of engineering, and the AI industry spent the last three years raising capital on the easier thing.


The Fixes People Try First

Before I tell you what works, let me knock down the four things people try when they realize RAG is letting them down. You’ve probably tried at least one of these. You may be in the middle of evaluating a vendor right now who is promising you one of them.

Better prompts. You write a longer system prompt. You add instructions like “only answer if you are confident the source supports the claim” or “cite the exact page number.” The AI ignores you when it feels like ignoring you. Prompts are suggestions, not guarantees. The model’s confidence is unrelated to whether the underlying retrieval actually found the right fragment. You are politely asking a system that retrieves by mathematical similarity to stop doing what it was built to do.

Bigger context windows. A new model launches with a one-million-token context, and the pitch is that you can just put all your documents in the prompt and skip retrieval entirely. One million tokens sounds like a lot until you do the arithmetic: ten thousand documents averaging five pages each, at a typical few hundred tokens per page, is tens of millions of tokens, dozens of windows' worth. And even within the window, models exhibit a well-documented "lost in the middle" effect, paying less attention to information buried in the middle of long contexts. A bigger window is a bigger room to lose things in, not a fundamentally different way of finding them.

Agentic RAG. The model runs retrieval, reads the results, decides it needs to search again with a different query, runs another retrieval, and so on. This is sold as the cure for RAG’s retrieval problems, but is actually just doing RAG multiple times. If the underlying retrieval returns the wrong chunks because vector similarity doesn’t match your intent, doing it ten times gives you ten chances to retrieve the wrong chunks. The errors compound. The latency explodes. The bill gets bigger.
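To see why repeated retrieval compounds error rather than cancels it, a back-of-envelope calculation helps. The 80% per-step success rate below is a hypothetical figure for illustration, not a benchmark:

```python
# Hypothetical: if a single retrieval step surfaces the right chunks 80% of
# the time, the chance that an agentic loop of n dependent steps never pulls
# a wrong chunk shrinks geometrically. The 0.8 figure is an assumption.
single_step_success = 0.8
for n in (1, 3, 10):
    print(n, round(single_step_success ** n, 3))
# 1 step:  0.8
# 3 steps: 0.512
# 10 steps: 0.107
```

Ten rounds of retrieval on top of an 80%-accurate retriever leaves roughly a one-in-ten chance that every step went right.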

Better chunking / hybrid search / rerankers. The menu here is long: smaller chunks, overlapping chunks, semantic chunks, parent-child chunks, BM25 blended with vector similarity, or a reranking model that sorts the top 100 results before feeding the top 5 to the LLM. These help at the margins. They do not change the fact that your document has been shredded into fragments and that the system is trying to reconstruct meaning from those fragments. A reranker picking the best of ten bad fragments is still picking bad fragments.


Why None of That Works

Read the list again and notice something: every one of these fixes happens at query time. You ask a question, the system does something clever to try to find the right answer, and then it hands you a response.

The problem is not at query time. The problem happened before you ever asked anything.

The moment your document got chunked and turned into vectors, its structure was destroyed. Headings were separated from their paragraphs. Tables were broken across chunk boundaries. Page numbers were lost. Section hierarchies were flattened. People, companies, phone numbers, and wallet addresses were all dissolved into statistical soup. By the time your question arrives, there is no document to search. There are only fragments to guess from.

No prompt can rebuild what the ingestion pipeline tore apart. No bigger context window can put the headings back on the paragraphs. No agentic loop can reconstruct the relationships between entities that were never extracted in the first place. You cannot patch retrieval into working when retrieval is the wrong tool for the job.


What Actually Works

Forget everything you’ve been told about how AI document tools work. Start from a blank page and ask what you would build if you wanted accuracy, citations, and scale all at the same time.

You would probably want four things, and none of them are vector similarity.

You would want full-text search on the actual text of the actual document. Not fragments. Not vectors. The words. When someone searches for a wallet address or a person’s name or a phrase, you want the system to find every document that literally contains those words. You want the results to point to the exact location in the exact document, because the search engine is matching real text against real text. You want response times in milliseconds, not seconds, because this is a solved problem and has been since Google launched. The technology for doing this right existed twenty-five years before anyone said the word “embedding.”
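As a minimal sketch of what document-level full-text search looks like, here is SQLite's FTS5 extension matching real text against real text. The table layout and file names are illustrative, not Ingestigate's actual schema:

```python
import sqlite3

# Index whole documents, not fragments, in an FTS5 virtual table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
conn.executemany(
    "INSERT INTO docs (title, body) VALUES (?, ?)",
    [
        ("kyc_filing.pdf", "Account opened by Maria Vance at Horizon branch"),
        ("statement_q3.pdf", "Wire transfer from Horizon Holdings LLC"),
    ],
)

# An exact-phrase query returns every document that literally contains the
# words: no embeddings, no nearest-neighbor approximation.
rows = conn.execute(
    "SELECT title FROM docs WHERE docs MATCH ?", ('"Horizon Holdings"',)
).fetchall()
print(rows)  # [('statement_q3.pdf',)]
```

Because the match is literal, the hit can be highlighted at its exact location in the source document, which is what makes the citation verifiable.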

You would want structured entity extraction during ingestion, not after. Every person, organization, email address, phone number, wallet address, username — extracted from every document as the document comes in, stored as rows in a real database. Not inferred at query time from fragments the model happens to see. Extracted once, stored with the source document, queryable like any other structured data. If someone asks “show me every document that mentions this person,” the answer is a database query, not a vector search hoping for the best.
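The ingest-time extraction idea can be sketched in a few lines. The regexes below are crude stand-ins for a real extraction pipeline, and the table schema is an assumption for illustration:

```python
import re
import sqlite3

# Toy extractors; a production pipeline would use far more robust recognizers.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
BTC_WALLET = re.compile(r"\b(?:bc1|[13])[a-zA-HJ-NP-Z0-9]{25,39}\b")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entities (doc TEXT, kind TEXT, value TEXT)")

def ingest(doc_name, text):
    """Extract entities once, at ingestion, and store them as database rows."""
    for kind, pattern in (("email", EMAIL), ("wallet", BTC_WALLET)):
        for value in pattern.findall(text):
            conn.execute(
                "INSERT INTO entities VALUES (?, ?, ?)", (doc_name, kind, value)
            )

ingest("subpoena_response.txt",
       "Contact j.doe@example.com re wallet 1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2")

# "Show me every document that mentions this entity" is now a database query.
docs = conn.execute(
    "SELECT DISTINCT doc FROM entities WHERE value = ?",
    ("j.doe@example.com",),
).fetchall()
print(docs)  # [('subpoena_response.txt',)]
```

The point is the shape of the answer: a deterministic lookup over rows written at ingestion, not a similarity search hoping the right fragment surfaces.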

You would want a graph of the relationships between those entities. Two shell companies share a registered agent. A phone number appears in a KYC filing and in a corporate incorporation. An email address shows up across three separate matters. The graph surfaces these connections automatically because the entities are structured data and the relationships between them are the first thing you compute. When the AI needs to answer a question that requires connecting things across documents, it queries the graph, not a pile of fragments.
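The registered-agent example above can be sketched as a graph computation over structured entities. The records and field names here are fabricated for illustration:

```python
from collections import defaultdict

# Entities extracted at ingestion, stored as structured rows (fabricated data).
records = [
    {"company": "Horizon Holdings LLC", "registered_agent": "100 Main St, Suite 4"},
    {"company": "Apex Ventures LLC",    "registered_agent": "100 Main St, Suite 4"},
    {"company": "Blue Pine Trading",    "registered_agent": "77 Dock Rd"},
]

# Index companies by shared attribute, then emit an edge per shared value.
by_agent = defaultdict(list)
for r in records:
    by_agent[r["registered_agent"]].append(r["company"])

edges = [
    (a, b, agent)
    for agent, companies in by_agent.items()
    for i, a in enumerate(companies)
    for b in companies[i + 1:]
]
print(edges)
# [('Horizon Holdings LLC', 'Apex Ventures LLC', '100 Main St, Suite 4')]
```

Because the relationship is computed from structured data, the connection between the two shell companies exists before anyone thinks to ask about it.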

You would want access control that follows the document, not the chunk. Who is allowed to see what is a property of the source material, not an afterthought bolted onto the retrieval layer. When a user searches, the system decides which documents they are permitted to see before it returns any result. Ingestigate does this at the document level, because chunks do not carry ownership; documents do.
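The permission-before-search ordering can be sketched like this. The data model is illustrative, not Ingestigate's:

```python
# Permissions live on the whole document (illustrative, fabricated data).
DOCUMENTS = {
    "kyc_filing.pdf":    {"readers": {"alice", "bob"}},
    "statement_q3.pdf":  {"readers": {"alice"}},
    "hr_complaint.docx": {"readers": {"carol"}},
}

def visible_docs(user):
    """Decide which documents a user may see, before any search runs."""
    return {name for name, meta in DOCUMENTS.items() if user in meta["readers"]}

def search(user, query):
    # The permission filter runs first; the search logic never touches
    # documents the user is not allowed to see.
    allowed = visible_docs(user)
    return sorted(d for d in allowed if query in d)

print(search("alice", "pdf"))  # ['kyc_filing.pdf', 'statement_q3.pdf']
print(search("carol", "pdf"))  # []
```

Filtering whole documents up front is also what makes the access model auditable: there is one answer to "what can this user see," independent of any particular query.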

If you built all four of those things, you would have an architecture that gets better as you add more documents instead of worse. Citations that are anchored to text a human can read and verify. Relationships that surface automatically and are traceable. Access control you can actually audit.

And nobody would call it RAG, because it isn’t.


What That Looks Like In Practice

I’m going to show you, because at this point in the article you are rightly asking whether any of this is real or just another pitch.

The walkthrough is a fifteen-minute video that lives on the crypto compliance page, linked again at the end of this article. It covers a synthetic cartel-crypto laundering investigation called Operation Broken Circuit. The content is synthetic because we take personally identifiable information seriously: the people, wallets, shell companies, and bank accounts are all fabricated. What is not fake is the shape of the evidence: subpoena responses that arrive as mixed-format dumps, blockchain exports in parquet and JSON, bank statements in PDF and Excel, KYC documents, communications logs, corporate filings. The kind of evidence pile that in a real compliance investigation would take a team weeks to organize before anyone could start investigating.

In it you will see a pile of those files become searchable in minutes. A search for a wallet address pulls every document that contains it, across every file type, with the match highlighted on the page. Entities such as people, companies, phone numbers, and wallet addresses are all extracted automatically and dropped onto a live graph. A shell company connection surfaces because two entities share a registered agent address. Also, because the video is not a slideshow, you can watch the actual response times on the screen. Nothing is sped up.

It is fifteen minutes. It is worth fifteen minutes. If you only have time for the first three, start at the ingestion step and watch the file types scroll by.


The Name of the Thing

The platform in the video is called Ingestigate. I built it because I spent fourteen years processing evidence in federal criminal investigations and I got tired of watching talented people lose weeks to evidence triage that a reasonable architecture would handle in minutes. The enterprise platforms I was issued required months of approvals to upload a single dataset. The AI tools that started showing up around 2023 promised to help and delivered exactly the experience the first two articles described: chunked documents, hallucinated citations, results that degraded with scale.

So I built the thing I wanted to use. Full-text search on the actual text. Entity extraction into structured data during ingestion. A native graph database mapping relationships. Owner-controlled access down to the investigation. The SaaS version starts at $49 per month. The on-premise version runs air-gapped for organizations that need it.

Is there a solution to the problems RAG creates? Yes, and honestly, that solution is Ingestigate. Building it took years of engineering and is not something most teams can replicate. But benefiting from what it has solved is something everyone can do.


A Door

If the kind of work you do involves documents where accuracy matters and citations have to hold up, the best next step is fifteen minutes with the walkthrough on the crypto compliance page. The platform is explained in context there.

If you work in legal, insurance, corporate, or compliance instead of financial crimes, the architecture is the same, only the cases change. The solutions pages walk through it in the language of your work: law firms, insurance, corporate legal, and law enforcement.

Or put your own documents in and see what happens. The free trial is fourteen days and does not ask for a credit card.

And if you read the first two articles and felt strung along, thank you for telling me. You were right. The answer deserved to come sooner than article ten. Now it has.


This is the third and final article in the Beyond RAG series. Start at the beginning with Your AI Can’t Find Your Documents or read the previous installment, Your AI Is Hallucinating Its Sources.