Making the Epstein Files Searchable — Why I Built This and How It Works

Adam Rutkowski
February 6, 2026
Tags: transparency, investigations, entity-extraction, search, epstein-files

On January 30, 2026, the Department of Justice released over 3 million pages of Epstein-related files to the public. It was a significant step — the result of the Epstein Files Transparency Act, which passed the House 427-1 and was signed into law in November 2025.

The DOJ made the files available on its website. But the site’s search tools and the quality of its text extraction don’t make the information easily accessible to the average person. Three million pages of PDFs and scanned documents need fast search and reliable text extraction to be truly useful.

I built a tool to fix that. It’s free, and it’s live now.

Search the Epstein Files →


Who I Am

I’m Adam Rutkowski. I spent over 14 years as a Federal Special Agent, where I was the lead agent on what became the largest cryptocurrency enforcement case in U.S. history, an investigation that involved multiple petabytes of data. I am named in a Washington Post feature on the unit’s work.

After leaving federal service, I built Ingestigate — a data exploration and investigation platform designed for processing massive volumes of mixed-format files. The platform does what I spent my career needing a tool to do: take mountains of documents and make them searchable, extract the important information automatically, and let users explore how that information connects.


What I Did With the Epstein Files

When the DOJ released the files, I loaded them into Ingestigate. Here’s what the platform does with them:

Search in milliseconds. Every ingested document is fully searchable. Type any name, phone number, email address, or keyword and get results in under a second.
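
For readers curious how full-text search can answer this quickly, the usual technique is an inverted index: a map from each term to the documents that contain it, built once at ingestion time. The sketch below only illustrates that general idea and is not Ingestigate’s actual implementation; the function names (`tokenize`, `build_index`, `search`) and the sample documents are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return IDs of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical sample documents, for illustration only.
docs = {
    "doc-1": "Flight log for the March trip",
    "doc-2": "Email about the flight schedule",
    "doc-3": "Invoice for office supplies",
}
index = build_index(docs)
matches = search(index, "flight log")  # only doc-1 contains both terms
```

Because the work of reading every document happens once, at indexing time, each query reduces to a few set lookups and intersections rather than a scan of the full corpus.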

Automatic entity extraction. During ingestion, a custom entity extraction pipeline identifies every person, organization, email address, phone number, and other important identifiers in every document. This happens automatically — no manual tagging, no human review of each document. The entities are immediately available for browsing and exploration.
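
To give a rough sense of what pattern-based extraction looks like, here is a minimal sketch that pulls email addresses and US-style phone numbers out of text with regular expressions. It is an illustration under simplifying assumptions, not the pipeline described above: the patterns, the `extract_entities` function, and the sample text are all hypothetical, and identifying people and organizations generally requires a trained named-entity recognition model rather than regexes.

```python
import re

# Simplified patterns, for illustration only; real-world formats vary widely.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Matches US-style numbers such as 212-555-0142 or (212) 555-0142.
PHONE_RE = re.compile(r"(?:\(\d{3}\)\s*|\b\d{3}[-.\s])\d{3}[-.\s]\d{4}\b")

def extract_entities(text):
    """Return the unique emails and phone numbers found in the text."""
    return {
        "emails": sorted(set(EMAIL_RE.findall(text))),
        "phones": sorted(set(PHONE_RE.findall(text))),
    }

# Hypothetical sample text; the numbers and address are placeholders.
sample = "Call (212) 555-0142 or 212-555-0199, or write to jane.doe@example.com."
entities = extract_entities(sample)
```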

Relationship graph tool. Ingestigate gives users the ability to explore relationships between extracted entities on a graph. You can see who appears alongside whom, across how many documents, and in what context. You can traverse the network visually — following connections from one entity to another, discovering patterns that would take weeks to find manually.
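
One simple way to think about such a graph is as a co-occurrence network: two entities are linked if they appear in the same document, and the link’s weight is the number of documents they share. The sketch below shows that idea only; it is an assumption about how such a graph could be built, not a description of Ingestigate’s internals, and the names (`cooccurrence_edges`, `neighbors`) and placeholder entities are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(doc_entities):
    """Count, for each pair of entities, how many documents mention both."""
    edges = Counter()
    for entities in doc_entities.values():
        # Sort so each pair is always stored in the same order.
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges

def neighbors(edges, entity):
    """Return entities linked to `entity`, with shared-document counts."""
    linked = {}
    for (a, b), count in edges.items():
        if a == entity:
            linked[b] = count
        elif b == entity:
            linked[a] = count
    return linked

# Placeholder entities per document, for illustration only.
doc_entities = {
    "doc-1": ["Person A", "Person B", "Org X"],
    "doc-2": ["Person A", "Person B"],
    "doc-3": ["Person B", "Org X"],
}
edges = cooccurrence_edges(doc_entities)
linked_to_a = neighbors(edges, "Person A")
```

Traversing the network then amounts to calling `neighbors` repeatedly, moving from one entity to the entities it shares documents with.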

Improved text extraction. Many of the DOJ’s released documents are scanned images. The text recognition quality in the original release isn’t great. I re-processed the scanned documents to improve the quality of the extracted, searchable text.


What’s Missing

Dataset 9. The DOJ organized the releases into numbered datasets. The .zip file for Dataset 9 was removed from justice.gov and is no longer available for download. Unlike Datasets 10 and 11, which were archived by third parties, Dataset 9 has no complete public copy of its .zip file that has surfaced so far. I’m working on extracting what content remains accessible. If you have a copy or know where to find one, please contact me at team@ingestigate.com.

Ingestion is ongoing. Large swaths of documents have been ingested, with more being processed. The full release is massive. I’ll update this post as ingestion progresses.


What This Version Can and Can’t Do

This is a purpose-built public edition of Ingestigate, tuned specifically for the Epstein files. Here’s what that means:

You can:

  • Search across all ingested documents
  • Browse extracted entities (people, organizations, emails, phone numbers)
  • Explore relationships between entities on a graph
  • View the extracted text of every ingested document

You can’t:

  • Upload new files or data (this is a read-only instance)
  • Download or export original source files — originals are deleted after processing. This tool focuses on the text content within documents. The Epstein files may contain sensitive images, and I’m not in the business of hosting them. The DOJ hosts the originals on justice.gov.

Why I Did This

I spent 14 years processing exactly this kind of evidence — massive, messy document sets where the connections between documents matter as much as any single page. I built Ingestigate because the tools available to investigators were too slow, too expensive, and too locked down.

When the Epstein files dropped, I saw the same problem I’d spent my career fighting: important information buried in a mountain of documents that nobody has the tools to search through effectively. The public deserves high-quality tools for something like this. So I made the files searchable, extracted the entities, and gave people the tools to explore the relationships.

It’s free. No catch. Register with an email and password — no credit card, no trial expiration — and start searching.


About Ingestigate

What you see on the Epstein files site is a fraction of what Ingestigate can do. The full platform:

  • Processes over 1,000 file formats — PDFs, Word, Excel, images, CSV, JSON, Parquet, database exports, blockchain data, scanned documents via OCR, and more
  • Extracts entities automatically during ingestion — people, organizations, emails, phone numbers, cryptocurrency addresses, usernames
  • Provides relationship graph tools for exploring connections between entities across documents
  • Delivers sub-millisecond search across millions of documents
  • Enforces enterprise-grade access controls: owner-based authorization, complete audit logging, SSO/MFA support
  • Deploys anywhere — SaaS (live in 5 minutes), on-premise (your data center), or air-gapped networks
  • Offers an API-first architecture: everything the UI can do, the API can do

If you work in law enforcement, at a law firm, in financial compliance, research, or any field where you need to make sense of large volumes of documents — visit ingestigate.com to learn more.


Get In Touch