Every operator has the same drawer. Leases. Vendor agreements. Insurance binders. Signed PDFs nobody opens until a renewal date sneaks up or someone asks a question you can't answer from memory.
Mine lived across hundreds of files. The answer to "what's the renewal option on this commercial lease" was in there. So was the percentage-rent clause, the CAM share, the insurance limit. I just couldn't get to any of it without opening the right PDF, scrolling to the right page, and squinting at a number I'd then have to trust I read correctly.
So I pointed my OpenClaw (@steipete's agent) at the whole drawer and made it answer questions. Not "summarize this contract." Answer the actual question, with the clause, the page, and the signed document open as proof.
The part most people get wrong: snippets lie
The lazy way to do this is the standard RAG pipeline. Embed every document, retrieve the top few text snippets for a question, hand them to the model, done. That works right up until it doesn't — and on contracts it doesn't.
The number you actually need is usually a fillable form field, a DocuSign overlay, or a clause buried past the ~600-character snippet window. A truncated snippet will confidently hand you the wrong figure. On a signed lease that's not a typo, that's a decision made on bad data.
So this isn't a one-shot retrieval pipeline. It's an agentic loop with two tools. search_index finds the candidate files. Then read_document opens the actual file with PyMuPDF and pulls the real page text — and the form-widget layer with it. That last part matters: page.widgets() reads the values sitting in a PDF's annotation layer, the ones a normal text extract misses entirely. Box numbers on a tax form, fillable contract values, DocuSigned overlays — they come through without OCR.
The system prompt has one non-negotiable rule: always open the real file for tax forms, signed leases, and fillable PDFs. Snippets are for finding the document. They are never the final answer on a number or a clause. The model has to open the page and read it before it's allowed to tell me anything.
The stack, since that's the interesting part
Embeddings: Voyage voyage-3-large. And here's the thing nobody tells you — newer isn't automatically better. I ran a controlled head-to-head on my own documents. voyage-3-large beat voyage-4-large. It beat voyage-context-3 outright, which choked on a long lease because of a per-document token cap. Voyage's own benchmark says 4 edges out 3 by under two points, but that's a general, multilingual number. It doesn't transfer to dense legal English in 500-word chunks. So I test the model on the real corpus before I lock it, every time. The published number is not the answer for your documents.
Retrieval runs over a local index — a rebuildable pickle for a corpus this size, LanceDB when it gets bigger or needs to keep matters walled off from each other. Synthesis is Claude (Opus 4.5) running the tool loop. Scanned PDFs with no text layer fall back to ocrmac — Apple's Vision framework, on-device, free, no document leaving the machine for OCR. A context-graph.sqlite keeps a local map of every source, chunk, and entity, so even a file that failed OCR still returns an exact locator instead of nothing.
One detail I had to learn the hard way: Voyage's embed batch caps at 120K tokens, and its tokenizer runs about two tokens per English word, not the 1.4 you'd guess from GPT-style math. Use the wrong estimate and a naive batch silently drops chunks and burns credit while you think it worked. Token-aware batching with split-on-failure, or you're indexing air.
What it actually does now
I ask it a question about a commercial lease in plain English. It finds the right signed PDF, opens it, reads the clause off the page, and answers with the verbatim language, the dollar figure, and the page number. Renewal options, percentage rent, CAM, insurance limits — the things that live in the drawer until they cost you.
The drawer didn't get smaller. It started answering.
No SaaS does this, and the reason is structural. The off-the-shelf document tools hand the model a truncated snippet and hope. The whole point here is the verification step — open the real file, read the real page, cite it — which is the difference between a tool that summarizes a contract and one that answers a question about it correctly.