We help companies do RAG on Docs

Upload documents, search thousands of pages, and get relevant markdown with one easy API

How it works

  1. Upload PDF, Word, or other unstructured documents
  2. We identify headers, lists, and paragraphs, then index them
  3. One API call searches across all documents, generates markdown chunks, reranks them, and returns them

You can adjust chunk size and reranker tolerance per query - no need to reprocess the documents!
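As a rough illustration of per-query tuning, the sketch below builds (but does not send) two search requests against the same uploaded documents with different settings. Only the /api/search path comes from this page; the host `api.example.com`, the JSON field names `query`, `chunk_size`, and `rerank_threshold`, and the use of POST are assumptions made for the example, so check the real API reference before relying on them.

```python
import json
from urllib.request import Request

# Hypothetical host; only the /api/search path is taken from this page.
BASE_URL = "https://api.example.com"


def build_search_request(query: str, chunk_size: int, rerank_threshold: float) -> Request:
    """Build (without sending) a request to /api/search.

    `chunk_size` and `rerank_threshold` are assumed parameter names.
    Authentication is omitted because this page does not describe it.
    """
    payload = {
        "query": query,
        "chunk_size": chunk_size,
        "rerank_threshold": rerank_threshold,
    }
    return Request(
        url=f"{BASE_URL}/api/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same documents, two different settings, no re-upload in between.
small = build_search_request("termination clause", chunk_size=500, rerank_threshold=0.5)
large = build_search_request("termination clause", chunk_size=2000, rerank_threshold=0.2)
```

The point of the sketch is that the tuning knobs travel with the query rather than with the upload, so trying a different chunk size is just another request.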

The Problem

It’s a common story: A customer needs to be able to search their PDFs. You start chunking with LangChain, and it splits a paragraph in half. Time to fiddle with settings and chunking algorithms for a day. You get to “good enough”, but it still separates paragraphs from their headers, and feels kind of arbitrary.

Now you have chunks. Should you store them in Pinecone? Elasticsearch? Postgres? Which embedding model do you use? Should you do hybrid search?

Once everything is wired together, can it handle a customer uploading thousand-page documents? How does that impact search results?

Our solution

We perform layout analysis on unstructured docs (PDF, Word, etc.), find their logical hierarchy, and split them into Content Blocks:

  • Headers
  • Subheaders
  • Lists
  • Sublists
  • Paragraphs
  • Tables (coming soon!)

We index and search these content blocks. Then, we combine the query and search results to build markdown chunks on the fly. This lets users adjust the chunks for their use case without having to reprocess a document. Chunks are passed through a reranker and scored before being returned from the /api/search endpoint.
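To make the idea of building chunks from content blocks concrete, here is a minimal sketch. It is not our production algorithm: the `Block` type, the `build_chunks` function, and the character-based `max_chars` limit are all hypothetical names chosen for illustration. It shows the two properties that matter: a block is never split in half, and every chunk carries the headers it sits under.

```python
from dataclasses import dataclass


@dataclass
class Block:
    kind: str       # "header", "paragraph", or "list_item" (illustrative kinds)
    text: str
    level: int = 0  # header depth (1 = top level); unused for other kinds


def build_chunks(blocks: list[Block], max_chars: int) -> list[str]:
    """Group blocks into markdown chunks of roughly max_chars of body text.

    Blocks are never split, a new header always starts a new chunk, and
    each chunk begins with the header path its content sits under.
    """
    chunks: list[str] = []
    path: list[Block] = []  # current header trail, outermost first
    body: list[str] = []    # markdown pieces collected for the open chunk

    def flush() -> None:
        if body:
            headers = ["#" * h.level + " " + h.text for h in path]
            chunks.append("\n\n".join(headers + body))
            body.clear()

    for block in blocks:
        if block.kind == "header":
            flush()  # close the chunk that belongs to the old header path
            # Drop headers at the same or a deeper level, then descend.
            while path and path[-1].level >= block.level:
                path.pop()
            path.append(block)
            continue
        piece = "- " + block.text if block.kind == "list_item" else block.text
        if body and sum(len(p) for p in body) + len(piece) > max_chars:
            flush()  # this block would overflow, so start a new chunk
        body.append(piece)
    flush()
    return chunks


blocks = [
    Block("header", "Handbook", 1),
    Block("header", "Leave", 2),
    Block("paragraph", "Employees accrue leave monthly."),
    Block("list_item", "Annual leave"),
    Block("list_item", "Sick leave"),
    Block("header", "Expenses", 2),
    Block("paragraph", "Submit receipts within 30 days."),
]

# The blocks are parsed once; only the chunk size changes between calls.
small = build_chunks(blocks, max_chars=40)   # 3 chunks
large = build_chunks(blocks, max_chars=400)  # 2 chunks
```

Because the stored unit is the block rather than the chunk, changing `max_chars` only changes how blocks are grouped at query time, which is why no reprocessing is needed.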

Upload a doc, run a search, and let us worry about the rest.