Recipe

PDF Q&A

Extract text from any PDF, chunk it into semantic blocks, embed with a transformer model, and ask natural-language questions against the indexed content.

extractchunkembedanswer

1. Upload PDF

2. Ask a question

How it works

Extract & Chunk

Raw text is pulled from the PDF via a server-side parser. The text is split into overlapping chunks (~512 tokens) with a sliding window to preserve context across boundaries.

Embed & Index

Each chunk is converted to a dense vector using an embedding model. All vectors are stored in an in-memory index for fast cosine-similarity retrieval.

Retrieve

The user question is embedded with the same model. The top-k most similar chunks are fetched via nearest-neighbor search.

Answer

Retrieved chunks are fed as context into an LLM with a system prompt instructing it to answer only from the provided text. Citations are included when possible.