Recipe
PDF Q&A
Extract text from any PDF, chunk it into semantic blocks, embed with a transformer model, and ask natural-language questions against the indexed content.
1. Upload PDF
2. Ask a question
How it works
Extract & Chunk
Raw text is pulled from the PDF via a server-side parser. The text is split into overlapping chunks (~512 tokens) with a sliding window to preserve context across boundaries.
Embed & Index
Each chunk is converted to a dense vector using an embedding model. All vectors are stored in an in-memory index for fast cosine-similarity retrieval.
Retrieve
The user question is embedded with the same model. The top-k most similar chunks are fetched via nearest-neighbor search.
Answer
Retrieved chunks are fed as context into an LLM with a system prompt instructing it to answer only from the provided text. Citations are included when possible.