Chat with your documents
Chat with your documents (sometimes called “chat with PDF”) is a product pattern where users ask natural-language questions and an LLM answers using retrieved passages from their files. Under the hood this is usually RAG: retrieve relevant chunks, then generate an answer conditioned on that context.
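The retrieve-then-generate loop can be sketched in a few lines. This is a minimal illustration, not a production pipeline: `Chunk`, `retrieve`, and `build_prompt` are hypothetical names, and the keyword-overlap retriever stands in for the embedding search a real system would use.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    doc: str
    page: int
    text: str

def retrieve(query: str, chunks: list[Chunk], k: int = 2) -> list[Chunk]:
    # Toy retriever: rank chunks by keyword overlap with the query.
    # Real systems embed the query and search a vector index instead.
    terms = set(query.lower().split())
    scored = sorted(chunks, key=lambda c: -len(terms & set(c.text.lower().split())))
    return scored[:k]

def build_prompt(query: str, context: list[Chunk]) -> str:
    # Condition the LLM on the retrieved passages only.
    ctx = "\n\n".join(f"[{c.doc} p.{c.page}] {c.text}" for c in context)
    return f"Answer using only this context:\n{ctx}\n\nQuestion: {query}"

chunks = [
    Chunk("handbook.pdf", 4, "Refunds are processed within 14 days."),
    Chunk("handbook.pdf", 9, "Office hours are 9am to 5pm."),
]
prompt = build_prompt("How long do refunds take?",
                      retrieve("How long do refunds take?", chunks))
```

The final prompt is what actually constrains the model: the answer is generated from the retrieved text, not from the model's general knowledge.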
Show citations and sources
Users trust answers when they can verify them. Display document name, page or section, and optionally a short quote from the retrieved chunk. This also helps debug retrieval failures.
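A citation line only needs the fields named above: document, page, and a short quote. A minimal formatter might look like this (the function name and truncation length are illustrative choices, not a standard):

```python
def format_citation(doc: str, page: int, text: str, quote_len: int = 60) -> str:
    # Render a verifiable source line: document, page, and a short quote
    # from the retrieved chunk, truncated for display.
    quote = text if len(text) <= quote_len else text[:quote_len].rstrip() + "…"
    return f'{doc}, p. {page}: "{quote}"'
```

Rendering these lines next to the answer also makes retrieval failures visible: if the quoted passage has nothing to do with the question, the bug is in retrieval, not generation.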
Handle “not in documents” gracefully
Prompt the model to refuse, or to ask a clarifying question, when the retrieved context does not support an answer. This reduces hallucinations and sets expectations, which matters especially for enterprise users.
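Two cheap pieces implement this: a refusal instruction in the prompt, and optionally a pre-check that skips generation entirely when retrieval returned nothing relevant. The sketch below is a heuristic, not a reliable grounding test; the instruction wording and the term-overlap threshold are assumptions to tune.

```python
REFUSAL_INSTRUCTION = (
    "Answer only from the provided context. If the context does not "
    "contain the answer, reply: \"I can't find that in your documents.\""
)

def supports_answer(query: str, context: str, min_overlap: int = 2) -> bool:
    # Cheap pre-check: does the retrieved context share enough meaningful
    # terms with the query? If not, return the refusal without calling
    # the LLM at all. This is a heuristic, not a grounding guarantee.
    terms = {t for t in query.lower().split() if len(t) > 3}
    return len(terms & set(context.lower().split())) >= min_overlap
```

The prompt instruction handles the subtle cases; the pre-check saves a model call (and a likely hallucination) on the obvious misses.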
Conversation memory vs retrieval
Multi-turn chat may need a short summary of prior turns, but each turn should still retrieve fresh evidence from the knowledge base so answers stay tied to documents, not only chat history.
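The per-turn flow can be sketched as follows. `answer_turn`, `search`, and `llm` are hypothetical names; the point is the shape: the summary disambiguates the question, but every answer is grounded in freshly retrieved evidence.

```python
from typing import Callable

def answer_turn(question: str, history_summary: str,
                search: Callable[[str], str],
                llm: Callable[[str], str]) -> str:
    # Retrieve fresh evidence on every turn; the conversation summary
    # provides context for the question but is never the source of facts.
    evidence = search(question)
    prompt = (
        f"Conversation so far: {history_summary}\n"
        f"Context from documents:\n{evidence}\n"
        f"Question: {question}"
    )
    return llm(prompt)
```

A common failure mode this avoids: after a few turns the model starts answering from its own earlier answers, drifting away from what the documents actually say.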
Performance and perceived speed
Run retrieval once per user message, then stream LLM tokens as the answer generates. Show a "Searching your documents…" state during retrieval so latency feels intentional.
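One way to structure this is a single event stream the UI consumes: status events during retrieval, then token events during generation. This is a sketch with hypothetical `search` and `generate` callables standing in for the retriever and the streaming LLM client.

```python
from typing import Callable, Iterable, Iterator

def chat_events(question: str,
                search: Callable[[str], str],
                generate: Callable[[str, str], Iterable[str]]) -> Iterator[tuple[str, str]]:
    # Emit UI events: a status while retrieval runs, then answer tokens.
    yield ("status", "Searching your documents…")
    context = search(question)            # one retrieval per user message
    yield ("status", "Writing answer…")
    for token in generate(question, context):
        yield ("token", token)            # stream tokens as they arrive
```

The frontend renders status events as a progress indicator and appends token events to the answer, so the user sees movement the whole time instead of a silent wait.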