Semantic search & vector search for documents
Semantic search finds content by meaning, not exact keywords. In practice, text is converted to vectors (embeddings) with a neural model; queries and documents live in the same vector space, and nearest-neighbor search returns the most similar chunks. This is the retrieval layer behind most modern RAG and AI knowledge bases.
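The pipeline above can be sketched in a few lines. This is a toy illustration: the `embed` function here is a hashed bag-of-words stand-in, not a real neural embedding model, and `DIM` is arbitrarily small. It only shows the shape of the idea: queries and documents are mapped into the same vector space and ranked by similarity.

```python
import math
import zlib

DIM = 64  # toy dimensionality; real embedding models use hundreds to thousands of dims

def embed(text: str) -> list[float]:
    # Toy stand-in for a neural embedding model: hashed bag-of-words.
    # A real system would call a sentence-embedding model here.
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[zlib.crc32(token.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]  # unit-normalize so dot product == cosine

def search(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Embed the query with the same model, score every chunk, return top-k.
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, embed(c))), c) for c in chunks]
    scored.sort(reverse=True)
    return [c for _, c in scored[:k]]

chunks = [
    "How to cancel your subscription",
    "Shipping times for international orders",
    "Updating your billing address",
]
print(search("cancel subscription", chunks, k=1))
```

In production the chunk vectors are computed once at indexing time and stored in a vector database, rather than re-embedded per query as in this sketch.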
Semantic search vs keyword search
Keyword search (BM25 over an inverted index) matches tokens and synonym lists. It is fast and interpretable but struggles with paraphrases (“cancel subscription” vs “stop billing”). Semantic search captures intent when users don’t use the same words as your docs. Many production systems combine both: hybrid search plus reranking.
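One common way to combine keyword and semantic results is reciprocal rank fusion (RRF), which merges two ranked lists without needing their scores to be comparable. A minimal sketch, with made-up document IDs standing in for real BM25 and vector results:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each ranking is a list of doc IDs, best first. RRF rewards documents
    # that rank highly in any list; the constant k dampens the top-rank bonus.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["d3", "d1", "d7"]   # hypothetical BM25 order
semantic_hits = ["d1", "d5", "d3"]  # hypothetical vector-search order
print(reciprocal_rank_fusion([keyword_hits, semantic_hits]))
```

Here `d1` wins because it ranks well in both lists, even though neither ranker put it unambiguously first.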
How vector search works
Each text chunk is embedded into a high-dimensional vector. The query is embedded with the same model. Similarity is usually cosine similarity or dot product. Results are the top-k vectors closest to the query. Quality depends on embedding model choice, chunking (chunking guide), and domain fit.
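The similarity math itself is simple. A sketch of cosine similarity and exhaustive top-k selection over a tiny hand-made index (real systems replace the linear scan with an approximate nearest-neighbor index such as HNSW or IVF once the corpus grows):

```python
import heapq
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product of the vectors divided by their norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 2):
    # Exhaustive scan over (chunk_id, vector) pairs; fine for small corpora.
    return heapq.nlargest(k, ((cosine(query_vec, v), cid) for cid, v in index))

index = [("a", [1.0, 0.0]), ("b", [0.9, 0.1]), ("c", [0.0, 1.0])]
print(top_k([1.0, 0.05], index, k=2))
```

Note that if all vectors are unit-normalized, cosine similarity and dot product rank results identically, which is why many vector stores let you pick either.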
Reranking
First-stage retrieval casts a wide net (high recall). A second stage—cross-encoder reranking or an LLM—reorders candidates for precision before generation. This reduces noise in the LLM context window.
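The two-stage pattern can be sketched as follows. The `overlap_score` function here is a hypothetical stand-in: a real reranker would call a cross-encoder model (or an LLM) that scores the query and each candidate together, which is slower per pair but more precise than first-stage retrieval.

```python
def rerank(query: str, candidates: list[str], score_fn, k: int = 3) -> list[str]:
    # Second stage: re-score each first-stage candidate against the query
    # and keep only the top-k for the LLM context window.
    return sorted(candidates, key=lambda c: score_fn(query, c), reverse=True)[:k]

def overlap_score(query: str, text: str) -> float:
    # Toy scorer (token overlap) standing in for a cross-encoder model call.
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / (len(q) or 1)

first_stage = [  # wide-net candidates from first-stage retrieval
    "stop billing for my account",
    "shipping rates overview",
    "how to cancel a subscription",
]
print(rerank("cancel my subscription", first_stage, overlap_score, k=2))
```

A typical configuration retrieves a few dozen candidates in the first stage, then reranks down to the handful that actually enter the prompt.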
WeKnora
WeKnora includes semantic retrieval, reranking options, and integration with LLMs so you can ship document Q&A without building vector search from scratch.