Recipe
ChromaDB Primer
Embed, store, and query vector data locally with zero infrastructure.
What is ChromaDB?
ChromaDB is an open-source embedding database that runs in-process. It stores documents as high-dimensional vectors and retrieves them by semantic similarity rather than keyword match. Think of it as a search engine that understands meaning.
Why use it?
- One-command Python install: pip install chromadb
- Built-in default embedding model (Sentence Transformers all-MiniLM-L6-v2), with other embedding functions pluggable
- Persists to disk with SQLite backend, no server required
- Metadata filtering alongside vector search
Quick Start
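import chromadb

# Open (or create) an on-disk database under ./my_db
client = chromadb.PersistentClient(path="./my_db")
collection = client.get_or_create_collection("docs")

# Documents are embedded with the collection's default embedding function
collection.add(
    documents=["ChromaDB is great for RAG"],
    ids=["doc_1"],
)

# Returns up to n_results nearest documents for each query text
results = collection.query(
    query_texts=["vector databases"],
    n_results=3,
)
Key Concepts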
Collections
Named groups of embeddings. One per dataset or namespace.
Embeddings
High-dimensional vectors representing the semantic meaning of text.
Metadata
Attach key-value pairs to each document, then filter queries on them.
Distance Metrics
Cosine, L2 (squared Euclidean), or IP (inner product); choose based on your embedding model.
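To make the difference concrete, here is a plain-Python sketch of the three metrics written as distances, where smaller means closer. The formulas follow the usual HNSW convention (squared L2, 1 minus inner product, 1 minus cosine similarity). Treat that as an assumption and check how your Chroma version reports distances. The function names and vectors are illustrative only.

```python
import math

def squared_l2(a, b):
    # Sum of squared coordinate differences (no square root).
    return sum((x - y) ** 2 for x, y in zip(a, b))

def inner_product_distance(a, b):
    # 1 - dot product; can be negative for long, well-aligned vectors.
    return 1.0 - sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cosine similarity; depends on direction only, not length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

query = [1.0, 0.0]
long_vec = [2.0, 0.0]    # same direction as the query, twice as long
tilted_vec = [1.0, 1.0]  # 45 degrees away from the query

for name, vec in [("long_vec", long_vec), ("tilted_vec", tilted_vec)]:
    print(
        name,
        squared_l2(query, vec),
        inner_product_distance(query, vec),
        cosine_distance(query, vec),
    )
```

Both vectors are equally far from the query under squared L2, while cosine treats long_vec as identical to it. This is why models trained on normalized embeddings are usually paired with cosine or inner product.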
Pro tip: ChromaDB shines in RAG pipelines. Pair it with an LLM to ground responses in your own documents. See the RAG pipeline recipe for the full stack.
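As a rough sketch of the grounding step, the helper below turns a query result into an LLM prompt. build_rag_prompt is a hypothetical helper, not part of ChromaDB, and sample_results is a hand-written stand-in shaped like the dict that collection.query returns. The LLM call itself is left out.

```python
def build_rag_prompt(question, query_results, max_chunks=3):
    """Assemble an LLM prompt from a Chroma-style query result (hypothetical helper)."""
    # query results hold one list of documents per query text;
    # take the list for the first (only) query.
    chunks = query_results["documents"][0][:max_chunks]
    context = "\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Stand-in for the dict returned by collection.query(...); values are made up.
sample_results = {
    "ids": [["doc_1"]],
    "documents": [["ChromaDB is great for RAG"]],
}

prompt = build_rag_prompt("What is ChromaDB good for?", sample_results)
print(prompt)
```

Numbering the retrieved chunks makes it easy to ask the model to cite which passage supports its answer.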