Recipe

ChromaDB Primer

Embed, store, and query vector data locally with zero infrastructure.

What is ChromaDB?

ChromaDB is an open-source embedding database that runs in-process. It stores each document alongside a high-dimensional vector (its embedding) and retrieves documents by semantic similarity rather than keyword match. Think of it as a search engine that matches on meaning instead of exact words.

Why use it?

One-command Python install: pip install chromadb
Built-in default embedding model (a Sentence Transformers model), with support for custom embedding functions
Persists to disk with a SQLite backend; no server required
Metadata filtering alongside vector search

Quick Start

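import chromadb

# Open (or create) an on-disk database under ./my_db
client = chromadb.PersistentClient(path="./my_db")
collection = client.get_or_create_collection("docs")

# Documents are embedded automatically by the collection's embedding function
collection.add(
    documents=["ChromaDB is great for RAG"],
    ids=["doc_1"]
)

# Ask for up to 3 nearest documents for each query text
results = collection.query(
    query_texts=["vector databases"],
    n_results=3
)
print(results["documents"])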
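
The dictionary returned by query holds one inner list per query text, which is easy to trip over when reading results. The sketch below shows one way to unpack it. The helper name top_hits is hypothetical (not part of ChromaDB), and the sample data is made up to mirror the result shape, assuming the default fields "ids", "documents", and "distances" are present.

```python
from typing import Any


def top_hits(results: dict[str, Any], query_index: int = 0) -> list[tuple[str, str, float]]:
    """Return (id, document, distance) triples for one query in a query() result.

    Hypothetical helper, not part of ChromaDB. Assumes the result dict has
    "ids", "documents", and "distances", each holding one inner list per query.
    """
    ids = results["ids"][query_index]
    documents = results["documents"][query_index]
    distances = results["distances"][query_index]
    return list(zip(ids, documents, distances))


# Sample data shaped like a query() result for a single query text (values are made up).
sample = {
    "ids": [["doc_1", "doc_7"]],
    "documents": [["ChromaDB is great for RAG", "Vectors encode meaning"]],
    "distances": [[0.42, 0.87]],
}

for doc_id, text, distance in top_hits(sample):
    print(f"{doc_id}  {distance:.2f}  {text}")
```

Smaller distances mean closer matches, so the first triple is the best hit for that query.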

Key Concepts

Collections

Named groups of embeddings. One per dataset or namespace.

Embeddings

High-dimensional vectors representing the semantic meaning of text.

Metadata

Attach key-value pairs (string, number, or boolean values) to each document, then filter queries on them.
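
Filters are plain dictionaries passed as the where argument of query. A minimal sketch, assuming documents carry two hypothetical metadata keys, topic and year. The operators ($and, $eq, $gte) come from Chroma's filter syntax; build_filter is an illustrative helper name, not a library function.

```python
def build_filter(topic: str, min_year: int) -> dict:
    """Build a where filter: topic equals `topic` AND year >= `min_year`.

    "topic" and "year" are hypothetical metadata keys used for illustration.
    """
    return {
        "$and": [
            {"topic": {"$eq": topic}},
            {"year": {"$gte": min_year}},
        ]
    }


where = build_filter("databases", 2023)
print(where)

# Usage with a collection (not run here):
# collection.add(
#     documents=["ChromaDB is great for RAG"],
#     metadatas=[{"topic": "databases", "year": 2024}],
#     ids=["doc_1"],
# )
# results = collection.query(query_texts=["vector databases"], n_results=3, where=where)
```

The filter narrows the candidate set by metadata, and the vector search then ranks the matching documents by similarity.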

Distance Metrics

Cosine, L2, or IP (inner product). Choose the one your embedding model was trained for; L2 is the default.
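
To see how the three metrics differ, here is a pure-Python sketch. It assumes the definitions given in Chroma's documentation: squared L2, 1 minus the dot product, and 1 minus cosine similarity. The function names are illustrative, and the two vectors are toy values.

```python
import math


def squared_l2(a, b):
    """Squared Euclidean distance (the "l2" metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product_distance(a, b):
    """1 minus the dot product (the "ip" metric); best suited to normalized vectors."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 minus cosine similarity (the "cosine" metric); ignores vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


a = [1.0, 0.0]
b = [2.0, 0.0]  # same direction as a, twice the length

print(squared_l2(a, b))              # 1.0
print(inner_product_distance(a, b))  # -1.0
print(cosine_distance(a, b))         # 0.0
```

The two vectors point the same way, so cosine treats them as identical while L2 and IP react to the difference in length. The metric is fixed when a collection is created; in many Chroma versions this is done with metadata={"hnsw:space": "cosine"} on get_or_create_collection, but check the documentation for the version you run.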

Pro tip: ChromaDB shines in retrieval-augmented generation (RAG) pipelines. Pair it with an LLM to ground responses in your own documents. See the RAG pipeline recipe for the full stack.
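
A rough sketch of the retrieval half of such a pipeline: fetch the closest documents, then fold them into a prompt for whichever LLM you use. The names retrieve, build_prompt, and FakeCollection are hypothetical, the prompt wording is an assumption, and the stand-in collection only mimics the shape of a query result so the example runs without a database or model.

```python
def retrieve(collection, question: str, k: int = 3) -> list[str]:
    """Return the top-k documents for a question from a Chroma-style collection."""
    results = collection.query(query_texts=[question], n_results=k)
    return results["documents"][0]  # inner list for the single query text


def build_prompt(question: str, documents: list[str]) -> str:
    """Assemble a grounded prompt; the wording is an illustrative assumption."""
    context = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(documents))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


class FakeCollection:
    """Stand-in with the same query() result shape, so the sketch runs offline."""

    def query(self, query_texts, n_results):
        docs = ["ChromaDB is great for RAG", "Embeddings capture meaning"]
        return {"documents": [docs[:n_results]]}


question = "What is ChromaDB good for?"
prompt = build_prompt(question, retrieve(FakeCollection(), question, k=2))
print(prompt)
```

With a real collection in place of FakeCollection, the resulting prompt string is what you would send to the LLM.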