pgvector Primer
Embedding storage and similarity search inside PostgreSQL — no separate vector database required.
Why pgvector?
pgvector adds a native vector column type to Postgres. You store embeddings alongside your relational data, run approximate nearest-neighbor queries with ivfflat or hnsw indexes, and keep everything inside one database — simpler ops, fewer moving parts.
Quick setup
CREATE EXTENSION vector;
CREATE TABLE docs (
  id bigserial PRIMARY KEY,
  body text,
  emb vector(1536)
);
CREATE INDEX ON docs
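  USING hnsw (emb vector_cosine_ops);
An HNSW index builds fastest when the graph fits in maintenance_work_mem. Each index serves one distance operator, chosen by its operator class: vector_cosine_ops for cosine (<=>), vector_l2_ops for L2 (<->), and vector_ip_ops for inner product (<#>).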
Similarity search
SELECT id, body, 1 - (emb <=> $1) AS sim
FROM docs
ORDER BY emb <=> $1
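LIMIT 10;
<=> is cosine distance; subtract it from 1 to get cosine similarity. For L2 distance use <->, backed by an index built with vector_l2_ops, since the cosine index above does not serve that operator.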
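To make the distance-versus-similarity relationship concrete, here is a small pure-Python sketch of the two metrics. It is illustrative only: the function names and toy vectors are made up for this example, and pgvector computes these values inside Postgres rather than in client code.

```python
import math

def cosine_distance(a, b):
    """Cosine distance, i.e. 1 - cosine similarity (what <=> returns)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def l2_distance(a, b):
    """Euclidean (L2) distance (what <-> returns)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical 2-dimensional "embeddings" chosen to show the extremes.
query = [1.0, 0.0]
docs = {
    "same direction": [2.0, 0.0],
    "orthogonal": [0.0, 3.0],
    "opposite": [-1.0, 0.0],
}

for name, emb in docs.items():
    dist = cosine_distance(query, emb)
    # Similarity is recovered exactly as in the SQL: 1 - distance.
    print(f"{name}: distance={dist:.2f} similarity={1 - dist:.2f}")
```

Cosine distance ignores vector length, so the "same direction" vector scores a distance of 0 even though its L2 distance from the query is 1. That is why the operator in the query should match the metric the embedding model was trained for.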
Embedding generation
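Generate embeddings with OpenAI text-embedding-3-small or any other model whose output dimension matches the column; text-embedding-3-small returns 1536 dimensions by default, which fits vector(1536). Store the float array directly in the vector column. For bulk loads, batch the inserts with COPY for throughput.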
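As a rough sketch of the COPY path, the helpers below format embeddings in pgvector's text input form ([x1,x2,...]) and build a tab-separated buffer suitable for COPY docs (body, emb) FROM STDIN. The helper names are hypothetical, the sample vectors are 3-dimensional stand-ins for real 1536-dimensional embeddings, and the driver call that streams the buffer to the server is left out.

```python
import io

def to_vector_literal(embedding):
    """Format floats as pgvector text input, e.g. [0.1,0.2,0.3]."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"

def copy_escape(text):
    """Escape a value for COPY text format (backslash, tab, newline, CR)."""
    return (
        text.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )

def build_copy_payload(rows):
    """Build one tab-separated line per (body, embedding) pair."""
    buf = io.StringIO()
    for body, embedding in rows:
        buf.write(copy_escape(body) + "\t" + to_vector_literal(embedding) + "\n")
    buf.seek(0)  # rewind so a driver can read from the start
    return buf

# Toy rows; real embeddings would come from the embedding model.
rows = [
    ("first doc", [0.1, 0.2, 0.3]),
    ("second doc\twith a tab", [1, 2, 3]),
]
payload = build_copy_payload(rows)
print(payload.getvalue())
```

Escaping the body matters because COPY's text format treats raw tabs and newlines as column and row separators. The vector literal itself contains neither, so it needs no escaping.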
When to use it
- Semantic search over documentation or knowledge bases
- Recommendation engines with user + item embeddings
- RAG pipelines where Postgres already holds the source data
- Teams avoiding infra sprawl from standalone vector DBs
Next step
Combine pgvector with OpenAI embeddings for a complete semantic search pipeline in under 50 lines.