pgvector Primer

Embedding storage and similarity search inside PostgreSQL — no separate vector database required.

Why pgvector?

pgvector is a Postgres extension that adds a vector column type. You store embeddings alongside your relational data, run exact nearest-neighbor queries or approximate ones backed by ivfflat or hnsw indexes, and keep everything inside one database: simpler ops, fewer moving parts.

Quick setup

CREATE EXTENSION vector;

CREATE TABLE docs (
  id    bigserial PRIMARY KEY,
  body  text,
  emb   vector(1536)
);

CREATE INDEX ON docs
  USING hnsw (emb vector_cosine_ops);

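HNSW indexes can serve L2 (vector_l2_ops), inner product (vector_ip_ops), and cosine (vector_cosine_ops) distance, but each index accelerates only the operator matching the operator class it was built with. The index above serves cosine (<=>) queries. Builds run much faster when the graph fits in maintenance_work_mem.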
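Because the operator class fixes which operator an index can serve, one option is to derive the CREATE INDEX statement from the operator your queries use. The sketch below is a hypothetical Python helper (hnsw_index_sql is not part of pgvector or any driver). It only builds the SQL string and assumes trusted, already-valid table and column names.

```python
# Operator used in ORDER BY -> HNSW operator class that accelerates it.
OPCLASS_FOR_OPERATOR = {
    "<->": "vector_l2_ops",      # L2 (Euclidean) distance
    "<#>": "vector_ip_ops",      # negative inner product
    "<=>": "vector_cosine_ops",  # cosine distance
}


def hnsw_index_sql(table: str, column: str, operator: str) -> str:
    """Hypothetical helper: build a CREATE INDEX whose opclass matches `operator`.

    Identifiers are interpolated as-is, so only pass trusted names.
    """
    try:
        opclass = OPCLASS_FOR_OPERATOR[operator]
    except KeyError:
        raise ValueError(f"unsupported operator: {operator!r}") from None
    return f"CREATE INDEX ON {table} USING hnsw ({column} {opclass});"


if __name__ == "__main__":
    # Reproduces the cosine index from the setup section.
    print(hnsw_index_sql("docs", "emb", "<=>"))
    # A query ordered by emb <-> $1 would need a separate L2 index.
    print(hnsw_index_sql("docs", "emb", "<->"))
```

If a table is queried with more than one operator, each operator needs its own index, at the cost of extra build time and disk.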

Similarity search

SELECT id, body, 1 - (emb <=> $1) AS sim
FROM docs
ORDER BY emb <=> $1
LIMIT 10;

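<=> is cosine distance, so 1 minus it gives cosine similarity. For L2 distance use <->. For inner product use <#>, which returns the negative inner product so that an ascending ORDER BY still puts the best matches first.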
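To make the operator semantics concrete, here is a plain-Python sketch of what each operator computes, plus a brute-force ranking that returns the same rows as the query above when Postgres runs it as an exact scan. An hnsw index returns an approximation of this result. The function names are hypothetical, and pgvector computes in C on float4 values, so its numbers can differ in the last digits.

```python
import math
from typing import Iterable, Sequence

Vector = Sequence[float]


def _check_dims(a: Vector, b: Vector) -> None:
    # pgvector also refuses to compare vectors of different dimensions.
    if len(a) != len(b):
        raise ValueError(f"different vector dimensions {len(a)} and {len(b)}")


def l2_distance(a: Vector, b: Vector) -> float:
    """What `a <-> b` computes: Euclidean (L2) distance."""
    _check_dims(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a: Vector, b: Vector) -> float:
    """What `a <#> b` computes: the inner product, negated so that an
    ascending ORDER BY ranks the largest inner product first."""
    _check_dims(a, b)
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Vector, b: Vector) -> float:
    """What `a <=> b` computes: 1 - cosine similarity (range 0..2)."""
    _check_dims(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if denom == 0.0:
        return math.nan  # a zero vector has no direction
    similarity = max(-1.0, min(1.0, dot / denom))  # absorb rounding error
    return 1.0 - similarity


def exact_top_k(rows: Iterable[tuple[int, str, Vector]], query: Vector, k: int = 10):
    """Exact-scan equivalent of:
        SELECT id, body, 1 - (emb <=> $1) AS sim
        FROM docs ORDER BY emb <=> $1 LIMIT k;
    `rows` holds (id, body, emb) tuples."""
    scored = [(cosine_distance(emb, query), doc_id, body) for doc_id, body, emb in rows]
    # NaN distances sort last, matching how Postgres orders float8 NaN.
    scored.sort(key=lambda t: (math.isnan(t[0]), t[0]))
    return [(doc_id, body, 1.0 - dist) for dist, doc_id, body in scored[:k]]
```

For example, with rows [(1, "a", [1.0, 0.0]), (2, "b", [0.0, 1.0]), (3, "c", [1.0, 1.0])] and query [1.0, 0.0], exact_top_k(rows, query, k=2) returns ids 1 and 3, with similarities 1.0 and about 0.707.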

Embedding generation

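Generate embeddings with OpenAI's text-embedding-3-small (1536 dimensions by default, matching vector(1536) above) or any other model whose output size matches the column's dimension. Insert each embedding as a vector literal such as '[1,2,3]' or as a float array cast to vector, and load large batches with COPY rather than row-by-row INSERTs for throughput.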
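A minimal sketch of preparing a bulk load, assuming the docs table above and COPY's default text format: each embedding is checked against the column's dimension and written as a pgvector text literal, and the body text is escaped for COPY. The helper names (to_vector_literal, copy_escape, copy_payload) are hypothetical, and sending the data to the server is left to your driver.

```python
import io
import math
from typing import Iterable, Optional, Sequence

EMBEDDING_DIM = 1536  # matches vector(1536) on the docs table


def to_vector_literal(embedding: Sequence[float], dim: int = EMBEDDING_DIM) -> str:
    """Format an embedding in pgvector's text input form, e.g. '[1.0,2.0,3.0]'."""
    if len(embedding) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(embedding)}")
    if not all(math.isfinite(x) for x in embedding):
        # pgvector rejects NaN and infinite elements.
        raise ValueError("embedding contains NaN or infinity")
    # repr() gives the shortest string that round-trips the float.
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def copy_escape(value: Optional[str]) -> str:
    """Escape one field for COPY's default text format (None -> NULL)."""
    if value is None:
        return "\\N"
    return (
        value.replace("\\", "\\\\")  # backslash first, so later escapes survive
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def copy_payload(
    rows: Iterable[tuple[Optional[str], Sequence[float]]], dim: int = EMBEDDING_DIM
) -> str:
    """Build the data stream for: COPY docs (body, emb) FROM STDIN"""
    buf = io.StringIO()
    for body, embedding in rows:
        buf.write(copy_escape(body))
        buf.write("\t")  # column separator
        buf.write(to_vector_literal(embedding, dim))
        buf.write("\n")  # row terminator
    return buf.getvalue()
```

With psycopg 3, for instance, the payload can be written through cur.copy("COPY docs (body, emb) FROM STDIN"), and psycopg2's copy_expert accepts the same statement. For very large loads, writing rows in chunks avoids holding the whole payload in memory.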

When to use it

• Semantic search over documentation or knowledge bases
• Recommendation engines with user + item embeddings
• RAG pipelines where Postgres already holds the source data
• Teams avoiding infra sprawl from standalone vector DBs

Next step

Combine pgvector with OpenAI embeddings for a complete semantic search pipeline in under 50 lines.