
Image Search (CLIP / Vision)

Design for a semantic image search pipeline powered by CLIP embeddings and a vector database. Users upload a photo and receive visually similar results in under 300ms.

Ingest

  • 1. Upload raw image to S3-compatible bucket
  • 2. Generate 512-d CLIP ViT-B/32 embedding via ONNX Runtime
  • 3. Store vector + metadata in pgvector with IVFFlat index
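
The storage step can be sketched as follows. This is a minimal illustration, not the pipeline's actual schema: the table name `images`, the column names, the index name, and `lists = 100` are all assumptions. Only the pgvector syntax (`vector(n)`, `ivfflat`, `vector_cosine_ops`) comes from pgvector itself. The helpers prepare an embedding for insertion and run without a database.

```python
import math

EMBEDDING_DIM = 512  # CLIP ViT-B/32 embedding size, per the Ingest steps

# Hypothetical table, column, and index names; lists = 100 is a placeholder
# that would need tuning against the real row count.
SCHEMA_SQL = f"""
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS images (
    id         bigserial PRIMARY KEY,
    object_key text NOT NULL,  -- key of the raw image in the bucket
    embedding  vector({EMBEDDING_DIM}) NOT NULL
);
CREATE INDEX IF NOT EXISTS images_embedding_idx
    ON images USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
"""

# Parameterized insert; the text literal is cast to pgvector's vector type.
INSERT_SQL = "INSERT INTO images (object_key, embedding) VALUES (%s, %s::vector)"


def l2_normalize(vec):
    """Scale a vector to unit length (optional for cosine distance, but common)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def to_pgvector_literal(vec):
    """Format a vector as pgvector's text literal, e.g. '[0.1,0.2,...]'."""
    if len(vec) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(vec)}")
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def build_insert_params(object_key, embedding):
    """Return the parameter tuple to pass alongside INSERT_SQL."""
    return (object_key, to_pgvector_literal(l2_normalize(embedding)))
```

With a PostgreSQL driver such as psycopg, the tuple from `build_insert_params` would be passed as the parameters of `INSERT_SQL`. pgvector's documentation recommends creating an IVFFlat index after the table holds data, since the index quality depends on the rows present at build time.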

Query

  • 1. Accept JPEG/PNG/WEBP via presigned upload URL
  • 2. Compute embedding with the same CLIP model used at ingest
  • 3. Cosine top-K search, keeping only results with cosine distance < 0.25
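
The ranking step can be illustrated with a brute-force, in-memory version of what the index scan approximates. This is a sketch under stated assumptions: `top_k`, its parameter names, and the `images` table in the SQL string are hypothetical. The `<=>` operator is pgvector's cosine distance, defined as 1 minus cosine similarity, so the 0.25 threshold keeps matches with similarity above 0.75.

```python
import heapq
import math

# Roughly equivalent pgvector query, assuming a hypothetical `images` table.
QUERY_SQL = """
SELECT object_key, embedding <=> %(q)s::vector AS distance
FROM images
WHERE embedding <=> %(q)s::vector < 0.25
ORDER BY embedding <=> %(q)s::vector
LIMIT %(k)s;
"""


def cosine_distance(a, b):
    """Return 1 - cosine similarity: 0 for identical direction, 1 for orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query, items, k=10, max_distance=0.25):
    """Exact top-K over (key, vector) pairs, dropping matches at or past the threshold."""
    scored = ((cosine_distance(query, vec), key) for key, vec in items)
    kept = [(dist, key) for dist, key in scored if dist < max_distance]
    # nsmallest returns the k closest matches in ascending distance order.
    return [(key, dist) for dist, key in heapq.nsmallest(k, kept)]
```

Because IVFFlat is an approximate index, the database may return fewer or slightly different rows than this exact scan. An exact version like this one is useful as a recall baseline when tuning the index.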

Stack

ONNX Runtime, pgvector, MinIO, FastAPI, Redis, CLIP ViT-B/32

Latency target: <300ms p95 end-to-end. Embedding computation dominates at ~120ms on CPU, which leaves roughly 180ms for network transfer and the index scan. Warm embeddings are cached in Redis with a 24h TTL.
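
The caching step might look like the sketch below. Several details are assumptions not stated in the recipe: the cache key is a SHA-256 hash of the uploaded bytes, embeddings are packed as little-endian float32, and the key prefix is made up. The `cache` argument only needs `get(name)` and `setex(name, time, value)`, which match the method names of the redis-py client. `InMemoryCache` is a stand-in so the example runs without a Redis server.

```python
import hashlib
import struct

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24h TTL, per the latency note


def cache_key(image_bytes):
    """Key by content hash (an assumption) so identical uploads share an entry."""
    return "clip:vit-b-32:" + hashlib.sha256(image_bytes).hexdigest()


def pack_embedding(vec):
    """Serialize floats as little-endian float32 (4 bytes per dimension)."""
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack_embedding(blob):
    """Inverse of pack_embedding."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def get_or_compute_embedding(cache, image_bytes, embed_fn):
    """Return (embedding, cache_hit); embed_fn runs only on a cache miss."""
    key = cache_key(image_bytes)
    blob = cache.get(key)
    if blob is not None:
        return unpack_embedding(blob), True
    vec = embed_fn(image_bytes)
    cache.setex(key, CACHE_TTL_SECONDS, pack_embedding(vec))
    return vec, False


class InMemoryCache:
    """Test stand-in exposing get/setex; it records TTLs but never expires entries."""

    def __init__(self):
        self._data = {}
        self.ttls = {}

    def get(self, name):
        return self._data.get(name)

    def setex(self, name, time, value):
        self._data[name] = value
        self.ttls[name] = time
```

On a hit, the ~120ms embedding step is skipped entirely, so the request cost reduces to the cache lookup plus the index scan. Storing float32 halves the payload compared with float64 at the cost of some precision.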