Weaviate Primer
Stand up a vector-native knowledge base with hybrid search, tenant isolation, and built-in vectorizer modules.
Why Weaviate
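Weaviate bundles vector storage, ANN indexing, and optional vectorizer modules into a single Go binary. There is no Redis queue for indexing and no embedding pipeline to write yourself: one process serves REST, GraphQL, and gRPC, and its vectorizer modules call out to a model container (such as the t2v-transformers service in the Quick Start) or to a hosted embedding API.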
Core Concepts
- Class: analogous to a table. Defines the schema (properties and data types), the vectorizer, and the vector index configuration.
- Object: analogous to a row. Holds property values, a vector, and optional cross-references to other objects.
- Vectorizer: the module that turns text into vectors at insert time (e.g. text2vec-transformers).
- Hybrid search: BM25 keyword scores and vector similarity fused into one ranked result set.
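To make the fusion step concrete, here is a minimal Python sketch modeled on relative score fusion, one of Weaviate's two hybrid fusion algorithms (the other is rank-based). Each result set is min-max normalized to [0, 1] and the two are blended with alpha, where alpha = 1 means pure vector search and alpha = 0 pure keyword search. The function names and the dict-of-scores input are assumptions for illustration, not Weaviate's internals.

```python
# Sketch of hybrid score fusion in the style of relative score fusion.
# Assumption: inputs map object IDs to raw scores where higher is better
# (e.g. BM25 scores and cosine similarities). Illustrative only.


def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores into [0, 1]; a set with one distinct value maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def relative_score_fusion(
    keyword: dict[str, float],
    vector: dict[str, float],
    alpha: float = 0.5,
) -> list[tuple[str, float]]:
    """Fuse two result sets; alpha=1 is pure vector, alpha=0 is pure keyword."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    kw = min_max_normalize(keyword)
    vec = min_max_normalize(vector)
    fused = {}
    for doc_id in kw.keys() | vec.keys():
        # In this sketch, an object missing from one set contributes 0 from that side.
        fused[doc_id] = (1 - alpha) * kw.get(doc_id, 0.0) + alpha * vec.get(doc_id, 0.0)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))
```

With `bm25 = {"a": 12.0, "b": 6.0, "c": 3.0}` and `cosine = {"b": 0.91, "c": 0.88, "d": 0.40}`, `alpha=0.5` ranks `b` first because it scores well on both sides, while `alpha=0.0` puts the strongest keyword match `a` on top.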
Multi-Tenant Setup
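Within a single Weaviate instance you can either create one class per tenant (e.g. TenantA_Document) or use native multi-tenancy (v1.20+), which keeps every tenant's objects in its own shard under one shared class. Native multi-tenancy is usually the better choice: there is only one schema to maintain, and tenants are added or removed without schema changes.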
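A minimal sketch of the native approach over the REST API, using only the Python standard library. It assumes the Quick Start setup (anonymous access on http://localhost:8080) and a hypothetical `Document` class with `title` and `body` properties; the helper names are illustrative. The class is created once with `multiTenancyConfig` enabled, tenants are registered through the class's `tenants` endpoint, and every object carries a `tenant` field.

```python
# Sketch: multi-tenant class setup via Weaviate's REST API (stdlib only).
# Assumptions: Weaviate at http://localhost:8080 with anonymous access, and a
# hypothetical "Document" class with "title" and "body" text properties.
import json
import urllib.request

WEAVIATE_URL = "http://localhost:8080"


def multi_tenant_class(name: str = "Document") -> dict:
    """Schema for one shared class whose objects are partitioned per tenant."""
    return {
        "class": name,
        "vectorizer": "text2vec-transformers",
        "multiTenancyConfig": {"enabled": True},
        "properties": [
            {"name": "title", "dataType": ["text"]},
            {"name": "body", "dataType": ["text"]},
        ],
    }


def tenant_object(class_name: str, tenant: str, properties: dict) -> dict:
    """Object body; the "tenant" field routes it to that tenant's shard."""
    return {"class": class_name, "tenant": tenant, "properties": properties}


def post_json(path: str, payload) -> object:
    """POST a JSON payload to Weaviate and return the decoded JSON response."""
    request = urllib.request.Request(
        WEAVIATE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def bootstrap(tenants: list[str], class_name: str = "Document") -> None:
    """Create the class once, then register tenants without touching the schema."""
    post_json("/v1/schema", multi_tenant_class(class_name))
    post_json(f"/v1/schema/{class_name}/tenants", [{"name": t} for t in tenants])
```

`bootstrap(["tenantA", "tenantB"])` followed by `post_json("/v1/objects", tenant_object("Document", "tenantA", {"title": "Q3 report"}))` writes into tenantA only; onboarding another tenant is one more call to the tenants endpoint rather than a schema change.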
Quick Start
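```yaml
# docker-compose.yml
services:
  weaviate:
    image: semitechnologies/weaviate:1.24.1
    ports:
      - "8080:8080"    # REST and GraphQL
      - "50051:50051"  # gRPC
    environment:
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: text2vec-transformers
      ENABLE_MODULES: text2vec-transformers
      TRANSFORMERS_INFERENCE_API: http://t2v-transformers:8080
  # Inference container that the text2vec-transformers module calls
  t2v-transformers:
    image: semitechnologies/transformers-inference:sentence-transformers-multi-qa-MiniLM-L6-cos-v1
    environment:
      ENABLE_CUDA: '0'
```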
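Containers take a moment to start after `docker compose up -d`, so it helps to wait until Weaviate reports ready before creating schema or importing data. A minimal sketch, assuming port 8080 is published as above and using Weaviate's `/v1/.well-known/ready` endpoint; the function names are illustrative, and the probe and sleep functions are injectable so the polling loop can be tested without a running server.

```python
# Sketch: poll Weaviate's readiness endpoint until it answers 200.
# Assumption: the Quick Start container publishes port 8080 on localhost.
import time
import urllib.error
import urllib.request
from typing import Callable

READY_URL = "http://localhost:8080/v1/.well-known/ready"


def http_ready(url: str = READY_URL) -> bool:
    """Return True if the readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and non-2xx responses all mean "not yet".
        return False


def wait_until_ready(
    probe: Callable[[], bool] = http_ready,
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() until it succeeds or the total sleep time reaches timeout."""
    waited = 0.0
    while True:
        if probe():
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

Calling `wait_until_ready()` with the defaults polls once per second and gives up after about a minute of waiting.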
Nimbus Integration
Point the Nimbus loader at your Weaviate endpoint for runtime telemetry clustering. Store session fingerprints as vectors, query nearest neighbors to detect license-sharing rings, and surface anomalies via the dashboard's real-time feed.
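The Nimbus loader's interface isn't described here, so the following is a standalone Python sketch of the nearest-neighbor idea under hypothetical assumptions: each session is a `(session_id, license_key, fingerprint)` tuple, fingerprints with cosine similarity at or above a threshold come from the same device, and a license used from more than `max_devices` distinct devices is suspicious. A brute-force pair scan stands in for a Weaviate `nearVector` query, and union-find groups linked sessions into device clusters.

```python
# Sketch: flag licenses whose sessions come from too many distinct devices.
# Hypothetical assumptions (not the Nimbus loader's API): sessions are
# (session_id, license_key, fingerprint_vector) tuples, and fingerprints with
# cosine similarity >= threshold belong to the same device. The O(n^2) pair
# scan stands in for a Weaviate nearVector query.
import math
from collections import defaultdict


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def device_clusters(vectors: list[list[float]], threshold: float) -> list[int]:
    """Union-find over similar pairs; returns a cluster root for each vector."""
    parent = list(range(len(vectors)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(vectors))]


def flag_shared_licenses(sessions, threshold: float = 0.95, max_devices: int = 2) -> dict[str, int]:
    """Map each suspicious license key to its number of distinct device clusters."""
    clusters = device_clusters([fp for _, _, fp in sessions], threshold)
    devices = defaultdict(set)
    for (_, license_key, _), cluster in zip(sessions, clusters):
        devices[license_key].add(cluster)
    return {key: len(c) for key, c in devices.items() if len(c) > max_devices}
```

The 0.95 threshold and two-device limit are placeholders; in practice they would be tuned against sessions known to be legitimate.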
Next, Hybrid Search Tuning: set alpha (the balance between BM25 and vector scores) and choose a fusion algorithm for your workload.