Documentation

Embeddings guide

Everything you need to generate, store, and compare vector embeddings with the OpenAI /v1/embeddings endpoint. Covers both text-embedding-3-small and legacy text-embedding-ada-002 models.

Request / response shape

POST https://api.openai.com/v1/embeddings
Content-Type: application/json
Authorization: Bearer $OPENAI_API_KEY

{
  "input": ["Your text here", "Another chunk"],
  "model": "text-embedding-3-small",
  "dimensions": 1536
}
// Response
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0012, -0.0045, ...]  // length = dimensions
    }
  ],
  "model": "text-embedding-3-small",
  "usage": { "prompt_tokens": 8, "total_tokens": 8 }
}

Model comparison

ModelMax dimensionsDefault dimsMTEB scorePricing
text-embedding-3-small1,5361,53662.3%$0.02 / 1M tokens
text-embedding-3-large3,0723,07264.6%$0.13 / 1M tokens
text-embedding-ada-0021,5361,53661.0%$0.10 / 1M tokens

text-embedding-3-small supports a dimensions parameter (8–1536). Lower dimensions reduce storage and speed up similarity search with minimal accuracy loss. ada-002 is fixed at 1536 and does not accept the parameter.

Python

import openai

client = openai.OpenAI()

resp = client.embeddings.create(
    model="text-embedding-3-small",
    input=["Meridian is a vector database."],
    dimensions=512,  # optional; omit for full 1536
)

vector = resp.data[0].embedding
print(f"Dimension: {len(vector)}")  # 512

TypeScript

const resp = await fetch("https://api.openai.com/v1/embeddings", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: "text-embedding-3-small",
    input: ["Meridian is a vector database."],
    dimensions: 512,
  }),
});

const json = await resp.json();
const vector: number[] = json.data[0].embedding;
console.log(vector.length); // 512

Pooling and tokenization

OpenAI embeddings use a transformer encoder. The final embedding is produced by mean-pooling token-level hidden states across the sequence dimension, then applying an L2 normalization layer. You do not need to implement pooling yourself — the API returns a single vector per input string.

  • Chunk size matters. The model has an 8,191-token context window for text-embedding-3-small and 2,046 for ada-002. Inputs exceeding the limit are truncated.
  • Cosine similarity is standard. Since vectors are L2-normalized, dot product equals cosine similarity. Use it for nearest-neighbor search.
  • Batch when possible. Send up to 2,048 input strings per request to reduce round-trips and improve throughput.