Safetensors Format Primer

Safetensors is a fast, zero-copy tensor serialization format designed by Hugging Face to replace pickle-based weight files. It stores data only, which removes the arbitrary code execution risk of unpickling, and it supports memory-mapped loading. Every payload Meridian streams to your inference workers begins with a deterministic JSON header that describes its tensors.

1. File Layout

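Every safetensors file starts with an 8-byte little-endian unsigned integer giving the header length, followed by a UTF-8 JSON header describing each tensor (dtype, shape, byte offsets), then the raw tensor payload concatenated contiguously. The data_offsets pair is [begin, end) in bytes, measured from the start of the payload region rather than from the start of the file. The optional __metadata__ key holds a free-form map of strings to strings and does not describe a tensor.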

[ 8 bytes ] header_size (u64 LE)
[ N bytes ] JSON header
  {
    "weight.0": {
      "dtype": "F16",
      "shape": [4096, 4096],
      "data_offsets": [0, 33554432]
    },
    "__metadata__": { "format": "pt" }
  }
[ rest    ] raw tensor bytes
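
The layout above can be parsed with the standard library alone. The following is a minimal sketch, not Meridian's loader or the official safetensors implementation: the helper names pack_file, read_header, and tensor_bytes are hypothetical, and the 2x2 F16 tensor is made-up sample data.

```python
import json
import struct


def pack_file(header: dict, payload: bytes) -> bytes:
    """Assemble header_size + JSON header + raw bytes (illustrative helper)."""
    header_bytes = json.dumps(header, separators=(",", ":")).encode("utf-8")
    # "<Q" is an unsigned 64-bit little-endian integer.
    return struct.pack("<Q", len(header_bytes)) + header_bytes + payload


def read_header(blob: bytes):
    """Return (header dict, file offset where the tensor payload starts)."""
    if len(blob) < 8:
        raise ValueError("file too short to hold the 8-byte header size")
    (header_size,) = struct.unpack_from("<Q", blob, 0)
    payload_start = 8 + header_size
    if payload_start > len(blob):
        raise ValueError("header size exceeds file length")
    header = json.loads(blob[8:payload_start].decode("utf-8"))
    return header, payload_start


def tensor_bytes(blob: bytes, header: dict, payload_start: int, name: str) -> bytes:
    """Slice one tensor's raw bytes; data_offsets are relative to the payload."""
    begin, end = header[name]["data_offsets"]
    return blob[payload_start + begin : payload_start + end]


# Sample data: one 2x2 F16 tensor occupies 2 * 2 * 2 = 8 bytes.
example_header = {
    "w": {"dtype": "F16", "shape": [2, 2], "data_offsets": [0, 8]},
    "__metadata__": {"format": "pt"},
}
example_blob = pack_file(example_header, bytes(range(8)))
parsed, start = read_header(example_blob)
raw = tensor_bytes(example_blob, parsed, start, "w")
```

Because the offsets are relative to the payload, every slice adds payload_start (8 plus the header length) before indexing into the file.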

2. Loading on Meridian

Meridian workers mmap the payload region read-only and pin the mapped pages with cudaHostRegister, so the GPU reads tensor slices straight from the mapping. There is no intermediate host-side allocation or copy, and no deserializer that could run attacker-controlled code. The JSON header is validated against an allow-list of dtypes (F16, BF16, F32, I8) before any tensor is exposed to the runtime.
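
The header check described above might look roughly like the sketch below. It is an assumption about how such validation could be written, not Meridian's actual code: validate_header is a hypothetical name, and the size and bounds checks go beyond the dtype allow-list the text mentions. The mmap and cudaHostRegister steps are not shown.

```python
import math

# Allow-listed dtypes from the text, mapped to their size in bytes.
ALLOWED_DTYPES = {"F16": 2, "BF16": 2, "F32": 4, "I8": 1}


def validate_header(header: dict, payload_len: int) -> None:
    """Raise ValueError if any tensor entry is unsafe to expose."""
    for name, info in header.items():
        if name == "__metadata__":
            continue  # free-form string map, not a tensor
        dtype = info["dtype"]
        if dtype not in ALLOWED_DTYPES:
            raise ValueError(f"{name}: dtype {dtype!r} is not allow-listed")
        begin, end = info["data_offsets"]
        if not (0 <= begin <= end <= payload_len):
            raise ValueError(f"{name}: offsets fall outside the payload")
        # math.prod of an empty shape is 1, which covers scalars.
        expected = math.prod(info["shape"]) * ALLOWED_DTYPES[dtype]
        if end - begin != expected:
            raise ValueError(f"{name}: byte length does not match shape and dtype")


good = {"weight.0": {"dtype": "F16", "shape": [4096, 4096],
                     "data_offsets": [0, 33554432]},
        "__metadata__": {"format": "pt"}}
validate_header(good, payload_len=33554432)  # passes silently
```

Checking that end - begin equals the element count times the dtype size catches headers whose declared shape would read past the tensor's own bytes.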

3. Why It Beats Pickle

Unpickling can execute arbitrary Python, so calling torch.load on a poisoned pickle checkpoint can pop a shell on your GPU box. Safetensors carries data only. Combined with content-addressed storage on Meridian, every weight you pull is reproducible, verifiable, and safe to swap between tenants without a sandbox.
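
To see why pickle is risky, consider this harmless sketch. The Payload class is a made-up example; it uses the standard __reduce__ hook to make the unpickler call eval on a string of the sender's choosing. A real attack would substitute a shell command for the arithmetic.

```python
import pickle


class Payload:
    """Hypothetical stand-in for a poisoned object inside a checkpoint."""

    def __reduce__(self):
        # Tells pickle: "to rebuild me, call eval('6 * 7')".
        # The callable and its arguments are chosen by whoever wrote the file.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading runs the embedded call; no Payload instance comes back at all.
result = pickle.loads(blob)
```

Here result is whatever eval returned, not a Payload, which shows that loading a pickle is a form of code execution. A safetensors file has no equivalent hook: the loader only parses JSON and slices bytes.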