Safetensors Format Primer
Safetensors is a fast, zero-copy tensor serialization format designed by Hugging Face to replace pickle-based weight files. It removes the arbitrary-code-execution risk of unpickling, supports memory-mapped loading, and puts a plain JSON header in front of every payload Meridian streams to your inference workers.
1. File Layout
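Every safetensors file starts with an 8-byte little-endian unsigned integer giving the header length N, followed by an N-byte UTF-8 JSON header describing each tensor (dtype, shape, byte offsets), then the raw tensor payload, stored contiguously. Each data_offsets pair is a half-open [begin, end) byte range measured from the start of the payload, not from the start of the file.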
[ 8 bytes ] header_size (u64 LE)
[ N bytes ] JSON header
    {
      "weight.0": {
        "dtype": "F16",
        "shape": [4096, 4096],
        "data_offsets": [0, 33554432]
      },
      "__metadata__": { "format": "pt" }
    }
[ rest ] raw tensor bytes
2. Loading on Meridian
Meridian workers mmap the payload region read-only and pin it with cudaHostRegister, so CUDA can transfer tensor slices directly from the mapped pages. There is no host-side allocation, no intermediate copy, and no deserializer running attacker-controlled code. The JSON header is validated against an allow-list of dtypes (F16, BF16, F32, I8) before any tensor is exposed to the runtime.
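The header parsing, dtype allow-list check, and read-only mapping described above can be sketched with the Python standard library alone. This is an illustrative sketch, not Meridian's actual loader: the function names write_safetensors and load_validated are hypothetical, the extra bounds and size checks are assumptions about what a careful validator would do, and the cudaHostRegister step is omitted because it needs the CUDA runtime.

```python
import json
import mmap
import os
import struct
import tempfile

# Allow-list from the text above, mapped to bytes per element.
ALLOWED_DTYPES = {"F16": 2, "BF16": 2, "F32": 4, "I8": 1}


def write_safetensors(path, tensors):
    """Write {name: (dtype, shape, raw_bytes)} using the layout from section 1."""
    header, payload, offset = {}, b"", 0
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {"dtype": dtype, "shape": shape,
                        "data_offsets": [offset, offset + len(raw)]}
        payload += raw
        offset += len(raw)
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte u64 LE header size
        f.write(header_bytes)
        f.write(payload)


def load_validated(path):
    """Map the file read-only and return (mmap, {name: zero-copy memoryview})."""
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    (header_size,) = struct.unpack_from("<Q", mm, 0)
    if header_size > len(mm) - 8:
        raise ValueError("header size exceeds file size")
    header = json.loads(mm[8:8 + header_size].decode("utf-8"))
    data_start = 8 + header_size
    payload_len = len(mm) - data_start
    view = memoryview(mm)
    tensors = {}
    for name, info in header.items():
        if name == "__metadata__":
            continue
        dtype = info["dtype"]
        if dtype not in ALLOWED_DTYPES:
            raise ValueError(f"dtype {dtype!r} is not on the allow-list")
        begin, end = info["data_offsets"]
        n_elems = 1
        for dim in info["shape"]:
            n_elems *= dim
        if not (0 <= begin <= end <= payload_len):
            raise ValueError(f"offsets for {name!r} fall outside the payload")
        if end - begin != n_elems * ALLOWED_DTYPES[dtype]:
            raise ValueError(f"byte length for {name!r} does not match its shape")
        # Slicing a memoryview does not copy; it points into the mapped pages.
        tensors[name] = view[data_start + begin:data_start + end]
    return mm, tensors


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.safetensors")
    raw = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
    write_safetensors(path, {"weight.0": ("F32", [2, 2], raw)})
    mm, tensors = load_validated(path)
    values = struct.unpack("<4f", tensors["weight.0"])
    print(values)  # (1.0, 2.0, 3.0, 4.0)
    for t in tensors.values():
        t.release()  # drop the views before unmapping
    mm.close()
```

Validation happens entirely on the header, before any slice is created, so a file that declares an unlisted dtype or out-of-range offsets is rejected without touching its payload.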
3. Why It Beats Pickle
Unpickling can execute arbitrary Python, so calling torch.load on a poisoned checkpoint (without weights_only=True) can pop a shell on your GPU box. Safetensors carries data only. Combined with content-addressed storage on Meridian, every weight you pull is reproducible, verifiable, and safe to swap between tenants without a sandbox.
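The difference can be seen in a few lines of standard-library Python. In this harmless sketch, the Poisoned class is a hypothetical stand-in for a malicious checkpoint object: it uses eval on a fixed arithmetic string where a real attack would call something like os.system. The second half parses a safetensors-style header, which is plain JSON and can only produce data.

```python
import json
import pickle
import struct


class Poisoned:
    """Hypothetical, harmless stand-in for a malicious checkpoint object."""

    def __reduce__(self):
        # pickle records "call eval with this argument" in the byte stream.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Poisoned())
result = pickle.loads(blob)  # eval runs here, during deserialization
print(result)  # 42

# A safetensors-style file: 8-byte length, JSON header, raw payload.
header = json.dumps(
    {"w": {"dtype": "F32", "shape": [1], "data_offsets": [0, 4]}}
).encode("utf-8")
file_bytes = struct.pack("<Q", len(header)) + header + b"\x00" * 4

(size,) = struct.unpack_from("<Q", file_bytes, 0)
parsed = json.loads(file_bytes[8:8 + size])  # yields only dicts, lists, strings, numbers
print(type(parsed).__name__)  # dict
```

Loading the pickle never returns a Poisoned instance at all: the loader returns whatever the embedded callable produced, which is why the format itself, not just its contents, has to be trusted.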