Transformers library primer
Hugging Face Transformers is the de-facto gateway to thousands of pretrained models. This recipe walks through wiring it into a Meridian pipeline so you can ship inference endpoints without rebuilding tokenizers, weights, or serving glue from scratch.
1. Installing the toolkit
Start with a clean virtualenv. The library pulls in PyTorch or TensorFlow on demand — pick one to keep your container small. Meridian autoscales cold-starts, so a leaner image means snappier first-token latency.
pip install transformers torch --upgrade pip install accelerate sentencepiece export HF_HOME=/var/cache/huggingface
2. Loading a pretrained model
The AutoModel and AutoTokenizer classes resolve any checkpoint id from the Hub. Pin a revision hash in production so a weight update on the Hub never silently changes your inference behavior between deploys.
3. Wiring it into Meridian
Wrap the pipeline in a Meridian handler, expose a single /infer route, and let the gateway handle auth, rate limits, and billing. The model stays warm across requests; only the request payload crosses the boundary.