RECIPE

Transformers library primer

Hugging Face Transformers is the de facto gateway to thousands of pretrained models. This recipe walks through wiring it into a Meridian pipeline so you can ship inference endpoints without rebuilding tokenizers, weights, or serving glue from scratch.

1. Installing the toolkit

Start with a clean virtualenv. Transformers does not install a deep learning backend on its own, so install PyTorch or TensorFlow explicitly; pick one to keep your container small. Meridian autoscales, and new instances start cold, so a leaner image means lower first-token latency on a cold start.

pip install transformers torch --upgrade
pip install accelerate sentencepiece
export HF_HOME=/var/cache/huggingface
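
A quick way to confirm the install is to read the package versions back from the environment. The sketch below uses only the standard library; the helper name installed_versions is illustrative and not part of Transformers.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # The four packages installed by the pip commands above.
    wanted = ["transformers", "torch", "accelerate", "sentencepiece"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'MISSING'}")
```

Running this inside the container build catches a missing backend before the first request does.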

2. Loading a pretrained model

The AutoModel and AutoTokenizer classes resolve a checkpoint id from the Hub and select the matching architecture and tokenizer. In production, pin a commit hash through the revision argument of from_pretrained so a weight update on the Hub never silently changes your inference behavior between deploys.
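
A minimal sketch of pinned loading, assuming PyTorch as the backend. The model id is only an example, and REVISION is a placeholder that you would replace with a real commit hash from the model's history on the Hub. The is_commit_sha guard is an illustrative addition that rejects branch names such as "main", which can move.

```python
import re

# Example checkpoint id; swap in the model you actually serve.
MODEL_ID = "distilbert-base-uncased-finetuned-sst-2-english"
# Placeholder, NOT a real commit: copy the full hash from the model's Hub history.
REVISION = "0123456789abcdef0123456789abcdef01234567"

_SHA_RE = re.compile(r"[0-9a-f]{40}")


def is_commit_sha(revision: str) -> bool:
    """True only for a full 40-character lowercase hex commit hash."""
    return _SHA_RE.fullmatch(revision) is not None


def load_pinned(model_id: str, revision: str):
    """Load tokenizer and model at an exact Hub commit."""
    if not is_commit_sha(revision):
        raise ValueError(f"expected a full commit hash, got {revision!r}")
    # Imported lazily so the check above runs without the heavy dependency.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
    model = AutoModel.from_pretrained(model_id, revision=revision)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Pinning the tokenizer to the same revision as the weights matters as much as pinning the weights, since a vocabulary change would alter the inputs the model sees.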

3. Wiring it into Meridian

Wrap the pipeline in a Meridian handler, expose a single /infer route, and let the gateway handle auth, rate limits, and billing. The model stays warm across requests; only the request payload crosses the boundary.
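
The handler shape described above might look like the sketch below. Meridian's handler API is not shown in this recipe, so handle_infer is written as a plain function from a payload dict to a response dict, and the payload key "text", the response layout, and the status codes are all assumptions. Only the pipeline call comes from Transformers; the route registration is left to whatever Meridian provides.

```python
from typing import Any, Callable, Dict, Optional

# Example checkpoint id; pin a revision in production.
MODEL_ID = "distilbert-base-uncased-finetuned-sst-2-english"

_PIPELINE: Optional[Callable[[str], Any]] = None


def get_pipeline() -> Callable[[str], Any]:
    """Build the pipeline once per process so the model stays warm."""
    global _PIPELINE
    if _PIPELINE is None:
        # Imported lazily: the first call pays the load cost, later calls reuse it.
        from transformers import pipeline

        _PIPELINE = pipeline("text-classification", model=MODEL_ID)
    return _PIPELINE


def handle_infer(
    payload: Dict[str, Any],
    run: Optional[Callable[[str], Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical body for the /infer route: validate, run, wrap the result."""
    text = payload.get("text")
    if not isinstance(text, str) or not text.strip():
        return {"status": 400, "error": "payload must include a non-empty 'text' string"}
    runner = run if run is not None else get_pipeline()
    return {"status": 200, "predictions": runner(text)}
```

The optional run argument lets a test pass in a stub instead of loading real weights, while production calls fall through to the cached pipeline.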