Recipe
LoRA Fine-Tuning
Inject trainable low-rank matrices into frozen foundation models. Full adaptation quality at a fraction of the memory cost.
Memory
<16 GB
GPU VRAM required
Trainable params
0.1–2%
of base model
Checkpoint size
~10 MB
per adapter
How it works
LoRA freezes the pre-trained weight matrix W and injects a low-rank decomposition A·B. Only A and B receive gradient updates. The forward pass becomes h = Wx + (α/r)·ABx. At inference the adapter merges back into W with zero latency overhead.
Target modules
| Architecture | Modules |
|---|---|
| GPT-style | q_proj, v_proj, k_proj, o_proj |
| Llama / Mistral | q_proj, v_proj, gate_proj, up_proj, down_proj |
| Vision (ViT) | query, value |
Recommended hyperparameters
- Rank r: 8–64. Start at 16 for 7B models.
- Alpha α: 2× rank. Controls scaling factor.
- Dropout: 0.05–0.1. Regularizes small datasets.
- LR: 1e-4 to 5e-4 with cosine schedule.