Recipe

LoRA Fine-Tuning

Inject trainable low-rank matrices into frozen foundation models. Full adaptation quality at a fraction of the memory cost.

Memory

<16 GB

GPU VRAM required

Trainable params

0.1–2%

of base model

Checkpoint size

~10 MB

per adapter

How it works

LoRA freezes the pre-trained weight matrix W and injects a low-rank decomposition A·B. Only A and B receive gradient updates. The forward pass becomes h = Wx + (α/r)·ABx. At inference the adapter merges back into W with zero latency overhead.

Target modules

ArchitectureModules
GPT-styleq_proj, v_proj, k_proj, o_proj
Llama / Mistralq_proj, v_proj, gate_proj, up_proj, down_proj
Vision (ViT)query, value

Recommended hyperparameters

  • Rank r: 8–64. Start at 16 for 7B models.
  • Alpha α: 2× rank. Controls scaling factor.
  • Dropout: 0.05–0.1. Regularizes small datasets.
  • LR: 1e-4 to 5e-4 with cosine schedule.