Recipe

LoRA Fine-Tuning

Inject trainable low-rank matrices into frozen foundation models. Full adaptation quality at a fraction of the memory cost.

Memory

<16 GB

GPU VRAM required

Trainable params

0.1–2%

of base model

Checkpoint size

~10 MB

per adapter

How it works

LoRA freezes the pre-trained weight matrix W and injects a low-rank decomposition A·B. Only A and B receive gradient updates. The forward pass becomes h = Wx + (α/r)·ABx. At inference the adapter merges back into W with zero latency overhead.

Target modules

Architecture	Modules
GPT-style	q_proj, v_proj, k_proj, o_proj
Llama / Mistral	q_proj, v_proj, gate_proj, up_proj, down_proj
Vision (ViT)	query, value

Recommended hyperparameters

Rank r: 8–64. Start at 16 for 7B models.
Alpha α: 2× rank. Controls scaling factor.
Dropout: 0.05–0.1. Regularizes small datasets.
LR: 1e-4 to 5e-4 with cosine schedule.

← Back to docs Next: QLoRA →