QLoRA (Quantized Low-Rank Adaptation) | designpattern.fyi

QLoRA (Quantized Low-Rank Adaptation)

Fine-tune a 4-bit quantized base model with BF16 LoRA adapters, enabling fine-tuning of a 65B-parameter model on a single 48GB GPU.

Intent & Description

🎯 Intent

Combine 4-bit quantization’s memory savings with LoRA’s parameter efficiency, making fine-tuning of very large models possible on hardware that could not even hold them for inference at FP16.

📋 Context

Standard LoRA still requires the base model in FP16: at 2 bytes per parameter, a 65B model needs ~130GB of VRAM just for the frozen base weights. QLoRA stores the frozen base in 4-bit NF4 and trains the LoRA adapters in BF16, dequantizing the base weights on the fly whenever they are needed in the forward and backward passes.
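The arithmetic behind these figures can be sketched in a few lines. This is a rough estimate of weight storage only; it ignores activations, adapter weights, and optimizer state. The block sizes (64 weights per block, 256 constants per second-level block) and the FP32/8-bit constant widths are assumptions taken from the QLoRA paper's defaults, and the function names are hypothetical.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


def constant_overhead_bits(block_size: int = 64,
                           double_quant: bool = False,
                           second_block_size: int = 256) -> float:
    """Extra bits per parameter spent on per-block scaling constants."""
    if not double_quant:
        # One FP32 scaling constant per block of weights.
        return 32 / block_size
    # The constants are themselves quantized to 8 bits, with one FP32
    # second-level constant per group of first-level constants.
    return 8 / block_size + 32 / (block_size * second_block_size)


n = 65e9  # 65B parameters
fp16_gb = weight_memory_gb(n, 16)
nf4_gb = weight_memory_gb(n, 4 + constant_overhead_bits())
nf4_dq_gb = weight_memory_gb(n, 4 + constant_overhead_bits(double_quant=True))

print(f"FP16 base:                  {fp16_gb:.1f} GB")
print(f"NF4 base:                   {nf4_gb:.1f} GB")
print(f"NF4 + quantized constants:  {nf4_dq_gb:.1f} GB")
```

Under these assumptions the frozen base drops from about 130GB to roughly 33.5GB, which is what leaves headroom for adapters, activations, and optimizer state on a 48GB card.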

💡 Solution

Quantize the frozen base model to 4-bit NormalFloat (NF4) using bitsandbytes. Attach BF16 LoRA adapters to the target layers. During training, dequantize the NF4 weights to BF16 for each forward pass, compute gradients in BF16, and update only the LoRA adapter parameters. Apply double quantization (quantizing the quantization constants themselves) to save further memory, and use paged optimizers to absorb memory spikes.
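The mechanics can be illustrated with a toy, pure-Python sketch: block-wise absmax quantization to 16 levels, dequantization on the fly, and a forward pass that adds the low-rank adapter term. It is not how bitsandbytes implements it (real kernels pack two 4-bit codes per byte and run on the GPU). The level table holds the NF4 code values rounded to four decimals as an assumption, one block per weight row is a simplification, and all function names are hypothetical.

```python
import random

# NF4 code values, rounded to 4 decimals (assumed; illustrative only).
NF4_LEVELS = [-1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
              0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0]


def quantize_block(block):
    """Return (absmax scale, 4-bit level indices) for one block of weights."""
    scale = max(abs(w) for w in block) or 1.0  # avoid dividing by zero
    idx = [min(range(16), key=lambda i: abs(NF4_LEVELS[i] - w / scale))
           for w in block]
    return scale, idx


def dequantize_block(scale, idx):
    """Map level indices back to approximate weight values."""
    return [NF4_LEVELS[i] * scale for i in idx]


def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def qlora_forward(q_rows, A, B, x, alpha, r):
    """y = dequant(W) x + (alpha / r) * B (A x)."""
    W = [dequantize_block(s, idx) for s, idx in q_rows]  # dequantize on the fly
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + (alpha / r) * d for b, d in zip(base, delta)]


rng = random.Random(0)
d_out, d_in, r, alpha = 4, 8, 2, 4
W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
q_rows = [quantize_block(row) for row in W]  # frozen: scale + 4-bit indices
A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]  # trainable
B = [[0.0] * r for _ in range(d_out)]  # trainable, zero-init: adapter starts as a no-op
x = [rng.gauss(0, 1) for _ in range(d_in)]
y = qlora_forward(q_rows, A, B, x, alpha, r)
```

In training, gradients would flow through the dequantized weights to A and B only; q_rows is never updated.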

Real-world Use Case

Fine-tuning 13B, 33B, 65B, or 70B models on a single high-end GPU. Research and fine-tuning experiments on single-node or consumer-GPU budgets. Any scenario where the 4-bit base model plus LoRA adapters fits in GPU memory but the full FP16 base does not.

📌 TL;DR

4-bit base, BF16 adapters. Fits 65B fine-tuning on one 48GB GPU, removing much of the hardware barrier to large-model adaptation.

Advantages

Makes 65B+ model fine-tuning accessible on a single 48GB GPU
Accuracy close to full BF16 LoRA fine-tuning despite 4-bit base weights
Paged optimizers absorb memory spikes, such as those from gradient checkpointing on long sequences

Disadvantages

Slower training than BF16 LoRA due to per-pass NF4 dequantization overhead
NF4 quantization is lossy, so the base model starts from slightly lower quality than its FP16 counterpart
More complex setup — requires bitsandbytes and careful memory budgeting
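For orientation, a typical setup with the Hugging Face transformers, peft, and bitsandbytes stack might look like the sketch below. The hyperparameters (rank, alpha, dropout) and the target module names are illustrative assumptions that depend on the model architecture, and load_qlora_model is a hypothetical helper. The heavy imports are deferred into the function so the settings can be inspected without a GPU environment.

```python
# Settings kept as plain dicts so they can be read and reused independently.
BNB_KWARGS = {
    "load_in_4bit": True,               # store base weights in 4-bit
    "bnb_4bit_quant_type": "nf4",       # NormalFloat rather than plain FP4
    "bnb_4bit_use_double_quant": True,  # also quantize the scaling constants
}

LORA_KWARGS = {
    "r": 16,                            # illustrative rank
    "lora_alpha": 32,                   # illustrative scaling
    "lora_dropout": 0.05,
    # Assumed names for LLaMA-style attention projections; check your model.
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "task_type": "CAUSAL_LM",
}

# Passed to transformers.TrainingArguments by the caller.
TRAINING_KWARGS = {
    "optim": "paged_adamw_32bit",       # paged optimizer
    "bf16": True,
    "gradient_checkpointing": True,
}


def load_qlora_model(model_id: str):
    """Load a 4-bit base model and wrap it with trainable LoRA adapters."""
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    bnb_config = BitsAndBytesConfig(
        bnb_4bit_compute_dtype=torch.bfloat16,  # dequantize to BF16 for compute
        **BNB_KWARGS,
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=bnb_config, device_map="auto"
    )
    model = prepare_model_for_kbit_training(model)
    return get_peft_model(model, LoraConfig(**LORA_KWARGS))
```

Memory budgeting still matters after this: sequence length, batch size, and adapter rank all add to the footprint on top of the quantized base.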

© 2026 designpattern.fyi. Crafted with ❤️ for modern software engineers by OpenAGI