Structured Pruning | designpattern.fyi


Advantages

Real hardware speedups without sparse compute kernels — the pruned model is just a smaller dense model
Composable with quantization and LoRA — prune first, then quantize and fine-tune for compounding gains
Layer-level pruning reduces model depth, which lowers latency wherever layers must execute one after another
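
To make the first advantage concrete, here is a minimal sketch (not a production recipe) of width pruning on a toy two-layer MLP using NumPy. The helper names `prune_hidden_units` and `mlp`, the weight shapes, and the L2-norm importance score are all assumptions chosen for illustration. The point is that removing whole hidden units yields smaller dense arrays that run with ordinary matrix multiplies.

```python
import numpy as np

def prune_hidden_units(w1, b1, w2, keep_ratio):
    """Drop the hidden units with the smallest incoming-weight L2 norm.

    w1: (hidden, in), b1: (hidden,), w2: (out, hidden).
    Returns smaller dense arrays plus the indices of the kept units.
    """
    hidden = w1.shape[0]
    n_keep = max(1, int(round(hidden * keep_ratio)))
    # Importance per hidden unit: norm of its incoming weights (an assumed heuristic).
    scores = np.linalg.norm(w1, axis=1)
    keep = np.sort(np.argsort(scores)[-n_keep:])
    # Slice rows of w1/b1 and the matching columns of w2.
    return w1[keep], b1[keep], w2[:, keep], keep

def mlp(x, w1, b1, w2):
    """Two-layer MLP with ReLU, no output bias."""
    return np.maximum(x @ w1.T + b1, 0.0) @ w2.T

rng = np.random.default_rng(0)
w1 = rng.normal(size=(8, 4))
b1 = rng.normal(size=8)
w2 = rng.normal(size=(3, 8))
x = rng.normal(size=(5, 4))

pw1, pb1, pw2, keep = prune_hidden_units(w1, b1, w2, keep_ratio=0.5)

# Reference: the same network with pruned units zeroed out instead of removed.
mask = np.zeros(8, dtype=bool)
mask[keep] = True
masked_out = mlp(x, w1 * mask[:, None], b1 * mask, w2 * mask[None, :])
pruned_out = mlp(x, pw1, pb1, pw2)
```

The pruned model computes the same function as the masked one, but with half-sized matrices and no sparse kernels. A real pipeline would apply the same row/column slicing to attention heads or feed-forward channels and then run recovery fine-tuning.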

Disadvantages

Aggressive pruning (more than 40% of parameters) significantly degrades quality until recovery fine-tuning is applied
Sensitivity analysis requires calibration data and adds pre-pruning evaluation cost
Recovery fine-tuning is required to restore quality — adds a training step after pruning
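
The sensitivity analysis mentioned above can be sketched as follows for layer-level pruning. This is a toy illustration under stated assumptions: the model is a hypothetical stack of residual `tanh` blocks, the calibration batch is random data, and sensitivity is measured as the mean squared change in output when one block is skipped. The names `forward` and `layer_sensitivity` are illustrative, not from any library.

```python
import numpy as np

def forward(x, weights, skip=None):
    """Run a stack of residual blocks, optionally skipping one block."""
    for i, w in enumerate(weights):
        if i == skip:
            continue
        x = x + np.tanh(x @ w)
    return x

def layer_sensitivity(calib_x, weights):
    """Mean squared output change when each block is removed in turn."""
    baseline = forward(calib_x, weights)
    return [
        float(np.mean((forward(calib_x, weights, skip=i) - baseline) ** 2))
        for i in range(len(weights))
    ]

rng = np.random.default_rng(0)
dim = 16
# Hypothetical toy model: block 2 has near-zero weights, so it contributes little.
scales = [0.5, 0.5, 1e-3, 0.5]
weights = [rng.normal(scale=s, size=(dim, dim)) for s in scales]
calib_x = rng.normal(size=(32, dim))  # stand-in for real calibration data

scores = layer_sensitivity(calib_x, weights)
least_sensitive = int(np.argmin(scores))  # candidate block to prune first
```

Each score costs one extra forward pass over the calibration set, which is the pre-pruning evaluation cost noted in the list. In practice the metric would more likely be task loss or perplexity on held-out data rather than raw output distance.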