Because LLMs have billions of parameters, updating all of their weights during fine-tuning is expensive; parameter-efficient fine-tuning (PEFT) instead trains only a small fraction of the parameters.
Methods such as LoRA, QLoRA, prefix/prompt tuning, adapters, and BitFit keep the base model frozen and train only a small set of added parameters: low-rank update matrices, small adapter layers, learned prefix vectors, or just the bias terms.
This cuts memory and compute requirements, makes fine-tuning feasible on consumer GPUs, and lets one base model be reused for many tasks by swapping in lightweight task-specific weights.
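
To make the low-rank idea concrete, here is a minimal LoRA-style layer sketched in PyTorch. The class name, rank, and scaling values are illustrative assumptions for this sketch, not a reference implementation of any particular library.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen linear layer and adds a trainable low-rank update.

    Illustrative sketch: the forward pass computes
    y = W x + (alpha / r) * B A x, where W stays frozen and only
    A (r x in_features) and B (out_features x r) are trained.
    """

    def __init__(self, base_linear: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():  # freeze the original weights
            p.requires_grad = False

        in_f, out_f = base_linear.in_features, base_linear.out_features
        self.lora_A = nn.Parameter(torch.randn(r, in_f) * 0.01)  # down-projection
        self.lora_B = nn.Parameter(torch.zeros(out_f, r))        # up-projection, starts at zero
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # frozen path plus scaled low-rank update
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)


# Example: wrap one projection and train only the LoRA parameters.
layer = LoRALinear(nn.Linear(768, 768), r=8)
optimizer = torch.optim.AdamW(
    [p for p in layer.parameters() if p.requires_grad], lr=1e-4
)
```

Because only the small A and B matrices receive gradients, optimizer state and gradient memory shrink dramatically, and a trained low-rank update can be stored separately and merged into (or swapped out of) the frozen base weights per task.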