What is fine-tuning a large language model?

TL;DR

Fine-tuning is the process of continuing to train a pretrained large language model on a domain-specific dataset so it develops behaviors the base model does not have out of the box. Modern fine-tuning uses parameter-efficient techniques (LoRA, QLoRA) that train a small set of adapter weights instead of the full model, at dramatically lower cost and with comparable results for most tasks. For most enterprise workloads, RAG and prompt engineering solve the problem without fine-tuning. Fine-tuning earns its keep when you need a specific skill, style, or output format the base model does not produce reliably.

The short version

  • Fine-tuning continues training a pretrained LLM on a domain dataset.
  • Modern fine-tuning uses LoRA/QLoRA for parameter-efficient adaptation.
  • RAG and prompt engineering solve most problems; fine-tuning is the specialist's tool.
  • Data preparation is where most of the engineering effort actually goes.

The longer explanation

What fine-tuning does

A pretrained LLM has been trained on a broad corpus and has developed general capabilities. Fine-tuning continues that training on a narrower, curated dataset so the model develops capabilities specific to a target domain, task, or style. The base model's capabilities do not disappear; they are specialized (though aggressive fine-tuning can degrade general performance, so evaluation matters).

The three categories of fine-tuning that matter in practice:

  1. Supervised fine-tuning (SFT). Train on input-output pairs. The model learns the specific mapping. This is the most common enterprise fine-tuning path.
  2. Instruction fine-tuning. A flavor of SFT focused on following task-specific instructions. Often used for domain-specific assistants.
  3. Preference fine-tuning (RLHF, DPO, and related). Train against preference data — "response A is better than response B" — to shape model behavior. Common for safety and style alignment.
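To make the SFT case concrete, here is a minimal sketch of what input-output training data looks like. The field names (`prompt`, `completion`) and the examples themselves are illustrative; real formats vary by training framework.

```python
import json

# Illustrative SFT records: each maps an input (prompt) to the desired
# output (completion). Field names vary by framework; these are examples.
examples = [
    {"prompt": "Summarize: Quarterly revenue rose 12% on strong demand.",
     "completion": "Revenue grew 12% quarter-over-quarter."},
    {"prompt": "Extract the invoice number: 'Invoice INV-2041, due May 1.'",
     "completion": "INV-2041"},
]

# Serialize as JSONL (one record per line), a common interchange format.
jsonl = "\n".join(json.dumps(e) for e in examples)
print(jsonl.count("\n") + 1)  # 2 records
```

Instruction fine-tuning data looks much the same, except the inputs are phrased as task instructions; preference data instead pairs a prompt with a chosen and a rejected response.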

LoRA and QLoRA

Full fine-tuning updates every parameter in the model. For a 70B-parameter model in 16-bit precision, this requires roughly 1.4 TB of GPU memory once weights, gradients, and optimizer states are counted. Most enterprises do not have that infrastructure readily available.

LoRA (Low-Rank Adaptation) inserts small adapter matrices into the model and trains only those. The base model weights stay frozen. The adapter weights are a few tens of megabytes. The training workload drops by an order of magnitude, and the results for most tasks are comparable to full fine-tuning.
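The "order of magnitude" claim falls out of simple arithmetic. A sketch, using an illustrative 4096x4096 projection matrix and a typical small rank:

```python
# Trainable-parameter count for one weight matrix: full fine-tune vs. LoRA.
# Shapes are illustrative (a single 4096x4096 attention projection).
d, k = 4096, 4096   # dimensions of the frozen base weight W
r = 16              # LoRA rank; typical values are small (4-64)

full_params = d * k              # every entry of W is updated
lora_params = d * r + r * k      # only adapters B (d x r) and A (r x k)

print(full_params)                  # 16777216
print(lora_params)                  # 131072
print(full_params // lora_params)   # 128x fewer trainable parameters
```

The same ratio holds across every adapted layer, which is why the saved adapter weights come out to tens of megabytes rather than the full model's hundreds of gigabytes.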

QLoRA goes further: it quantizes the frozen base model to 4-bit precision, further reducing GPU memory requirements. A 70B model that would need 4-8 H100s for full fine-tuning can be QLoRA fine-tuned on a single H100.
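The memory difference is back-of-envelope arithmetic on bytes per parameter. A rough sketch for the frozen base weights alone (real usage adds adapter gradients, activations, and overhead):

```python
# Approximate GPU memory for the frozen base weights of a 70B model.
# Ignores activations, adapter state, and framework overhead.
params = 70e9

fp16_gb = params * 2 / 1e9    # 16-bit: 2 bytes per parameter
int4_gb = params * 0.5 / 1e9  # 4-bit:  0.5 bytes per parameter

print(round(fp16_gb))  # 140 -- exceeds a single 80 GB H100
print(round(int4_gb))  # 35  -- fits on one H100 with room for adapters
```

This is why quantizing the frozen weights, while keeping the small trained adapters in higher precision, brings a 70B fine-tune within reach of a single GPU.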

Both are production-ready. Open-weight models (Llama, Mistral, Qwen, Gemma) support them; the tooling (Hugging Face PEFT, Axolotl, and others) is mature.
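As a sketch of what the tooling looks like in practice, here is a minimal LoRA setup with Hugging Face PEFT. The model name and hyperparameter values are illustrative, not recommendations; this is not runnable without model weights and a GPU.

```python
# Illustrative: attaching a LoRA adapter with Hugging Face PEFT.
# Model name and hyperparameters are placeholders, not recommendations.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

config = LoraConfig(
    r=16,                                 # adapter rank
    lora_alpha=32,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # projections that get adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the adapters are trainable
```

From here the wrapped model trains with a standard loop or trainer; only the adapter weights are saved, which is what keeps the artifact small.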

When fine-tuning earns its keep

  • Specific output format the model does not produce reliably with prompt engineering alone.
  • Domain vocabulary and style that the base model treats as out-of-distribution.
  • Latency-sensitive workloads where baking behavior into weights beats paying for it in the prompt every request.
  • Cost-sensitive high-volume workloads where a smaller fine-tuned model outperforms a larger base model at lower cost per inference.
  • Tasks where prompt engineering has hit a ceiling after systematic iteration.

For the first enterprise AI workload, fine-tuning rarely earns its keep. For the fifth or tenth, it often does.

The cost structure

Compute cost is the less important part. Data preparation — curating, cleaning, and formatting the training data — is where most of the engineering effort goes. A 10,000-example fine-tuning dataset might cost $500 in compute but $50,000 to prepare properly, especially if the examples require domain-expert review.
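Much of that preparation effort is mechanical validation before the expensive expert review. A minimal sketch of the kind of automated pass a dataset goes through; the rules and field names are illustrative:

```python
# Minimal validation pass over SFT records before training.
# Rules and thresholds are illustrative, not a complete pipeline.
def validate(record):
    issues = []
    if not record.get("prompt", "").strip():
        issues.append("empty prompt")
    if not record.get("completion", "").strip():
        issues.append("empty completion")
    if len(record.get("completion", "")) > 4000:
        issues.append("completion too long")
    return issues

records = [
    {"prompt": "Classify sentiment: great service", "completion": "positive"},
    {"prompt": "", "completion": "positive"},  # bad record: empty prompt
]

clean = [r for r in records if not validate(r)]
print(len(clean))  # 1 record survives
```

Checks like these catch the cheap failures; deduplication, label consistency, and domain-expert review are where the real budget goes.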

Evaluation is the other expensive line item. A fine-tuned model needs to be evaluated against production scenarios; the evaluation suite is often as large as the training set.
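The shape of that evaluation suite can be simple even when it is large. A sketch of task-level exact-match scoring; `model_output` is a stand-in for a call to the fine-tuned model, and the examples are illustrative:

```python
# Task-level evaluation sketch: exact-match scoring against references.
# `model_output` is a placeholder for a real call to the fine-tuned model.
def exact_match(pred, ref):
    return pred.strip().lower() == ref.strip().lower()

eval_set = [
    {"input": "Extract invoice number: Invoice INV-7, due May 1.",
     "expected": "INV-7"},
    {"input": "Extract invoice number: no invoice mentioned.",
     "expected": "NONE"},
]

def model_output(text):
    # Stand-in logic so the sketch runs without a model.
    return "INV-7" if "INV-7" in text else "NONE"

score = sum(exact_match(model_output(e["input"]), e["expected"])
            for e in eval_set) / len(eval_set)
print(score)  # fraction of exact matches
```

Production suites add fuzzier metrics (format validity, semantic similarity, human or LLM-judged quality), but the structure is the same: a held-out set of scenarios scored the same way before and after fine-tuning.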

How Thoughtwave approaches this

We recommend fine-tuning only when it is the right tool. For most engagements, prompt engineering plus RAG plus model-switching (to a different base model) solves the problem without fine-tuning. When fine-tuning is called for — specific output format, domain specialization for a cost-sensitive workload, or a production behavior the base model cannot produce reliably — we use LoRA or QLoRA on open-weight models.

For deeper context on model selection and deployment, see our LLM Deployment Services and the accelerators portfolio.

Frequently asked questions

When should I fine-tune vs use RAG?
RAG when your data changes and you need source citations. Fine-tuning when the model needs a specific skill, style, or structural output the base model cannot produce reliably with prompt engineering. Most enterprise deployments start with RAG; fine-tuning comes later when a specific task cannot be solved otherwise. The two are not mutually exclusive — many production systems do both.
What is LoRA and why does it matter?
LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that trains a small adapter layer on top of the frozen base model instead of updating the full model weights. The upshot: fine-tuning a 70B-parameter model takes a fraction of the GPU memory and compute that full fine-tuning would require. QLoRA adds quantization for further efficiency. Both are mature and used broadly in production.
What does fine-tuning cost?
Costs depend on model size and training data volume, but with LoRA or QLoRA, enterprise fine-tuning of a 7B-70B parameter model on a domain dataset is typically in the $500-$5,000 range for compute. Data preparation (the real cost) usually dominates: curating, cleaning, and formatting the training data is where the engineering effort goes.

Ramesh Thumu

Founder & President, Thoughtwave Software

Reviewed by Thoughtwave Editorial

Last updated April 22, 2026