## When to Use SFT

Best for:

- Domain adaptation (legal, medical, code)
- Custom response style or tone
- Teaching specific formats or structures
- When you have high-quality example data

Consider alternatives (such as prompting or retrieval) when:

- You don’t have labeled training data
- You want to iterate quickly without retraining
- Your task is classification or QA with clear metrics
## Prerequisites
## Step 1: Prepare Your Training Data
Create a JSONL file with conversation examples. Each line is a JSON object with a `messages` array; the examples below show the format.
### Data Format Requirements

Each example must have:

- At least one `user` message
- At least one `assistant` message (this is what the model learns to generate)
### Supported Message Roles

| Role | Description |
|---|---|
| `system` | Optional system prompt (first message only) |
| `user` | User input |
| `assistant` | Model response (training target) |
| `tool` | Tool/function response |
### Example: Basic Conversation
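A minimal sketch of one training example, pretty-printed for readability (in the JSONL file, each example occupies a single line); the domain content is illustrative:

```json
{"messages": [
  {"role": "system", "content": "You are a concise legal assistant."},
  {"role": "user", "content": "What is a force majeure clause?"},
  {"role": "assistant", "content": "A force majeure clause excuses a party from its contractual obligations when extraordinary events beyond its control, such as natural disasters or war, prevent performance."}
]}
```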
### Example: With Tool Calls
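A sketch assuming an OpenAI-style tool-call schema; field names such as `tool_calls` and `tool_call_id` are assumptions and may differ on your platform:

```json
{"messages": [
  {"role": "user", "content": "What's the weather in Paris right now?"},
  {"role": "assistant", "tool_calls": [{"id": "call_1", "type": "function",
    "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"}}]},
  {"role": "tool", "tool_call_id": "call_1", "content": "{\"temp_c\": 18, \"condition\": \"cloudy\"}"},
  {"role": "assistant", "content": "It is currently 18 °C and cloudy in Paris."}
]}
```

Note that the final `assistant` message, which synthesizes the tool result into a user-facing answer, is the training target.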
### Example: Vision/Multimodal
For vision models (e.g., Qwen3-VL), include images in user messages:
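A sketch assuming an OpenAI-style content-parts array with `image_url` entries; the exact field names are assumptions:

```json
{"messages": [
  {"role": "user", "content": [
    {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
    {"type": "text", "text": "Summarize the trend in this chart."}
  ]},
  {"role": "assistant", "content": "Revenue rises steadily through Q3 and then flattens in Q4."}
]}
```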
### Data Quality Tips

- **Diverse examples:** Cover the range of inputs your model will see
- **Consistent format:** Use the same response style across examples
- **Quality over quantity:** 100 excellent examples beat 10,000 mediocre ones
- **Validation set:** Hold out 10-20% for evaluation
## Step 2: Create the Configuration

Create a TOML file for your training job:
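A sketch of such a file; the section layout and file name are assumptions, but the key names and defaults mirror the reference tables below:

```toml
# sft-config.toml (hypothetical layout; key names mirror the tables below)

[model]
name = "Qwen/Qwen2.5-7B-Instruct"

[data]
train_file = "train.jsonl"
eval_file = "eval.jsonl"

[training]
num_train_epochs = 3
learning_rate = 2e-4
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
max_seq_length = 2048
warmup_ratio = 0.1

[lora]
enabled = true
rank = 16
alpha = 32
dropout = 0.1

[evaluation]
eval_steps = 500
early_stopping_patience = 3
save_best_model = true
```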
### Configuration Reference

#### Model Selection

| Model | Use Case | Notes |
|---|---|---|
| `Qwen/Qwen2.5-7B-Instruct` | General purpose | Good balance of speed/quality |
| `Qwen/Qwen2.5-14B-Instruct` | Higher quality | Slower, more GPU memory |
| `Qwen/Qwen3-VL-7B` | Vision tasks | Supports image inputs |
| `meta-llama/Llama-3.1-8B-Instruct` | General purpose | Strong reasoning |
#### Hyperparameters

| Parameter | Default | Description |
|---|---|---|
| `num_train_epochs` | 3 | Training passes over your data |
| `learning_rate` | 2e-4 | How fast the model updates (lower = more stable) |
| `per_device_train_batch_size` | 4 | Examples per GPU per step |
| `gradient_accumulation_steps` | 4 | Accumulate gradients before update |
| `max_seq_length` | 2048 | Maximum tokens per example |
| `warmup_ratio` | 0.1 | Fraction of steps for learning rate warmup |

With the defaults, the effective batch size is `per_device_train_batch_size` × `gradient_accumulation_steps` = 4 × 4 = 16 examples per weight update.
#### LoRA Settings

LoRA (Low-Rank Adaptation) fine-tunes efficiently by updating a small number of parameters:

| Parameter | Default | Description |
|---|---|---|
| `enabled` | true | Use LoRA (recommended) |
| `rank` | 16 | Rank of adaptation matrices (higher = more capacity) |
| `alpha` | 32 | Scaling factor (typically 2× rank) |
| `dropout` | 0.1 | Regularization |
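If the model underfits, a common adjustment is to raise the rank and scale `alpha` with it, keeping the 2× relationship noted above; the section layout follows the hypothetical TOML sketched in Step 2:

```toml
[lora]
enabled = true
rank = 32     # more capacity than the default of 16
alpha = 64    # keep alpha at roughly 2x rank
dropout = 0.1
```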
#### Evaluation Settings

| Parameter | Default | Description |
|---|---|---|
| `eval_steps` | 500 | Evaluate every N steps |
| `early_stopping_patience` | 3 | Stop if no improvement for N evals |
| `save_best_model` | true | Keep the best checkpoint |
## Step 3: Launch the Training Job

### Using the CLI

Submit your configuration with the CLI; the `--poll` flag shows progress until completion:
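A hypothetical invocation; the command name and subcommand are assumptions (only the `--poll` flag is referenced in this guide):

```bash
# Hypothetical CLI; only the --poll flag is documented in this guide.
sft jobs create --config sft-config.toml --poll
```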
### Using Python
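A sketch assuming a hypothetical Python SDK; the module, client, and method names are assumptions rather than a documented API:

```python
# Hypothetical SDK: module, client, and method names are assumptions.
from my_platform import Client  # placeholder import

client = Client()

# Submit the TOML config created in Step 2.
job = client.fine_tuning.create(config="sft-config.toml")
print(job.id)  # save this ID to check on the job later

# Block until training finishes, then report the outcome.
job = client.fine_tuning.wait(job.id)
print(job.status, job.fine_tuned_model)
```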
### Resume a Job

If you need to check on a job later:
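Continuing the hypothetical SDK above, retrieve the job by the ID returned at submission:

```python
# Hypothetical SDK: method names are assumptions.
from my_platform import Client  # placeholder import

client = Client()
job = client.fine_tuning.retrieve("ftjob-abc123")  # ID from submission
print(job.status)  # e.g. "running", "succeeded", "failed"
```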
## Step 4: Use Your Fine-Tuned Model

After training completes, you’ll receive a model ID like `ft:qwen2.5-7b:my-org:abc123`.
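A sketch of calling the fine-tuned model by that ID, again using the hypothetical SDK from Step 3; the chat API shape is an assumption:

```python
# Hypothetical SDK: the chat API shape is an assumption; the ft:... model ID
# format comes from this guide.
from my_platform import Client  # placeholder import

client = Client()
response = client.chat.create(
    model="ft:qwen2.5-7b:my-org:abc123",
    messages=[{"role": "user", "content": "What is a force majeure clause?"}],
)
print(response.content)
```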