Supervised Fine-Tuning (SFT) trains a model on your instruction-response pairs, teaching it to generate outputs that match your examples. Unlike prompt optimization, which changes only the prompt, SFT modifies the model weights directly.

When to Use SFT

Best for:
  • Domain adaptation (legal, medical, code)
  • Custom response style or tone
  • Teaching specific formats or structures
  • When you have high-quality example data
Consider prompt optimization instead if:
  • You don’t have labeled training data
  • You want to iterate quickly without retraining
  • Your task is classification or QA with clear metrics

Prerequisites

# Required environment variables in .env
SYNTH_API_KEY=sk_...    # For authentication
Install the CLI:
pip install synth-ai
# or
uvx synth-ai --help

Step 1: Prepare Your Training Data

Create a JSONL file with conversation examples. Each line is a JSON object with a messages array:
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is 2+2?"}, {"role": "assistant", "content": "4"}]}
{"messages": [{"role": "user", "content": "Translate 'hello' to French"}, {"role": "assistant", "content": "Bonjour"}]}

Data Format Requirements

Each example must have:
  • At least one user message
  • At least one assistant message (this is what the model learns to generate)

Supported Message Roles

| Role | Description |
| --- | --- |
| `system` | Optional system prompt (first message only) |
| `user` | User input |
| `assistant` | Model response (training target) |
| `tool` | Tool/function response |

Example: Basic Conversation

{
  "messages": [
    {"role": "system", "content": "You are a customer service agent for Acme Corp."},
    {"role": "user", "content": "I need to return my order"},
    {"role": "assistant", "content": "I'd be happy to help with your return. Could you please provide your order number?"}
  ]
}

Example: With Tool Calls

{
  "messages": [
    {"role": "user", "content": "What's the weather in NYC?"},
    {"role": "assistant", "content": null, "tool_calls": [
      {"id": "call_1", "type": "function", "function": {"name": "get_weather", "arguments": "{\"city\": \"NYC\"}"}}
    ]},
    {"role": "tool", "tool_call_id": "call_1", "content": "Sunny, 72°F"},
    {"role": "assistant", "content": "It's currently sunny and 72°F in New York City."}
  ],
  "tools": [{"name": "get_weather", "description": "Get current weather", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}}]
}
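One detail worth calling out in the example above: following the common OpenAI-style convention, `arguments` is a JSON-encoded string, not a nested object. A short sketch of decoding it before use:

```python
import json

# The assistant's tool_call from the example above; note that
# "arguments" is a JSON-encoded *string*, not a nested object
tool_call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "get_weather", "arguments": "{\"city\": \"NYC\"}"},
}

# Decode the arguments string before passing values to your tool
args = json.loads(tool_call["function"]["arguments"])
```

If you build training data from logged tool interactions, make sure you serialize the arguments back to a string with `json.dumps` rather than embedding the raw object.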

Example: Vision/Multimodal

For vision models (e.g., Qwen3-VL), include images in user messages:
{
  "messages": [
    {"role": "user", "content": [
      {"type": "text", "text": "What's in this image?"},
      {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
    ]},
    {"role": "assistant", "content": "I see a golden retriever playing in a park."}
  ]
}
Images can be URLs or base64-encoded:
{"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0KGgo..."}}

Data Quality Tips

  1. Diverse examples: Cover the range of inputs your model will see
  2. Consistent format: Use the same response style across examples
  3. Quality over quantity: 100 excellent examples beat 10,000 mediocre ones
  4. Validation set: Hold out 10-20% for evaluation
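Tips 1-4 are easy to enforce mechanically before you upload anything. A hedged sketch of a format check plus a deterministic train/validation split (the helper names and toy examples are illustrative, not part of the synth-ai SDK):

```python
import random

def validate_example(ex: dict) -> None:
    """Check the format rules: at least one user and one assistant message."""
    roles = [m["role"] for m in ex["messages"]]
    assert "user" in roles, "missing user message"
    assert "assistant" in roles, "missing assistant message"
    if "system" in roles:
        assert roles[0] == "system", "system prompt must be first"

def train_val_split(examples: list, val_fraction: float = 0.1, seed: int = 42):
    """Shuffle deterministically and hold out a validation slice."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

# Ten toy examples stand in for a real dataset
examples = [{"messages": [{"role": "user", "content": f"q{i}"},
                          {"role": "assistant", "content": f"a{i}"}]}
            for i in range(10)]
for ex in examples:
    validate_example(ex)
train, val = train_val_split(examples)
```

Fixing the seed keeps the split reproducible across runs, so your validation loss stays comparable between experiments.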

Step 2: Create the Configuration

Create a TOML file for your training job:
[training]
algorithm = "sft"
model = "Qwen/Qwen2.5-7B-Instruct"

[training.data]
training_file = "data/train.jsonl"
validation_file = "data/val.jsonl"  # Optional but recommended

[training.hyperparameters]
num_train_epochs = 3
learning_rate = 2e-4
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
max_seq_length = 2048

[training.lora]
enabled = true
rank = 16
alpha = 32

[training.evaluation]
eval_steps = 500
early_stopping_patience = 3

Configuration Reference

Model Selection

| Model | Use Case | Notes |
| --- | --- | --- |
| Qwen/Qwen2.5-7B-Instruct | General purpose | Good balance of speed/quality |
| Qwen/Qwen2.5-14B-Instruct | Higher quality | Slower, more GPU memory |
| Qwen/Qwen3-VL-7B | Vision tasks | Supports image inputs |
| meta-llama/Llama-3.1-8B-Instruct | General purpose | Strong reasoning |

Hyperparameters

| Parameter | Default | Description |
| --- | --- | --- |
| num_train_epochs | 3 | Training passes over your data |
| learning_rate | 2e-4 | How fast the model updates (lower = more stable) |
| per_device_train_batch_size | 4 | Examples per GPU per step |
| gradient_accumulation_steps | 4 | Accumulate gradients before each update |
| max_seq_length | 2048 | Maximum tokens per example |
| warmup_ratio | 0.1 | Fraction of steps for learning rate warmup |
Effective batch size = per_device_train_batch_size × gradient_accumulation_steps = 16
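The same arithmetic also tells you roughly how many optimizer steps a run will take. A quick sanity check using the defaults above and a hypothetical dataset of 10,000 examples:

```python
import math

# Schedule arithmetic for the defaults above, assuming a hypothetical
# dataset of 10,000 examples
num_examples = 10_000
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
num_train_epochs = 3
warmup_ratio = 0.1

effective_batch = per_device_train_batch_size * gradient_accumulation_steps
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * num_train_epochs
warmup_steps = int(total_steps * warmup_ratio)
```

Knowing `total_steps` ahead of time helps you pick a sensible `eval_steps` value so the job evaluates more than once or twice per run.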

LoRA Settings

LoRA (Low-Rank Adaptation) fine-tunes efficiently by updating a small number of parameters:
| Parameter | Default | Description |
| --- | --- | --- |
| enabled | true | Use LoRA (recommended) |
| rank | 16 | Rank of adaptation matrices (higher = more capacity) |
| alpha | 32 | Scaling factor (typically 2× rank) |
| dropout | 0.1 | Regularization |
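To see why LoRA is efficient, you can count the trainable weights directly. For a weight matrix with input dimension `d_in` and output dimension `d_out`, LoRA trains two factors A (rank × d_in) and B (d_out × rank). A sketch with a hypothetical 4096×4096 projection, roughly the size of a 7B-class attention matrix:

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    # LoRA factors the weight update as B @ A, with A: (rank, d_in)
    # and B: (d_out, rank), so only rank * (d_in + d_out) weights train
    return rank * (d_in + d_out)

# Hypothetical 4096x4096 projection at the default rank of 16
full = 4096 * 4096
lora = lora_trainable_params(4096, 4096, 16)
fraction = lora / full   # well under 1% of the full matrix
scaling = 32 / 16        # alpha / rank = 2.0, the "typically 2x rank" rule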

Evaluation Settings

| Parameter | Default | Description |
| --- | --- | --- |
| eval_steps | 500 | Evaluate every N steps |
| early_stopping_patience | 3 | Stop if no improvement for N evals |
| save_best_model | true | Keep the best checkpoint |

Step 3: Launch the Training Job

Using the CLI

synth-ai train --type sft --config my_config.toml --poll
The --poll flag shows progress until completion:
[14:23:01]    0.0s  Status: queued
[14:23:15]   14.2s  Status: running | Step: 0 | Loss: 2.45
[14:23:45]   44.1s  Status: running | Step: 100 | Loss: 1.82
[14:24:15]   74.0s  Status: running | Step: 200 | Loss: 1.34
...
[14:45:30]  1349s   Status: succeeded | Final Loss: 0.89
Job completed! Fine-tuned model: ft:qwen2.5-7b:my-org:abc123

Using Python

from synth_ai.sdk.api.train.sft import SFTJob
import os

# Create job from config
job = SFTJob.from_config(
    config_path="my_config.toml",
    api_key=os.environ["SYNTH_API_KEY"]
)

# Submit and wait
job_id = job.submit()
print(f"Job started: {job_id}")

result = job.poll_until_complete(timeout=7200.0)  # 2 hour timeout
print(f"Fine-tuned model: {result.get('fine_tuned_model')}")

Resume a Job

If you need to check on a job later:
job = SFTJob.from_job_id(
    job_id="sft_abc123",
    api_key=os.environ["SYNTH_API_KEY"]
)

status = job.get_status()
print(f"Status: {status['status']}")
print(f"Progress: {status.get('current_step', 0)} / {status.get('total_steps', '?')}")

Step 4: Use Your Fine-Tuned Model

After training completes, you’ll receive a model ID like ft:qwen2.5-7b:my-org:abc123.

Via API

from openai import OpenAI

client = OpenAI(
    base_url="https://api.usesynth.ai/v1",
    api_key=os.environ["SYNTH_API_KEY"]
)

response = client.chat.completions.create(
    model="ft:qwen2.5-7b:my-org:abc123",  # Your fine-tuned model
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)

Supported Models

See Supported Models for the full list of trainable models.