SFT (Supervised Fine-Tuning) trains your model to clone expert behavior from demonstration data. This is especially powerful when combined with self-training: generate many candidate solutions, filter for successful ones, and fine-tune on the winners.
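
As a sketch of that loop, the filtering step might look like the following. The generate and is_correct callables are hypothetical placeholders for your own sampling and grading logic, and the chat-messages JSONL schema shown is a common convention rather than something this reference mandates.

import json

def build_self_training_set(problems, generate, is_correct, out_path, n_samples=8):
    """Sample candidate solutions, keep the ones that pass the check,
    and write them out as chat-style JSONL training examples."""
    kept = 0
    with open(out_path, "w") as f:
        for problem in problems:
            for _ in range(n_samples):
                candidate = generate(problem)            # query the current model
                if not is_correct(problem, candidate):   # task-specific success filter
                    continue
                record = {
                    "messages": [
                        {"role": "user", "content": problem},
                        {"role": "assistant", "content": candidate},
                    ]
                }
                f.write(json.dumps(record) + "\n")
                kept += 1
    return kept

The resulting file is what you point the training job's data field at; ReST-EM style self-training simply repeats this generate, filter, and fine-tune cycle with each newly trained model.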

When to Use

  • Cloning successful AI generations (ReST-EM style self-training)
  • Distilling from a larger model to a smaller one
  • Training on domain-specific data (code, medical, legal, etc.)
  • Teaching specific output formats or styles
  • Vision fine-tuning with image-text pairs

Full Config Reference

[algorithm]
type = "offline"              # Required: "offline" for SFT
method = "sft"                # Required: "sft" or "supervised_finetune"
variety = "fft"               # "fft" (full), "lora", or "qlora"

[job]
model = "Qwen/Qwen3-4B"       # Required: HuggingFace model identifier
data = "path/to/train.jsonl"  # Required: path to training data (or data_path)
poll_seconds = 30             # Polling interval for status updates

[compute]
gpu_type = "H100"             # Required: GPU type (H100, H200, A100, etc.)
gpu_count = 4                 # Required: number of GPUs
nodes = 1                     # Number of nodes for multi-node training

[compute.topology]
type = "single_node_split"    # Topology type
gpus_for_vllm = 0             # GPUs for inference (not used in SFT)
gpus_for_training = 4         # GPUs for training
gpus_for_ref = 0              # GPUs for reference model
tensor_parallel = 1           # Tensor parallelism degree

[policy]
model_name = "Qwen/Qwen3-4B"  # Model name (exactly one of model_name or source required)
# source = "ft:abc123"        # Alternative: checkpoint source
max_tokens = 512              # Max generation tokens
temperature = 0.7             # Sampling temperature
top_p = 0.95                  # Top-p sampling
top_k = 50                    # Top-k sampling (optional)
repetition_penalty = 1.0      # Repetition penalty
stop_sequences = ["<|end|>"]  # Stop sequences (optional)
trainer_mode = "full"         # Required: "full", "lora", or "qlora"
label = "my-model"            # Required: model identifier/name
inference_url = "..."         # Optional: for distributed inference

[data]
validation_path = "path/to/val.jsonl"  # Optional validation dataset

[data.topology]
container_count = 4           # Number of data containers
gpus_per_node = 4             # GPUs per node
total_gpus = 4                # Total GPUs
nodes = 1                     # Number of nodes

[training]
mode = "full_finetune"        # "full_finetune", "lora", or "sft_offline"
use_qlora = false             # Enable QLoRA (4-bit quantization)

[training.validation]
enabled = true                # Enable validation during training
evaluation_strategy = "steps" # "steps" or "epoch"
eval_steps = 100              # Evaluate every N steps
save_best_model_at_end = true # Save best checkpoint
metric_for_best_model = "val.loss"  # Metric to optimize
greater_is_better = false     # Lower is better for loss

[training.lora]
r = 8                         # LoRA rank
alpha = 16                    # LoRA alpha scaling
dropout = 0.1                 # LoRA dropout
target_modules = ["q_proj", "v_proj"]  # Modules to apply LoRA

[hyperparameters]
n_epochs = 1                  # Number of training epochs
batch_size = 8                # Total batch size (deprecated, use global_batch)
global_batch = 8              # Global batch size across all GPUs
per_device_batch = 1          # Batch size per GPU
gradient_accumulation_steps = 1  # Gradient accumulation steps
sequence_length = 1024        # Max sequence length
learning_rate = 5e-6          # Learning rate
warmup_ratio = 0.03           # Warmup ratio (fraction of total steps)
weight_decay = 0.01           # Weight decay for regularization
train_kind = "fft"            # "fft" (full) or "peft" (LoRA/QLoRA)

[hyperparameters.parallelism]
use_deepspeed = false         # Use DeepSpeed for training
deepspeed_stage = 2           # DeepSpeed ZeRO stage (1, 2, or 3)
fsdp = true                   # Use FSDP (Fully Sharded Data Parallel)
bf16 = true                   # Use bfloat16 precision
fp16 = false                  # Use float16 precision
activation_checkpointing = true  # Enable gradient checkpointing
tensor_parallel_size = 1      # Tensor parallelism degree
pipeline_parallel_size = 1    # Pipeline parallelism degree

[model_config]
supports_vision = true        # Enable vision model support
max_images_per_message = 1    # Max images per input (for vision models)

[tags]
purpose = "production"        # Custom metadata tags
team = "ml-platform"

Parameters

[algorithm] (Required)

Parameter | Type | Required | Description
type | string | ✓ | Must be "offline" for SFT
method | string | ✓ | "sft" or "supervised_finetune"
variety | string |  | "fft" (full fine-tune), "lora", or "qlora"

[job] (Required)

Parameter | Type | Required | Description
model | string | ✓ | HuggingFace model identifier (e.g., "Qwen/Qwen3-4B")
data | string | ✓ | Path to training data (JSONL format). Alternative: data_path
data_path | string |  | Alternative to data
poll_seconds | int |  | Polling interval for status updates (default: 30)

[compute] (Required)

Parameter | Type | Required | Description
gpu_type | string | ✓ | GPU type: "H100", "H200", "A100", etc.
gpu_count | int | ✓ | Number of GPUs
nodes | int |  | Number of nodes (default: 1)

[compute.topology]

Parameter | Type | Description
type | string | Topology type (e.g., "single_node_split")
gpus_for_vllm | int | GPUs for inference
gpus_for_training | int | GPUs for training
gpus_for_ref | int | GPUs for reference model
tensor_parallel | int | Tensor parallelism degree
reference_placement | string | Reference model placement: "none", "shared", "dedicated"

[policy]

Parameter | Type | Required | Description
model_name | string | ✓* | Model name (exactly one of model_name or source required)
source | string | ✓* | Checkpoint source (e.g., "ft:abc123")
max_tokens | int |  | Max generation tokens (default: 512)
temperature | float |  | Sampling temperature (default: 0.7)
top_p | float |  | Top-p sampling (default: 0.95)
top_k | int |  | Top-k sampling
repetition_penalty | float |  | Repetition penalty (default: 1.0)
stop_sequences | list |  | Stop sequences
trainer_mode | string | ✓ | Training mode: "full", "lora", or "qlora"
label | string | ✓ | Model identifier/name
inference_url | string |  | URL for distributed inference

[data]

Parameter | Type | Description
validation_path | string | Path to validation dataset (JSONL)

[data.topology]

Parameter | Type | Description
container_count | int | Number of data containers
gpus_per_node | int | GPUs per node
total_gpus | int | Total GPUs
nodes | int | Number of nodes

[training]

Parameter | Type | Description
mode | string | Training mode: "full_finetune", "lora", or "sft_offline"
use_qlora | bool | Enable QLoRA (4-bit quantization)

[training.validation]

Parameter | Type | Description
enabled | bool | Enable validation during training
evaluation_strategy | string | "steps" or "epoch"
eval_steps | int | Evaluate every N steps
save_best_model_at_end | bool | Save best checkpoint
metric_for_best_model | string | Metric to optimize (e.g., "val.loss")
greater_is_better | bool | Whether higher metric is better

[training.lora]

Parameter | Type | Description
r | int | LoRA rank
alpha | int | LoRA alpha scaling factor
dropout | float | LoRA dropout rate
target_modules | list | Modules to apply LoRA to (e.g., ["q_proj", "v_proj"])
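
As rough intuition for choosing r and target_modules: each adapted weight matrix of shape (d_out, d_in) gains r * (d_in + d_out) trainable parameters, so the adapter stays tiny compared to the base model. A back-of-the-envelope sketch (the layer shapes below are illustrative, not read from any particular model):

def lora_param_count(shapes, r=8):
    """Trainable LoRA parameters: a (d_out x r) B matrix plus an (r x d_in) A matrix per adapted weight."""
    return sum(r * (d_in + d_out) for (d_out, d_in) in shapes)

# e.g. q_proj and v_proj across 32 layers with hidden size 4096 (illustrative numbers)
shapes = [(4096, 4096)] * 2 * 32
print(lora_param_count(shapes, r=8))  # 4,194,304 trainable parameters (~4.2M)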

[hyperparameters]

Parameter | Type | Default | Description
n_epochs | int | 1 | Number of training epochs
batch_size | int |  | Total batch size (deprecated; use global_batch)
global_batch | int |  | Global batch size across all GPUs
per_device_batch | int |  | Batch size per GPU
gradient_accumulation_steps | int |  | Gradient accumulation steps
sequence_length | int |  | Max sequence length
learning_rate | float |  | Learning rate
warmup_ratio | float |  | Warmup ratio (fraction of total steps)
weight_decay | float |  | Weight decay for regularization
train_kind | string |  | "fft" (full) or "peft" (LoRA/QLoRA)
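
The batch-size fields are conventionally related by global_batch = per_device_batch * gradient_accumulation_steps * (number of data-parallel GPUs). If you set all three explicitly, keeping them consistent avoids surprises; the helper below is a sketch of that arithmetic, not part of the SDK, and the numbers are illustrative.

def effective_global_batch(per_device_batch: int, grad_accum_steps: int, dp_gpus: int) -> int:
    """Samples contributing to each optimizer step under plain data parallelism."""
    return per_device_batch * grad_accum_steps * dp_gpus

# Illustrative: 2 samples per GPU * 1 accumulation step * 4 data-parallel GPUs = 8
assert effective_global_batch(2, 1, 4) == 8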

[hyperparameters.parallelism]

Parameter | Type | Description
use_deepspeed | bool | Use DeepSpeed for training
deepspeed_stage | int | DeepSpeed ZeRO stage (1, 2, or 3)
fsdp | bool | Use FSDP (Fully Sharded Data Parallel)
bf16 | bool | Use bfloat16 precision
fp16 | bool | Use float16 precision
activation_checkpointing | bool | Enable gradient checkpointing
tensor_parallel_size | int | Tensor parallelism degree
pipeline_parallel_size | int | Pipeline parallelism degree

[model_config]

Parameter | Type | Description
supports_vision | bool | Enable vision model support
max_images_per_message | int | Max images per input message

[tags]

Arbitrary key-value pairs for metadata and tracking.

Returns

A completed SFT job returns a fine-tuned model ID, which you can read from the result payload or via a helper method:

from synth_ai.sdk.api.train.sft import SFTJob

job = SFTJob.from_config("sft.toml")
job.submit()
result = job.poll_until_complete()

# Get model ID
model_id = result.get("fine_tuned_model")
# e.g., "ft:Qwen/Qwen3-0.6B:job_658ba4f3a93845aa"

# Or via method
model_id = job.get_fine_tuned_model()

Model ID Format

ft:<base_model>:<job_id>
Examples:
  • ft:Qwen/Qwen3-0.6B:job_658ba4f3a93845aa
  • peft:Qwen/Qwen3-4B:job_abc123def456 (LoRA)
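
Because the ID is colon-delimited, it is easy to pull apart when you need the base model or job ID separately. The helper below is a convenience sketch, not part of the SDK:

def parse_model_id(model_id: str) -> tuple[str, str, str]:
    """Split 'ft:<base_model>:<job_id>' (or 'peft:...') into its three parts."""
    kind, base_model, job_id = model_id.split(":", 2)
    return kind, base_model, job_id

print(parse_model_id("ft:Qwen/Qwen3-0.6B:job_658ba4f3a93845aa"))
# ('ft', 'Qwen/Qwen3-0.6B', 'job_658ba4f3a93845aa')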

Using Your Model

Dev Inference (testing):
import asyncio
import os

from synth_ai.sdk import InferenceClient


async def main() -> None:
    client = InferenceClient(
        base_url="https://agent-learning.onrender.com",
        api_key=os.environ["SYNTH_API_KEY"],
    )

    response = await client.create_chat_completion(
        model="ft:Qwen/Qwen3-0.6B:job_658ba4f3a93845aa",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response)


asyncio.run(main())
Export to HuggingFace:
uvx synth-ai artifacts export ft:Qwen/Qwen3-0.6B:job_658ba4f3a93845aa \
  --repo-id myorg/my-fine-tuned-model \
  --private
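
After export, the repository should load with standard Hugging Face tooling. This is a sketch that assumes the export is a full Transformers checkpoint (a LoRA export may instead be a PEFT adapter that must be applied to the base model); the repo ID is the one from the example command above.

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("myorg/my-fine-tuned-model")
model = AutoModelForCausalLM.from_pretrained("myorg/my-fine-tuned-model")

inputs = tokenizer("Hello!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))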

List Your Models

uvx synth-ai status models --type sft