SFT (Supervised Fine-Tuning) trains your model to clone expert behavior from demonstration data. This is especially powerful when combined with self-training: generate many candidate solutions, filter for successful ones, and fine-tune on the winners.
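
As a sketch of that loop, the filtering step might look like the following. The generate and is_correct callables are hypothetical placeholders for your own sampling and grading logic, and the chat-messages JSONL schema shown is a common convention rather than something this reference mandates.

import json

def build_self_training_set(problems, generate, is_correct, out_path, n_samples=8):
    """Sample candidate solutions, keep the ones that pass the check,
    and write them out as chat-style JSONL training examples."""
    kept = 0
    with open(out_path, "w") as f:
        for problem in problems:
            for _ in range(n_samples):
                candidate = generate(problem)            # query the current model
                if not is_correct(problem, candidate):   # task-specific success filter
                    continue
                record = {
                    "messages": [
                        {"role": "user", "content": problem},
                        {"role": "assistant", "content": candidate},
                    ]
                }
                f.write(json.dumps(record) + "\n")
                kept += 1
    return kept

The resulting file is what you point the training job's data field at; ReST-EM style self-training simply repeats this generate, filter, and fine-tune cycle with each newly trained model.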

When to Use

  • Cloning successful AI generations (ReST-EM style self-training)
  • Distilling from a larger model to a smaller one
  • Training on domain-specific data (code, medical, legal, etc.)
  • Teaching specific output formats or styles
  • Vision fine-tuning with image-text pairs

Full Config Reference

[algorithm]
type = "offline"              # Required: "offline" for SFT
method = "sft"                # Required: "sft" or "supervised_finetune"
variety = "fft"               # "fft" (full), "lora", or "qlora"

[job]
model = "Qwen/Qwen3-4B"       # Required: HuggingFace model identifier
data = "path/to/train.jsonl"  # Required: path to training data (or data_path)
poll_seconds = 30             # Polling interval for status updates

[compute]
gpu_type = "H100"             # Required: GPU type (H100, H200, A100, etc.)
gpu_count = 4                 # Required: number of GPUs
nodes = 1                     # Number of nodes for multi-node training

[compute.topology]
type = "single_node_split"    # Topology type
gpus_for_vllm = 0             # GPUs for inference (not used in SFT)
gpus_for_training = 4         # GPUs for training
gpus_for_ref = 0              # GPUs for reference model
tensor_parallel = 1           # Tensor parallelism degree

[policy]
model_name = "Qwen/Qwen3-4B"  # Model name (exactly one of model_name or source required)
# source = "ft:abc123"        # Alternative: checkpoint source
max_tokens = 512              # Max generation tokens
temperature = 0.7             # Sampling temperature
top_p = 0.95                  # Top-p sampling
top_k = 50                    # Top-k sampling (optional)
repetition_penalty = 1.0      # Repetition penalty
stop_sequences = ["<|end|>"]  # Stop sequences (optional)
trainer_mode = "full"         # Required: "full", "lora", or "qlora"
label = "my-model"            # Required: model identifier/name
inference_url = "..."         # Optional: for distributed inference

[data]
validation_path = "path/to/val.jsonl"  # Optional validation dataset

[data.topology]
container_count = 4           # Number of data containers
gpus_per_node = 4             # GPUs per node
total_gpus = 4                # Total GPUs
nodes = 1                     # Number of nodes

[training]
mode = "full_finetune"        # "full_finetune", "lora", or "sft_offline"
use_qlora = false             # Enable QLoRA (4-bit quantization)

[training.validation]
enabled = true                # Enable validation during training
evaluation_strategy = "steps" # "steps" or "epoch"
eval_steps = 100              # Evaluate every N steps
save_best_model_at_end = true # Save best checkpoint
metric_for_best_model = "val.loss"  # Metric to optimize
greater_is_better = false     # Lower is better for loss

[training.lora]
r = 8                         # LoRA rank
alpha = 16                    # LoRA alpha scaling
dropout = 0.1                 # LoRA dropout
target_modules = ["q_proj", "v_proj"]  # Modules to apply LoRA

[hyperparameters]
n_epochs = 1                  # Number of training epochs
batch_size = 8                # Total batch size (deprecated, use global_batch)
global_batch = 8              # Global batch size across all GPUs
per_device_batch = 1          # Batch size per GPU
gradient_accumulation_steps = 1  # Gradient accumulation steps
sequence_length = 1024        # Max sequence length
learning_rate = 5e-6          # Learning rate
warmup_ratio = 0.03           # Warmup ratio (fraction of total steps)
weight_decay = 0.01           # Weight decay for regularization
train_kind = "fft"            # "fft" (full) or "peft" (LoRA/QLoRA)

[hyperparameters.parallelism]
use_deepspeed = false         # Use DeepSpeed for training
deepspeed_stage = 2           # DeepSpeed ZeRO stage (1, 2, or 3)
fsdp = true                   # Use FSDP (Fully Sharded Data Parallel)
bf16 = true                   # Use bfloat16 precision
fp16 = false                  # Use float16 precision
activation_checkpointing = true  # Enable gradient checkpointing
tensor_parallel_size = 1      # Tensor parallelism degree
pipeline_parallel_size = 1    # Pipeline parallelism degree

[model_config]
supports_vision = true        # Enable vision model support
max_images_per_message = 1    # Max images per input (for vision models)

[tags]
purpose = "production"        # Custom metadata tags
team = "ml-platform"

Parameters

[algorithm] (Required)

Parameter | Type | Required | Description
type | string | ✓ | Must be "offline" for SFT
method | string | ✓ | "sft" or "supervised_finetune"
variety | string |  | "fft" (full fine-tune), "lora", or "qlora"

[job] (Required)

Parameter | Type | Required | Description
model | string | ✓ | HuggingFace model identifier (e.g., "Qwen/Qwen3-4B")
data | string | ✓ | Path to training data (JSONL format). Alternative: data_path
data_path | string |  | Alternative to data
poll_seconds | int |  | Polling interval for status updates (default: 30)

[compute] (Required)

Parameter | Type | Required | Description
gpu_type | string | ✓ | GPU type: "H100", "H200", "A100", etc.
gpu_count | int | ✓ | Number of GPUs
nodes | int |  | Number of nodes (default: 1)

[compute.topology]

Parameter | Type | Description
type | string | Topology type (e.g., "single_node_split")
gpus_for_vllm | int | GPUs for inference
gpus_for_training | int | GPUs for training
gpus_for_ref | int | GPUs for reference model
tensor_parallel | int | Tensor parallelism degree
reference_placement | string | Reference model placement: "none", "shared", "dedicated"

[policy]

Parameter | Type | Required | Description
model_name | string | ✓* | Model name (exactly one of model_name or source required)
source | string | ✓* | Checkpoint source (e.g., "ft:abc123")
max_tokens | int |  | Max generation tokens (default: 512)
temperature | float |  | Sampling temperature (default: 0.7)
top_p | float |  | Top-p sampling (default: 0.95)
top_k | int |  | Top-k sampling
repetition_penalty | float |  | Repetition penalty (default: 1.0)
stop_sequences | list |  | Stop sequences
trainer_mode | string | ✓ | Training mode: "full", "lora", or "qlora"
label | string | ✓ | Model identifier/name
inference_url | string |  | URL for distributed inference

[data]

Parameter | Type | Description
validation_path | string | Path to validation dataset (JSONL)

[data.topology]

Parameter | Type | Description
container_count | int | Number of data containers
gpus_per_node | int | GPUs per node
total_gpus | int | Total GPUs
nodes | int | Number of nodes

[training]

Parameter | Type | Description
mode | string | Training mode: "full_finetune", "lora", or "sft_offline"
use_qlora | bool | Enable QLoRA (4-bit quantization)

[training.validation]

Parameter | Type | Description
enabled | bool | Enable validation during training
evaluation_strategy | string | "steps" or "epoch"
eval_steps | int | Evaluate every N steps
save_best_model_at_end | bool | Save best checkpoint
metric_for_best_model | string | Metric to optimize (e.g., "val.loss")
greater_is_better | bool | Whether higher metric is better

[training.lora]

Parameter | Type | Description
r | int | LoRA rank
alpha | int | LoRA alpha scaling factor
dropout | float | LoRA dropout rate
target_modules | list | Modules to apply LoRA to (e.g., ["q_proj", "v_proj"])
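
As rough intuition for choosing r and target_modules: each adapted weight matrix of shape (d_out, d_in) gains r * (d_in + d_out) trainable parameters, so the adapter stays tiny compared to the base model. A back-of-the-envelope sketch (the layer shapes below are illustrative, not read from any particular model):

def lora_param_count(shapes, r=8):
    """Trainable LoRA parameters: a (d_out x r) B matrix plus an (r x d_in) A matrix per adapted weight."""
    return sum(r * (d_in + d_out) for (d_out, d_in) in shapes)

# e.g. q_proj and v_proj across 32 layers with hidden size 4096 (illustrative numbers)
shapes = [(4096, 4096)] * 2 * 32
print(lora_param_count(shapes, r=8))  # 4,194,304 trainable parameters (~4.2M)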

[hyperparameters]

Parameter | Type | Default | Description
n_epochs | int | 1 | Number of training epochs
batch_size | int |  | Total batch size (deprecated; use global_batch)
global_batch | int |  | Global batch size across all GPUs
per_device_batch | int |  | Batch size per GPU
gradient_accumulation_steps | int |  | Gradient accumulation steps
sequence_length | int |  | Max sequence length
learning_rate | float |  | Learning rate
warmup_ratio | float |  | Warmup ratio (fraction of total steps)
weight_decay | float |  | Weight decay for regularization
train_kind | string |  | "fft" (full) or "peft" (LoRA/QLoRA)
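
The batch-size fields are conventionally related by global_batch = per_device_batch * gradient_accumulation_steps * (number of data-parallel GPUs). If you set all three explicitly, keeping them consistent avoids surprises; the helper below is a sketch of that arithmetic, not part of the SDK, and the numbers are illustrative.

def effective_global_batch(per_device_batch: int, grad_accum_steps: int, dp_gpus: int) -> int:
    """Samples contributing to each optimizer step under plain data parallelism."""
    return per_device_batch * grad_accum_steps * dp_gpus

# Illustrative: 2 samples per GPU * 1 accumulation step * 4 data-parallel GPUs = 8
assert effective_global_batch(2, 1, 4) == 8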

[hyperparameters.parallelism]

Parameter | Type | Description
use_deepspeed | bool | Use DeepSpeed for training
deepspeed_stage | int | DeepSpeed ZeRO stage (1, 2, or 3)
fsdp | bool | Use FSDP (Fully Sharded Data Parallel)
bf16 | bool | Use bfloat16 precision
fp16 | bool | Use float16 precision
activation_checkpointing | bool | Enable gradient checkpointing
tensor_parallel_size | int | Tensor parallelism degree
pipeline_parallel_size | int | Pipeline parallelism degree

[model_config]

Parameter | Type | Description
supports_vision | bool | Enable vision model support
max_images_per_message | int | Max images per input message

[tags]

Arbitrary key-value pairs for metadata and tracking.

Returns

A completed SFT job returns a fine-tuned model ID, which you can read from the result payload or via a helper method:

from synth_ai.sdk.api.train.sft import SFTJob

job = SFTJob.from_config("sft.toml")
job.submit()
result = job.poll_until_complete()

# Get model ID
model_id = result.get("fine_tuned_model")
# e.g., "ft:Qwen/Qwen3-0.6B:job_658ba4f3a93845aa"

# Or via method
model_id = job.get_fine_tuned_model()

Model ID Format

ft:<base_model>:<job_id>
Examples:
  • ft:Qwen/Qwen3-0.6B:job_658ba4f3a93845aa
  • peft:Qwen/Qwen3-4B:job_abc123def456 (LoRA)
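
Because the ID is colon-delimited, it is easy to pull apart when you need the base model or job ID separately. The helper below is a convenience sketch, not part of the SDK:

def parse_model_id(model_id: str) -> tuple[str, str, str]:
    """Split 'ft:<base_model>:<job_id>' (or 'peft:...') into its three parts."""
    kind, base_model, job_id = model_id.split(":", 2)
    return kind, base_model, job_id

print(parse_model_id("ft:Qwen/Qwen3-0.6B:job_658ba4f3a93845aa"))
# ('ft', 'Qwen/Qwen3-0.6B', 'job_658ba4f3a93845aa')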

Using Your Model

Dev Inference (testing):
import asyncio
import os

from synth_ai.sdk import InferenceClient


async def main() -> None:
    client = InferenceClient(
        base_url="https://agent-learning.onrender.com",
        api_key=os.environ["SYNTH_API_KEY"],
    )

    response = await client.create_chat_completion(
        model="ft:Qwen/Qwen3-0.6B:job_658ba4f3a93845aa",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response)


asyncio.run(main())
Export to HuggingFace:
uvx synth-ai artifacts export ft:Qwen/Qwen3-0.6B:job_658ba4f3a93845aa \
  --repo-id myorg/my-fine-tuned-model \
  --private
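
After export, the repository should load with standard Hugging Face tooling. This is a sketch that assumes the export is a full Transformers checkpoint (a LoRA export may instead be a PEFT adapter that must be applied to the base model); the repo ID is the one from the example command above.

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("myorg/my-fine-tuned-model")
model = AutoModelForCausalLM.from_pretrained("myorg/my-fine-tuned-model")

inputs = tokenizer("Hello!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))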

List Your Models

uvx synth-ai status models --type sft