- `training.use_qlora = false` (default) and typical FFT hyperparameters.
- Invoke via: `uvx synth-ai train --type sft --config <path>`
- Uses the same client/payload path as LoRA; differs only in training mode/toggles and typical hyperparameters/parallelism.
FFT vs LoRA/QLoRA
- FFT (full finetune) updates all weights. Best final quality, higher VRAM/compute.
- LoRA updates adapters on top of frozen weights; faster/cheaper, smaller artifacts.
- QLoRA trains LoRA adapters on top of a 4-bit-quantized base model; further reduces memory with some quality/latency tradeoff.
- Switch by toggling `training.use_qlora` and (optionally) `training.mode` (see the TOML sketch after this list):
  - FFT: `training.use_qlora = false`, `hyperparameters.train_kind = "fft"`
  - LoRA: `training.use_qlora = false`, `training.mode = "lora"`, `hyperparameters.train_kind = "peft"`
  - QLoRA: `training.use_qlora = true`, `training.mode = "lora"`, `hyperparameters.train_kind = "peft"`
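A minimal sketch of the three toggle combinations, using only the keys listed on this page; keep one variant and comment out the others, since TOML does not allow repeated tables:

```toml
# FFT (default): update all weights
[training]
use_qlora = false

[hyperparameters]
train_kind = "fft"

# LoRA: swap in these values instead
# [training]
# mode = "lora"
# use_qlora = false
# [hyperparameters]
# train_kind = "peft"

# QLoRA: LoRA adapters on a 4-bit-quantized base
# [training]
# mode = "lora"
# use_qlora = true
# [hyperparameters]
# train_kind = "peft"
```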
Quickstart
Minimal TOML (FFT)
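A minimal FFT sketch, assuming the sections and keys documented under "All sections and parameters (FFT)" below; the model id, paths, and numeric values are illustrative:

```toml
[job]
model = "Qwen/Qwen2.5-7B-Instruct"   # illustrative base model id
data = "datasets/train.jsonl"        # or data_path; overridable with --dataset

[compute]
gpu_type = "H100"                    # required by the backend
gpu_count = 1

[training]
use_qlora = false                    # FFT: update all weights

[hyperparameters]
n_epochs = 1
train_kind = "fft"
learning_rate = 1e-5
gradient_accumulation_steps = 8
sequence_length = 4096
```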
What the client validates and sends
- Validates dataset path existence and JSONL records
- Uploads files to `/api/learning/files`, then creates/starts the job under `/api/learning/jobs`
- Payload mapping is identical to LoRA SFT: hyperparameters + `metadata.effective_config` (compute, data.topology, training)
Multi‑GPU guidance (FFT)
- Use `[compute]` for cluster shape
- Prefer `[hyperparameters.parallelism]` for DeepSpeed stage, FSDP, precision, TP/PP sizes; forwarded verbatim (see the sketch after this list)
- `[data.topology]` is optional and informational; the backend/trainer validates actual resource consistency
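A single-node multi-GPU sketch, assuming the `[compute]` and `[hyperparameters.parallelism]` keys listed below; the stage and precision choices are illustrative, not defaults:

```toml
[compute]
gpu_type = "H100"          # assumed example value
gpu_count = 4
nodes = 1

[hyperparameters.parallelism]
use_deepspeed = true
deepspeed_stage = 3        # ZeRO-3 for larger FFT runs; 2 for smaller models
fsdp = false
bf16 = true
fp16 = false
tensor_parallel_size = 1
pipeline_parallel_size = 1
```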
GPU options
- Single-GPU: A10G/L40S/H100 for small to mid models (≤7B). Increase `gradient_accumulation_steps`.
- Multi-GPU single-node: 2x/4x H100 for 14B–32B FFT. Use ZeRO-2/3 and optionally FSDP.
- Multi-node: H100 with RDMA for very large FFT or MoE. Provide `nodes > 1` and topology in `[hyperparameters.parallelism]` (see the sketch after this list).
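A multi-node sketch, assuming `nodes` in `[compute]` and the parallelism keys listed below; sizes are illustrative:

```toml
[compute]
gpu_type = "H100"
gpu_count = 8              # whether this is per node or total is backend-defined
nodes = 2

[hyperparameters.parallelism]
use_deepspeed = true
deepspeed_stage = 3
bf16 = true
tensor_parallel_size = 2
pipeline_parallel_size = 1
```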
Common issues
- HTTP 400 `missing_gpu_type`: add `[compute].gpu_type` (see the snippet after this list)
- Dataset not found: specify an absolute path or use `--dataset` (paths are resolved from the current working directory)
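For the `missing_gpu_type` case, the fix is to set the GPU type in `[compute]`; the accepted strings are backend-defined, and `"H100"` here is an assumed example:

```toml
[compute]
gpu_type = "H100"   # use a type your backend accepts
gpu_count = 1
```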
Helpful CLI flags
- `--examples N` to subset data for a quick smoke test
- `--dry-run` to preview the payload before submitting
All sections and parameters (FFT)
- `[job]` (client reads)
  - `model` (string, required): base model identifier
  - `data` or `data_path` (string): training JSONL (required unless `--dataset` is provided)
- `[compute]` (forwarded into `metadata.effective_config.compute`)
  - `gpu_type` (string): required by backend
  - `gpu_count` (int)
  - `nodes` (int, optional)
- `[data]` / `[data.topology]`
  - `topology` (table): forwarded into `metadata.effective_config.data.topology`
  - `validation_path` (string, optional): if present and the file exists, it is uploaded to enable validation
- `[training]`
  - `mode` (string, optional): copied to metadata for visibility
  - `use_qlora` (bool, default false)
  - `[training.validation]` keys promoted into hyperparameters:
    - `enabled` (bool, default true): surfaced into `metadata.effective_config.training.validation.enabled`
    - `evaluation_strategy` (string, default "steps")
    - `eval_steps` (int, default 0)
    - `save_best_model_at_end` (bool, default true)
    - `metric_for_best_model` (string, default "val.loss")
    - `greater_is_better` (bool, default false)
- `[hyperparameters]`
  - `n_epochs` (int, default 1)
  - Optional: `batch_size`, `global_batch`, `per_device_batch`, `gradient_accumulation_steps`, `sequence_length`, `learning_rate`, `warmup_ratio`, `train_kind`
  - `[hyperparameters.parallelism]` forwarded verbatim: `use_deepspeed`, `deepspeed_stage`, `fsdp`, `bf16`, `fp16`, `tensor_parallel_size`, `pipeline_parallel_size`
- `[algorithm]` (ignored by client): sometimes used in examples for documentation only
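A fuller sketch combining the sections above, including `[data]` validation and `[training.validation]`; all values are illustrative:

```toml
[job]
model = "Qwen/Qwen2.5-7B-Instruct"       # illustrative base model id
data = "datasets/train.jsonl"

[compute]
gpu_type = "H100"
gpu_count = 2
nodes = 1

[data]
validation_path = "datasets/val.jsonl"   # uploaded if the file exists

[training]
use_qlora = false

[training.validation]
enabled = true
evaluation_strategy = "steps"
eval_steps = 100
save_best_model_at_end = true
metric_for_best_model = "val.loss"
greater_is_better = false

[hyperparameters]
n_epochs = 1
train_kind = "fft"
per_device_batch = 1
gradient_accumulation_steps = 16
sequence_length = 4096
learning_rate = 1e-5
warmup_ratio = 0.03

[hyperparameters.parallelism]
use_deepspeed = true
deepspeed_stage = 2
bf16 = true
```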
Validation behavior
- Dataset path must exist; otherwise the CLI prompts/aborts
- Dataset JSONL is checked for `messages` structure
- Backend requires `compute.gpu_type`; a missing value yields HTTP 400 at job creation
Payload mapping
- `model` from `[job].model`
- `training_type = "sft_offline"`
- `hyperparameters` from `[hyperparameters]` plus selected `[training.validation]` keys
- `metadata.effective_config.compute` from `[compute]`
- `metadata.effective_config.data.topology` from `[data.topology]`
- `metadata.effective_config.training.{mode,use_qlora}` from `[training]`