- `training.use_qlora = false` (default) and typical FFT hyperparameters.
- Invoke via: `uvx synth-ai train --type sft --config <path>`
- Uses the same client/payload path as LoRA; differs only in training mode/toggles and typical hyperparameters/parallelism.
FFT vs LoRA/QLoRA
- FFT (full finetune) updates all weights. Best final quality, higher VRAM/compute.
- LoRA updates adapters on top of frozen weights; faster/cheaper, smaller artifacts.
- QLoRA trains LoRA adapters on top of a 4-bit-quantized base model; further reduces memory with some quality/latency tradeoff.
- Switch by toggling `training.use_qlora` and (optionally) `training.mode` (see the TOML sketch after this list):
  - FFT: `training.use_qlora = false`, `hyperparameters.train_kind = "fft"`
  - LoRA: `training.use_qlora = false`, `training.mode = "lora"`, `hyperparameters.train_kind = "peft"`
  - QLoRA: `training.use_qlora = true`, `training.mode = "lora"`, `hyperparameters.train_kind = "peft"`
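A minimal sketch of the three toggle combinations, using only the keys listed on this page; keep one variant and comment out the others, since TOML does not allow repeated tables:

```toml
# FFT (default): update all weights
[training]
use_qlora = false

[hyperparameters]
train_kind = "fft"

# LoRA: swap in these values instead
# [training]
# mode = "lora"
# use_qlora = false
# [hyperparameters]
# train_kind = "peft"

# QLoRA: LoRA adapters on a 4-bit-quantized base
# [training]
# mode = "lora"
# use_qlora = true
# [hyperparameters]
# train_kind = "peft"
```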
Quickstart
Minimal TOML (FFT)
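A minimal FFT sketch, assuming the sections and keys documented under "All sections and parameters (FFT)" below; the model id, paths, and numeric values are illustrative:

```toml
[job]
model = "Qwen/Qwen2.5-7B-Instruct"   # illustrative base model id
data = "datasets/train.jsonl"        # or data_path; overridable with --dataset

[compute]
gpu_type = "H100"                    # required by the backend
gpu_count = 1

[training]
use_qlora = false                    # FFT: update all weights

[hyperparameters]
n_epochs = 1
train_kind = "fft"
learning_rate = 1e-5
gradient_accumulation_steps = 8
sequence_length = 4096
```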
What the client validates and sends
- Validates dataset path existence and JSONL records
- Uploads files to `/api/learning/files`, then creates/starts the job under `/api/learning/jobs`
- Payload mapping is identical to LoRA SFT: hyperparameters + `metadata.effective_config` (compute, data.topology, training)
Multi‑GPU guidance (FFT)
- Use `[compute]` for cluster shape
- Prefer `[hyperparameters.parallelism]` for DeepSpeed stage, FSDP, precision, TP/PP sizes; forwarded verbatim (see the sketch after this list)
- `[data.topology]` is optional and informational; the backend/trainer validates actual resource consistency
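A single-node multi-GPU sketch, assuming the `[compute]` and `[hyperparameters.parallelism]` keys listed below; the stage and precision choices are illustrative, not defaults:

```toml
[compute]
gpu_type = "H100"          # assumed example value
gpu_count = 4
nodes = 1

[hyperparameters.parallelism]
use_deepspeed = true
deepspeed_stage = 3        # ZeRO-3 for larger FFT runs; 2 for smaller models
fsdp = false
bf16 = true
fp16 = false
tensor_parallel_size = 1
pipeline_parallel_size = 1
```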
GPU options
- Single-GPU: A10G/L40S/H100 for small to mid models (≤7B). Increase `gradient_accumulation_steps`.
- Multi-GPU single-node: 2x/4x H100 for 14B–32B FFT. Use ZeRO-2/3 and optionally FSDP.
- Multi-node: H100 with RDMA for very large FFT or MoE. Provide `nodes > 1` and topology in `[hyperparameters.parallelism]` (see the sketch after this list).
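A multi-node sketch, assuming `nodes` in `[compute]` and the parallelism keys listed below; sizes are illustrative:

```toml
[compute]
gpu_type = "H100"
gpu_count = 8              # whether this is per node or total is backend-defined
nodes = 2

[hyperparameters.parallelism]
use_deepspeed = true
deepspeed_stage = 3
bf16 = true
tensor_parallel_size = 2
pipeline_parallel_size = 1
```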
Common issues
- HTTP 400 `missing_gpu_type`: add `[compute].gpu_type` (see the snippet after this list)
- Dataset not found: specify an absolute path or use `--dataset` (paths are resolved from the current working directory)
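For the `missing_gpu_type` case, the fix is to set the GPU type in `[compute]`; the accepted strings are backend-defined, and `"H100"` here is an assumed example:

```toml
[compute]
gpu_type = "H100"   # use a type your backend accepts
gpu_count = 1
```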
Helpful CLI flags
- `--examples N` to subset data for a quick smoke test
- `--dry-run` to preview the payload before submitting
All sections and parameters (FFT)
- `[job]` (client reads)
  - `model` (string, required): base model identifier
  - `data` or `data_path` (string): training JSONL (required unless `--dataset` is provided)
- `[compute]` (forwarded into `metadata.effective_config.compute`)
  - `gpu_type` (string): required by backend
  - `gpu_count` (int)
  - `nodes` (int, optional)
- `[data]` / `[data.topology]`
  - `topology` (table): forwarded into `metadata.effective_config.data.topology`
  - `validation_path` (string, optional): if present and the file exists, it is uploaded to enable validation
- `[training]`
  - `mode` (string, optional): copied to metadata for visibility
  - `use_qlora` (bool, default false)
  - `[training.validation]` keys promoted into hyperparameters:
    - `enabled` (bool, default true): surfaced into `metadata.effective_config.training.validation.enabled`
    - `evaluation_strategy` (string, default "steps")
    - `eval_steps` (int, default 0)
    - `save_best_model_at_end` (bool, default true)
    - `metric_for_best_model` (string, default "val.loss")
    - `greater_is_better` (bool, default false)
- `[hyperparameters]`
  - `n_epochs` (int, default 1)
  - Optional: `batch_size`, `global_batch`, `per_device_batch`, `gradient_accumulation_steps`, `sequence_length`, `learning_rate`, `warmup_ratio`, `train_kind`
  - `[hyperparameters.parallelism]` forwarded verbatim: `use_deepspeed`, `deepspeed_stage`, `fsdp`, `bf16`, `fp16`, `tensor_parallel_size`, `pipeline_parallel_size`
- `[algorithm]` (ignored by client): sometimes used in examples for documentation only
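A fuller sketch combining the sections above, including `[data]` validation and `[training.validation]`; all values are illustrative:

```toml
[job]
model = "Qwen/Qwen2.5-7B-Instruct"       # illustrative base model id
data = "datasets/train.jsonl"

[compute]
gpu_type = "H100"
gpu_count = 2
nodes = 1

[data]
validation_path = "datasets/val.jsonl"   # uploaded if the file exists

[training]
use_qlora = false

[training.validation]
enabled = true
evaluation_strategy = "steps"
eval_steps = 100
save_best_model_at_end = true
metric_for_best_model = "val.loss"
greater_is_better = false

[hyperparameters]
n_epochs = 1
train_kind = "fft"
per_device_batch = 1
gradient_accumulation_steps = 16
sequence_length = 4096
learning_rate = 1e-5
warmup_ratio = 0.03

[hyperparameters.parallelism]
use_deepspeed = true
deepspeed_stage = 2
bf16 = true
```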
Validation behavior
- Dataset path must exist; otherwise the CLI prompts/aborts
- Dataset JSONL is checked for `messages` structure
- Backend requires `compute.gpu_type`; a missing value yields HTTP 400 at job creation
Payload mapping
- `model` from `[job].model`
- `training_type = "sft_offline"`
- `hyperparameters` from `[hyperparameters]` plus selected `[training.validation]` keys
- `metadata.effective_config.compute` from `[compute]`
- `metadata.effective_config.data.topology` from `[data.topology]`
- `metadata.effective_config.training.{mode,use_qlora}` from `[training]`