`training.use_qlora = false` (default) and typical FFT hyperparameters.
- Invoke via: `uvx synth-ai train --type sft --config <path>`
- Uses the same client/payload path as LoRA; differs only in training mode/toggles and typical hyperparameters/parallelism
FFT vs LoRA/QLoRA
- FFT (full finetune) updates all weights. Best final quality, higher VRAM/compute.
- LoRA updates adapters on top of frozen weights; faster/cheaper, smaller artifacts.
- QLoRA trains LoRA adapters on top of a 4-bit quantized base model; further reduces memory at some quality/latency tradeoff.
- Switch by toggling `training.use_qlora` and (optionally) `training.mode` (see the sketch below):
  - FFT: `training.use_qlora = false`, `hyperparameters.train_kind = "fft"`
  - LoRA: `training.use_qlora = false`, `training.mode = "lora"`, `hyperparameters.train_kind = "peft"`
  - QLoRA: `training.use_qlora = true`, `training.mode = "lora"`, `hyperparameters.train_kind = "peft"`
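As a sketch, the QLoRA variant of these toggles in TOML, using only the keys listed above (values are illustrative):

```toml
# QLoRA: LoRA adapters over a 4-bit quantized base model
[training]
mode = "lora"
use_qlora = true

[hyperparameters]
train_kind = "peft"

# For FFT instead: set use_qlora = false, drop training.mode,
# and use train_kind = "fft"
```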
Quickstart
Minimal TOML (FFT)
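A minimal sketch, assuming an illustrative model name, dataset path, and GPU type (adjust to your setup; `[job].model`, a dataset, and `[compute].gpu_type` are the required pieces):

```toml
[job]
model = "Qwen/Qwen2.5-7B-Instruct"   # illustrative base model identifier
data = "data/train.jsonl"            # training JSONL (or pass --dataset)

[compute]
gpu_type = "H100"                    # required by the backend
gpu_count = 1

[training]
use_qlora = false                    # FFT: full finetune, no adapters

[hyperparameters]
n_epochs = 1
train_kind = "fft"
gradient_accumulation_steps = 8
learning_rate = 1e-5
sequence_length = 4096
```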
What the client validates and sends
- Validates dataset path existence and JSONL records
- Uploads files to `/api/learning/files`, then creates/starts the job under `/api/learning/jobs`
- Payload mapping is identical to LoRA SFT: hyperparameters + `metadata.effective_config` (compute, data.topology, training)
Multi‑GPU guidance (FFT)
- Use `[compute]` for cluster shape
- Prefer `[hyperparameters.parallelism]` for DeepSpeed stage, FSDP, precision, TP/PP sizes; forwarded verbatim (see the sketch below)
- `[data.topology]` is optional and informational; the backend/trainer validates actual resource consistency
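A hedged sketch of a single-node multi-GPU layout, using only the `[compute]` and `[hyperparameters.parallelism]` keys documented in this page (values are illustrative):

```toml
[compute]
gpu_type = "H100"
gpu_count = 4

[hyperparameters.parallelism]
use_deepspeed = true
deepspeed_stage = 3        # ZeRO-3 for larger FFT runs
bf16 = true
tensor_parallel_size = 1
pipeline_parallel_size = 1
```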
GPU options
- Single-GPU: A10G/L40S/H100 for small to mid-size models (≤7B). Increase `gradient_accumulation_steps`.
- Multi-GPU, single-node: 2x/4x H100 for 14B–32B FFT. Use ZeRO-2/3 and optionally FSDP.
- Multi-node: H100 with RDMA for very large FFT or MoE. Provide `nodes > 1` and topology in `[hyperparameters.parallelism]` (see the sketch below).
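For the multi-node case, a sketch of the shape (illustrative values; the backend/trainer validates actual resource consistency):

```toml
[compute]
gpu_type = "H100"
gpu_count = 8     # interpretation (total vs. per-node) follows the backend's convention
nodes = 2

[hyperparameters.parallelism]
use_deepspeed = true
deepspeed_stage = 3
bf16 = true
tensor_parallel_size = 2
pipeline_parallel_size = 1
```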
Common issues
- HTTP 400 `missing_gpu_type`: add `[compute].gpu_type` (see below)
- Dataset not found: specify an absolute path or use `--dataset` (paths are resolved from the current working directory)
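For the first issue, the fix is a single key in the config (GPU name shown is illustrative):

```toml
[compute]
gpu_type = "H100"   # omitting this yields HTTP 400 missing_gpu_type at job creation
```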
Helpful CLI flags
- `--examples N` to subset data for a quick smoke test
- `--dry-run` to preview the payload before submitting
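For example, you might run `uvx synth-ai train --type sft --config <path> --dry-run` to inspect the payload, then rerun without `--dry-run` (optionally with `--examples 10`) for a quick smoke test, keeping `<path>` pointed at your FFT TOML.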
All sections and parameters (FFT)
- `[job]` (client reads)
  - `model` (string, required): base model identifier
  - `data` or `data_path` (string): training JSONL (required unless `--dataset` is provided)
- `[compute]` (forwarded into `metadata.effective_config.compute`)
  - `gpu_type` (string): required by backend
  - `gpu_count` (int)
  - `nodes` (int, optional)
- `[data]` / `[data.topology]`
  - `topology` (table): forwarded into `metadata.effective_config.data.topology`
  - `validation_path` (string, optional): if present and the file exists, it is uploaded to enable validation
- `[training]`
  - `mode` (string, optional): copied to metadata for visibility
  - `use_qlora` (bool, default false)
  - `[training.validation]` keys promoted into hyperparameters:
    - `enabled` (bool, default true) -> surfaced into `metadata.effective_config.training.validation.enabled`
    - `evaluation_strategy` (string, default "steps")
    - `eval_steps` (int, default 0)
    - `save_best_model_at_end` (bool, default true)
    - `metric_for_best_model` (string, default "val.loss")
    - `greater_is_better` (bool, default false)
- `[hyperparameters]`
  - `n_epochs` (int, default 1)
  - Optional: `batch_size`, `global_batch`, `per_device_batch`, `gradient_accumulation_steps`, `sequence_length`, `learning_rate`, `warmup_ratio`, `train_kind`
  - `[hyperparameters.parallelism]` forwarded verbatim: `use_deepspeed`, `deepspeed_stage`, `fsdp`, `bf16`, `fp16`, `tensor_parallel_size`, `pipeline_parallel_size`
- `[algorithm]` (ignored by client): sometimes used in examples for documentation only
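To tie these together, a hedged sketch of a fuller FFT config exercising the validation and parallelism sections (all values illustrative):

```toml
[job]
model = "Qwen/Qwen2.5-14B-Instruct"   # illustrative
data = "data/train.jsonl"

[compute]
gpu_type = "H100"
gpu_count = 4

[data]
validation_path = "data/val.jsonl"    # uploaded if it exists; enables validation

[training]
use_qlora = false

[training.validation]
enabled = true
evaluation_strategy = "steps"
eval_steps = 100
save_best_model_at_end = true
metric_for_best_model = "val.loss"
greater_is_better = false

[hyperparameters]
n_epochs = 1
train_kind = "fft"
per_device_batch = 1
gradient_accumulation_steps = 16
learning_rate = 1e-5

[hyperparameters.parallelism]
use_deepspeed = true
deepspeed_stage = 3
bf16 = true
```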
Validation behavior
- Dataset path must exist; otherwise the CLI prompts/aborts
- Dataset JSONL is checked for a `messages` structure
- Backend requires `compute.gpu_type`; a missing value yields HTTP 400 at job creation
Payload mapping
- `model` from `[job].model`
- `training_type = "sft_offline"`
- `hyperparameters` from `[hyperparameters]` plus selected `[training.validation]` keys
- `metadata.effective_config.compute` from `[compute]`
- `metadata.effective_config.data.topology` from `[data.topology]`
- `metadata.effective_config.training.{mode,use_qlora}` from `[training]`