
Overview

synth-ai train --type sft calls, in order, three REST endpoints exposed by the Synth learning service:
  1. POST /learning/files – upload training (and optional validation) datasets in JSONL format.
  2. POST /learning/jobs – create an offline SFT job that references the uploaded file IDs.
  3. POST /learning/jobs/{job_id}/start – trigger the created job.
Each request is sent to the backend base URL provided via --backend (defaults to http://localhost:8000/api) and uses the bearer token stored in SYNTH_API_KEY.
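As an illustration of the request setup described above, the following is a minimal sketch of resolving the base URL and the bearer header. The helper names build_request_context and endpoint are hypothetical and are not part of synth-ai; only the default URL and the SYNTH_API_KEY variable come from the text.

```python
import os

# Default stated in the overview; --backend overrides it.
DEFAULT_BACKEND = "http://localhost:8000/api"


def build_request_context(backend=None, env=None):
    """Return (base_url, headers) for calls to the learning service.

    Hypothetical helper: `backend` stands in for the --backend value and
    `env` for the process environment (injectable for testing).
    """
    env = os.environ if env is None else env
    api_key = env.get("SYNTH_API_KEY")
    if not api_key:
        raise RuntimeError("SYNTH_API_KEY is not set")
    base_url = (backend or DEFAULT_BACKEND).rstrip("/")
    return base_url, {"Authorization": f"Bearer {api_key}"}


def endpoint(base_url, path):
    """Join the base URL and an endpoint path with exactly one slash."""
    return f"{base_url}/{path.lstrip('/')}"
```

Stripping the trailing slash from the base URL avoids producing paths such as /api//learning/files when a user passes a backend URL that ends in a slash.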

Upload Dataset — POST /learning/files

Send a multipart upload for every dataset you intend to train on or validate against. The response returns a file ID that you must include in the job payload.
POST {backend_base}/learning/files
Authorization: Bearer <SYNTH_API_KEY>
Content-Type: multipart/form-data
file=@training.jsonl
Response (201)
{
  "id": "file_01J9A8R1XZ8TAXP6AF0BVF3Y4K",
  "filename": "training.jsonl",
  "sha256": "…",
  "bytes": 123456
}
Upload validation sets the same way; list their IDs under metadata.effective_config.data.validation_files when creating the job.
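The upload request above could be assembled with the Python standard library roughly as follows. This is a sketch, not the CLI's actual implementation: build_upload_request is a hypothetical helper, and the part's application/octet-stream content type is an assumption. The form field name file follows the example request.

```python
import urllib.request
import uuid


def build_upload_request(base_url, api_key, filename, content):
    """Build (but do not send) a multipart POST for one JSONL dataset."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        # Field name "file" matches the example request in this section.
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        # Assumed part content type; the service may accept others.
        b"Content-Type: application/octet-stream\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/learning/files",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to urllib.request.urlopen would send it; per the response shown above, the id field of the JSON body is the value to reference when creating the job.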

Create Job — POST /learning/jobs

Once uploads are complete, the CLI builds the job payload (synth_ai/api/train/builders.py) and calls the job creation endpoint.
POST {backend_base}/learning/jobs
Authorization: Bearer <SYNTH_API_KEY>
Content-Type: application/json

{
  "model": "Qwen/Qwen3-4B",
  "training_type": "sft_offline",
  "training_file_id": "file_01J9A8R1XZ8TAXP6AF0BVF3Y4K",
  "hyperparameters": {
    "n_epochs": 3,
    "batch_size": 64,
    "learning_rate": 2e-5,
    "warmup_ratio": 0.1
  },
  "metadata": {
    "effective_config": {
      "compute": { "gpu_type": "A100", "gpu_count": 4 },
      "data": { "topology": {} },
      "training": { "mode": "qlora" }
    }
  }
}
Response (201)
{
  "job_id": "job_01J9A9EFKYH36J6PPZ79PAY3R2",
  "status": "created"
}
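The payload shape above can be sketched as a small builder. The real logic lives in synth_ai/api/train/builders.py and may differ; build_sft_job_payload and its parameters are hypothetical names used only for illustration, while the field names are taken from the example request.

```python
import copy


def build_sft_job_payload(model, training_file_id, hyperparameters,
                          validation_file_ids=None, effective_config=None):
    """Assemble a job-creation body shaped like the example request.

    Hypothetical helper, not the builder shipped with synth-ai.
    """
    # Deep-copy so the caller's config is never mutated.
    config = copy.deepcopy(effective_config) if effective_config else {}
    if validation_file_ids:
        # Validation file IDs go under
        # metadata.effective_config.data.validation_files.
        config.setdefault("data", {})["validation_files"] = list(validation_file_ids)
    return {
        "model": model,
        "training_type": "sft_offline",
        "training_file_id": training_file_id,
        "hyperparameters": dict(hyperparameters),
        "metadata": {"effective_config": config},
    }
```

The result is a plain dictionary, so it can be serialized with json.dumps and sent with Content-Type: application/json.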

Start Job — POST /learning/jobs/{job_id}/start

Start the job immediately after creating it. The CLI issues this call inside handle_sft() once the job has been provisioned.
POST {backend_base}/learning/jobs/{job_id}/start
Authorization: Bearer <SYNTH_API_KEY>
Content-Type: application/json

{}
Response (200)
{ "status": "running" }
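The create-then-start sequence can be sketched as below. The post callable is a hypothetical transport (it takes a path and a JSON body and returns the decoded response), which keeps the sketch independent of any particular HTTP client; it is not how handle_sft() is necessarily written.

```python
def create_and_start(post, payload):
    """Create a job, then start it, returning (job_id, status).

    `post(path, body)` is a hypothetical injected function that performs
    the authenticated POST and returns the parsed JSON response.
    """
    created = post("/learning/jobs", payload)
    job_id = created["job_id"]
    # The start endpoint takes an empty JSON object as its body.
    started = post(f"/learning/jobs/{job_id}/start", {})
    return job_id, started["status"]
```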

Polling Status

SFTJobPoller (synth_ai/api/train/pollers.py) polls GET /learning/jobs/{job_id} until the status is terminal (succeeded, failed, or cancelled). Poll responses surface progress updates, checkpoint metadata, and the produced model ID (ft:…).
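A polling loop of this kind might look like the following sketch. It is not the SFTJobPoller implementation: poll_until_terminal, fetch_job, and the interval and timeout defaults are all assumptions chosen for illustration. Only the three terminal statuses come from the text.

```python
import time

# Terminal statuses named in the text above.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def poll_until_terminal(fetch_job, job_id, interval=5.0, timeout=3600.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Call fetch_job(job_id) until the job reaches a terminal status.

    `fetch_job` is a hypothetical callable wrapping
    GET /learning/jobs/{job_id}; `sleep` and `clock` are injectable so
    the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        job = fetch_job(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
        sleep(interval)
```

Using a monotonic clock for the deadline keeps the timeout unaffected by system clock adjustments during a long training run.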

Error Handling

  • 400 / 422: The payload or dataset schema is invalid. The CLI validates datasets locally (validate_sft_jsonl), but mismatches caused by backend schema drift surface here.
  • 409: Duplicate creation. Use --idempotency to supply an Idempotency-Key header when you need dedupe semantics.
  • 5xx: Backend issue. Inspect logs and retry once the service is healthy.

Implementation References

  • synth_ai/api/train/cli.py: handle_sft() coordinates dataset discovery, uploads, job creation, start, and polling.
  • synth_ai/api/train/builders.py: constructs the payload used in job creation.
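The status codes listed under Error Handling can be mapped to a next action along these lines. The function name and the returned labels are hypothetical; the CLI's actual error handling may be structured differently.

```python
def classify_status(status_code):
    """Map an HTTP status code to a suggested next action (labels are illustrative)."""
    if 200 <= status_code <= 299:
        return "ok"
    if status_code in (400, 422):
        # Invalid payload or dataset schema: fix the input, do not retry as-is.
        return "fix_payload"
    if status_code == 409:
        # Duplicate creation: consider supplying an Idempotency-Key.
        return "duplicate"
    if 500 <= status_code <= 599:
        # Backend issue: retry once the service is healthy.
        return "retry_later"
    return "unexpected"
```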