Overview

Synth-AI supports clustered online RL with a task app, vLLM, and a trainer service orchestrated via backend workflows. Jobs are created through the backend at POST /api/rl/jobs and monitored via the learning job endpoints and Server-Sent Events (SSE).

Key Concepts

  • Task App: The environment-facing service that exposes /rollout and /health. Clients provide its base URL when creating a job (see the health-check sketch after this list).
  • Trainer: Orchestrates evaluation and training. The backend injects the trainer start URL server-side.
  • Server-injected URLs: The backend resolves and injects sensitive URLs (e.g., training_start_url). Public clients should never provide private trainer URLs.
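
As a quick sanity check before creating a job, the sketch below pings a deployed task app's /health endpoint. Only the /health path comes from the description above; the base URL is a placeholder and the shape of the response is not assumed.

import httpx

# Placeholder base URL for your deployed task app.
task_app_url = "https://your-task-app.modal.run"

# /health comes from the Task App description above; a 2xx response means the app is reachable.
resp = httpx.get(f"{task_app_url}/health", timeout=10.0)
resp.raise_for_status()
print("task app healthy:", resp.status_code)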

Creating a Job

POST /api/rl/jobs with the following minimal payload:
{
  "job_type": "rl",
  "data": {
    "model": "Qwen/Qwen3-0.6B",
    "endpoint_base_url": "https://...task-app.modal.run",
    "job_config_id": "<your-config-id>"
  }
}
Notes:
  • Do not include training_start_url from the client; the backend sets it based on environment.
  • Include compute/topology/overrides via data as needed; the backend validates and merges them.
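
For clients that call the backend directly rather than through the SDK, a minimal request sketch is shown below. The backend base URL, the SYNTH_API_KEY environment variable, and the Authorization header scheme are assumptions; the payload mirrors the minimal example above.

import os
import httpx

# Assumed backend base URL and auth header; check your deployment for the exact values.
backend_url = os.environ.get("SYNTH_BACKEND_URL", "https://backend.example.com")
api_key = os.environ["SYNTH_API_KEY"]  # hypothetical environment variable name

payload = {
    "job_type": "rl",
    "data": {
        "model": "Qwen/Qwen3-0.6B",
        "endpoint_base_url": "https://your-task-app.modal.run",  # your task app base URL
        "job_config_id": "<your-config-id>",
        # Note: no training_start_url here; the backend injects it server-side.
    },
}

resp = httpx.post(
    f"{backend_url}/api/rl/jobs",
    json=payload,
    headers={"Authorization": f"Bearer {api_key}"},  # auth scheme is an assumption
    timeout=30.0,
)
resp.raise_for_status()
print("created job:", resp.json())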

SDK Behavior

  • The RlClient.create_job method omits training_start_url; the backend injects it (see the usage sketch after this list).
  • Example scripts under synth-ai/examples/rl validate the task app URL but do not send trainer URLs.
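
A rough usage sketch follows. The import path, constructor arguments, and the keyword names accepted by RlClient.create_job are assumptions rather than the documented signature; the scripts under synth-ai/examples/rl are the authoritative reference. The point of the sketch is that no trainer URL is passed.

# Hypothetical usage sketch; the import path and argument names are assumptions.
from synth_ai.rl import RlClient

client = RlClient(api_key="<your-api-key>")  # auth handling is an assumption

job = client.create_job(
    model="Qwen/Qwen3-0.6B",
    endpoint_base_url="https://your-task-app.modal.run",  # your task app base URL
    job_config_id="<your-config-id>",
    # No training_start_url: the backend injects it server-side, as described above.
)
print("created job:", job)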

Production vs Development

  • The backend uses a fixed trainer URL per environment. Development and production trainer URLs are resolved server-side and are not exposed to clients.

Monitoring

  • Use GET /api/learning/jobs/{id} for status and GET /api/learning/jobs/{id}/events for SSE streaming (see the sketch after this list).
  • The SDK also provides JobHandle.poll_until_terminal and helpers for streaming events.
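
For clients not using JobHandle, the sketch below polls job status once and then streams Server-Sent Events over raw HTTP. The backend base URL and the Authorization header scheme are assumptions; the endpoint paths come from the list above.

import httpx

backend_url = "https://backend.example.com"      # assumed backend base URL
headers = {"Authorization": "Bearer <api-key>"}  # auth scheme is an assumption
job_id = "<job-id>"

# One-off status check.
status = httpx.get(f"{backend_url}/api/learning/jobs/{job_id}", headers=headers, timeout=30.0)
status.raise_for_status()
print("status:", status.json())

# Stream Server-Sent Events until the backend closes the connection.
with httpx.stream(
    "GET",
    f"{backend_url}/api/learning/jobs/{job_id}/events",
    headers=headers,
    timeout=None,
) as events:
    for line in events.iter_lines():
        if line.startswith("data:"):
            print(line[len("data:"):].strip())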