Overview

Synth-AI supports clustered online RL with a task app, vLLM, and a trainer service orchestrated via backend workflows. Jobs are created through the backend at POST /api/rl/jobs and monitored via the learning job endpoints and Server-Sent Events (SSE).

Key Concepts

  • Task App: The environment-facing service that exposes /rollout and /health. Clients provide its base URL when creating a job (see the health-check sketch after this list).
  • Trainer: Orchestrates evaluation and training. The backend injects the trainer start URL server-side.
  • Server-injected URLs: The backend resolves and injects sensitive URLs (e.g., training_start_url). Public clients should never provide private trainer URLs.
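
As a quick sanity check before creating a job, the sketch below pings a deployed task app's /health endpoint. Only the /health path comes from the description above; the base URL is a placeholder and the shape of the response is not assumed.

import httpx

# Placeholder base URL for your deployed task app.
task_app_url = "https://your-task-app.modal.run"

# /health comes from the Task App description above; a 2xx response means the app is reachable.
resp = httpx.get(f"{task_app_url}/health", timeout=10.0)
resp.raise_for_status()
print("task app healthy:", resp.status_code)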

Creating a Job

POST /api/rl/jobs with the following minimal payload:
{
  "job_type": "rl",
  "data": {
    "model": "Qwen/Qwen3-0.6B",
    "endpoint_base_url": "https://...task-app.modal.run",
    "job_config_id": "<your-config-id>"
  }
}
Notes:
  • Do not include training_start_url from the client; the backend sets it based on environment.
  • Include compute/topology/overrides via data as needed; the backend validates and merges them.
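
For clients that call the backend directly rather than through the SDK, a minimal request sketch is shown below. The backend base URL, the SYNTH_API_KEY environment variable, and the Authorization header scheme are assumptions; the payload mirrors the minimal example above.

import os
import httpx

# Assumed backend base URL and auth header; check your deployment for the exact values.
backend_url = os.environ.get("SYNTH_BACKEND_URL", "https://backend.example.com")
api_key = os.environ["SYNTH_API_KEY"]  # hypothetical environment variable name

payload = {
    "job_type": "rl",
    "data": {
        "model": "Qwen/Qwen3-0.6B",
        "endpoint_base_url": "https://your-task-app.modal.run",  # your task app base URL
        "job_config_id": "<your-config-id>",
        # Note: no training_start_url here; the backend injects it server-side.
    },
}

resp = httpx.post(
    f"{backend_url}/api/rl/jobs",
    json=payload,
    headers={"Authorization": f"Bearer {api_key}"},  # auth scheme is an assumption
    timeout=30.0,
)
resp.raise_for_status()
print("created job:", resp.json())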

SDK Behavior

  • The RlClient.create_job method omits training_start_url; the backend injects it (see the usage sketch after this list).
  • Example scripts under synth-ai/examples/rl validate the task app URL but do not send trainer URLs.
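
A rough usage sketch follows. The import path, constructor arguments, and the keyword names accepted by RlClient.create_job are assumptions rather than the documented signature; the scripts under synth-ai/examples/rl are the authoritative reference. The point of the sketch is that no trainer URL is passed.

# Hypothetical usage sketch; the import path and argument names are assumptions.
from synth_ai.rl import RlClient

client = RlClient(api_key="<your-api-key>")  # auth handling is an assumption

job = client.create_job(
    model="Qwen/Qwen3-0.6B",
    endpoint_base_url="https://your-task-app.modal.run",  # your task app base URL
    job_config_id="<your-config-id>",
    # No training_start_url: the backend injects it server-side, as described above.
)
print("created job:", job)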

Production vs Development

  • The backend uses a fixed trainer URL per environment. Development and production trainer URLs are resolved server-side and are not exposed to clients.

Monitoring

  • Use GET /api/learning/jobs/{id} for status and GET /api/learning/jobs/{id}/events for SSE streaming (see the sketch after this list).
  • The SDK also provides JobHandle.poll_until_terminal and helpers for streaming events.
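
For clients not using JobHandle, the sketch below polls job status once and then streams Server-Sent Events over raw HTTP. The backend base URL and the Authorization header scheme are assumptions; the endpoint paths come from the list above.

import httpx

backend_url = "https://backend.example.com"      # assumed backend base URL
headers = {"Authorization": "Bearer <api-key>"}  # auth scheme is an assumption
job_id = "<job-id>"

# One-off status check.
status = httpx.get(f"{backend_url}/api/learning/jobs/{job_id}", headers=headers, timeout=30.0)
status.raise_for_status()
print("status:", status.json())

# Stream Server-Sent Events until the backend closes the connection.
with httpx.stream(
    "GET",
    f"{backend_url}/api/learning/jobs/{job_id}/events",
    headers=headers,
    timeout=None,
) as events:
    for line in events.iter_lines():
        if line.startswith("data:"):
            print(line[len("data:"):].strip())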