Overview
Synth-AI supports clustered online RL with a task app, vLLM, and a trainer service orchestrated via backend workflows. Jobs are created through the backend at POST /api/rl/jobs and monitored via learning job endpoints and SSE.
Key Concepts
- Task App: The environment-facing service that exposes /rollout and /health. Clients provide its base URL when creating a job.
- Trainer: Orchestrates evaluation and training. The backend injects the trainer start URL server-side.
- Server-injected URLs: The backend resolves and injects sensitive URLs (e.g., training_start_url). Public clients should never provide private trainer URLs.
Creating a Job
POST /api/rl/jobs with the following minimum payload:
- Do not include training_start_url from the client; the backend sets it based on environment.
- Include compute/topology/overrides via data as needed; the backend validates and merges them.
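As a rough illustration of the two rules above, a client-side payload builder might look like the following sketch. The helper name build_job_payload, the top-level field task_app_url, and the gpu_count override are hypothetical and are not taken from the Synth-AI API; only data and training_start_url appear in this document. The actual request schema may differ.

```python
from urllib.parse import urlparse

# Keys the backend injects server-side; a public client should never send them.
SERVER_INJECTED_KEYS = {"training_start_url"}


def build_job_payload(task_app_url, data=None):
    """Build a request body for POST /api/rl/jobs (field names are assumptions)."""
    parsed = urlparse(task_app_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"task app URL must be absolute http(s): {task_app_url!r}")

    overrides = dict(data or {})
    leaked = SERVER_INJECTED_KEYS.intersection(overrides)
    if leaked:
        # Refuse rather than silently drop, so the caller notices the mistake.
        raise ValueError(f"client must not set server-injected keys: {sorted(leaked)}")

    return {
        "task_app_url": task_app_url.rstrip("/"),  # hypothetical field name
        "data": overrides,  # compute/topology/overrides, validated by the backend
    }


if __name__ == "__main__":
    # "gpu_count" is an illustrative override, not a documented option.
    print(build_job_payload("https://task.example.com/", {"compute": {"gpu_count": 2}}))
```

Rejecting a client-supplied training_start_url outright, instead of stripping it, is a design choice made for this sketch: it surfaces misconfigured callers early.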
SDK Behavior
- The RlClient.create_job method now omits training_start_url. The backend injects it.
- Example scripts under synth-ai/examples/rl validate the task app URL but do not send trainer URLs.
Production vs Development
- The backend uses a fixed trainer URL per environment. Development and production trainer URLs are resolved server-side and are not exposed to clients.
Monitoring
- Use GET /api/learning/jobs/{id} for status and GET /api/learning/jobs/{id}/events for SSE streaming.
- The SDK also provides JobHandle.poll_until_terminal and helpers for streaming events.
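For readers consuming the events endpoint without the SDK helpers, the sketch below shows how a server-sent events stream is typically split into individual events. It follows the general SSE wire format (event and data fields, blank-line dispatch, comment lines starting with a colon) and is not the SDK's own implementation. The function name parse_sse and the sample event contents are hypothetical; the actual event names and payload shape emitted by the backend are not specified here.

```python
def parse_sse(lines):
    """Yield {"event", "data"} dicts from an iterable of SSE text lines."""
    event_type, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it carries data.
            if data_lines:
                yield {"event": event_type, "data": "\n".join(data_lines)}
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the format strips one leading space
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
        # Other fields (id, retry) are ignored in this sketch.


if __name__ == "__main__":
    # Illustrative stream; real event names and payloads may differ.
    sample = ["event: status\n", 'data: {"state": "running"}\n', "\n"]
    for event in parse_sse(sample):
        print(event)
```

In practice the lines would come from a streaming HTTP response to GET /api/learning/jobs/{id}/events; the parser itself is independent of the HTTP client used.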