Overview
Task apps expose the environment and evaluation endpoints (e.g., /rollout, /health). Provide the task app base URL when creating an RL job.
Health and readiness
- Expose /health and /readyz for rollout token minting and diagnostics.
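As a minimal sketch of the two probe endpoints, the handler below uses only the Python standard library. The response bodies, the 503 status for a not-yet-ready app, the STATE flag, and the names probe_response, ProbeHandler, and serve are assumptions made for illustration; the text above only requires that /health and /readyz exist.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical readiness flag; a real task app would set this once its
# environment and dependencies have finished loading.
STATE = {"ready": False}


def probe_response(path: str, ready: bool) -> tuple:
    """Map a probe path to an (HTTP status, JSON body) pair."""
    if path == "/health":
        # The process is up and able to answer requests.
        return 200, {"status": "ok"}
    if path == "/readyz":
        # Assumed convention: 200 only once the app can serve rollouts.
        if ready:
            return 200, {"status": "ready"}
        return 503, {"status": "not ready"}
    return 404, {"error": "not found"}


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body = probe_response(self.path, STATE["ready"])
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8000) -> None:
    """Serve the probe endpoints on localhost until interrupted."""
    STATE["ready"] = True
    HTTPServer(("127.0.0.1", port), ProbeHandler).serve_forever()
```

Calling serve() starts the server on 127.0.0.1:8000. Keeping the routing in probe_response, separate from the handler, makes the probe logic easy to test without opening a socket.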
Security
- Rollouts use organization-scoped tokens minted by the backend. Keep the task app private; the client provides only the base URL.
Rollout endpoint
- Path: POST /rollout
- Auth: header X-API-Key: <rollout token>
- Tokens are issued by the backend per job at POST /api/rl/jobs/{id}/tokens/rollout and scoped to rollout.
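The sketch below shows how a caller might assemble the token endpoint URL and an authenticated rollout request. It only prepares the request and sends nothing. The helper names, the backend and task app base URLs, and the JSON content type are assumptions; only the two paths and the X-API-Key header come from the description above.

```python
import json
import urllib.request


def token_mint_url(backend_url: str, job_id: str) -> str:
    """URL of the per-job rollout token endpoint (path from the text above)."""
    return f"{backend_url.rstrip('/')}/api/rl/jobs/{job_id}/tokens/rollout"


def build_rollout_request(
    task_app_url: str, rollout_token: str, body: dict
) -> urllib.request.Request:
    """Prepare, but do not send, an authenticated POST /rollout request."""
    return urllib.request.Request(
        url=f"{task_app_url.rstrip('/')}/rollout",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # The rollout token travels in the X-API-Key header.
            "X-API-Key": rollout_token,
        },
    )
```

A prepared request can then be passed to urllib.request.urlopen. How the caller authenticates against the backend when minting the token is not covered here.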
Request body (JSON)
- ops specifies the interleave of agent and environment steps up to max_steps per episode.
- The task app should use policy.config.inference_url to call the LLM policy, and step the environment accordingly.
- Determinism: respect the provided seed when applicable.
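To illustrate how a task app might read these fields, here is a small sketch that extracts them from a parsed request body. It assumes that ops and seed sit at the top level of the JSON object and that policy.config.inference_url is a nested path. The exact schema is not specified above, and RolloutPlan and plan_rollout are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass
class RolloutPlan:
    ops: list               # interleaved agent/environment steps, as received
    inference_url: str      # where the task app calls the LLM policy
    rng: random.Random      # seeded source of randomness for the episode


def plan_rollout(body: dict) -> RolloutPlan:
    """Extract the fields a rollout needs from the parsed JSON body."""
    # Assumed nesting: {"policy": {"config": {"inference_url": ...}}}
    inference_url = body["policy"]["config"]["inference_url"]
    # Assumed top-level keys; missing values fall back to safe defaults.
    ops = list(body.get("ops", []))
    seed = body.get("seed")
    # random.Random(None) seeds from OS entropy, so unseeded requests
    # still work but are not reproducible.
    return RolloutPlan(ops=ops, inference_url=inference_url, rng=random.Random(seed))
```

Routing all environment randomness through the returned rng is one way to respect the seed: two requests with the same seed then draw the same sequence of random values.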
Response body (200 OK)
- Include steps with per-step obs, optional tool_calls, and reward when defined by the task.
- Include a final.observation block; environment-specific handlers compute episode_return from this.
- For long-running rollouts, you may return 303 See Other with a Location header to poll; the trainer will follow until completion.
- If the horizon is insufficient, 422 Unprocessable Entity is acceptable; the trainer will retry with a larger max_steps once.
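The shape of a successful response can be sketched as follows. The helper names make_step and make_response are hypothetical, and the sketch assumes that final.observation is an observation key nested under a final object. Any fields beyond those named above are left out.

```python
from typing import Any, Optional


def make_step(
    obs: Any,
    tool_calls: Optional[list] = None,
    reward: Optional[float] = None,
) -> dict:
    """Build one entry of steps; optional fields are omitted when unset."""
    step = {"obs": obs}
    if tool_calls is not None:
        step["tool_calls"] = tool_calls
    if reward is not None:
        step["reward"] = reward
    return step


def make_response(steps: list, final_observation: Any) -> dict:
    """Assemble the 200 OK body: per-step records plus the final observation."""
    return {"steps": steps, "final": {"observation": final_observation}}
```

For example, make_response([make_step({"text": "start"}, reward=0.0)], {"text": "done"}) produces a body with one step and a final observation block.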