RL training requires a task app that can stream experience, compute rewards, and expose rubric-driven judges. The better your task app, the easier it is to scale RL jobs. The math demo introduced in the quickstart demonstrates this wiring end-to-end.

Responsibilities

Your task app must:
  1. Provide a TaskAppConfig registered through register_task_app.
  2. Implement /rollout so the trainer can request multi-step episodes.
  3. Emit RolloutResponse objects with:
    • trajectories (legacy format) or a v3 trace payload.
    • metrics including mean_return, episode_returns, and optional judge scores.
    • pipeline_metadata for debugging (e.g., policy version, prompt used).
  4. Advertise dataset metadata and rubrics in TaskInfo to guide judge selection.
  5. Guard endpoints with ENVIRONMENT_API_KEY via require_api_key_dependency.
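The response contract in item 3 can be sketched with stand-in dataclasses. These are illustrative shapes only, not the real synth_ai classes; they exist to make the field layout concrete:

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Stand-in shapes (NOT the real synth_ai classes) mirroring the
# RolloutResponse contract described above.
@dataclass
class RolloutMetricsSketch:
    episode_returns: list[float]
    mean_return: float
    outcome_score: Optional[float] = None
    events_score: Optional[float] = None
    details: dict[str, Any] = field(default_factory=dict)

@dataclass
class RolloutResponseSketch:
    trajectories: list[dict[str, Any]]  # legacy format, or a v3 trace payload instead
    metrics: RolloutMetricsSketch
    pipeline_metadata: dict[str, Any] = field(default_factory=dict)

returns = [1.0, 3.0]
resp = RolloutResponseSketch(
    trajectories=[{"steps": []}],
    metrics=RolloutMetricsSketch(
        episode_returns=returns,
        mean_return=sum(returns) / len(returns),
    ),
    pipeline_metadata={"policy_version": "v1", "prompt": "..."},
)
print(resp.metrics.mean_return)  # 2.0
```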
A representative config builder, abridged from the Crafter task app (helpers such as base_info, dataset, registry, and rollout_executor are defined elsewhere in the app):
from synth_ai.task.server import ProxyConfig, RubricBundle, TaskAppConfig
from synth_ai.task.tracing_utils import (
    build_tracer_factory,
    resolve_tracing_db_url,
    tracing_env_enabled,
)
from synth_ai.tracing_v3.session_tracer import SessionTracer

def build_config() -> TaskAppConfig:
    tracing_enabled = tracing_env_enabled(default=True)
    tracer_factory = build_tracer_factory(
        SessionTracer,
        enabled=tracing_enabled,
        db_url=resolve_tracing_db_url(),
    )

    app_state = {"tracing_enabled": tracing_enabled}
    if tracer_factory:
        app_state["session_tracer_factory"] = tracer_factory

    return TaskAppConfig(
        app_id="grpo-crafter",
        name="Crafter Task App",
        description="Supports RL rollouts, tracing, and vendor proxies.",
        base_task_info=base_info,
        describe_taskset=lambda: describe_taskset(dataset),
        provide_task_instances=lambda seeds: provide_instances(dataset, base_info, seeds),
        rollout=rollout_executor,
        dataset_registry=registry,
        rubrics=RubricBundle(outcome=OUTCOME_RUBRIC, events=EVENTS_RUBRIC),
        proxy=ProxyConfig(enable_openai=True, enable_groq=True),
        app_state=app_state,
    )

Step rewards

Inside rollout_executor, emit event-level rewards so the trainer can optimise shaped signals:
await tracer.record_event_reward(
    event_id=event_id,
    turn_number=turn,
    reward_value=float(delta),
    reward_type="achievement_delta",
    source="environment",
)
The Crafter example maintains both cumulative and “unique achievement” rewards, which is handy for GSPO-style objectives.
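The "unique achievement" delta itself can be computed with plain set arithmetic. A minimal sketch, assuming each turn yields the set of achievements currently unlocked (the helper name and achievement strings are illustrative):

```python
def achievement_delta(seen: set[str], current: set[str]) -> tuple[float, set[str]]:
    """Reward = number of achievements unlocked this turn that were never
    seen before in the episode ("unique achievement" shaping)."""
    newly_unlocked = current - seen
    return float(len(newly_unlocked)), seen | current

seen: set[str] = set()
rewards = []
for turn_achievements in [{"collect_wood"}, {"collect_wood", "place_table"}, {"place_table"}]:
    delta, seen = achievement_delta(seen, turn_achievements)
    rewards.append(delta)

print(rewards)  # [1.0, 1.0, 0.0]
```

Each delta would then be passed as reward_value to record_event_reward, as in the snippet above.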

Judges

Expose rubrics via RubricBundle and ensure your rollout metrics include judge outputs:
metrics = RolloutMetrics(
    episode_returns=returns,
    mean_return=sum(returns) / max(len(returns), 1),  # guard against empty rollouts
    outcome_score=judge_scores.get("outcome"),
    events_score=judge_scores.get("events"),
    details=judge_scores,
)
The CLI prints these during uvx synth-ai eval, and RL training can target them directly.
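If you want a single scalar for RL to target, one option is to blend the outcome and events scores. This is a hypothetical sketch, not a synth-ai API; the weights and the blend are illustrative:

```python
def blended_judge_reward(
    judge_scores: dict[str, float],
    outcome_weight: float = 0.7,
    events_weight: float = 0.3,
) -> float:
    """Weighted blend of rubric-level judge scores; a missing score counts as 0."""
    outcome = judge_scores.get("outcome", 0.0)
    events = judge_scores.get("events", 0.0)
    return outcome_weight * outcome + events_weight * events

print(blended_judge_reward({"outcome": 0.8, "events": 0.5}))
```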

Local smoke test

See Run Task Apps Locally and Run Evaluations for full flag references.
uvx synth-ai serve your-task-id --trace traces/v3

uvx synth-ai eval \
  --app-id your-task-id \
  --model Qwen/Qwen3-4B \
  --seeds 1-5
Confirm that:
  • /health and /task_info return HTTP 200.
  • The eval command prints judge metrics.
  • Traces appear in traces/v3/synth_ai.db.

Production deployment

Need CLI specifics? Check Deploy Task Apps and Run Task Apps on Modal. Deploy to Modal when ready:
uvx synth-ai deploy your-task-id \
  --name my-task-app \
  --trace traces/v3
Verify the hosted endpoint before scheduling a long RL run:
curl -H "X-API-Key: $ENVIRONMENT_API_KEY" https://<modal>.modal.run/health
curl -H "X-API-Key: $ENVIRONMENT_API_KEY" https://<modal>.modal.run/task_info
Now you can reference the hosted URL in the RL TOML (services.task_url) and fire off training jobs confidently.

Tip: the CLI auto-loads the .env produced by uvx synth-ai setup / uvx synth-ai demo setup. Use --env-file only when you need to override or layer additional secrets.
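For reference, the wiring in the RL TOML looks like the fragment below. Only services.task_url is taken from the text above; treat the section layout as a placeholder for your actual job config, not a verified schema:

```toml
# Illustrative fragment: point the trainer at the hosted task app.
[services]
task_url = "https://<modal>.modal.run"
```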