Managed Research runs Synth’s optimization and evaluation APIs — GEPA, MIPRO, dataset assembly, harness builds — as an overnight service against your actual codebase. You connect a repo, point it at your traces or dataset, and wake up to experiment results, an optimized prompt or policy, and a proof bundle. It’s the same Synth AI SDK you’d run yourself, orchestrated by agents that know how to build harnesses, run trial matrices, score outputs, and ship artifacts without you babysitting it.
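As a sketch of what triggering a run could look like from your side, assuming a plain HTTP entry point — the endpoint path, payload fields, and response shape below are illustrative assumptions, not the documented API:

```python
# Hypothetical sketch: the base URL, endpoint, payload fields, and response
# shape are assumptions for illustration, not the documented Synth API.
import requests

SYNTH_API = "https://api.example.com/v1"  # placeholder base URL

resp = requests.post(
    f"{SYNTH_API}/managed-research/runs",
    headers={"Authorization": "Bearer <token>"},
    json={
        "repo": "github.com/acme/agent-harness",  # repo the orchestrator provisions
        "task": "prompt_optimization",            # GEPA/MIPRO optimization run
        "dataset": "traces/2024-q4",              # traces or dataset to optimize against
    },
    timeout=30,
)
resp.raise_for_status()
run_id = resp.json()["run_id"]  # poll this id for artifacts once the run finishes
print(f"queued run {run_id}")
```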

What it runs

  • Prompt and policy optimization — baseline → GEPA/MIPRO → holdout against your labeled dataset, with before/after scores and the winning candidate opened as a PR (see the scoring sketch after this list)
  • Evaluation loops — nightly runs against a versioned harness and dataset; structured scoring on every run
  • Dataset and eval assembly — agents build dataset splits and verifiers from your traces and repo
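
To make the before/after comparison concrete, here is a minimal sketch of holdout scoring. The exact-match scorer and the names in the usage comments are stand-ins for whatever verifier and candidate prompts a real run produces.

```python
# Minimal sketch of the before/after holdout comparison a run reports.
# Exact-match scoring stands in for your harness's registered verifier.
from statistics import mean
from typing import Callable

def holdout_score(
    run_model: Callable[[str, str], str],  # (prompt, input) -> model output
    prompt: str,
    holdout: list[dict],                   # [{"input": ..., "label": ...}, ...]
) -> float:
    # Fraction of holdout examples where the output matches the label exactly.
    return mean(
        1.0 if run_model(prompt, ex["input"]) == ex["label"] else 0.0
        for ex in holdout
    )

# Usage: score the baseline prompt and the GEPA/MIPRO winner on the same
# frozen split, then report the lift.
# baseline = holdout_score(run_model, BASELINE_PROMPT, holdout_set)
# optimized = holdout_score(run_model, WINNING_PROMPT, holdout_set)
# print(f"baseline={baseline:.3f} optimized={optimized:.3f}")
```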

How it works

You trigger a run. An orchestrator agent claims it, provisions your repo into a workspace, and dispatches worker agents into isolated Daytona sandboxes. Workers read their task instructions and project config, run optimization (GEPA or MIPRO via the pre-deployed run_gepa.py), and emit artifacts as they go. When the run finishes, you get a report_md and any result files the workers registered, plus github_pr artifacts for any PRs opened against your repo.
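
A rough sketch of the worker side, under assumptions: only run_gepa.py comes from the description above; the manifest path, config filename, and artifact schema are invented for illustration.

```python
# Hypothetical worker sketch: run the pre-deployed optimizer, then register
# outputs so the orchestrator can collect them. The manifest path, config
# filename, and artifact schema are assumptions, not the real contract.
import json
import subprocess
from pathlib import Path

WORKSPACE = Path("/workspace")            # provisioned repo checkout
ARTIFACTS = WORKSPACE / "artifacts.json"  # assumed registration manifest

def run_optimization() -> None:
    # Invoke the pre-deployed optimization entrypoint inside the sandbox.
    subprocess.run(
        ["python", "run_gepa.py", "--config", "project.toml"],  # flag/filename assumed
        cwd=WORKSPACE,
        check=True,
    )

def register(kind: str, path: str) -> None:
    # Append an artifact record to the manifest as work completes.
    records = json.loads(ARTIFACTS.read_text()) if ARTIFACTS.exists() else []
    records.append({"kind": kind, "path": path})
    ARTIFACTS.write_text(json.dumps(records, indent=2))

run_optimization()
register("report_md", "reports/run_report.md")  # final report
register("result", "results/winner.json")       # optimized candidate
```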