# Benchmark Improvement

> Run a benchmark-focused Managed Research task with evidence and budget controls.

Use this cookbook when the worker should improve a benchmark result or reproduce a benchmark failure.

## Goal

Give the worker the benchmark command, budget, target metric, and review criteria. Require evidence for every claimed improvement.

## Launch

```python
# Assumes `client` is an already-configured Managed Research client.
run = client.research.runs.start(
    "Improve the benchmark result without changing evaluation configs. Run the benchmark command, compare against baseline, and report evidence.",
    host_kind="daytona",
    work_mode="directed_effort",
    providers=[{"provider": "openrouter"}],
    runbook="heavy",
    timebox_seconds=60 * 60,
)
```

## Prompt details to include

* benchmark command
* baseline metric and target metric
* allowed files and forbidden files
* maximum runtime or spend
* required report format
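
One way to keep these details from being dropped is to assemble the prompt from a small structured record before launching the run. The sketch below is illustrative only: `BenchmarkBrief` and `build_prompt` are hypothetical helpers, not part of the SDK, and the field names are assumptions that mirror the list above.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkBrief:
    """Hypothetical container for the prompt details listed above."""

    command: str
    baseline_metric: str
    target_metric: str
    allowed_files: list[str]
    forbidden_files: list[str]
    max_runtime_minutes: int
    report_format: str


def build_prompt(brief: BenchmarkBrief) -> str:
    """Render one line per prompt detail so none is omitted by accident."""
    lines = [
        "Improve the benchmark result without changing evaluation configs.",
        f"Benchmark command: {brief.command}",
        f"Baseline metric: {brief.baseline_metric}",
        f"Target metric: {brief.target_metric}",
        f"Allowed files: {', '.join(brief.allowed_files)}",
        f"Forbidden files: {', '.join(brief.forbidden_files)}",
        f"Maximum runtime: {brief.max_runtime_minutes} minutes",
        f"Required report format: {brief.report_format}",
        "Report evidence for every claimed improvement.",
    ]
    return "\n".join(lines)
```

The resulting string can be passed as the first argument to the launch call. Because every field is required, a missing detail fails at construction time rather than producing a vague prompt.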

## Expected evidence

* baseline and candidate metrics
* changed files
* command output
* artifact manifest
* final report with reproducibility notes
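
When reviewing a finished run, a small check like the following can flag missing evidence before anyone accepts a claimed improvement. This is a minimal sketch under stated assumptions: the report is taken to be a plain dictionary, and the key names and both helper functions are hypothetical, chosen to mirror the list above rather than any actual report schema.

```python
# Hypothetical key names mirroring the expected-evidence list above.
REQUIRED_EVIDENCE = (
    "baseline_metric",
    "candidate_metric",
    "changed_files",
    "command_output",
    "artifact_manifest",
    "final_report",
)


def missing_evidence(report: dict) -> list[str]:
    """Return required keys that are absent or empty in the report."""
    empty_values = (None, "", [], {})
    return [key for key in REQUIRED_EVIDENCE if report.get(key) in empty_values]


def claimed_improvement(report: dict, higher_is_better: bool = True) -> float:
    """Return the signed gain of the candidate metric over the baseline."""
    delta = report["candidate_metric"] - report["baseline_metric"]
    return delta if higher_is_better else -delta
```

A report passes review only if `missing_evidence` returns an empty list and `claimed_improvement` is positive for the metric's direction. A metric value of zero counts as present, since only absent or empty entries are flagged.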

## Failure notes

Use `runbook="heavy"` when the benchmark requires longer multi-actor work. Use `work_mode="directed_effort"` when the target metric and benchmark command are already known.
