# Eval Harness Improvement

> Use Managed Research to improve an eval harness and leave reviewable evidence.

Use this cookbook when the target is an eval harness, benchmark runner, or scoring workflow that needs reliability, clarity, or better failure evidence.

## Goal

Start a directed run that inspects the harness, makes the smallest high-impact improvement, runs the relevant check, and returns a report with artifacts.

## Python path

```python
# Assumes `client` is an already-initialized Managed Research SDK client.
run = client.research.runs.start(
    "Inspect the eval harness, fix the highest-leverage reliability issue, run the relevant check, and leave evidence.",
    host_kind="daytona",
    work_mode="directed_effort",
    providers=[{"provider": "openrouter"}],
    runbook="lite",
)
```

## MCP path

Ask your MCP client:

```text
Start a Managed Research run to improve the eval harness. Use directed_effort, daytona, openrouter, and runbook lite. Require a final report with the command run, failures found, patch summary, and artifacts.
```

## Expected evidence

* changed files or a PR
* command output or failure summary
* artifact manifest
* final report explaining what improved and what remains risky
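
When reviewing a finished run, it can help to check the returned report against the list above before reading it in detail. The following is a minimal sketch, assuming the report has already been loaded into a plain dictionary. The field names (`changed_files`, `pr_url`, `command_output`, `failure_summary`, `artifact_manifest`, `final_report`) are hypothetical and are not taken from the Managed Research API; adjust them to whatever your run actually returns.

```python
from typing import Any, Mapping

# Hypothetical field names: the real report schema may differ.
# Each evidence category is satisfied by any one of its listed fields.
REQUIRED_EVIDENCE = {
    "changed_files_or_pr": ("changed_files", "pr_url"),
    "command_output": ("command_output", "failure_summary"),
    "artifact_manifest": ("artifact_manifest",),
    "final_report": ("final_report",),
}


def missing_evidence(report: Mapping[str, Any]) -> list[str]:
    """Return the evidence categories that have no non-empty field in the report."""
    missing = []
    for category, fields in REQUIRED_EVIDENCE.items():
        # Empty strings, empty lists, and None all count as missing.
        if not any(report.get(field) for field in fields):
            missing.append(category)
    return missing


# Illustrative report with an empty artifact manifest.
example = {
    "changed_files": ["harness/run_eval.py"],
    "failure_summary": "2 of 40 cases timed out",
    "artifact_manifest": [],
    "final_report": "Fixed retry handling; flaky network cases remain.",
}
print(missing_evidence(example))
```

An empty result means every category has at least one populated field. It does not say anything about the quality of the evidence, which still needs a human review.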

## Failure notes

If the run cannot launch, preflight usually points to one of four causes: repo access, missing credentials, provider availability, or budget state.
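
If you capture the preflight message as text, a rough keyword triage can suggest which of these causes to check first. This is only a sketch under stated assumptions: the keyword lists are hypothetical guesses, not the actual wording of preflight output, and the function does not call any Managed Research API.

```python
# Hypothetical keyword map: the real preflight wording may differ,
# so treat matches as hints about where to look, not as a diagnosis.
PREFLIGHT_CAUSES = {
    "repo access": ("repo", "repository", "clone", "permission"),
    "missing credentials": ("credential", "api key", "token", "secret"),
    "provider availability": ("provider", "unavailable", "rate limit"),
    "budget state": ("budget", "quota", "credits"),
}


def triage_preflight(message: str) -> list[str]:
    """Return the likely causes whose keywords appear in the message."""
    text = message.lower()
    return [
        cause
        for cause, keywords in PREFLIGHT_CAUSES.items()
        if any(keyword in text for keyword in keywords)
    ]


print(triage_preflight("Missing API key for provider openrouter"))
```

A message can match more than one cause, and a message that matches none still needs to be read directly.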
