
synth agent

The agent command runs and manages Research Agent jobs for automated prompt optimization. Research Agents spawn sandboxed environments that analyze your code and apply MIPRO optimization to improve prompt performance.

Commands

agent run

Start a new Research Agent job.
synth agent run [options]
Options:
| Option | Description |
| --- | --- |
| --config, -c <path> | Path to TOML configuration file (recommended) |
| --repo, -r <url> | Repository URL (alternative to config file) |
| --branch, -b <branch> | Repository branch (default: main) |
| --task, -t <text> | Task description for the agent |
| --dataset, -d <id> | HuggingFace dataset ID (e.g., PolyAI/banking77) |
| --tool <tool> | Optimization tool: mipro (default: mipro) |
| --model, -m <model> | Agent model (default: gpt-5.1-codex-mini) |
| --reasoning-effort <level> | Reasoning effort: low, medium, high (default: medium) |
| --iterations, -n <n> | Number of optimization iterations (default: 10) |
| --max-agent-spend <usd> | Max agent LLM spend in USD (default: 25.0) |
| --max-synth-spend <usd> | Max optimization spend in USD (default: 150.0) |
| --poll/--no-poll | Wait for completion and stream events (default: --poll) |
| --timeout <seconds> | Timeout in seconds when polling (default: 3600) |
Examples:
# From config file (recommended)
synth agent run --config research.toml

# Quick job with CLI options
synth agent run \
    --repo https://github.com/your-org/repo \
    --task "Optimize intent classification accuracy" \
    --dataset PolyAI/banking77 \
    --iterations 10

# Start without polling
synth agent run --config research.toml --no-poll
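When jobs are launched from a script, it can help to assemble the argument list in one place. The following is a minimal sketch, not part of the Synth tooling: `build_run_command` is a hypothetical helper that only uses flags listed in the options table above and returns an argument list you could pass to `subprocess.run`.

```python
import shlex


def build_run_command(config=None, repo=None, task=None, dataset=None,
                      iterations=None, poll=True):
    """Assemble a `synth agent run` argument list (hypothetical helper)."""
    cmd = ["synth", "agent", "run"]
    if config is not None:
        cmd += ["--config", config]
    if repo is not None:
        cmd += ["--repo", repo]
    if task is not None:
        cmd += ["--task", task]
    if dataset is not None:
        cmd += ["--dataset", dataset]
    if iterations is not None:
        cmd += ["--iterations", str(iterations)]
    if not poll:
        # Polling is the documented default, so only the negative flag is emitted.
        cmd.append("--no-poll")
    return cmd


cmd = build_run_command(
    repo="https://github.com/your-org/repo",
    task="Optimize intent classification accuracy",
    dataset="PolyAI/banking77",
    iterations=10,
)
# shlex.join quotes arguments that contain spaces, such as the task text.
print(shlex.join(cmd))
```

Passing the list form to `subprocess.run` avoids shell quoting problems with multi-word task descriptions.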

agent status

Check the status of an existing job.
synth agent status <job_id>
Example:
synth agent status ra_abc123def456
Output:
Job: ra_abc123def456
Status: running
Progress: Iteration 3/10
Current Metric: 0.847
Elapsed: 12m 34s
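If a script needs to act on this output, the `Key: value` lines can be split into a dictionary. This is a sketch that assumes the status text looks exactly like the sample above; the real output format may differ, and `parse_status` is a hypothetical helper rather than part of the CLI.

```python
def parse_status(text):
    """Turn 'Key: value' lines into a dict with lowercase, underscore-separated keys."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            fields[key.strip().lower().replace(" ", "_")] = value.strip()
    return fields


# Sample copied from the documented output above.
sample = """Job: ra_abc123def456
Status: running
Progress: Iteration 3/10
Current Metric: 0.847
Elapsed: 12m 34s"""

status = parse_status(sample)
# "Iteration 3/10" -> completed and total iteration counts.
done, total = (int(n) for n in status["progress"].split()[-1].split("/"))
print(status["status"], done, total, float(status["current_metric"]))
```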

agent list

List recent Research Agent jobs.
synth agent list [options]
Options:
| Option | Description |
| --- | --- |
| --limit <n> | Number of jobs to show (default: 10) |
| --status <status> | Filter by status: queued, running, succeeded, failed |

agent events

Stream events from a Research Agent job.
synth agent events <job_id> [options]
Options:
| Option | Description |
| --- | --- |
| --since <n> | Show events after this sequence number (default: 0) |
| --follow, -f | Follow events in real-time |
Example:
# Show all events
synth agent events ra_abc123def456

# Follow events in real-time
synth agent events ra_abc123def456 --follow

agent results

Get results from a completed Research Agent job.
synth agent results <job_id> [options]
Options:
| Option | Description |
| --- | --- |
| --output, -o <path> | Write results to file (JSON) |
Example:
# Print results to stdout
synth agent results ra_abc123def456

# Save to file
synth agent results ra_abc123def456 -o results.json

agent cancel

Cancel a running job.
synth agent cancel <job_id>

Configuration File

Research Agent jobs are configured via TOML files. Here’s a complete example:
[research_agent]
# Repository to optimize (public GitHub URL)
repo_url = "https://github.com/your-org/your-pipeline"
repo_branch = "main"

# Agent configuration
model = "gpt-5.1-codex-mini"
reasoning_effort = "medium"  # low, medium, high

# Spend limits (USD)
max_agent_spend_usd = 25.0    # Agent LLM calls
max_synth_spend_usd = 150.0   # Optimization experiments

[research_agent.research]
# What to optimize
task_description = """
Optimize the prompt for Banking77 intent classification.

The Banking77 dataset contains customer banking queries that need to be
classified into 77 intent categories.

Goals:
1. Load dataset from /app/data/
2. Create evaluation pipeline
3. Use MIPRO to optimize the system prompt
4. Save best prompt to /app/artifacts/
"""

# Optimization settings
tools = ["mipro"]              # Optimization algorithm
primary_metric = "accuracy"    # Metric to optimize
num_iterations = 10            # Number of iterations

# Dataset source
[[research_agent.research.datasets]]
source_type = "huggingface"
hf_repo_id = "PolyAI/banking77"
hf_split = "train"

# MIPRO-specific settings
[research_agent.research.mipro_config]
meta_model = "llama-3.3-70b-versatile"
meta_provider = "groq"
num_trials = 15
proposer_effort = "MEDIUM"

Configuration Reference

[research_agent]

| Field | Type | Description |
| --- | --- | --- |
| repo_url | string | GitHub repository URL |
| repo_branch | string | Branch to use (default: "main") |
| model | string | Agent model (e.g., "gpt-5.1-codex-mini") |
| reasoning_effort | string | low, medium, or high |
| max_agent_spend_usd | float | Max agent inference spend (USD) |
| max_synth_spend_usd | float | Max optimization spend (USD) |

[research_agent.research]

| Field | Type | Description |
| --- | --- | --- |
| task_description | string | Detailed optimization instructions |
| tools | array | Optimization tools: ["mipro"] |
| primary_metric | string | Metric to optimize (default: "accuracy") |
| num_iterations | int | Number of optimization iterations |

[[research_agent.research.datasets]]

| Field | Type | Description |
| --- | --- | --- |
| source_type | string | "huggingface", "upload", or "inline" |
| hf_repo_id | string | HuggingFace dataset ID |
| hf_split | string | Dataset split (default: "train") |

[research_agent.research.mipro_config]

| Field | Type | Description |
| --- | --- | --- |
| meta_model | string | Model for generating proposals |
| meta_provider | string | Provider: "groq", "openai", "google" |
| num_trials | int | Number of optimization trials |
| proposer_effort | string | LOW_CONTEXT, LOW, MEDIUM, HIGH |

Environment Variables

| Variable | Description |
| --- | --- |
| SYNTH_API_KEY | Your Synth API key (required) |

Examples

Banking77 Intent Classification

# banking77.toml
[research_agent]
repo_url = "https://github.com/synth-labs/banking77-pipeline"
model = "gpt-5.1-codex-mini"
max_agent_spend_usd = 25.0

[research_agent.research]
task_description = "Optimize intent classification accuracy"
tools = ["mipro"]
primary_metric = "accuracy"
num_iterations = 10

[[research_agent.research.datasets]]
source_type = "huggingface"
hf_repo_id = "PolyAI/banking77"
Run:
export SYNTH_API_KEY="your-key"
synth agent run --config banking77.toml --poll

Quick Iteration (Iris Dataset)

# iris.toml - Fast test with small dataset
[research_agent]
model = "gpt-5.1-codex-mini"
max_agent_spend_usd = 10.0

[research_agent.research]
task_description = "Classify iris flowers into setosa, versicolor, virginica"
tools = ["mipro"]
num_iterations = 5

[[research_agent.research.datasets]]
source_type = "huggingface"
hf_repo_id = "scikit-learn/iris"

[research_agent.research.mipro_config]
num_trials = 8

CI/CD Integration

#!/bin/bash
# optimize.sh - Run in CI pipeline

set -e

export SYNTH_API_KEY="${SYNTH_API_KEY}"

# Run job with polling (blocks until complete)
synth agent run \
    --config research.toml \
    --poll \
    --timeout 2400

# On success, show the most recent succeeded job
synth agent list --status succeeded --limit 1
For non-blocking CI:
#!/bin/bash
# Start job without polling
OUTPUT=$(synth agent run --config research.toml --no-poll 2>&1)
# Extract the first job ID (-E is portable; -P requires GNU grep)
JOB_ID=$(echo "$OUTPUT" | grep -oE 'ra_[a-f0-9]+' | head -n 1)
echo "Started job: $JOB_ID"

# In a later step, check status
synth agent status "$JOB_ID"
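The same job ID extraction can be done in Python when the CI step is not a shell script. This sketch reuses the `ra_[a-f0-9]+` pattern from the script above; `extract_job_id` is a hypothetical helper, and the sample output line is invented for illustration, since the exact text printed by `synth agent run --no-poll` is not shown in this page.

```python
import re

# Same pattern the shell script greps for.
JOB_ID_PATTERN = re.compile(r"ra_[a-f0-9]+")


def extract_job_id(output):
    """Return the first job ID found in CLI output, or None if there is none."""
    match = JOB_ID_PATTERN.search(output)
    return match.group(0) if match else None


# Invented output line; only the ID format comes from the examples above.
job_id = extract_job_id("Job submitted: ra_abc123def456 (queued)")
print(job_id)
```

Returning `None` instead of an empty string makes it easier to fail the CI step explicitly when no ID was printed.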

Output

When using --poll, the CLI shows real-time progress:
Research Agent Job: ra_abc123def456
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

[00:00] Starting sandbox...
[00:15] Cloning repository...
[00:23] Analyzing codebase...
[01:02] Setting up evaluation pipeline...
[02:45] Running MIPRO optimization...
        Iteration 1/10 - accuracy: 0.723
        Iteration 2/10 - accuracy: 0.756
        Iteration 3/10 - accuracy: 0.801
        ...
        Iteration 10/10 - accuracy: 0.892
[18:34] Saving artifacts...
[18:45] Complete!

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Status: succeeded
Best Metric: 0.892 (↑ 23.4% from baseline)
Duration: 18m 45s

Artifacts:
  - optimized_prompt.txt
  - optimization_report.md
  - changes.diff
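The per-iteration lines in this log can be collected to track how the metric moves during a run. The sketch below assumes the line format shown in the sample above and uses hypothetical helper names. Treating the first iteration's score as the baseline is also an assumption; it reproduces the 23.4% figure in the sample, but this page does not state how the CLI defines the baseline.

```python
import re

# Matches lines such as "Iteration 3/10 - accuracy: 0.801".
ITERATION_LINE = re.compile(r"Iteration (\d+)/(\d+) - (\w+): ([0-9.]+)")


def parse_iterations(log_text):
    """Return (iteration, score) pairs for every iteration line in the log."""
    return [(int(m[1]), float(m[4])) for m in ITERATION_LINE.finditer(log_text)]


def relative_gain(first, best):
    """Relative improvement of `best` over `first`, in percent."""
    return (best - first) / first * 100.0


# Iteration lines copied from the sample output above.
log = """
[02:45] Running MIPRO optimization...
        Iteration 1/10 - accuracy: 0.723
        Iteration 2/10 - accuracy: 0.756
        Iteration 3/10 - accuracy: 0.801
        ...
        Iteration 10/10 - accuracy: 0.892
"""

scores = parse_iterations(log)
first = scores[0][1]
best = max(score for _, score in scores)
print(f"best={best} gain={relative_gain(first, best):.1f}%")
```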

See Also