synth_ai.cli.commands.eval.config

Eval command configuration loading and normalization. This module loads and resolves evaluation configuration from:
  • TOML config files (legacy eval format or prompt_learning format)
  • Command-line arguments (override config values)
  • Environment variables (for API keys, etc.)
Config File Formats:
  1. Legacy Eval Format:
    [eval]
    app_id = "banking77"
    url = "http://localhost:8103"
    env_name = "banking77"
    seeds = [0, 1, 2, 3, 4]
    
    [eval.policy_config]
    model = "gpt-4"
    provider = "openai"
    
  2. Prompt Learning Format:
    [prompt_learning]
    task_app_id = "banking77"
    task_app_url = "http://localhost:8103"
    
    [prompt_learning.gepa]
    env_name = "banking77"
    
    [prompt_learning.gepa.evaluation]
    seeds = [0, 1, 2, 3, 4]
    
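Both formats above describe the same underlying settings, just at different key paths. As a hypothetical sketch (the helper name and exact fallback behavior are assumptions, not the module's actual code), normalization might look like:

```python
# Sketch: map either config format onto a common set of fields.
# Key paths follow the two formats documented above; the function
# name `normalize_config` is illustrative, not part of the module.

def normalize_config(data: dict) -> dict:
    """Extract app id, URL, env name, and seeds from either format."""
    if "eval" in data:  # legacy eval format
        section = data["eval"]
        return {
            "app_id": section.get("app_id"),
            "task_app_url": section.get("url"),
            "env_name": section.get("env_name"),
            "seeds": section.get("seeds", []),
        }
    if "prompt_learning" in data:  # prompt learning format
        pl = data["prompt_learning"]
        gepa = pl.get("gepa", {})
        return {
            "app_id": pl.get("task_app_id"),
            "task_app_url": pl.get("task_app_url"),
            "env_name": gepa.get("env_name"),
            "seeds": gepa.get("evaluation", {}).get("seeds", []),
        }
    raise ValueError("no [eval] or [prompt_learning] section found")
```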
See Also:
  • synth_ai.cli.commands.eval.core.eval_command(): CLI entry point
  • synth_ai.cli.commands.eval.runner.run_eval(): Uses resolved config

Functions

load_eval_toml

load_eval_toml(path: Path) -> dict[str, Any]

resolve_eval_config

resolve_eval_config(config_path, cli_app_id, cli_model, cli_seeds, cli_url, cli_backend_url, cli_concurrency, seed_set, metadata) -> EvalRunConfig
Resolve evaluation configuration from multiple sources. Loads configuration from a TOML file (if provided) and merges it with CLI arguments; CLI arguments take precedence over config file values.
Config File Formats:
  • Legacy eval format: [eval] section
  • Prompt learning format: [prompt_learning] section
Precedence Order:
  1. CLI arguments (highest priority)
  2. Config file values
  3. Default values
Args:
  • config_path: Path to TOML config file (optional)
  • cli_app_id: App ID from CLI (overrides config)
  • cli_model: Model name from CLI (overrides config)
  • cli_seeds: Seeds list from CLI (overrides config)
  • cli_url: Task app URL from CLI (overrides config)
  • cli_backend_url: Backend URL from CLI (overrides config)
  • cli_concurrency: Concurrency from CLI (overrides config)
  • seed_set: Which seed pool to use ("seeds", "validation_seeds", "test_pool")
  • metadata: Metadata key-value pairs for filtering
Returns:
  • Resolved EvalRunConfig with all values merged.
Raises:
  • FileNotFoundError: If config file is specified but doesn’t exist.
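The precedence order above amounts to a first-non-None choice per field. A hedged sketch (the helper name `pick` is hypothetical, not part of the module):

```python
# Sketch of the precedence rule: CLI argument > config file value > default.

def pick(cli_value, config_value, default=None):
    """Return the first value that is not None, in precedence order."""
    if cli_value is not None:
        return cli_value
    if config_value is not None:
        return config_value
    return default
```

For example, a model passed on the command line would win over one set in the TOML file, while a field absent from both falls back to its default.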

Classes

EvalRunConfig

Configuration for evaluation runs. This dataclass holds all configuration needed to execute an evaluation against a task app. Values can come from TOML config files, CLI arguments, or environment variables.
Required Fields:
  • app_id: Task app identifier
  • task_app_url: URL of the running task app (or None to spawn locally)
  • seeds: List of seeds/indices to evaluate
Optional Fields:
  • env_name: Environment name (usually matches app_id)
  • policy_config: Model and provider configuration
  • backend_url: Backend URL for trace capture (enables backend mode)
  • concurrency: Number of parallel rollouts
  • return_trace: Whether to include traces in responses
Example:
config = EvalRunConfig(
    app_id="banking77",
    task_app_url="http://localhost:8103",
    backend_url="http://localhost:8000",
    env_name="banking77",
    seeds=[0, 1, 2, 3, 4],
    policy_config={"model": "gpt-4", "provider": "openai"},
    concurrency=5,
    return_trace=True,
)