Run an Eval Job From Your Terminal

This guide walks through creating an eval job directly from the Synth AI TUI.

Prerequisites

Install the Synth AI SDK in your Python environment:

pip install synth-ai
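
To confirm the package is visible in the environment you plan to launch the TUI from, a small standard-library check can help. This is only a sketch: `installed_version` is a hypothetical helper name, and the check reads package metadata rather than calling anything in the SDK itself.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report whether synth-ai is installed in the interpreter running this script.
version = installed_version("synth-ai")
print(version or "synth-ai is not installed in this environment")
```

If this prints the "not installed" message, you are probably in a different Python environment from the one where you ran pip.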

Launch the TUI

In the same environment where synth-ai is installed, run:

synth-ai

This opens the TUI where you can view existing jobs and create new ones.

Create a New Job

Press n to create a new job. You'll be prompted to select or create a Container file.

Container File

A Container is a FastAPI app that defines your evaluation task. It includes:

A dataset to evaluate against
A rollout handler that calls your LLM and scores responses
Task metadata for the Synth AI backend
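
The three pieces listed above can be sketched in plain Python. This is not the Synth AI Container API: every name here (`DATASET`, `TASK_INFO`, `stub_model`, `rollout`) is hypothetical, the FastAPI wiring is omitted, and a keyword rule stands in for the LLM call so the sketch runs without network access.

```python
# Illustrative only: none of these names come from the synth-ai SDK.

# 1. A dataset to evaluate against (toy intent-classification examples).
DATASET = [
    {"text": "I lost my card", "label": "card_lost"},
    {"text": "What is my balance?", "label": "balance"},
    {"text": "How do I close my account?", "label": "close_account"},
]

# 3. Task metadata describing the task (assumed shape).
TASK_INFO = {"name": "toy-intent-classification", "num_examples": len(DATASET)}


def stub_model(text: str) -> str:
    """Stand-in for an LLM call: a keyword rule instead of a real model."""
    lowered = text.lower()
    if "card" in lowered:
        return "card_lost"
    if "balance" in lowered:
        return "balance"
    return "other"


# 2. A rollout handler: run the model on one example and score the response.
def rollout(index: int, model=stub_model) -> dict:
    example = DATASET[index]
    prediction = model(example["text"])
    reward = 1.0 if prediction == example["label"] else 0.0
    return {"index": index, "prediction": prediction, "reward": reward}


for i in range(len(DATASET)):
    print(rollout(i))
```

In a real Container these pieces sit behind FastAPI routes; the downloadable Banking77 example is the reference for the actual structure.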

You can use an existing Container file or create a new one. For a working example, download the Banking77 classification task, either through cURL or from GitHub directly:

curl -L -o container.py \
  https://raw.githubusercontent.com/synth-laboratories/cookbooks/main/code/demos/banking77/banking77_container.py

If you are creating a new file, choose where to save it.

You can optionally open the file in your editor to review or customize it.

Select Job Type

After selecting your Container file, choose the job type. Select eval to run an eval job.

Job Execution

When you start the job, the TUI:

Loads and validates your Container module
Starts a local FastAPI server
Creates a Cloudflare Tunnel so Synth AI can reach your machine
Submits the eval job to Synth AI
Runs rollouts against your dataset and aggregates scores
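
The first step above, loading and validating a module from a file path, might look roughly like the following. This is a sketch using only the standard library, not the TUI's actual implementation: `load_container` is a hypothetical helper, and the assumption that a Container exposes a module-level `app` attribute is a common FastAPI convention rather than something this guide states.

```python
import importlib.util
from pathlib import Path


def load_container(path, required_attr="app"):
    """Import a Python file as a module and check it exposes `required_attr`.

    `required_attr="app"` is an assumption for illustration only.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"no such file: {path}")

    # Build an import spec from the file location, then execute the module.
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load a module from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    # Minimal validation: fail early if the expected attribute is missing.
    if not hasattr(module, required_attr):
        raise AttributeError(f"{path.name} does not define '{required_attr}'")
    return module
```

Failing at this stage, before a server or tunnel starts, keeps errors in the Container file easy to spot.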

View Results

The TUI displays the job status and results as the eval runs. The eval details show:

Progress: Completed rollouts out of total
Mean Reward: Average score across all rollouts
Avg Reward: Running average
Pass Rate: Percentage of rollouts that passed
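
As a rough illustration of how figures like these can be derived from per-rollout rewards, consider the sketch below. It is an assumption-laden example, not the TUI's code: `summarize` and `running_average` are hypothetical names, and treating a rollout as "passed" when its reward reaches `pass_threshold` is a guess, since this guide does not define passing.

```python
def running_average(rewards):
    """Return the average after each completed rollout, in order."""
    averages, total = [], 0.0
    for count, reward in enumerate(rewards, start=1):
        total += reward
        averages.append(total / count)
    return averages


def summarize(rewards, total, pass_threshold=1.0):
    """Summarize completed rollouts; `pass_threshold` is an assumed cutoff."""
    completed = len(rewards)
    mean_reward = sum(rewards) / completed if completed else 0.0
    passed = sum(1 for r in rewards if r >= pass_threshold)
    pass_rate = 100.0 * passed / completed if completed else 0.0
    return {
        "progress": f"{completed}/{total}",
        "mean_reward": mean_reward,
        "pass_rate": pass_rate,
    }


rewards = [1.0, 0.0, 1.0, 1.0]  # four completed rollouts out of ten
print(summarize(rewards, total=10))
print(running_average(rewards))
```

Under this reading, the last running average always equals the mean over the completed rollouts, so the two values differ only while the job is still in progress.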
