Judging Workflows

Workflows improves graphs by scoring candidate outputs against your dataset. Set the scoring mode with judge_config.mode:

{ "judge_config": { "mode": "rubric" | "contrastive" | "gold_examples" } }
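
As a quick illustration of the setting above, the following sketch reads the mode from a config dict and rejects unknown values. The helper name resolve_mode is hypothetical and not part of Workflows, and the fallback to rubric when the mode is omitted is an assumption based on rubric being described as the default.

```python
import json

# The three mode names come from the text above; everything else is illustrative.
VALID_MODES = ("rubric", "contrastive", "gold_examples")
DEFAULT_MODE = "rubric"  # assumption: an omitted mode falls back to rubric


def resolve_mode(config: dict) -> str:
    """Return the judge mode named in a config dict, or raise on an unknown value."""
    mode = config.get("judge_config", {}).get("mode", DEFAULT_MODE)
    if mode not in VALID_MODES:
        raise ValueError(f"unknown judge mode {mode!r}; expected one of {VALID_MODES}")
    return mode


config = json.loads('{"judge_config": {"mode": "contrastive"}}')
print(resolve_mode(config))
```

Checking the value up front surfaces a misspelled mode such as "contrast" before any scoring run starts.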

Rubric mode (default)

Use rubric when you can describe quality with criteria.

Add criteria per‑task via tasks[].rubric and/or globally via default_rubric.
Criteria arrays are merged: task criteria first, then defaults.
Best for classification, extraction, and structured outputs.
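
The merge order described above can be sketched as a plain list concatenation. This is a minimal illustration, not the Workflows implementation: merge_rubric is a hypothetical helper, the criteria are written as plain strings although the real criterion schema may differ, and the text does not say whether duplicates are removed, so this sketch keeps them.

```python
def merge_rubric(task_rubric=None, default_rubric=None):
    """Concatenate criteria: task-level first, then global defaults."""
    return list(task_rubric or []) + list(default_rubric or [])


# Hypothetical criteria as plain strings; the real criterion schema may differ.
default_rubric = ["Output is valid JSON"]
task = {"rubric": ["Extracts the invoice total", "Uses ISO 8601 dates"]}

merged = merge_rubric(task.get("rubric"), default_rubric)
for criterion in merged:
    print(criterion)
```

A task with no rubric of its own ends up with only the defaults, and a config with no default_rubric leaves the task criteria unchanged.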

Contrastive mode

Use contrastive for open‑ended generation where “good” is about style or feel.

Provide gold outputs that represent the target distribution.
The judge compares each candidate to the gold examples and scores how closely it matches them.
Best for writing style, creative text, image/video generation, and other subjective outputs.

Examples:

Text style matching: outlines in, reference essays out. See cookbooks/workflows/style-matching.
Image style matching: prompts in, reference images out (base64 data URLs) with a VLM‑capable judge. See cookbooks/workflows/image-style-matching.
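
For the image case, a reference image has to be turned into a base64 data URL before it can be stored as a gold output. The sketch below shows one way to do that with the Python standard library. The row keys "input" and "gold_output" are placeholders chosen for illustration; check the cookbook for the dataset fields Workflows actually expects.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Stand-in bytes so the sketch runs without a file. In practice, read the
# reference image, e.g. pathlib.Path("reference.png").read_bytes().
png_signature = b"\x89PNG\r\n\x1a\n"

# Hypothetical dataset row: the key names are placeholders, not a documented schema.
row = {
    "input": "a lighthouse at dusk, watercolor",
    "gold_output": to_data_url(png_signature),
}
print(row["gold_output"])
```

The MIME type in the URL should match the actual image format (for example image/jpeg for JPEG files), since the judge model relies on it to decode the payload.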

Gold examples mode

Use gold_examples when gold outputs should be shown as reference context rather than used for direct comparison.

Gold outputs are passed to the judge as few‑shot references.
Helpful when you want “match these patterns” but still rely on rubric‑like scoring.
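
To make the idea of reference context concrete, here is a rough sketch of how gold outputs could be placed ahead of the criteria and the candidate in a judge prompt. The function build_judge_context and the prompt layout are invented for illustration; the source does not describe the prompt Workflows actually sends to the judge.

```python
def build_judge_context(gold_examples, candidate, criteria):
    """Assemble judge prompt text: gold outputs as references, then criteria, then the candidate."""
    lines = ["Reference examples (for context, not direct comparison):"]
    for i, example in enumerate(gold_examples, start=1):
        lines.append(f"Example {i}:")
        lines.append(example)
    lines.append("Criteria:")
    lines.extend(f"- {c}" for c in criteria)
    lines.append("Candidate output:")
    lines.append(candidate)
    return "\n".join(lines)


# Hypothetical data; none of these strings come from a real dataset.
prompt = build_judge_context(
    gold_examples=["Refund issued within 5 days.", "Order ships Monday."],
    candidate="Your refund will arrive in about a week.",
    criteria=["Matches the concise tone of the references"],
)
print(prompt)
```

In this layout the gold outputs only inform the judge, and the score still depends on the listed criteria, which is the difference from contrastive mode.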