Workflows improves graphs by scoring candidate outputs against your dataset. You choose the scoring mode in:
{ "judge_config": { "mode": "rubric" | "contrastive" | "gold_examples" } }
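As a client-side convenience, you might build this fragment in code and reject typos before submitting. The sketch below assumes only what is shown above: the three mode strings, with `rubric` as the default. `make_judge_config` is a hypothetical helper, not part of the Workflows API.

```python
import json

# The three modes listed above; "rubric" is the default when none is given.
VALID_MODES = ("rubric", "contrastive", "gold_examples")

def make_judge_config(mode: str = "rubric") -> dict:
    """Build a judge_config fragment, rejecting unknown modes early (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown judge mode: {mode!r}; expected one of {VALID_MODES}")
    return {"judge_config": {"mode": mode}}

print(json.dumps(make_judge_config("contrastive")))
# {"judge_config": {"mode": "contrastive"}}
```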

Rubric mode (default)

Use rubric when you can describe quality as a set of explicit criteria.
  • Add criteria per‑task via tasks[].rubric and/or globally via default_rubric.
  • Criteria arrays are merged: task criteria first, then defaults.
  • Best for classification, extraction, and structured outputs.
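The merge order described above can be sketched as follows. This assumes each rubric is an object holding a `criteria` array (`{"criteria": [...]}`); that shape, the example dataset fields, and the `merged_criteria` helper are hypothetical, for illustration only. The docs do not say whether duplicates are removed, so this sketch keeps them.

```python
from typing import Optional

def merged_criteria(task: dict, default_rubric: Optional[dict]) -> list:
    """Return task criteria first, then default criteria (no deduplication)."""
    task_criteria = (task.get("rubric") or {}).get("criteria", [])
    default_criteria = (default_rubric or {}).get("criteria", [])
    return list(task_criteria) + list(default_criteria)

# Hypothetical dataset shape, for illustration only.
dataset = {
    "default_rubric": {"criteria": ["Output is valid JSON"]},
    "tasks": [
        {"input": "Refund not received", "rubric": {"criteria": ["Label matches the ticket category"]}},
        {"input": "Cannot log in"},  # no task rubric: only the defaults apply
    ],
}

for task in dataset["tasks"]:
    print(merged_criteria(task, dataset["default_rubric"]))
# ['Label matches the ticket category', 'Output is valid JSON']
# ['Output is valid JSON']
```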

Contrastive mode

Use contrastive for open‑ended generation where “good” is about style or feel.
  • Provide gold outputs that represent the target distribution.
  • The judge compares each candidate to the gold examples and scores how closely it matches them.
  • Best for writing style, creative text, image/video generation, and other subjective outputs.
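To make "scores closeness" concrete, here is a toy stand-in. The actual judge is model-based; this sketch swaps in word-set Jaccard overlap purely to show the shape of the computation: score each candidate against every gold example and keep the best match. All names here are hypothetical.

```python
def tokens(text: str) -> set:
    """Lowercased word set; a crude stand-in for what a model judge perceives."""
    return set(text.lower().split())

def jaccard(a: set, b: set) -> float:
    """Overlap of two sets in [0, 1]."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def contrastive_score(candidate: str, golds: list) -> float:
    """Closeness of the candidate to its nearest gold example (0.0 if no golds)."""
    cand = tokens(candidate)
    return max((jaccard(cand, tokens(g)) for g in golds), default=0.0)

golds = ["the quiet harbor at dawn", "fog rolls over the quiet harbor"]
print(round(contrastive_score("the quiet harbor at night", golds), 2))  # 0.67
print(contrastive_score("quarterly revenue grew", golds))               # 0.0
```

Taking the maximum over gold examples rewards matching any part of the target distribution; averaging instead would reward candidates near its center. Either is a plausible choice for a sketch like this.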
Examples:
  • Text style matching: outlines in, reference essays out. See cookbooks/workflows/style-matching.
  • Image style matching: prompts in, reference images out (base64 data URLs) with a VLM‑capable judge. See cookbooks/workflows/image-style-matching.
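For image style matching, reference images travel as base64 data URLs (`data:<mime>;base64,<payload>`). A minimal sketch for producing them with the Python standard library follows. The row field names `input` and `gold_output` are hypothetical; check the cookbook for the actual dataset schema.

```python
import base64
import mimetypes
from pathlib import Path

def to_data_url(data: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL (data:<mime>;base64,<payload>)."""
    return f"data:{mime};base64," + base64.b64encode(data).decode("ascii")

def file_to_data_url(path: str) -> str:
    """Read an image file and guess its MIME type from the extension."""
    mime, _ = mimetypes.guess_type(path)
    return to_data_url(Path(path).read_bytes(), mime or "application/octet-stream")

# Hypothetical row shape: prompt in, reference image out.
# In practice: "gold_output": file_to_data_url("references/lighthouse.png")
row = {
    "input": "a lighthouse in a storm, ink wash",
    "gold_output": to_data_url(b"\x89PNG"),  # truncated bytes, for illustration only
}
print(row["gold_output"])
# data:image/png;base64,iVBORw==
```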

Gold examples mode

Use gold_examples when gold outputs should be shown as reference context rather than used for direct comparison.
  • Gold outputs are included as few‑shot references to the judge.
  • Helpful when you want outputs to “match these patterns” while still relying on rubric‑like scoring.
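In this mode, gold outputs are context for the judge rather than targets to score distance against. One way to picture that is a prompt that lists the references as few-shot examples and then asks for a criteria-based score. The layout below and the 0 to 10 scale are assumptions for illustration, not the actual Workflows judge prompt; `build_judge_prompt` is hypothetical.

```python
def build_judge_prompt(candidate: str, golds: list, criteria: list) -> str:
    """Assemble a judge prompt: gold outputs as few-shot references, then criteria."""
    examples = "\n\n".join(f"Example {i}:\n{g}" for i, g in enumerate(golds, start=1))
    checklist = "\n".join(f"- {c}" for c in criteria)
    return (
        "Reference outputs (patterns to match, not answers to copy):\n\n"
        f"{examples}\n\n"
        "Score the candidate from 0 to 10 against these criteria:\n"
        f"{checklist}\n\n"
        f"Candidate:\n{candidate}"
    )

golds = [
    "Summary: shipped v2.\nRisks: none.",
    "Summary: fixed login bug.\nRisks: migration pending.",
]
criteria = ["Uses the Summary/Risks layout", "Under 40 words"]
print(build_judge_prompt("Summary: paused rollout.\nRisks: latency.", golds, criteria))
```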