Overview

References:
  • GEPA: Agrawal et al. (2025). “GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning.” arXiv:2507.19457
  • MIPRO: Opsahl-Ong et al. (2024). “Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs.” arXiv:2406.11695
System specifications (specs) are structured JSON documents that define task principles, rules, policies, and constraints for prompt optimization. Both GEPA and MIPRO can use specs to guide prompt generation with domain-specific knowledge.

What Are Specs?

Specs are JSON files that encode:
  • Principles: High-level guidelines for the task
  • Rules: Specific policies with priorities (0-10)
  • Constraints: Must/must-not/should directives
  • Examples: Good and bad examples for each rule
  • Glossary: Domain-specific terminology
  • Interfaces: Input/output formats and capabilities

Example Spec Structure

{
  "metadata": {
    "id": "spec.banking77_pipeline.v1",
    "title": "Banking77 Two-Stage Classification Pipeline Specification",
    "version": "1.0.0",
    "scope": "banking-intent-classification-pipeline"
  },
  "principles": [
    {
      "id": "P-clarity",
      "text": "Prioritize immediate-action intents over informational queries when multiple interpretations are possible.",
      "rationale": "Customers with urgent issues need immediate assistance."
    }
  ],
  "rules": [
    {
      "id": "R-card-disambiguation",
      "title": "Disambiguate card arrival vs. card payment issues",
      "priority": 10,
      "constraints": {
        "must": [
          "Classify as 'declined_card_payment' if payment-related keywords are present",
          "Classify as 'card_arrival' if delivery-related keywords are present"
        ],
        "must_not": [
          "Assume 'card' always refers to physical delivery"
        ]
      },
      "examples": [
        {
          "kind": "good",
          "prompt": "My card was declined at the store",
          "response": "declined_card_payment",
          "description": "Payment keyword indicates payment issue"
        }
      ]
    }
  ],
  "glossary": [
    {
      "term": "disambiguate",
      "definition": "To distinguish between multiple plausible interpretations",
      "aliases": ["clarify", "distinguish"]
    }
  ]
}
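The structure above can be consumed programmatically. Below is a minimal Python sketch of loading a spec and filtering its rules by priority; the helper names (`load_spec`, `filter_rules`) are illustrative, not part of the GEPA or MIPRO API.

```python
import json

# Toy spec mirroring the structure shown above (abbreviated).
SPEC_JSON = """
{
  "metadata": {"id": "spec.banking77_pipeline.v1", "version": "1.0.0"},
  "rules": [
    {"id": "R-card-disambiguation", "priority": 10},
    {"id": "R-stage-coordination", "priority": 8},
    {"id": "R-minor-style", "priority": 5}
  ]
}
"""

def load_spec(text: str) -> dict:
    """Parse a spec document from JSON text."""
    return json.loads(text)

def filter_rules(spec: dict, priority_threshold: int) -> list:
    """Keep only rules whose priority meets the threshold."""
    return [r for r in spec.get("rules", [])
            if r.get("priority", 0) >= priority_threshold]

spec = load_spec(SPEC_JSON)
high_priority = filter_rules(spec, priority_threshold=8)
```

With a threshold of 8, only the two higher-priority rules survive; this is the same filtering that `spec_priority_threshold` controls in the configs below.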

How Specs Are Used

GEPA: Spec-Guided Mutations

When GEPA uses proposer_type = "spec", the spec is included in mutation prompts:
[prompt_learning.gepa]
proposer_type = "spec"  # Use spec mode
spec_path = "examples/task_apps/banking77_pipeline/banking77_pipeline_spec.json"
spec_max_tokens = 5000
spec_include_examples = true
spec_priority_threshold = 8  # Only include high-priority rules (8+)
How it works:
  1. Spec Loading: GEPA loads the spec JSON file at initialization
  2. Context Serialization: Spec is converted to compact markdown format (up to spec_max_tokens)
  3. Mutation Prompts: Spec context is injected into LLM-guided mutation prompts
  4. Rule Filtering: Only rules with priority >= spec_priority_threshold are included
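Steps 2-4 above can be sketched as a single serialization pass. This is an assumed rendering for illustration, not the exact serializer GEPA ships with:

```python
def spec_to_context(spec: dict, priority_threshold: int = 8,
                    include_examples: bool = True) -> str:
    """Serialize principles and high-priority rules to compact markdown."""
    lines = []
    for p in spec.get("principles", []):
        lines.append(f"- Principle {p['id']}: {p['text']}")
    for r in spec.get("rules", []):
        if r.get("priority", 0) < priority_threshold:
            continue  # step 4: rule filtering
        lines.append(f"- Rule {r['id']} (priority {r['priority']}): {r['title']}")
        for item in r.get("constraints", {}).get("must", []):
            lines.append(f"  - MUST: {item}")
        if include_examples:
            for ex in r.get("examples", []):
                lines.append(f"  - {ex['kind']}: {ex['prompt']!r} -> {ex['response']}")
    return "\n".join(lines)

demo_spec = {
    "principles": [{"id": "P-clarity", "text": "Prioritize immediate-action intents."}],
    "rules": [
        {"id": "R-card", "priority": 10, "title": "Disambiguate card issues",
         "constraints": {"must": ["Check for payment keywords"]},
         "examples": [{"kind": "good", "prompt": "My card was declined",
                       "response": "declined_card_payment"}]},
        {"id": "R-low", "priority": 5, "title": "Minor style rule"},
    ],
}
context = spec_to_context(demo_spec)
```

The resulting `context` string is what gets injected as `{spec_context}` in the mutation prompt shown next.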
Mutation Prompt Structure:
You are a prompt engineering expert. Improve the instruction text (DSPy-style).

Requirements:
- Preserve placeholders (e.g., {{query}}) and tool names
- Be precise, action-oriented, and unambiguous
- Keep guidance concise; avoid fluff

Current instruction:
{classifier_instruction}

Feedback (hints to address):
{feedback_text}

## System Specification
(Task principles, rules, and policies from spec document)
{spec_context}

Output: 1-3 bullet snippets (1-2 sentences each) that replace/augment the instruction.

MIPRO: Spec-Enhanced Meta-Prompts

MIPRO includes spec context in meta-prompts for instruction generation:
[prompt_learning.mipro]
spec_path = "examples/task_apps/banking77_pipeline/banking77_pipeline_spec.json"
spec_max_tokens = 5000
spec_include_examples = true
spec_priority_threshold = 8
How it works:
  1. Spec Loading: MIPRO loads the spec during initialization
  2. Compact Context: Spec is serialized to compact format (respects spec_max_tokens)
  3. Meta-Prompt Injection: Spec context is added to meta-LLM prompts
  4. Instruction Generation: Meta-LLM uses spec rules/principles to generate better instructions
Meta-Prompt Structure:
You are optimising the instruction for a specific stage within a multi-stage language model pipeline.

## Pipeline Overview
{pipeline_overview}

## Focus Stage
Stage ID: classifier
Module ID: classifier

## Baseline Stage Instruction
{baseline_instruction}

## High-Scoring Demonstrations
{few_shot_examples}

## Reference Examples
{reference_corpus}

## System Specification
(Task principles, rules, and policies from spec document)
{spec_context}

## Instructions
Return JSON only using the schema {"instruction": "...", "demo_indices": [...], "rationale": "..."}

Configuration Parameters

spec_path (Required)

Path to the spec JSON file (relative to config file or absolute).
spec_path = "examples/task_apps/banking77_pipeline/banking77_pipeline_spec.json"

spec_max_tokens (Default: 5000)

Maximum tokens for spec context in prompts. The serializer will:
  1. Start with high-priority rules (priority >= 7)
  2. Remove examples if still too long
  3. Remove glossary if still too long
  4. Increase priority threshold if still too long
spec_max_tokens = 5000  # Default
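The shrink-to-fit behaviour above can be sketched as follows. The `render` callback and the word-count token estimate are stand-ins for the real serializer and tokenizer:

```python
def fits(text: str, max_tokens: int) -> bool:
    # Crude token estimate: one token per whitespace-separated word.
    return len(text.split()) <= max_tokens

def serialize_within_budget(spec: dict, max_tokens: int, render) -> str:
    """Progressively trim spec context until it fits the token budget."""
    threshold, examples, glossary = 7, True, True   # step 1: start broad
    text = render(spec, threshold, examples, glossary)
    if not fits(text, max_tokens):                  # step 2: drop examples
        examples = False
        text = render(spec, threshold, examples, glossary)
    if not fits(text, max_tokens):                  # step 3: drop glossary
        glossary = False
        text = render(spec, threshold, examples, glossary)
    while not fits(text, max_tokens) and threshold < 10:
        threshold += 1                              # step 4: raise threshold
        text = render(spec, threshold, examples, glossary)
    return text

# Toy renderer: one word per rule at or above the threshold.
def toy_render(spec, threshold, examples, glossary):
    words = [r["id"] for r in spec["rules"] if r["priority"] >= threshold]
    if examples:
        words += ["example"] * 5
    if glossary:
        words += ["glossary"] * 5
    return " ".join(words)

demo = {"rules": [{"id": f"R{i}", "priority": p}
                  for i, p in enumerate([10, 9, 8, 7])]}
small = serialize_within_budget(demo, max_tokens=3, render=toy_render)
```

Here the budget of 3 forces all four fallbacks to fire: examples and glossary are dropped, then the threshold rises to 8 and only the three highest-priority rules remain.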

spec_include_examples (Default: true)

Whether to include rule examples in the spec context.
spec_include_examples = true  # Include good/bad examples

spec_priority_threshold (Optional)

Only include rules with priority >= threshold. Higher threshold = fewer but more important rules.
spec_priority_threshold = 8  # Only include priority 8+ rules
Priority Guidelines:
  • 10: Critical rules (must always be followed)
  • 9: High-priority rules (important for accuracy)
  • 8: Medium-high priority (recommended)
  • 7: Medium priority (helpful)
  • <7: Lower priority (may be filtered out)

Spec Format Details

Principles

High-level guidelines that apply across all rules:
{
  "id": "P-clarity",
  "text": "Prioritize immediate-action intents over informational queries",
  "rationale": "Customers with urgent issues need immediate assistance."
}

Rules

Specific policies with priorities and constraints:
{
  "id": "R-card-disambiguation",
  "title": "Disambiguate card arrival vs. card payment issues",
  "priority": 10,
  "rationale": "Queries mentioning 'card' can refer to physical delivery or payment problems.",
  "constraints": {
    "must": [
      "Classify as 'declined_card_payment' if payment-related keywords are present"
    ],
    "must_not": [
      "Assume 'card' always refers to physical delivery"
    ],
    "should": [
      "Consider the query_analyzer's complexity assessment"
    ]
  },
  "examples": [
    {
      "kind": "good",
      "prompt": "My card was declined",
      "response": "declined_card_payment",
      "description": "Payment keyword indicates payment issue"
    },
    {
      "kind": "bad",
      "prompt": "My card isn't working",
      "response": "card_arrival",
      "description": "WRONG: Should be card_not_working"
    }
  ]
}

Constraints Types

  • must: Required behaviors (always enforced)
  • must_not: Prohibited behaviors (never allowed)
  • should: Recommended behaviors (preferred when possible)
  • should_not: Discouraged behaviors (avoid when possible)
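When the constraint types above are injected into a prompt, they are typically rendered as explicit directives. A minimal sketch, with the label mapping as an assumption:

```python
# Ordered from strongest to weakest, matching the list above.
CONSTRAINT_LABELS = [
    ("must", "MUST"),
    ("must_not", "MUST NOT"),
    ("should", "SHOULD"),
    ("should_not", "SHOULD NOT"),
]

def render_constraints(constraints: dict) -> list:
    """Turn a rule's constraints block into prompt-ready directives."""
    out = []
    for key, label in CONSTRAINT_LABELS:
        for item in constraints.get(key, []):
            out.append(f"{label}: {item}")
    return out

lines = render_constraints({
    "must": ["Classify payment keywords as payment issues"],
    "must_not": ["Assume 'card' always means physical delivery"],
})
```

Uppercasing the modal verbs mirrors RFC-style requirement language, which keeps the relative strength of each directive obvious to the optimizer LLM.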

Benefits of Using Specs

1. Domain Knowledge Injection

Specs encode expert knowledge about the task:
  • Edge cases and disambiguation rules
  • Domain-specific terminology
  • Priority-based policies

2. Constraint-Aware Optimization

Optimizers respect spec constraints:
  • GEPA: Mutations follow spec rules (must/must_not)
  • MIPRO: Instruction proposals align with spec principles

3. Faster Convergence

Spec-guided optimization typically:
  • Converges faster (fewer generations/iterations)
  • Produces more accurate prompts
  • Handles edge cases better

4. Consistency

Specs ensure:
  • Consistent terminology across prompts
  • Alignment with domain requirements
  • Compliance with business rules

Example: Banking77 Pipeline Spec

Location: examples/task_apps/banking77_pipeline/banking77_pipeline_spec.json

Key Rules:
  • R-card-disambiguation (Priority 10): Distinguish card delivery vs. payment issues
  • R-urgency-signals (Priority 10): Handle urgent queries (lost cards, fraud)
  • R-balance-transfer (Priority 9): Disambiguate balance update scenarios
  • R-stage-coordination (Priority 8): Coordinate between analyzer and classifier stages
Usage in Config:
[prompt_learning.gepa]
proposer_type = "spec"
spec_path = "examples/task_apps/banking77_pipeline/banking77_pipeline_spec.json"
spec_max_tokens = 5000
spec_include_examples = true
spec_priority_threshold = 8

When to Use Specs

Use specs when:
  • ✅ You have domain expertise to encode
  • ✅ Task has complex edge cases or disambiguation rules
  • ✅ You want faster convergence
  • ✅ Consistency with business rules is critical
  • ✅ Multi-stage pipelines need coordination rules
Skip specs when:
  • ❌ Task is simple and straightforward
  • ❌ No domain-specific rules or constraints
  • ❌ You want maximum exploration (specs may constrain search)

Creating a Spec

Step 1: Define Principles

Start with high-level guidelines:
{
  "principles": [
    {
      "id": "P-clarity",
      "text": "Prioritize immediate-action intents over informational queries",
      "rationale": "Urgent issues need immediate assistance."
    }
  ]
}

Step 2: Add Rules

Define specific policies with priorities:
{
  "rules": [
    {
      "id": "R-card-disambiguation",
      "title": "Disambiguate card arrival vs. card payment issues",
      "priority": 10,
      "constraints": {
        "must": [
          "Classify as 'declined_card_payment' if payment keywords present"
        ]
      },
      "examples": [
        {
          "kind": "good",
          "prompt": "My card was declined",
          "response": "declined_card_payment"
        }
      ]
    }
  ]
}

Step 3: Add Glossary

Define domain-specific terms:
{
  "glossary": [
    {
      "term": "disambiguate",
      "definition": "To distinguish between multiple plausible interpretations",
      "aliases": ["clarify", "distinguish"]
    }
  ]
}
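Before wiring a hand-written spec into a config, a quick sanity check catches typos in ids, priorities, and glossary entries. This is an illustrative validator covering the fields used in steps 1-3, not a published schema:

```python
def validate_spec(spec: dict) -> list:
    """Return human-readable problems; an empty list means the spec looks sane."""
    errors = []
    for p in spec.get("principles", []):
        if not str(p.get("id", "")).startswith("P-"):
            errors.append(f"principle id should start with 'P-': {p.get('id')}")
        if not p.get("text"):
            errors.append(f"principle {p.get('id')} is missing 'text'")
    for r in spec.get("rules", []):
        if not str(r.get("id", "")).startswith("R-"):
            errors.append(f"rule id should start with 'R-': {r.get('id')}")
        if not 0 <= r.get("priority", -1) <= 10:
            errors.append(f"rule {r.get('id')} needs a priority in 0-10")
    for g in spec.get("glossary", []):
        if "term" not in g or "definition" not in g:
            errors.append("glossary entries need 'term' and 'definition'")
    return errors

ok = validate_spec({
    "principles": [{"id": "P-clarity", "text": "Prioritize urgent intents"}],
    "rules": [{"id": "R-card", "priority": 10}],
    "glossary": [{"term": "disambiguate", "definition": "distinguish readings"}],
})
bad = validate_spec({"rules": [{"id": "card", "priority": 42}]})
```

The `P-`/`R-` id prefixes follow the naming convention used throughout the examples in this document.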

Step 4: Reference in Config

Point to the spec file:
[prompt_learning.gepa]
proposer_type = "spec"
spec_path = "path/to/your/spec.json"
spec_max_tokens = 5000
spec_priority_threshold = 8

Best Practices

  1. Start with High-Priority Rules: Focus on critical constraints first (priority 8+)
  2. Include Examples: Good and bad examples help the optimizer understand intent
  3. Use Clear Constraints: Be specific with must/must_not directives
  4. Test Token Limits: Ensure spec_max_tokens fits in your model’s context window
  5. Filter by Priority: Use spec_priority_threshold to focus on important rules
  6. Update Regularly: Keep specs in sync with task requirements
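For best practice 4, a quick pre-flight check helps. The 4-characters-per-token heuristic below is a rough assumption; use your model's actual tokenizer when precision matters:

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: roughly 4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_budget(spec_context: str, spec_max_tokens: int = 5000) -> bool:
    """Check that the serialized spec context fits within spec_max_tokens."""
    return estimate_tokens(spec_context) <= spec_max_tokens
```

If the check fails, either lower `spec_max_tokens` and let the serializer trim the spec, or raise `spec_priority_threshold` so fewer rules are serialized in the first place.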

Comparison: DSPy vs Spec Mode

Aspect      | DSPy Mode                             | Spec Mode
------------|---------------------------------------|--------------------------------------
Guidance    | Generic prompt engineering principles | Domain-specific rules and constraints
Convergence | Slower (broader exploration)          | Faster (focused search)
Accuracy    | Good for general tasks                | Better for domain-specific tasks
Setup       | No additional files                   | Requires spec JSON file
Best For    | Simple tasks, exploration             | Complex tasks, edge cases

Next Steps