Graph Inference

After training a graph with ADAS or Graph GEPA, you can run production inference through the Synth API. The graph executes server-side using your optimized prompts and structure.

API Endpoint

POST /api/adas/graph/completions

Request

{
  "job_id": "adas_abc123",
  "input": {
    "question": "What is the capital of France?"
  }
}
Field               Type    Required  Description
job_id              string  Yes       ADAS job ID that produced the graph
input               object  Yes       Input matching your dataset’s input schema
model               string  No        Override the policy model for this call
prompt_snapshot_id  string  No        Use a specific snapshot instead of the best one
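
For example, a request that overrides the policy model and pins a specific prompt snapshot for one call (the field values here are illustrative):

{
  "job_id": "adas_abc123",
  "input": {"question": "What is the capital of France?"},
  "model": "gpt-4o",
  "prompt_snapshot_id": "snap_xyz789"
}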

Response

{
  "output": {
    "answer": "Paris"
  },
  "metadata": {
    "latency_ms": 342,
    "tokens_used": 156,
    "model": "gpt-4o-mini",
    "snapshot_id": "snap_xyz789"
  }
}

Python SDK

Using ADASJob

from synth_ai.sdk.api.train.adas import ADASJob

# After training completes
job = ADASJob.from_existing("adas_abc123", api_key=api_key)

# Run inference
result = job.run_inference({
    "question": "What is the capital of France?"
})
print(result["output"])  # {"answer": "Paris"}

Override Model

Run the same graph with a different model:
result = job.run_inference(
    {"question": "Complex reasoning question..."},
    model="gpt-4o"  # Use stronger model for this query
)
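
A common pattern is to route only hard queries to the stronger model and let everything else use the trained default. A minimal routing sketch (the length threshold is an arbitrary illustration, not part of the API):

def answer(question: str) -> dict:
    # Route long or obviously complex questions to the stronger model;
    # short ones use the model the graph was trained with.
    if len(question) > 200:
        return job.run_inference({"question": question}, model="gpt-4o")
    return job.run_inference({"question": question})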

Use Specific Snapshot

Target a particular prompt version from training:
result = job.run_inference(
    {"question": "What is 2+2?"},
    prompt_snapshot_id="snap_gen3_best"
)

cURL Example

curl -X POST $SYNTH_BACKEND_URL/api/adas/graph/completions \
  -H "Authorization: Bearer $SYNTH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "job_id": "adas_abc123",
    "input": {"question": "What is the capital of France?"}
  }'
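
The same call from Python without the SDK, using requests (a sketch; it assumes the same SYNTH_BACKEND_URL and SYNTH_API_KEY environment variables as the cURL example):

import os
import requests

resp = requests.post(
    f"{os.environ['SYNTH_BACKEND_URL']}/api/adas/graph/completions",
    headers={"Authorization": f"Bearer {os.environ['SYNTH_API_KEY']}"},
    json={
        "job_id": "adas_abc123",
        "input": {"question": "What is the capital of France?"},
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["output"])  # {"answer": "Paris"}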

Batch Inference

For multiple inputs, call the endpoint in parallel:
import asyncio
from synth_ai.sdk.api.train.adas import ADASJob

job = ADASJob.from_existing("adas_abc123", api_key=api_key)

async def run_batch(inputs: list[dict]) -> list[dict]:
    tasks = [job.run_inference_async(inp) for inp in inputs]
    return await asyncio.gather(*tasks)

inputs = [
    {"question": "What is 2+2?"},
    {"question": "What is the capital of France?"},
    {"question": "Who wrote Hamlet?"},
]

results = asyncio.run(run_batch(inputs))
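
For large batches, it is worth capping concurrency so parallel calls stay under rate limits. A sketch using a semaphore (the limit of 8 is arbitrary):

async def run_batch_limited(inputs: list[dict], limit: int = 8) -> list[dict]:
    sem = asyncio.Semaphore(limit)

    async def run_one(inp: dict) -> dict:
        async with sem:  # at most `limit` requests in flight
            return await job.run_inference_async(inp)

    return await asyncio.gather(*(run_one(inp) for inp in inputs))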

Verifier Graph Inference

Verifier graphs (custom judges) use a different endpoint that accepts V3 traces and rubrics.

API Endpoint

POST /api/adas/graph/judge

Request

{
  "job_id": "adas_verifier_xyz",
  "session_trace": {
    "session_id": "trace_to_evaluate",
    "session_time_steps": [
      {
        "step_id": "step_0",
        "step_index": 0,
        "events": [
          {"event_id": 1, "event_type": "runtime", "metadata": {"action": "search", "query": "capital france"}},
          {"event_id": 2, "event_type": "environment", "metadata": {"result": "Paris"}}
        ]
      }
    ],
    "metadata": {"task": "qa"}
  },
  "context": {
    "rubric": {
      "outcome": {
        "criteria": [
          {"name": "correctness", "description": "Is the answer correct?", "weight": 1.0}
        ]
      }
    }
  }
}
Field           Type    Required  Description
job_id          string  Yes       Verifier graph job ID
session_trace   object  Yes       V3 SessionTrace to evaluate
context         object  No        Context including the rubric
context.rubric  object  No        Evaluation criteria

Response

{
  "score": 0.92,
  "reasoning": "The agent correctly identified Paris as the capital of France...",
  "event_rewards": [
    {"event_id": 1, "value": 0.9, "annotation": "Good search query"},
    {"event_id": 2, "value": 1.0, "annotation": "Correct result used"}
  ],
  "outcome": {
    "total_reward": 0.92,
    "feedback": "Task completed correctly"
  },
  "metadata": {
    "latency_ms": 523,
    "model": "gpt-4o-mini"
  }
}

Python SDK

from synth_ai.sdk.api.train.adas import ADASJob

# Load your trained verifier
verifier = ADASJob.from_existing("adas_verifier_xyz", api_key=api_key)

# Evaluate a trace
result = verifier.run_judge(
    session_trace={
        "session_id": "my_trace",
        "session_time_steps": [
            {
                "step_id": "step_0",
                "step_index": 0,
                "events": [
                    {"event_id": 1, "event_type": "runtime", "metadata": {"output": "Paris"}}
                ]
            }
        ],
        "metadata": {}
    },
    context={
        "rubric": {
            "outcome": {
                "criteria": [
                    {"name": "accuracy", "description": "Is the answer accurate?", "weight": 1.0}
                ]
            }
        }
    }
)

print(f"Score: {result['score']}")           # 0.92
print(f"Reasoning: {result['reasoning']}")    # Explanation
print(f"Events: {result['event_rewards']}")   # Per-event scores

Using Verifier Output for RL

Verifier graphs integrate with reinforcement learning training. The structured output maps directly to synth-ai reward tables:
# Get verifier judgement
judgement = verifier.run_judge(session_trace=trace, context={"rubric": rubric})

# Use as RL reward
from synth_ai.core.tracing_v3 import SessionTracer

tracer = SessionTracer()

# Attach event-level rewards (the tracer methods are async; call from an async context)
for event_reward in judgement["event_rewards"]:
    await tracer.record_event_reward(
        event_id=event_reward["event_id"],
        reward_value=event_reward["value"],
        annotation=event_reward.get("annotation")
    )

# Attach outcome reward
await tracer.record_outcome_reward(
    reward_value=judgement["score"],
    feedback=judgement.get("reasoning")
)

Input Schema

Your inference inputs must match the schema from your training dataset:
// Training dataset task
{
  "id": "t1",
  "input": {
    "question": "What is 2+2?",
    "context": "Basic arithmetic"
  }
}

// Inference request - same schema
{
  "job_id": "adas_abc123",
  "input": {
    "question": "What is 3+3?",
    "context": "Basic arithmetic"
  }
}
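
Because mismatched inputs are rejected, a lightweight client-side check can fail fast before spending an API call. A sketch (EXPECTED_KEYS is whatever your dataset’s input schema defines; it is not part of the API):

EXPECTED_KEYS = {"question", "context"}  # from the training dataset above

def validate_input(inp: dict) -> None:
    missing = EXPECTED_KEYS - inp.keys()
    extra = inp.keys() - EXPECTED_KEYS
    if missing or extra:
        raise ValueError(f"Input schema mismatch: missing={missing}, extra={extra}")

validate_input({"question": "What is 3+3?", "context": "Basic arithmetic"})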

Error Handling

try:
    result = job.run_inference({"question": "..."})
except Exception as e:
    if "rate_limit" in str(e):
        # Rate limited: back off and retry (see the sketch below)
        pass
    elif "invalid_input" in str(e):
        # The input does not match the training schema; fix and resend
        pass
    raise
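
For transient rate-limit errors, a simple retry-with-backoff wrapper is usually enough (a sketch; tune the attempt count and delays to your workload):

import time

def run_with_retry(inp: dict, attempts: int = 3) -> dict:
    for attempt in range(attempts):
        try:
            return job.run_inference(inp)
        except Exception as e:
            if "rate_limit" in str(e) and attempt < attempts - 1:
                time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, ...
                continue
            raise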

Pricing

Graph inference is billed per execution:
  • Base cost: Per-graph execution fee
  • LLM costs: Pass-through from underlying model calls
  • Multi-node graphs: Each node’s LLM call is billed separately
See Pricing for current rates.