This page documents Stack 0.2.0-dev.20260702.3 inference behavior. The default primary worker remains Codex/BYOK unless you explicitly opt in to Synth inference.
Stack exposes Synth inference as hosted, optional lanes:
| Lane | Model | Route | Default roles |
| --- | --- | --- | --- |
| Free aux | nemotron-3-ultra | /api/v1/stack-aux/openai/v1/responses | monitor, gardener, remote gardener, aux |
| Billed GLM | glm-5.2 / baseten/zai-org/GLM-5.2 | /api/v1/stack-inference/openai/v1/responses | monitor, gardener, remote gardener; worker only with explicit opt-in |
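
When scripting against both lanes, it can help to keep the two routes in one place. The following is a minimal sketch, not part of Stack: the function name `synth_route` and the variable `SYNTH_API_BASE` are hypothetical, and the lane labels `free-aux` and `billed-glm` are borrowed from the monitor profile names used later on this page.

```shell
# Hypothetical helper: map a lane label to its full Responses URL.
# SYNTH_API_BASE is a local shell variable for this sketch, not a Stack setting.
SYNTH_API_BASE="https://api.usesynth.ai"

synth_route() {
  case "$1" in
    free-aux)   printf '%s\n' "$SYNTH_API_BASE/api/v1/stack-aux/openai/v1/responses" ;;
    billed-glm) printf '%s\n' "$SYNTH_API_BASE/api/v1/stack-inference/openai/v1/responses" ;;
    *)          printf 'unknown lane: %s\n' "$1" >&2; return 1 ;;
  esac
}
```

A call such as `curl "$(synth_route free-aux)" ...` then takes the same headers and body as the raw curl commands on this page.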

1. Sign in and get a key

Create a Synth account and an API key at usesynth.ai/keys.
export SYNTH_API_KEY=sk_...

2. List models

curl https://api.usesynth.ai/api/v1/synth/models \
  -H "authorization: Bearer $SYNTH_API_KEY"
You should see free aux models such as nemotron-3-ultra. When the billed gateway is deployed for your environment, you should also see baseten/zai-org/GLM-5.2 with aliases including glm-5.2.
From Stack:
stack inference list
stack inference usage
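
To check from a script whether a given model is listed for your environment, a fixed-string match on the response body may be enough. This is a rough sketch: `has_model` is a hypothetical helper, and it assumes only that the model id or alias appears verbatim somewhere in the response, not any particular JSON layout.

```shell
# Hypothetical helper: succeed if the model id in $1 appears anywhere in the
# text read from stdin. Uses a fixed-string, case-sensitive match, so it also
# matches when the id is a substring of a longer name.
has_model() {
  grep -qF -- "$1"
}

# Usage (needs network access and SYNTH_API_KEY, so it is commented out here):
#   curl -sS https://api.usesynth.ai/api/v1/synth/models \
#     -H "authorization: Bearer $SYNTH_API_KEY" \
#     | has_model glm-5.2 && echo "billed GLM lane is listed"
```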

3. Run free aux inference

The aux endpoint requires an X-Stack-Actor-Role header (monitor, gardener, or aux). Use aux for general calls:
curl https://api.usesynth.ai/api/v1/stack-aux/openai/v1/responses \
  -H "authorization: Bearer $SYNTH_API_KEY" \
  -H "x-stack-actor-role: aux" \
  -H "content-type: application/json" \
  -d '{"model":"nemotron-3-ultra","input":"Reply with one word: pong."}'
Without the x-stack-actor-role header, the request returns 403. Primary coding roles (worker, primary, codex, cursor) are intentionally rejected: this endpoint is for auxiliary agents and aux inference.
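
The role rules above can be checked locally before a request is sent, so a script fails early instead of waiting for a rejection. The sketch below is illustrative only: `aux_role_check` and `aux_ping` are hypothetical names, and the role lists are copied from this page. Any role not named here (for example, however "remote gardener" is spelled on the wire) is treated as unknown rather than guessed.

```shell
# Returns 0 for roles this page lists as accepted by the aux endpoint,
# 1 for primary coding roles the endpoint rejects, 2 for anything else.
aux_role_check() {
  case "$1" in
    monitor|gardener|aux)        return 0 ;;
    worker|primary|codex|cursor) return 1 ;;
    *)                           return 2 ;;
  esac
}

# Send one aux request and print only the HTTP status code.
# Refuses to call the endpoint unless the role passes aux_role_check.
aux_ping() {
  role="${1:-aux}"
  aux_role_check "$role" || {
    echo "role '$role' is not in the accepted list on this page" >&2
    return 1
  }
  curl -sS -o /dev/null -w '%{http_code}\n' \
    https://api.usesynth.ai/api/v1/stack-aux/openai/v1/responses \
    -H "authorization: Bearer $SYNTH_API_KEY" \
    -H "x-stack-actor-role: $role" \
    -H "content-type: application/json" \
    -d '{"model":"nemotron-3-ultra","input":"Reply with one word: pong."}'
}
```

Running `aux_ping aux` prints the status code of a real request, while `aux_ping worker` returns non-zero without touching the network.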

4. Run billed GLM inference

The billed gateway uses the same Responses wire shape and accepts GLM aliases:
curl https://api.usesynth.ai/api/v1/stack-inference/openai/v1/responses \
  -H "authorization: Bearer $SYNTH_API_KEY" \
  -H "x-stack-actor-role: monitor" \
  -H "content-type: application/json" \
  -d '{"model":"glm-5.2","input":"Reply with one word: pong."}'
Primary worker roles require an explicit opt-in header and an explicit actor profile. This prevents a local Codex/BYOK worker from silently switching to billed Synth inference.

5. Usage and cost

Free aux requests count against the aux budget:
curl https://api.usesynth.ai/api/v1/stack-aux/usage \
  -H "authorization: Bearer $SYNTH_API_KEY" \
  -H "x-stack-actor-role: aux"
Billed GLM usage is visible through Stack:
stack inference usage --json
Usage rows include token and cost summaries, not prompts or transcripts.

In Stack

Monitor profiles are opt-in:
STACK_AUX_INFERENCE=1 STACK_MONITOR_PROFILE=free-aux stack
STACK_SYNTH_INFERENCE=1 STACK_MONITOR_PROFILE=billed-glm stack
If the selected Synth route is unavailable, Stack falls back to the Codex app-server monitor with a visible notice. The signed-out local loop (Quickstart) keeps working without any hosted inference.
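
Each profile pairs one opt-in variable with one profile name, so a small wrapper can keep the pairs from being mixed up. This is a sketch under that assumption: `monitor_env` is a hypothetical helper, and it uses only the variable names and values shown in the two commands above.

```shell
# Hypothetical helper: print the environment assignments for a monitor profile.
monitor_env() {
  case "$1" in
    free-aux)   echo "STACK_AUX_INFERENCE=1 STACK_MONITOR_PROFILE=free-aux" ;;
    billed-glm) echo "STACK_SYNTH_INFERENCE=1 STACK_MONITOR_PROFILE=billed-glm" ;;
    *)          echo "unknown monitor profile: $1" >&2; return 1 ;;
  esac
}

# Usage (starts Stack, so it is commented out here). The command substitution
# is left unquoted on purpose so each assignment becomes its own argument:
#   env $(monitor_env free-aux) stack
```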