This page documents the inference behavior of Stack 0.2.0-dev.20260702.3. The
default primary worker remains Codex/BYOK unless you explicitly opt in to Synth
inference.

Stack exposes Synth inference as hosted, optional lanes:
| Lane | Model | Route | Default roles |
|---|---|---|---|
| Free aux | nemotron-3-ultra | /api/v1/stack-aux/openai/v1/responses | monitor, gardener, remote gardener, aux |
| Billed GLM | baseten/zai-org/GLM-5.2 (alias: glm-5.2) | /api/v1/stack-inference/openai/v1/responses | monitor, gardener, remote gardener; worker only with explicit opt-in |
1. Sign in and get a key
Create a Synth account and an API key at usesynth.ai/keys.
export SYNTH_API_KEY=sk_...
2. List models
curl https://api.usesynth.ai/api/v1/synth/models \
-H "authorization: Bearer $SYNTH_API_KEY"
You should see free aux models such as nemotron-3-ultra. When the billed
gateway is deployed for your environment, you should also see
baseten/zai-org/GLM-5.2 with aliases including glm-5.2.
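
To check for a specific model without reading the whole listing, the response can be piped through a small filter. This is only a sketch: has_model is a hypothetical helper, not part of Stack, and it does a plain substring match because the response schema is not described on this page.

```shell
# Hypothetical helper (not part of Stack): reads a models response on stdin
# and succeeds if the given model name appears anywhere in it.
# Assumption: a fixed-string, case-sensitive substring match is good enough;
# the response is not parsed as JSON because its schema is not documented here.
has_model() {
  grep -qF -- "$1"
}
```

For example, `curl -s https://api.usesynth.ai/api/v1/synth/models -H "authorization: Bearer $SYNTH_API_KEY" | has_model nemotron-3-ultra && echo "free aux available"`. Because the match is case-sensitive, checking for glm-5.2 finds only the alias, not the full baseten/zai-org/GLM-5.2 name.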
From Stack:
stack inference list
stack inference usage
3. Run free aux inference
The aux endpoint requires an x-stack-actor-role header set to monitor,
gardener, or aux. Use aux for general calls:
curl https://api.usesynth.ai/api/v1/stack-aux/openai/v1/responses \
-H "authorization: Bearer $SYNTH_API_KEY" \
-H "x-stack-actor-role: aux" \
-H "content-type: application/json" \
-d '{"model":"nemotron-3-ultra","input":"Reply with one word: pong."}'
Without the x-stack-actor-role header, the request returns 403. Primary coding
roles (worker, primary, codex, cursor) are intentionally rejected, because this
endpoint is for auxiliary agents and aux inference.
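
The role rule above can also be checked locally before a request is sent. The following is a minimal sketch, not Stack's own behavior: is_aux_role and aux_request are hypothetical helper names, and the accepted list is limited to the three header values this page names (monitor, gardener, aux).

```shell
# Hypothetical helper (not part of Stack): succeeds only for the header values
# this page lists as accepted by the aux endpoint.
is_aux_role() {
  case "$1" in
    monitor|gardener|aux) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: $1 is the actor role, $2 is the JSON request body.
# It refuses locally for any other role instead of waiting for the server
# to reject the call.
aux_request() {
  if ! is_aux_role "$1"; then
    echo "role '$1' is not accepted by the aux endpoint" >&2
    return 1
  fi
  curl https://api.usesynth.ai/api/v1/stack-aux/openai/v1/responses \
    -H "authorization: Bearer $SYNTH_API_KEY" \
    -H "x-stack-actor-role: $1" \
    -H "content-type: application/json" \
    -d "$2"
}
```

With these defined, `aux_request aux '{"model":"nemotron-3-ultra","input":"Reply with one word: pong."}'` sends the same request as the curl call above, while `aux_request codex '{}'` fails immediately without touching the network.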
4. Run billed GLM inference
The billed gateway uses the same Responses wire shape and accepts GLM aliases:
curl https://api.usesynth.ai/api/v1/stack-inference/openai/v1/responses \
-H "authorization: Bearer $SYNTH_API_KEY" \
-H "x-stack-actor-role: monitor" \
-H "content-type: application/json" \
-d '{"model":"glm-5.2","input":"Reply with one word: pong."}'
Primary worker roles require an explicit opt-in header and an explicit actor
profile. This prevents a local Codex/BYOK worker from silently switching to
billed Synth inference.
5. Usage and cost
Free aux requests count against the aux budget:
curl https://api.usesynth.ai/api/v1/stack-aux/usage \
-H "authorization: Bearer $SYNTH_API_KEY" \
-H "x-stack-actor-role: aux"
Billed GLM usage is visible through Stack:
stack inference usage --json
Usage rows include token and cost summaries, not prompts or transcripts.
In Stack
Monitor profiles are opt-in:
STACK_AUX_INFERENCE=1 STACK_MONITOR_PROFILE=free-aux stack
STACK_SYNTH_INFERENCE=1 STACK_MONITOR_PROFILE=billed-glm stack
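
If you switch between the two profiles often, the pairing of flag and profile name can be kept in one place. This is a sketch under the assumption that the two variable pairs shown above are the only settings each profile needs; monitor_env is a hypothetical helper, not a Stack command.

```shell
# Hypothetical helper (not part of Stack): prints the environment assignments
# for a named monitor profile, using only the names and values on this page.
monitor_env() {
  case "$1" in
    free-aux)   echo "STACK_AUX_INFERENCE=1 STACK_MONITOR_PROFILE=free-aux" ;;
    billed-glm) echo "STACK_SYNTH_INFERENCE=1 STACK_MONITOR_PROFILE=billed-glm" ;;
    *) echo "unknown monitor profile: $1" >&2; return 1 ;;
  esac
}
```

A possible invocation is `env $(monitor_env free-aux) stack`, which relies on the shell splitting the printed assignments into separate arguments for env. An unknown profile name prints an error and returns non-zero rather than guessing.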
If the selected Synth route is unavailable, Stack falls back to the Codex
app-server monitor with a visible notice. The signed-out local loop
(Quickstart) keeps working without any hosted inference.