# Hosted inference

> Use Synth free aux and billed GLM inference from Stack without changing the default local worker.

<Note>
  This page documents inference behavior for Stack **`0.2.0-dev.20260702.3`**. The
  default primary worker remains Codex/BYOK unless you explicitly opt in to Synth inference.
</Note>

Stack exposes Synth inference as two optional hosted lanes:

| Lane       | Model                                 | Route                                         | Default roles                                                        |
| ---------- | ------------------------------------- | --------------------------------------------- | -------------------------------------------------------------------- |
| Free aux   | `nemotron-3-ultra`                    | `/api/v1/stack-aux/openai/v1/responses`       | monitor, gardener, remote gardener, aux                              |
| Billed GLM | `glm-5.2` / `baseten/zai-org/GLM-5.2` | `/api/v1/stack-inference/openai/v1/responses` | monitor, gardener, remote gardener; worker only with explicit opt-in |

## 1. Sign in and get a key

Create a Synth account and an API key at [usesynth.ai/keys](https://usesynth.ai/keys).

```bash
export SYNTH_API_KEY=sk_...
```
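
Scripts that call the API may want to stop early when the key is missing rather than send an unauthenticated request. A minimal sketch follows; `require_synth_key` is a hypothetical helper name, not something Stack provides.

```bash
# Hypothetical helper (not part of Stack): fail fast when SYNTH_API_KEY
# is unset or empty, before any request is sent.
require_synth_key() {
  if [ -z "${SYNTH_API_KEY:-}" ]; then
    echo "SYNTH_API_KEY is not set; create a key at https://usesynth.ai/keys" >&2
    return 1
  fi
}
```

For example, `require_synth_key && stack inference list` runs the Stack command only when a key is present.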

## 2. List models

```bash
curl https://api.usesynth.ai/api/v1/synth/models \
  -H "authorization: Bearer $SYNTH_API_KEY"
```

You should see free aux models such as `nemotron-3-ultra`. When the billed
gateway is deployed for your environment, you should also see
`baseten/zai-org/GLM-5.2` with aliases including `glm-5.2`.

From Stack:

```bash
stack inference list
stack inference usage
```

## 3. Run free aux inference

The aux endpoint requires an `x-stack-actor-role` header set to `monitor`,
`gardener`, or `aux`. Use `aux` for general calls:

```bash
curl https://api.usesynth.ai/api/v1/stack-aux/openai/v1/responses \
  -H "authorization: Bearer $SYNTH_API_KEY" \
  -H "x-stack-actor-role: aux" \
  -H "content-type: application/json" \
  -d '{"model":"nemotron-3-ultra","input":"Reply with one word: pong."}'
```

<Warning>
  Without the `x-stack-actor-role` header, the request returns `403`. Primary coding
  roles (`worker`, `primary`, `codex`, `cursor`) are intentionally rejected because
  this endpoint serves only auxiliary agents and aux inference.
</Warning>
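
The role rules above can be mirrored client-side so a script never sends a request the aux endpoint would reject. The sketch below is illustrative only: `aux_role_ok` and `aux_request` are hypothetical helper names, and the allow-list contains just the three header values named in this section. The header value for the remote gardener role is not given here, so it is left out.

```bash
# Hypothetical client-side check mirroring the documented role rules.
# Accepts only the header values named above; everything else, including
# the primary coding roles (worker, primary, codex, cursor), is refused.
aux_role_ok() {
  case "$1" in
    monitor|gardener|aux) return 0 ;;
    *) return 1 ;;
  esac
}

# Send an aux request only if the role passes the local check.
# Usage: aux_request ROLE JSON_BODY
aux_request() {
  local role=$1 body=$2
  if ! aux_role_ok "$role"; then
    echo "role '$role' is not accepted by the aux endpoint" >&2
    return 2
  fi
  curl -sS https://api.usesynth.ai/api/v1/stack-aux/openai/v1/responses \
    -H "authorization: Bearer $SYNTH_API_KEY" \
    -H "x-stack-actor-role: $role" \
    -H "content-type: application/json" \
    -d "$body"
}
```

A call such as `aux_request aux '{"model":"nemotron-3-ultra","input":"Reply with one word: pong."}'` sends the same request as the curl example, while `aux_request worker '{}'` returns exit code 2 without touching the network.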

## 4. Run billed GLM inference

The billed gateway uses the same Responses wire shape and accepts GLM aliases:

```bash
curl https://api.usesynth.ai/api/v1/stack-inference/openai/v1/responses \
  -H "authorization: Bearer $SYNTH_API_KEY" \
  -H "x-stack-actor-role: monitor" \
  -H "content-type: application/json" \
  -d '{"model":"glm-5.2","input":"Reply with one word: pong."}'
```

Primary worker roles require an explicit opt-in header and an explicit actor
profile. This prevents a local Codex/BYOK worker from silently switching to
billed Synth inference.
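
Since the billed gateway shares the Responses wire shape with the aux endpoint, a script can switch lanes by changing only the URL and the model. The sketch below assumes the lane labels `free-aux` and `billed-glm` as illustrative names. `synth_responses_url` and `synth_default_model` are hypothetical helpers; the routes and models are the ones listed in the table at the top of this page.

```bash
# Hypothetical helper: map a lane label to its Responses endpoint.
synth_responses_url() {
  case "$1" in
    free-aux)   echo "https://api.usesynth.ai/api/v1/stack-aux/openai/v1/responses" ;;
    billed-glm) echo "https://api.usesynth.ai/api/v1/stack-inference/openai/v1/responses" ;;
    *) echo "unknown lane: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: the model name each lane is documented to serve.
synth_default_model() {
  case "$1" in
    free-aux)   echo "nemotron-3-ultra" ;;
    billed-glm) echo "glm-5.2" ;;
    *) echo "unknown lane: $1" >&2; return 1 ;;
  esac
}

# Example (not executed here): the same request against either lane.
#   lane=billed-glm
#   curl "$(synth_responses_url "$lane")" \
#     -H "authorization: Bearer $SYNTH_API_KEY" \
#     -H "x-stack-actor-role: monitor" \
#     -H "content-type: application/json" \
#     -d "{\"model\":\"$(synth_default_model "$lane")\",\"input\":\"Reply with one word: pong.\"}"
```

Keeping the lane choice in one variable makes the billed path an explicit decision in the script rather than an edit to a URL.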

## 5. Usage and cost

Free aux requests count against the aux budget. Check current usage with:

```bash
curl https://api.usesynth.ai/api/v1/stack-aux/usage \
  -H "authorization: Bearer $SYNTH_API_KEY" \
  -H "x-stack-actor-role: aux"
```

Billed GLM usage is visible through Stack:

```bash
stack inference usage --json
```

Usage rows include token and cost summaries, not prompts or transcripts.

## In Stack

Monitor profiles are opt-in. Set the lane's inference flag together with the matching profile when launching Stack:

```bash
STACK_AUX_INFERENCE=1 STACK_MONITOR_PROFILE=free-aux stack
STACK_SYNTH_INFERENCE=1 STACK_MONITOR_PROFILE=billed-glm stack
```
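
Each profile pairs one inference flag with one profile name, so a launcher script can keep the pairs in a single place. A minimal sketch, assuming only the two variable pairs shown above; `monitor_profile_env` is a hypothetical helper, not a Stack command.

```bash
# Hypothetical helper: print the environment assignments for a monitor
# profile. An empty profile prints nothing, which leaves Stack on its
# default (no hosted monitor opt-in).
monitor_profile_env() {
  case "$1" in
    free-aux)   echo "STACK_AUX_INFERENCE=1 STACK_MONITOR_PROFILE=free-aux" ;;
    billed-glm) echo "STACK_SYNTH_INFERENCE=1 STACK_MONITOR_PROFILE=billed-glm" ;;
    "")         : ;;
    *) echo "unknown monitor profile: $1" >&2; return 1 ;;
  esac
}

# Example (not executed here): launch Stack with the chosen profile.
# The command substitution is left unquoted on purpose so the two
# assignments split into separate arguments for env.
#   env $(monitor_profile_env free-aux) stack
```

Unknown profile names fail loudly instead of silently launching Stack without the intended opt-in.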

If the selected Synth route is unavailable, Stack falls back to the Codex
app-server monitor with a visible notice. The signed-out local loop
([Quickstart](/stack/quickstart)) keeps working without any hosted inference.
