TL;DR

  • Organization-Scoped Credentials: Sealed-box encrypted environment API keys
  • First-Party Task App Integration: Managed Task Apps with authenticated rollouts
  • Single-Node Multi-GPU Online RL: Out-of-the-box GPU split for inference and training
  • Production Run Flow: Complete workflow from job start to checkpoint inference

Organization-Scoped Environment Credentials

Upload your environment API key once (sealed-box encrypted). The platform decrypts and injects it at runtime; plaintext is never transmitted or stored.
  • Secure Storage: Sealed-box encryption for API keys
  • Runtime Injection: Keys injected at runtime, never stored in plaintext
  • Organization Scope: Keys scoped to organizations for better security
  • One-Time Setup: Upload once, use across all jobs

First-Party Task App Integration

Run environments behind a managed Task App with authenticated rollouts. Online RL calls your Task App endpoints directly during training.
  • Managed Task Apps: Task Apps deployed and hosted by the platform
  • Authenticated Rollouts: Secure authentication for rollout requests
  • Direct Integration: Online RL calls Task App endpoints directly
  • Seamless Workflow: No manual configuration required
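An authenticated rollout boils down to checking a shared secret on every incoming request. A minimal sketch, assuming a bearer-token scheme (the header name and token format are assumptions, not the platform's documented contract):

```python
import hmac

def authorize_rollout(headers: dict, expected_token: str) -> bool:
    """Validate the Authorization header on an incoming rollout request.

    Hypothetical sketch: the bearer-token scheme is an assumption,
    not the platform's documented contract.
    """
    auth = headers.get("Authorization", "")
    scheme, _, token = auth.partition(" ")
    if scheme != "Bearer":
        return False
    # Constant-time comparison avoids timing side channels.
    return hmac.compare_digest(token, expected_token)

authorize_rollout({"Authorization": "Bearer s3cret"}, "s3cret")  # True
authorize_rollout({"Authorization": "Bearer wrong"}, "s3cret")   # False
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not leak how many leading characters of a guessed token were correct.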

Single-Node, Multi-GPU Online RL

Out-of-the-box split between vLLM inference GPUs and training GPUs on a single node (e.g., 6 inference / 2 training on H100).
  • Automatic Split: Automatic GPU allocation for inference and training
  • Single Node: Works on a single node with multiple GPUs
  • Flexible Configuration: Configurable tensor parallelism for inference
  • Reference Model Support: The KL reference model can be colocated with the inference GPUs or placed on its own GPU
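The split is performed automatically by the platform, but the underlying allocation can be sketched as a simple partition of device indices. The function and parameter names below are assumptions for illustration:

```python
def split_gpus(total: int, inference: int, training: int,
               reference_on_inference: bool = True) -> dict:
    """Partition one node's device indices into inference and training sets.

    Illustrative only: the platform performs this split automatically;
    the names here are assumptions, not its API.
    """
    if inference + training > total:
        raise ValueError("requested more GPUs than the node provides")
    devices = list(range(total))
    plan = {
        "inference": devices[:inference],                    # vLLM tensor-parallel group
        "training": devices[inference:inference + training],
    }
    # The KL reference model can share the inference GPUs or, if a
    # device is left over, take one of its own.
    spare = devices[inference + training:]
    if reference_on_inference or not spare:
        plan["reference"] = plan["inference"]
    else:
        plan["reference"] = spare[:1]
    return plan

# Example: 8x H100, 6 inference / 2 training.
plan = split_gpus(8, 6, 2)
# plan["inference"] == [0, 1, 2, 3, 4, 5]; plan["training"] == [6, 7]
```

With all eight GPUs consumed by the 6/2 split, the reference model shares the inference group; on a 6/1 split it could instead claim the spare device.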

Multi-Node Training

Multi-node training is finished in development; reach out if you're interested in early access.

Production Run Flow

Start an Online RL job against your deployed Task App, monitor progress/events, and run inference using the produced checkpoint when training completes.
  • Complete Workflow: End-to-end workflow from job start to inference
  • Progress Monitoring: Real-time progress and event monitoring
  • Checkpoint Inference: Use produced checkpoints for inference
  • Production Ready: Full production workflow support
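The run flow above (start job, poll status and events, then run inference from the checkpoint) can be sketched as a polling loop. The client class below is a stub standing in for the platform SDK; its method names, the job ID, and the checkpoint ID are all hypothetical:

```python
import time

class StubRLClient:
    """Stand-in for the platform SDK; all names here are assumptions."""
    def __init__(self):
        self._statuses = iter(["queued", "running", "running", "succeeded"])

    def start_job(self, task_app_url: str, config: dict) -> str:
        return "job-123"                      # hypothetical job ID

    def poll_status(self, job_id: str) -> str:
        return next(self._statuses)           # simulated status transitions

    def checkpoint(self, job_id: str) -> str:
        return "ckpt-final"                   # hypothetical checkpoint ID

def run_flow(client) -> str:
    """Start an Online RL job, poll until terminal, return the checkpoint."""
    job_id = client.start_job("https://example.invalid/task-app", {"gpus": 8})
    while (status := client.poll_status(job_id)) not in ("succeeded", "failed"):
        print(f"{job_id}: {status}")          # progress/event monitoring
        time.sleep(0)                         # real code would back off between polls
    if status == "failed":
        raise RuntimeError(f"{job_id} failed")
    return client.checkpoint(job_id)          # use the checkpoint for inference

assert run_flow(StubRLClient()) == "ckpt-final"
```

In production the stub would be replaced by authenticated calls against the platform's job API, with a real backoff between polls.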

Use Cases

  • Secure Credentials: Store environment API keys securely with encryption
  • Managed Environments: Use managed Task Apps for easier deployment
  • Efficient RL: Optimize GPU usage with automatic split between inference and training
  • Production RL: Run Online RL jobs in production with complete monitoring