TL;DR

  • Organization-Scoped Credentials: Sealed-box encrypted environment API keys
  • First-Party Task App Integration: Managed Task Apps with authenticated rollouts
  • Single-Node Multi-GPU Online RL: Out-of-the-box GPU split for inference and training
  • Production Run Flow: Complete workflow from job start to checkpoint inference

Organization-Scoped Environment Credentials

Upload your environment API key once (sealed-box encrypted). The platform decrypts and injects it at runtime; plaintext is never transmitted or stored.
  • Secure Storage: Sealed-box encryption for API keys
  • Runtime Injection: Keys injected at runtime, never stored in plaintext
  • Organization Scope: Keys scoped to organizations for better security
  • One-Time Setup: Upload once, use across all jobs

First-Party Task App Integration

Run environments behind a managed Task App with authenticated rollouts. Online RL calls your Task App endpoints directly during training.
  • Managed Task Apps: Task Apps deployed and hosted by the platform
  • Authenticated Rollouts: Secure authentication for rollout requests
  • Direct Integration: Online RL calls Task App endpoints directly
  • Seamless Workflow: No manual configuration required
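An authenticated rollout boils down to checking a shared secret on every incoming request. A minimal sketch, assuming a bearer-token scheme (the header name and token format are assumptions, not the platform's documented contract):

```python
import hmac

def authorize_rollout(headers: dict, expected_token: str) -> bool:
    """Validate the Authorization header on an incoming rollout request.

    Hypothetical sketch: the bearer-token scheme is an assumption,
    not the platform's documented contract.
    """
    auth = headers.get("Authorization", "")
    scheme, _, token = auth.partition(" ")
    if scheme != "Bearer":
        return False
    # Constant-time comparison avoids timing side channels.
    return hmac.compare_digest(token, expected_token)

authorize_rollout({"Authorization": "Bearer s3cret"}, "s3cret")  # True
authorize_rollout({"Authorization": "Bearer wrong"}, "s3cret")   # False
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not leak how many leading characters of a guessed token were correct.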

Single-Node, Multi-GPU Online RL

Out-of-the-box split between vLLM inference GPUs and training GPUs on a single node (e.g., 6 inference / 2 training on H100).
  • Automatic Split: Automatic GPU allocation for inference and training
  • Single Node: Works on a single node with multiple GPUs
  • Flexible Configuration: Configurable tensor parallelism for inference
  • Reference Model Support: The KL reference model can be colocated with the inference GPUs or placed on its own GPU
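The split is performed automatically by the platform, but the underlying allocation can be sketched as a simple partition of device indices. The function and parameter names below are assumptions for illustration:

```python
def split_gpus(total: int, inference: int, training: int,
               reference_on_inference: bool = True) -> dict:
    """Partition one node's device indices into inference and training sets.

    Illustrative only: the platform performs this split automatically;
    the names here are assumptions, not its API.
    """
    if inference + training > total:
        raise ValueError("requested more GPUs than the node provides")
    devices = list(range(total))
    plan = {
        "inference": devices[:inference],                    # vLLM tensor-parallel group
        "training": devices[inference:inference + training],
    }
    # The KL reference model can share the inference GPUs or, if a
    # device is left over, take one of its own.
    spare = devices[inference + training:]
    if reference_on_inference or not spare:
        plan["reference"] = plan["inference"]
    else:
        plan["reference"] = spare[:1]
    return plan

# Example: 8x H100, 6 inference / 2 training.
plan = split_gpus(8, 6, 2)
# plan["inference"] == [0, 1, 2, 3, 4, 5]; plan["training"] == [6, 7]
```

With all eight GPUs consumed by the 6/2 split, the reference model shares the inference group; on a 6/1 split it could instead claim the spare device.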

Multi-Node Training

Multi-node training is finished in development; reach out if you're interested in early access.

Production Run Flow

Start an Online RL job against your deployed Task App, monitor progress/events, and run inference using the produced checkpoint when training completes.
  • Complete Workflow: End-to-end workflow from job start to inference
  • Progress Monitoring: Real-time progress and event monitoring
  • Checkpoint Inference: Use produced checkpoints for inference
  • Production Ready: Full production workflow support
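The run flow above (start job, poll status and events, then run inference from the checkpoint) can be sketched as a polling loop. The client class below is a stub standing in for the platform SDK; its method names, the job ID, and the checkpoint ID are all hypothetical:

```python
import time

class StubRLClient:
    """Stand-in for the platform SDK; all names here are assumptions."""
    def __init__(self):
        self._statuses = iter(["queued", "running", "running", "succeeded"])

    def start_job(self, task_app_url: str, config: dict) -> str:
        return "job-123"                      # hypothetical job ID

    def poll_status(self, job_id: str) -> str:
        return next(self._statuses)           # simulated status transitions

    def checkpoint(self, job_id: str) -> str:
        return "ckpt-final"                   # hypothetical checkpoint ID

def run_flow(client) -> str:
    """Start an Online RL job, poll until terminal, return the checkpoint."""
    job_id = client.start_job("https://example.invalid/task-app", {"gpus": 8})
    while (status := client.poll_status(job_id)) not in ("succeeded", "failed"):
        print(f"{job_id}: {status}")          # progress/event monitoring
        time.sleep(0)                         # real code would back off between polls
    if status == "failed":
        raise RuntimeError(f"{job_id} failed")
    return client.checkpoint(job_id)          # use the checkpoint for inference

assert run_flow(StubRLClient()) == "ckpt-final"
```

In production the stub would be replaced by authenticated calls against the platform's job API, with a real backoff between polls.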

Use Cases

  • Secure Credentials: Store environment API keys securely with encryption
  • Managed Environments: Use managed Task Apps for easier deployment
  • Efficient RL: Optimize GPU usage with automatic split between inference and training
  • Production RL: Run Online RL jobs in production with complete monitoring