
synth_ai.sdk.inference.client

Client for model inference via Synth AI. This module provides a client for making inference requests through Synth AI’s inference proxy, which routes each request to the appropriate model provider (OpenAI, Groq, etc.) based on the model identifier. Example:
>>> import os
>>> from synth_ai.sdk.inference import InferenceClient
>>>
>>> client = InferenceClient(
...     base_url="https://api.usesynth.ai",
...     api_key=os.environ["SYNTH_API_KEY"],
... )
>>>
>>> response = await client.create_chat_completion(
...     model="gpt-4o-mini",
...     messages=[
...         {"role": "user", "content": "Hello!"}
...     ],
...     temperature=0.7,
...     max_tokens=100,
... )
>>>
>>> print(response["choices"][0]["message"]["content"])
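The response follows the OpenAI chat-completion schema, so its fields can be read with plain dict access. A minimal sketch of that structure, using an invented sample payload rather than a real server reply:

```python
# Illustrative OpenAI-compatible response payload (not an actual server response;
# field values are made up for demonstration).
sample_response = {
    "id": "chatcmpl-abc123",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello! How can I help?"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 9, "completion_tokens": 7, "total_tokens": 16},
}

# Pull out the assistant reply and the token accounting.
reply = sample_response["choices"][0]["message"]["content"]
total = sample_response["usage"]["total_tokens"]
print(reply)  # -> Hello! How can I help?
print(total)  # -> 16
```

The same access pattern applies to the real `response` dict returned by `create_chat_completion`.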

Classes

InferenceClient

Client for making inference requests through Synth AI’s inference proxy. This client provides a unified interface for calling LLMs through Synth AI’s backend, which routes each request to the appropriate provider (OpenAI, Groq, etc.) based on the model identifier. Methods:

create_chat_completion

create_chat_completion(self, **kwargs: Any) -> dict[str, Any]
Create a chat completion. This method sends a chat completion request to the Synth AI inference proxy, which routes it to the appropriate provider based on the model identifier. All parameters are passed as keyword arguments. Args:
  • model: Model identifier (e.g., “gpt-4o-mini”, “Qwen/Qwen3-4B”)
  • messages: List of message dicts with “role” and “content” keys
  • **kwargs: Additional OpenAI-compatible parameters:
      • temperature: Sampling temperature (0.0 to 2.0)
      • max_tokens: Maximum number of tokens to generate
      • thinking_budget: Budget for thinking tokens (default: 256)
      • top_p: Nucleus sampling parameter
      • frequency_penalty: Frequency penalty (-2.0 to 2.0)
      • presence_penalty: Presence penalty (-2.0 to 2.0)
      • stop: Stop sequences
      • tools: Function-calling tool definitions
      • tool_choice: Tool choice strategy
      • stream: Whether to stream the response
      • … (other OpenAI API parameters)
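The tools and tool_choice parameters follow the OpenAI function-calling format. A sketch of the expected payload shape, where the get_weather tool is invented for illustration and not part of the SDK:

```python
# Hypothetical function-calling request kwargs in the OpenAI tools format.
# The "get_weather" tool below is a made-up example, not a real SDK tool.
request_kwargs = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}

# These kwargs would be passed straight through to the method:
# response = await client.create_chat_completion(**request_kwargs)
```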
Returns:
  • Completion response dict with:
      • id: Request ID
      • choices: List of completion choices
      • usage: Token usage statistics
      • … (other OpenAI-compatible fields)
Raises:
  • ValueError: If model is not supported or request is invalid
  • HTTPError: If the API request fails
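The documented parameter ranges and the ValueError contract can be illustrated with a local pre-flight check. The helper below is not part of the SDK; it only mirrors the ranges stated above (temperature 0.0 to 2.0, penalties -2.0 to 2.0):

```python
# Illustrative validation helper mirroring the documented parameter ranges.
# NOT part of the synth_ai SDK; shown only to make the ValueError contract concrete.
def validate_sampling_params(
    temperature: float = 1.0,
    frequency_penalty: float = 0.0,
    presence_penalty: float = 0.0,
) -> None:
    if not 0.0 <= temperature <= 2.0:
        raise ValueError(f"temperature must be in [0.0, 2.0], got {temperature}")
    for name, value in (
        ("frequency_penalty", frequency_penalty),
        ("presence_penalty", presence_penalty),
    ):
        if not -2.0 <= value <= 2.0:
            raise ValueError(f"{name} must be in [-2.0, 2.0], got {value}")

validate_sampling_params(temperature=0.7)  # in range: no error
try:
    validate_sampling_params(temperature=3.5)  # out of range: raises
except ValueError as exc:
    print(exc)  # -> temperature must be in [0.0, 2.0], got 3.5
```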