The Inference class is the main facade for interacting with LLM APIs. It provides
a clean, immutable interface for chat completions, tool calling, JSON output generation,
and streaming — all through a consistent API regardless of the underlying provider.
Quick Start
The simplest way to generate text is with a single chained call. The using() static method resolves a named preset from your configuration, while withMessages() accepts a Messages object. Use Messages::fromString() to wrap a plain text prompt, or Messages::fromArray() to convert an array of role/content pairs. The get() method executes the request and returns the response content as a string.
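A minimal sketch of this quick-start call. The namespace imports are assumptions based on the library's package layout; adjust them to match your installed version:

```php
<?php
use Cognesy\Polyglot\Inference\Inference;
use Cognesy\Messages\Messages;

// Resolve the 'openai' preset, send a single prompt, return the text.
$answer = Inference::using('openai')
    ->withMessages(Messages::fromString('What is the capital of France?'))
    ->get();

echo $answer;
```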
Creating an Inference Instance
For more control over the lifecycle, create an instance directly. Without arguments, Inference uses a sensible default configuration. You can also configure the entire request in one step with the with() method, which accepts all request parameters at once.
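A sketch of both styles, assuming with() accepts named arguments mirroring the request fields listed below (the import path and model name are illustrative):

```php
<?php
use Cognesy\Polyglot\Inference\Inference;

// Default configuration -- no preset name required.
$inference = new Inference();

// with() sets multiple request parameters in one call.
$answer = $inference
    ->with(
        messages: [['role' => 'user', 'content' => 'Say hello.']],
        model: 'gpt-4o-mini',
        options: ['temperature' => 0.7],
    )
    ->get();
```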
Core Request Fields
The with() method and its individual with...() counterparts allow you to set every
aspect of the inference request:
| Field | Method | Description |
|---|---|---|
| messages | withMessages() | The conversation messages |
| model | withModel() | Override the model defined in the preset |
| tools | withTools() | Tool/function definitions (ToolDefinitions) for the model to call |
| toolChoice | withToolChoice() | Control which tool the model should use (ToolChoice) |
| responseFormat | withResponseFormat() | Request structured output (ResponseFormat) |
| options | withOptions() | Provider-specific parameters (temperature, max_tokens, etc.) |
| maxTokens | withMaxTokens() | Shorthand for setting the maximum output token count |
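A sketch of building a request field by field with the individual setters (import path and model name are assumptions):

```php
<?php
use Cognesy\Polyglot\Inference\Inference;
use Cognesy\Messages\Messages;

// Each with...() call returns a new, further-configured instance.
$request = Inference::using('openai')
    ->withMessages(Messages::fromString('Summarize the plot of Hamlet.'))
    ->withModel('gpt-4o-mini')           // override the preset's default model
    ->withMaxTokens(256)                 // cap the output length
    ->withOptions(['temperature' => 0.2]);

echo $request->get();
```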
Execution Paths
Once you have configured a request, choose how to execute it:

| Method | Returns | Use case |
|---|---|---|
| get() | string | Quick text extraction |
| response() | InferenceResponse | Full response with metadata, usage stats, and tool calls |
| asJson() | string | Extract JSON from the response content |
| asJsonData() | array | Decode JSON from the response into a PHP array |
| asToolCallJson() | string | Extract tool call arguments as a JSON string |
| asToolCallJsonData() | array | Decode tool call arguments into a PHP array |
| stream() | InferenceStream | Stream partial deltas as they arrive |
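A sketch contrasting three of these execution paths on the same configured request (import path is an assumption; each terminal call issues its own API request):

```php
<?php
use Cognesy\Polyglot\Inference\Inference;

$pending = Inference::using('openai')
    ->withMessages([['role' => 'user', 'content' => 'List three primary colors as JSON.']]);

// Quick text extraction:
$text = $pending->get();

// Full response object with metadata and usage stats:
$response = $pending->response();

// Decode JSON found in the response content into a PHP array:
$data = $pending->asJsonData();
```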
Multi-Turn Conversations
For multi-turn conversations, pass an array of messages with role annotations.

Customizing Request Options
Provider-specific parameters such as temperature, max_tokens, or top_p are passed through the options array. Most providers follow the OpenAI-compatible parameter conventions. The same parameters can also be supplied through the with() convenience method.
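A sketch combining a role-annotated multi-turn history with OpenAI-style options (import path and conversation content are illustrative):

```php
<?php
use Cognesy\Polyglot\Inference\Inference;

// Prior turns are replayed as role/content pairs; options tune sampling.
$answer = Inference::using('openai')
    ->withMessages([
        ['role' => 'system', 'content' => 'You are a terse assistant.'],
        ['role' => 'user', 'content' => 'What is attention in transformers?'],
        ['role' => 'assistant', 'content' => 'A weighted mixing of token representations.'],
        ['role' => 'user', 'content' => 'Explain the weights in one sentence.'],
    ])
    ->withOptions([
        'temperature' => 0.3,
        'max_tokens' => 120,
    ])
    ->get();
```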
Streaming Responses
Streaming lets you display partial output as it arrives from the model, creating a more responsive user experience. Call stream() to get an InferenceStream, then iterate over deltas:
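A sketch of consuming the stream, assuming the InferenceStream is directly iterable and yields PartialInferenceDelta objects (check the class for the exact iteration method; the import path is also an assumption):

```php
<?php
use Cognesy\Polyglot\Inference\Inference;

$stream = Inference::using('openai')
    ->withMessages([['role' => 'user', 'content' => 'Write a haiku about rivers.']])
    ->stream();

// Each delta carries the next fragment of generated text.
foreach ($stream as $delta) {
    echo $delta->contentDelta;
}

// After the stream is consumed, final() returns the assembled InferenceResponse.
$response = $stream->final();
```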
PartialInferenceDelta exposes the contentDelta string for the incremental
text fragment. The stream also provides functional-style helpers — map(), filter(),
and reduce() — for processing deltas inline.
You can also register a callback to handle each delta as it arrives. Once the stream has been consumed, call final() to retrieve the assembled InferenceResponse with full content and usage statistics.
Working with the Full Response
When you need more than just text, use response() to access the complete
InferenceResponse object:
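A sketch of reading the full response. The accessor names below are assumptions; consult InferenceResponse for the exact API:

```php
<?php
use Cognesy\Polyglot\Inference\Inference;

$response = Inference::using('openai')
    ->withMessages([['role' => 'user', 'content' => 'Name one prime number.']])
    ->response();

echo $response->content();            // the generated text
$usage = $response->usage();          // token usage statistics
$toolCalls = $response->toolCalls();  // any tool calls the model made
```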
Switching Between Providers
Polyglot ships with YAML-based presets for many providers. Switching between them is a single method call. Available presets include openai, anthropic, gemini, mistral, groq, ollama, fireworks, together, openrouter, cohere, deepseek, xai, azure, perplexity, sambanova, and others. Each preset is defined in a YAML file under resources/config/llm/presets/.
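A sketch of switching providers by preset name alone, with the request shape unchanged (import path is an assumption):

```php
<?php
use Cognesy\Polyglot\Inference\Inference;

$prompt = [['role' => 'user', 'content' => 'What is 2 + 2?']];

// Same request, different preset names.
$fromOpenAI = Inference::using('openai')->withMessages($prompt)->get();
$fromClaude = Inference::using('anthropic')->withMessages($prompt)->get();
$fromLocal  = Inference::using('ollama')->withMessages($prompt)->get();
```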
Configuring Presets
Each preset is a YAML file that defines the connection parameters for a provider. Presets are resolved from the following locations:

- config/llm/presets/ (your project root)
- packages/polyglot/resources/config/llm/presets/ (monorepo)
- vendor/cognesy/instructor-php/packages/polyglot/resources/config/llm/presets/
- vendor/cognesy/instructor-polyglot/resources/config/llm/presets/

To customize a preset, copy its YAML file to config/llm/presets/ at your project root and modify it as needed. Environment variables are referenced with the ${VAR_NAME} syntax.
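An illustrative sketch of what such a preset file might contain. The field names are assumptions, not the library's exact schema; refer to the shipped presets for the authoritative layout:

```yaml
# Illustrative shape only -- see the shipped presets for the exact schema.
driver: openai
apiUrl: https://api.openai.com/v1
apiKey: ${OPENAI_API_KEY}
defaultModel: gpt-4o-mini
maxTokens: 1024
```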
Selecting a Model
Each preset defines a default model, but you can override it per-request with withModel().

Immutability
Inference is immutable from the caller’s perspective. Every with...() method
returns a new instance, leaving the original unchanged. This makes it safe to
build a base configuration and derive specialized variants from it:
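A sketch of deriving variants from a shared base (import path and model names are illustrative):

```php
<?php
use Cognesy\Polyglot\Inference\Inference;

$base = Inference::using('openai')
    ->withOptions(['temperature' => 0.2]);

// Each derivation is a new instance; $base is never mutated.
$precise = $base->withModel('gpt-4o');
$fast    = $base->withModel('gpt-4o-mini');
```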
$precise and $fast inherit the temperature setting without affecting
each other or the $base instance.