The Inference class is the main facade for interacting with LLM APIs. It provides a clean, immutable interface for chat completions, tool calling, JSON output generation, and streaming — all through a consistent API regardless of the underlying provider.

Quick Start

The simplest way to generate text is with a single chained call:
<?php
use Cognesy\Polyglot\Inference\Inference;
use Cognesy\Messages\Messages;

$answer = Inference::using('openai')
    ->withMessages(Messages::fromString('What is the capital of France?'))
    ->get();
// @doctest id="ce0a"
The using() static method resolves a named preset from your configuration, while withMessages() accepts a Messages object. Use Messages::fromString() to wrap a plain text prompt, or Messages::fromArray() to convert an array of role/content pairs. The get() method executes the request and returns the response content as a string.

Creating an Inference Instance

For more control over the lifecycle, create an instance directly. Without arguments, Inference uses a sensible default configuration:
<?php
use Cognesy\Polyglot\Inference\Inference;
use Cognesy\Messages\Messages;

$inference = new Inference();

$answer = $inference
    ->withMessages(Messages::fromArray([['role' => 'user', 'content' => 'Explain event sourcing briefly.']]))
    ->get();
// @doctest id="8016"
You may also use the with() method, which accepts all request parameters at once:
<?php
use Cognesy\Polyglot\Inference\Inference;
use Cognesy\Messages\Messages;

$answer = (new Inference)->with(
    messages: Messages::fromString('What is the capital of France?'),
)->get();
// @doctest id="4a10"

Core Request Fields

The with() method and its individual with...() counterparts allow you to set every aspect of the inference request:
| Field | Method | Description |
|-------|--------|-------------|
| messages | withMessages() | The conversation messages |
| model | withModel() | Override the model defined in the preset |
| tools | withTools() | Tool/function definitions (ToolDefinitions) for the model to call |
| toolChoice | withToolChoice() | Control which tool the model should use (ToolChoice) |
| responseFormat | withResponseFormat() | Request structured output (ResponseFormat) |
| options | withOptions() | Provider-specific parameters (temperature, max_tokens, etc.) |
| maxTokens | withMaxTokens() | Shorthand for setting the maximum output token count |

Execution Paths

Once you have configured a request, choose how to execute it:
| Method | Returns | Use case |
|--------|---------|----------|
| get() | string | Quick text extraction |
| response() | InferenceResponse | Full response with metadata, usage stats, and tool calls |
| asJson() | string | Extract JSON from the response content |
| asJsonData() | array | Decode JSON from the response into a PHP array |
| asToolCallJson() | string | Extract tool call arguments as a JSON string |
| asToolCallJsonData() | array | Decode tool call arguments into a PHP array |
| stream() | InferenceStream | Stream partial deltas as they arrive |

Multi-Turn Conversations

For multi-turn conversations, pass an array of messages with role annotations:
<?php
use Cognesy\Polyglot\Inference\Inference;
use Cognesy\Messages\Messages;

$messages = Messages::fromArray([
    ['role' => 'user', 'content' => 'Can you help me with a math problem?'],
    ['role' => 'assistant', 'content' => 'Of course! What would you like to solve?'],
    ['role' => 'user', 'content' => 'What is the square root of 144?'],
]);

$answer = Inference::using('openai')
    ->withMessages($messages)
    ->get();
// @doctest id="69fb"

Customizing Request Options

Provider-specific parameters such as temperature, max_tokens, or top_p are passed through the options array. Most providers follow the OpenAI-compatible parameter conventions:
<?php
use Cognesy\Polyglot\Inference\Inference;
use Cognesy\Messages\Messages;

$answer = Inference::using('openai')
    ->withMessages(Messages::fromString('Write a short poem about coding.'))
    ->withModel('gpt-4o')
    ->withOptions(['temperature' => 0.7, 'max_tokens' => 200])
    ->get();
// @doctest id="cae8"
You can also set all parameters at once via the with() convenience method:
<?php
use Cognesy\Polyglot\Inference\Inference;
use Cognesy\Messages\Messages;

$answer = Inference::using('openai')->with(
    messages: Messages::fromString('Write a haiku about PHP.'),
    model: 'gpt-4o',
    options: ['temperature' => 0.9, 'max_tokens' => 100],
)->get();
// @doctest id="be86"

Streaming Responses

Streaming lets you display partial output as it arrives from the model, creating a more responsive user experience. Call stream() to get an InferenceStream, then iterate over deltas:
<?php
use Cognesy\Polyglot\Inference\Inference;
use Cognesy\Messages\Messages;

$stream = Inference::using('openai')
    ->withMessages(Messages::fromString('Describe the capital of Brazil.'))
    ->withMaxTokens(512)
    ->stream();

foreach ($stream->deltas() as $delta) {
    echo $delta->contentDelta;
}
// @doctest id="25fd"
Each PartialInferenceDelta exposes the contentDelta string for the incremental text fragment. The stream also provides functional-style helpers — map(), filter(), and reduce() — for processing deltas inline. You can also register a callback to handle each delta as it arrives:
<?php
use Cognesy\Polyglot\Inference\Inference;
use Cognesy\Messages\Messages;

$stream = Inference::using('openai')
    ->withMessages(Messages::fromString('Tell me a story.'))
    ->stream();

$stream->onDelta(fn($delta) => print($delta->contentDelta));

// Drain the stream to trigger callbacks
$stream->all();
// @doctest id="cffc"
After the stream completes, call final() to retrieve the assembled InferenceResponse with full content and usage statistics.
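Putting these pieces together, a minimal sketch that streams output and then reads usage statistics from the assembled response might look like this:

```php
<?php
use Cognesy\Polyglot\Inference\Inference;
use Cognesy\Messages\Messages;

$stream = Inference::using('openai')
    ->withMessages(Messages::fromString('Summarize REST in two sentences.'))
    ->stream();

// Print partial output as it arrives
foreach ($stream->deltas() as $delta) {
    echo $delta->contentDelta;
}

// The stream is now drained; final() returns the assembled InferenceResponse
$response = $stream->final();
$usage = $response->usage();
```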

Working with the Full Response

When you need more than just text, use response() to access the complete InferenceResponse object:
<?php
use Cognesy\Polyglot\Inference\Inference;
use Cognesy\Messages\Messages;

$response = Inference::using('openai')
    ->withMessages(Messages::fromString('What is quantum computing?'))
    ->response();

$text = $response->content();
$usage = $response->usage();
$finishReason = $response->finishReason();
// @doctest id="3645"
The response object provides access to content, reasoning content (for models that support chain-of-thought), tool calls, token usage statistics, and the raw HTTP response data.

Switching Between Providers

Polyglot ships with YAML-based presets for many providers. Switching between them is a single method call:
<?php
use Cognesy\Polyglot\Inference\Inference;
use Cognesy\Messages\Messages;

$question = Messages::fromString('What is the capital of France?');

$openai = Inference::using('openai')->withMessages($question)->get();
$anthropic = Inference::using('anthropic')->withMessages($question)->get();
$gemini = Inference::using('gemini')->withMessages($question)->get();
// @doctest id="b44a"
Available presets include openai, anthropic, gemini, mistral, groq, ollama, fireworks, together, openrouter, cohere, deepseek, xai, azure, perplexity, sambanova, and others. Each preset is defined in a YAML file under resources/config/llm/presets/.

Configuring Presets

Each preset is a YAML file that defines the connection parameters for a provider. For example, the OpenAI preset:
driver: openai
apiUrl: 'https://api.openai.com/v1'
apiKey: '${OPENAI_API_KEY}'
endpoint: /chat/completions
model: gpt-4.1-nano
maxTokens: 1024
contextLength: 1000000
maxOutputLength: 16384
# @doctest id="1a8c"
Polyglot resolves presets from several locations, searched in order:
  1. config/llm/presets/ (your project root)
  2. packages/polyglot/resources/config/llm/presets/ (monorepo)
  3. vendor/cognesy/instructor-php/packages/polyglot/resources/config/llm/presets/
  4. vendor/cognesy/instructor-polyglot/resources/config/llm/presets/
To customize a provider, copy the relevant YAML file into config/llm/presets/ at your project root and modify it as needed. Environment variables are referenced with the ${VAR_NAME} syntax.
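For example, a customized copy of the OpenAI preset saved as config/llm/presets/openai.yaml could override the default model and token limit. The values below are illustrative, not recommendations:

```yaml
driver: openai
apiUrl: 'https://api.openai.com/v1'
apiKey: '${OPENAI_API_KEY}'
endpoint: /chat/completions
model: gpt-4o
maxTokens: 4096
contextLength: 128000
maxOutputLength: 16384
```

Because project-root presets are searched first, this file takes precedence over the package defaults.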

Selecting a Model

Each preset defines a default model, but you can override it per-request:
<?php
use Cognesy\Polyglot\Inference\Inference;
use Cognesy\Messages\Messages;

$answer = Inference::using('openai')
    ->withMessages(Messages::fromString('Explain machine learning in one sentence.'))
    ->withModel('gpt-4o')
    ->get();
// @doctest id="9256"

Immutability

Inference is immutable from the caller’s perspective. Every with...() method returns a new instance, leaving the original unchanged. This makes it safe to build a base configuration and derive specialized variants from it:
<?php
use Cognesy\Polyglot\Inference\Inference;

$base = Inference::using('openai')->withOptions(['temperature' => 0.3]);

$precise = $base->withModel('gpt-4o');
$fast = $base->withModel('gpt-4.1-mini');
// @doctest id="9bb9"
Both $precise and $fast inherit the temperature setting without affecting each other or the $base instance.