Most applications only need a small part of the package: the Inference and Embeddings facades. They provide a fluent, immutable interface that handles provider differences behind the scenes.

Inference

The Inference class is the main entry point for LLM interactions. It encapsulates provider complexities behind a unified, fluent interface. Fully qualified class name: Cognesy\Polyglot\Inference\Inference

Creating an Instance

Polyglot offers several ways to create an Inference instance depending on how much control you need:
use Cognesy\Polyglot\Inference\Inference;

// From a named preset (resolves config from YAML files)
$inference = Inference::using('openai');

// From an explicit config object
$inference = Inference::fromConfig($llmConfig);

// From a provider object (useful for overrides)
$inference = Inference::fromProvider($provider);

// From an already-built runtime
$inference = Inference::fromRuntime($runtime);
// @doctest id="4b63"
You can also pass a custom driver registry to using() or fromConfig() if you have registered custom drivers:
$inference = Inference::using('my-provider', drivers: $customRegistry);
// @doctest id="beff"

Building a Request

All request methods return a new immutable instance, so you can safely branch configurations:
$base = Inference::using('openai')
    ->withModel('gpt-4.1-nano')
    ->withMaxTokens(1024);

// Branch into two different requests from the same base
$response1 = $base->withMessages(Messages::fromString('Explain PHP traits.'))->get();
$response2 = $base->withMessages(Messages::fromString('Explain PHP enums.'))->get();
// @doctest id="eb5a"
Available request methods:
| Method | Purpose |
| --- | --- |
| with(...) | Set multiple parameters at once (messages, model, tools, toolChoice, responseFormat, options) |
| withMessages(...) | Set the conversation messages |
| withModel(...) | Override the model |
| withMaxTokens(...) | Set maximum output tokens |
| withTools(...) | Provide tool/function definitions |
| withToolChoice(...) | Control tool selection behavior |
| withResponseFormat(...) | Request structured output format |
| withOptions(...) | Pass additional provider options (merged with existing) |
| withStreaming(...) | Enable or disable streaming |
| withCachedContext(...) | Set cached context (messages, tools, toolChoice, responseFormat) |
| withRetryPolicy(...) | Configure retry behavior via InferenceRetryPolicy |
| withResponseCachePolicy(...) | Control response caching via ResponseCachePolicy |
| withRequest(...) | Set all parameters from an existing InferenceRequest |
| withRuntime(...) | Swap the underlying runtime |
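The less common builder methods compose the same way as the basic ones. The sketch below combines cached context with a retry policy; the named parameters of withCachedContext() are an assumption inferred from the table above, and $systemPrompt and $retryPolicy are hypothetical placeholders you would supply yourself (an InferenceRetryPolicy instance in the latter case), so verify both against your installed version:

```php
use Cognesy\Polyglot\Inference\Inference;

// Cache a shared prompt prefix once, then reuse it across requests;
// messages set later via withMessages() are appended after the cached part.
$base = Inference::using('openai')
    ->withCachedContext(messages: Messages::fromString($systemPrompt))
    ->withRetryPolicy($retryPolicy)
    ->withStreaming(false);

// Each call reuses the cached context; only the question changes.
$answer = $base
    ->withMessages(Messages::fromString('Summarize the cached document.'))
    ->get();
```

Because every builder method returns a new immutable instance, attaching the cached context and retry policy to $base configures them once for every request branched from it.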

Executing and Reading Results

Shortcuts execute the request and return results directly:
// Get the text content
$text = $inference->withMessages(Messages::fromString('Hello'))->get();

// Get the full response object
$response = $inference->withMessages(Messages::fromString('Hello'))->response();

// Parse JSON from the response content
$data = $inference->withMessages(Messages::fromString('Return JSON'))->asJsonData();

// Get JSON as a string
$json = $inference->withMessages(Messages::fromString('Return JSON'))->asJson();

// Parse tool call arguments as JSON array
$args = $inference->withMessages(Messages::fromString('Call a tool'))->asToolCallJsonData();

// Get tool call arguments as JSON string
$json = $inference->withMessages(Messages::fromString('Call a tool'))->asToolCallJson();

// Stream the response
$stream = $inference->withMessages(Messages::fromString('Hello'))->stream();
// @doctest id="33a8"
For lower-level control, create() returns a PendingInference without triggering execution:
$pending = $inference->withMessages(Messages::fromString('Hello'))->create();

// Then choose how to consume it
$text = $pending->get();
$response = $pending->response();
$stream = $pending->stream();
// @doctest id="f7b4"

Working with Responses

The InferenceResponse object provides access to all parts of the provider’s response:
$response = $inference->withMessages(Messages::fromString('Hello'))->response();

$response->content();          // string -- the text content
$response->reasoningContent(); // string -- reasoning/thinking content (if supported)
$response->toolCalls();        // ToolCalls -- tool call collection
$response->usage();            // InferenceUsage -- token counts
$response->finishReason();     // InferenceFinishReason enum
$response->responseData();     // HttpResponse -- raw provider response
$response->isPartial();        // bool -- true for partial streaming responses

// Convenience checks
$response->hasContent();
$response->hasReasoningContent();
$response->hasToolCalls();
$response->hasFinishReason();

// JSON extraction
$response->findJsonData();         // Json object from content
$response->findToolCallJsonData(); // Json object from tool call args
// @doctest id="280a"

Working with Streams

The InferenceStream provides several ways to consume streaming data:
$stream = $inference->withMessages(Messages::fromString('Hello'))->stream();

// Iterate over visible deltas
foreach ($stream->deltas() as $delta) {
    echo $delta->contentDelta;          // incremental text
    echo $delta->reasoningContentDelta; // incremental reasoning (if any)
    echo $delta->toolName;              // tool name (if starting a tool call)
    echo $delta->toolArgs;              // tool arguments fragment
}

// Get the final assembled response
$finalResponse = $stream->final();

// Register a callback for each delta
$stream->onDelta(function (PartialInferenceDelta $delta): void {
    echo $delta->contentDelta;
});

// Functional-style processing
$texts = $stream->map(fn($d) => $d->contentDelta);
$full = $stream->reduce(fn($carry, $d) => $carry . $d->contentDelta, '');
$toolOnly = $stream->filter(fn($d) => $d->toolName !== '');

// Collect all deltas at once
$allDeltas = $stream->all();
// @doctest id="d6b6"

Embeddings

The Embeddings class provides a unified interface for generating vector embeddings across providers. Fully qualified class name: Cognesy\Polyglot\Embeddings\Embeddings

Creating an Instance

use Cognesy\Polyglot\Embeddings\Embeddings;

// From a named preset
$embeddings = Embeddings::using('openai');

// From an explicit config
$embeddings = Embeddings::fromConfig($embeddingsConfig);

// From a provider object
$embeddings = Embeddings::fromProvider($provider);

// From a runtime
$embeddings = Embeddings::fromRuntime($runtime);
// @doctest id="3455"

Building a Request

$embeddings = Embeddings::using('openai')
    ->withInputs('The quick brown fox')
    ->withModel('text-embedding-3-small')
    ->withOptions(['dimensions' => 256]);
// @doctest id="9223"
Available request methods:
| Method | Purpose |
| --- | --- |
| with(...) | Set input, options, and model at once |
| withInputs(...) | Set input text(s) to embed (string or array of strings) |
| withModel(...) | Override the model |
| withOptions(...) | Pass additional provider options |
| withRetryPolicy(...) | Configure retry behavior via EmbeddingsRetryPolicy |
| withRequest(...) | Set all parameters from an existing EmbeddingsRequest |
| withRuntime(...) | Swap the underlying runtime |
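The bulk setter with(...) collapses the chained calls from the earlier example into one. The parameter names below are an assumption based on the table above ("input, options, and model"), so check them against your installed version:

```php
use Cognesy\Polyglot\Embeddings\Embeddings;

// Equivalent to chaining withInputs(), withOptions(), and withModel()
$embeddings = Embeddings::using('openai')
    ->with(
        input: ['first document', 'second document'],
        options: ['dimensions' => 256],
        model: 'text-embedding-3-small',
    );

$vectors = $embeddings->vectors();
```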

Executing and Reading Results

// Get the full response
$response = $embeddings->withInputs('Hello world')->get();

// Get just the vector objects
$vectors = $embeddings->withInputs(['text one', 'text two'])->vectors();

// Get the first vector
$vector = $embeddings->withInputs('Hello world')->first();
// @doctest id="5eb1"
For lower-level control, create() returns a PendingEmbeddings:
$pending = $embeddings->withInputs('Hello world')->create();
$response = $pending->get();
// @doctest id="c34c"

Working with Responses

The EmbeddingsResponse object provides access to the embedding vectors:
$response = $embeddings->withInputs(['Hello', 'World'])->get();

$response->vectors();       // Vector[] -- all embedding vectors
$response->all();           // Vector[] -- alias for vectors()
$response->first();         // ?Vector -- first vector
$response->last();          // ?Vector -- last vector
$response->usage();         // EmbeddingsUsage -- token counts
$response->toValuesArray(); // array -- raw float arrays

// Split vectors at a given index
[$before, $after] = $response->split(1);
// @doctest id="ef8d"
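A common next step with the raw float arrays from toValuesArray() is similarity scoring, e.g. ranking documents against a query. This helper is a plain-PHP sketch, independent of the library (the function name cosineSimilarity is our own, not part of the package):

```php
<?php
// Cosine similarity between two embedding vectors of equal length,
// such as the float arrays returned by $response->toValuesArray().
function cosineSimilarity(array $a, array $b): float {
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $x) {
        $dot   += $x * $b[$i];   // accumulate the dot product
        $normA += $x * $x;       // squared magnitude of $a
        $normB += $b[$i] * $b[$i]; // squared magnitude of $b
    }
    return $dot / (sqrt($normA) * sqrt($normB));
}
```

Identical vectors score 1.0, orthogonal vectors 0.0; scores near 1.0 indicate semantically similar inputs.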

Registering Custom Drivers

Inference Drivers

Custom inference drivers are registered through the InferenceDriverRegistry and passed to the runtime:
use Cognesy\Polyglot\Inference\Creation\BundledInferenceDrivers;

$registry = BundledInferenceDrivers::registry()
    ->withDriver('my-provider', MyCustomDriver::class);

$inference = Inference::using('my-provider', drivers: $registry);
// @doctest id="52fc"

Embeddings Drivers

Custom embeddings drivers are registered through the EmbeddingsDriverRegistry and passed to the runtime:
use Cognesy\Polyglot\Embeddings\Creation\BundledEmbeddingsDrivers;

$registry = BundledEmbeddingsDrivers::registry()
    ->withDriver('my-provider', MyCustomDriver::class);

$runtime = EmbeddingsRuntime::fromConfig($config, drivers: $registry);
$embeddings = Embeddings::fromRuntime($runtime);
// @doctest id="b1e6"
See the Providers page for details on driver registration and factory patterns.