Skip to main content
Polyglot’s PendingInference class represents a pending inference execution. It is returned by the Inference class when you call the create() method. The request is not sent to the underlying LLM until you actually access the response data, making the object a lazy handle over a single inference operation.

Retrieving Text Content

The simplest way to get the model’s response is the get() method, which returns the response content as a plain string:
<?php

use Cognesy\Polyglot\Inference\Inference;
use Cognesy\Messages\Messages;

$pending = Inference::using('openai')
    ->withMessages(Messages::fromString('What is the capital of France?'))
    ->create();

// Get the response as plain text
$text = $pending->get();
echo $text; // "The capital of France is Paris."
// @doctest id="b2e6"

Retrieving JSON Data

When you request a JSON response format, use asJsonData() to decode the content directly into an associative array, or asJson() to get the raw JSON string:
<?php

use Cognesy\Polyglot\Inference\Inference;
use Cognesy\Messages\Messages;
use Cognesy\Polyglot\Inference\Data\ResponseFormat;

$pending = Inference::using('openai')
    ->withMessages(Messages::fromString('Return JSON with a single "status" field.'))
    ->withResponseFormat(ResponseFormat::jsonObject())
    ->create();

// Decode response content as a PHP array
$data = $pending->asJsonData();
echo $data['status'];

// Or get the raw JSON string
$json = $pending->asJson();
// @doctest id="ed92"

Working with InferenceResponse

For full access to every detail of the model’s reply, call response() to get the normalized InferenceResponse object:
<?php

use Cognesy\Polyglot\Inference\Inference;
use Cognesy\Messages\Messages;

$pending = Inference::using('openai')
    ->withMessages(Messages::fromString('What is the capital of France?'))
    ->create();

$response = $pending->response();

// Content and reasoning
echo "Content: " . $response->content() . "\n";
echo "Reasoning: " . $response->reasoningContent() . "\n";

// Finish reason (returns InferenceFinishReason enum)
echo "Finish reason: " . $response->finishReason()->value . "\n";

// Token usage
$usage = $response->usage();
echo "Input tokens: " . $usage->input() . "\n";
echo "Output tokens: " . $usage->output() . "\n";
echo "Total tokens: " . $usage->total() . "\n";
echo "Cache tokens: " . $usage->cache() . "\n";

// Raw HTTP response data
$httpResponse = $response->responseData();
// @doctest id="286d"

Available InferenceResponse Methods

MethodReturnsDescription
content()stringThe model’s text output
reasoningContent()stringChain-of-thought / thinking content (if supported)
toolCalls()ToolCallsCollection of tool calls made by the model
usage()InferenceUsageToken counts for the request
finishReason()InferenceFinishReasonWhy the model stopped generating
responseData()HttpResponseThe underlying raw HTTP response
hasContent()boolWhether the response contains text content
hasToolCalls()boolWhether the model made any tool calls
hasReasoningContent()boolWhether reasoning / thinking content is present
isPartial()boolWhether this is a partial (streaming) response

Finish Reasons

The finishReason() method returns an InferenceFinishReason enum. Polyglot normalizes the many vendor-specific strings into a consistent set of values:
ValueMeaning
StopThe model finished naturally
LengthOutput was truncated due to token limits
ToolCallsThe model wants to invoke a tool
ContentFilterContent was blocked by safety filters
ErrorAn error occurred during generation
OtherAn unrecognized finish reason

Token Usage

The InferenceUsage object provides detailed token breakdowns including cache and reasoning tokens:
<?php

$usage = $response->usage();

$usage->inputTokens;       // Input / prompt tokens
$usage->outputTokens;      // Output / completion tokens
$usage->cacheWriteTokens;  // Tokens written to cache
$usage->cacheReadTokens;   // Tokens read from cache
$usage->reasoningTokens;   // Reasoning / thinking tokens

// Convenience accessors
$usage->input();   // Same as inputTokens
$usage->output();  // outputTokens + reasoningTokens
$usage->cache();   // cacheWriteTokens + cacheReadTokens
$usage->total();   // Sum of all token counts
// @doctest id="ba84"

Handling Tool Calls

When the model decides to invoke a tool, you can extract the tool call data using asToolCallJsonData() on PendingInference, or inspect the ToolCalls collection on the response object:
<?php

use Cognesy\Polyglot\Inference\Inference;
use Cognesy\Messages\Messages;
use Cognesy\Polyglot\Inference\Data\ToolDefinitions;
use Cognesy\Polyglot\Inference\Data\ToolChoice;

$tools = ToolDefinitions::fromArray([
    [
        'type' => 'function',
        'function' => [
            'name' => 'get_weather',
            'description' => 'Get the current weather in a location',
            'parameters' => [
                'type' => 'object',
                'properties' => [
                    'location' => [
                        'type' => 'string',
                        'description' => 'The city and state, e.g. San Francisco, CA',
                    ],
                    'unit' => [
                        'type' => 'string',
                        'enum' => ['celsius', 'fahrenheit'],
                    ],
                ],
                'required' => ['location'],
            ],
        ],
    ],
]);

$response = Inference::using('openai')
    ->with(
        messages: Messages::fromString('What is the weather in Paris?'),
        tools: $tools,
        toolChoice: ToolChoice::auto(),
    )
    ->response();

if ($response->hasToolCalls()) {
    $toolCalls = $response->toolCalls();

    foreach ($toolCalls->all() as $call) {
        echo "Tool: " . $call->name() . "\n";
        echo "Args: " . $call->argsAsJson() . "\n";

        // Access individual argument values
        $location = $call->value('location');
        $unit = $call->value('unit', 'celsius');
    }
}
// @doctest id="c24b"

Quick JSON Extraction from Tool Calls

If you just need the arguments as a PHP array without inspecting the full response, use the shorthand on PendingInference:
<?php

// Single tool call: returns the arguments array directly
$args = $pending->asToolCallJsonData();

// Or as a JSON string
$json = $pending->asToolCallJson();
// @doctest id="b0d8"
Note: When a single tool call is present, asToolCallJsonData() returns that call’s arguments as an array. When multiple tool calls are present, it returns an array of all tool call data.

Streaming Responses

For long-running completions, streaming lets you display output as it arrives. Call stream() to get an InferenceStream and consume deltas:
<?php

use Cognesy\Polyglot\Inference\Inference;
use Cognesy\Messages\Messages;

$stream = Inference::using('openai')
    ->withMessages(Messages::fromString('Write a short story about a robot.'))
    ->stream();

foreach ($stream->deltas() as $delta) {
    echo $delta->contentDelta;
}

// After iteration, get the finalized response
$finalResponse = $stream->final();
echo "\n\nTokens used: " . $finalResponse->usage()->total();
// @doctest id="d9e4"

The PartialInferenceDelta Object

Each delta yielded during streaming is a PartialInferenceDelta with the following public properties:
PropertyTypeDescription
contentDeltastringNew text content in this chunk
reasoningContentDeltastringNew reasoning content in this chunk
toolIdToolCallId|string|nullTool call ID
toolNamestringTool name (when streaming tool calls)
toolArgsstringPartial tool arguments JSON
finishReasonstringSet on the final delta
usage?InferenceUsageToken usage (typically on the final delta)
usageIsCumulativeboolWhether usage counts are cumulative

Stream Methods

The InferenceStream class provides several ways to consume and transform the delta stream:
<?php

// Iterate over visible deltas
foreach ($stream->deltas() as $delta) { /* ... */ }

// Transform each delta
foreach ($stream->map(fn($d) => strtoupper($d->contentDelta)) as $text) {
    echo $text;
}

// Reduce to a single value
$fullText = $stream->reduce(
    fn(string $carry, $delta) => $carry . $delta->contentDelta,
    ''
);

// Filter deltas
foreach ($stream->filter(fn($d) => $d->contentDelta !== '') as $delta) {
    echo $delta->contentDelta;
}

// Collect all deltas into an array
$allDeltas = $stream->all();

// Get the finalized response (drains the stream if needed)
$response = $stream->final();
// @doctest id="f4c6"

Using the onDelta Callback

Instead of iterating manually, you can register a callback that fires for each visible delta:
<?php

use Cognesy\Polyglot\Inference\Inference;
use Cognesy\Messages\Messages;

$stream = Inference::using('openai')
    ->withMessages(Messages::fromString('Explain queues in simple terms.'))
    ->stream();

$stream->onDelta(function ($delta) {
    echo $delta->contentDelta;

    // Flush output for real-time display
    if (ob_get_level() > 0) {
        ob_flush();
        flush();
    }
});

// Drain the stream to trigger all callbacks
$response = $stream->final();
// @doctest id="110e"

Stream Lifecycle

The stream is one-shot: once deltas() has been fully iterated, calling it again throws a LogicException. If you need to replay the response, work with the finalized InferenceResponse returned by $stream->final(). Calling final() before the stream is exhausted will automatically drain all remaining deltas, ensuring the finalized response is complete.

Checking for Streaming Mode

If you need to branch your code based on whether a request was configured for streaming, use the isStreamed() method on PendingInference:
<?php

$pending = Inference::using('openai')
    ->withMessages(Messages::fromString('Hello!'))
    ->withStreaming()
    ->create();

if ($pending->isStreamed()) {
    foreach ($pending->stream()->deltas() as $delta) {
        echo $delta->contentDelta;
    }
} else {
    echo $pending->get();
}
// @doctest id="7e3a"