InferenceRequest
InferenceRequest encapsulates everything needed for an LLM call. It stores the conversation messages, model selection, tools, response format, options, and caching/retry configuration.
Namespace: Cognesy\Polyglot\Inference\Data\InferenceRequest
Key Properties
| Property | Type | Description |
|---|---|---|
| id | InferenceRequestId | Unique identifier, auto-generated |
| createdAt | DateTimeImmutable | Timestamp of creation |
| updatedAt | DateTimeImmutable | Timestamp of last mutation |
| messages | Messages | The conversation messages |
| model | string | Model identifier |
| tools | ToolDefinitions | Tool/function definitions |
| toolChoice | ToolChoice | Tool selection strategy |
| responseFormat | ResponseFormat | Structured output format |
| options | array | Additional options (e.g. stream, max_tokens, temperature) |
| cachedContext | CachedInferenceContext | Shared context for prompt caching |
| responseCachePolicy | ResponseCachePolicy | Controls response caching behavior |
| retryPolicy | ?InferenceRetryPolicy | Retry configuration |
Reading Values
Boolean accessors report whether each field has been set: hasMessages(), hasModel(), hasTools(), hasToolChoice(), hasResponseFormat(), hasNonTextResponseFormat(), hasTextResponseFormat(), and hasOptions().
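A minimal sketch of how these accessors guard dispatch logic. The constructor shape shown here is an assumption for illustration; check the library's actual constructor or factory before relying on it:

```php
use Cognesy\Polyglot\Inference\Data\InferenceRequest;

// Hypothetical construction -- argument names are assumptions.
$request = new InferenceRequest(
    messages: [['role' => 'user', 'content' => 'Hello']],
    model: 'gpt-4o-mini',
);

if ($request->hasMessages() && $request->hasModel()) {
    // Safe to dispatch: both the conversation and the model are set.
}
if (!$request->hasTools()) {
    // No tool definitions attached -- plain text completion.
}
```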
Modifying a Request
All mutators return a new instance, preserving the original request ID and creation timestamp. The with(...) method allows setting multiple fields in a single call.
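A sketch of with(...) in use; the parameter names are assumptions, not verified signatures:

```php
// Hedged sketch: with(...) parameter names are assumed for illustration.
$updated = $request->with(
    model: 'gpt-4o',
    options: ['temperature' => 0.2, 'max_tokens' => 512],
);

// Mutators return a new instance; the original is untouched,
// and id/createdAt carry over to the copy.
assert($updated !== $request);
assert($updated->id == $request->id);
```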
Cached Context
The cached context mechanism separates the stable parts of a prompt (system messages, tool definitions, response format) from the dynamic parts (user messages). When withCacheApplied() is called, the cached context is merged into the request.
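An illustrative sketch of the stable/dynamic split; withCachedContext() and withMessages() are assumed mutator names, and the shape of CachedInferenceContext is not shown:

```php
// Illustrative only -- mutator names below are assumptions.
$request = $request
    ->withCachedContext($cachedContext)  // stable: system prompt, tools, format
    ->withMessages($userMessages);       // dynamic: per-turn user input

// Merge the cached (stable) parts into the request before dispatch,
// so the provider can reuse its prompt cache across turns.
$merged = $request->withCacheApplied();
```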
Serialization
Requests can be serialized to and from arrays for storage or transport.
PendingInference
PendingInference is a lazy handle for a single inference operation. It does not execute the request until you access the results. This enables the fluent Inference API to defer execution to the moment of consumption.
Namespace: Cognesy\Polyglot\Inference\PendingInference
Consuming Results
InferenceExecutionSession handles retry logic, event dispatching, and response caching. Once execution completes, the response is cached for the lifetime of the PendingInference instance.
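A sketch of the lazy-execution flow; the accessor names response() and content() are assumptions about the API surface:

```php
// Deferred execution sketch -- accessor names are assumptions.
$pending = $inference->create();   // builds the handle; no HTTP call yet

$response = $pending->response();  // first access triggers execution
$text = $response->content();      // later accesses reuse the cached response
```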
Important: Calling stream() on a non-streaming request will throw an InvalidArgumentException. Enable streaming via withStreaming(true) on the facade before calling create().
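A hedged sketch of the correct order of calls; the iteration API (responses()) is an assumption:

```php
// Streaming must be enabled up front -- stream() on a non-streaming
// request throws InvalidArgumentException.
$stream = $inference
    ->withStreaming(true)
    ->create()
    ->stream();

// Iteration method name is an assumption for illustration.
foreach ($stream->responses() as $partial) {
    echo $partial->contentDelta;   // incremental text as it arrives
}
```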
InferenceResponse
InferenceResponse is a final readonly value object that normalizes the provider’s result into a consistent shape.
Namespace: Cognesy\Polyglot\Inference\Data\InferenceResponse
Reading the Response
Boolean accessors: hasContent(), hasReasoningContent(), hasToolCalls(), and hasFinishReason().
JSON Extraction
The response provides convenience methods for extracting structured data. findToolCallJsonData() returns the arguments of a tool call; when there are multiple tool calls, it returns an array of all tool call data.
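A short sketch of guarding the extraction with the boolean accessor:

```php
// Hedged sketch: assumes the provider returned tool calls.
if ($response->hasToolCalls()) {
    $args = $response->findToolCallJsonData();
    // Single tool call: $args holds that call's arguments.
    // Multiple tool calls: $args is an array of all tool call data.
}
```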
Reasoning Content Fallback
Some providers embed reasoning in <think> tags within the content rather than in a dedicated field. The withReasoningContentFallbackFromContent() method handles this, extracting the reasoning content when <think> tags are present.
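A sketch of the fallback in action; the accessor names reasoningContent() and content() are assumptions:

```php
// Sketch: a provider that returns reasoning inline rather than in a
// dedicated field, e.g.
//   content: "<think>User greets; reply briefly.</think>Hi there!"
$normalized = $response->withReasoningContentFallbackFromContent();

// Accessor names below are assumptions for illustration.
$normalized->reasoningContent(); // extracted from the <think> block
$normalized->content();          // remaining user-facing text
```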
Finish Reason
The finishReason() method returns an InferenceFinishReason enum. The hasFinishedWithFailure() method checks whether the response ended with an error, content filter, or length limit.
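A minimal sketch using the two documented methods:

```php
$reason = $response->finishReason();   // InferenceFinishReason enum case

if ($response->hasFinishedWithFailure()) {
    // Ended with an error, a content-filter hit, or a length limit --
    // the output may be truncated or unusable; retry or surface it.
}
```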
Serialization
Responses support round-trip serialization.
PartialInferenceDelta
During streaming, the driver emits PartialInferenceDelta objects for each SSE event. Each delta carries only the incremental change from that event.
Namespace: Cognesy\Polyglot\Inference\Data\PartialInferenceDelta
Fields
| Field | Type | Description |
|---|---|---|
| contentDelta | string | Incremental text content |
| reasoningContentDelta | string | Incremental reasoning content |
| toolId | ToolCallId\|string\|null | Tool call identifier |
| toolName | string | Tool name (first delta of a tool call) |
| toolArgs | string | Incremental tool call arguments |
| finishReason | string | Set on the final delta |
| usage | ?InferenceUsage | Token usage (typically on the last delta) |
| usageIsCumulative | bool | Whether usage represents total (true) or incremental (false) |
| responseData | ?HttpResponse | Raw response data for this event |
| value | mixed | Optional provider-specific value |
InferenceStream accumulates these deltas internally using InferenceStreamState and assembles the final InferenceResponse when the stream completes. A VisibilityTracker ensures that only deltas with meaningful content changes are yielded to the caller.
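A conceptual sketch of the accumulation that InferenceStreamState performs internally, concatenating per-event deltas into final response fields:

```php
// Conceptual only -- the real accumulation lives in InferenceStreamState.
$content = '';
$reasoning = '';
foreach ($deltas as $delta) {          // iterable of PartialInferenceDelta
    $content   .= $delta->contentDelta;
    $reasoning .= $delta->reasoningContentDelta;
    if ($delta->finishReason !== '') { // set only on the final delta
        break;
    }
}
```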
InferenceUsage
The InferenceUsage object tracks token consumption across several categories.
Namespace: Cognesy\Polyglot\Inference\Data\InferenceUsage
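As a rough sketch of what such a usage object carries; the property names below are assumptions for illustration, not verified fields:

```php
// Property names are assumptions -- check InferenceUsage's actual fields.
echo $usage->inputTokens;      // prompt-side tokens
echo $usage->outputTokens;     // completion-side tokens
echo $usage->reasoningTokens;  // reasoning tokens, where the provider reports them
```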
Cost Calculation
Cost is calculated externally using a calculator rather than through methods on the usage object. Pricing is specified in USD per 1 million tokens.
Accumulation
Usage and cost can be accumulated across multiple requests.
Embeddings Data Objects
EmbeddingsRequest
Holds the input texts, model, options, and retry policy for an embeddings call.
EmbeddingsResponse
Normalizes the provider’s embeddings result.
PendingEmbeddings
A lazy handle similar to PendingInference. Calling get() triggers the HTTP request and returns an EmbeddingsResponse. The response is cached after the first call. Retry logic is handled internally based on the EmbeddingsRetryPolicy attached to the request, using the same exponential backoff pattern as inference retries.
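A hedged sketch of the lazy embeddings flow; the facade construction and the withInputs()/toValuesArray() names are assumptions:

```php
// Sketch only -- method names other than get() are assumptions.
$pending = $embeddings
    ->withInputs(['first text', 'second text'])
    ->create();                        // builds the handle; no HTTP yet

$response = $pending->get();           // executes (with retries), caches result
$vectors  = $response->toValuesArray(); // accessor name is an assumption
```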