InferenceRequest
InferenceRequest is the canonical payload passed to inference drivers.
Accessors:
- messages()
- model()
- tools()
- toolChoice()
- responseFormat()
- options()
- outputMode()
- cachedContext()
- responseCachePolicy()
- retryPolicy()
With-style mutators:
- withMessages(...)
- withModel(...)
- withStreaming(...)
- withTools(...)
- withToolChoice(...)
- withResponseFormat(...)
- withOptions(...)
- withOutputMode(...)
- withCachedContext(...)
- withResponseCachePolicy(...)
- withRetryPolicy(...)
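The with* naming convention suggests copy-on-write mutators: each call returns a new request rather than modifying the receiver. A minimal sketch of that pattern in Python, with hypothetical field names and snake_case method names adapted from the list above:

```python
from dataclasses import dataclass, replace

# Hypothetical sketch: a frozen dataclass models an immutable request;
# each with_*() returns a modified copy, leaving the original untouched.
@dataclass(frozen=True)
class InferenceRequest:
    messages: tuple = ()
    model: str = ""
    streaming: bool = False

    def with_messages(self, *messages):
        return replace(self, messages=tuple(messages))

    def with_model(self, model):
        return replace(self, model=model)

    def with_streaming(self, streaming=True):
        return replace(self, streaming=streaming)

base = InferenceRequest().with_model("model-x")
streamed = base.with_streaming(True)
assert base.streaming is False    # original request is unchanged
assert streamed.model == "model-x"  # shared fields carry over to the copy
```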
PendingInference
PendingInference defers execution until you read output.
- get(): text content
- asJson(), asJsonData()
- response(): full InferenceResponse
- stream(): InferenceStream (only when streaming is enabled)
- isStreamed()
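The deferred-execution idea can be sketched as a thin wrapper that runs the underlying call only on first read and caches the result. Names and the response shape here are hypothetical, not the library's actual implementation:

```python
import json

# Hypothetical sketch of deferred execution: nothing runs until the
# first accessor is called, and the call is made exactly once.
class PendingInference:
    def __init__(self, execute):
        self._execute = execute    # callable performing the actual inference
        self._response = None

    def response(self):
        if self._response is None:
            self._response = self._execute()
        return self._response

    def get(self):
        return self.response()["content"]

    def as_json(self):
        return json.loads(self.get())

calls = []
pending = PendingInference(lambda: calls.append("hit") or {"content": '{"ok": true}'})
assert calls == []                       # nothing executed yet
assert pending.as_json() == {"ok": True}
assert pending.get() == '{"ok": true}'
assert calls == ["hit"]                  # executed exactly once, then cached
```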
InferenceStream
InferenceStream exposes partial snapshots during streaming.
- responses(): generator of PartialInferenceResponse
- all(): collect all partial responses
- final(): materialized final InferenceResponse
- onPartialResponse(callable)
- map(...), filter(...), reduce(...)
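The relationship between responses(), all(), and final() can be sketched as follows: the generator yields snapshots as they arrive, and final() drains the stream and returns the last one. This is an illustrative model with assumed names, not the library's code:

```python
# Hypothetical sketch of a stream of partial snapshots.
class InferenceStream:
    def __init__(self, partials):
        self._partials = partials   # iterable of partial snapshots
        self._last = None

    def responses(self):
        # Yield each partial snapshot as it arrives.
        for partial in self._partials:
            self._last = partial
            yield partial

    def all(self):
        # Collect every partial snapshot into a list.
        return list(self.responses())

    def final(self):
        # Drain the stream; the last snapshot is the complete response.
        for _ in self.responses():
            pass
        return self._last

stream = InferenceStream(iter(["He", "Hello", "Hello!"]))
assert stream.final() == "Hello!"
```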
PartialInferenceResponse
Each streamed chunk is represented as a cumulative snapshot:
- contentDelta / content()
- reasoningContentDelta / reasoningContent()
- toolId, toolName, toolArgs, toolCalls()
- finishReason()
- usage()
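The delta/cumulative split above can be sketched like this: each snapshot carries only the newest fragment in its delta field, while the accessor returns everything received so far. A minimal illustration with hypothetical names:

```python
# Hypothetical sketch: content_delta holds the latest fragment,
# content() returns the cumulative text up to this snapshot.
class PartialInferenceResponse:
    def __init__(self, content_delta, accumulated):
        self.content_delta = content_delta
        self._accumulated = accumulated

    def content(self):
        return self._accumulated

def accumulate(deltas):
    text = ""
    for delta in deltas:
        text += delta
        yield PartialInferenceResponse(delta, text)

parts = list(accumulate(["Hel", "lo", "!"]))
assert parts[1].content_delta == "lo"     # just the new fragment
assert parts[1].content() == "Hello"      # cumulative so far
assert parts[-1].content() == "Hello!"    # final snapshot is complete
```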
EmbeddingsRequest and EmbeddingsResponse
Embeddings follow the same deferred pattern. EmbeddingsResponse also provides last(), split(...), toValuesArray(), and toArray().
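A sketch of two of those helpers, under assumed semantics (last() returning the final vector, toValuesArray() returning the raw vectors as plain arrays); the actual behavior may differ:

```python
# Hypothetical sketch of EmbeddingsResponse helpers. The semantics of
# last() and toValuesArray() are assumptions, not documented behavior.
class EmbeddingsResponse:
    def __init__(self, vectors):
        self._vectors = vectors

    def last(self):
        # Assumed: the most recent / final embedding vector.
        return self._vectors[-1]

    def to_values_array(self):
        # Assumed: raw numeric vectors as plain lists.
        return [list(v) for v in self._vectors]

resp = EmbeddingsResponse([(0.1, 0.2), (0.3, 0.4)])
assert resp.last() == (0.3, 0.4)
assert resp.to_values_array() == [[0.1, 0.2], [0.3, 0.4]]
```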
Identity Types
IDs are value objects serialized as strings at boundaries:
- InferenceRequestId
- InferenceExecutionId
- InferenceAttemptId
- InferenceResponseId
- PartialInferenceResponseId
- ToolCallId
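The value-object pattern for IDs can be sketched as a small immutable wrapper: typed and compared by value in code, but reduced to a plain string at serialization boundaries. A hypothetical illustration:

```python
import uuid
from dataclasses import dataclass

# Hypothetical sketch of an ID value object: typed in code,
# serialized as a plain string at boundaries.
@dataclass(frozen=True)
class InferenceRequestId:
    value: str

    @classmethod
    def new(cls):
        # Mint a fresh random identifier.
        return cls(str(uuid.uuid4()))

    def __str__(self):
        return self.value

rid = InferenceRequestId("req-123")
assert str(rid) == "req-123"                  # string at the boundary
assert rid == InferenceRequestId("req-123")   # equality by value, not identity
```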