The `Inference` class is a thin, immutable facade over `InferenceRuntime`. It provides the
unified entry point for configuring providers, building requests, and retrieving responses
from any supported LLM.
## Creating an Instance
Choose the factory method that matches your level of control.

### Presets
The most common pattern is `Inference::using()`, which loads a named preset from your
configuration files. Each preset defines the provider type, API key, base URL, default
model, and other connection details:
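A minimal sketch, assuming a preset named 'openai' exists in your configuration (the preset name, the message array shape, and the import namespace are assumptions; adjust to your installation):

```php
use Cognesy\Polyglot\Inference\Inference; // namespace is an assumption - adjust to your install

// 'openai' is a hypothetical preset defined in your configuration files
$answer = Inference::using('openai')
    ->withMessages([['role' => 'user', 'content' => 'What is the capital of France?']])
    ->get();

echo $answer;
```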
## Configuring a Request
The fluent API lets you build requests step by step. Every method returns a new immutable instance, so you can safely branch from a shared configuration.

### Messages and Model
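For example, branching two requests from one shared base (the model names and message shape below are placeholders, not values the library prescribes):

```php
use Cognesy\Polyglot\Inference\Inference; // namespace is an assumption

$base = Inference::using('openai')->withMessages([
    ['role' => 'system', 'content' => 'You are a concise assistant.'],
    ['role' => 'user', 'content' => 'Summarize the plot of Hamlet.'],
]);

// Each call returns a new instance, so $base itself is never mutated
$fast = $base->withModel('gpt-4o-mini'); // placeholder model names
$deep = $base->withModel('gpt-4o');
```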
### Tools and Response Format
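A sketch using an OpenAI-style function schema; the exact array shapes these methods accept are assumptions, so check your provider's expected format:

```php
$inference = Inference::using('openai')
    ->withMessages([['role' => 'user', 'content' => "What's the weather in Paris?"]])
    ->withTools([[
        'type' => 'function',
        'function' => [
            'name' => 'get_weather', // hypothetical tool
            'description' => 'Get the current weather for a city',
            'parameters' => [
                'type' => 'object',
                'properties' => ['city' => ['type' => 'string']],
                'required' => ['city'],
            ],
        ],
    ]])
    ->withToolChoice('auto')                          // let the model decide when to call tools
    ->withResponseFormat(['type' => 'json_object']);  // request a JSON response
```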
### Streaming and Token Limits
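For instance (assuming a configured 'openai' preset):

```php
$inference = Inference::using('openai')
    ->withMessages([['role' => 'user', 'content' => 'Write a haiku about the sea.']])
    ->withStreaming(true) // receive partial results as they are generated
    ->withMaxTokens(256); // cap the completion length
```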
### Provider-Specific Options
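Option keys are passed through to the provider's API, so their names vary by provider; the keys below are common OpenAI-compatible examples, not guaranteed to apply everywhere:

```php
$inference = Inference::using('openai')
    ->withMessages([['role' => 'user', 'content' => 'Explain recursion briefly.']])
    ->withOptions([
        'temperature' => 0.2, // provider-specific sampling controls
        'top_p'       => 0.9,
    ]);
```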
### The Combined `with()` Method
When you prefer a single call, use `with()` to set multiple fields at once:
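A sketch assuming `with()` accepts named parameters mirroring the `with*()` methods; verify against the method's actual signature:

```php
$inference = Inference::using('openai')->with(
    messages: [['role' => 'user', 'content' => 'Explain immutability in one sentence.']],
    model: 'gpt-4o-mini', // placeholder model name
    maxTokens: 128,
    options: ['temperature' => 0.0],
);
```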
## Full Method Reference
| Method | Purpose |
|---|---|
| `withMessages(...)` | Set conversation messages |
| `withModel(...)` | Override the model |
| `withTools(...)` | Attach tool/function definitions |
| `withToolChoice(...)` | Control tool selection strategy |
| `withResponseFormat(...)` | Specify the response format |
| `withOptions(...)` | Set provider-specific options |
| `withStreaming(...)` | Enable or disable streaming |
| `withMaxTokens(...)` | Set the maximum token count |
| `withCachedContext(...)` | Attach reusable cached context |
| `withRetryPolicy(...)` | Configure retry behavior |
| `withResponseCachePolicy(...)` | Configure response caching |
| `withRequest(...)` | Load all fields from an `InferenceRequest` |
| `withRuntime(...)` | Replace the underlying runtime |
## Executing Requests
### Response Shortcuts
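For example, assuming a configured 'openai' preset (the return-type comments are inferences from the method names, not guarantees):

```php
$inference = Inference::using('openai')
    ->withMessages([['role' => 'user', 'content' => 'List three primary colors as JSON.']]);

$text = $inference->get();        // plain response text
$json = $inference->asJson();     // response as a JSON string
$data = $inference->asJsonData(); // response decoded into PHP data
```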
These methods build the request, execute it, and return the result in a single step.

### Streaming
To receive partial results as they arrive from the provider, enable streaming and call `stream()`.

### The Lazy Handle: PendingInference
If you need to defer execution or pass the handle to another part of your system,
call `create()` to get a `PendingInference` instance. Execution happens only when
you call a response method on it:
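For example (the preset name and message shape are assumptions):

```php
$pending = Inference::using('openai')
    ->withMessages([['role' => 'user', 'content' => 'What is 2 + 2?']])
    ->create(); // nothing has been sent yet

// ...pass $pending to another component; the request executes only here:
$answer = $pending->get();
```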
`PendingInference` exposes the same response methods as `Inference`: `get()`,
`response()`, `asJson()`, `asJsonData()`, `asToolCallJson()`, `asToolCallJsonData()`,
and `stream()`.
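For instance, streaming from a deferred handle might look like this; the iteration API and the partial's property name are assumptions, so check the stream object's actual interface:

```php
$pending = Inference::using('openai')
    ->withMessages([['role' => 'user', 'content' => 'Tell me a short story.']])
    ->withStreaming(true)
    ->create();

foreach ($pending->stream()->responses() as $partial) { // responses() is an assumption
    echo $partial->contentDelta ?? '';                  // property name is an assumption
}
```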
## Custom Drivers
To use a custom driver, implement the `CanProvideInferenceDrivers` contract and pass
it to `Inference::using()` or `Inference::fromConfig()` via the `drivers` parameter:
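A minimal sketch; the contract's required methods are omitted here, and passing the instance as a named `drivers:` argument is an assumption based on the parameter name:

```php
use Cognesy\Polyglot\Inference\Inference; // namespace is an assumption

final class MyDrivers implements CanProvideInferenceDrivers
{
    // ...implement the contract's methods to expose your custom driver(s)
}

$inference = Inference::using('openai', drivers: new MyDrivers());
```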