Custom Inference Drivers
Inference drivers implement the CanProcessInferenceRequest interface, which defines three
methods:

| Method | Purpose |
|---|---|
| makeResponseFor() | Send a synchronous request and return the complete response |
| makeStreamDeltasFor() | Send a streaming request and yield partial deltas |
| capabilities() | Report driver capabilities (tool calls, JSON mode, vision, etc.) |
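
As a rough illustration of the three-method shape, the sketch below defines a hypothetical stand-in interface (`SketchInferenceDriver`) and a toy `EchoDriver`. These names and the array/string types are assumptions made for the example; the real CanProcessInferenceRequest interface uses Polyglot's own request, response, and capability types, so check its source for the exact signatures.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for CanProcessInferenceRequest. The real interface
// uses Polyglot's own types; plain arrays and strings are used here only to
// keep the sketch self-contained.
interface SketchInferenceDriver
{
    public function makeResponseFor(array $request): string;

    /** @return iterable<string> */
    public function makeStreamDeltasFor(array $request): iterable;

    /** @return array<string, bool> */
    public function capabilities(): array;
}

final class EchoDriver implements SketchInferenceDriver
{
    // Synchronous path: return the complete response in one piece.
    public function makeResponseFor(array $request): string
    {
        return 'echo: ' . ($request['prompt'] ?? '');
    }

    // Streaming path: yield the same response word by word.
    public function makeStreamDeltasFor(array $request): iterable
    {
        foreach (explode(' ', $this->makeResponseFor($request)) as $word) {
            yield $word;
        }
    }

    // Report what this toy driver supports.
    public function capabilities(): array
    {
        return ['toolCalls' => false, 'jsonMode' => false, 'vision' => false];
    }
}

$driver = new EchoDriver();
$full = $driver->makeResponseFor(['prompt' => 'hello world']);
$deltas = iterator_to_array($driver->makeStreamDeltasFor(['prompt' => 'hello world']), false);
```

Joining the streamed deltas with spaces reproduces the synchronous response, which is the relationship a real driver would normally preserve between its two request paths.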
Registering a Driver Class
The simplest approach is to provide a class string. Polyglot will instantiate it with the standard constructor signature ($config, $httpClient, $events).
Registering a Driver Factory
For more control over instantiation, pass a callable that receives LLMConfig,
CanSendHttpRequests, and CanHandleEvents, and returns a CanProcessInferenceRequest.
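
The difference between a class-string entry and a factory entry can be sketched with a small resolver. Everything here (`SketchConfig`, `SketchHttpClient`, `SketchEvents`, `SketchDriver`, `makeDriver()`) is a hypothetical stand-in, not Polyglot's API; the sketch only shows how a registry might treat the two kinds of entry.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-ins for LLMConfig, CanSendHttpRequests, CanHandleEvents.
final class SketchConfig { public function __construct(public string $apiUrl) {} }
final class SketchHttpClient {}
final class SketchEvents {}

final class SketchDriver
{
    public function __construct(
        public SketchConfig $config,
        public SketchHttpClient $httpClient,
        public SketchEvents $events,
        public string $label = 'default',
    ) {}
}

/** Resolve an entry that is either a class string or a factory closure. */
function makeDriver(
    string|Closure $entry,
    SketchConfig $config,
    SketchHttpClient $http,
    SketchEvents $events,
): SketchDriver {
    if ($entry instanceof Closure) {
        // Factory: the closure decides how the driver is built.
        return $entry($config, $http, $events);
    }
    // Class string: call the standard three-argument constructor.
    return new $entry($config, $http, $events);
}

$config = new SketchConfig('https://api.example.test');
$http = new SketchHttpClient();
$events = new SketchEvents();

$fromClass = makeDriver(SketchDriver::class, $config, $http, $events);
$fromFactory = makeDriver(
    fn(SketchConfig $c, SketchHttpClient $h, SketchEvents $e) => new SketchDriver($c, $h, $e, label: 'custom'),
    $config,
    $http,
    $events,
);
```

A factory is useful when the driver needs extra constructor arguments or wrapping that the standard signature cannot express, as the `label` argument illustrates here.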
Using the Registry with InferenceRuntime
You can pass the driver registry directly when building a runtime, or through the drivers parameter on Inference::fromConfig() or Inference::using().
Implementing a Full Driver
When building a driver from scratch, you will typically need to implement several adapter components:

- Request Adapter: transforms InferenceRequest into the provider’s HTTP request format
- Body Format: structures the request body according to the provider’s API schema
- Message Format: converts Polyglot’s message format to the provider’s format
- Response Adapter: parses the provider’s HTTP response into InferenceResponse
- Usage Format: extracts token usage information from the response

See the OpenAIDriver or AnthropicDriver source code for reference implementations.
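
To make the division of labour between the five components concrete, here is a minimal sketch that models each one as a plain function. The function names, the example URL and model, and the JSON shapes are all assumptions invented for illustration; real drivers implement these roles as classes against Polyglot's own types and each provider's actual schema.

```php
<?php
declare(strict_types=1);

// All names and payload shapes below are hypothetical.

/** Message Format: converts generic messages to the provider's shape. */
function formatMessages(array $messages): array
{
    return array_map(
        fn(array $m) => [
            'role' => $m['role'],
            'content' => [['type' => 'text', 'text' => $m['content']]],
        ],
        $messages,
    );
}

/** Body Format: builds the request body for the provider's schema. */
function formatBody(string $model, array $messages): array
{
    return ['model' => $model, 'messages' => formatMessages($messages)];
}

/** Request Adapter: wraps the body in an HTTP request description. */
function toHttpRequest(string $url, string $model, array $messages): array
{
    return [
        'method' => 'POST',
        'url' => $url,
        'headers' => ['Content-Type' => 'application/json'],
        'body' => json_encode(formatBody($model, $messages), JSON_THROW_ON_ERROR),
    ];
}

/** Usage Format: extracts token counts from the decoded response. */
function extractUsage(array $data): array
{
    return [
        'input' => (int) ($data['usage']['input_tokens'] ?? 0),
        'output' => (int) ($data['usage']['output_tokens'] ?? 0),
    ];
}

/** Response Adapter: parses the raw HTTP body into a response structure. */
function fromHttpResponse(string $rawBody): array
{
    $data = json_decode($rawBody, true, 512, JSON_THROW_ON_ERROR);
    return ['content' => (string) ($data['text'] ?? ''), 'usage' => extractUsage($data)];
}

$request = toHttpRequest(
    'https://api.example.test/v1/chat',
    'example-model',
    [['role' => 'user', 'content' => 'Hi']],
);
$response = fromHttpResponse('{"text":"Hello!","usage":{"input_tokens":3,"output_tokens":2}}');
```

Keeping the formats separate from the adapters means a provider that differs only in, say, its message shape can reuse the rest of the pipeline unchanged.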
Custom Embeddings Drivers
Embeddings drivers implement the CanHandleVectorization interface.
Register custom embeddings drivers through the BundledEmbeddingsDrivers registry, the same
pattern used for inference drivers.
Note: The EmbeddingsDriverRegistry is immutable; each mutation returns a new instance, the same pattern as InferenceDriverRegistry.
Removing or Replacing Bundled Drivers
The InferenceDriverRegistry is immutable: each mutation returns a new instance. You can
remove a bundled driver or replace it entirely.
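
The practical consequence of immutability is that you must keep the returned instance; the original registry is never changed. The toy `SketchRegistry` below illustrates this. Its class name and its `withDriver()`, `withoutDriver()`, `has()`, and `get()` methods are assumptions for the sketch, so consult InferenceDriverRegistry itself for the actual method names.

```php
<?php
declare(strict_types=1);

// Hypothetical immutable registry; not Polyglot's InferenceDriverRegistry.
final class SketchRegistry
{
    /** @param array<string, string> $drivers driver name => class name */
    public function __construct(private array $drivers = []) {}

    // Add or replace an entry, returning a new registry.
    public function withDriver(string $name, string $class): self
    {
        $copy = $this->drivers;
        $copy[$name] = $class;
        return new self($copy);
    }

    // Remove an entry, returning a new registry.
    public function withoutDriver(string $name): self
    {
        $copy = $this->drivers;
        unset($copy[$name]);
        return new self($copy);
    }

    public function has(string $name): bool
    {
        return isset($this->drivers[$name]);
    }

    public function get(string $name): ?string
    {
        return $this->drivers[$name] ?? null;
    }
}

$base = (new SketchRegistry())->withDriver('openai', 'OpenAIDriver');
$removed = $base->withoutDriver('openai');
$replaced = $base->withDriver('openai', 'MyOpenAIDriver');
```

After these calls `$base` still maps `openai` to `OpenAIDriver`, while `$removed` and `$replaced` are separate instances carrying the changes.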
Bundled Drivers
For reference, Polyglot bundles the following inference drivers:

| Driver Name | Class |
|---|---|
| a21 | A21Driver |
| anthropic | AnthropicDriver |
| azure | AzureDriver |
| bedrock-openai | BedrockOpenAIDriver |
| cerebras | CerebrasDriver |
| cohere | CohereV2Driver |
| deepseek | DeepseekDriver |
| fireworks | FireworksDriver |
| gemini | GeminiDriver |
| gemini-oai | GeminiOAIDriver |
| glm | GlmDriver |
| groq | GroqDriver |
| huggingface | HuggingFaceDriver |
| inception | InceptionDriver |
| meta | MetaDriver |
| minimaxi | MinimaxiDriver |
| mistral | MistralDriver |
| openai | OpenAIDriver |
| openai-responses | OpenAIResponsesDriver |
| openresponses | OpenResponsesDriver |
| openrouter | OpenRouterDriver |
| perplexity | PerplexityDriver |
| qwen | QwenDriver |
| sambanova | SambaNovaDriver |
| xai | XAiDriver |
| moonshot | OpenAICompatibleDriver |
| ollama | OpenAICompatibleDriver |
| openai-compatible | OpenAICompatibleDriver |
| together | OpenAICompatibleDriver |

The bundled registry is available through BundledInferenceDrivers::registry().
Bundled embeddings drivers include: openai, azure, cohere, gemini, jina, mistral,
and ollama.
Listening to Events
Both InferenceRuntime and EmbeddingsRuntime dispatch events at key lifecycle points. You
can listen for specific events or wiretap all of them.
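
The distinction between listening for one event type and wiretapping everything can be sketched with a toy dispatcher. `SketchDispatcher`, its `addListener()` and `wiretap()` methods, and the two event classes are hypothetical stand-ins; the runtimes expose their own listener methods and event classes.

```php
<?php
declare(strict_types=1);

// Hypothetical event classes and dispatcher; not Polyglot's event system.
final class SketchRequestSent {}
final class SketchResponseReceived {}

final class SketchDispatcher
{
    /** @var array<string, list<callable>> */
    private array $listeners = [];
    /** @var list<callable> */
    private array $wiretaps = [];

    // Listen for one event class (and its subclasses).
    public function addListener(string $eventClass, callable $listener): void
    {
        $this->listeners[$eventClass][] = $listener;
    }

    // Receive every event regardless of type.
    public function wiretap(callable $listener): void
    {
        $this->wiretaps[] = $listener;
    }

    public function dispatch(object $event): void
    {
        foreach ($this->listeners as $class => $listeners) {
            if ($event instanceof $class) {
                foreach ($listeners as $listener) {
                    $listener($event);
                }
            }
        }
        foreach ($this->wiretaps as $tap) {
            $tap($event);
        }
    }
}

$log = [];
$dispatcher = new SketchDispatcher();
$dispatcher->addListener(SketchResponseReceived::class, function (object $e) use (&$log): void {
    $log[] = 'specific:' . get_class($e);
});
$dispatcher->wiretap(function (object $e) use (&$log): void {
    $log[] = 'tap:' . get_class($e);
});

$dispatcher->dispatch(new SketchRequestSent());
$dispatcher->dispatch(new SketchResponseReceived());
```

The specific listener fires only for `SketchResponseReceived`, while the wiretap records both events, which makes wiretapping handy for logging and debugging.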