The Transport Contract
All HTTP communication flows through a single contract, CanSendHttpRequests, and every driver talks to an implementation of it. The driver translates its InferenceRequest into an HttpRequest and sends it via the client’s send() method, which returns a PendingHttpResponse. It then calls get() on the pending response to obtain the HttpResponse and translates that response back for the caller.
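The sequence above can be modeled with a small self-contained sketch (PHP 8.1+). Every class here (SketchHttpRequest, SketchHttpResponse, SketchPendingResponse, SketchTransport, FakeTransport) is a hypothetical stand-in, not Polyglot's actual type, and the endpoint URL is a placeholder. In this sketch the pending response defers the call until get() and memoizes the result. Whether Polyglot's PendingHttpResponse behaves the same way is an assumption.

```php
<?php
// Hypothetical stand-ins for the transport types. These are NOT Polyglot's
// classes; they only model the request -> send() -> get() -> translate flow.

final class SketchHttpRequest
{
    public function __construct(
        public readonly string $url,
        public readonly string $method,
        public readonly array $headers,
        public readonly string $body,
    ) {}
}

final class SketchHttpResponse
{
    public function __construct(
        public readonly int $statusCode,
        public readonly string $body,
    ) {}
}

// Wraps a deferred call; in this sketch the exchange happens only on get().
final class SketchPendingResponse
{
    private ?SketchHttpResponse $response = null;

    public function __construct(private readonly Closure $execute) {}

    public function get(): SketchHttpResponse
    {
        // Memoize so repeated get() calls do not resend the request.
        return $this->response ??= ($this->execute)();
    }
}

interface SketchTransport
{
    public function send(SketchHttpRequest $request): SketchPendingResponse;
}

// A fake client that returns a canned JSON body instead of hitting the network.
final class FakeTransport implements SketchTransport
{
    public int $calls = 0;

    public function send(SketchHttpRequest $request): SketchPendingResponse
    {
        return new SketchPendingResponse(function () use ($request): SketchHttpResponse {
            $this->calls++;
            $prompt = json_decode($request->body, true)['prompt'];
            return new SketchHttpResponse(200, json_encode(['text' => "echo: $prompt"]));
        });
    }
}

// The driver's side of the contract: translate, send, get, translate back.
function runInference(SketchTransport $client, array $inferenceRequest): string
{
    $httpRequest = new SketchHttpRequest(
        url: 'https://api.example.com/v1/complete', // hypothetical endpoint
        method: 'POST',
        headers: ['Content-Type' => 'application/json'],
        body: json_encode(['prompt' => $inferenceRequest['prompt']]),
    );
    $pending = $client->send($httpRequest);                 // pending response
    $httpResponse = $pending->get();                        // actual response
    return json_decode($httpResponse->body, true)['text'];  // translate back
}

$client = new FakeTransport();
$result = runInference($client, ['prompt' => 'hi']);
```

Because the driver depends only on the interface, a test can pass a fake like FakeTransport and never touch the network.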
Default Client
When you call InferenceRuntime::fromConfig(...) or EmbeddingsRuntime::fromConfig(...) without providing an HTTP client, Polyglot creates a default one using HttpClientBuilder.
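Here is a minimal sketch of the fallback pattern (PHP 8.1+), assuming a runtime that accepts an optional client. If the caller injects one, it is used as-is. Otherwise a default is built. SketchRuntime, SketchClient, DefaultClient, CustomClient and buildDefaultClient() are hypothetical names. buildDefaultClient() only stands in for the HttpClientBuilder step, whose real API is not reproduced here.

```php
<?php
// Hypothetical sketch: use the injected client if present, else build a default.
// SketchClient, SketchRuntime and buildDefaultClient() are illustrative names.

interface SketchClient
{
    public function name(): string;
}

final class DefaultClient implements SketchClient
{
    public function name(): string
    {
        return 'default';
    }
}

final class CustomClient implements SketchClient
{
    public function name(): string
    {
        return 'custom';
    }
}

// Stands in for the builder step (HttpClientBuilder in Polyglot).
function buildDefaultClient(): SketchClient
{
    return new DefaultClient();
}

final class SketchRuntime
{
    private function __construct(public readonly SketchClient $client) {}

    // $config is ignored in this sketch; only the client fallback is shown.
    public static function fromConfig(array $config, ?SketchClient $httpClient = null): self
    {
        return new self($httpClient ?? buildDefaultClient());
    }
}

$implicit = SketchRuntime::fromConfig([]);
$explicit = SketchRuntime::fromConfig([], new CustomClient());
```

Injecting the client keeps the default convenient while still letting tests or custom transports replace it.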
HttpRequest and HttpResponse
These data objects represent the HTTP layer’s request and response. Drivers create HttpRequest objects through their request adapters and read HttpResponse objects through their response adapters.
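The following sketch (PHP 8.1+) makes the adapter split concrete. A request adapter builds an immutable HTTP request object, and a response adapter reads one back. The class and function names, the endpoint, and the OpenAI-style payload shape ('messages', 'choices') are assumptions for illustration only.

```php
<?php
// Hypothetical request/response adapters. The array shapes, the endpoint and
// the 'choices' path are assumptions made for this sketch only.

final class AdapterHttpRequest
{
    public function __construct(
        public readonly string $url,
        public readonly string $method,
        public readonly array $headers,
        public readonly string $body,
    ) {}
}

final class AdapterHttpResponse
{
    public function __construct(
        public readonly int $statusCode,
        public readonly string $body,
    ) {}
}

// Request adapter: inference-level data in, transport-level request out.
function toHttpRequest(array $inference, string $apiKey): AdapterHttpRequest
{
    return new AdapterHttpRequest(
        url: 'https://api.example.com/v1/chat/completions', // placeholder
        method: 'POST',
        headers: [
            'Authorization' => 'Bearer ' . $apiKey,
            'Content-Type' => 'application/json',
        ],
        body: json_encode([
            'model' => $inference['model'],
            'messages' => $inference['messages'],
        ], JSON_THROW_ON_ERROR),
    );
}

// Response adapter: transport-level response in, plain content out.
function fromHttpResponse(AdapterHttpResponse $response): string
{
    if ($response->statusCode >= 400) {
        throw new RuntimeException("Provider returned HTTP {$response->statusCode}");
    }
    $data = json_decode($response->body, true, flags: JSON_THROW_ON_ERROR);
    return $data['choices'][0]['message']['content'] ?? '';
}

$request = toHttpRequest(
    ['model' => 'some-model', 'messages' => [['role' => 'user', 'content' => 'Hi']]],
    'test-key',
);
$content = fromHttpResponse(new AdapterHttpResponse(
    200,
    '{"choices":[{"message":{"content":"Hello!"}}]}',
));
```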
HttpRequest
HttpResponse
The HttpResponse interface provides access to the response data.
For streaming responses, the stream() method returns a Generator that yields chunks as they arrive from the provider. The driver’s response adapter parses these chunks into SSE events and then into PartialInferenceDelta objects.
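The chunk-to-event step can be sketched as two chained generators. Chunk boundaries need not line up with SSE line boundaries, so the first generator buffers partial lines until a newline arrives. The function names, the delta array shape, and the OpenAI-style "[DONE]" sentinel are assumptions. Polyglot's adapters produce PartialInferenceDelta objects, not arrays.

```php
<?php
// Hypothetical sketch: raw stream chunks -> SSE "data:" payloads -> delta arrays.
// The delta shape and the "[DONE]" sentinel are assumptions, not Polyglot types.

/** @param iterable<string> $chunks */
function sseData(iterable $chunks): Generator
{
    $buffer = '';
    foreach ($chunks as $chunk) {
        $buffer .= $chunk;
        // Emit every complete line; keep the trailing partial line buffered.
        while (($pos = strpos($buffer, "\n")) !== false) {
            $line = rtrim(substr($buffer, 0, $pos), "\r");
            $buffer = substr($buffer, $pos + 1);
            if (str_starts_with($line, 'data:')) {
                // Drop the field name and any leading whitespace after it.
                yield ltrim(substr($line, 5));
            }
        }
    }
}

function deltas(iterable $chunks): Generator
{
    foreach (sseData($chunks) as $payload) {
        if ($payload === '[DONE]') {
            return; // end of stream
        }
        $event = json_decode($payload, true, flags: JSON_THROW_ON_ERROR);
        yield ['content' => $event['choices'][0]['delta']['content'] ?? ''];
    }
}

// Simulated provider chunks, deliberately split in the middle of a line.
$chunks = (function (): Generator {
    yield "data: {\"choices\":[{\"delta\":{\"content\":\"Hel\"}}]}\n\nda";
    yield "ta: {\"choices\":[{\"delta\":{\"content\":\"lo\"}}]}\n\n";
    yield "data: [DONE]\n\n";
})();

$text = '';
foreach (deltas($chunks) as $delta) {
    $text .= $delta['content'];
}
```

Generators keep memory flat: each delta is handed to the caller as soon as its line is complete, without waiting for the full response.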
Middleware
The HTTP client supports a middleware stack for cross-cutting concerns like logging, retries, caching, and authentication. Middleware implements the HttpMiddleware interface.
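The sketch below (PHP 8.1+) shows the onion model that a middleware interface like this typically implies. Each layer receives the request plus the next handler and may act both before and after delegating. All names (SketchMiddleware, SketchHandler, MiddlewareHandler, buildChain) are hypothetical, and requests and responses are plain arrays. HttpMiddleware's exact signature may differ.

```php
<?php
// Hypothetical onion-style middleware chain. Names are illustrative;
// Polyglot's HttpMiddleware signature may differ.

interface SketchHandler
{
    public function handle(array $request): array;
}

interface SketchMiddleware
{
    // Receives the request and the next handler; may act before and after it.
    public function handle(array $request, SketchHandler $next): array;
}

// The innermost handler: pretends to perform the HTTP call.
final class TerminalHandler implements SketchHandler
{
    public function handle(array $request): array
    {
        return ['status' => 200, 'seenHeaders' => $request['headers']];
    }
}

// Adapts "middleware + next" into a plain handler so layers can nest.
final class MiddlewareHandler implements SketchHandler
{
    public function __construct(
        private readonly SketchMiddleware $middleware,
        private readonly SketchHandler $next,
    ) {}

    public function handle(array $request): array
    {
        return $this->middleware->handle($request, $this->next);
    }
}

final class AuthMiddleware implements SketchMiddleware
{
    public function __construct(private readonly string $token) {}

    public function handle(array $request, SketchHandler $next): array
    {
        $request['headers']['Authorization'] = 'Bearer ' . $this->token;
        return $next->handle($request);
    }
}

final class TraceMiddleware implements SketchMiddleware
{
    public array $log = [];

    public function handle(array $request, SketchHandler $next): array
    {
        $this->log[] = 'before';
        $response = $next->handle($request);
        $this->log[] = 'after:' . $response['status'];
        return $response;
    }
}

/** @param list<SketchMiddleware> $stack First entry becomes the outermost layer. */
function buildChain(array $stack, SketchHandler $terminal): SketchHandler
{
    $handler = $terminal;
    foreach (array_reverse($stack) as $middleware) {
        $handler = new MiddlewareHandler($middleware, $handler);
    }
    return $handler;
}

$trace = new TraceMiddleware();
$chain = buildChain([$trace, new AuthMiddleware('secret')], new TerminalHandler());
$response = $chain->handle(['headers' => []]);
```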
The BaseMiddleware abstract class provides convenient hooks, so you do not need to manage the chain manually.
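A base class like this usually applies the template-method pattern. It owns the call to the next handler and exposes overridable hooks. The sketch below assumes three hooks (shouldRun, beforeRequest, afterResponse). These names, like HookedMiddleware and HookHandler, are illustrative and may not match BaseMiddleware's actual methods.

```php
<?php
// Hypothetical template-method base class: subclasses override hooks instead
// of calling $next themselves. Hook names are illustrative only.

interface HookHandler
{
    public function handle(array $request): array;
}

abstract class HookedMiddleware
{
    // Owns the chain call, so subclasses cannot forget to invoke $next.
    final public function handle(array $request, HookHandler $next): array
    {
        if (!$this->shouldRun($request)) {
            return $next->handle($request);
        }
        $request = $this->beforeRequest($request);
        $response = $next->handle($request);
        return $this->afterResponse($request, $response);
    }

    protected function shouldRun(array $request): bool { return true; }
    protected function beforeRequest(array $request): array { return $request; }
    protected function afterResponse(array $request, array $response): array { return $response; }
}

// Overrides only the hook it needs.
final class UserAgentMiddleware extends HookedMiddleware
{
    protected function beforeRequest(array $request): array
    {
        $request['headers']['User-Agent'] = 'sketch/1.0';
        return $request;
    }
}

// Runs only for POST requests and tags their responses.
final class TagPostResponses extends HookedMiddleware
{
    protected function shouldRun(array $request): bool
    {
        return ($request['method'] ?? 'GET') === 'POST';
    }

    protected function afterResponse(array $request, array $response): array
    {
        return $response + ['tagged' => true];
    }
}

final class EchoHeaders implements HookHandler
{
    public function handle(array $request): array
    {
        return ['headers' => $request['headers']];
    }
}

$response = (new UserAgentMiddleware())->handle(['headers' => []], new EchoHeaders());
```

Marking handle() final stops subclasses from skipping or double-invoking the rest of the chain.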
Managing the Middleware Stack
The MiddlewareStack supports named middleware for easy manipulation.
Middleware added with prepend() runs before middleware added with append(). The name parameter is optional but recommended: it lets you replace or remove middleware later without tracking references.
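The ordering rules above can be sketched with an ordered associative array in which insertion order is execution order. NamedStack and its method signatures are assumptions that mirror the prose. For brevity, names are required here, and each middleware is just a callable that transforms a string.

```php
<?php
// Hypothetical named middleware stack. Method names mirror the prose
// (append/prepend/replace/remove); the signatures are assumptions.

final class NamedStack
{
    /** @var array<string, callable> insertion order = execution order */
    private array $items = [];

    public function append(string $name, callable $mw): void
    {
        unset($this->items[$name]); // re-adding a name moves it to the end
        $this->items[$name] = $mw;
    }

    public function prepend(string $name, callable $mw): void
    {
        unset($this->items[$name]);
        // Array union keeps the left operand's keys first.
        $this->items = [$name => $mw] + $this->items;
    }

    public function replace(string $name, callable $mw): void
    {
        if (!array_key_exists($name, $this->items)) {
            throw new OutOfBoundsException("No middleware named '$name'");
        }
        $this->items[$name] = $mw; // keeps its position
    }

    public function remove(string $name): void
    {
        unset($this->items[$name]);
    }

    /** @return list<string> */
    public function names(): array
    {
        return array_keys($this->items);
    }

    // Applies each middleware's transform in stack order.
    public function run(string $request): string
    {
        foreach ($this->items as $mw) {
            $request = $mw($request);
        }
        return $request;
    }
}

$stack = new NamedStack();
$stack->append('log', fn (string $r): string => $r . '>log');
$stack->append('retry', fn (string $r): string => $r . '>retry');
$stack->prepend('auth', fn (string $r): string => $r . '>auth');
$orderBeforeEdits = $stack->names();
$stack->replace('retry', fn (string $r): string => $r . '>retry2');
$stack->remove('log');
```

Addressing entries by name means later code can swap or drop a middleware without holding the original object.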
Stream Cache Manager
For advanced use cases, Polyglot supports stream caching through the CanManageStreamCache contract. When a stream cache manager is provided, it can record and replay streaming responses, which is useful for testing and development.
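A minimal in-memory record/replay cache for streams might look like the sketch below. The first request streams from the provider while recording each chunk, and an identical later request replays the recording. MemoryStreamCache and cachedStream() are hypothetical and do not reflect the CanManageStreamCache method set. Keying on a SHA-256 hash of the JSON-encoded request is also an assumption.

```php
<?php
// Hypothetical in-memory record/replay cache for streamed responses.
// Names are illustrative, not the CanManageStreamCache API.

final class MemoryStreamCache
{
    /** @var array<string, list<string>> */
    private array $recordings = [];

    public function has(string $key): bool
    {
        return isset($this->recordings[$key]);
    }

    // Replays previously recorded chunks.
    public function replay(string $key): Generator
    {
        yield from $this->recordings[$key];
    }

    // Passes chunks through while recording them.
    public function record(string $key, iterable $chunks): Generator
    {
        $seen = [];
        foreach ($chunks as $chunk) {
            $seen[] = $chunk;
            yield $chunk;
        }
        // Reached only if the stream was fully consumed, so partial
        // recordings are never stored.
        $this->recordings[$key] = $seen;
    }
}

function cachedStream(MemoryStreamCache $cache, array $request, callable $fetch): Generator
{
    $key = hash('sha256', json_encode($request, JSON_THROW_ON_ERROR));
    if ($cache->has($key)) {
        yield from $cache->replay($key);
        return;
    }
    yield from $cache->record($key, $fetch($request));
}

// Fake provider stream; the counter increments when iteration starts.
$providerCalls = 0;
$fetch = function (array $request) use (&$providerCalls): Generator {
    $providerCalls++;
    yield 'Hel';
    yield 'lo';
};

$cache = new MemoryStreamCache();
$first = implode('', iterator_to_array(cachedStream($cache, ['prompt' => 'hi'], $fetch), false));
$second = implode('', iterator_to_array(cachedStream($cache, ['prompt' => 'hi'], $fetch), false));
```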
Whether a response is cached is governed by the ResponseCachePolicy enum on the InferenceRequest, which you can set through the facade.
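The sketch below (PHP 8.1+) roughly illustrates how a per-request policy can gate caching, using a placeholder enum. SketchCachePolicy and its None/Memory cases, PolicyRequest, and PolicyAwareSender are hypothetical. Consult ResponseCachePolicy for the real cases.

```php
<?php
// Hypothetical per-request cache policy. The enum name and cases are
// placeholders, not ResponseCachePolicy's actual cases.

enum SketchCachePolicy
{
    case None;   // always call the provider
    case Memory; // reuse a stored response for identical requests
}

final class PolicyRequest
{
    public function __construct(
        public readonly string $prompt,
        public readonly SketchCachePolicy $cachePolicy = SketchCachePolicy::None,
    ) {}

    // Immutable "wither": returns a copy with a different policy.
    public function withCachePolicy(SketchCachePolicy $policy): self
    {
        return new self($this->prompt, $policy);
    }
}

final class PolicyAwareSender
{
    public int $providerCalls = 0;
    /** @var array<string, string> */
    private array $memory = [];

    public function send(PolicyRequest $request): string
    {
        $useCache = $request->cachePolicy === SketchCachePolicy::Memory;
        if ($useCache && isset($this->memory[$request->prompt])) {
            return $this->memory[$request->prompt];
        }
        $this->providerCalls++;
        $response = 'answer to ' . $request->prompt; // stands in for the HTTP call
        if ($useCache) {
            $this->memory[$request->prompt] = $response;
        }
        return $response;
    }
}

$sender = new PolicyAwareSender();
$cached = (new PolicyRequest('hi'))->withCachePolicy(SketchCachePolicy::Memory);
$sender->send($cached);
$sender->send($cached);
$sender->send(new PolicyRequest('hi'));
```

Putting the policy on the request, not the client, lets one client serve both cached and uncached calls.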