Listening to Events
Both inference and embeddings runtimes expose two ways to listen to events: targeted listeners and a wiretap.
Targeted Listeners
Use `onEvent()` to listen for a specific event class.
Wiretap
Use `wiretap()` to receive all events regardless of type. This is useful for debugging and general-purpose logging.
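
The difference between the two listening styles can be sketched with a small stand-in. Everything here is hypothetical scaffolding: `SketchRuntime` and the two empty event classes only mimic the `onEvent()` / `wiretap()` behaviour described above, and the `data` property on events is an assumption based on the `data[...]` keys listed in the tables of this page. It is not the library's implementation.

```php
<?php
// Stand-in event classes; the real ones are provided by the library.
class InferenceStarted { public function __construct(public array $data = []) {} }
class InferenceCompleted { public function __construct(public array $data = []) {} }

// Hypothetical stand-in for a runtime's listener registration.
final class SketchRuntime
{
    /** @var array<string, list<callable>> listeners keyed by event class */
    private array $listeners = [];
    /** @var list<callable> listeners that receive every event */
    private array $wiretaps = [];

    public function onEvent(string $class, callable $listener): static
    {
        $this->listeners[$class][] = $listener;
        return $this;
    }

    public function wiretap(callable $listener): static
    {
        $this->wiretaps[] = $listener;
        return $this;
    }

    public function dispatch(object $event): void
    {
        // Targeted listeners run only when the event matches their class.
        foreach ($this->listeners as $class => $listeners) {
            if ($event instanceof $class) {
                foreach ($listeners as $listener) {
                    $listener($event);
                }
            }
        }
        // Wiretaps run for every event, regardless of type.
        foreach ($this->wiretaps as $listener) {
            $listener($event);
        }
    }
}

$targeted = [];
$all = [];

$runtime = (new SketchRuntime())
    ->onEvent(InferenceCompleted::class, function (InferenceCompleted $e) use (&$targeted) {
        $targeted[] = $e->data['executionId'];
    })
    ->wiretap(function (object $e) use (&$all) {
        $all[] = $e::class;
    });

$runtime->dispatch(new InferenceStarted(['executionId' => 'exec-1']));
$runtime->dispatch(new InferenceCompleted(['executionId' => 'exec-1', 'isSuccess' => true]));
```

After the two dispatches, `$targeted` holds one entry (from `InferenceCompleted`), while `$all` holds the class names of both events.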
Inference Events
The inference lifecycle dispatches the events listed below, grouped by level.
Execution-Level Events
| Event | When Dispatched | Key Data |
|---|---|---|
| `InferenceStarted` | Beginning of execution | `data['executionId']`, `data['requestId']`, `data['isStreamed']`, `data['model']`, `data['messageCount']` |
| `InferenceCompleted` | End of execution (success or failure) | `data['executionId']`, `data['isSuccess']`, `data['finishReason']`, `data['attemptCount']`, `data['durationMs']`, token-count fields |

`InferenceCompleted` is dispatched exactly once per execution, whether it succeeds or fails.
Attempt-Level Events
Each retry attempt dispatches its own events:

| Event | When Dispatched | Key Data |
|---|---|---|
| `InferenceAttemptStarted` | Beginning of an attempt | execution ID, attempt ID, attempt number, model |
| `InferenceAttemptSucceeded` | Attempt completed successfully | `data['executionId']`, `data['attemptId']`, `data['attemptNumber']`, `data['finishReason']`, `data['durationMs']`, token-count fields |
| `InferenceAttemptFailed` | Attempt failed | `data['executionId']`, `data['attemptId']`, `data['attemptNumber']`, `data['errorMessage']`, `data['errorType']`, `data['willRetry']`, `data['httpStatusCode']`, partial token-count fields, `data['durationMs']` |
| `InferenceUsageReported` | After a successful attempt | `data['executionId']`, `data['model']`, `data['isFinal']`, token-count fields |

When retries occur, you will see one or more `InferenceAttemptStarted`/`InferenceAttemptFailed` pairs before a final `InferenceAttemptSucceeded` event. The `attemptNumber` field tracks which attempt is running.
Response Events
| Event | When Dispatched | Key Data |
|---|---|---|
| `InferenceRequested` | Before sending the HTTP request | request data |
| `InferenceResponseCreated` | After receiving and parsing the response | `data['executionId']`, `data['requestId']`, `data['responseId']`, `data['finishReason']`, content-length fields, tool-call summary, `data['usage']` |
| `InferenceFailed` | On unrecoverable failure | error details |
Streaming Events
| Event | When Dispatched | Key Data |
|---|---|---|
| `StreamFirstChunkReceived` | First visible delta arrives | execution ID, `timeToFirstChunkMs`, `receivedAt`, model, initial content |
| `PartialInferenceDeltaCreated` | Each visible delta | `data['executionId']`, `data['contentDelta']` |
| `StreamEventReceived` | Raw SSE event received | raw event data |
| `StreamEventParsed` | SSE event parsed into a delta | parsed event data |

The `StreamFirstChunkReceived` event is particularly useful for measuring time-to-first-chunk (TTFC), as it includes the `requestStartedAt` timestamp.
Driver Events
| Event | When Dispatched | Key Data |
|---|---|---|
| `InferenceDriverBuilt` | After the driver is created by the factory | driver class, redacted config, HTTP client class |

Sensitive configuration values are redacted in the `InferenceDriverBuilt` event payload.
Embeddings Events
The embeddings lifecycle dispatches a smaller set of events:

| Event | When Dispatched | Key Data |
|---|---|---|
| `EmbeddingsDriverBuilt` | After the embeddings driver is created | driver class, config, HTTP client class |
| `EmbeddingsRequested` | Before sending the embeddings request | request data |
| `EmbeddingsResponseReceived` | After receiving the response | `data['model']`, `data['inputCount']`, `data['vectorCount']`, `data['dimensions']`, `data['usage']` |
| `EmbeddingsFailed` | On failure | error details |
Practical Examples
Logging Token Usage
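
A minimal sketch of a usage logger, written as a plain function over the `InferenceUsageReported` payload array so it runs on its own. The tables above name `executionId`, `model`, and `isFinal` but not the individual token-count keys, so the function simply keeps any integer field whose key contains "tokens". The keys `inputTokens` and `outputTokens` in the sample payload are placeholders, not confirmed field names.

```php
<?php
// Builds one log line from an InferenceUsageReported payload.
// Token-count key names are not specified on this page, so any integer
// field whose key contains "tokens" (case-insensitive) is included.
function formatUsageLine(array $data): string
{
    $counts = array_filter(
        $data,
        fn ($value, $key) => is_int($value) && stripos((string) $key, 'tokens') !== false,
        ARRAY_FILTER_USE_BOTH,
    );

    $parts = [];
    foreach ($counts as $key => $value) {
        $parts[] = "{$key}={$value}";
    }

    return sprintf(
        '[%s] model=%s final=%s %s',
        $data['executionId'] ?? '?',
        $data['model'] ?? '?',
        ($data['isFinal'] ?? false) ? 'yes' : 'no',
        implode(' ', $parts),
    );
}

// Hypothetical payload: 'inputTokens' and 'outputTokens' are placeholder names.
$line = formatUsageLine([
    'executionId' => 'exec-1',
    'model' => 'example-model',
    'isFinal' => true,
    'inputTokens' => 120,
    'outputTokens' => 45,
]);

echo $line, PHP_EOL;
```

Assuming events expose their payload as a `data` array, a listener registered with `onEvent()` for `InferenceUsageReported` could pass that array to `formatUsageLine()` and forward the result to a logger.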
Measuring Time-to-First-Chunk
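
One way to collect TTFC measurements is to feed each `StreamFirstChunkReceived` payload into a small accumulator. This sketch assumes the value is available as `data['timeToFirstChunkMs']`; the `TtfcStats` class is a hypothetical helper, not part of the library.

```php
<?php
// Hypothetical helper: accumulates timeToFirstChunkMs samples and summarises them.
final class TtfcStats
{
    /** @var list<float> */
    private array $samples = [];

    // Accepts a StreamFirstChunkReceived payload; payloads without the field are skipped.
    public function record(array $data): void
    {
        if (isset($data['timeToFirstChunkMs'])) {
            $this->samples[] = (float) $data['timeToFirstChunkMs'];
        }
    }

    public function count(): int
    {
        return count($this->samples);
    }

    public function average(): ?float
    {
        return $this->samples === [] ? null : array_sum($this->samples) / count($this->samples);
    }

    public function max(): ?float
    {
        return $this->samples === [] ? null : max($this->samples);
    }
}

$stats = new TtfcStats();
$stats->record(['executionId' => 'exec-1', 'timeToFirstChunkMs' => 180]);
$stats->record(['executionId' => 'exec-2', 'timeToFirstChunkMs' => 320]);
$stats->record(['executionId' => 'exec-3']); // no TTFC value, ignored

printf("avg TTFC: %.1f ms over %d streams\n", $stats->average(), $stats->count());
```

In an application, `record()` would be called from a listener registered for `StreamFirstChunkReceived`.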
Tracking Retry Attempts
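
Retry behaviour can be followed by recording attempt-level payloads per execution. The sketch below uses only the fields listed for `InferenceAttemptFailed` and `InferenceAttemptSucceeded`; `RetryTracker` is a hypothetical helper, and the sample values (`timeout`, `504`) are illustrative rather than values the library is known to emit.

```php
<?php
// Hypothetical helper: keeps a per-execution history of attempt outcomes.
final class RetryTracker
{
    /** @var array<string, list<string>> */
    private array $log = [];

    // Accepts an InferenceAttemptFailed payload.
    public function onAttemptFailed(array $data): void
    {
        $this->log[$data['executionId']][] = sprintf(
            'attempt %d failed (%s, HTTP %s): %s; %s',
            $data['attemptNumber'],
            $data['errorType'] ?? 'unknown',
            $data['httpStatusCode'] ?? 'n/a',
            $data['errorMessage'] ?? '',
            ($data['willRetry'] ?? false) ? 'retrying' : 'giving up',
        );
    }

    // Accepts an InferenceAttemptSucceeded payload.
    public function onAttemptSucceeded(array $data): void
    {
        $this->log[$data['executionId']][] = sprintf(
            'attempt %d succeeded in %d ms',
            $data['attemptNumber'],
            $data['durationMs'] ?? 0,
        );
    }

    /** @return list<string> */
    public function history(string $executionId): array
    {
        return $this->log[$executionId] ?? [];
    }
}

$tracker = new RetryTracker();
$tracker->onAttemptFailed([
    'executionId' => 'exec-1',
    'attemptNumber' => 1,
    'errorType' => 'timeout',
    'httpStatusCode' => 504,
    'errorMessage' => 'Gateway timeout',
    'willRetry' => true,
]);
$tracker->onAttemptSucceeded([
    'executionId' => 'exec-1',
    'attemptNumber' => 2,
    'durationMs' => 840,
]);

print_r($tracker->history('exec-1'));
```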
Monitoring Execution Outcomes
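
Because `InferenceCompleted` fires once per execution, its payload is a natural source for an outcome summary. This sketch formats the documented fields into a single line; `summarizeOutcome()` is a hypothetical helper and the `finishReason` value `stop` is only a sample.

```php
<?php
// Hypothetical helper: turns an InferenceCompleted payload into a one-line summary.
function summarizeOutcome(array $data): string
{
    $status = ($data['isSuccess'] ?? false) ? 'OK' : 'FAILED';

    return sprintf(
        '%s %s finish=%s attempts=%d duration=%dms',
        $status,
        $data['executionId'] ?? '?',
        $data['finishReason'] ?? 'n/a',
        $data['attemptCount'] ?? 0,
        $data['durationMs'] ?? 0,
    );
}

$ok = summarizeOutcome([
    'executionId' => 'exec-1',
    'isSuccess' => true,
    'finishReason' => 'stop',
    'attemptCount' => 2,
    'durationMs' => 1530,
]);

$failed = summarizeOutcome([
    'executionId' => 'exec-2',
    'isSuccess' => false,
    'attemptCount' => 3,
    'durationMs' => 9000,
]);

echo $ok, PHP_EOL, $failed, PHP_EOL;
```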
Event Dispatcher
Events are dispatched through an `EventDispatcher` that implements `CanHandleEvents` (which extends `Psr\EventDispatcher\EventDispatcherInterface`). When a runtime is created without an explicit event dispatcher, it creates a default one named `'polyglot.inference.runtime'` or `'polyglot.embeddings.runtime'`.
You can inject a shared event dispatcher to correlate events across multiple runtimes or integrate with your application's existing event system.
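
The idea of sharing one dispatcher can be illustrated with stand-ins. None of the classes below belong to the library: `SharedDispatcherSketch`, `SketchEvent`, and `SketchRuntime` are hypothetical, and how a real runtime accepts a dispatcher is not shown in this section. The sketch only demonstrates that a single listener on a shared dispatcher observes events from both runtimes and can correlate them by request ID.

```php
<?php
// Hypothetical stand-in for a shared dispatcher.
final class SharedDispatcherSketch
{
    /** @var list<callable> */
    private array $listeners = [];

    public function __construct(public string $name) {}

    public function wiretap(callable $listener): void
    {
        $this->listeners[] = $listener;
    }

    // Same shape as PSR-14's dispatch(object $event): returns the event.
    public function dispatch(object $event): object
    {
        foreach ($this->listeners as $listener) {
            $listener($event);
        }
        return $event;
    }
}

// Hypothetical event carrying its origin and a payload.
final class SketchEvent
{
    public function __construct(public string $source, public array $data = []) {}
}

// Hypothetical runtime that reports to whichever dispatcher it is given.
final class SketchRuntime
{
    public function __construct(
        private string $kind,
        private SharedDispatcherSketch $events,
    ) {}

    public function run(string $requestId): void
    {
        $this->events->dispatch(new SketchEvent($this->kind, ['requestId' => $requestId]));
    }
}

$events = new SharedDispatcherSketch('app.events');

$seen = [];
$events->wiretap(function (SketchEvent $e) use (&$seen) {
    $seen[] = "{$e->source}:{$e->data['requestId']}";
});

// Both runtimes receive the same dispatcher instance.
(new SketchRuntime('inference', $events))->run('req-1');
(new SketchRuntime('embeddings', $events))->run('req-1');
```

With a shared instance, one listener sees `inference:req-1` followed by `embeddings:req-1`, which is the kind of cross-runtime correlation the paragraph above describes.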