InferenceStream provides a set of functional helpers for processing deltas. These methods build on top of the deltas() generator, so each one consumes the stream — you should use only one of them per stream instance.
Reducing to a Single Value
The reduce() method works like array_reduce: it folds every delta into an accumulator and returns the final value. This is useful when you need a single result derived from the entire stream.
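As a minimal sketch of the fold described above, the following self-contained example concatenates text chunks into one string. FakeStream and FakeDelta are hypothetical stand-ins written for this illustration, and the contentDelta property is an assumption; the real InferenceStream and PartialInferenceDelta may differ in detail.

```php
<?php
// Hypothetical stand-in for a delta; the contentDelta field is an assumption.
final class FakeDelta
{
    public function __construct(public string $contentDelta) {}
}

// Hypothetical stand-in for a stream, implementing only deltas() and reduce().
final class FakeStream
{
    /** @param FakeDelta[] $chunks */
    public function __construct(private array $chunks) {}

    public function deltas(): Generator
    {
        foreach ($this->chunks as $chunk) {
            yield $chunk;
        }
    }

    // Same argument order as array_reduce: the callback receives (carry, item).
    public function reduce(callable $reducer, mixed $initial = null): mixed
    {
        $carry = $initial;
        foreach ($this->deltas() as $delta) {
            $carry = $reducer($carry, $delta);
        }
        return $carry;
    }
}

$stream = new FakeStream([new FakeDelta('Hel'), new FakeDelta('lo'), new FakeDelta('!')]);

// Fold every chunk into a single string.
$text = $stream->reduce(
    fn (string $carry, FakeDelta $delta): string => $carry . $delta->contentDelta,
    '',
);

echo $text, PHP_EOL; // Hello!
```

The initial value sets the type of the accumulator, so passing an empty string here lets the callback declare a string carry.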
Because reduce() drains the entire stream before returning, it blocks until the response is complete.
Mapping Deltas
The map() method transforms each delta into a new value and yields the results as a generator. Use it to extract or reshape data from each chunk without consuming the stream eagerly.
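The lazy transformation described above might look like the following sketch, which maps each chunk to its length. FakeStream and FakeDelta are hypothetical stand-ins for illustration only, and the contentDelta property is an assumption rather than a confirmed field of PartialInferenceDelta.

```php
<?php
// Hypothetical stand-in for a delta; the contentDelta field is an assumption.
final class FakeDelta
{
    public function __construct(public string $contentDelta) {}
}

// Hypothetical stand-in for a stream, implementing only deltas() and map().
final class FakeStream
{
    /** @param FakeDelta[] $chunks */
    public function __construct(private array $chunks) {}

    public function deltas(): Generator
    {
        foreach ($this->chunks as $chunk) {
            yield $chunk;
        }
    }

    // Yields one mapped value per delta; nothing is pulled until iteration starts.
    public function map(callable $mapper): iterable
    {
        foreach ($this->deltas() as $delta) {
            yield $mapper($delta);
        }
    }
}

$stream = new FakeStream([new FakeDelta('Hel'), new FakeDelta('lo'), new FakeDelta('!')]);

$lengths = [];
foreach ($stream->map(fn (FakeDelta $delta): int => strlen($delta->contentDelta)) as $length) {
    $lengths[] = $length; // each value is produced on demand
}

echo implode(',', $lengths), PHP_EOL; // 3,2,1
```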
Filtering Deltas
The filter() method yields only the deltas that satisfy a given predicate. Deltas for which the callback returns false are silently skipped.
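To illustrate the predicate behaviour, the sketch below skips chunks that carry no text. FakeStream and FakeDelta are hypothetical stand-ins, and the contentDelta property is assumed for the sake of the example.

```php
<?php
// Hypothetical stand-in for a delta; the contentDelta field is an assumption.
final class FakeDelta
{
    public function __construct(public string $contentDelta) {}
}

// Hypothetical stand-in for a stream, implementing only deltas() and filter().
final class FakeStream
{
    /** @param FakeDelta[] $chunks */
    public function __construct(private array $chunks) {}

    public function deltas(): Generator
    {
        foreach ($this->chunks as $chunk) {
            yield $chunk;
        }
    }

    // Yields only the deltas for which the predicate returns true.
    public function filter(callable $predicate): iterable
    {
        foreach ($this->deltas() as $delta) {
            if ($predicate($delta)) {
                yield $delta;
            }
        }
    }
}

$stream = new FakeStream([new FakeDelta('Hel'), new FakeDelta(''), new FakeDelta('lo')]);

$kept = [];
foreach ($stream->filter(fn (FakeDelta $delta): bool => $delta->contentDelta !== '') as $delta) {
    $kept[] = $delta->contentDelta; // the empty chunk never reaches this loop
}

echo implode('', $kept), PHP_EOL; // Hello
```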
Collecting All Deltas
The all() method drains the stream and returns every visible delta as an array. This is handy for inspection or testing, but keep in mind that it loads the entire stream into memory.
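A minimal sketch of collecting a stream into an array follows. FakeStream and FakeDelta are hypothetical stand-ins, and the use of iterator_to_array here is one plausible way to drain a generator, not necessarily how the library does it.

```php
<?php
// Hypothetical stand-in for a delta; the contentDelta field is an assumption.
final class FakeDelta
{
    public function __construct(public string $contentDelta) {}
}

// Hypothetical stand-in for a stream, implementing only deltas() and all().
final class FakeStream
{
    /** @param FakeDelta[] $chunks */
    public function __construct(private array $chunks) {}

    public function deltas(): Generator
    {
        foreach ($this->chunks as $chunk) {
            yield $chunk;
        }
    }

    /** @return FakeDelta[] */
    public function all(): array
    {
        // Drain the generator into a list, discarding generator keys.
        return iterator_to_array($this->deltas(), false);
    }
}

$stream = new FakeStream([new FakeDelta('Hel'), new FakeDelta('lo'), new FakeDelta('!')]);

$deltas = $stream->all();

echo count($deltas), PHP_EOL;           // 3
echo $deltas[0]->contentDelta, PHP_EOL; // Hel
```

Because the whole array is held at once, this pattern fits short responses and tests better than long-running generations.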
Accessing the Last Delta
After the stream has been consumed (either partially or fully), you can retrieve the most recently yielded delta with lastDelta().
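The sketch below shows what "most recently yielded" could mean for a partially consumed stream. FakeStream and FakeDelta are hypothetical stand-ins; the way the stand-in records the last delta is an assumption for illustration and may not match the library's internals.

```php
<?php
// Hypothetical stand-in for a delta; the contentDelta field is an assumption.
final class FakeDelta
{
    public function __construct(public string $contentDelta) {}
}

// Hypothetical stand-in that remembers the most recently yielded delta.
final class FakeStream
{
    private ?FakeDelta $last = null;

    /** @param FakeDelta[] $chunks */
    public function __construct(private array $chunks) {}

    public function deltas(): Generator
    {
        foreach ($this->chunks as $chunk) {
            $this->last = $chunk; // record the delta before handing it out
            yield $chunk;
        }
    }

    public function lastDelta(): ?FakeDelta
    {
        return $this->last;
    }
}

$stream = new FakeStream([new FakeDelta('Hel'), new FakeDelta('lo'), new FakeDelta('!')]);

$before = $stream->lastDelta(); // null: nothing has been yielded yet

$seen = 0;
foreach ($stream->deltas() as $delta) {
    $seen++;
    if ($seen === 2) {
        break; // stop early, leaving the stream partially consumed
    }
}

$last = $stream->lastDelta();
echo $last?->contentDelta, PHP_EOL; // lo
```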
Token Usage
The usage() method returns the accumulated InferenceUsage object for the stream, containing input tokens, output tokens, and any cache or reasoning token counts reported by the provider.
Execution Metadata
The execution() method returns the underlying InferenceExecution object, which contains the original request, the finalized response (once the stream completes), and execution metadata such as the execution ID.
Summary of Available Methods
| Method | Returns | Consumes stream? | Description |
|---|---|---|---|
| `deltas()` | `Generator<PartialInferenceDelta>` | Yes | Yields visible deltas one by one. |
| `map(callable)` | `iterable<T>` | Yes | Transforms each delta via a callback. |
| `filter(callable)` | `iterable<PartialInferenceDelta>` | Yes | Yields only deltas matching a predicate. |
| `reduce(callable, initial)` | `mixed` | Yes (blocking) | Folds all deltas into a single value. |
| `all()` | `array<PartialInferenceDelta>` | Yes (blocking) | Collects all deltas into an array. |
| `onDelta(callable)` | `self` | No (registers callback) | Registers a callback fired for each visible delta. |
| `final()` | `?InferenceResponse` | Drains if needed | Returns the assembled final response. |
| `lastDelta()` | `?PartialInferenceDelta` | No | Returns the most recently yielded delta. |
| `usage()` | `InferenceUsage` | No | Returns accumulated token usage. |
| `execution()` | `InferenceExecution` | No | Returns the execution context and metadata. |