Symptoms
- Streams cutting off prematurely
- LogicException with “Stream is exhausted and cannot be replayed”
- Partial or incomplete responses
- No output appearing during streaming (buffering issue)
- Connection timeouts during long-running streams
Enable Streaming Correctly
Streaming must be explicitly enabled on the request. The simplest approach is to use the stream() shortcut on the inference builder, then consume the stream via deltas().
Alternatively, use the withStreaming() method followed by create()->stream().
Do Not Consume a Stream Twice
The most common streaming mistake is attempting to iterate over deltas() more than once. Streams are single-pass by design. A second call to deltas() throws a LogicException.
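To illustrate the single-pass rule, the sketch below uses a hypothetical SinglePassStream class. It is a stand-in written for this example, not the library's stream class, and it assumes deltas are plain strings. It throws the same LogicException on a second deltas() call and shows one way to avoid needing a replay: keep your own copy of the deltas during the first pass.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a single-pass stream; NOT the library's class.
final class SinglePassStream
{
    private bool $consumed = false;

    /** @param iterable<string> $source */
    public function __construct(private iterable $source) {}

    /** @return iterable<string> */
    public function deltas(): iterable
    {
        if ($this->consumed) {
            throw new LogicException('Stream is exhausted and cannot be replayed');
        }
        $this->consumed = true;
        return $this->source;
    }
}

$stream = new SinglePassStream(['Hel', 'lo', '!']);

// First pass: keep a copy of every delta you may need later.
$chunks = [];
foreach ($stream->deltas() as $delta) {
    $chunks[] = $delta;
}
$text = implode('', $chunks);

// Second pass: rejected, because the stream was already consumed.
$error = null;
try {
    $stream->deltas();
} catch (LogicException $e) {
    $error = $e->getMessage();
}
```

After the first loop, $chunks and $text can be reused as often as needed without touching the stream again.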
Collect the Full Response After Streaming
To get the complete assembled response after consuming all deltas, use the final() method on the stream.
If you do not need the individual deltas, call final() directly; it will drain the stream internally.
Flush Output Buffers
When streaming to a browser or CLI, PHP’s output buffering can delay visible output. Flush buffers explicitly after each delta.
If output is still delayed, check the other layers that can buffer the response:
- Nginx: disable proxy buffering with proxy_buffering off; or set the response header X-Accel-Buffering: no
- Apache mod_deflate / mod_gzip: compression modules buffer output; disable them for streaming endpoints
- PHP output buffering: check output_buffering in php.ini and consider calling ob_end_flush() before streaming begins
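The PHP-side steps above can be sketched with standard output-control functions only. The helper names disableOutputBuffering() and streamToClient() are hypothetical, and the sketch assumes each delta is a plain string; adapt the loop body to whatever your delta objects expose.

```php
<?php
declare(strict_types=1);

/** Close PHP's own output buffers so echoed deltas are not held back. */
function disableOutputBuffering(): void
{
    // Bounded loop: some buffers cannot be removed, so stop on the first failure.
    for ($level = ob_get_level(); $level > 0; $level--) {
        if (!@ob_end_flush()) {
            break;
        }
    }
}

/**
 * Echo each delta and push it to the client immediately.
 *
 * @param iterable<string> $deltas plain text chunks (an assumption of this sketch)
 * @return int number of bytes written
 */
function streamToClient(iterable $deltas): int
{
    if (PHP_SAPI !== 'cli' && !headers_sent()) {
        header('X-Accel-Buffering: no'); // ask Nginx not to buffer this response
    }
    disableOutputBuffering();

    $bytes = 0;
    foreach ($deltas as $delta) {
        echo $delta;
        if (ob_get_level() > 0) {
            ob_flush(); // flush any buffer that could not be closed
        }
        flush();        // hand the data to the web server or terminal
        $bytes += strlen($delta);
    }
    return $bytes;
}
```

Calling streamToClient(['Hel', 'lo', '!']) writes each chunk as soon as it is produced. Compression or proxy buffering in the web server still has to be disabled separately.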
Handle Connection Timeouts
Streaming responses can take longer than non-streaming requests because the connection remains open while the model generates tokens. Increase the timeout settings to accommodate this.
idleTimeout is particularly important for streaming. It controls how long the client waits for the next chunk before giving up. If a model pauses while generating (for example, during complex reasoning), a short idle timeout will cause the stream to terminate prematurely.
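The difference between the two limits can be illustrated with a small simulation. The function receiveWithTimeouts() is hypothetical and is not how the library enforces timeouts. It assumes the request timeout bounds the total duration of the request, while the idle timeout bounds the gap between consecutive chunks; arrival times are supplied as data, so nothing actually sleeps.

```php
<?php
declare(strict_types=1);

/**
 * Simulate receiving chunks under two limits (illustration only).
 *
 * @param list<array{0: float, 1: string}> $chunks [arrival time in seconds, text]
 */
function receiveWithTimeouts(array $chunks, float $requestTimeout, float $idleTimeout): string
{
    $received = '';
    $lastArrival = 0.0;
    foreach ($chunks as [$arrival, $text]) {
        if ($arrival - $lastArrival > $idleTimeout) {
            throw new RuntimeException('idle timeout');    // gap between chunks too long
        }
        if ($arrival > $requestTimeout) {
            throw new RuntimeException('request timeout'); // whole request too long
        }
        $received .= $text;
        $lastArrival = $arrival;
    }
    return $received;
}

// The model pauses for 20 seconds between the first and second chunk.
$chunks = [[1.0, 'Thinking'], [21.0, '... done']];

// A generous idle timeout tolerates the pause.
$ok = receiveWithTimeouts($chunks, requestTimeout: 120.0, idleTimeout: 30.0);

// A short idle timeout cuts the stream off, although the total time is fine.
$failure = null;
try {
    receiveWithTimeouts($chunks, requestTimeout: 120.0, idleTimeout: 10.0);
} catch (RuntimeException $e) {
    $failure = $e->getMessage();
}
```

In the second call the stream is cut off after "Thinking", which matches the "streams cutting off prematurely" symptom.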
Handle Errors During Streaming
Wrap the stream consumption in a try-catch to handle errors that occur mid-stream. This is important because errors can arise after some deltas have already been received.
Use the onDelta Callback
Instead of iterating over deltas(), you can register a callback that is invoked for each visible delta.
Use Functional Stream Operations
The stream supports map(), filter(), and reduce() operations for functional-style processing.
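The exact signatures of the stream's own map(), filter(), and reduce() methods are not shown here, so the sketch below reproduces the same style with plain PHP generators. The functions mapStream(), filterStream(), and reduceStream() are hypothetical helpers, and deltas are assumed to be plain strings.

```php
<?php
declare(strict_types=1);

/** Lazily apply $fn to every item. */
function mapStream(iterable $items, callable $fn): Generator
{
    foreach ($items as $item) {
        yield $fn($item);
    }
}

/** Lazily keep only the items for which $predicate returns true. */
function filterStream(iterable $items, callable $predicate): Generator
{
    foreach ($items as $item) {
        if ($predicate($item)) {
            yield $item;
        }
    }
}

/** Consume the stream once, folding all items into a single value. */
function reduceStream(iterable $items, callable $fn, mixed $initial): mixed
{
    $carry = $initial;
    foreach ($items as $item) {
        $carry = $fn($carry, $item);
    }
    return $carry;
}

$deltas = ['hello', '', ' ', 'world'];

$nonEmpty = filterStream($deltas, fn (string $d): bool => $d !== '');
$upper    = mapStream($nonEmpty, fn (string $d): string => strtoupper($d));
$text     = reduceStream($upper, fn (string $acc, string $d): string => $acc . $d, '');
```

Because the generators are lazy, nothing is read until reduceStream() runs, and the whole pipeline still consumes the source only once.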
Fallback to Non-Streaming
If streaming consistently fails for a particular model or provider, fall back to a non-streaming request.
Verify Model Supports Streaming
Not all models support streaming. If enabling streaming causes errors, test with a plain non-streaming request first. If the non-streaming request succeeds, the model may not support streaming, or the provider may require a different endpoint for streamed responses.
Common Pitfalls
- Consuming deltas() twice. This is the most frequent mistake. Use ResponseCachePolicy::Memory if you need to replay.
- Not flushing output. Without explicit flush() calls, PHP buffers output and the user sees nothing until the stream completes.
- Short timeouts. The default 30-second request timeout is too short for many streaming responses. Increase requestTimeout and idleTimeout.
- Ignoring partial content on error. When a stream error occurs mid-way, you may have already received useful content. Always capture partial content in your error handler.
- Server-side buffering. Even with PHP flush(), Nginx or Apache may buffer the response. Configure your web server to pass through responses immediately for streaming endpoints.
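Two of the points above, capturing partial content on error and falling back to a non-streaming request, can be combined in one sketch. Both failingDeltas() and consumeWithFallback() are hypothetical helpers: the generator simulates a stream that breaks mid-way, and the fallback callable stands in for a real non-streaming request.

```php
<?php
declare(strict_types=1);

// Hypothetical delta source that fails after two chunks.
function failingDeltas(): Generator
{
    yield 'The answer ';
    yield 'is ';
    throw new RuntimeException('Connection reset by peer');
}

/**
 * @param iterable<string>   $deltas   streamed text chunks
 * @param callable(): string $fallback stand-in for a non-streaming request
 * @return array{text: string, partial: string, error: ?string}
 */
function consumeWithFallback(iterable $deltas, callable $fallback): array
{
    $partial = '';
    try {
        foreach ($deltas as $delta) {
            $partial .= $delta;
        }
        return ['text' => $partial, 'partial' => $partial, 'error' => null];
    } catch (Throwable $e) {
        // Keep what arrived before the failure, then retry without streaming.
        return ['text' => $fallback(), 'partial' => $partial, 'error' => $e->getMessage()];
    }
}

$result = consumeWithFallback(failingDeltas(), fn (): string => 'The answer is 42.');
```

Whether to show the partial text, discard it, or log it alongside the error depends on the application; the point is that it is still available after the exception.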