Benefits of Streaming
Streaming responses offer several advantages:
- Improved User Experience: Display content to users as it’s generated, creating a more responsive interface
- Reduced Latency Perception: Users see the beginning of a response almost immediately
- Progressive Processing: Begin processing early parts of the response while later parts are still being generated
- Handling Long Outputs: Efficiently process responses that may be very long without hitting timeout limits
- Early Termination: Stop generation early if needed, saving resources
Enabling Streaming
Enabling streaming in Polyglot is straightforward: set the stream option to true in your request, then access the result through the stream() method.
Basic Stream Processing
The most common way to process a stream is to iterate through the partial responses.
Understanding Partial Responses
Each iteration of the stream yields a PartialInferenceResponse object with these key properties:
- contentDelta: The new content received in this chunk
- content: The accumulated content up to this point
- finishReason: The reason why the response finished (empty until the final chunk)
- usage: Token usage statistics
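The loop below is a minimal, self-contained sketch of this iteration pattern. It does not call Polyglot: PartialChunk and fakeStream() are hypothetical stand-ins that only mirror the contentDelta, content, and finishReason properties listed above (usage is omitted), and the 'stop' finish reason is an assumed value. With the real library you would iterate the stream's partial responses instead.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for PartialInferenceResponse. Only the property
// names come from the documentation; the class itself is illustrative.
final class PartialChunk
{
    public function __construct(
        public readonly string $contentDelta,
        public readonly string $content,
        public readonly string $finishReason = '',
    ) {}
}

/**
 * Hypothetical generator that simulates a streamed response.
 *
 * @param list<string> $deltas
 * @return Generator<int, PartialChunk>
 */
function fakeStream(array $deltas): Generator
{
    $content = '';
    $last = count($deltas) - 1;
    foreach ($deltas as $i => $delta) {
        $content .= $delta;
        // finishReason stays empty until the final chunk ('stop' is assumed).
        yield new PartialChunk($delta, $content, $i === $last ? 'stop' : '');
    }
}

$shown = '';
$finishReason = '';
foreach (fakeStream(['Hello', ', ', 'world', '!']) as $partial) {
    // In a UI you would echo and flush the delta here.
    $shown .= $partial->contentDelta;
    if ($partial->finishReason !== '') {
        $finishReason = $partial->finishReason;
    }
}

echo $shown, PHP_EOL; // prints "Hello, world!"
```

Appending contentDelta reproduces the same text that content holds on the last chunk, so a consumer can rely on either the deltas or the accumulated value.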
Retrieving the Final Response
After processing the stream, you can retrieve the complete response.
Stream Replay Contract
stream()->responses() is one-shot by default. Once fully consumed, iterating again raises an exception.
- Use final() to get the terminal response safely and idempotently.
- If you need replay, opt in with ResponseCachePolicy::Memory.
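The one-shot versus replay distinction can be modeled with plain PHP generators. This is a sketch of the contract only, not Polyglot's implementation: chunks() and MemoryReplay are hypothetical names, and how ResponseCachePolicy::Memory works internally is not shown in this document.

```php
<?php
declare(strict_types=1);

/** Hypothetical chunk source. A PHP generator can be traversed only once. */
function chunks(): Generator
{
    yield 'one';
    yield 'two';
}

// One-shot behavior: traversing a finished generator again throws.
$oneShot = chunks();
$first = iterator_to_array($oneShot, false);
$secondPassFailed = false;
try {
    foreach ($oneShot as $chunk) {
        // Never reached: the generator is already closed.
    }
} catch (Exception $e) {
    $secondPassFailed = true;
}

/** Illustrative in-memory cache that records chunks so they can be replayed. */
final class MemoryReplay implements IteratorAggregate
{
    /** @var list<string> */
    private array $seen = [];

    public function __construct(private Generator $source) {}

    public function getIterator(): Generator
    {
        // Replay everything recorded so far.
        foreach ($this->seen as $chunk) {
            yield $chunk;
        }
        // Then keep pulling from the source, recording each chunk.
        while ($this->source->valid()) {
            $chunk = $this->source->current();
            $this->seen[] = $chunk;
            $this->source->next();
            yield $chunk;
        }
    }
}

$replay = new MemoryReplay(chunks());
$a = iterator_to_array($replay, false);
$b = iterator_to_array($replay, false); // second pass is served from memory
```

The trade-off is memory: a replayable stream has to keep every chunk it has seen, which is presumably why replay is opt-in rather than the default.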