Overview of Streaming
Learn how to work with streaming responses in Polyglot.
Streaming LLM responses can improve both user experience and system performance. Polyglot makes streaming easy to implement, with a consistent API across different providers.
Streaming responses are a powerful feature of modern LLM APIs that allow you to receive and process model outputs incrementally as they’re being generated, rather than waiting for the complete response. This chapter covers how to work with streaming responses in Polyglot, from basic setup to advanced processing techniques.
Benefits of Streaming
Streaming responses offer several advantages:
- Improved User Experience: Display content to users as it’s generated, creating a more responsive interface
- Reduced Latency Perception: Users see the beginning of a response almost immediately
- Progressive Processing: Begin processing early parts of the response while later parts are still being generated
- Handling Long Outputs: Efficiently process responses that may be very long without hitting timeout limits
- Early Termination: Stop generation early if needed, saving resources
Enabling Streaming
Enabling streaming in Polyglot is straightforward. Set the `stream` option to `true` in your request:
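A minimal sketch, assuming the `Inference` entry point and the `Cognesy\Polyglot\LLM` namespace (both may differ in your installed version; the message shape is illustrative):

```php
<?php
use Cognesy\Polyglot\LLM\Inference;

// Build a request with streaming enabled via the option this chapter describes.
$pending = (new Inference())->create(
    messages: [['role' => 'user', 'content' => 'Tell me a short story.']],
    options: ['stream' => true],
);
```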
Once you have a streaming-enabled response, you can access the stream using the `stream()` method:
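For example, continuing the request above (`stream()` is the accessor named in this chapter; the surrounding code is a sketch):

```php
<?php
// $pending comes from the earlier example.
$stream = $pending->stream();
```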
Basic Stream Processing
The most common way to process a stream is to iterate through the partial responses:
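A sketch of that loop, assuming the stream object exposes a `responses()` generator of partial responses (the generator name is an assumption; `contentDelta` is described below):

```php
<?php
// Continues from $stream above.
foreach ($stream->responses() as $partial) {
    // Emit only the newly generated text so output appears incrementally.
    echo $partial->contentDelta;
    flush();
}
```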
Understanding Partial Responses
Each iteration of the stream yields a `PartialLLMResponse` object with these key properties:
- `contentDelta`: The new content received in this chunk
- `content`: The accumulated content up to this point
- `finishReason`: The reason why the response finished (empty until the final chunk)
- `usage`: Token usage statistics
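For illustration, here is how those properties might be used inside the processing loop, assuming they are public fields on `PartialLLMResponse` (your version may expose them via accessor methods instead):

```php
<?php
foreach ($stream->responses() as $partial) {
    echo $partial->contentDelta;                // just the new chunk
    if ($partial->finishReason !== '') {
        // Final chunk: finishReason and usage are now populated.
        echo PHP_EOL . 'Finish reason: ' . $partial->finishReason . PHP_EOL;
    }
}
```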
Retrieving the Final Response
After processing the stream, you can get the complete response:
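A sketch, assuming a `final()` accessor on the stream object (the method name is a guess; check the stream API in your version for the exact name):

```php
<?php
// Hypothetical accessor; returns the complete response object once the
// stream has been fully consumed.
$final = $stream->final();

echo $final->content; // full accumulated text, per the properties above
```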