Learn how to work with streaming responses in Polyglot.
Streaming LLM responses can improve both perceived responsiveness and overall system performance. Polyglot makes streaming easy to implement with a consistent API across different providers.
Streaming responses are a powerful feature of modern LLM APIs that allow you to receive and process model outputs incrementally as they’re being generated, rather than waiting for the complete response. This chapter covers how to work with streaming responses in Polyglot, from basic setup to advanced processing techniques.
Streaming responses offer several advantages: users see output as soon as it is generated instead of waiting for the full response, and your application can process partial content incrementally as it arrives.
Enabling streaming in Polyglot is straightforward: set the `stream` option to `true` in your request.
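The sketch below shows one way to do this. The `Inference` entry point and the `using()`/`with()` calls follow the request pattern used elsewhere in these docs, but the exact namespace and method signatures are assumptions and may differ between Polyglot versions.

```php
<?php
use Cognesy\Polyglot\LLM\Inference;

// Build a request with streaming enabled by passing ['stream' => true]
// in the request options (class and method names assumed; adjust to your version).
$response = (new Inference())
    ->using('openai') // any provider preset configured in your project
    ->with(
        messages: [['role' => 'user', 'content' => 'Write a short story about a robot.']],
        options: ['stream' => true], // enables streaming mode
    );
```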
Once you have a streaming-enabled response, you can access the stream using the `stream()` method.
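Continuing the example above (a sketch: only `stream()` is named in this section, the surrounding code is assumed):

```php
// Obtain the stream object from the streaming-enabled response.
$stream = $response->stream();
```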
The most common way to process a stream is to iterate through the partial responses:
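A minimal iteration sketch, assuming the stream exposes an iterable of `PartialInferenceResponse` objects (here via a hypothetical `responses()` accessor) and that `contentDelta` is readable as described in the property list below:

```php
// Iterate over partial responses as the provider sends them.
foreach ($stream->responses() as $partial) {
    // Output only the newly received text for this chunk.
    echo $partial->contentDelta;
    flush(); // push the chunk to the client immediately (e.g., CLI or SSE output)
}
```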
Each iteration of the stream yields a `PartialInferenceResponse` object with these key properties:

- `contentDelta`: the new content received in this chunk
- `content`: the accumulated content up to this point
- `finishReason`: the reason why the response finished (empty until the final chunk)
- `usage`: token usage statistics

After processing the stream, you can get the complete response.
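A sketch of retrieving the assembled result once the loop finishes; the `final()` accessor and the `content()` method are assumptions about the response API and may be named differently in your Polyglot version:

```php
// After the stream is fully consumed, fetch the complete response object.
$finalResponse = $stream->final();   // accessor name assumed

echo $finalResponse->content();      // full accumulated text (accessor assumed)
```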