Streaming Responses
Learn how to handle streaming responses using the Instructor HTTP client API.
Streaming responses are a powerful feature that allows processing data as it arrives from the server, rather than waiting for the entire response to be received. This is particularly valuable when:
- Working with large responses that might exceed memory limits
- Processing real-time data streams
- Handling responses from AI models that generate content token by token
- Building user interfaces that show progressive updates
The Instructor HTTP client API provides robust support for streaming responses across all supported HTTP client implementations.
Enabling Streaming
To receive a streaming response, you need to configure the request with the `stream` option set to `true`:
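A minimal sketch of such a request. The `stream` option is the part this guide specifies; the `HttpClient`/`HttpRequest` class names and constructor shape are illustrative assumptions, not verified against the library's actual API:

```php
<?php
// Sketch only: HttpClient and HttpRequest names are illustrative assumptions.
use Cognesy\Http\HttpClient;
use Cognesy\Http\HttpRequest;

$client = new HttpClient();

$request = new HttpRequest(
    url: 'https://api.example.com/stream',
    method: 'GET',
    headers: ['Accept' => 'text/event-stream'],
    body: '',
    options: ['stream' => true], // tell the client to stream the response
);

$response = $client->handle($request);
```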
The `stream` option tells the HTTP client to treat the response as a stream, which means:
- It won’t buffer the entire response in memory
- It will provide a way to read the response incrementally
- The connection will remain open until all data is received or the stream is closed
Processing Streamed Data
Once you have a streaming response, you can process it using the `stream()` method, which returns a PHP Generator:
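A sketch of consuming that generator, assuming `$response` is a streaming response obtained as above:

```php
// Iterate the response body incrementally; each iteration yields
// the next chunk of data as it arrives from the server.
foreach ($response->stream() as $chunk) {
    echo $chunk;
    flush(); // push output to the consumer immediately
}
```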
By default, the `stream()` method reads the response in small chunks. You can control the chunk size by passing a parameter:
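For example (the exact parameter name and default are assumptions; this guide only states that a chunk-size parameter exists):

```php
// Read the stream in 1024-byte chunks instead of the default size.
foreach ($response->stream(1024) as $chunk) {
    // process $chunk ...
}
```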
Example: Downloading a Large File
Here’s an example of downloading a large file with streaming to avoid memory issues:
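A sketch, assuming `$response` is a streaming response for the file URL (obtained as in the earlier examples). Each chunk is written straight to disk, so memory usage stays bounded by the chunk size:

```php
// Hypothetical target path for illustration.
$fh = fopen('/tmp/large-file.zip', 'wb');

// Write each chunk to disk as it arrives instead of buffering
// the whole response body in memory.
foreach ($response->stream(8192) as $chunk) {
    fwrite($fh, $chunk);
}

fclose($fh);
```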
This approach allows downloading very large files without loading the entire file into memory.
Example: Processing Server-Sent Events (SSE)
Server-Sent Events (SSE) are a common streaming format used by many APIs. Here’s how to process them:
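A sketch of manual SSE handling, assuming `$response` is a streaming response from an SSE endpoint. Because chunk boundaries do not align with event boundaries, the code buffers chunks and processes only complete lines; `data:` and the `[DONE]` sentinel are standard SSE conventions used by many APIs:

```php
$buffer = '';
foreach ($response->stream() as $chunk) {
    $buffer .= $chunk;
    // SSE events are line-delimited; process every complete line in the buffer.
    while (($pos = strpos($buffer, "\n")) !== false) {
        $line = trim(substr($buffer, 0, $pos));
        $buffer = substr($buffer, $pos + 1);
        if (!str_starts_with($line, 'data: ')) {
            continue; // skip comments, event names, and blank separator lines
        }
        $payload = substr($line, strlen('data: '));
        if ($payload === '[DONE]') {
            break 2; // some APIs signal end-of-stream this way
        }
        $event = json_decode($payload, true);
        // ... handle the decoded event
    }
}
```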
While this works, processing streaming responses line by line is common enough that the library provides a dedicated middleware for it, as we’ll see in the next section.
Line-by-Line Processing
For many streaming APIs, especially those that send event streams or line-delimited JSON, it’s useful to process the response line by line. The library provides the `StreamByLineMiddleware` to simplify this task:
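A sketch of wiring it up. The middleware class name comes from this guide; the namespace and the `withMiddleware()` registration call are assumptions about the library's API:

```php
// Sketch only: namespace and registration method are assumed.
use Cognesy\Http\Middleware\StreamByLineMiddleware;

$client->withMiddleware(new StreamByLineMiddleware());

// With the middleware active, stream() yields complete lines
// rather than raw byte chunks.
foreach ($response->stream() as $line) {
    // process one full line at a time ...
}
```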
Customizing Line Processing
You can customize how lines are parsed by providing a parser function to the middleware:
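A sketch of a custom parser, assuming the middleware accepts a parser callable at construction (the constructor signature is an assumption; the skip-on-`null` behavior is stated by this guide):

```php
$middleware = new StreamByLineMiddleware(
    parser: function (string $line): ?string {
        $line = trim($line);
        // Returning null causes the line to be skipped in the stream.
        return $line === '' ? null : $line;
    }
);
```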
If your parser returns `null`, that line will be skipped in the stream.
Example: Processing OpenAI Chat Completions
Here’s a practical example of using the `StreamByLineMiddleware` to process streaming responses from the OpenAI API:
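A sketch, assuming `$response` is a streaming response from the Chat Completions endpoint with the line-by-line middleware active. The SSE payload shape (`data: {"choices":[{"delta":{"content":...}}]}` and the `[DONE]` sentinel) follows OpenAI's documented streaming format:

```php
foreach ($response->stream() as $line) {
    $line = trim($line);
    if (!str_starts_with($line, 'data: ')) {
        continue;
    }
    $payload = substr($line, strlen('data: '));
    if ($payload === '[DONE]') {
        break; // OpenAI terminates the stream with this sentinel
    }
    $data = json_decode($payload, true);
    // Each event carries a token (or a few) in the delta's content field.
    $token = $data['choices'][0]['delta']['content'] ?? '';
    echo $token;
    flush(); // show the partial completion to the user immediately
}
```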
This approach allows you to display the AI-generated content to the user in real-time as it’s being generated, providing a more responsive user experience.
Considerations for Streaming
When working with streaming responses, keep these considerations in mind:
- Memory Usage: While streaming reduces overall memory usage, be careful not to accumulate the entire response in memory by appending it to a variable unless necessary.
- Connection Stability: Streaming connections are more sensitive to network issues than short-lived requests. Implement error handling and retry logic for more robust applications.
- Server Timeouts: Some servers or proxies may time out long-running connections. Make sure your infrastructure is configured to allow the necessary connection times.
- Middleware Order: When using middleware that processes streaming responses, order matters: middleware is executed in the order it’s added to the stack.
In the next chapter, we’ll explore how to make multiple concurrent requests using request pools, which can significantly improve performance when fetching data from multiple endpoints.