Advanced Usage
This guide covers advanced patterns and features for power users who need fine-grained control over extraction behavior, streaming, validation, and multi-provider workflows.

Streaming
Streaming lets you receive partial results as the LLM generates them, rather than waiting for the entire response. This is essential for long-running extractions where you want to show progress, or for real-time UIs that display data as it becomes available.

Streaming with partials()
Each partial is a partially populated instance of your response model. Properties that have not been received yet will be null or their default value. This is useful for broadcasting live updates via WebSockets.
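A minimal sketch of consuming partials, assuming the package exposes a `StructuredOutput` facade with `with()`/`stream()` entry points and a hypothetical `PersonProgressUpdated` broadcast event — the facade name, option names, and event are illustrative, so check them against your installed version:

```php
$stream = StructuredOutput::with(
    messages: 'Jason is 28 years old and lives in Denver.',
    responseModel: Person::class,
)->stream();

foreach ($stream->partials() as $partial) {
    // $partial is a Person instance; properties the LLM has not
    // generated yet are still null or at their default values.
    broadcast(new PersonProgressUpdated($partial)); // e.g. push over WebSockets
}
```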
Streaming Sequences
When extracting an array of items, the sequence() method yields the growing collection as each new item is completed.
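As a sketch, assuming an instructor-php-style `Sequence::of()` wrapper for the response model (the namespace and facade name are assumptions):

```php
use Cognesy\Instructor\Extras\Sequence\Sequence; // namespace may differ

$stream = StructuredOutput::with(
    messages: $meetingTranscript,
    responseModel: Sequence::of(Person::class),
)->stream();

foreach ($stream->sequence() as $people) {
    // Yields the growing collection each time a new item is completed
    echo count($people), " attendees extracted so far\n";
}
```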
Validation and Retries
Automatic Validation
Response models are automatically validated after deserialization. When validation fails, the package sends the error messages back to the LLM with a retry prompt, asking it to correct the response. This loop continues up to max_retries times.
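A sketch of declarative validation with a retry budget, assuming Symfony Validator attributes are honored on the response model and a `maxRetries` request option (both are typical of instructor-php-style packages, but verify against your version):

```php
use Symfony\Component\Validator\Constraints as Assert;

class Person
{
    public string $name;

    #[Assert\Positive] // declarative constraint, checked after deserialization
    public int $age;
}

// On failure, the validation errors are fed back to the LLM and the
// extraction is retried, up to max_retries times.
$person = StructuredOutput::with(
    messages: 'Jason is 28 years old.',
    responseModel: Person::class,
    maxRetries: 3,
)->get();
```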
Custom Validators
Implement the CanValidateObject contract for domain-specific validation logic that cannot be expressed with declarative attributes. The validate() method must return a ValidationResult instance.
Custom validators are registered on the StructuredOutputRuntime, not on the facade directly:
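A sketch of a domain validator, assuming instructor-php-style `ValidationResult` factory methods (`fieldError()`, `valid()`) and a hypothetical `withValidators()` registration hook on the runtime — verify names and namespaces against your installed version:

```php
use Cognesy\Instructor\Validation\Contracts\CanValidateObject; // namespace may differ
use Cognesy\Instructor\Validation\ValidationResult;

class BookingValidator implements CanValidateObject
{
    public function validate(object $dataObject): ValidationResult
    {
        // A domain rule that declarative attributes cannot express
        if ($dataObject->checkOut <= $dataObject->checkIn) {
            return ValidationResult::fieldError(
                field: 'checkOut',
                value: $dataObject->checkOut,
                message: 'Check-out date must be after check-in date.',
            );
        }
        return ValidationResult::valid();
    }
}

// Hypothetical registration on the runtime:
$runtime->withValidators(new BookingValidator());
```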
Custom Retry Prompt
Customize the message sent to the LLM when validation fails. The {errors} placeholder is replaced with the actual error messages.
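A sketch assuming a `retryPrompt` request option (the option name is an assumption; the `{errors}` placeholder is the documented substitution point):

```php
$person = StructuredOutput::with(
    messages: $input,
    responseModel: Person::class,
    maxRetries: 2,
    // {errors} is replaced with the actual validation error messages
    retryPrompt: "The previous response failed validation:\n{errors}\n"
        . "Return a corrected response that fixes every issue.",
)->get();
```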
Data Transformation
Apply transformations to extracted data after deserialization. Transformers run after validation, so they can normalize, enrich, or restructure the data before it reaches your application code. Like validators, transformers are registered on the StructuredOutputRuntime, not on the facade directly.
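A sketch of registering a transformer on the runtime — both the `runtime()` accessor and the `withTransformers()` hook are hypothetical names, so check your version's API:

```php
$runtime = StructuredOutput::runtime(); // hypothetical runtime accessor

$runtime->withTransformers(function (Person $person): Person {
    // Runs after validation: normalize before the object reaches app code
    $person->name = mb_convert_case($person->name, MB_CASE_TITLE);
    return $person;
});
```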
Output Modes
Different LLMs support different output modes. The output mode controls the mechanism used to extract structured data from the model's response. You can set the default mode in config/instructor.php or override it per-request via the runtime.
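An illustrative per-request override — the `runtime()` accessor, `withOutputMode()` method, and `OutputMode` enum cases are assumptions modeled on instructor-php-style APIs, so verify them against your version:

```php
// Per-request override via the runtime; common modes include
// tool calling, JSON mode, JSON schema, and markdown-wrapped JSON.
$person = StructuredOutput::runtime()
    ->withOutputMode(OutputMode::Json)
    ->with(messages: $input, responseModel: Person::class)
    ->get();
```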
Few-Shot Learning
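As a sketch of supplying examples with a request, assuming an `examples` request option and an instructor-php-style `Example` factory (the factory name and namespace are assumptions):

```php
use Cognesy\Instructor\Extras\Example\Example; // namespace may differ

$person = StructuredOutput::with(
    messages: 'Anna, 34, lives in Krakow.',
    responseModel: Person::class,
    examples: [
        // Each example pairs an input string with a populated output
        Example::fromData(
            input: 'John is 25 and lives in Boston.',
            output: ['name' => 'John', 'age' => 25, 'city' => 'Boston'],
        ),
    ],
)->get();
```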
Providing input/output examples significantly improves extraction quality, especially for ambiguous or domain-specific data. Each example pairs an input string with a fully populated response model instance.

System Prompts
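A sketch of setting domain context via a system prompt, assuming a `system` request option (the option name and `Invoice` model are illustrative):

```php
$invoice = StructuredOutput::with(
    messages: $emailBody,
    responseModel: Invoice::class,
    // Domain context steers extraction of specialized data
    system: 'You are an accounts-payable assistant. Extract invoice fields '
        . 'exactly as written; never infer missing amounts.',
)->get();
```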
System prompts set the overall behavior and domain context for the LLM. They are especially valuable when extracting specialized data.

Tool Descriptions
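A sketch of overriding the auto-generated tool name and description, assuming `toolName`/`toolDescription` request options (option names are assumptions; check your version):

```php
$person = StructuredOutput::with(
    messages: $bio,
    responseModel: Person::class,
    // A descriptive name and summary help the model understand the task
    toolName: 'extract_person',
    toolDescription: 'Extracts a single person record (name, age, city) from free-form text.',
)->get();
```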
Customize how the response model is described to the LLM in the tool/function calling interface. This is particularly useful when the auto-generated name or description is not descriptive enough for the model to understand the task.

Multiple Providers
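A sketch of routing requests per task, assuming a `using('<preset>')` provider switcher on the facade (the method and preset names are assumptions):

```php
// Fast, inexpensive hosted model for routine extraction
$ticket = StructuredOutput::using('openai')
    ->with(messages: $emailBody, responseModel: SupportTicket::class)
    ->get();

// Local model so privacy-sensitive data never leaves your infrastructure
$patient = StructuredOutput::using('ollama')
    ->with(messages: $clinicalNote, responseModel: PatientRecord::class)
    ->get();
```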
Switch between providers based on the task at hand. Different providers offer different trade-offs in speed, accuracy, cost, and privacy.

Cached Context (Prompt Caching)
For repeated extractions with the same system prompt, examples, or large context, use withCachedContext() to signal that the context should be cached by providers that support prompt caching (e.g., Anthropic, OpenAI). This can significantly reduce latency and cost for subsequent calls.
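A sketch of reusing cached context across calls — `withCachedContext()` is the documented entry point, but its parameter names and the chaining shown here are assumptions:

```php
// Mark the large, shared portion of the request as cacheable
$cached = StructuredOutput::withCachedContext(
    system: $longSystemPrompt,
    examples: $sharedExamples,
);

// Subsequent calls reuse the cached prefix on providers that support it,
// reducing latency and cost for every call after the first
$reportA = $cached->with(messages: $docA, responseModel: Report::class)->get();
$reportB = $cached->with(messages: $docB, responseModel: Report::class)->get();
```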