Output Formats
By default, InstructorPHP deserializes LLM responses into PHP objects based on your response model class. The OutputFormat API allows you to change this behavior while keeping the same schema definition.
Overview
The OutputFormat API decouples schema specification (what structure the LLM should produce) from output format (how you receive the result).
Available Output Formats
1. intoArray() - Raw Associative Arrays
Returns extracted data as a plain associative array instead of an object.
- Database storage - Direct array insertion without conversion
- JSON APIs - Return arrays for json_encode()
- Array manipulation - Easier to modify arrays than objects
- Debugging - Arrays are simpler to inspect with dump()
- Legacy integration - When existing code expects arrays
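The use cases above can be sketched without calling the library. This minimal example assumes `$person` holds the array that an intoArray() extraction would return; the values and the `people` table name are hypothetical, and the column names are assumed to come from a fixed schema rather than from untrusted input.

```php
<?php
declare(strict_types=1);

// Hypothetical result of an intoArray() extraction; the values are made up.
$person = ['name' => 'Jason', 'age' => 25];

// JSON APIs: the array can be encoded directly.
$json = json_encode($person, JSON_THROW_ON_ERROR);

// Database storage: build a parameterized INSERT from the array keys.
$columns = implode(', ', array_keys($person));
$placeholders = implode(', ', array_map(
    static fn(string $key): string => ':' . $key,
    array_keys($person)
));
$sql = "INSERT INTO people ($columns) VALUES ($placeholders)";

// Array manipulation: add a computed field without touching any class.
$person['is_adult'] = $person['age'] >= 18;

echo $json, PHP_EOL; // {"name":"Jason","age":25}
echo $sql, PHP_EOL;  // INSERT INTO people (name, age) VALUES (:name, :age)
```

The statement in `$sql` would then be passed to a prepared-statement API such as PDO together with the original array as bound parameters.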
2. intoInstanceOf() - Different Output Class
Uses one class for schema definition and a different class for the output object.
- Separate API contracts from internal models - Public schema vs internal DTO
- Simplify complex models - Extract rich data, return simple DTO
- Different validation rules - Schema validation vs output validation
- Decouple layers - Domain model for LLM, presentation model for API
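The idea of using one class for the schema and another for the result can be illustrated in plain PHP. Everything here is hypothetical: `UserProfile`, `UserDTO`, and the `hydrate()` helper are invented for this sketch and are not the library's deserializer, whose field-mapping rules are not described in this section.

```php
<?php
declare(strict_types=1);

// Schema class: describes what the LLM should produce (hypothetical).
final class UserProfile
{
    public string $fullName = '';
    public int $age = 0;
    public string $bio = '';
}

// Output class: the simpler DTO the application wants back (hypothetical).
final class UserDTO
{
    public string $fullName = '';
    public int $age = 0;
}

// Hypothetical helper: copies matching keys onto public properties of $class
// and ignores keys the target class does not declare.
function hydrate(string $class, array $data): object
{
    $object = new $class();
    foreach ($data as $key => $value) {
        if (property_exists($object, $key)) {
            $object->$key = $value;
        }
    }
    return $object;
}

// Data shaped by the richer schema class...
$extracted = ['fullName' => 'Ada Lovelace', 'age' => 36, 'bio' => 'Mathematician'];

// ...returned to the caller as the simpler DTO.
$dto = hydrate(UserDTO::class, $extracted);
echo $dto->fullName, PHP_EOL; // Ada Lovelace
```

In this sketch the `bio` field is extracted but never reaches the DTO, which is the kind of separation between public schema and internal model that the list above describes.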
3. intoObject() - Self-Deserializing Objects
Provides a custom object that controls its own deserialization from the extracted array.
- Scalar values - Extract single values wrapped in objects
- Custom deserialization - Full control over how data becomes objects
- Value objects - Domain-driven design value objects
- Complex transformations - When standard deserialization isn’t enough
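A self-deserializing value object might look like the following sketch (PHP 8.1+). The `SelfDeserializing` interface and its `fromArray()` method are assumptions made for illustration; the library's actual contract may use a different name and signature.

```php
<?php
declare(strict_types=1);

// Hypothetical contract for this sketch; not taken from the library.
interface SelfDeserializing
{
    public static function fromArray(array $data): static;
}

// A value object that wraps a single scalar and enforces its own invariant.
final class Rating implements SelfDeserializing
{
    private function __construct(public readonly int $value) {}

    public static function fromArray(array $data): static
    {
        // Custom deserialization: coerce the raw value, then check the range.
        $value = (int) ($data['value'] ?? 0);
        if ($value < 1 || $value > 5) {
            throw new InvalidArgumentException("Rating must be between 1 and 5, got {$value}");
        }
        return new static($value);
    }
}

$rating = Rating::fromArray(['value' => '4']);
echo $rating->value, PHP_EOL; // 4
```

Keeping the constructor private forces every instance through `fromArray()`, so the range check cannot be bypassed.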
Streaming with Output Formats
Output formats work seamlessly with streaming responses. Key behavior:
- During streaming: Partial updates are always objects (for validation and deduplication)
- Final result: Respects the output format you specified
- Objects during streaming enable validation and deduplication
- Array for final result provides convenience for your application
- Best of both worlds: safety during processing, flexibility for results
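The behavior described above can be simulated in plain PHP. This sketch does not call the library: the `Person` class and the `partials()` generator are hypothetical stand-ins for a streamed extraction, and the final conversion to an array stands in for what intoArray() is described as doing.

```php
<?php
declare(strict_types=1);

// Hypothetical response model; properties fill in as the stream progresses.
final class Person
{
    public ?string $name = null;
    public ?int $age = null;
}

/** Simulated stream: every partial update is an object. */
function partials(): Generator
{
    $person = new Person();
    $person->name = 'Jason';
    yield clone $person;
    $person->age = 25;
    yield clone $person;
}

$last = null;
foreach (partials() as $partial) {
    // Each partial is a Person object, so it can be inspected or validated.
    $last = $partial;
}

// Final result: converted to an array, as the chosen output format requests.
$final = get_object_vars($last);
print_r($final);
```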
Comparison Matrix
| Feature | Default (Object) | intoArray() | intoInstanceOf() | intoObject() |
|---|---|---|---|---|
| Output type | Schema class | Array | Target class | Custom object |
| Validation | ✅ Yes | ❌ Skipped | ✅ Yes | Custom |
| Transformation | ✅ Yes | ❌ Skipped | ✅ Yes | Custom |
| Use case | Standard | Database/API | DTOs/Decoupling | Value objects |
| Streaming partials | Object | Object | Object | Object |
| Streaming final | Object | Array | Target class | Custom object |
Common Patterns
Pattern 1: Conditional Deserialization
Inspect data before creating objects.
Pattern 2: Data Enrichment
Add computed fields to arrays.
Pattern 3: Multi-Layer Architecture
Separate schema from application layer.
Important Notes
Schema is Always Respected
The output format only changes how data is returned to you. The LLM always receives the full schema.
Validation Behavior
- intoArray(): Validation is skipped (arrays can’t be validated like objects)
- intoInstanceOf(): Validation runs on the target class
- intoObject(): Validation is custom (depends on implementation)
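Because validation is skipped for intoArray(), callers that need guarantees may want to check the array themselves. A minimal sketch, assuming a hypothetical result with `name` and `age` keys:

```php
<?php
declare(strict_types=1);

/**
 * Returns a list of problems found in the array.
 * An empty list means the array passed these checks.
 */
function validatePerson(array $data): array
{
    $errors = [];
    if (!isset($data['name']) || !is_string($data['name']) || $data['name'] === '') {
        $errors[] = 'name must be a non-empty string';
    }
    if (!isset($data['age']) || !is_int($data['age']) || $data['age'] < 0) {
        $errors[] = 'age must be a non-negative integer';
    }
    return $errors;
}

$ok = validatePerson(['name' => 'Jason', 'age' => 25]);
$bad = validatePerson(['name' => '', 'age' => -1]);

echo count($ok), ' ', count($bad), PHP_EOL; // 0 2
```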
Backward Compatibility
Default behavior is unchanged. Output formats are opt-in.
Examples
See working examples in:
- examples/A05_Extras/OutputFormatArray/run.php - Basic intoArray() usage
- examples/A05_Extras/OutputFormatInstanceOf/run.php - Different output class
- examples/A05_Extras/OutputFormatStreaming/run.php - Streaming with arrays
Pluggable Extraction
InstructorPHP uses a pluggable extraction pipeline to convert raw LLM responses into canonical arrays. You can customize this pipeline for special formats or implement custom extraction logic.
Default Content Extractors
The default ResponseExtractor uses an extractor chain (tried in order):
| Extractor | Description |
|---|---|
| DirectJsonExtractor | Parse content directly as JSON |
| ResilientJsonExtractor | Handle malformed JSON (trailing commas, etc.) |
| MarkdownBlockExtractor | Extract from ```json ``` blocks |
| BracketMatchingExtractor | Find first { to last } |
| SmartBraceExtractor | Handle escaped quotes in strings |
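The "tried in order" behavior can be sketched as a list of callables where the first non-null result wins. This is an illustration of the chain idea only, not the library's implementation: the function names are hypothetical, and only three simplified strategies (direct parse, Markdown block, bracket matching) are shown.

```php
<?php
declare(strict_types=1);

/** Decode JSON into an array, returning null instead of failing. */
function tryDecode(string $text): ?array
{
    $data = json_decode($text, true);
    return is_array($data) ? $data : null;
}

// Three backticks, built at runtime so this listing stays a valid code block.
$fence = str_repeat('`', 3);

// Extractors are tried in array order; each returns an array or null.
$extractors = [
    'direct' => static fn(string $content): ?array => tryDecode(trim($content)),
    'markdown' => static function (string $content) use ($fence): ?array {
        // Capture the body of the first fenced block, with or without a json tag.
        $pattern = '/' . $fence . '(?:json)?\s*(.*?)' . $fence . '/s';
        return preg_match($pattern, $content, $m) === 1 ? tryDecode(trim($m[1])) : null;
    },
    'brackets' => static function (string $content): ?array {
        // Take everything from the first { to the last }.
        $start = strpos($content, '{');
        $end = strrpos($content, '}');
        if ($start === false || $end === false || $end < $start) {
            return null;
        }
        return tryDecode(substr($content, $start, $end - $start + 1));
    },
];

/** Run the chain and return the first successful extraction, or null. */
function extractFirst(array $extractors, string $content): ?array
{
    foreach ($extractors as $extract) {
        $result = $extract($content);
        if ($result !== null) {
            return $result;
        }
    }
    return null;
}

$fenced = "Here you go:\n{$fence}json\n{\"name\":\"Jason\"}\n{$fence}";
$chatty = 'Sure! {"age": 25} Hope that helps.';

print_r(extractFirst($extractors, $fenced));
print_r(extractFirst($extractors, $chatty));
```

Ordering matters in such a chain: cheap, strict strategies run first, and more permissive ones only run when the stricter ones return null.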
Custom Extractors
Replace the default extractors with your own.
Custom Extractor Implementation
Create your own extractor for special formats.
Custom Response Extractor
For complete control over the extraction pipeline, implement CanExtractResponse.
When to Customize Extraction
Use withExtractors() when:
- You want to optimize for known response formats
- You need to add support for additional formats
- You want to change the extractor order
Custom extractors are automatically used for both sync and streaming.
Use withExtractor() when:
- You need completely custom extraction logic
- You’re integrating with a non-standard LLM response format
- You want to bypass the extractor chain entirely
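As an example of a non-standard response format, the sketch below handles output that wraps JSON in `<result>` tags. The `ContentExtractor` interface, its `extract()` signature, and the tag format are all assumptions made for this illustration; the contract the library expects from a custom extractor is not shown in this section and may differ.

```php
<?php
declare(strict_types=1);

// Hypothetical contract for this sketch; not the library's interface.
interface ContentExtractor
{
    /** Return the decoded array, or null if this extractor does not apply. */
    public function extract(string $content): ?array;
}

/** Handles responses that wrap JSON in <result>...</result> tags. */
final class TaggedJsonExtractor implements ContentExtractor
{
    public function extract(string $content): ?array
    {
        // Capture the text between the first pair of result tags.
        if (preg_match('~<result>(.*?)</result>~s', $content, $m) !== 1) {
            return null;
        }
        $data = json_decode(trim($m[1]), true);
        return is_array($data) ? $data : null;
    }
}

$extractor = new TaggedJsonExtractor();
$data = $extractor->extract('Answer: <result>{"city": "Paris"}</result>');
print_r($data);
```

Returning null when the format does not match lets an extractor like this sit in a chain alongside others without blocking them.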
Related Documentation
- Response Models - How schemas work
- Structures - Dynamic data models
- Validation - How validation works
- Streaming - Streaming responses