Output Formats

By default, InstructorPHP deserializes LLM responses into PHP objects based on your response model class. The OutputFormat API allows you to change this behavior while keeping the same schema definition.

Overview

The OutputFormat API decouples schema specification (what structure the LLM should produce) from output format (how you receive the result).

// Schema from User class, output as object (default)
$user = (new StructuredOutput)
    ->withResponseClass(User::class)
    ->get();
// Returns: User object

// Schema from User class, output as array
$data = (new StructuredOutput)
    ->withResponseClass(User::class)
    ->intoArray()
    ->get();
// Returns: ['name' => 'John', 'age' => 30]
// @doctest id="693c"

Available Output Formats

1. intoArray() - Raw Associative Arrays

Returns extracted data as a plain associative array instead of an object.

$data = (new StructuredOutput)
    ->withResponseClass(User::class)  // Schema definition
    ->intoArray()                      // Output format
    ->with(messages: 'Extract: John Doe, 30 years old')
    ->get();

// Result: ['name' => 'John Doe', 'age' => 30]
dump($data['name']);  // 'John Doe'
// @doctest id="7e69"

Use cases:

Database storage - Direct array insertion without conversion
JSON APIs - Return arrays for json_encode()
Array manipulation - Easier to modify arrays than objects
Debugging - Arrays are simpler to inspect with dump()
Legacy integration - When existing code expects arrays

Example:

class User {
    public function __construct(
        public string $name,
        public int $age,
        public string $email,
    ) {}
}

$userData = (new StructuredOutput)
    ->withResponseClass(User::class)
    ->intoArray()
    ->with(messages: 'Extract: Jane Smith, 25, jane@example.com')
    ->get();

// Store directly in database
DB::table('users')->insert($userData);

// Or return as JSON API response
return response()->json($userData);
// @doctest id="dc62"

2. intoInstanceOf() - Different Output Class

Uses one class for schema definition and a different class for the output object.

$dto = (new StructuredOutput)
    ->withResponseClass(UserProfile::class)  // Rich schema (5 fields)
    ->intoInstanceOf(UserDTO::class)         // Simple output (2 fields)
    ->with(messages: 'Extract user data')
    ->get();

// Result: UserDTO instance with subset of fields
// @doctest id="dfd4"

Use cases:

Separate API contracts from internal models - Public schema vs internal DTO
Simplify complex models - Extract rich data, return simple DTO
Different validation rules - Schema validation vs output validation
Decouple layers - Domain model for LLM, presentation model for API

Example:

// Rich schema sent to LLM (all user profile data)
class UserProfile {
    public string $fullName;
    public int $age;
    public string $email;
    public string $phoneNumber;
    public string $address;
}

// Simplified DTO for your application (only essential fields)
class UserDTO {
    public function __construct(
        public string $fullName = '',
        public string $email = '',
    ) {}
}

$user = (new StructuredOutput)
    ->withResponseClass(UserProfile::class)  // LLM sees all 5 fields
    ->intoInstanceOf(UserDTO::class)         // You get 2 fields
    ->with(
        messages: "Extract: John Smith, 30, john@example.com, 555-1234, 123 Main St"
    )
    ->get();

// $user is UserDTO with only fullName and email
echo $user->fullName;  // 'John Smith'
echo $user->email;     // 'john@example.com'
// phoneNumber and address were extracted but not included in output
// @doctest id="4b74"

3. intoObject() - Self-Deserializing Objects

Provides a custom object that controls its own deserialization from the extracted array.

$scalar = (new StructuredOutput)
    ->withResponseClass(Rating::class)
    ->intoObject(Scalar::integer('rating'))
    ->with(messages: 'Extract rating: 5 stars')
    ->get();

// Result: Scalar object with custom deserialization logic
// @doctest id="06ad"

Use cases:

Scalar values - Extract single values wrapped in objects
Custom deserialization - Full control over how data becomes objects
Value objects - Domain-driven design value objects
Complex transformations - When standard deserialization isn’t enough

Example with Scalar:

use Cognesy\Instructor\Extras\Scalar\Scalar;

// Extract a single integer value
$rating = (new StructuredOutput)
    ->withResponseClass(Rating::class)
    ->intoObject(Scalar::integer('rating'))
    ->with(messages: 'Extract rating from: "5 out of 5 stars"')
    ->get();

dump($rating);  // 5 (integer)

// Extract a single string value
$sentiment = (new StructuredOutput)
    ->withResponseClass(Sentiment::class)
    ->intoObject(Scalar::string('sentiment'))
    ->with(messages: 'Analyze sentiment: "This product is amazing!"')
    ->get();

dump($sentiment);  // 'positive' (string)
// @doctest id="d04a"

Custom self-deserializing object:

use Cognesy\Instructor\Deserialization\Contracts\CanDeserializeSelf;

class Money implements CanDeserializeSelf
{
    public function __construct(
        private int $amountInCents,
        private string $currency,
    ) {}

    public function fromArray(array $data, ?string $toolName = null): static {
        // Custom deserialization logic
        $amount = $data['amount'] ?? 0;
        $currency = $data['currency'] ?? 'USD';

        return new self(
            amountInCents: (int)($amount * 100),  // Convert to cents
            currency: strtoupper($currency),       // Normalize currency
        );
    }

    public function toArray(): array {
        return [
            'amount' => $this->amountInCents / 100,
            'currency' => $this->currency,
        ];
    }
}

$price = (new StructuredOutput)
    ->withResponseClass(Product::class)
    ->intoObject(new Money(0, 'USD'))
    ->with(messages: 'Extract price: $19.99 USD')
    ->get();

// $price is Money instance with custom deserialization
// @doctest id="4018"

Streaming with Output Formats

Output formats work seamlessly with streaming responses. Key behavior:

During streaming: Partial updates are always objects (for validation and deduplication)
Final result: Respects the output format you specified

$stream = (new StructuredOutput)
    ->withResponseClass(Article::class)
    ->intoArray()  // Final result will be array
    ->with(messages: 'Extract article data')
    ->stream();

// Iterate over partial objects
foreach ($stream->partials() as $partial) {
    // $partial is Article object during streaming
    echo "Progress: " . strlen($partial->content) . " characters\n";
}

// Get final result as array
$finalArticle = $stream->finalValue();
// Returns: ['title' => '...', 'author' => '...', 'content' => '...']
// @doctest id="1f29"

Why this design?

Objects during streaming enable validation and deduplication
Array for final result provides convenience for your application
Best of both worlds: safety during processing, flexibility for results

Comparison Matrix

Feature	Default (Object)	intoArray()	intoInstanceOf()	intoObject()
Output type	Schema class	Array	Target class	Custom object
Validation	✅ Yes	❌ Skipped	✅ Yes	Custom
Transformation	✅ Yes	❌ Skipped	✅ Yes	Custom
Use case	Standard	Database/API	DTOs/Decoupling	Value objects
Streaming partials	Object	Object	Object	Object
Streaming final	Object	Array	Target class	Custom object

Common Patterns

Pattern 1: Conditional Deserialization

Inspect data before creating objects:

$data = (new StructuredOutput)
    ->withResponseClass(User::class)
    ->intoArray()
    ->with(messages: 'Extract user')
    ->get();

// Choose class based on data
if ($data['age'] < 18) {
    $user = new MinorUser(...$data);
} else {
    $user = new AdultUser(...$data);
}
// @doctest id="6db1"

Pattern 2: Data Enrichment

Add computed fields to arrays:

$data = (new StructuredOutput)
    ->withResponseClass(Person::class)
    ->intoArray()
    ->with(messages: 'Extract person')
    ->get();

// Add computed field
$data['full_name'] = $data['first_name'] . ' ' . $data['last_name'];
$data['age_group'] = $data['age'] < 30 ? 'young' : 'senior';

// Then create object
$person = new Person(...$data);
// @doctest id="321b"

Pattern 3: Multi-Layer Architecture

Separate schema from application layer:

// Domain layer - rich schema for LLM
class OrderDomain {
    public string $orderId;
    public CustomerInfo $customer;
    public array $items;
    public PaymentDetails $payment;
    public ShippingInfo $shipping;
}

// Application layer - simplified DTO
class OrderDTO {
    public function __construct(
        public string $orderId,
        public string $customerName,
        public float $total,
    ) {}
}

$order = (new StructuredOutput)
    ->withResponseClass(OrderDomain::class)  // Rich domain model
    ->intoInstanceOf(OrderDTO::class)        // Simple application DTO
    ->with(messages: 'Extract order details')
    ->get();
// @doctest id="dea0"

Important Notes

Schema is Always Respected

The output format only changes how data is returned to you. The LLM always receives the full schema:

$data = (new StructuredOutput)
    ->withResponseClass(UserProfile::class)  // LLM sees all 5 fields
    ->intoArray()                             // You get array, but...
    ->get();

// Schema sent to LLM still includes all 5 fields from UserProfile
// Only the deserialization step is different
// @doctest id="3c40"

Validation Behavior

intoArray(): Validation is skipped (arrays can’t be validated like objects)
intoInstanceOf(): Validation runs on the target class
intoObject(): Validation is custom (depends on implementation)

Backward Compatibility

Default behavior is unchanged. Output formats are opt-in:

// still works
$user = (new StructuredOutput)
    ->withResponseClass(User::class)
    ->get();
// Returns: User object (default behavior)

// new capability
$data = (new StructuredOutput)
    ->withResponseClass(User::class)
    ->intoArray()  // ← New
    ->get();
// Returns: array
// @doctest id="d440"

Examples

See working examples in:

examples/A05_Extras/OutputFormatArray/run.php - Basic intoArray() usage
examples/A05_Extras/OutputFormatInstanceOf/run.php - Different output class
examples/A05_Extras/OutputFormatStreaming/run.php - Streaming with arrays

Pluggable Extraction

InstructorPHP uses a pluggable extraction pipeline to convert raw LLM responses into canonical arrays. You can customize this pipeline for special formats or implement custom extraction logic.

Default Content Extractors

The default ResponseExtractor uses an extractor chain (tried in order):

Extractor	Description
`DirectJsonExtractor`	Parse content directly as JSON
`ResilientJsonExtractor`	Handle malformed JSON (trailing commas, etc.)
`MarkdownBlockExtractor`	Extract from ```json ``` blocks
`BracketMatchingExtractor`	Find first `{` to last `}`
`SmartBraceExtractor`	Handle escaped quotes in strings

Custom Extractors

Replace the default extractors with your own:

use Cognesy\Instructor\Extraction\Extractors\DirectJsonExtractor;
use Cognesy\Instructor\Extraction\Extractors\MarkdownBlockExtractor;

$result = (new StructuredOutput)
    ->withExtractors(
        new DirectJsonExtractor(),       // Only these extractors
        new MarkdownBlockExtractor(),
    )
    ->withResponseClass(User::class)
    ->with(messages: 'Extract user')
    ->get();
// @doctest id="4bd3"

Custom Extractor Implementation

Create your own extractor for special formats:

use Cognesy\Instructor\Extraction\Contracts\CanExtractResponse;
use Cognesy\Instructor\Extraction\Data\ExtractionInput;
use Cognesy\Instructor\Extraction\Exceptions\ExtractionException;

class XmlJsonExtractor implements CanExtractResponse
{
    public function extract(ExtractionInput $input): array
    {
        // Extract JSON from <data>...</data> tags
        if (!preg_match('/<data>(.*?)<\/data>/s', $input->content, $matches)) {
            throw new ExtractionException('No data tags found');
        }

        $json = trim($matches[1]);
        try {
            $decoded = json_decode($json, associative: true, flags: JSON_THROW_ON_ERROR);
        } catch (\JsonException $e) {
            throw new ExtractionException('Invalid JSON in data tags', $e);
        }

        if (!is_array($decoded)) {
            throw new ExtractionException('Expected object or array in data tags');
        }

        return $decoded;
    }

    public function name(): string
    {
        return 'xml_json';
    }
}

// Use custom extractor
$result = (new StructuredOutput)
    ->withExtractors(
        new XmlJsonExtractor(),
        new DirectJsonExtractor(), // Fallback
    )
    ->withResponseClass(User::class)
    ->with(messages: 'Extract user')
    ->get();
// @doctest id="e9b4"

Single Custom Extractor

For complete control over the extraction pipeline, pass a single custom extractor to withExtractors():

use Cognesy\Instructor\Extraction\Contracts\CanExtractResponse;
use Cognesy\Instructor\Extraction\Data\ExtractionInput;
use Cognesy\Instructor\Extraction\Exceptions\ExtractionException;

class CustomExtractor implements CanExtractResponse
{
    public function extract(ExtractionInput $input): array
    {
        $content = $input->content;

        // Custom extraction logic
        $data = $this->parseCustomFormat($content);

        return $data;
    }

    private function parseCustomFormat(string $content): array
    {
        if ($content === '') {
            throw new ExtractionException('Empty content');
        }

        // Your custom parsing logic
        return ['parsed' => 'data'];
    }
}

// Use one custom extractor
$result = (new StructuredOutput)
    ->withExtractors(new CustomExtractor())
    ->withResponseClass(User::class)
    ->with(messages: 'Extract user')
    ->get();
// @doctest id="91c7"

When to Customize Extraction

Use withExtractors() when:

You want to optimize for known response formats
You need to add support for additional formats
You want to change the extractor order
Custom extractors are automatically used for both sync and streaming

Response Models - How schemas work
Structures - Dynamic data models
Validation - How validation works
Streaming - Streaming responses

Packages

Instructor

Polyglot

Agents

HTTP Client

Laravel

Output Formats

Output Formats

Overview

Available Output Formats

1. intoArray() - Raw Associative Arrays

2. intoInstanceOf() - Different Output Class

3. intoObject() - Self-Deserializing Objects

Streaming with Output Formats

Comparison Matrix

Common Patterns

Pattern 1: Conditional Deserialization

Pattern 2: Data Enrichment

Pattern 3: Multi-Layer Architecture

Important Notes

Schema is Always Respected

Validation Behavior

Backward Compatibility

Examples

Pluggable Extraction

Default Content Extractors

Custom Extractors

Custom Extractor Implementation

Single Custom Extractor

When to Customize Extraction

Packages

Instructor

Polyglot

Agents

HTTP Client

Laravel

​Output Formats

​Overview

​Available Output Formats

​1. intoArray() - Raw Associative Arrays

​2. intoInstanceOf() - Different Output Class

​3. intoObject() - Self-Deserializing Objects

​Streaming with Output Formats

​Comparison Matrix

​Common Patterns

​Pattern 1: Conditional Deserialization

​Pattern 2: Data Enrichment

​Pattern 3: Multi-Layer Architecture

​Important Notes

​Schema is Always Respected

​Validation Behavior

​Backward Compatibility

​Examples

​Pluggable Extraction

​Default Content Extractors

​Custom Extractors

​Custom Extractor Implementation

​Single Custom Extractor

​When to Customize Extraction

​Related Documentation

Output Formats

Overview

Available Output Formats

1. intoArray() - Raw Associative Arrays

2. intoInstanceOf() - Different Output Class

3. intoObject() - Self-Deserializing Objects

Streaming with Output Formats

Comparison Matrix

Common Patterns

Pattern 1: Conditional Deserialization

Pattern 2: Data Enrichment

Pattern 3: Multi-Layer Architecture

Important Notes

Schema is Always Respected

Validation Behavior

Backward Compatibility

Examples

Pluggable Extraction

Default Content Extractors

Custom Extractors

Custom Extractor Implementation

Single Custom Extractor

When to Customize Extraction

Related Documentation