Output Formats

By default, InstructorPHP deserializes LLM responses into PHP objects based on your response model class. The OutputFormat API allows you to change this behavior while keeping the same schema definition.

Overview

The OutputFormat API decouples schema specification (what structure the LLM should produce) from output format (how you receive the result).
// Schema from User class, output as object (default)
$user = (new StructuredOutput)
    ->withResponseClass(User::class)
    ->get();
// Returns: User object

// Schema from User class, output as array
$data = (new StructuredOutput)
    ->withResponseClass(User::class)
    ->intoArray()
    ->get();
// Returns: ['name' => 'John', 'age' => 30]

Available Output Formats

1. intoArray() - Raw Associative Arrays

Returns extracted data as a plain associative array instead of an object.
$data = (new StructuredOutput)
    ->withResponseClass(User::class)  // Schema definition
    ->intoArray()                      // Output format
    ->with(messages: 'Extract: John Doe, 30 years old')
    ->get();

// Result: ['name' => 'John Doe', 'age' => 30]
dump($data['name']);  // 'John Doe'
Use cases:
  • Database storage - Direct array insertion without conversion
  • JSON APIs - Return arrays for json_encode()
  • Array manipulation - Easier to modify arrays than objects
  • Debugging - Arrays are simpler to inspect with dump()
  • Legacy integration - When existing code expects arrays
Example:
class User {
    public function __construct(
        public string $name,
        public int $age,
        public string $email,
    ) {}
}

$userData = (new StructuredOutput)
    ->withResponseClass(User::class)
    ->intoArray()
    ->with(messages: 'Extract: Jane Smith, 25, jane.smith@example.com')
    ->get();

// Store directly in database
DB::table('users')->insert($userData);

// Or return as JSON API response
return response()->json($userData);
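Because the result is a plain associative array, it can be passed straight to json_encode(). A minimal sketch, assuming a hand-written array stands in for the value returned by get() (no LLM call is made, and the values are illustrative):

```php
<?php
declare(strict_types=1);

// Stand-in for the array that ->intoArray()->get() would return.
// The values are illustrative, not real model output.
$userData = ['name' => 'Jane Smith', 'age' => 25];

// JSON_THROW_ON_ERROR turns encoding failures into exceptions
// instead of a silent `false` return value.
$json = json_encode($userData, JSON_THROW_ON_ERROR);

echo $json, PHP_EOL; // {"name":"Jane Smith","age":25}
```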

2. intoInstanceOf() - Different Output Class

Uses one class for schema definition and a different class for the output object.
$dto = (new StructuredOutput)
    ->withResponseClass(UserProfile::class)  // Rich schema (5 fields)
    ->intoInstanceOf(UserDTO::class)         // Simple output (2 fields)
    ->with(messages: 'Extract user data')
    ->get();

// Result: UserDTO instance with subset of fields
Use cases:
  • Separate API contracts from internal models - Public schema vs internal DTO
  • Simplify complex models - Extract rich data, return simple DTO
  • Different validation rules - Schema validation vs output validation
  • Decouple layers - Domain model for LLM, presentation model for API
Example:
// Rich schema sent to LLM (all user profile data)
class UserProfile {
    public string $fullName;
    public int $age;
    public string $email;
    public string $phoneNumber;
    public string $address;
}

// Simplified DTO for your application (only essential fields)
class UserDTO {
    public function __construct(
        public string $fullName = '',
        public string $email = '',
    ) {}
}

$user = (new StructuredOutput)
    ->withResponseClass(UserProfile::class)  // LLM sees all 5 fields
    ->intoInstanceOf(UserDTO::class)         // You get 2 fields
    ->with(
        messages: "Extract: John Smith, 30, john.smith@example.com, 555-1234, 123 Main St"
    )
    ->get();

// $user is UserDTO with only fullName and email
echo $user->fullName;  // 'John Smith'
echo $user->email;     // 'john.smith@example.com'
// phoneNumber and address were extracted but not included in output
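The field-dropping step can be pictured in plain PHP. The sketch below is not InstructorPHP's implementation: hydrateSubset() is a hypothetical helper that keeps only the array keys matching the target constructor's parameter names, assuming the extracted data is available as an associative array.

```php
<?php
declare(strict_types=1);

// Simplified DTO, mirroring the UserDTO shown above.
final class UserDTO
{
    public function __construct(
        public string $fullName = '',
        public string $email = '',
    ) {}
}

/**
 * Hypothetical helper (not part of InstructorPHP): build $class from the
 * subset of $data whose keys match the constructor's parameter names.
 */
function hydrateSubset(string $class, array $data): object
{
    $constructor = (new ReflectionClass($class))->getConstructor();
    $args = [];
    foreach ($constructor?->getParameters() ?? [] as $param) {
        if (array_key_exists($param->getName(), $data)) {
            $args[$param->getName()] = $data[$param->getName()];
        }
    }
    // String keys are unpacked as named arguments (PHP 8.1+).
    return new $class(...$args);
}

// Illustrative stand-in for the five fields extracted via UserProfile.
$extracted = [
    'fullName'    => 'John Smith',
    'age'         => 30,
    'email'       => 'john.smith@example.com',
    'phoneNumber' => '555-1234',
    'address'     => '123 Main St',
];

$dto = hydrateSubset(UserDTO::class, $extracted);
echo $dto->fullName, PHP_EOL; // John Smith
```

Filtering by parameter name keeps the DTO in charge of what it accepts: adding a field to the DTO constructor is enough to have it carried over.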

3. intoObject() - Self-Deserializing Objects

Provides a custom object that controls its own deserialization from the extracted array.
$scalar = (new StructuredOutput)
    ->withResponseClass(Rating::class)
    ->intoObject(new Scalar('rating', 'integer'))
    ->with(messages: 'Extract rating: 5 stars')
    ->get();

// Result: Scalar object with custom deserialization logic
Use cases:
  • Scalar values - Extract single values wrapped in objects
  • Custom deserialization - Full control over how data becomes objects
  • Value objects - Domain-driven design value objects
  • Complex transformations - When standard deserialization isn’t enough
Example with Scalar:
use Cognesy\Instructor\Extras\Scalar\Scalar;

// Extract a single integer value
$rating = (new StructuredOutput)
    ->withResponseClass(Rating::class)
    ->intoObject(new Scalar('rating', 'integer'))
    ->with(messages: 'Extract rating from: "5 out of 5 stars"')
    ->get();

dump($rating);  // 5 (integer)

// Extract a single string value
$sentiment = (new StructuredOutput)
    ->withResponseClass(Sentiment::class)
    ->intoObject(new Scalar('sentiment', 'string'))
    ->with(messages: 'Analyze sentiment: "This product is amazing!"')
    ->get();

dump($sentiment);  // 'positive' (string)
Custom self-deserializing object:
use Cognesy\Instructor\Deserialization\Contracts\CanDeserializeSelfFromArray;

class Money implements CanDeserializeSelfFromArray
{
    public function __construct(
        private int $amountInCents,
        private string $currency,
    ) {}

    public static function fromArray(array $data): self {
        // Custom deserialization logic
        $amount = $data['amount'] ?? 0;
        $currency = $data['currency'] ?? 'USD';

        return new self(
            amountInCents: (int) round($amount * 100),  // Convert to cents; round() avoids float truncation
            currency: strtoupper($currency),       // Normalize currency
        );
    }

    public function toArray(): array {
        return [
            'amount' => $this->amountInCents / 100,
            'currency' => $this->currency,
        ];
    }
}

$price = (new StructuredOutput)
    ->withResponseClass(Product::class)
    ->intoObject(new Money(0, 'USD'))
    ->with(messages: 'Extract price: $19.99 USD')
    ->get();

// $price is Money instance with custom deserialization
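Any fromArray() that converts a decimal amount to cents should take care with the float-to-integer step. Binary floating point cannot represent most decimal fractions exactly, so truncating with an (int) cast can lose a cent, while rounding first does not. A small standalone check:

```php
<?php
declare(strict_types=1);

$amount = 19.99;

// 19.99 * 100 evaluates to 1998.9999999999998 as an IEEE 754 double,
// so a bare (int) cast truncates to 1998.
$truncated = (int) ($amount * 100);

// Rounding to the nearest integer first gives the intended 1999.
$rounded = (int) round($amount * 100);

var_dump($truncated); // int(1998)
var_dump($rounded);   // int(1999)
```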

Streaming with Output Formats

Output formats work seamlessly with streaming responses. Key behavior:
  • During streaming: Partial updates are always objects (for validation and deduplication)
  • Final result: Respects the output format you specified
$stream = (new StructuredOutput)
    ->withResponseClass(Article::class)
    ->intoArray()  // Final result will be array
    ->with(messages: 'Extract article data')
    ->stream();

// Iterate over partial objects
foreach ($stream->partials() as $partial) {
    // $partial is Article object during streaming
    // content may not be populated yet in early partials, so default to ''
    echo "Progress: " . strlen($partial->content ?? '') . " characters\n";
}

// Get final result as array
$finalArticle = $stream->finalValue();
// Returns: ['title' => '...', 'author' => '...', 'content' => '...']
Why this design?
  • Partial updates are objects so they can be validated and deduplicated as they arrive
  • The final result is returned in the output format you requested (an array in this example)
  • Best of both worlds: safety during processing, flexibility for results
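The split between partial objects and a formatted final value can be mimicked without any LLM call. In the sketch below, the Article class, fakePartials(), and the final get_object_vars() conversion are illustrative assumptions, not InstructorPHP internals.

```php
<?php
declare(strict_types=1);

// Illustrative response class with defaults so partials can be incomplete.
final class Article
{
    public function __construct(
        public string $title = '',
        public string $content = '',
    ) {}
}

/** Hypothetical generator standing in for a stream of partial updates. */
function fakePartials(): Generator
{
    yield new Article(title: 'Hello');
    yield new Article(title: 'Hello', content: 'First ');
    yield new Article(title: 'Hello', content: 'First draft');
}

$last = null;
foreach (fakePartials() as $partial) {
    // Each partial is an object, so typed properties are available.
    echo 'Progress: ', strlen($partial->content), " characters\n";
    $last = $partial;
}

// Only the final value is converted to the requested output format (array).
$final = get_object_vars($last);
```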

Comparison Matrix

| Feature | Default (Object) | intoArray() | intoInstanceOf() | intoObject() |
|---|---|---|---|---|
| Output type | Schema class | Array | Target class | Custom object |
| Validation | ✅ Yes | ❌ Skipped | ✅ Yes | Custom |
| Transformation | ✅ Yes | ❌ Skipped | ✅ Yes | Custom |
| Use case | Standard | Database/API | DTOs/Decoupling | Value objects |
| Streaming partials | Object | Object | Object | Object |
| Streaming final | Object | Array | Target class | Custom object |

Common Patterns

Pattern 1: Conditional Deserialization

Inspect data before creating objects:
$data = (new StructuredOutput)
    ->withResponseClass(User::class)
    ->intoArray()
    ->with(messages: 'Extract user')
    ->get();

// Choose class based on data
if ($data['age'] < 18) {
    $user = new MinorUser(...$data);
} else {
    $user = new AdultUser(...$data);
}
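Spreading an associative array into a constructor relies on PHP 8.1+ treating string keys as named arguments, so the array keys must match the constructor parameter names exactly. A self-contained sketch, with MinorUser, AdultUser, and userFromArray() as hypothetical definitions:

```php
<?php
declare(strict_types=1);

// Hypothetical classes; constructor parameters mirror the array keys.
final class MinorUser
{
    public function __construct(public string $name, public int $age) {}
}

final class AdultUser
{
    public function __construct(public string $name, public int $age) {}
}

/** Pick the concrete class from the extracted data. */
function userFromArray(array $data): MinorUser|AdultUser
{
    return $data['age'] < 18
        ? new MinorUser(...$data)
        : new AdultUser(...$data);
}

// Stand-in for the array returned by ->intoArray()->get().
$user = userFromArray(['name' => 'Sam', 'age' => 16]);
echo $user::class, PHP_EOL; // MinorUser
```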

Pattern 2: Data Enrichment

Add computed fields to arrays:
$data = (new StructuredOutput)
    ->withResponseClass(Person::class)
    ->intoArray()
    ->with(messages: 'Extract person')
    ->get();

// Add computed field
$data['full_name'] = $data['first_name'] . ' ' . $data['last_name'];
$data['age_group'] = $data['age'] < 30 ? 'young' : 'senior';

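// Then create the object. The keys are unpacked as named arguments,
// so the constructor must declare every key, including the computed ones.
$person = new Person(...$data);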
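Named-argument unpacking is strict: a key with no matching constructor parameter raises an Error. If the target class does not declare the computed fields, one option is to filter the array before unpacking. The class and field values below are illustrative.

```php
<?php
declare(strict_types=1);

// Illustrative target class that only knows the original fields.
final class Person
{
    public function __construct(
        public string $first_name,
        public string $last_name,
        public int $age,
    ) {}
}

$data = ['first_name' => 'Ada', 'last_name' => 'Lovelace', 'age' => 36];
$data['full_name'] = $data['first_name'] . ' ' . $data['last_name'];

// Unpacking the enriched array fails: Person has no $full_name parameter.
$failed = false;
try {
    new Person(...$data);
} catch (Error $e) {
    $failed = true;
}

// Keep only the keys the constructor accepts, then unpack.
$allowed = array_flip(['first_name', 'last_name', 'age']);
$person = new Person(...array_intersect_key($data, $allowed));

echo $person->first_name, ' ', $person->last_name, PHP_EOL; // Ada Lovelace
```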

Pattern 3: Multi-Layer Architecture

Separate schema from application layer:
// Domain layer - rich schema for LLM
class OrderDomain {
    public string $orderId;
    public CustomerInfo $customer;
    public array $items;
    public PaymentDetails $payment;
    public ShippingInfo $shipping;
}

// Application layer - simplified DTO
class OrderDTO {
    public function __construct(
        public string $orderId,
        public string $customerName,
        public float $total,
    ) {}
}

$order = (new StructuredOutput)
    ->withResponseClass(OrderDomain::class)  // Rich domain model
    ->intoInstanceOf(OrderDTO::class)        // Simple application DTO
    ->with(messages: 'Extract order details')
    ->get();

Important Notes

Schema is Always Respected

The output format only changes how data is returned to you. The LLM always receives the full schema:
$data = (new StructuredOutput)
    ->withResponseClass(UserProfile::class)  // LLM sees all 5 fields
    ->intoArray()                             // You get array, but...
    ->get();

// Schema sent to LLM still includes all 5 fields from UserProfile
// Only the deserialization step is different

Validation Behavior

  • intoArray(): Validation is skipped (arrays can’t be validated like objects)
  • intoInstanceOf(): Validation runs on the target class
  • intoObject(): Validation is custom (depends on implementation)
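Since intoArray() skips validation, any checks you rely on have to run in your own code. A minimal sketch, where validateUserArray() and its rules are hypothetical and stand in for whatever constraints your model would normally enforce:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical validator: returns a list of error messages (empty = valid).
 */
function validateUserArray(array $data): array
{
    $errors = [];
    if (!isset($data['name']) || !is_string($data['name']) || trim($data['name']) === '') {
        $errors[] = 'name must be a non-empty string';
    }
    if (!isset($data['age']) || !is_int($data['age']) || $data['age'] < 0) {
        $errors[] = 'age must be a non-negative integer';
    }
    return $errors;
}

// Stand-ins for arrays returned by ->intoArray()->get().
$ok  = validateUserArray(['name' => 'John Doe', 'age' => 30]);
$bad = validateUserArray(['name' => '', 'age' => '30']);

echo count($ok), ' / ', count($bad), PHP_EOL; // 0 / 2
```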

Backward Compatibility

Default behavior is unchanged. Output formats are opt-in:
// still works
$user = (new StructuredOutput)
    ->withResponseClass(User::class)
    ->get();
// Returns: User object (default behavior)

// new capability
$data = (new StructuredOutput)
    ->withResponseClass(User::class)
    ->intoArray()  // ← New
    ->get();
// Returns: array

Examples

See working examples in:
  • examples/A05_Extras/OutputFormatArray/run.php - Basic intoArray() usage
  • examples/A05_Extras/OutputFormatInstanceOf/run.php - Different output class
  • examples/A05_Extras/OutputFormatStreaming/run.php - Streaming with arrays

Pluggable Extraction

InstructorPHP uses a pluggable extraction pipeline to convert raw LLM responses into canonical arrays. You can customize this pipeline for special formats or implement custom extraction logic.

Default Content Extractors

The default ResponseExtractor uses an extractor chain (tried in order):
| Extractor | Description |
|---|---|
| DirectJsonExtractor | Parse content directly as JSON |
| ResilientJsonExtractor | Handle malformed JSON (trailing commas, etc.) |
| MarkdownBlockExtractor | Extract from ```json ``` blocks |
| BracketMatchingExtractor | Find first { to last } |
| SmartBraceExtractor | Handle escaped quotes in strings |
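The "tried in order" behavior amounts to a first-success chain. The following sketch illustrates the idea with two simplified strategies written as closures. It is not the library's ResponseExtractor, and extractFirst() is a hypothetical name.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical chain runner: try each strategy in order and return the
 * first decoded array, or null if every strategy fails.
 *
 * @param array<string, callable(string): ?array> $strategies
 */
function extractFirst(array $strategies, string $content): ?array
{
    foreach ($strategies as $strategy) {
        $result = $strategy($content);
        if ($result !== null) {
            return $result;
        }
    }
    return null;
}

// Decode a JSON object or array, returning null on any failure.
$decode = static function (string $json): ?array {
    $data = json_decode($json, true);
    return is_array($data) ? $data : null;
};

// Three backticks, built at runtime to keep this listing easy to embed.
$fence = str_repeat('`', 3);

$strategies = [
    // 1. Treat the whole content as JSON.
    'direct' => $decode,
    // 2. Look for a fenced json block and decode its body.
    'markdown' => static function (string $content) use ($decode, $fence): ?array {
        $pattern = '/' . $fence . 'json\s*(.*?)' . $fence . '/s';
        return preg_match($pattern, $content, $m) === 1 ? $decode($m[1]) : null;
    },
];

$content = "Here you go:\n{$fence}json\n{\"name\": \"John\", \"age\": 30}\n{$fence}";
$data = extractFirst($strategies, $content);
```

Ordering matters in a chain like this: cheap, strict strategies go first, and more permissive ones act as fallbacks.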

Custom Extractors

Replace the default extractors with your own:
use Cognesy\Instructor\Extraction\Extractors\DirectJsonExtractor;
use Cognesy\Instructor\Extraction\Extractors\MarkdownBlockExtractor;

$result = (new StructuredOutput)
    ->withExtractors(
        new DirectJsonExtractor(),       // Only these extractors
        new MarkdownBlockExtractor(),
    )
    ->withResponseClass(User::class)
    ->with(messages: 'Extract user')
    ->get();

Custom Extractor Implementation

Create your own extractor for special formats:
use Cognesy\Instructor\Extraction\Contracts\CanExtractContent;
use Cognesy\Instructor\Extraction\Extractors\DirectJsonExtractor;
use Cognesy\Utils\Result\Result;

class XmlJsonExtractor implements CanExtractContent
{
    public function extract(string $content): Result
    {
        // Extract JSON from <data>...</data> tags
        if (preg_match('/<data>(.*?)<\/data>/s', $content, $matches)) {
            return Result::success($matches[1]);
        }
        return Result::failure('No data tags found');
    }

    public function name(): string
    {
        return 'xml_json';
    }
}

// Use custom extractor
$result = (new StructuredOutput)
    ->withExtractors(
        new XmlJsonExtractor(),
        new DirectJsonExtractor(), // Fallback
    )
    ->withResponseClass(User::class)
    ->with(messages: 'Extract user')
    ->get();

Custom Response Extractor

For complete control over the extraction pipeline, implement CanExtractResponse:
use Cognesy\Instructor\Extraction\Contracts\CanExtractResponse;
use Cognesy\Polyglot\Inference\Data\InferenceResponse;
use Cognesy\Polyglot\Inference\Enums\OutputMode;
use Cognesy\Utils\Result\Result;

class CustomExtractor implements CanExtractResponse
{
    public function extract(InferenceResponse $response, OutputMode $mode): Result
    {
        $content = $response->content();

        // Custom extraction logic
        $data = $this->parseCustomFormat($content);

        return Result::success($data);
    }

    private function parseCustomFormat(string $content): array
    {
        // Your custom parsing logic
        return ['parsed' => 'data'];
    }
}

// Use custom extractor
$result = (new StructuredOutput)
    ->withExtractor(new CustomExtractor())
    ->withResponseClass(User::class)
    ->with(messages: 'Extract user')
    ->get();

When to Customize Extraction

Use withExtractors() when:
  • You want to optimize for known response formats
  • You need to add support for additional formats
  • You want to change the extractor order
Extractors configured this way are used automatically for both sync and streaming responses.
Use withExtractor() when:
  • You need completely custom extraction logic
  • You’re integrating with a non-standard LLM response format
  • You want to bypass the extractor chain entirely