LLMs are powerful, but their outputs are unpredictable. Instructor solves this.

The Problem

You’ve integrated an LLM into your PHP application. Now what?
<?php
// Typical LLM integration without Instructor
$response = $openai->chat([
    'messages' => [['role' => 'user', 'content' => 'Extract the person name and age from: "John is 25"']]
]);

$text = $response['choices'][0]['message']['content'];
// $text = "The person's name is John and they are 25 years old."
// or "Name: John, Age: 25"
// or "{ name: 'John', age: 25 }"
// or something else entirely...

// Now you need to:
// 1. Parse this somehow
// 2. Handle all possible formats
// 3. Validate the data
// 4. Handle errors
// 5. Retry on failure
// 6. Hope it works
The result? Fragile code, inconsistent data, and endless edge cases.

The Solution

Instructor gives you structured, validated, type-safe outputs:
<?php
use Cognesy\Instructor\StructuredOutput;

class Person {
    public string $name;
    public int $age;
}

$person = (new StructuredOutput)
    ->withResponseClass(Person::class)
    ->withMessages('John is 25')
    ->get();

// Always a Person object
// Always with string $name
// Always with int $age
// Validated automatically
// Retries on failure

How It Works

Instructor uses a three-step process:
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Define    │ ──▶ │   Extract   │ ──▶ │   Validate  │
│  PHP Class  │     │   via LLM   │     │  & Return   │
└─────────────┘     └─────────────┘     └─────────────┘
  1. Define - You create a PHP class with typed properties
  2. Extract - Instructor sends your schema to the LLM with optimized prompts
  3. Validate - Results are validated; failures trigger automatic retry with feedback

Key Benefits

1. Type Safety

Your IDE understands the response. Autocomplete works. Static analysis catches errors.
<?php
$person = (new StructuredOutput)->withResponseClass(Person::class)->get();

// IDE knows $person->name is a string
// IDE knows $person->age is an int
// Typos like $person->naem are caught immediately

2. Automatic Validation

Use Symfony Validator constraints. Invalid responses trigger automatic retry:
<?php
use Symfony\Component\Validator\Constraints as Assert;

class Person {
    #[Assert\NotBlank]
    #[Assert\Length(min: 2, max: 100)]
    public string $name;

    #[Assert\Range(min: 0, max: 150)]
    public int $age;
}

// If LLM returns age: -5, Instructor:
// 1. Detects validation failure
// 2. Sends error feedback to LLM
// 3. Requests corrected response
// 4. Repeats until valid or max retries
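
The four steps listed above can be sketched as a plain PHP loop. This is an illustration under stated assumptions, not Instructor's code: `validatePerson()`, `extractWithRetries()` and `$fakeLlm` are hypothetical, the model is replaced by a stub closure, and `$maxRetries` is assumed to count retries after the first attempt.

```php
<?php
// Hypothetical sketch of a validate-and-retry loop. $fakeLlm stands in for a real LLM call.

function validatePerson(array $data): array {
    $errors = [];
    if (!isset($data['name']) || !is_string($data['name']) || trim($data['name']) === '') {
        $errors[] = 'name must be a non-empty string';
    }
    if (!isset($data['age']) || !is_int($data['age']) || $data['age'] < 0 || $data['age'] > 150) {
        $errors[] = 'age must be an integer between 0 and 150';
    }
    return $errors;
}

function extractWithRetries(callable $llm, string $prompt, int $maxRetries): array {
    $feedback = '';
    for ($attempt = 0; $attempt <= $maxRetries; $attempt++) {
        $data = json_decode($llm($prompt . $feedback), true);
        $errors = is_array($data) ? validatePerson($data) : ['response was not valid JSON'];
        if ($errors === []) {
            return $data;
        }
        // Feed the validation errors back so the next attempt can correct them.
        $feedback = "\nValidation failed: " . implode('; ', $errors) . '. Please correct and try again.';
    }
    throw new RuntimeException('No valid response after ' . ($maxRetries + 1) . ' attempts');
}

// Stub model: returns a bad age first, then a corrected answer once it sees feedback.
$fakeLlm = fn(string $prompt): string => str_contains($prompt, 'Validation failed')
    ? '{"name": "John", "age": 25}'
    : '{"name": "John", "age": -5}';

$person = extractWithRetries($fakeLlm, 'Extract name and age from: "John is 25"', 3);
```

The key design choice is that the error text becomes part of the next prompt, so a retry is not just a repeat of the same request.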

3. Self-Correcting Retries

LLMs make mistakes. Instructor handles this gracefully:
<?php
$person = (new StructuredOutput)
    ->withResponseClass(Person::class)
    ->withMessages($text)
    ->withMaxRetries(3)  // Try up to 3 times
    ->get();
On validation failure, Instructor tells the LLM exactly what went wrong:
"Validation failed: age must be greater than 0. Please correct and try again."

4. Provider Independence

Write once, run anywhere. Switch LLM providers without changing your code:
<?php
use Cognesy\Instructor\StructuredOutput;

// Development: Use local Ollama
$result = StructuredOutput::using('ollama')->withResponseClass(Task::class)->get();

// Staging: Use Groq for speed
$result = StructuredOutput::using('groq')->withResponseClass(Task::class)->get();

// Production: Use OpenAI for quality
$result = StructuredOutput::using('openai')->withResponseClass(Task::class)->get();

5. Multiple Output Modes

Works with any model capability:
| Mode | Best For | How It Works |
|------|----------|--------------|
| Tools | OpenAI, Claude | Uses function/tool calling |
| JsonSchema | GPT-4, newer models | Strict JSON Schema mode |
| Json | Most models | JSON response format |
| MdJson | Any model | Prompting-based extraction |
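
As a rough illustration of what prompting-based extraction involves, the sketch below pulls a JSON payload out of a free-form Markdown reply. `extractJsonFromMarkdown()` is a hypothetical helper written for this example, not part of Instructor, and real replies may need more careful handling.

```php
<?php
// Hypothetical sketch of prompting-based extraction: find the JSON payload in a free-form reply.

function extractJsonFromMarkdown(string $reply): ?array {
    $fence = str_repeat('`', 3); // Markdown code fence marker
    // Prefer a fenced json block; otherwise fall back to the outermost {...} span.
    if (preg_match('/' . $fence . '(?:json)?\s*(\{.*?\})\s*' . $fence . '/s', $reply, $m)) {
        $candidate = $m[1];
    } elseif (preg_match('/\{.*\}/s', $reply, $m)) {
        $candidate = $m[0];
    } else {
        return null;
    }
    $data = json_decode($candidate, true);
    return is_array($data) ? $data : null;
}

$fence = str_repeat('`', 3);
$reply = "Sure! Here is the data:\n{$fence}json\n{\"name\": \"John\", \"age\": 25}\n{$fence}\nLet me know if you need more.";
$data = extractJsonFromMarkdown($reply);
```

Because nothing forces the model to comply, a `null` result here is exactly the kind of failure that a validation and retry step has to absorb.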

6. Streaming Support

Get partial results as they arrive:
<?php
use Cognesy\Instructor\StructuredOutput;

$stream = (new StructuredOutput)
    ->withResponseClass(Person::class)
    ->with(messages: $text, options: ['stream' => true])
    ->stream();

foreach ($stream->partials() as $partial) {
    echo "Processing: " . ($partial->name ?? '...') . "\n";
}

$person = $stream->finalValue();
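
To show the idea behind partial results, here is a minimal sketch that closes an incomplete JSON buffer after each chunk and emits every snapshot that parses. It is an assumption about how such a mechanism could work, not Instructor's streaming implementation; `closePartialJson()`, `partials()` and the sample chunks are hypothetical.

```php
<?php
// Hypothetical sketch of partial-JSON streaming; the chunks stand in for a real token stream.

function closePartialJson(string $buffer): string {
    $stack = [];
    $inString = false;
    $escaped = false;
    foreach (str_split($buffer) as $char) {
        if ($inString) {
            if ($escaped) {
                $escaped = false;
            } elseif ($char === '\\') {
                $escaped = true;
            } elseif ($char === '"') {
                $inString = false;
            }
            continue;
        }
        if ($char === '"') {
            $inString = true;
        } elseif ($char === '{') {
            $stack[] = '}';
        } elseif ($char === '[') {
            $stack[] = ']';
        } elseif ($char === '}' || $char === ']') {
            array_pop($stack);
        }
    }
    // Close an open string first, then any open arrays/objects, innermost first.
    return $buffer . ($inString ? '"' : '') . implode('', array_reverse($stack));
}

function partials(iterable $chunks): Generator {
    $buffer = '';
    foreach ($chunks as $chunk) {
        $buffer .= $chunk;
        $data = json_decode(closePartialJson($buffer), true);
        if (is_array($data)) {
            yield $data; // only emit snapshots that parse cleanly
        }
    }
}

$chunks = ['{"name": "Jo', 'hn", ', '"age": 25}'];
$snapshots = iterator_to_array(partials($chunks), false);
```

With these chunks the first snapshot carries a truncated name, the second chunk is skipped because the buffer ends in a dangling comma, and the last snapshot is the complete record.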

7. Multimodal Inputs

Process text, images, and chat conversations with the same API:
<?php
use Cognesy\Addons\Image\Image;

// Each fragment below continues a (new StructuredOutput) call chain.

// Text
->withMessages("Extract from this text...")

// Images
->with(messages: Image::fromFile('receipt.jpg')->toMessage())
->withPrompt("Extract line items")

// Chat history
->withMessages([
    ['role' => 'system', 'content' => 'You extract data'],
    ['role' => 'user', 'content' => 'Process this...']
])

Comparison

Without Instructor

<?php
$response = $client->chat(['messages' => [...]]);
$json = json_decode($response['choices'][0]['message']['content'], true);

if (json_last_error() !== JSON_ERROR_NONE) {
    // Handle JSON parse error
    // Try to extract with regex?
    // Log and retry?
}

if (!isset($json['name']) || !is_string($json['name'])) {
    // Handle missing/invalid field
}

if (!isset($json['age']) || !is_int($json['age'])) {
    // Handle missing/invalid field
}

if ($json['age'] < 0) {
    // Handle validation error
    // Retry somehow?
}

$person = new Person();
$person->name = $json['name'];
$person->age = $json['age'];

With Instructor

<?php
$person = (new StructuredOutput)
    ->withResponseClass(Person::class)
    ->withMessages($text)
    ->get();
Same result. Zero boilerplate.

Why Not Just Use JSON Mode / JSON Schema?

“But OpenAI has response_format: json_object and strict JSON Schema mode now. Why do I need Instructor?” Good question. Here’s what you’re still stuck with:

1. Provider Inconsistency

Every provider does it differently:
| Provider | JSON Mode | JSON Schema | Tool Calling |
|----------|-----------|-------------|--------------|
| OpenAI | response_format: {type: "json_object"} | response_format: {type: "json_schema", ...} | Yes |
| Anthropic | ❌ No native support | ❌ No native support | Yes (different format) |
| Gemini | Different API entirely | Different API entirely | Yes (different format) |
| Mistral | Partial support | No | Yes |
| Ollama | Model-dependent | Model-dependent | Model-dependent |
With raw APIs: You write different code for each provider. With Instructor: One API. Instructor picks the best extraction method automatically.
<?php
// Same code works everywhere
$result = StructuredOutput::using('anthropic')->withResponseClass(Person::class)
    ->get();
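
For a sense of what a unified API has to hide, the sketch below maps one schema onto two provider-specific request bodies. It is illustrative only: `buildStructuredRequest()` is a hypothetical helper, the payload shapes reflect my understanding of the OpenAI `response_format` and Anthropic tool-calling request formats, and required fields such as the model name are omitted for brevity.

```php
<?php
// Hypothetical sketch: one schema, two provider-specific request bodies.
// Payload shapes are assumptions for illustration; check each provider's API reference.

function buildStructuredRequest(string $provider, array $messages, array $schema): array {
    return match ($provider) {
        // OpenAI-style: the schema travels in response_format.
        'openai' => [
            'messages' => $messages,
            'response_format' => [
                'type' => 'json_schema',
                'json_schema' => ['name' => 'person', 'schema' => $schema],
            ],
        ],
        // Anthropic-style: the schema is the input schema of a tool the model is told to call.
        'anthropic' => [
            'messages' => $messages,
            'tools' => [[
                'name' => 'person',
                'description' => 'Record the extracted person data',
                'input_schema' => $schema,
            ]],
            'tool_choice' => ['type' => 'tool', 'name' => 'person'],
        ],
        default => throw new InvalidArgumentException("Unsupported provider: $provider"),
    };
}

$schema = [
    'type' => 'object',
    'properties' => ['name' => ['type' => 'string'], 'age' => ['type' => 'integer']],
    'required' => ['name', 'age'],
];
$messages = [['role' => 'user', 'content' => 'John is 25']];

$openaiRequest = buildStructuredRequest('openai', $messages, $schema);
$anthropicRequest = buildStructuredRequest('anthropic', $messages, $schema);
```

The responses differ as well (message content in one case, a tool call in the other), so the parsing side needs a matching branch per provider.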

2. No Object Hydration

JSON Schema gives you… JSON. Not objects.
<?php
// OpenAI with JSON Schema
$response = $openai->chat([
    'messages' => [...],
    'response_format' => [
        'type' => 'json_schema',
        'json_schema' => [
            'name' => 'person',
            'schema' => [
                'type' => 'object',
                'properties' => [
                    'name' => ['type' => 'string'],
                    'age' => ['type' => 'integer'],
                ],
                'required' => ['name', 'age'],
            ],
        ],
    ],
]);

$json = json_decode($response['choices'][0]['message']['content'], true);
// $json = ['name' => 'John', 'age' => 25]

// Now you manually hydrate:
$person = new Person();
$person->name = $json['name'];
$person->age = $json['age'];
// For nested objects? More manual work.
// For arrays of objects? Even more.
With Instructor: Direct to typed objects, including nested structures.
<?php
$person = (new StructuredOutput)
    ->withResponseClass(Person::class)
    ->get();
// $person is already a Person object
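
The manual work that nested structures require can be sketched with a small recursive hydrator. This is a simplified illustration, not Instructor's deserializer: `hydrate()`, `SketchCustomer` and `SketchAddress` are hypothetical, and arrays of objects, enums and type coercion are not handled.

```php
<?php
// Hypothetical sketch of recursive hydration; not Instructor's actual deserializer.

final class SketchAddress {
    public string $city;
}

final class SketchCustomer {
    public string $name;
    public int $age;
    public SketchAddress $address;
}

function hydrate(string $class, array $data): object {
    $object = new $class();
    $reflection = new ReflectionClass($class);
    foreach ($reflection->getProperties(ReflectionProperty::IS_PUBLIC) as $property) {
        $name = $property->getName();
        if (!array_key_exists($name, $data)) {
            continue; // leave missing fields uninitialized in this sketch
        }
        $type = $property->getType();
        $isNestedClass = $type instanceof ReflectionNamedType && !$type->isBuiltin();
        // Recurse for class-typed properties; assign scalar values directly.
        $object->$name = ($isNestedClass && is_array($data[$name]))
            ? hydrate($type->getName(), $data[$name])
            : $data[$name];
    }
    return $object;
}

$json = '{"name": "John", "age": 25, "address": {"city": "Warsaw"}}';
$customer = hydrate(SketchCustomer::class, json_decode($json, true, 512, JSON_THROW_ON_ERROR));
echo $customer->address->city, "\n"; // prints "Warsaw"
```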

3. Schema Definition Hell

JSON Schema is verbose and lives separately from your code:
<?php
// JSON Schema approach - 20+ lines for a simple object
$schema = [
    'type' => 'object',
    'properties' => [
        'name' => [
            'type' => 'string',
            'description' => 'The person\'s full name',
            'minLength' => 1,
        ],
        'age' => [
            'type' => 'integer',
            'description' => 'Age in years',
            'minimum' => 0,
            'maximum' => 150,
        ],
        'email' => [
            'type' => 'string',
            'format' => 'email',
            'description' => 'Contact email',
        ],
    ],
    'required' => ['name', 'age'],
    'additionalProperties' => false,
];
With Instructor: Your PHP class IS the schema.
<?php
use Symfony\Component\Validator\Constraints as Assert;

class Person {
    /** The person's full name */
    #[Assert\NotBlank]
    public string $name;

    /** Age in years */
    #[Assert\Range(min: 0, max: 150)]
    public int $age;

    #[Assert\Email]
    public ?string $email = null;  // optional: defaults to null when absent
}
Schema and validation rules in one place. IDE autocomplete. Type checking. Refactoring support.

4. No Validation Beyond Types

JSON Schema validates structure, not business logic:
// A schema that only declares types says this is valid:
{ "name": "", "age": -5, "email": "not-an-email" }
// All correct types! But completely useless data.
With Instructor: Full validation with Symfony constraints.
<?php
use Symfony\Component\Validator\Constraints as Assert;

class Person {
    #[Assert\NotBlank]
    #[Assert\Length(min: 2)]
    public string $name;  // Empty string? Rejected.

    #[Assert\Positive]
    public int $age;  // Negative? Rejected.

    #[Assert\Email]
    public string $email;  // Invalid format? Rejected.
}

5. No Retry Mechanism

JSON Schema mode fails silently or throws. You handle recovery:
<?php
// What happens when the LLM returns invalid JSON despite schema?
try {
    $response = $openai->chat([...]);
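    // JSON_THROW_ON_ERROR makes malformed JSON throw JsonException instead of returning null
    $json = json_decode($response['choices'][0]['message']['content'], true, 512, JSON_THROW_ON_ERROR);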
} catch (Exception $e) {
    // Now what?
    // Retry with same prompt? Probably same error.
    // Modify the prompt? How?
    // Log and give up?
}
With Instructor: Automatic retry with error feedback.
<?php
$person = (new StructuredOutput)
    ->withResponseClass(Person::class)
    ->withMaxRetries(3)
    ->get();

// On failure, Instructor tells the LLM:
// "Validation failed: 'age' must be positive. You returned -5. Please correct."
// LLM tries again with that context.

6. No Streaming Support for Structured Data

JSON Schema mode gives you complete-or-nothing:
<?php
// Can't do this with raw JSON Schema mode:
// - Show partial results as they arrive
// - Update UI progressively
// - Stream array items one by one
With Instructor: Full streaming with partial updates.
<?php
use Cognesy\Instructor\StructuredOutput;

$stream = (new StructuredOutput)
    ->withResponseClass(Person::class)
    ->with(messages: $text, options: ['stream' => true])
    ->stream();

foreach ($stream->partials() as $partial) {
    updateUI($partial);
}

$person = $stream->finalValue();

7. Anthropic Doesn’t Have JSON Mode

Claude is one of the best models, but Anthropic has no native JSON mode:
<?php
// This doesn't exist for Anthropic:
$response = $anthropic->messages([
    'response_format' => ['type' => 'json_object'],  // ❌ Not supported
]);

// You're stuck with:
// - Prompt engineering ("respond only in JSON...")
// - Hoping it complies
// - Parsing whatever comes back
With Instructor: Works seamlessly with Claude.
<?php
$person = StructuredOutput::using('anthropic')->withResponseClass(Person::class)
    ->get();
// Instructor uses tool calling or optimized prompts automatically

8. The Real-World Comparison

| Capability | Raw JSON/JSON Schema | Instructor |
|------------|----------------------|------------|
| Works with all providers | ❌ Different APIs | ✅ Unified |
| Object hydration | ❌ Manual | ✅ Automatic |
| Nested objects | ❌ Manual recursion | ✅ Automatic |
| Business validation | ❌ None | ✅ Full |
| Retry on failure | ❌ Manual | ✅ Automatic |
| Error feedback to LLM | ❌ None | ✅ Built-in |
| Streaming partials | ❌ Not possible | ✅ Supported |
| Type safety in IDE | ❌ None | ✅ Full |
| Schema = Code | ❌ Separate | ✅ Same file |
| Works with Claude | ❌ No JSON mode | ✅ Yes |

The Bottom Line

JSON Schema mode is a step forward, but it’s a low-level primitive. You still need to:
  • Write provider-specific code
  • Manually deserialize to objects
  • Implement your own validation
  • Build your own retry logic
  • Handle streaming yourself
  • Maintain schemas separate from code
Instructor handles all of this. You define a PHP class and call ->get().

When to Use Instructor

Great for:
  • Extracting structured data from unstructured text
  • Building forms that accept natural language
  • Processing documents (invoices, resumes, contracts)
  • Content classification and tagging
  • Data transformation pipelines
  • Any task requiring reliable LLM output structure
Not designed for:
  • Open-ended creative writing
  • Tasks where free-form text is the desired output
  • Simple completions without structure requirements

The Instructor Family

Instructor exists in multiple languages with consistent APIs.
Ready to get started? Jump to the Getting Started Guide or explore the Cookbook for practical examples.