Instructor provides several ways to define the data model of the LLM response.

Using classes

The default way is to use PHP classes to define the data model. You can also use PHPDoc comments to specify the types of fields of the response. Additionally, you can use attributes to provide more context to the language model or to provide additional instructions to the model.

Type Hints

Use PHP type hints to specify the type of extracted data.
Use nullable types to indicate that a given field is optional.
<?php

class Person {
    public string $name;
    public ?int $age;
    public Address $address;
}
// @doctest id="d306"
Instructor fills in public fields only. Private and protected fields are ignored: their values are not extracted, and they keep the default values defined in your class.

Private vs public object fields

Instructor only sets public fields of the object with the data provided by the LLM. Private and protected fields are left unchanged, unless the class defines setter methods or has constructor parameters that match the field names. Provide default values for fields that Instructor does not set, to avoid unexpected behavior when accessing them. See:
  • examples/A01_Basics/BasicPrivateVsPublicFields/run.php to check the details on the behavior of extraction for classes with private and public fields,
  • examples/A01_Basics/BasicGetSet/run.php to see how Instructor uses getter and setter methods,
  • examples/A01_Basics/BasicConstructor/run.php to see how Instructor uses constructor parameters.
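
The public-only rule described above can be illustrated with plain PHP reflection. This is a minimal sketch, not Instructor's implementation: the `Account` class and the `fillPublicFields()` helper are hypothetical names introduced only for this example, and the sketch does not cover setters or constructor parameters.

```php
<?php

// Hypothetical class used only for this illustration.
class Account {
    public string $owner = '';
    // Not public: gets a default so reads are safe if it is never filled in.
    private string $internalNote = 'n/a';

    public function internalNote(): string {
        return $this->internalNote;
    }
}

// Hypothetical helper, NOT Instructor's implementation: copies values
// only into properties that reflection reports as public.
function fillPublicFields(object $target, array $data): object {
    $reflection = new ReflectionObject($target);
    foreach ($data as $name => $value) {
        if (!$reflection->hasProperty($name)) {
            continue;
        }
        if ($reflection->getProperty($name)->isPublic()) {
            $target->{$name} = $value;
        }
    }
    return $target;
}

$account = fillPublicFields(new Account(), [
    'owner' => 'Alex',
    'internalNote' => 'should be ignored',
]);

echo $account->owner, PHP_EOL;          // Alex
echo $account->internalNote(), PHP_EOL; // n/a
```

The private field keeps its declared default, which is why giving such fields a default value avoids surprises when they are read later.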

DocBlock type hints

You can also use PHP DocBlock style comments to specify the type of extracted data. This is useful when you want to specify property types for the LLM, but can’t or don’t want to enforce types at the code level.
<?php

class Person {
    /** @var string */
    public $name;
    /** @var int */
    public $age;
    /** @var Address $address person's address */
    public $address;
}
// @doctest id="fd76"
See PHPDoc documentation for more details on DocBlock: https://docs.phpdoc.org/3.0/guide/getting-started/what-is-a-docblock.html#what-is-a-docblock
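
To see what information a `@var` tag makes available at runtime, the sketch below reads it back with PHP's reflection API. The `docBlockType()` helper is a hypothetical name for this example, and the simple regular expression is an assumption; it says nothing about how Instructor itself parses DocBlocks.

```php
<?php

// Hypothetical classes used only for this illustration.
class Address {}

class Person {
    /** @var string */
    public $name;
    /** @var int */
    public $age;
    /** @var Address $address person's address */
    public $address;
}

// Hypothetical helper: returns the type named in a property's @var tag,
// or null when the property has no DocBlock or no @var tag.
function docBlockType(string $class, string $property): ?string {
    $doc = (new ReflectionProperty($class, $property))->getDocComment();
    if ($doc === false) {
        return null;
    }
    return preg_match('/@var\s+(\S+)/', $doc, $m) === 1 ? $m[1] : null;
}

foreach (['name', 'age', 'address'] as $property) {
    echo $property, ': ', docBlockType(Person::class, $property), PHP_EOL;
}
// name: string
// age: int
// address: Address
```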

Using DocBlocks as Additional Instructions for LLM

You can use PHP DocBlocks (/** */) to provide additional instructions for the LLM at class or field level, for example to clarify what you expect or how the LLM should process your data. Instructor extracts DocBlock comments from the class and its properties and includes them in the specification of the response model sent to the LLM. DocBlock instructions are not required, but they can clarify your intentions and improve the LLM’s inference results.
    /**
     * Represents a skill of a person and context in which it was mentioned. 
     */
    class Skill {
        public string $name;
        /** @var SkillType $type type of the skill, derived from the description and context */
        public SkillType $type;
        /** Directly quoted, full sentence mentioning person's skill */
        public string $context;
    }
// @doctest id="8310"

Attributes for data model descriptions and instructions

Instructor supports the #[Description] and #[Instructions] attributes for giving the language model more context or additional instructions. The #[Description] attribute describes a class or property in your data model. The #[Instructions] attribute tells the model how to process the data. You can add multiple attributes to a class or property; Instructor merges them into a single block of text. Instructor still includes any PHPDoc comments provided in the class, but attributes may be more convenient and easier to read.
<?php
#[Description("Information about user")]
class User {
    #[Description("User's age")]
    public int $age;
    #[Instructions("Make it ALL CAPS")]
    public string $name;
    #[Description("User's job")]
    #[Instructions("Ignore hobbies, identify profession")]
    public string $job;
}
// @doctest id="1c87"
NOTE: Technically both #[Description] and #[Instructions] attributes do the same thing - they provide additional context to the language model. Yet, providing them in separate attributes allows you to better organize your code and make it more readable. In the future, we may extend the functionality of these attributes to provide more specific instructions to the language model, so it is a good idea to use them now.
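
As a rough illustration of how attribute text can be collected into one block, the sketch below uses PHP's native attribute reflection. The `Description` and `Instructions` classes here are stand-ins declared only so the example runs on its own (Instructor ships its own attribute classes), and the `attributeText()` helper and its newline-joined output format are assumptions, not the library's behavior.

```php
<?php

// Stand-in attribute classes for this sketch only; Instructor ships its own.
#[Attribute(Attribute::TARGET_CLASS | Attribute::TARGET_PROPERTY | Attribute::IS_REPEATABLE)]
class Description {
    public function __construct(public string $text) {}
}

#[Attribute(Attribute::TARGET_CLASS | Attribute::TARGET_PROPERTY | Attribute::IS_REPEATABLE)]
class Instructions {
    public function __construct(public string $text) {}
}

class User {
    #[Description("User's job")]
    #[Instructions("Ignore hobbies, identify profession")]
    public string $job;
}

// Hypothetical helper: joins the text of all attributes on a property,
// in declaration order, into a single newline-separated string.
function attributeText(string $class, string $property): string {
    $texts = [];
    foreach ((new ReflectionProperty($class, $property))->getAttributes() as $attribute) {
        $texts[] = $attribute->newInstance()->text;
    }
    return implode("\n", $texts);
}

echo attributeText(User::class, 'job'), PHP_EOL;
// User's job
// Ignore hobbies, identify profession
```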

Typed Collections / Arrays

PHP currently does not support generics or typehints to specify array element types. Use PHP DocBlock style comments to specify the type of array elements.
<?php
class Person {
    // ...
}

class Event {
    // ...
    /** @var Person[] list of extracted event participants */
    public array $participants;
    // ...
}
// @doctest id="7c1a"

Example of complex data extraction

Instructor can retrieve complex data structures from text. Your response model can contain nested objects, arrays, and enums.
<?php
use Cognesy\Instructor\StructuredOutput;

// define data structures to extract data into
class Person {
    public string $name;
    public int $age;
    public string $profession;
    /** @var Skill[] */
    public array $skills;
}

class Skill {
    public string $name;
    public SkillType $type;
}

enum SkillType : string {
    case Technical = 'technical';
    case Other = 'other';
}

$text = "Alex is 25 years old software engineer, who knows PHP, Python and can play the guitar.";

$person = (new StructuredOutput)->with(
    messages: [['role' => 'user', 'content' => $text]],
    responseModel: Person::class,
)->get();

// data is extracted into an object of given class
assert($person instanceof Person); // true

// you can access object's extracted property values
echo $person->name; // Alex
echo $person->age; // 25
echo $person->profession; // software engineer
echo $person->skills[0]->name; // PHP
echo $person->skills[0]->type->value; // technical (an enum case cannot be echoed directly)
// ...

var_dump($person);
// Person {
//     name: "Alex",
//     age: 25,
//     profession: "software engineer",
//     skills: [
//         Skill {
//              name: "PHP",
//              type: SkillType::Technical,
//         },
//         Skill {
//              name: "Python",
//              type: SkillType::Technical,
//         },
//         Skill {
//              name: "guitar",
//              type: SkillType::Other
//         },
//     ]
// }
// @doctest id="6dd0"
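
The example above relies on a string-backed enum. The sketch below shows only standard PHP behavior for such enums: converting a string into a case and reading it back. How Instructor maps LLM output onto enum cases internally is not shown here.

```php
<?php

enum SkillType : string {
    case Technical = 'technical';
    case Other = 'other';
}

// from() throws a ValueError for an unknown value; tryFrom() returns null.
$type = SkillType::from('technical');
echo $type->name, PHP_EOL;   // Technical
echo $type->value, PHP_EOL;  // technical

var_dump(SkillType::tryFrom('musical'));  // NULL
var_dump($type === SkillType::Technical); // bool(true)
```

Because an enum case is an object rather than a string, print its `value` (or `name`) property instead of echoing the case itself.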

Dynamic data schemas with Structure class

If you work with dynamic data schemas, you can use the Structure class to define the data model. See Structures for more details on how to work with dynamic data schemas.

Optional data with Maybe class

The Maybe class provides a way to handle optional data that may or may not be present in the input text. It wraps a value type and indicates whether the data was found or not, along with an error message when the data is missing.

Basic Usage

<?php
use Cognesy\Instructor\StructuredOutput;
use Cognesy\Instructor\Extras\Maybe\Maybe;

class Person {
    public string $name;
    public int $age;
}

$maybe = Maybe::is(Person::class, 'person', 'Person data if found in the text');

$result = (new StructuredOutput)
    ->with(
        messages: "The document mentions some information but no person details.",
        responseModel: $maybe,
    )
    ->get();

if ($result->hasValue()) {
    $person = $result->get();
    echo "Found person: " . $person->name;
} else {
    echo "No person found. Error: " . $result->error();
}
// @doctest id="42e1"

Maybe Methods

  • Maybe::is(class, name?, description?) - Static factory method to create a Maybe instance
  • get() - Get the value if present, or null if not found
  • error() - Get the error message explaining why the value wasn’t found
  • hasValue() - Check if a value was successfully extracted
  • toJsonSchema() - Generate JSON schema for the Maybe wrapper