Data model
Instructor provides several ways to define the data model of an LLM response.
Using classes
The default way is to define the data model with PHP classes. You can use PHPDoc comments to specify the types of the response fields, and attributes to give the language model more context or additional instructions.
Type Hints
Use PHP type hints to specify the type of extracted data.
Use nullable types to indicate that a given field is optional.
Instructor only fills in public fields. Private and protected fields are ignored: their values are not extracted, and they keep the default values defined in your class.
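The rules above can be sketched with a small class. The Person class and its field names are hypothetical, and the reflection loop is only there to make the declared types visible; it does not show how Instructor reads them internally.

```php
<?php

// Hypothetical response model; class and field names are illustrative.
class Person
{
    public string $name;          // required text field
    public int $age;              // required integer field
    public ?string $email = null; // nullable type marks the field as optional
}

// Print each property with its declared type.
foreach ((new ReflectionClass(Person::class))->getProperties() as $property) {
    $type = $property->getType();
    printf(
        "%s: %s%s\n",
        $property->getName(),
        $type->allowsNull() ? '?' : '',
        $type->getName()
    );
}
```

Giving the nullable field a null default keeps it safe to read when the source text does not mention an email.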
Private vs public object fields
Instructor only sets the public fields of the object with the data provided by the LLM. Private and protected fields are left unchanged. If you want to read them after extraction, consider giving them default values.
See examples/PrivateVsPublicFields/run.php for details on how extraction behaves for classes with private and public fields.
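As a minimal sketch of the distinction, the hypothetical User class below mixes public and private fields. Listing the public properties through reflection is an illustration of which fields are visible from outside the class, not a description of Instructor's implementation.

```php
<?php

// Hypothetical model mixing public and non-public fields.
class User
{
    public string $name;
    public int $age;

    // Not public, so it is not filled from the LLM response.
    // The default keeps it safe to read after extraction.
    private string $password = '';

    public function getPassword(): string
    {
        return $this->password;
    }
}

// Collect the names of the public properties only.
$public = array_map(
    fn (ReflectionProperty $p) => $p->getName(),
    (new ReflectionClass(User::class))->getProperties(ReflectionProperty::IS_PUBLIC)
);

print_r($public);
```

Without the default value, calling getPassword() on a freshly created object would fail, because a typed property must be initialized before it is read.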
DocBlock type hints
You can also use PHP DocBlock comments to specify the type of extracted data. This is useful when you want to specify property types for the LLM, but can't or don't want to enforce the type at the code level.
See PHPDoc documentation for more details on DocBlock: https://docs.phpdoc.org/3.0/guide/getting-started/what-is-a-docblock.html#what-is-a-docblock
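A short sketch of this case: the hypothetical Reading class declares its types only in DocBlocks, so PHP itself enforces nothing. The class, field names, and descriptions are illustrative assumptions.

```php
<?php

// Hypothetical model whose types live only in DocBlocks.
class Reading
{
    /** @var string Date in YYYY-MM-DD format */
    public $date;

    /** @var float Temperature in degrees Celsius */
    public $temperature;
}

$property = new ReflectionProperty(Reading::class, 'temperature');

var_dump($property->getType());        // NULL: no native type is declared
echo $property->getDocComment(), "\n"; // the DocBlock still carries the type

// PHP accepts any value here, because the type is not enforced in code.
$reading = new Reading();
$reading->temperature = '21.5';
```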
Using DocBlocks as Additional Instructions for LLM
You can use PHP DocBlocks (/** */) to provide additional instructions for the LLM at class or field level, for example to clarify what you expect or how the LLM should process your data.
Instructor extracts DocBlock comments from the class and its properties and includes them in the specification of the response model sent to the LLM.
DocBlock instructions are not required, but they can clarify your intentions and improve the LLM's inference results.
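For illustration, here is a hypothetical class with instructions at both class and field level. The wording of the comments is an assumption; the reflection calls only show that the text is available at runtime.

```php
<?php

/**
 * Details of the user mentioned in the text.
 * Do not guess values that are not stated explicitly.
 */
class UserDetails
{
    /** Full name, exactly as written in the source text */
    public string $name;

    /** Age in years, as a whole number */
    public int $age;
}

// Class-level and property-level comments are both readable via reflection.
$classDoc = (new ReflectionClass(UserDetails::class))->getDocComment();
$ageDoc = (new ReflectionProperty(UserDetails::class, 'age'))->getDocComment();

echo $classDoc, "\n", $ageDoc, "\n";
```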
Attributes for data model descriptions and instructions
Instructor supports #[Description] and #[Instructions] attributes to provide more context to the language model or to provide additional instructions to the model.
The #[Description] attribute is used to describe a class or property in your data model. Instructor will use this text to provide more context to the language model.
The #[Instructions] attribute is used to provide additional instructions to the language model, such as how to process the data.
You can add multiple attributes to a class or property - Instructor will merge them into a single block of text.
Instructor will still include any PHPDoc comments provided in the class, but using attributes might be more convenient and easier to read.
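The sketch below applies both attributes to a hypothetical UserProfile class. The use imports for the attribute classes are deliberately left out, because their namespace depends on the installed Instructor version; check the library source for the correct one. PHP resolves attribute classes lazily, so the snippet still runs as written, and reflection is used only to show the attached text.

```php
<?php

// NOTE: in real code, import Description and Instructions from Instructor.
// The imports are omitted here because the namespace is an unverified assumption.

#[Description('Information about a user')]
class UserProfile
{
    #[Description('Full name of the user')]
    public string $name;

    #[Description('Current job title')]
    #[Instructions('Leave empty when the text does not mention a job')]
    public ?string $role = null;
}

// Read attribute names and arguments without instantiating the attributes.
$property = new ReflectionProperty(UserProfile::class, 'role');
$attached = [];
foreach ($property->getAttributes() as $attribute) {
    $attached[$attribute->getName()] = $attribute->getArguments()[0];
}

print_r($attached);
```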
Typed Collections / Arrays
PHP currently does not support generics or type hints that specify array element types.
Use PHP DocBlock style comments to specify the type of array elements.
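A minimal sketch of a typed collection, using hypothetical Meeting and Attendee classes. The native type can only say array; the element type is carried by the @var annotation.

```php
<?php

// Hypothetical element type of the collection.
class Attendee
{
    public string $name;
}

// Hypothetical model holding a list of Attendee objects.
class Meeting
{
    public string $topic;

    /** @var Attendee[] */
    public array $attendees = [];
}

$property = new ReflectionProperty(Meeting::class, 'attendees');

echo $property->getType()->getName(), "\n"; // native type: array
echo $property->getDocComment(), "\n";      // element type: Attendee[]
```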
Example of complex data extraction
Instructor can retrieve complex data structures from text. Your response model can contain nested objects, arrays, and enums.
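As a sketch of such a model, the hypothetical classes below combine a nested object, a typed array, and a backed enum (PHP 8.1 or later). The instance is built by hand purely to show the shape of the data; all names and values are illustrative.

```php
<?php

enum EventType: string
{
    case Conference = 'conference';
    case Meetup = 'meetup';
}

class Venue
{
    public string $city;
    public ?string $address = null; // optional
}

class Speaker
{
    public string $name;
    public ?string $company = null; // optional
}

class TechEvent
{
    public string $title;
    public EventType $type; // enum field
    public Venue $venue;    // nested object

    /** @var Speaker[] */
    public array $speakers = []; // typed collection
}

// Hand-built instance showing the structure an extraction would produce.
$venue = new Venue();
$venue->city = 'Berlin';

$speaker = new Speaker();
$speaker->name = 'Ada';

$event = new TechEvent();
$event->title = 'PHP Meetup';
$event->type = EventType::from('meetup');
$event->venue = $venue;
$event->speakers = [$speaker];

echo $event->type->value, ' in ', $event->venue->city, "\n";
```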
Dynamic data schemas with Structure class
If you work with dynamic data schemas, you can use the Structure class to define the data model.
See Structures for more details on how to work with dynamic data schemas.