<?php
// Example: one structured-output request against OpenAI with prompt caching
// enabled, extracting a typed Project object plus transport-level metadata.
// NOTE(review): relies on $cacheKey, $jsonSchemaRuntime, $cachedSystem and
// $cachedPrompt being defined earlier in this file (outside this chunk), and
// the "#1" suffixes suggest a follow-up request later uses $previousResponseId.

/** @var array<string, mixed> $optionsBase */
$optionsBase = [
'max_output_tokens' => 4096,
// Improve cache routing for identical prefixes between requests:
// prompt_cache_key steers requests with the same prefix to the same cache
// shard; 'in_memory' retention keeps the cached prefix warm between calls.
'prompt_cache_key' => $cacheKey,
'prompt_cache_retention' => 'in_memory',
];
// Build the request fluently; create() returns a StructuredOutputResponse
// object so we can access usage and other metadata, not just the parsed value.
$response1 = (new StructuredOutput($jsonSchemaRuntime))
->withCachedContext(
system: $cachedSystem,
prompt: $cachedPrompt,
)
->with(
messages: 'Describe the project in a way compelling to my audience: P&C insurance CIOs.',
responseModel: Project::class,
model: 'gpt-4.1',
options: $optionsBase,
)->create();
// get() returns the processed value — an instance of the Project class
// hydrated from the model's structured (JSON-schema-constrained) output.
$project1 = $response1->get();
dump($project1);
// Sanity checks only (assert() may be disabled in production builds).
assert($project1 instanceof Project);
assert($project1->name !== '');
assert($project1->description !== '');
// Extract the response id for Responses API server-side chaining
// (previous_response_id on the next request). Decoded best-effort: if the
// body is not valid JSON or lacks 'id', we fall back to '' and warn below
// rather than throwing — chaining is optional for this example.
$body1 = json_decode($response1->inferenceResponse()->responseData()->body(), true);
$previousResponseId = is_array($body1) ? ($body1['id'] ?? '') : '';
if ($previousResponseId === '') {
echo "Warning: previous_response_id not available; chaining will be skipped.\n";
}
// Use inferenceResponse() when you need transport-level metadata such as
// token usage. NOTE(review): cacheWriteTokens presumably counts tokens
// written to the prompt cache on this request — verify against library docs.
$usage1 = $response1->inferenceResponse()->usage();
echo "Usage #1: {$usage1->inputTokens} prompt tokens, {$usage1->cacheWriteTokens} cache write tokens\n";
?>