Context caching improves performance by letting the provider reuse the already-processed prefix of a conversation across requests, which cuts both latency and the cost of those repeated input tokens. This is particularly useful for multi-turn conversations or when asking several questions about a large document.

Using Cached Context

Polyglot supports context caching through the withCachedContext() method:

<?php
use Cognesy\Polyglot\LLM\Inference;

// Create an inference object
$inference = (new Inference())->withConnection('anthropic');

// Set up a conversation with cached context
$inference->withCachedContext(
    messages: [
        ['role' => 'system', 'content' => 'You are a helpful assistant who provides concise answers.'],
        ['role' => 'user', 'content' => 'I want to discuss machine learning concepts.'],
        ['role' => 'assistant', 'content' => 'Great! I\'d be happy to discuss machine learning concepts with you. What specific aspect would you like to explore?'],
    ]
);

// First query using the cached context
$response1 = $inference->create(
    messages: 'What is supervised learning?'
)->response();

echo "Response 1: " . $response1->content() . "\n";
echo "Tokens from cache: " . $response1->usage()->cacheReadTokens . "\n\n";

// Second query, still using the same cached context
$response2 = $inference->create(
    messages: 'And what about unsupervised learning?'
)->response();

echo "Response 2: " . $response2->content() . "\n";
echo "Tokens from cache: " . $response2->usage()->cacheReadTokens . "\n";

Provider Support for Context Caching

Different providers have varying levels of support for context caching:

  • Anthropic: Supports native context caching with explicit cache markers
  • OpenAI: Caches long prompt prefixes automatically on supported models; there are no explicit cache markers to set
  • Other providers: May not support native caching, but Polyglot still manages the conversation state for you, so the same code keeps working (see the sketch below)
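
The same calling code runs against any configured provider; what changes is how much of the context is actually served from cache. Below is a minimal sketch of switching providers, assuming connections named 'anthropic' and 'openai' exist in your configuration and using only the withCachedContext(), create(), and usage() calls shown above:

<?php
use Cognesy\Polyglot\LLM\Inference;

// Run the same cached-context setup against two providers and
// compare how many tokens each reports as served from cache.
// The connection names are assumptions - adjust to your config.
foreach (['anthropic', 'openai'] as $connection) {
    $inference = (new Inference())->withConnection($connection);
    $inference->withCachedContext(
        messages: [
            ['role' => 'system', 'content' => 'You are a helpful assistant who provides concise answers.'],
        ]
    );

    $response = $inference->create(
        messages: 'What is gradient descent?'
    )->response();

    echo "[$connection] tokens from cache: "
        . $response->usage()->cacheReadTokens . "\n";
}

On providers without native caching, cacheReadTokens will typically stay at zero, while the cached messages are still prepended to each request for you.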

Processing Large Documents with Cached Context

Context caching is particularly valuable when working with large documents:

<?php
use Cognesy\Polyglot\LLM\Inference;

// Load a large document
$documentContent = file_get_contents('large_document.txt');

// Set up cached context with the document
$inference = (new Inference())->withConnection('anthropic');
$inference->withCachedContext(
    messages: [
        ['role' => 'system', 'content' => 'You will help analyze and summarize documents.'],
        ['role' => 'user', 'content' => 'Here is the document to analyze:'],
        ['role' => 'user', 'content' => $documentContent],
    ]
);

// Ask multiple questions about the document without resending it each time
$questions = [
    'Summarize the key points of this document in 3 bullets.',
    'What are the main arguments presented?',
    'Are there any contradictions or inconsistencies in the text?',
    'What conclusions can be drawn from this document?',
];

foreach ($questions as $index => $question) {
    $response = $inference->create(messages: $question)->response();

    echo "Question " . ($index + 1) . ": $question\n";
    echo "Answer: " . $response->content() . "\n";
    echo "Tokens from cache: " . $response->usage()->cacheReadTokens . "\n\n";
}
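
With this pattern the document's tokens are written to the cache once (cache writes are billed at a small premium on Anthropic) and then read back at a steep discount for each subsequent question, so only the short question text is billed as fresh input. Because provider-side caches expire after a short idle window, it pays to ask your questions in one batch rather than spreading them out.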