Rate Limits

Provider rate limits can cause requests to fail during high-traffic periods.

Symptoms

Error messages containing “rate limit exceeded,” “too many requests,” or “quota exceeded”
HTTP status code 429

Solutions

Implement Retry Logic: Add automatic retries with exponential backoff

```php
// @doctest id="0802"
<?php
use Cognesy\Polyglot\Inference\Inference;
use Cognesy\Http\Exceptions\HttpRequestException;

// Treat HTTP 429 or any of the typical rate-limit messages (case-insensitive) as retryable
function isRateLimitError(HttpRequestException $e): bool {
    if ($e->getCode() === 429) {
        return true;
    }
    $message = strtolower($e->getMessage());
    foreach (['rate limit', 'too many requests', 'quota exceeded'] as $phrase) {
        if (str_contains($message, $phrase)) {
            return true;
        }
    }
    return false;
}

// Calls $fn up to $maxAttempts times in total, backing off exponentially between attempts
function withRetry(callable $fn, int $maxAttempts = 3): mixed {
    $attempt = 0;

    while (true) {
        try {
            return $fn();
        } catch (HttpRequestException $e) {
            $attempt++;

            // Only retry on rate limit errors, and give up once all attempts are used
            if (!isRateLimitError($e) || $attempt >= $maxAttempts) {
                throw $e;
            }

            // Exponential backoff: wait 2s, then 4s, then 8s, ...
            $sleepTime = 2 ** $attempt;
            echo "Rate limit hit. Retrying in $sleepTime seconds...\n";
            sleep($sleepTime);
        }
    }
}

// Usage
$inference = new Inference();

try {
    $response = withRetry(function () use ($inference) {
        return $inference->with(
            messages: 'What is the capital of France?'
        )->get();
    });

    echo "Response: $response\n";
} catch (HttpRequestException $e) {
    echo "Request failed: " . $e->getMessage() . "\n";
}
```

Request Throttling: Limit the rate of requests from your application

```php
// @doctest id="e00e"
<?php
use Cognesy\Polyglot\Inference\Inference;

class RateLimiter {
    private float $lastRequestTime = 0.0;
    private float $minTimeBetweenRequests;

    public function __construct(int $requestsPerMinute = 60) {
        if ($requestsPerMinute <= 0) {
            throw new InvalidArgumentException('requestsPerMinute must be positive');
        }
        // Spread requests evenly: 30 requests per minute means one request every 2 seconds
        $this->minTimeBetweenRequests = 60 / $requestsPerMinute;
    }

    // Blocks until enough time has passed since the previous request
    public function waitIfNeeded(): void {
        $timeSinceLastRequest = microtime(true) - $this->lastRequestTime;

        if ($timeSinceLastRequest < $this->minTimeBetweenRequests) {
            $waitTime = $this->minTimeBetweenRequests - $timeSinceLastRequest;
            // usleep() expects an integer number of microseconds
            usleep((int) ($waitTime * 1_000_000));
        }

        $this->lastRequestTime = microtime(true);
    }
}

// Usage
$limiter = new RateLimiter(30); // 30 requests per minute
$inference = new Inference();

for ($i = 0; $i < 10; $i++) {
    $limiter->waitIfNeeded();
    $response = $inference->with(
        messages: "This is request $i"
    )->get();
    echo "Response $i: $response\n";
}
```

Request Batching: Combine multiple requests into batches when possible

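```php
// @doctest id="7103"
<?php
use Cognesy\Polyglot\Inference\Inference;

$inference = new Inference();
$questions = [
    'What is the capital of France?',
    'What is the capital of Japan?',
    'What is the capital of Brazil?',
];

// Instead of making many small requests (one API call per question)...
$responses = [];
foreach ($questions as $question) {
    // ...which uses up the rate limit quickly
    $responses[] = $inference->with(messages: $question)->get();
}

// ...send all questions in a single request
$batchedQuestions = "Answer each of the following questions. Number your answers to match the questions:\n";
foreach ($questions as $i => $question) {
    $batchedQuestions .= ($i + 1) . ". $question\n";
}

$batchResponse = $inference->with(messages: $batchedQuestions)->get();
// Then parse the batch response into individual answers
```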
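How you split the combined reply back into per-question answers depends on how the model formats its output. The following is a minimal sketch. It assumes the model starts each answer on its own line with `1.`, `2.`, ... (or `1)`, `2)`, ...), matching the numbering of the questions. `parseNumberedAnswers()` is a hypothetical helper and is not part of Polyglot.

```php
<?php
/**
 * Hypothetical helper (not a Polyglot API): splits a numbered batch reply
 * such as "1. ...\n2. ..." into answers keyed by 0-based question index.
 * Assumes each answer starts on its own line with "N." or "N)".
 * Answers the model skipped are returned as null.
 */
function parseNumberedAnswers(string $batchResponse, int $expectedCount): array {
    // m: ^ matches at every line start; s: . also matches newlines.
    // Capture a number at the start of a line, then everything up to the
    // next numbered line or the end of the reply.
    preg_match_all(
        '/^\s*(\d+)[.)]\s*(.*?)(?=^\s*\d+[.)]|\z)/ms',
        $batchResponse,
        $matches,
        PREG_SET_ORDER
    );

    $answers = array_fill(0, $expectedCount, null);
    foreach ($matches as [, $number, $text]) {
        $index = (int) $number - 1;
        // Ignore numbers that don't correspond to a question we asked
        if ($index >= 0 && $index < $expectedCount) {
            $answers[$index] = trim($text);
        }
    }
    return $answers;
}
```

With the batching example above, you would call `parseNumberedAnswers($batchResponse, count($questions))`. Because missing answers come back as `null`, you can resend only those questions. Any text before the first numbered line, such as a preamble like "Sure, here are the answers:", is ignored. A line inside an answer that itself begins with a number followed by a period would be misread as the start of a new answer.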

Upgrade API Plan: Consider upgrading to a higher tier with increased rate limits
