The Embeddings Class
Polyglot provides the Embeddings class as the primary interface for generating and working with vector embeddings.
Creating an Embeddings Instance
<?php
use Cognesy\Polyglot\Embeddings\Embeddings;
// Create a basic embeddings instance with default settings
$embeddings = new Embeddings();
// Create an embeddings instance with a specific connection
$embeddings = new Embeddings('openai');
// Alternative method to specify connection
$embeddings = (new Embeddings())->withConnection('openai');
Key Methods
The Embeddings class provides several important methods:
create(): Generates embeddings for input text
withConnection(): Specifies which connection to use
withConfig(): Sets a custom configuration
withHttpClient(): Specifies a custom HTTP client
withModel(): Overrides the default model
findSimilar(): Finds documents similar to a query
Generating Embeddings
The core functionality of the Embeddings class is to transform text into vector representations.
Basic Embedding Generation
<?php
use Cognesy\Polyglot\Embeddings\Embeddings;
$embeddings = new Embeddings();
$result = $embeddings->create('The quick brown fox jumps over the lazy dog.');
// Get the vector values from the first (and only) result
$vector = $result->first()->values();
echo "Generated a vector with " . count($vector) . " dimensions.\n";
Embedding Multiple Texts
You can generate embeddings for multiple texts in a single request, which is more efficient than making separate requests:
<?php
use Cognesy\Polyglot\Embeddings\Embeddings;
$embeddings = new Embeddings();
$documents = [
    "The quick brown fox jumps over the lazy dog.",
    "Machine learning models can process text into vector representations.",
    "Embeddings capture semantic relationships between words and documents."
];
$result = $embeddings->create($documents);
// Get all vectors
$vectors = $result->all();
foreach ($vectors as $index => $vector) {
    echo "Document " . ($index + 1) . " has a vector with " . count($vector->values()) . " dimensions.\n";
}
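When a document list is too large for one request, it can be split into batches so that each batch still goes out as a single call. The sketch below is not part of Polyglot: embedInBatches() and the batch size are hypothetical, and the embedder is passed in as a callable so the example runs without network access. In real code the callable would presumably wrap something like $embeddings->create($batch)->toValuesArray().

```php
<?php
// Hypothetical helper (not a Polyglot API): embeds $texts in batches.
// $embedBatch receives an array of texts and must return one vector per text.
function embedInBatches(array $texts, callable $embedBatch, int $batchSize = 100): array
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }
    $vectors = [];
    foreach (array_chunk($texts, $batchSize) as $batch) {
        // One call per batch; results are appended in input order
        foreach ($embedBatch($batch) as $values) {
            $vectors[] = $values;
        }
    }
    return $vectors;
}

// Stub embedder so the sketch is runnable offline:
// maps each text to a fake 2-dimensional vector.
$calls = 0;
$fakeEmbedder = function (array $batch) use (&$calls): array {
    $calls++;
    return array_map(fn(string $text) => [(float) strlen($text), 1.0], $batch);
};

$texts = ['a', 'bb', 'ccc', 'dddd', 'eeeee'];
$vectors = embedInBatches($texts, $fakeEmbedder, 2);
echo count($vectors) . " vectors from " . $calls . " requests\n"; // 5 vectors from 3 requests
```

A suitable batch size depends on the provider's input limits, so the default of 100 here is only a placeholder.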
Accessing Embedding Results
The create() method returns an EmbeddingsResponse object with several useful methods:
<?php
use Cognesy\Polyglot\Embeddings\Embeddings;
$embeddings = new Embeddings();
$result = $embeddings->create('Sample text for embedding');
// Get the first vector
$firstVector = $result->first();
// Get the last vector (useful when processing multiple inputs)
$lastVector = $result->last();
// Get all vectors
$allVectors = $result->all();
// Get all vector values as a simple array of arrays
$valuesArray = $result->toValuesArray();
// Get usage information
$usage = $result->usage();
echo "Input tokens: " . $usage->input() . "\n";
echo "Output tokens: " . $usage->output() . "\n";
echo "Total tokens: " . $usage->total() . "\n";
Working with Vector Objects
Each vector in the response is represented by a Vector object with its own methods:
<?php
use Cognesy\Polyglot\Embeddings\Embeddings;
$embeddings = new Embeddings();
$result = $embeddings->create('Sample text for embedding');
$vector = $result->first();
// Get vector values
$values = $vector->values();
// Get vector ID (index)
$id = $vector->id();
// Compare with another vector
// Embed a second text with the same Embeddings instance
$otherVector = $embeddings->create('Another text for comparison')->first();
$similarity = $vector->compareTo($otherVector, 'cosine');
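To make the 'cosine' metric concrete, here is a minimal plain-PHP sketch of cosine similarity on raw value arrays such as those returned by values(). The cosineSimilarity() function is a hypothetical helper written for illustration; it is not Polyglot's implementation of compareTo().

```php
<?php
// Cosine similarity: dot(a, b) / (|a| * |b|). The result lies in [-1, 1].
// Both arrays are assumed to be list-indexed (0, 1, 2, ...).
function cosineSimilarity(array $a, array $b): float
{
    if (count($a) !== count($b) || count($a) === 0) {
        throw new InvalidArgumentException('Vectors must be non-empty and of equal length.');
    }
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $x) {
        $dot += $x * $b[$i];
        $normA += $x * $x;
        $normB += $b[$i] * $b[$i];
    }
    if ($normA == 0.0 || $normB == 0.0) {
        throw new InvalidArgumentException('Cosine similarity is undefined for a zero vector.');
    }
    return $dot / (sqrt($normA) * sqrt($normB));
}

// Toy vectors standing in for $vector->values() and $otherVector->values()
$a = [1.0, 2.0, 3.0];
$b = [2.0, 4.0, 6.0];  // same direction as $a
$c = [-3.0, 0.0, 1.0]; // orthogonal to $a

printf("%.3f\n", cosineSimilarity($a, $b)); // 1.000
printf("%.3f\n", cosineSimilarity($a, $c)); // 0.000
```

Values near 1 indicate vectors pointing in nearly the same direction, which for embeddings usually corresponds to semantically similar texts.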
Working with Different Providers
Polyglot supports multiple embedding providers, each with its own models, vector dimensions, and request options.
Switching Between Providers
<?php
use Cognesy\Polyglot\Embeddings\Embeddings;
// Compare embeddings from different providers
$text = "Artificial intelligence is transforming industries worldwide.";
// OpenAI embeddings
$openaiEmbeddings = new Embeddings('openai');
$openaiResult = $openaiEmbeddings->create($text);
echo "OpenAI embedding dimensions: " . count($openaiResult->first()->values()) . "\n";
// Cohere embeddings
$cohereEmbeddings = new Embeddings('cohere1');
$cohereResult = $cohereEmbeddings->create($text);
echo "Cohere embedding dimensions: " . count($cohereResult->first()->values()) . "\n";
// Mistral embeddings
$mistralEmbeddings = new Embeddings('mistral');
$mistralResult = $mistralEmbeddings->create($text);
echo "Mistral embedding dimensions: " . count($mistralResult->first()->values()) . "\n";
Provider-Specific Options
Different providers may support additional options for embedding generation:
<?php
use Cognesy\Polyglot\Embeddings\Embeddings;
// Example with OpenAI-specific options
$openaiEmbeddings = new Embeddings('openai');
$response = $openaiEmbeddings->create(
    input: ["Sample text for embedding"],
    options: [
        'encoding_format' => 'float', // Get float values instead of base64
        'dimensions' => 512, // Request a specific vector size (if supported)
    ]
);
// Example with Cohere-specific options
$cohereEmbeddings = new Embeddings('cohere1');
$response = $cohereEmbeddings->create(
    input: ["Sample text for embedding"],
    options: [
        'input_type' => 'classification', // Cohere-specific option
        'truncate' => 'END', // How to handle texts that exceed the token limit
    ]
);
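If the connection is chosen at runtime, the provider-specific options can be kept in one place. The following is a small sketch with a hypothetical embeddingOptionsFor() helper; the option names and values are the ones from the examples above, and the connection names are assumed to match your configuration.

```php
<?php
// Hypothetical helper (not a Polyglot API): returns the extra options
// to pass to create() for a given connection name.
function embeddingOptionsFor(string $connection): array
{
    return match ($connection) {
        'openai' => ['encoding_format' => 'float', 'dimensions' => 512],
        'cohere1' => ['input_type' => 'classification', 'truncate' => 'END'],
        default => [], // unknown connections get no extra options
    };
}

print_r(embeddingOptionsFor('cohere1'));
```

The returned array could then be passed as the options argument of create(), so the calling code stays the same for every provider.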
Models and Dimensions
Different embedding models produce vectors of different dimensions:
<?php
use Cognesy\Polyglot\Embeddings\Embeddings;
use Cognesy\Polyglot\Embeddings\Data\EmbeddingsConfig;
// Create custom configuration with a specific model
$config = new EmbeddingsConfig(
    apiUrl: 'https://api.openai.com/v1',
    apiKey: getenv('OPENAI_API_KEY'),
    endpoint: '/embeddings',
    model: 'text-embedding-3-large', // Use the larger model
    dimensions: 3072, // Specify expected dimensions
);
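// Apply the configuration; assigning the result keeps this correct
// even if withConfig() returns a new instance instead of modifying the original
$embeddings = (new Embeddings())->withConfig($config);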
$response = $embeddings->create("Test text for large embedding model");
echo "Vector dimensions: " . count($response->first()->values()) . "\n";
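Because downstream storage such as a vector index typically expects a fixed vector size, it can be worth checking the returned vectors against the configured dimensions. This is a minimal sketch with a hypothetical assertDimensions() helper; the toy data stands in for the array of arrays that toValuesArray() returns.

```php
<?php
// Hypothetical helper (not a Polyglot API): verifies that every vector
// has the expected number of dimensions.
function assertDimensions(array $vectors, int $expected): void
{
    foreach ($vectors as $index => $values) {
        $actual = count($values);
        if ($actual !== $expected) {
            throw new UnexpectedValueException(
                "Vector {$index} has {$actual} dimensions, expected {$expected}."
            );
        }
    }
}

// Toy data standing in for $response->toValuesArray()
$vectors = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
];

assertDimensions($vectors, 3); // passes silently

try {
    assertDimensions($vectors, 4);
} catch (UnexpectedValueException $e) {
    echo $e->getMessage() . "\n"; // Vector 0 has 3 dimensions, expected 4.
}
```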