Overview
This example demonstrates how to extract structured data from a web page and get
it as PHP object.
Example
In this example we will be extracting list of Laravel companies from The Manifest
website. The result will be a list of Company
objects.
We use Webpage extractor to get the content of the page and specify ‘none’ scraper,
which means that we will be using built-in file_get_contents
function to get the
content of the page.
In production environment you might want to use one of the supported scrapers:
browsershot
scrapingbee
scrapfly
jinareader
Commercial scrapers require API key, which can be set in the configuration file
(/config/web.php
).
<?php
$loader = require 'vendor/autoload.php';
$loader->add('Cognesy\\Instructor\\', __DIR__ . '../../src/');
use Cognesy\Instructor\Enums\Mode;
use Cognesy\Instructor\Features\Schema\Attributes\Instructions;
use Cognesy\Instructor\Instructor;
use Cognesy\Instructor\Utils\Web\Webpage;
class Company {
public string $name = '';
public string $location = '';
public string $description = '';
public int $minProjectBudget = 0;
public string $companySize = '';
#[Instructions('Remove any tracking parameters from the URL')]
public string $websiteUrl = '';
public array $clients = [];
}
$instructor = (new Instructor)->withConnection('openai');
$companyGen = Webpage::withScraper('none')
->get('https://themanifest.com/pl/software-development/laravel/companies?page=1')
->cleanup()
->select('.directory-providers__list')
->selectMany(
selector: '.provider-card',
callback: fn($item) => $item->asMarkdown(),
limit: 3
);
$companies = [];
foreach($companyGen as $companyDiv) {
$company = $instructor->respond(
messages: $companyDiv,
responseModel: Company::class,
mode: Mode::Json
);
$companies[] = $company;
dump($company);
}
assert(count($companies) === 3);
?>