
This example demonstrates how to extract structured data from a web page and get it as a PHP object.


In this example we extract a list of Laravel companies from The Manifest website. The result is a list of Company objects.

We use the Webpage extractor to get the content of the page and specify the 'none' scraper, which means the page is fetched with PHP's built-in file_get_contents function.
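For orientation, fetching a page without an external scraper comes down to a plain file_get_contents call. The sketch below is not the library's code: the helper name fetchPage, the timeout, and the user-agent string are all assumptions made for illustration.

```php
<?php
// Hypothetical helper, not part of the library: fetch a page body
// with PHP's built-in file_get_contents.
function fetchPage(string $url, int $timeoutSeconds = 10): string {
    // These options apply to http:// and https:// URLs
    // (allow_url_fopen must be enabled for remote fetches).
    $context = stream_context_create([
        'http' => [
            'timeout' => $timeoutSeconds,
            'user_agent' => 'example-fetcher/0.1', // assumed value
        ],
    ]);
    // Suppress the warning and turn a failed read into an exception instead.
    $body = @file_get_contents($url, false, $context);
    if ($body === false) {
        throw new RuntimeException("Could not fetch: {$url}");
    }
    return $body;
}
```

A plain fetch like this does not execute JavaScript and is easy for sites to block, which is one reason a dedicated scraper may be preferable outside of quick experiments.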

In a production environment you might want to use one of the supported scrapers:

  • browsershot
  • scrapingbee
  • scrapfly
  • jinareader

Commercial scrapers require an API key, which can be set in the configuration file (/config/web.php).

<?php
require 'examples/boot.php';

use Cognesy\Auxiliary\Web\Webpage;
use Cognesy\Instructor\Features\Schema\Attributes\Instructions;
use Cognesy\Instructor\Instructor;
use Cognesy\Polyglot\LLM\Enums\Mode;

class Company {
    public string $name = '';
    public string $location = '';
    public string $description = '';
    public int $minProjectBudget = 0;
    public string $companySize = '';
    #[Instructions('Remove any tracking parameters from the URL')]
    public string $websiteUrl = '';
    /** @var string[] */
    public array $clients = [];
}

$instructor = (new Instructor)->withConnection('openai');

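// 'scrapfly' requires an API key; pass 'none' to fetch with file_get_contents as described above.
// $url is the address of the listing page on The Manifest (not given in this example).
// The get() and selectMany() calls are assumed from the Webpage API.
$companyGen = Webpage::withScraper('scrapfly')
    ->get($url)
    ->selectMany(
        selector: '.provider-card',
        callback: fn($item) => $item->asMarkdown(),
        limit: 3
    );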

$companies = [];
foreach($companyGen as $companyDiv) {
    // Each item is the Markdown of one provider card; extract a Company object from it
    $company = $instructor->respond(
        messages: $companyDiv,
        responseModel: Company::class,
        mode: Mode::Json
    );
    $companies[] = $company;
}

assert(count($companies) === 3);
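
To make the selector and limit arguments more concrete, the following sketch selects elements by CSS class and stops after a fixed number of matches, using only PHP's DOM extension. It is an illustration under stated assumptions, not how the Webpage class is implemented: the helper name selectByClass and the HTML snippet are made up, and the helper returns plain text rather than Markdown.

```php
<?php
// Illustration only: pick elements by CSS class with PHP's DOM extension.
// The helper name and the HTML below are hypothetical.
function selectByClass(string $html, string $class, int $limit): array {
    $doc = new DOMDocument();
    // Suppress warnings that libxml raises for imperfect real-world HTML.
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);
    // XPath equivalent of the CSS selector ".<class>".
    $query = sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    $results = [];
    foreach ($xpath->query($query) as $node) {
        if (count($results) >= $limit) {
            break; // stop once the requested number of items is collected
        }
        $results[] = trim($node->textContent);
    }
    return $results;
}

$html = '<div class="list">'
    . '<div class="provider-card">Alpha</div>'
    . '<div class="provider-card featured">Beta</div>'
    . '<div class="other">Skip</div>'
    . '<div class="provider-card">Gamma</div>'
    . '<div class="provider-card">Delta</div>'
    . '</div>';

// Collect the text of the first three elements with class "provider-card".
$cards = selectByClass($html, 'provider-card', 3);
```

Here $cards holds the text of the first three matching cards in document order; the element with class "other" is skipped and the fourth card is never visited because of the limit.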