Extraction Pipeline
When processing an LLM response, Instructor tries multiple extraction strategies in order until one succeeds.
1. Direct JSON Parsing
The response content is parsed directly as JSON. This handles the common case where the LLM returns a well-formed JSON object.
2. Markdown Code Block Extraction
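Extracts JSON from fenced code blocks. Some providers (particularly Claude) tend to wrap JSON responses in markdown, so this strategy takes the content between the opening fence (three backticks followed by json) and the closing fence.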
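The idea behind fenced-block extraction can be sketched in a few lines of PHP. This is a minimal illustration, not Instructor's implementation: the function name `extractFromMarkdownFence` and the exact pattern are assumptions made for this example.

```php
<?php

/**
 * Hypothetical helper: returns the body of the first fenced code block
 * in $content, or null when no fence is found.
 */
function extractFromMarkdownFence(string $content): ?string
{
    // Build the three-backtick fence programmatically to keep this listing readable.
    $fence = str_repeat('`', 3);

    // Opening fence with an optional "json" tag, a lazily matched body, closing fence.
    $pattern = '/' . $fence . '(?:json)?[ \t]*\R(.*?)\R?' . $fence . '/si';

    if (preg_match($pattern, $content, $matches) === 1) {
        return trim($matches[1]);
    }

    return null;
}

// Usage: a reply that wraps its JSON in a markdown fence.
$fence = str_repeat('`', 3);
$response = "Here is the data:\n"
    . $fence . "json\n{\"name\": \"Ada\", \"age\": 36}\n" . $fence
    . "\nLet me know if you need more.";

$json = extractFromMarkdownFence($response);
$data = json_decode($json ?? '', true);
```

Returning null rather than throwing keeps the helper easy to combine with other strategies; a real extractor would signal failure in whatever way the chain expects.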
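3. Bracket Matching
Finds the first { and the last } in the response and extracts everything between them.
4. Smart Brace Matching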
Handles complex cases with nested braces and escaped quotes inside string values.
Resilient Parsing
After extraction, if standard json_decode fails, Instructor applies automatic repairs before parsing:
- Balance quotes: adds missing closing quotes
- Remove trailing commas: fixes {"a": 1,} patterns
- Balance braces: adds missing } or ] characters
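The three repairs listed above can be sketched as a single pass over the input. This is an illustrative approximation under stated assumptions, not the library's actual repair code: `repairJson` is a hypothetical name, and the sketch only covers the simple cases named in the list.

```php
<?php

/**
 * Hypothetical repair pass: closes an unterminated string, drops trailing
 * commas before a closer, and appends any missing } or ] characters.
 */
function repairJson(string $json): string
{
    $stack = [];        // closers still owed, innermost last
    $inString = false;  // currently inside a "..." value or key
    $escaped = false;   // previous character was a backslash inside a string
    $out = '';
    $length = strlen($json);

    for ($i = 0; $i < $length; $i++) {
        $char = $json[$i];

        if ($inString) {
            $out .= $char;
            if ($escaped) {
                $escaped = false;
            } elseif ($char === '\\') {
                $escaped = true;
            } elseif ($char === '"') {
                $inString = false;
            }
            continue;
        }

        if ($char === '"') {
            $inString = true;
        } elseif ($char === '{') {
            $stack[] = '}';
        } elseif ($char === '[') {
            $stack[] = ']';
        } elseif ($char === '}' || $char === ']') {
            // Remove a trailing comma that sits directly before this closer.
            $out = preg_replace('/,\s*$/', '', $out);
            array_pop($stack);
        }

        $out .= $char;
    }

    if ($inString) {
        $out .= '"'; // balance quotes
    }

    // A truncated document may end on a dangling comma.
    $out = preg_replace('/,\s*$/', '', $out);

    // Balance braces: close whatever is still open, innermost first.
    while ($stack !== []) {
        $out .= array_pop($stack);
    }

    return $out;
}

$fixedComma = json_decode(repairJson('{"a": 1,}'), true);
$fixedBraces = repairJson('{"name": "Ada", "tags": ["x", "y"');
$fixedQuote = repairJson('{"name": "Ad');
```

Tracking string state matters here: without it, a comma or brace inside a string value would be "repaired" by mistake.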
Default Extractors
The built-in extractor chain includes these extractors, tried in order:
| Extractor | Purpose |
|---|---|
| DirectJsonExtractor | Parse content directly as JSON |
| ResilientJsonExtractor | Handle malformed JSON (trailing commas, unbalanced braces) |
| MarkdownBlockExtractor | Extract from fenced json code blocks |
| BracketMatchingExtractor | Find first { to last } |
| SmartBraceExtractor | Handle nested braces and escaped quotes in strings |
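To illustrate why the last two rows of the table differ, the sketch below compares a naive first-brace-to-last-brace scan with a string-aware scan that tracks nesting depth. Both functions are hypothetical helpers written for this example; they are an assumption about the general technique, not the code of BracketMatchingExtractor or SmartBraceExtractor.

```php
<?php

/** Hypothetical naive scan: everything from the first "{" to the last "}". */
function extractFirstToLastBrace(string $content): ?string
{
    $start = strpos($content, '{');
    $end = strrpos($content, '}');
    if ($start === false || $end === false || $end < $start) {
        return null;
    }
    return substr($content, $start, $end - $start + 1);
}

/**
 * Hypothetical string-aware scan: from the first "{" to its matching "}",
 * ignoring braces and escaped quotes that appear inside string values.
 */
function extractBalancedObject(string $content): ?string
{
    $start = strpos($content, '{');
    if ($start === false) {
        return null;
    }

    $depth = 0;
    $inString = false;
    $escaped = false;
    $length = strlen($content);

    for ($i = $start; $i < $length; $i++) {
        $char = $content[$i];

        if ($inString) {
            if ($escaped) {
                $escaped = false;
            } elseif ($char === '\\') {
                $escaped = true;
            } elseif ($char === '"') {
                $inString = false;
            }
            continue;
        }

        if ($char === '"') {
            $inString = true;
        } elseif ($char === '{') {
            $depth++;
        } elseif ($char === '}') {
            $depth--;
            if ($depth === 0) {
                return substr($content, $start, $i - $start + 1);
            }
        }
    }

    return null; // the object never closed
}

// The string value contains an escaped quote and a brace; the trailing text contains braces too.
$response = 'Result: {"note": "use \"}\" to close", "meta": {"ok": true}} (see {docs})';

$naive = json_decode(extractFirstToLastBrace($response) ?? '', true);
$smart = json_decode(extractBalancedObject($response) ?? '', true);
```

On this input the naive scan swallows the trailing `(see {docs}` text and fails to decode, while the string-aware scan stops at the brace that actually closes the object.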
Custom Extractors
You can replace the default extractor with your own by calling withExtractor() on the StructuredOutputRuntime. Use ResponseExtractor::fromExtractors() to compose multiple extractors into a chain.
Using Custom Extractors
Custom extractors are configured on the runtime and apply to both synchronous and streaming responses.
Extractors are tried in order: if one fails with an ExtractionException, the next extractor in the chain is attempted. If all extractors fail, Instructor returns an empty result, triggers a validation error, and initiates the retry mechanism (if configured).
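The fallback behavior described above can be sketched with a small stand-alone chain. Everything in this example is a hypothetical stand-in: the `Extractor` interface, the two classes, `runChain()`, and the locally declared `ExtractionException` are assumptions for illustration, and Instructor's real contracts and signatures may differ.

```php
<?php

// Local stand-in for the library's exception, declared here so the sketch runs on its own.
class ExtractionException extends RuntimeException
{
}

// Hypothetical contract: return decoded data or throw ExtractionException.
interface Extractor
{
    public function extract(string $content): array;
}

final class DirectJson implements Extractor
{
    public function extract(string $content): array
    {
        $data = json_decode($content, true);
        if (!is_array($data)) {
            throw new ExtractionException('Content is not a JSON object or array');
        }
        return $data;
    }
}

final class FirstToLastBrace implements Extractor
{
    public function extract(string $content): array
    {
        $start = strpos($content, '{');
        $end = strrpos($content, '}');
        if ($start === false || $end === false || $end < $start) {
            throw new ExtractionException('No braces found');
        }
        $data = json_decode(substr($content, $start, $end - $start + 1), true);
        if (!is_array($data)) {
            throw new ExtractionException('Braced content is not valid JSON');
        }
        return $data;
    }
}

/**
 * Tries each extractor in order; the first success wins.
 * Returns an empty array when every extractor fails.
 *
 * @param Extractor[] $extractors
 */
function runChain(array $extractors, string $content): array
{
    foreach ($extractors as $extractor) {
        try {
            return $extractor->extract($content);
        } catch (ExtractionException $e) {
            continue; // fall through to the next strategy
        }
    }
    return [];
}

$chain = [new DirectJson(), new FirstToLastBrace()];
$clean = runChain($chain, '{"a": 1}');
$wrapped = runChain($chain, 'Sure! {"a": 1} Done.');
$nothing = runChain($chain, 'no json at all');
```

Ordering the cheapest and strictest strategy first means well-formed responses never pay for the heuristic ones.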
Error Handling
When extraction fails across all strategies, Instructor follows this sequence:
- Returns an empty array from the extraction pipeline
- Triggers a validation error on the deserialized object
- If retries are configured, sends the error feedback to the LLM for self-correction
- Repeats until the retry limit is reached or extraction succeeds
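The sequence above amounts to a bounded feedback loop, which can be sketched as follows. This is a simplified model under stated assumptions: `extractWithRetries()` and its three callables are hypothetical, the way feedback is appended to the prompt is invented for illustration, and returning null once retries are exhausted is an assumption rather than documented behavior.

```php
<?php

/**
 * Hypothetical retry loop.
 *
 * @param callable $callModel fn(string $prompt): string, stands in for the LLM call
 * @param callable $extract   fn(string $content): array, returns [] when extraction fails
 * @param callable $validate  fn(array $data): array, returns a list of error messages
 */
function extractWithRetries(
    callable $callModel,
    callable $extract,
    callable $validate,
    string $prompt,
    int $maxRetries
): ?array {
    $currentPrompt = $prompt;

    // One initial attempt plus up to $maxRetries further attempts.
    for ($attempt = 0; $attempt <= $maxRetries; $attempt++) {
        $data = $extract($callModel($currentPrompt));
        $errors = $validate($data);

        if ($errors === []) {
            return $data;
        }

        // Feed the validation errors back so the model can correct itself.
        $currentPrompt = $prompt . "\n\nPrevious attempt failed: " . implode('; ', $errors);
    }

    return null; // retry limit reached (assumed outcome for this sketch)
}

// Usage with a fake model that fails once, then returns valid JSON.
$replies = ['I cannot comply', '{"name": "Ada"}'];
$calls = 0;

$callModel = function (string $prompt) use (&$replies, &$calls): string {
    $calls++;
    return array_shift($replies) ?? '';
};

$extract = function (string $content): array {
    $data = json_decode($content, true);
    return is_array($data) ? $data : [];
};

$validate = function (array $data): array {
    return isset($data['name']) ? [] : ['field "name" is required'];
};

$result = extractWithRetries($callModel, $extract, $validate, 'Extract the person.', 2);
```

Because a failed extraction yields an empty array, it flows through the same validation path as any other incomplete result, so no separate error branch is needed.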