Overview
Instructor uses a pluggable extraction system to parse structured content from LLM responses. Different LLMs and output modes may return content in various formats: wrapped in markdown, embedded in explanatory text, or with trailing commas. You can create custom extractors to handle specific response formats from your LLM or API. Extractors are tried in order until one succeeds.

Built-in Extractors
Instructor provides these content extractors:
- DirectJsonExtractor - Parses content directly as JSON (fastest)
- BracketMatchingExtractor - Finds JSON by matching the first { to the last }
- MarkdownBlockExtractor - Extracts JSON from markdown code blocks
- ResilientJsonExtractor - Handles trailing commas and missing braces
- SmartBraceExtractor - Smart brace matching that handles string escaping
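
To make the differences concrete, here are hypothetical raw responses (not taken from the library's test suite) of the kind each built-in extractor targets:

```text
DirectJsonExtractor       {"name": "Jason", "age": 28}
BracketMatchingExtractor  Sure, here is the result: {"name": "Jason", "age": 28} Hope that helps!
MarkdownBlockExtractor    JSON placed inside a fenced json code block in the response
ResilientJsonExtractor    {"name": "Jason", "age": 28,        <- trailing comma, missing brace
SmartBraceExtractor       {"note": "watch the { inside this string", "age": 28}
```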
Example: Custom XML Wrapper Extractor
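
Suppose your model wraps its JSON answer in an XML-style tag. Below is a minimal sketch of an extractor for that format; the <output> tag is hypothetical, and the Cognesy\Utils\Result\Result namespace and the extract(string $content): Result signature are assumptions, so check the CanExtractContent interface shipped with your version before copying this.

```php
<?php

use Cognesy\Utils\Result\Result; // namespace is an assumption - adjust to your install
// Also import CanExtractContent from your version's extraction namespace.

class XmlWrapperExtractor implements CanExtractContent
{
    public function extract(string $content): Result
    {
        // Look for a JSON payload wrapped in a hypothetical <output>...</output> tag
        if (!preg_match('/<output>(.*?)<\/output>/s', $content, $matches)) {
            return Result::failure('No <output> wrapper found');
        }

        $json = trim($matches[1]);

        // Only succeed if the wrapped content is valid JSON
        if (json_decode($json) === null && $json !== 'null') {
            return Result::failure('Wrapped content is not valid JSON');
        }

        // Assumes a successful Result carries the extracted JSON string
        return Result::success($json);
    }
}
```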
Expected Output
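
Given the sketch above, a wrapped response would be reduced to its JSON payload; how you read the value back depends on the Result API in your version:

```php
<?php

$raw = 'Some commentary from the model. <output>{"name":"Jason","age":28}</output> Done.';

$result = (new XmlWrapperExtractor())->extract($raw);

// On success, the Result carries the extracted JSON string:
//   {"name":"Jason","age":28}
// which Instructor then deserializes into your response model.
```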
Streaming with Custom Extractors
Custom extractors are automatically used for both sync and streaming modes. The ResponseExtractor handles buffer creation internally, using a subset of extractors optimized for streaming (fast extractors by default).
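
The sketch below only illustrates why fast extractors matter for streaming: the growing buffer is re-parsed on every chunk, so each attempt must be cheap. It is not the ResponseExtractor's actual implementation, and the isSuccess() accessor and $onPartial callback are hypothetical.

```php
<?php

// Illustrative only - not the library's internals.
$buffer = '';
foreach ($chunks as $chunk) {                 // $chunks: partial LLM output from the stream
    $buffer .= $chunk;
    foreach ($fastExtractors as $extractor) { // e.g. direct JSON / brace matching only
        $result = $extractor->extract($buffer);
        if ($result->isSuccess()) {           // assumed Result accessor
            $onPartial($result);              // hypothetical callback for partial updates
            break;
        }
    }
}
```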
Creating Your Own Extractor
Implement the CanExtractContent interface:
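
A bare skeleton could look like the following; the single extract(string $content): Result method is an assumption, so mirror whatever CanExtractContent actually declares in your version:

```php
<?php

use Cognesy\Utils\Result\Result; // namespace assumed - adjust to your install

class MyFormatExtractor implements CanExtractContent
{
    public function extract(string $content): Result
    {
        $json = $this->locateJson($content);   // your format-specific parsing

        return $json !== null
            ? Result::success($json)                          // found structured content
            : Result::failure('MyFormatExtractor: no match'); // let the next extractor try
    }

    private function locateJson(string $content): ?string
    {
        // Locate and return the JSON string here, or null if this format doesn't match.
        return null;
    }
}
```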
Extractor Chain Behavior
Extractors are tried in order until one succeeds:
- The first extractor is called with the raw content
- If it returns Result::success(), extraction is complete
- If it returns Result::failure(), the next extractor is tried
- If all extractors fail, an error is raised
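
The following sketch illustrates those semantics. It is not the library's internal loop, and the isSuccess()/error() accessors on Result are assumptions about its API:

```php
<?php

use Cognesy\Utils\Result\Result; // namespace assumed - adjust to your install

function runExtractorChain(array $extractors, string $content): Result
{
    $errors = [];
    foreach ($extractors as $extractor) {
        $result = $extractor->extract($content);
        if ($result->isSuccess()) {                // assumed accessor
            return $result;                        // first success ends the chain
        }
        $errors[] = (string) $result->error();     // assumed accessor
    }
    return Result::failure('All extractors failed: ' . implode('; ', $errors));
}
```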