Skip to main content

Lab 10: Instructor Extraction Loops

Difficulty: Intermediate · Estimated time: ~2–4 hours

Objective

Build a robust structured-extraction microservice:

input: messy text (emails, reviews, OCR snippets)
output: validated JSON that matches a Pydantic schema
behavior: automatic retries on validation errors

Requirements

Use pydantic models as the contract
Use instructor (or an equivalent schema-first mechanism)
Add a small test set (at least 20 samples)

Suggested schema

python

class Invoice(BaseModel):
    invoice_id: str
    vendor: str
    date: str
    currency: str
    total: float
    line_items: list[LineItem]

Deliverables

schemas.py (Pydantic models)
extract.py (extraction function)
tests/test_extraction.py with 20 fixtures
README.md describing how to run + limitations

Stretch goals

Add a “repair only” mode: feed validation errors back to the model
Add metrics: validation failure rate + retry count histogram

Objective
Requirements
Suggested schema
Deliverables
Stretch goals
Related reading