
Lab 10: Instructor Extraction Loops

Difficulty: Intermediate · Estimated time: ~2–4 hours

Objective

Build a robust structured-extraction microservice:

  • input: messy text (emails, reviews, OCR snippets)
  • output: validated JSON that matches a Pydantic schema
  • behavior: automatic retries on validation errors
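The retry behavior in the objective comes down to a short loop. You call the model and validate the reply with Pydantic. On a `ValidationError`, you call the model again with the error text attached so it can correct itself. instructor handles this for you through its `max_retries` argument; the sketch below only shows the mechanics. `call_model` is a hypothetical stand-in for whatever LLM client you use, and `Review` is an illustrative schema, not part of the lab.

```python
from typing import Callable, Optional

from pydantic import BaseModel, ValidationError


class Review(BaseModel):
    # Hypothetical target schema for a product review.
    product: str
    rating: int


def extract_with_retries(
    text: str,
    call_model: Callable[[str, Optional[str]], str],
    max_retries: int = 3,
) -> tuple[Review, int]:
    """Return the validated model and the number of retries it took.

    `call_model(text, feedback)` is a hypothetical LLM wrapper: it gets the
    input text plus the previous validation error (None on the first try)
    and returns a raw JSON string.
    """
    feedback: Optional[str] = None
    last_error: Optional[ValidationError] = None
    for attempt in range(max_retries + 1):
        raw = call_model(text, feedback)
        try:
            return Review.model_validate_json(raw), attempt
        except ValidationError as exc:
            # Keep the error so the next attempt can see what went wrong.
            last_error = exc
            feedback = str(exc)
    raise RuntimeError(f"no valid output after {max_retries} retries") from last_error


def fake_model(text: str, feedback: Optional[str]) -> str:
    # Offline stand-in: the first reply has a non-integer rating, the retry fixes it.
    if feedback is None:
        return '{"product": "kettle", "rating": "five"}'
    return '{"product": "kettle", "rating": 5}'


review, retries = extract_with_retries("Great kettle, 5 stars", fake_model)
print(review, retries)
```

The function returns the attempt count alongside the result. That makes it easy to feed a retry-count metric later without changing the extraction code.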

Requirements

  • Use Pydantic models as the contract for the extracted data
  • Use instructor (or an equivalent schema-first mechanism)
  • Add a small test set (at least 20 samples)
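instructor does the schema-first step itself. You pass a Pydantic class as `response_model`, and it derives a schema for the provider and validates the reply against that class. If you choose an equivalent mechanism instead, you can sketch the core idea with Pydantic v2 alone. You generate the JSON Schema from the model, embed it in the prompt, and validate the reply against the same model. `Contact` and `build_prompt` are hypothetical names used only for illustration.

```python
import json
from typing import Optional

from pydantic import BaseModel


class Contact(BaseModel):
    # Hypothetical schema for details pulled from an email signature.
    name: str
    email: str
    phone: Optional[str] = None


def build_prompt(text: str) -> str:
    """Embed the model's JSON Schema so the prompt and the validator share one contract."""
    schema = json.dumps(Contact.model_json_schema(), indent=2)
    return (
        "Extract the fields described by this JSON Schema. "
        f"Reply with a single JSON object only.\n\nSchema:\n{schema}\n\nText:\n{text}"
    )


print(build_prompt("Thanks! -- Jane Doe | jane@example.com"))
```

Because both the prompt and the validator come from the same class, a field added to the schema cannot silently drift out of sync with the prompt.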

Suggested schema

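```python
from pydantic import BaseModel

class LineItem(BaseModel):  # hypothetical fields; the lab does not define LineItem
    description: str
    quantity: float
    unit_price: float

class Invoice(BaseModel):
    invoice_id: str
    vendor: str
    date: str
    currency: str
    total: float
    line_items: list[LineItem]
```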
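Before wiring in a model, it helps to see how the suggested schema reacts to a typical messy reply. The sketch below defines the schema so it runs on its own, giving `LineItem` hypothetical fields because the lab leaves it unspecified. It then validates a made-up payload with a hypothetical `validation_errors` helper. Pydantic v2 runs in lax mode by default, which coerces numeric strings such as `"118.50"` to floats. As a result, the only reported error is the missing `currency` field. Set `model_config = ConfigDict(strict=True)` if you would rather treat string numbers as errors.

```python
from pydantic import BaseModel, ValidationError


class LineItem(BaseModel):
    # Hypothetical fields; the lab does not define LineItem.
    description: str
    quantity: float
    unit_price: float


class Invoice(BaseModel):
    invoice_id: str
    vendor: str
    date: str
    currency: str
    total: float
    line_items: list[LineItem]


# A made-up model reply: numbers arrive as strings and "currency" is missing.
raw = """{
  "invoice_id": "INV-0042",
  "vendor": "Acme Supplies",
  "date": "2024-03-01",
  "total": "118.50",
  "line_items": [{"description": "Paper", "quantity": "3", "unit_price": 39.5}]
}"""


def validation_errors(payload: str) -> list[tuple[tuple, str]]:
    """Return (location, error type) pairs, or [] if the payload is valid."""
    try:
        Invoice.model_validate_json(payload)
    except ValidationError as exc:
        return [(err["loc"], err["type"]) for err in exc.errors()]
    return []


print(validation_errors(raw))  # [(('currency',), 'missing')]
```

The `loc` tuples also locate nested problems, for example `('line_items', 0, 'quantity')`. That precision is what you want to feed back to the model on a retry.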

Deliverables

  • schemas.py (Pydantic models)
  • extract.py (extraction function)
  • tests/test_extraction.py with at least 20 fixtures
  • README.md describing how to run the service and its known limitations
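For the test deliverable, compare each fixture field by field. A failure then reports which field broke, not just which sample failed. The sketch below is one way to structure it, and every name in it is hypothetical. `Review` stands in for `Invoice` to keep the fixtures short. `Fixture` is a simple record. `naive_extract` is a regex baseline that exists only so the sketch runs offline.

```python
import re
from dataclasses import dataclass
from typing import Callable

from pydantic import BaseModel


class Review(BaseModel):
    # Hypothetical schema for a product review.
    product: str
    rating: int


@dataclass
class Fixture:
    """One labeled sample: messy input text plus the fields expected back."""

    name: str
    text: str
    expected: dict


# Illustrative fixtures; the lab asks for at least 20.
FIXTURES = [
    Fixture("plain", "Kettle: 5/5, would buy again", {"product": "Kettle", "rating": 5}),
    Fixture("ocr_noise", "K3ttle ... rating 4 / 5", {"product": "Kettle", "rating": 4}),
]


def check_fixtures(extract: Callable[[str], Review], fixtures: list) -> list:
    """Return one message per mismatched field; an empty list means all passed."""
    failures = []
    for fx in fixtures:
        got = extract(fx.text).model_dump()
        for field_name, want in fx.expected.items():
            if got.get(field_name) != want:
                failures.append(
                    f"{fx.name}: {field_name}={got.get(field_name)!r}, expected {want!r}"
                )
    return failures


def naive_extract(text: str) -> Review:
    """Hypothetical regex baseline: first word as product, 'N/5' as rating."""
    match = re.search(r"(\d)\s*/\s*5", text)
    rating = int(match.group(1)) if match else 0
    return Review(product=text.split()[0].strip(":"), rating=rating)


print(check_fixtures(naive_extract, FIXTURES))
# ["ocr_noise: product='K3ttle', expected 'Kettle'"]

# In tests/test_extraction.py the real extractor would be asserted clean, e.g.:
#     def test_fixtures():
#         failures = check_fixtures(extract, FIXTURES)
#         assert not failures, "\n".join(failures)
```

Running a baseline like this alongside the model-backed extractor gives the README a concrete point of comparison when describing limitations.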

Stretch goals

  • Add a “repair only” mode: send the invalid output and its validation errors back to the model and ask it to fix that output
  • Add metrics: validation failure rate and a histogram of retry counts
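Both stretch goals fit in a small sketch. One reading of the "repair only" mode is to resend just the invalid JSON and its validation errors, without the original source text, and ask for a corrected object. The sketch below builds such a prompt from `ValidationError.errors()`. It also keeps two in-process metrics: the share of model calls whose output failed validation, and a histogram mapping retries used to number of requests. `Review`, `build_repair_prompt`, and `ExtractionMetrics` are hypothetical names. A real service would export these counts to whatever metrics backend it already uses.

```python
from collections import Counter
from dataclasses import dataclass, field

from pydantic import BaseModel, ValidationError


class Review(BaseModel):
    # Hypothetical schema, used here only to produce a ValidationError.
    product: str
    rating: int


def build_repair_prompt(bad_output: str, exc: ValidationError) -> str:
    """Repair-only prompt: resend the invalid JSON plus its errors, not the source text."""
    problems = "\n".join(
        f"- {'.'.join(str(p) for p in err['loc'])}: {err['msg']}" for err in exc.errors()
    )
    return (
        "The JSON below failed schema validation. Return corrected JSON only.\n"
        f"Errors:\n{problems}\n\nJSON:\n{bad_output}"
    )


@dataclass
class ExtractionMetrics:
    """In-process counters for validation failure rate and retry counts."""

    requests: int = 0
    attempts: int = 0
    failed_attempts: int = 0
    retry_histogram: Counter = field(default_factory=Counter)  # retries -> requests

    def record(self, attempts: int, succeeded: bool) -> None:
        """Log one request that made `attempts` model calls (at least 1)."""
        self.requests += 1
        self.attempts += attempts
        # Every attempt except a final successful one failed validation.
        self.failed_attempts += attempts - 1 if succeeded else attempts
        self.retry_histogram[attempts - 1] += 1

    def failure_rate(self) -> float:
        """Share of all model calls whose output failed validation."""
        return self.failed_attempts / self.attempts if self.attempts else 0.0


bad = '{"product": "kettle"}'
try:
    Review.model_validate_json(bad)
except ValidationError as exc:
    print(build_repair_prompt(bad, exc))

metrics = ExtractionMetrics()
for attempts, ok in [(1, True), (1, True), (2, True), (4, False)]:
    metrics.record(attempts, ok)
print(metrics.failure_rate())  # 0.625
print(sorted(metrics.retry_histogram.items()))  # [(0, 2), (1, 1), (3, 1)]
```

The metric counts failures per model call rather than per request. That choice keeps a request that needed three retries from looking the same as one that needed a single retry.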