Lab 10: Instructor Extraction Loops
Difficulty: Intermediate · Estimated time: ~2–4 hours
Objective
Build a robust structured-extraction microservice:
- input: messy text (emails, reviews, OCR snippets)
- output: validated JSON that matches a Pydantic schema
- behavior: automatic retries on validation errors
Requirements
- Use
pydanticmodels as the contract - Use
instructor(or an equivalent schema-first mechanism) - Add a small test set (at least 20 samples)
Suggested schema
python
class Invoice(BaseModel):
invoice_id: str
vendor: str
date: str
currency: str
total: float
line_items: list[LineItem]
Deliverables
schemas.py(Pydantic models)extract.py(extraction function)tests/test_extraction.pywith 20 fixturesREADME.mddescribing how to run + limitations
Stretch goals
- Add a “repair only” mode: feed validation errors back to the model
- Add metrics: validation failure rate + retry count histogram