# Week 3 — Large Language Models
Large Language Models (LLMs) have transformed data science. This week, you will learn how to interact with LLMs effectively — from crafting better prompts to extracting structured data and calling external tools. These skills are the foundation for building AI-powered applications.
## Overview
This week covers the practical aspects of working with LLMs. You will learn prompt engineering techniques, how to get structured JSON output from LLMs, how to extract data from documents, how to give LLMs access to external tools, and how to use embeddings for semantic search.
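As a first taste of the prompt engineering techniques covered on page 1, here is a minimal sketch of assembling a few-shot prompt as plain text. The helper function and the sentiment-classification examples are hypothetical, not part of the course material:

```python
def build_few_shot_prompt(task, examples, query):
    """Assemble a few-shot prompt: task description, worked examples, then the new input."""
    lines = [task, ""]
    for example_input, example_output in examples:
        lines.append(f"Input: {example_input}")
        lines.append(f"Output: {example_output}")
        lines.append("")
    # Leave the final "Output:" open for the model to complete
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

# Hypothetical sentiment-classification task
prompt = build_few_shot_prompt(
    "Classify the sentiment of each input as positive or negative.",
    [("I loved this movie", "positive"), ("Terrible service", "negative")],
    "The food was amazing",
)
print(prompt)
```

Dropping the `examples` list from the same template gives a zero-shot prompt, which is one way to compare the two approaches side by side.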
## Pages
| # | Page | Topic |
|---|---|---|
| 1 | Prompt Engineering | Zero-shot, few-shot, chain-of-thought, role prompting |
| 2 | Structured Output | JSON mode, Pydantic + LLM, instructor library |
| 3 | LLM Extraction | Named entities, tables from PDFs |
| 4 | Function Calling | Tool use, parallel tools, tool choice |
| 5 | LLM CLI | llm CLI tool, plugins, templates |
| 6 | Embeddings | Cosine similarity, embedding models |
| 7 | Prompt Caching | Cost + latency reduction via cache-friendly prompts |
| 8 | Structured Output with instructor | Schema-first extraction and validation loops |
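To preview the structured-output theme running through pages 2, 3, and 8: even with JSON mode, models sometimes wrap their reply in a markdown code fence, so robust pipelines parse defensively. This is a hedged sketch of one such fallback parser (the function name and regex are illustrative, not from any library):

```python
import json
import re

def parse_llm_json(text):
    """Parse JSON from an LLM reply, tolerating a markdown code fence around it."""
    match = re.search(r"`{3}(?:json)?\s*(.*?)\s*`{3}", text, re.DOTALL)
    payload = match.group(1) if match else text.strip()
    return json.loads(payload)

# Simulated model reply wrapped in a fenced code block
fence = "`" * 3
reply = fence + 'json\n{"name": "Ada", "role": "engineer"}\n' + fence
print(parse_llm_json(reply))
```

Libraries such as instructor (page 8) go further by validating the parsed object against a Pydantic schema and re-prompting the model when validation fails.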
## Learning Outcomes
By the end of this week, you will be able to:
- Apply prompt engineering techniques (zero-shot, few-shot, chain-of-thought) to get better LLM outputs
- Extract structured JSON data from LLM responses using Pydantic schemas
- Use LLMs to extract named entities and structured data from unstructured text and PDFs
- Implement function calling to give LLMs access to external tools and APIs
- Use the `llm` CLI tool for quick LLM interactions and template-based workflows
- Generate and compare text embeddings for semantic similarity search
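The last outcome rests on one core formula: cosine similarity between embedding vectors. A minimal pure-Python sketch, using toy 3-dimensional vectors in place of the hundreds of dimensions a real embedding model returns:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": the semantically closer query should score higher
doc = [0.1, 0.9, 0.2]
query_close = [0.2, 0.8, 0.1]
query_far = [0.9, 0.1, 0.0]
print(cosine_similarity(doc, query_close) > cosine_similarity(doc, query_far))  # True
```

Semantic search is then just computing this score between a query embedding and every document embedding, and ranking by the result.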
## Time estimate
Expect to spend 8-10 hours on this week's material: ~3 hours reading, ~3 hours on walkthroughs, and ~3 hours on the lab.