Week 6 — Media Processing & Vision
This week dives into the multimodal side of AI engineering. You'll learn how to process images, detect objects, transcribe audio, and generate visual content — skills that are essential for building AI applications that interact with the physical world.
Pages
| # | Page | Description |
|---|---|---|
| 1 | Vision Models | GPT-4o Vision, Gemini Flash/Pro, LLaVA |
| 2 | Image Processing | Preprocessing, annotation, bounding boxes, OpenCV |
| 3 | Grounding DINO | Open-vocabulary object detection with Python |
| 4 | Audio Processing | Whisper transcription, speaker diarization |
| 5 | Image Generation | SDXL, FLUX, ControlNet |
| 6 | Grounding DINO Tiny | Lightweight detection for low-cost inference |
| 7 | ColPali (Multimodal RAG) | Retrieve directly over PDF page images |
| 8 | Polars | Fast DataFrames for modern data prep |
| 9 | DuckDB + Parquet | SQL analytics layer over local files |
Key Concepts
- Computer Vision: Teaching machines to interpret visual information
- Object Detection: Finding and classifying objects within images
- Speech-to-Text: Converting spoken language to written text
- Generative Models: Creating new images from text descriptions
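To make the object-detection concept concrete: detectors return bounding boxes, and boxes are compared using Intersection over Union (IoU) — the standard overlap metric used for evaluating detections and for de-duplicating them. A minimal sketch, assuming the common `(x_min, y_min, x_max, y_max)` box format:

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes in (x_min, y_min, x_max, y_max) format."""
    # Coordinates of the intersection rectangle (empty if the boxes don't overlap).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

Identical boxes score 1.0, disjoint boxes score 0.0, and partial overlaps fall in between — e.g. two 10×10 boxes offset by 5 pixels in each direction share a 25-pixel intersection over a 175-pixel union.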
Prerequisites
- Basic understanding of neural networks
- Python and NumPy fundamentals
- Familiarity with APIs (Week 2)
Lab Connection
Lab 5 — Signature Detection applies the object detection skills from this week. You'll use Grounding DINO to locate signatures in documents.
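A minimal sketch of what the lab's detection step can look like, assuming the Hugging Face `transformers` zero-shot object detection pipeline (which supports Grounding DINO checkpoints such as `IDEA-Research/grounding-dino-tiny`); the image path, prompt, and threshold below are illustrative, not the lab's exact values:

```python
from typing import Dict, List


def keep_confident(detections: List[Dict], threshold: float = 0.4) -> List[Dict]:
    """Drop low-score boxes and return the rest sorted best-first."""
    kept = [d for d in detections if d["score"] >= threshold]
    return sorted(kept, key=lambda d: d["score"], reverse=True)


def detect_signatures(image_path: str, threshold: float = 0.4) -> List[Dict]:
    """Open-vocabulary signature detection on one page image (sketch).

    Assumes the `transformers` zero-shot-object-detection pipeline; each
    result is a dict with "label", "score", and "box" keys.
    """
    from transformers import pipeline  # lazy import: loading the model is heavy

    detector = pipeline(
        "zero-shot-object-detection",
        model="IDEA-Research/grounding-dino-tiny",
    )
    results = detector(image_path, candidate_labels=["a handwritten signature"])
    return keep_confident(results, threshold)
```

Because Grounding DINO is prompted with free text, the same code can locate stamps, logos, or any other document element just by changing `candidate_labels` — no retraining needed.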