Specialized Agents
General-purpose agents are jacks-of-all-trades; specialized agents are built to excel at a single task or modality.
Browser Agents (Playwright)
Browser agents navigate the web, click buttons, and read the DOM. To understand a page, they work from either screenshots annotated with bounding boxes or a simplified HTML representation.
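One way to picture a simplified HTML representation is a numbered list of the elements an agent could act on. The sketch below is a minimal illustration that uses only Python's standard-library `html.parser`; the names `PageSimplifier`, `simplify`, and `INTERACTIVE_TAGS`, and the choice of which tags count as interactive, are assumptions made for this example rather than part of Playwright or any agent framework. A real browser agent would more likely obtain the page through the browser's own DOM or accessibility APIs.

```python
from html.parser import HTMLParser

# Tags an agent can act on; this set is an illustrative assumption.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
VOID_TAGS = {"input"}  # elements that never have a closing tag


class PageSimplifier(HTMLParser):
    """Collects interactive elements and gives each a numeric id."""

    def __init__(self):
        super().__init__()
        self.elements = []  # finished entries: {"id", "tag", "label"}
        self._open = None   # element whose text is still being collected

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        attrs = dict(attrs)
        # Prefer an explicit label; otherwise fall back to the element's text.
        label = attrs.get("aria-label") or attrs.get("placeholder") or ""
        entry = {"id": len(self.elements), "tag": tag, "label": label}
        self.elements.append(entry)
        self._open = None if tag in VOID_TAGS else entry

    def handle_data(self, data):
        # Use the first non-empty text inside an unlabeled element.
        if self._open is not None and not self._open["label"]:
            self._open["label"] = data.strip()

    def handle_endtag(self, tag):
        if self._open is not None and self._open["tag"] == tag:
            self._open = None


def simplify(html):
    """Return one short line per interactive element in the page."""
    parser = PageSimplifier()
    parser.feed(html)
    parser.close()
    return [f'[{e["id"]}] <{e["tag"]}> {e["label"]}'.rstrip()
            for e in parser.elements]
```

Given a compact listing like this, the model can reply with an action such as "click 2", and the agent maps the id back to the real element.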
Code Execution Agents
These agents write Python or Bash code and execute it in a sandbox. They rely on a REPL-style environment, reading the output of each run and revising the code until it works.
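The write, run, and revise loop can be sketched in a few lines. This is a minimal illustration under stated assumptions: `run_python`, `iterate`, and the `propose` callback (standing in for the model call) are hypothetical names, and a child process with a timeout is used only as a stand-in for a sandbox. It does not isolate the file system or network, so a real deployment would need a container or similar boundary.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_python(code, timeout=5.0):
    """Run code in a separate interpreter process; return (ok, output)."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "snippet.py"
        script.write_text(code)
        try:
            proc = subprocess.run(
                [sys.executable, "-I", str(script)],  # -I: isolated mode
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False, f"timed out after {timeout} seconds"
    return proc.returncode == 0, proc.stdout + proc.stderr


def iterate(propose, max_attempts=3):
    """Ask propose(feedback) for code until a run succeeds or attempts run out.

    propose is a hypothetical stand-in for the model: it receives the
    previous run's output (empty on the first call) and returns new code.
    """
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        ok, output = run_python(propose(feedback))
        if ok:
            return attempt, output
        feedback = output  # the error text drives the next attempt
    return None, feedback
```

Feeding the captured traceback back into the next attempt is what makes the loop behave like a REPL session: each failure gives the agent the concrete error it needs to correct.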
Image Analysis Agents
These agents use multimodal models such as GPT-4o or Claude 3.5 Sonnet to inspect charts, UI screenshots, or photographed and scanned documents, and to extract structured data from them.
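Extracting structured data usually ends with a validation step, because a model's reply may wrap the JSON in explanatory text or omit a field. The sketch below shows only that last step and makes no model call; `parse_reply` and the `CHART_FIELDS` schema are hypothetical names and fields chosen for illustration, not part of any vendor's API.

```python
import json

# Hypothetical schema for data read off a chart: field name -> expected type.
CHART_FIELDS = {"title": str, "unit": str, "values": list}


def parse_reply(reply, fields=CHART_FIELDS):
    """Pull the first JSON object out of a model reply and type-check it."""
    start = reply.find("{")
    if start == -1:
        raise ValueError("no JSON object in reply")
    # raw_decode parses one JSON value and ignores any text that follows it.
    data, _ = json.JSONDecoder().raw_decode(reply[start:])
    for name, expected in fields.items():
        if not isinstance(data.get(name), expected):
            raise ValueError(
                f"field {name!r} missing or not {expected.__name__}"
            )
    # Keep only the fields the schema asks for.
    return {name: data[name] for name in fields}
```

When validation fails, the agent can send the error message back to the model and ask for a corrected reply instead of passing malformed data downstream.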