Specialized Agents

General agents are jacks-of-all-trades. Specialized agents excel at one specific modality.

Browser Agents (Playwright)

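Browser agents navigate the web, click buttons, and read the DOM, typically by driving a real browser through an automation library such as Playwright. To understand a page, they rely on either visual bounding boxes around its elements or a simplified HTML representation.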
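As a rough illustration of what a "simplified HTML representation" might look like, the sketch below reduces a page to a numbered list of its interactive elements, which an agent could then refer to by index. It uses only Python's standard-library `html.parser` rather than Playwright, and the names `PageSimplifier`, `simplify`, and `INTERACTIVE_TAGS`, along with the choice of tags and label attributes, are assumptions made for this example.

```python
from html.parser import HTMLParser

# Tags treated as interactive. This set is an assumption for illustration.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class PageSimplifier(HTMLParser):
    """Collects interactive elements into a short list of dicts."""

    def __init__(self):
        super().__init__()
        self.elements = []  # one dict per interactive element
        self._open = None   # element currently collecting inner text

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        attr_map = dict(attrs)
        element = {
            "tag": tag,
            # Prefer an explicit label; otherwise fall back to inner text later.
            "label": attr_map.get("aria-label") or attr_map.get("placeholder") or "",
        }
        self.elements.append(element)
        # <input> is a void element, so there is no inner text to collect.
        self._open = None if tag == "input" else element

    def handle_data(self, data):
        # Use the first non-empty inner text as the label if none was set.
        if self._open is not None and not self._open["label"]:
            self._open["label"] = data.strip()

    def handle_endtag(self, tag):
        if self._open is not None and self._open["tag"] == tag:
            self._open = None


def simplify(html: str) -> list[str]:
    """Return one numbered line per interactive element in the page."""
    parser = PageSimplifier()
    parser.feed(html)
    parser.close()
    return [
        f"[{i}] <{e['tag']}> {e['label']}".rstrip()
        for i, e in enumerate(parser.elements)
    ]


page = '<div><a href="/home">Home</a><p>Welcome</p><button aria-label="Submit form">Go</button></div>'
print("\n".join(simplify(page)))
```

The numbered lines are far shorter than the raw markup, which is the point of such a representation: the agent can answer with something like "click element 1" instead of reasoning over the full DOM.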

Code Execution Agents

These agents write Python or Bash code and execute it in a sandbox. They rely on REPL-style environments, feeding each run's output and errors back in, to iterate on the code until it works.
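The iterate-until-it-works loop can be sketched as follows. This is a minimal sketch under stated assumptions: `run_python` and `solve` are hypothetical names, `generate` stands in for whatever model call produces the next attempt, and a plain subprocess with a timeout is used only as a placeholder for a sandbox. It does not provide real isolation.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_python(code: str, timeout: float = 10.0) -> tuple[bool, str]:
    """Run code in a separate interpreter process and return (ok, output).

    NOTE: a subprocess is NOT a real sandbox. It only stands in for one here.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "snippet.py"
        script.write_text(code)
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout,
                cwd=workdir,
            )
        except subprocess.TimeoutExpired:
            return False, "timed out"
    if proc.returncode == 0:
        return True, proc.stdout
    return False, proc.stderr


def solve(generate, task: str, max_attempts: int = 3):
    """Ask `generate` for code, run it, and feed errors back until it succeeds.

    `generate(task, feedback)` is a hypothetical callable that returns source
    code as a string. Returns (attempt_number, output), or None on failure.
    """
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        code = generate(task, feedback)
        ok, output = run_python(code)
        if ok:
            return attempt, output
        feedback = output  # the error text informs the next attempt
    return None


# Stub generator for demonstration: the first attempt fails, the second works.
canned = iter(["print(1 / 0)", "print(6 * 7)"])
print(solve(lambda task, feedback: next(canned), "multiply six by seven"))
```

Capping the number of attempts keeps a stuck agent from looping forever, and passing the error text back as feedback is what lets each attempt differ from the last.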

Image Analysis Agents

These agents use multimodal models such as GPT-4o or Claude 3.5 Sonnet to inspect charts, UI screenshots, or physical documents and extract structured data from them.
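To make "structured data" concrete, the sketch below validates a model's reply against a small schema before anything downstream uses it. It assumes the model was prompted to answer with JSON of the form `{"points": [{"label": ..., "value": ...}]}`. That format, the `ChartPoint` type, and `parse_chart_reply` are all hypothetical, and the model call itself is left out because it depends on the provider's API.

```python
import json
from dataclasses import dataclass


@dataclass
class ChartPoint:
    label: str
    value: float


def parse_chart_reply(reply_text: str) -> list[ChartPoint]:
    """Turn a model's JSON reply into typed records.

    Raises json.JSONDecodeError, KeyError, or ValueError if the reply does not
    match the assumed shape, so malformed output fails loudly.
    """
    payload = json.loads(reply_text)
    return [
        ChartPoint(label=str(item["label"]), value=float(item["value"]))
        for item in payload["points"]
    ]


# Example reply text, as if returned by a model that read a bar chart.
reply = '{"points": [{"label": "Q1", "value": 12.5}, {"label": "Q2", "value": "14"}]}'
for point in parse_chart_reply(reply):
    print(point.label, point.value)
```

Coercing `value` with `float` tolerates a model that returns numbers as strings, while a missing key or non-numeric value raises an error that the agent can use as a signal to retry.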