Agent Eyes

Visual observability for the ecosystem: DOM indexing and local VLM inference.

Architecture & Role

Agent Eyes lets the ecosystem reason about visual state. It parses UI layouts into DOM indexes, captures screenshots with Playwright, compares them using the structural similarity index (SSIM), and runs LLaVA vision-language inference natively on the local machine to identify visual regressions.
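
The SSIM comparison step can be sketched roughly as follows. This is a simplified, single-window variant in pure Python, assuming grayscale images supplied as nested lists of 0-255 values; common SSIM implementations instead average the index over sliding local windows. The function name `global_ssim` and the 0.98 threshold are illustrative assumptions, not part of Agent Eyes.

```python
def global_ssim(img_a, img_b, dynamic_range=255.0):
    """Single-window SSIM between two equally sized grayscale images.

    Hypothetical helper for illustration: images are lists of rows of
    pixel intensities. The result is 1.0 for identical images and drops
    as structure, brightness, or contrast diverge.
    """
    a = [float(p) for row in img_a for p in row]
    b = [float(p) for row in img_b for p in row]
    if not a or len(a) != len(b):
        raise ValueError("images must be non-empty and the same size")

    n = len(a)
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((p - mu_a) ** 2 for p in a) / n
    var_b = sum((p - mu_b) ** 2 for p in b) / n
    cov = sum((p - mu_a) * (q - mu_b) for p, q in zip(a, b)) / n

    # Standard stabilising constants from the SSIM definition.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2

    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a * mu_a + mu_b * mu_b + c1) * (var_a + var_b + c2)
    return numerator / denominator


def is_visual_regression(baseline, candidate, threshold=0.98):
    """Flag a regression when similarity falls below an assumed threshold."""
    return global_ssim(baseline, candidate) < threshold


baseline = [[10, 10, 200, 200], [10, 10, 200, 200]]
changed = [[10, 10, 200, 200], [200, 200, 10, 10]]
print(is_visual_regression(baseline, baseline))  # identical images score 1.0
print(is_visual_regression(baseline, changed))
```

A score threshold like this only says that two screenshots differ; deciding whether the difference matters is where a DOM index or a vision-language model would come in.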

Core Capabilities

DOM Indexing

SSIM Pixel Diffs

Local LLaVA VLM

Playwright Automation
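
As a rough illustration of the DOM Indexing capability listed above, the sketch below flattens an HTML string into a list of numbered element records. The index format used by Agent Eyes is not described here, so the record fields (`index`, `tag`, `attrs`, `path`) and the names `DomIndexer` and `build_dom_index` are assumptions made for this example. It relies only on Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed
# onto the open-element stack.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class DomIndexer(HTMLParser):
    """Hypothetical indexer: records each element in document order."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._stack = []  # tags of the currently open elements

    def handle_starttag(self, tag, attrs):
        self.entries.append({
            "index": len(self.entries),
            "tag": tag,
            "attrs": dict(attrs),
            "path": "/".join(self._stack + [tag]),
        })
        if tag not in VOID_TAGS:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        # Ignore stray closing tags; otherwise unwind to the matching open tag.
        if tag in self._stack:
            while self._stack.pop() != tag:
                pass


def build_dom_index(html):
    """Return a flat, ordered list of element records for an HTML string."""
    indexer = DomIndexer()
    indexer.feed(html)
    indexer.close()
    return indexer.entries


page = '<div id="app"><button class="primary">Save</button><img src="a.png"><p>Hi</p></div>'
for entry in build_dom_index(page):
    print(entry["index"], entry["path"], entry["attrs"])
```

A flat list with stable indexes makes it easy to refer to an element by number in a prompt or a diff report; a real index would likely also carry layout data such as bounding boxes, which a static parser cannot provide.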