Architecture & Role
Agent Eyes lets the ecosystem reason about visual state. It parses UI layouts into DOM indexes, captures screenshots with Playwright, computes structural image diffs with SSIM (the structural similarity index), and runs LLaVA vision-language inference locally to identify visual regressions.
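To make the SSIM step concrete, here is a minimal sketch in pure Python. It is not Agent Eyes' actual implementation: it computes a single-window (global) SSIM over the whole image rather than the sliding-window variant most libraries use, it assumes screenshots have already been decoded into equal-sized grids of 8-bit grayscale values, and the names `global_ssim`, `has_regression`, and the 0.98 threshold are hypothetical.

```python
def _flatten(image):
    """Turn a list of pixel rows into one flat list of floats."""
    return [float(p) for row in image for p in row]


def global_ssim(image_a, image_b, dynamic_range=255.0):
    """Single-window SSIM of two equal-sized grayscale images.

    Simplification: statistics are taken over the whole image at once,
    not over local sliding windows.
    """
    a, b = _flatten(image_a), _flatten(image_b)
    if not a or len(a) != len(b):
        raise ValueError("images must be non-empty and the same size")

    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n

    # Standard SSIM stabilising constants (K1 = 0.01, K2 = 0.03).
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2

    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


def has_regression(baseline, candidate, threshold=0.98):
    """Flag a candidate whose similarity falls below a (hypothetical) threshold."""
    return global_ssim(baseline, candidate) < threshold


if __name__ == "__main__":
    baseline = [[10, 10, 200], [10, 10, 200]]
    shifted = [[200, 10, 10], [200, 10, 10]]
    print(has_regression(baseline, baseline))  # identical images
    print(has_regression(baseline, shifted))   # bright column moved
```

A score near 1 means the two images are structurally alike; lower scores indicate a larger change. A low score would presumably be the cue to hand the screenshots to the vision-language model for a closer look, though the source does not specify how the stages are chained.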
Core Capabilities
DOM Indexing
SSIM Pixel Diffs
Local LLaVA VLM
Playwright Automation
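The source does not describe what a DOM index looks like, so the following is only one possible shape, sketched with Python's standard `html.parser`. It assumes the index is a flat list with one entry per element, recording a slash-separated tag path, the `id`, and the class list. The names `DomIndexer` and `build_dom_index` are hypothetical.

```python
from html.parser import HTMLParser

# HTML void elements never have a closing tag, so they are not pushed
# onto the open-element stack.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class DomIndexer(HTMLParser):
    """Collects one index entry per element start tag."""

    def __init__(self):
        super().__init__()
        self.stack = []   # tags of the currently open elements
        self.index = []   # flat list of entries, in document order

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        self.index.append({
            "path": "/".join(self.stack + [tag]),
            "id": attributes.get("id"),
            "classes": (attributes.get("class") or "").split(),
        })
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def build_dom_index(html):
    """Parse an HTML string and return the flat element index."""
    indexer = DomIndexer()
    indexer.feed(html)
    indexer.close()
    return indexer.index


if __name__ == "__main__":
    page = '<div id="app"><button class="btn primary">Go</button><img src="a.png"></div>'
    for entry in build_dom_index(page):
        print(entry)
```

In a Playwright-driven flow, the HTML string could come from the page's `content()` method. Comparing the indexes of two snapshots would then show which elements appeared, disappeared, or moved, which complements the pixel-level SSIM score.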