Dual-System GUI Agent | designpattern.fyi


Agentic AI Tool Use & Environment

Dual-System GUI Agent

Split a GUI agent into a decision model that plans and a grounding model that clicks — each optimized for its own job.

Intent & Description

🎯 Intent

Route planning and pixel-grounding to separate models that each handle their subproblem well.

📋 Context

You’re running a long multi-step GUI workflow: filling a multi-page form, booking a ride, confirming payment. You need both flexible high-level replanning (what to do when the form looks different from what was expected) and pixel-accurate click grounding. A single model doing both tends to underperform on at least one of them.

💡 Solution

Define a clean intermediate vocabulary between the two models. The decision model emits high-level intents (“open the cart”, “swipe left to next item”) drawn from a small typed vocabulary. The grounding model receives that intent plus the current screenshot and emits the concrete action (tap coordinates, key press). The decision model holds the plan and replans on failure; the grounding model is stateless per action but specialized in screen interpretation.
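
The split described above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not a reference implementation: `IntentKind`, `Intent`, `Action`, and `run_episode` are hypothetical names, and the two models are stood in for by plain callables so the example runs without any model or device.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Tuple


class IntentKind(Enum):
    """Hypothetical closed vocabulary shared by both models."""
    OPEN = "open"
    SWIPE = "swipe"
    TYPE_TEXT = "type_text"
    DONE = "done"


@dataclass(frozen=True)
class Intent:
    kind: IntentKind
    target: str = ""   # e.g. "cart" or "next item"
    text: str = ""     # payload for TYPE_TEXT


@dataclass(frozen=True)
class Action:
    name: str                              # "tap", "swipe", "key"
    point: Optional[Tuple[int, int]] = None
    keys: str = ""


Step = Tuple[Intent, Optional[Action], str]


def run_episode(
    goal: str,
    decide: Callable[[str, List[Step]], Intent],
    ground: Callable[[Intent, bytes], Optional[Action]],
    screenshot: Callable[[], bytes],
    execute: Callable[[Action], bool],
    max_steps: int = 20,
) -> List[Step]:
    """Decision model plans from history; grounding model sees one intent and one screenshot."""
    history: List[Step] = []
    for _ in range(max_steps):
        intent = decide(goal, history)          # stateful: receives the full history
        if intent.kind is IntentKind.DONE:
            break
        action = ground(intent, screenshot())   # stateless: no history passed in
        if action is None:
            history.append((intent, None, "grounding_failed"))
            continue                            # decision model can replan next turn
        ok = execute(action)
        history.append((intent, action, "ok" if ok else "execution_failed"))
    return history


# Demo with scripted stand-ins (assumption: no real models involved).
def scripted_decide(goal: str, history: List[Step]) -> Intent:
    if not history:
        return Intent(IntentKind.OPEN, target="cart")
    return Intent(IntentKind.DONE)


def scripted_ground(intent: Intent, image: bytes) -> Optional[Action]:
    return Action(name="tap", point=(540, 96))  # pretend the cart icon was found here


history = run_episode(
    goal="open the cart",
    decide=scripted_decide,
    ground=scripted_ground,
    screenshot=lambda: b"",
    execute=lambda action: True,
)
```

Only the decision callable receives `history`; keeping it out of the grounding call is what makes grounding stateless per action in this sketch.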

Real-world Use Case

A single GUI model is strong at either planning or grounding and underperforms on the other.
A clean intermediate vocabulary can express the decision model's output in a form the grounding model can act on.
Two specialized models are available, and routing between them is feasible.


📌 TL;DR

Decision model plans. Grounding model clicks. Two specialized models working in sequence beat one generalist doing both.

Advantages

Each model is sized to its skill, so the combined parameter count can be smaller than that of a unified model.
Failure attribution is clean: planning problem vs. grounding problem.
Decision-model planning generalizes across desktop, web, and mobile; grounding model is per-surface.
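
One way to make the failure attribution concrete is to log a few signals per step and bucket failures by stage. The sketch below is a heuristic under loud assumptions: `StepRecord` and its three boolean signals are hypothetical, and the rule (no action or no screen change points at grounding; an executed action that misses the expected post-condition points at planning) is one plausible convention, not something the pattern prescribes.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class StepRecord:
    intent: str
    grounded: bool        # grounding model returned a concrete action
    screen_changed: bool  # screenshot differed after the action ran
    progressed: bool      # the decision model's expected post-condition held


def attribute(step: StepRecord) -> str:
    """Assumed heuristic for blaming a stage; tune it to your own signals."""
    if step.progressed:
        return "ok"
    if not step.grounded or not step.screen_changed:
        return "grounding"
    return "planning"


def summarize(steps: Iterable[StepRecord]) -> Counter:
    """Count steps per attributed stage."""
    return Counter(attribute(s) for s in steps)


steps = [
    StepRecord("open the cart", True, True, True),
    StepRecord("tap pay", False, False, False),
    StepRecord("swipe left to next item", True, True, False),
]
summary = summarize(steps)
```

A per-stage tally like `summary` suggests which model to retrain or re-prompt first.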

Disadvantages

Two model calls per turn: latency and cost rise, and can roughly double when the two models are of similar size.
The intermediate intent vocabulary is a real design problem; a poorly chosen vocabulary breaks the hand-off.
Hand-off mistakes (decision says X, grounding hears Y) are hard to debug.
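
Hand-off mistakes are easier to catch if the boundary rejects anything outside the vocabulary before the grounding model sees it. The following is a minimal sketch, assuming the decision model emits JSON; the `ALLOWED` schema, `HandoffError`, and `parse_intent` are hypothetical names chosen for illustration.

```python
import json

# Hypothetical vocabulary: intent kind -> exact set of required argument names.
ALLOWED = {
    "open": {"target"},
    "swipe": {"direction"},
    "type_text": {"target", "text"},
}


class HandoffError(ValueError):
    """Raised when the decision model's output does not fit the vocabulary."""


def parse_intent(raw: str) -> dict:
    """Validate raw decision-model output before passing it to grounding."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise HandoffError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise HandoffError("intent must be a JSON object")

    kind = data.get("kind")
    if kind not in ALLOWED:
        raise HandoffError(f"unknown intent kind: {kind!r}")

    args = data.get("args", {})
    if not isinstance(args, dict):
        raise HandoffError("args must be a JSON object")

    missing = ALLOWED[kind] - set(args)
    extra = set(args) - ALLOWED[kind]
    if missing or extra:
        raise HandoffError(
            f"{kind}: missing={sorted(missing)} unexpected={sorted(extra)}"
        )
    return {"kind": kind, "args": args}


valid = parse_intent('{"kind": "open", "args": {"target": "cart"}}')
```

Failing loudly at this boundary turns a silent "decision says X, grounding hears Y" mismatch into an error that names the offending intent.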


© 2026 designpattern.fyi. Vibe Coded with ❤️ for modern software engineers by Dr. Amit Puri at OpenAGI