App Exploration Phase
Before deploying against an opaque app, run an exploration phase to build a per-element knowledge base the agent retrieves at task time.
Intent & Description
🎯 Intent
Teach the agent what every button does before it has to act on any of them.
📋 Context
You need an agent to drive a mobile or desktop app with no public API and no accessibility labels that name its controls. The only way to learn what a control does is to click it and see what happens — and you’ll be running this agent many times.
💡 Solution
Split the lifecycle into two phases. (1) Exploration: the agent explores the app autonomously, or watches a human demonstration, and writes a doc for each element covering what it is, what it does, and when to use it. These docs are stored in a structured knowledge base. (2) Deployment: for each task, the agent retrieves the relevant element docs via vector search, injects them into its context, and then acts. Refresh the docs when the UI changes.
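As a rough illustration of the two phases, the sketch below stores per-element docs and retrieves the closest ones for a task. Everything in it is an assumption made for the example: the `ElementDoc` schema, the element IDs, and especially `embed`, which is a toy bag-of-words stand-in for a real embedding model and vector index.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ElementDoc:
    """One knowledge-base entry written during exploration (hypothetical schema)."""
    element_id: str   # stable handle for the control, e.g. a screen/position key
    what: str         # what the element is
    effect: str       # what happens when it is activated
    when_to_use: str  # guidance for the deployment phase

    def text(self) -> str:
        return f"{self.what}. {self.effect}. {self.when_to_use}"


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class KnowledgeBase:
    docs: list = field(default_factory=list)  # (ElementDoc, embedding) pairs

    def add(self, doc: ElementDoc) -> None:
        """Exploration phase: store a doc together with its embedding."""
        self.docs.append((doc, embed(doc.text())))

    def retrieve(self, task: str, k: int = 2) -> list:
        """Deployment phase: return the k docs most similar to the task text."""
        query = embed(task)
        ranked = sorted(self.docs, key=lambda pair: cosine(query, pair[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]


def build_context(task: str, kb: KnowledgeBase, k: int = 2) -> str:
    """Format the retrieved docs as text to prepend to the agent's prompt."""
    lines = [f"Task: {task}", "Relevant UI elements:"]
    for d in kb.retrieve(task, k):
        lines.append(f"- [{d.element_id}] {d.what}. Effect: {d.effect}. Use when: {d.when_to_use}")
    return "\n".join(lines)


# Exploration output (invented entries for illustration).
kb = KnowledgeBase()
kb.add(ElementDoc("toolbar/icon_3", "Floppy-disk icon", "Saves the current file to disk",
                  "Use when the task asks to save the file"))
kb.add(ElementDoc("toolbar/icon_7", "Magnifier icon", "Opens the search panel",
                  "Use when the task asks to find or search text"))
kb.add(ElementDoc("menu/gear", "Gear icon", "Opens the settings screen",
                  "Use when the task asks to change preferences"))

# Deployment: ground the next action in the retrieved docs.
print(build_context("save the current file", kb, k=1))
```

Keeping the docs as structured records rather than free text means the same entry can be embedded for retrieval and rendered into the prompt in a consistent format.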
Real-world Use Case
- The agent must drive a GUI app with no API docs for its UI elements.
- The agent will run against the same app many times, so upfront exploration cost amortizes.
- UI element semantics are stable enough to document once.
📌 TL;DR
Explore first, deploy second. Let the agent (or a human demo) document every UI element once, then retrieve that knowledge at task time instead of guessing blind.
Advantages
- Deployment-time actions are grounded in learned semantics, not guesses.
- One exploration run pays for itself across many user tasks.
- Human-demo mode makes onboarding a new app low-effort.
Disadvantages
- Exploration is expensive and runs offline, so production tasks must either wait for it to finish or run against a stale KB.
- The KB drifts when the app updates, and staleness is hard to detect automatically.
- Deployment quality is capped by the quality of the exploration docs.
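One possible partial mitigation for KB drift is to fingerprint each screen's layout during exploration and compare it at deployment time. The sketch below is a heuristic under stated assumptions, not a complete staleness detector: the element fields (`role`, `x`, `y`), the 10-pixel bucketing, and the function names are all hypothetical, and a control that changes behavior without changing layout would go unnoticed.

```python
import hashlib
import json


def screen_fingerprint(elements: list) -> str:
    """Hash the structural layout of a screen, ignoring small position jitter."""
    # Assumption: role and coarse position (10-pixel buckets) are stable between runs.
    stable = sorted((e["role"], e["x"] // 10, e["y"] // 10) for e in elements)
    return hashlib.sha256(json.dumps(stable).encode()).hexdigest()


def stale_screens(stored: dict, observed: dict) -> list:
    """Return names of screens whose current fingerprint differs from the stored one."""
    return sorted(
        name for name, elements in observed.items()
        if stored.get(name) != screen_fingerprint(elements)
    )


# Fingerprint recorded at exploration time (invented layout for illustration).
home_v1 = [{"role": "button", "x": 12, "y": 40}, {"role": "icon", "x": 200, "y": 40}]
stored = {"home": screen_fingerprint(home_v1)}

# Later observations: one with minor jitter, one with an added control.
home_jitter = [{"role": "button", "x": 14, "y": 41}, {"role": "icon", "x": 200, "y": 40}]
home_changed = home_v1 + [{"role": "button", "x": 300, "y": 40}]

print(stale_screens(stored, {"home": home_jitter}))
print(stale_screens(stored, {"home": home_changed}))
```

Screens flagged this way could be queued for re-exploration instead of refreshing the whole KB. Note that bucketing only absorbs jitter within a bucket; a shift that crosses a bucket boundary would still be flagged.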