Full-Desktop Computer Use
Give the agent a full containerized OS desktop with native apps, a persistent filesystem, and credential stores — for workflows that span multiple apps.
Intent & Description
🎯 Intent
Handle multi-application desktop workflows that a browser-only surface can’t touch.
📋 Context
Your agent needs to download an invoice from a mail client, edit it in a spreadsheet, sign into a vendor portal through a password manager, then file the result locally. The apps have no shared API, some exist only on the desktop, and state must survive across steps.
💡 Solution
Provision a containerized desktop OS (e.g., Ubuntu with a lightweight window manager) preloaded with browser, mail client, editor, and terminal. The agent observes the screen and emits mouse/keyboard actions across the whole desktop. A mounted persistent filesystem retains downloads, installed packages, and intermediate artifacts. A desktop password manager supplies credentials and handles 2FA prompts.
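The observe-and-act cycle described above can be sketched as a small loop. This is a minimal illustration, not a reference implementation: the `Action` schema, the `Desktop` interface, and the `FakeDesktop` stand-in are all hypothetical names introduced here, and a real deployment would back `Desktop` with whatever screen-capture and input-injection mechanism the container exposes.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Protocol


@dataclass(frozen=True)
class Action:
    """One low-level input event (hypothetical schema)."""
    kind: str          # "click", "type", or "key"
    x: int = 0
    y: int = 0
    text: str = ""


class Desktop(Protocol):
    """Hypothetical interface to the containerized desktop."""
    def screenshot(self) -> bytes: ...
    def perform(self, action: Action) -> None: ...


# A policy maps the latest screenshot to the next action, or None when done.
Policy = Callable[[bytes], Optional[Action]]


def run_episode(desktop: Desktop, policy: Policy, max_steps: int = 50) -> List[Action]:
    """Observe, act, and repeat until the policy stops or the step budget runs out."""
    history: List[Action] = []
    for _ in range(max_steps):
        frame = desktop.screenshot()
        action = policy(frame)
        if action is None:
            break
        desktop.perform(action)
        history.append(action)
    return history


class FakeDesktop:
    """In-memory stand-in so the loop can run without a real container."""
    def __init__(self) -> None:
        self.performed: List[Action] = []

    def screenshot(self) -> bytes:
        # Encode the step count so a scripted policy can tell frames apart.
        return str(len(self.performed)).encode()

    def perform(self, action: Action) -> None:
        self.performed.append(action)


def scripted_policy(script: List[Action]) -> Policy:
    """Replay a fixed list of actions, indexed by the fake frame counter."""
    def policy(frame: bytes) -> Optional[Action]:
        step = int(frame.decode())
        return script[step] if step < len(script) else None
    return policy
```

The `max_steps` budget is a design choice worth keeping in any variant: a desktop agent that misreads the screen can otherwise loop indefinitely. Swapping `scripted_policy` for a model-backed policy leaves `run_episode` unchanged.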
Real-world Use Case
- The task spans multiple native desktop applications with no shared API.
- State (downloads, installed tools, logins) must persist across steps or sessions.
- The agent needs authenticated access through a desktop password manager, including 2FA.
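One way to meet the persistence requirement above is to mount a named volume into the desktop container. The sketch below only builds the `docker run` argument list and does not execute it. The image name, container name, volume name, mount point, and VNC port are placeholder assumptions, not values from any particular desktop image.

```python
import shlex
from typing import List


def desktop_run_command(
    image: str,
    name: str,
    volume: str,
    mount_point: str = "/home/agent",   # assumed home directory inside the image
    vnc_port: int = 5900,               # assumed remote-display port
) -> List[str]:
    """Build a `docker run` argv that mounts a named volume for persistent state."""
    return [
        "docker", "run", "--detach",
        "--name", name,
        # Named volume: outlives the container, so files under the mount persist.
        "--volume", f"{volume}:{mount_point}",
        # Publish the display port on loopback only, so it is not exposed externally.
        "--publish", f"127.0.0.1:{vnc_port}:{vnc_port}",
        image,
    ]


# Hypothetical names, for illustration only.
cmd = desktop_run_command("example/agent-desktop:latest", "agent-desktop", "agent-home")
print(shlex.join(cmd))
```

Only paths under the mount point survive removal of the container. Packages installed system-wide live in the container's writable layer, so keeping them across sessions would need either a rebuilt image or additional mounts.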
📌 TL;DR
Full containerized desktop = the agent can do most of what a human can do at a computer, including multi-app workflows with persistent state and 2FA. Big power, big attack surface.
Advantages
- Handles workflows that span native desktop apps, not just web pages.
- Persistent filesystem and installed tooling carry state across steps and sessions.
- Desktop credential stores let the agent authenticate without hardcoded secrets.
Disadvantages
- A whole OS is slower and costlier to provision and snapshot than a single browser tab.
- Stored credentials and a persistent disk widen the blast radius if the agent is compromised or prompt-injected.
- Maintaining a desktop image (apps, drivers, window manager) is ongoing engineering work.