
Cua Computer-Use Agents

MIT-licensed infrastructure for computer-use agents: background desktop drivers, MCP server support, Python SDKs, local/cloud sandboxes, macOS, Windows, Linux, Android, Cua Bench, GUI automation skills, and Lume virtualization for agents that see, click, type, and verify real desktops.

by TryCua · added 2026-06-18
Harness · CLI

Review first: open the source and read the safety notes before installing.

Safety notes

  • Cua can give agents eyes and hands on a computer: screenshots, clicks, typing, dragging, shell commands, window control, file paths, and desktop automation. Treat it as high-impact automation.
  • The README documents remote shell and PowerShell installer commands for Cua Driver and Lume. Review script contents, pin versions where possible, and avoid blind execution on sensitive machines.
  • The GUI automation skill includes form fuzzing examples and shell/file actions; use only on authorized apps, test environments, or isolated sandboxes.
  • Host-machine mode and background desktop control can interact with real apps without stealing focus. Keep host consent, allowlists, window targeting, and stop controls explicit.
  • Cua Bench, Cua sandboxes, Lume, Docker, QEMU, cloud VMs, Windows sandbox, Android images, and BYOI environments need resource limits, cleanup, network policy, and credential isolation.
  • Optional third-party components have separate license and safety implications; the README calls out Kasm, OmniParser, and optional `cua-agent[omni]` dependencies.
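
One way to act on the advice to review installer scripts and pin versions is to save the script to a file, read it, record its SHA-256 digest, and refuse to run any later copy that differs. The sketch below is illustrative only: the helper names `sha256_of` and `is_reviewed` are hypothetical and not part of Cua, and it assumes the script has already been downloaded to a local path.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reviewed(path: Path, expected_sha256: str) -> bool:
    """True only if the file matches the digest recorded at review time."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

A team could store the reviewed digest next to its setup notes and call `is_reviewed` before handing the saved script to a shell, so that an upstream change forces a fresh review instead of running silently.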

Privacy notes

  • Computer-use sessions can capture screenshots, typed text, window titles, app contents, browser pages, files, clipboard-like content, shell output, paths, downloads, and user workflows.
  • Cua trajectories are recorded under `~/.cua/trajectories/...`; sharing trajectories can upload or expose screenshots, actions, commands, and replayable workflow details.
  • Cloud sandboxes, Cua cloud, E2B-like backends, Docker registries, Lume images, MCP clients, model providers, and AI annotation features may process or store prompts, screenshots, tool actions, and environment metadata.
  • Do not use host or cloud computer-control flows on confidential documents, customer systems, authenticated browser sessions, secrets, payments, destructive admin panels, or regulated data unless policy explicitly allows it.
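
Before sharing anything from the trajectory directory mentioned above, it may help to see what kinds of files a session left behind. The following is a minimal sketch that only counts files by extension under a given root; the layout below `~/.cua/trajectories` is not described in this listing, so the code makes no assumptions about it, and `summarize_trajectories` is a hypothetical helper name.

```python
from pathlib import Path


def summarize_trajectories(root: Path) -> dict:
    """Count files under root by lowercase suffix ('' means no suffix)."""
    counts = {}
    if not root.is_dir():
        return counts  # nothing recorded yet, or a different location
    for path in root.rglob("*"):
        if path.is_file():
            suffix = path.suffix.lower()
            counts[suffix] = counts.get(suffix, 0) + 1
    return counts


if __name__ == "__main__":
    # Root path taken from the privacy note above; subfolders are not assumed.
    root = Path.home() / ".cua" / "trajectories"
    for suffix, count in sorted(summarize_trajectories(root).items()):
        print(f"{suffix or '(none)'}: {count}")
```

A large number of image files is a hint that screenshots were captured and that the folder deserves a manual look before any replay link is created.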

Prerequisites

  • Python 3.12 or newer for the `cua` meta-package, with package-specific Python requirements checked for `cua-agent`, `cua-computer`, `cua-computer-server`, `cua-mcp-server`, and related packages.
  • A target runtime choice: local desktop, cloud Cua sandbox, Docker/container, QEMU VM, Lume macOS VM, Windows sandbox, Android, or bring-your-own image.
  • Explicit approval and scoping before giving an agent host desktop control, shell access, file access, screenshots, keyboard input, mouse input, or host-machine access.
  • Review of the Cua Driver install scripts before running curl-to-shell or PowerShell bootstrap commands from the README.
  • API keys, model/provider credentials, sandbox credentials, and trajectory-sharing policy when using AI annotations, cloud sandboxes, hosted services, or replay links.
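
The Python prerequisite can be checked before installing anything. This sketch, using only the standard library, compares the running interpreter against the 3.12 minimum stated above and reads the `Requires-Python` field of any Cua package that is already installed. The function names are hypothetical, and the package names are simply the ones listed in the prerequisite.

```python
import sys
from importlib import metadata
from typing import Optional

MIN_PYTHON = (3, 12)  # minimum stated for the `cua` meta-package above
PACKAGES = ["cua", "cua-agent", "cua-computer",
            "cua-computer-server", "cua-mcp-server"]


def python_ok(version_info=sys.version_info) -> bool:
    """True if the interpreter meets the stated minimum version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def requires_python(package: str) -> Optional[str]:
    """Return the installed package's Requires-Python string, if any."""
    try:
        return metadata.metadata(package).get("Requires-Python")
    except metadata.PackageNotFoundError:
        return None  # package is not installed in this environment


if __name__ == "__main__":
    print("Python OK:", python_ok())
    for name in PACKAGES:
        print(name, "->", requires_python(name))
```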

Schema details

  • Install type: cli
  • Troubleshooting: No

Source repository stats
  • Scope: Source repo

Collection metadata
  • Estimated setup: 45 minutes
  • Difficulty: advanced

Tool listing metadata
  • Pricing: free
  • Disclosure: editorial
  • Application category: DeveloperApplication
  • Operating system: macOS, Windows, Linux, Android

About this resource

Overview

Cua is open-source infrastructure for computer-use agents: agents that see screens, click buttons, type text, run tools, and verify UI state across real desktop or sandboxed environments. The repository includes Cua Drivers, Python SDK packages, a Cua MCP server package, Cua Bench, a GUI automation skill, and Lume virtualization for macOS/Linux VMs on Apple Silicon.

Use Cua when a team needs more than browser automation: native desktop app control, sandboxed OS environments, GUI task benchmarks, or an MCP-compatible driver that lets agent clients control a desktop in the background.

Install

For the Python SDK meta-package:

pip install cua

After installing and reviewing Cua Driver, the README documents this Claude Code MCP registration:

claude mcp add --transport stdio cua-driver -- cua-driver mcp

The README also documents remote installer scripts for Cua Driver and Lume. Review those scripts before running them, especially on a workstation with private files, authenticated apps, browser sessions, or production credentials.
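
Before running the registration command, it can be useful to confirm that both executables it names are actually on `PATH`. The sketch below does only that and then prints the command quoted above without executing it. `missing_tools` is a hypothetical helper, and the assumption that the binaries are called `claude` and `cua-driver` comes directly from that command.

```python
import shutil

# Argument list copied from the README command quoted above.
REGISTER_CMD = ["claude", "mcp", "add", "--transport", "stdio",
                "cua-driver", "--", "cua-driver", "mcp"]


def missing_tools(names, which=shutil.which):
    """Return the names that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(["claude", "cua-driver"])
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        # Printed for review rather than executed automatically.
        print("Ready to run:", " ".join(REGISTER_CMD))
```

Printing instead of executing keeps the registration step a deliberate, human-confirmed action, which matches the review-first stance of this listing.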

Agent Capabilities

| Area | Cua Coverage |
| --- | --- |
| Cua Driver | Background computer-use driver for macOS and Windows, with Linux pre-release support |
| MCP | Driver and `cua-mcp-server` package for MCP-compatible agent clients |
| Cua SDK | Python API for sandboxes, screenshots, shell commands, mouse, keyboard, and mobile gestures |
| Sandboxes | Local and cloud Linux, macOS, Windows, Android, Docker/container, VM, and bring-your-own-image paths |
| Cua Bench | Benchmarks and reinforcement-learning environments for OSWorld, ScreenSpot, Windows Arena, and custom tasks |
| Lume | macOS/Linux VM management on Apple Silicon using Apple's Virtualization.Framework |
| Skills | `gui-automation` Agent Skill for screenshot, click, type, drag, shell, zoom, window, and trajectory workflows |

Use Cases

  • Give Claude Code, Codex, Cursor, OpenClaw, or another agent client a background desktop-control MCP driver.
  • Build computer-use agents that operate across Linux, macOS, Windows, Android, Docker, or VM environments.
  • Run UI automation against native apps, not only web pages.
  • Benchmark agents on computer-use tasks and export trajectories for training.
  • Use Lume to create macOS/Linux VMs on Apple Silicon for agent testing.
  • Package GUI automation skills for repeatable QA and operator workflows.

Source Review

Verified on 2026-06-18:

  • The upstream repository describes Cua as infrastructure to build, benchmark, and deploy agents that use computers.
  • The README describes Cua Drivers, Cua sandboxes, Cua Bench, Lume, Python packages, and Cua Driver MCP registration.
  • The README states that Cua Drivers support background computer-use on macOS and Windows, with Linux as a pre-release backend.
  • The README lists support for local and cloud Linux, macOS, Windows, Android, Docker/container, VM, and bring-your-own image workflows.
  • libs/cua-driver/README.md documents MCP over stdio and a Claude Code computer-use compatibility mode.
  • libs/python/mcp-server/pyproject.toml declares the cua-mcp-server package for Computer-Use Agent MCP.
  • skills/gui-automation/SKILL.md documents screenshot, click, type, drag, shell, window, zoom, provider, and trajectory-sharing workflows.
  • PyPI resolves cua version 0.1.6, cua-agent version 0.8.2, and cua-computer version 0.5.19.
  • The latest GitHub release is computer-v0.5.19, published on 2026-06-18.
  • The repository license is MIT, while the README notes separate third-party component licenses for Kasm, OmniParser, and optional cua-agent[omni] dependencies.
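
Because the package versions above were recorded on a specific date, a local environment may have drifted from them. As a rough sketch, the standard library's `importlib.metadata` can compare what is installed against the versions listed in this review. `version_report` is a hypothetical helper, and a mismatch only means the review may not cover the installed release.

```python
from importlib import metadata

# Versions listed in the source review above (verified 2026-06-18).
LISTED = {"cua": "0.1.6", "cua-agent": "0.8.2", "cua-computer": "0.5.19"}


def version_report(listed, lookup=metadata.version):
    """Map each package to (reviewed, installed or None, versions match)."""
    report = {}
    for name, reviewed in listed.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None  # not installed in this environment
        report[name] = (reviewed, installed, installed == reviewed)
    return report


if __name__ == "__main__":
    for name, row in version_report(LISTED).items():
        print(name, row)
```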

Safety and Privacy

Cua is powerful because it can connect agents to real or sandboxed computers. That also makes it easy to accidentally expose private screens, authenticated sessions, credentials, local files, browser state, app data, and production systems. Start in a sandbox, confirm the active target, and avoid host-machine control until consent, scope, and auditability are clear.
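
The "sandbox first, host only with consent" guidance can be expressed as a small gate that runs before any session starts. Everything in this sketch is hypothetical: the target labels, the `SessionRequest` fields, and `may_start` are illustrative names and are not part of the Cua SDK. It only shows the shape of the policy described above.

```python
from dataclasses import dataclass

# Hypothetical labels for isolated targets; adapt to your own setup.
SANDBOX_TARGETS = {"docker", "qemu", "lume", "cloud-sandbox"}


@dataclass(frozen=True)
class SessionRequest:
    target: str                # "host" or one of SANDBOX_TARGETS
    host_consent: bool = False  # explicit approval from the machine owner
    allowed_apps: tuple = ()    # allowlist that scopes host control
    audit_log: str = ""         # where actions will be recorded


def may_start(request: SessionRequest) -> bool:
    """Allow sandboxes; allow host only with consent, scope, and audit."""
    if request.target in SANDBOX_TARGETS:
        return True
    if request.target != "host":
        return False  # unknown targets are rejected by default
    return bool(request.host_consent
                and request.allowed_apps
                and request.audit_log)
```

Rejecting unknown targets by default means a typo in configuration fails closed instead of silently falling through to host control.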

The GUI automation skill records trajectories and can share replay links. Treat screenshots, actions, shell output, paths, and typed text as sensitive unless the session was deliberately created for public demo or benchmark use.

Duplicate Check

Checked current content/tools/, content/agents/, content/mcp/, content/skills/, guides, open pull requests, and repository-wide content for trycua/cua, Cua, Cua Driver, Cua MCP driver, Cua Bench, Lume, computer-use agent SDK, computer-use agents, desktop automation agent, and Cua GUI automation skill. No dedicated Cua tools entry, exact source URL duplicate, target file, or open duplicate PR was found.

Disclosure

Editorial listing. No paid placement or affiliate link is used. Cua is MIT-licensed open-source software; Cua cloud services, model providers, MCP clients, cloud sandboxes, Docker, QEMU, Lume, Windows sandbox, Android images, trajectory hosting, and optional third-party components may have separate licenses, billing, terms, privacy controls, and access requirements.

How it compares

Cua Computer-Use Agents side by side with 3 alternatives on trust, install, platform support, and disclosed safety notes — all from reviewed registry metadata.

Cua Computer-Use Agents

MIT-licensed infrastructure for computer-use agents: background desktop drivers, MCP server support, Python SDKs, local/cloud sandboxes, macOS, Windows, Linux, Android, Cua Bench, GUI automation skills, and Lume virtualization for agents that see, click, type, and verify real desktops.

Browser Harness

MIT-licensed CDP browser-control harness from Browser Use that lets Claude Code, Codex, and other coding agents connect to a real or cloud Chrome browser, use screenshots and coordinate clicks, edit task-specific helpers, and optionally learn reusable domain skills for web automation workflows.

Skills CLI

MIT-licensed `skills` CLI from Vercel Labs for installing, using, finding, listing, updating, removing, and initializing Agent Skills across Claude Code, Codex, Cursor, OpenCode, OpenClaw, Gemini CLI, GitHub Copilot, Windsurf, Zed, and dozens of other agent hosts.

Cherry Studio

Cross-platform AI desktop client with multiple LLM providers, local model support, 300+ assistants, document and image handling, WebDAV backup, MCP server support, mini programs, and enterprise deployment options.
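Trust

| Field | Cua Computer-Use Agents | Browser Harness | Skills CLI | Cherry Studio |
| --- | --- | --- | --- | --- |
| Install risk | Review first | Review first | Review first | Review first |
| Notes | Safety, Privacy | Safety, Privacy | Safety, Privacy | Safety, Privacy |
| Category | tools | tools | tools | tools |
| Source | source-backed | source-backed | source-backed | source-backed |
| Author | TryCua | Browser Use | Vercel Labs | CherryHQ |
| Added | 2026-06-18 | 2026-06-18 | 2026-06-18 | 2026-06-18 |
| Platforms | CLI | Codex, CLI | Cursor, Codex, CLI | CLI |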
Safety notes

Cua Computer-Use Agents
  • Cua can give agents eyes and hands on a computer: screenshots, clicks, typing, dragging, shell commands, window control, file paths, and desktop automation. Treat it as high-impact automation.
  • The README documents remote shell and PowerShell installer commands for Cua Driver and Lume. Review script contents, pin versions where possible, and avoid blind execution on sensitive machines.
  • The GUI automation skill includes form fuzzing examples and shell/file actions; use only on authorized apps, test environments, or isolated sandboxes.
  • Host-machine mode and background desktop control can interact with real apps without stealing focus. Keep host consent, allowlists, window targeting, and stop controls explicit.
  • Cua Bench, Cua sandboxes, Lume, Docker, QEMU, cloud VMs, Windows sandbox, Android images, and BYOI environments need resource limits, cleanup, network policy, and credential isolation.
  • Optional third-party components have separate license and safety implications; the README calls out Kasm, OmniParser, and optional `cua-agent[omni]` dependencies.

Browser Harness
  • Browser Harness can connect agents to a real logged-in Chrome profile. Remote debugging may expose active sessions, extensions, bookmarks, history, page content, downloads, uploads, and account actions to the agent.
  • The documented Way 1 setup uses the user's everyday Chrome profile through `chrome://inspect/#remote-debugging`; require explicit user consent before attaching to sensitive accounts.
  • The documented Way 2 setup launches Chrome with a non-default `--user-data-dir` and remote debugging port; keep that isolated profile separate from everyday browser data.
  • Remote Browser Use Cloud sessions require `BROWSER_USE_API_KEY`, may use proxies, can persist profile state, and can continue billing until timeout or shutdown.
  • Agents using Browser Harness can edit `agent-workspace/agent_helpers.py` and optional domain-skill files; review generated helper code and public skill contributions before reuse.
  • Browser automation can submit forms, send messages, purchase items, scrape websites, change account settings, and upload files. Keep destructive or account-writing tasks behind confirmation.

Skills CLI
  • Agent Skills are executable instructions for coding agents. Inspect `SKILL.md` and supporting files before installing or using skills from unknown repositories.
  • `skills add`, `skills update`, `skills remove`, and `experimental_sync` can write, replace, symlink, copy, or remove skill folders across many local agent directories. Review `--agent`, `--skill`, `--all`, `--global`, and `--yes` flags before running broad operations.
  • `skills use` can materialize a skill into a temporary directory and print the generated prompt, or start a supported agent interactively with that prompt. Treat untrusted skill text as prompt-bearing code.
  • Symlink install mode keeps a canonical copy and links agent directories to it. Copy mode creates independent copies. Choose deliberately when working across shared repos, Windows environments, containers, or synchronized directories.
  • The CLI includes explicit warnings for OpenClaw community skills in `skills use`; do not bypass those warnings unless you understand the trust model for the selected source.
  • The security audit lookup is best-effort and never blocks installation. A missing or safe-looking audit result is not a substitute for reviewing the skill source.

Cherry Studio
  • Cherry Studio is a desktop AI client that can connect to multiple cloud providers, local model servers, MCP servers, mini programs, document parsers, backup services, and enterprise backends; review each integration before adding sensitive data.
  • MCP server support can expose model-callable tools. Only connect servers you trust, and scope file, shell, browser, SaaS, and write-capable tools carefully.
  • Document and image processing can read local files and generate derived text, charts, summaries, or code blocks that may persist in app state or backups.
  • WebDAV backup and sync can move local conversation or document state to a remote storage provider; verify endpoint, encryption, retention, and restore behavior.
  • The README describes Enterprise Edition and private deployment options; confirm licensing, access control, data backup, and team management requirements before rollout.
Privacy notes

Cua Computer-Use Agents
  • Computer-use sessions can capture screenshots, typed text, window titles, app contents, browser pages, files, clipboard-like content, shell output, paths, downloads, and user workflows.
  • Cua trajectories are recorded under `~/.cua/trajectories/...`; sharing trajectories can upload or expose screenshots, actions, commands, and replayable workflow details.
  • Cloud sandboxes, Cua cloud, E2B-like backends, Docker registries, Lume images, MCP clients, model providers, and AI annotation features may process or store prompts, screenshots, tool actions, and environment metadata.
  • Do not use host or cloud computer-control flows on confidential documents, customer systems, authenticated browser sessions, secrets, payments, destructive admin panels, or regulated data unless policy explicitly allows it.

Browser Harness
  • Browser Harness workflows can expose page screenshots, DOM text, URLs, cookies-backed login state, account data, downloads, uploads, form inputs, and extracted website data to the agent and configured model providers.
  • Profile sync for Browser Use Cloud is documented as cookies-only, but it still moves browser authentication material into a remote browser environment.
  • Cloud browser live URLs, proxy settings, profile identifiers, daemon logs, `/tmp` socket or pid files, and copied support artifacts may reveal browsing activity or account context.
  • Public domain-skill PRs should not include secrets, private selectors tied to confidential apps, customer data, screenshots, credentials, tokens, or personal browsing history.

Skills CLI
  • By default, the CLI can send telemetry to `add-skill.vercel.sh` unless `DISABLE_TELEMETRY` or `DO_NOT_TRACK` is set.
  • Telemetry fields in source include CLI version, CI flag, detected agent name, event type, source, selected skills, selected agents, global flag, source type, update counts, find query, and result counts.
  • Security-audit lookup requests can send the skill source and selected skill slugs to the audit endpoint.
  • Local project and global installs can persist source names, selected skills, agent targets, canonical paths, lock data, symlinks, and copied skill contents on disk.
  • Skill contents used through `skills use` are embedded into the generated prompt and may be sent to the downstream model provider or interactive agent process.

Cherry Studio
  • Prompts, model responses, local documents, images, Office files, PDFs, assistant settings, topic history, MCP tool arguments, WebDAV backups, provider keys, and logs may contain sensitive data.
  • Cloud model providers, AI web services, local model servers, MCP servers, WebDAV endpoints, mini programs, and enterprise services may receive data depending on configuration.
  • Keep provider API keys, WebDAV credentials, enterprise endpoints, local model URLs, MCP config, document contents, and exported chats out of public prompts, screenshots, issues, and examples.
  • For team use, define which models, assistants, MCP servers, backups, knowledge bases, and enterprise admin controls are approved.
Prerequisites

Cua Computer-Use Agents
  • Python 3.12 or newer for the `cua` meta-package, with package-specific Python requirements checked for `cua-agent`, `cua-computer`, `cua-computer-server`, `cua-mcp-server`, and related packages.
  • A target runtime choice: local desktop, cloud Cua sandbox, Docker/container, QEMU VM, Lume macOS VM, Windows sandbox, Android, or bring-your-own image.
  • Explicit approval and scoping before giving an agent host desktop control, shell access, file access, screenshots, keyboard input, mouse input, or host-machine access.
  • Review of the Cua Driver install scripts before running curl-to-shell or PowerShell bootstrap commands from the README.

Browser Harness
  • Python 3.11 or newer, uv, git, and a durable local checkout for editable installation.
  • A Chrome or Chromium-based browser that can be attached through Chrome remote debugging, or a Browser Use Cloud API key for cloud browsers.
  • Codex, Claude Code, or another agent host that can read the Browser Harness `SKILL.md` instructions.
  • A clear boundary for which browser profile, logged-in sites, cloud browser sessions, downloads, uploads, and account actions the agent may access.

Skills CLI
  • Node.js 18 or newer for the published `skills` npm package.
  • At least one supported agent host installed if using auto-detected targets, such as Claude Code, Codex, Cursor, OpenCode, OpenClaw, Gemini CLI, GitHub Copilot, Windsurf, Zed, or another supported agent.
  • A reviewed skill source from GitHub, GitLab, a git URL, a local path, a direct skill folder, or another supported provider.
  • A decision between project-scoped skills that live under the current repository and global skills that live under the user's home/config directories.

Cherry Studio
  • Windows, macOS, or Linux desktop environment.
  • Model provider credentials for cloud services, or local Ollama / LM Studio setup for local model use.
  • A review of AGPL-3.0 community edition terms and any Enterprise Edition terms before organization-wide use.
  • WebDAV credentials only if file backup and sync are needed.
Install
  • Cua Computer-Use Agents: `pip install cua`
  • Browser Harness: `git clone https://github.com/browser-use/browser-harness && cd browser-harness && uv tool install -e .`
  • Skills CLI: `npm install -g skills`
  • Cherry Studio: Download the current Cherry Studio desktop release for your operating system from GitHub Releases.
Config (Skills CLI)
{
  "projectInstall": "npx skills add vercel-labs/agent-skills --skill frontend-design -a claude-code",
  "globalInstall": "npx skills add vercel-labs/agent-skills --skill frontend-design -g -a claude-code -y",
  "temporaryUse": "npx skills use vercel-labs/agent-skills@web-design-guidelines | claude",
  "disableTelemetry": "DISABLE_TELEMETRY=1 npx skills list"
}
Claim: Unclaimed for all four listings.
