macos-use
`npx mdskill add sonichi/sutando/macos-use`

Controls macOS apps via the accessibility API for non-interactive automation.
- Solves GUI interaction tasks with macOS apps like Safari, Zoom, and Finder
- Uses mediar-ai's mcp-server-macos-use and the macOS Accessibility API
- Drives actions like clicks, typing, and scrolling based on the accessibility tree
- Enables automation in non-interactive Claude Code mode without holding a machine-wide lock
SKILL.md
.github/skills/macos-use
---
name: macos-use
description: "GUI control for macOS apps via mediar-ai's mcp-server-macos-use. Click, type, scroll, key-press, open apps; driven by the accessibility tree, works in non-interactive Claude Code mode. Use this for any Sutando task that needs to drive another macOS application (Safari, Zoom, Mail, Finder, etc.)."
user-invocable: false
---

# macos-use

Drive macOS applications from Claude Code via mediar-ai's [mcp-server-macos-use](https://github.com/mediar-ai/mcp-server-macos-use), a Swift MCP server that wraps the macOS Accessibility API.

Unlike Claude's built-in `computer-use`, this server works in non-interactive mode (which is how Sutando's proactive loop and task bridge run), does not hold a machine-wide lock, and does not require a Pro/Max subscription.

## When to use

- "Open Safari and navigate to github.com": anything requiring real GUI interaction with an app
- "Click the Join button on the Zoom invite dialog"
- "Type this into the Discord message box"
- "Scroll the frontmost window to the bottom"
- Any task that currently falls back to AppleScript + Quartz mouse events in `src/inline-tools.ts`

Prefer this skill over:

- `bash src/screen-capture.sh`: that only captures screenshots, while `macos-use` actually *interacts* with the app.
- AppleScript `tell application` blocks: `macos-use` is more reliable and has better error handling.
- `cliclick`: lower-level, with no accessibility context.
- Claude's built-in `computer-use`: it requires an interactive session and holds a lock that contends with Sutando's own loop.

## Tools exposed

After install, these tools appear as `mcp__macos-use__*` in Claude Code:

| Tool | Parameters | Purpose |
|------|------------|---------|
| `open_application_and_traverse` | `identifier` (name, bundle ID, or path) | Launch or activate an app and return its a11y tree |
| `click_and_traverse` | `pid`, `x`, `y` | Click at coordinates in the target app and return the updated tree |
| `type_and_traverse` | `pid`, `text` | Type into the frontmost element |
| `press_key_and_traverse` | `pid`, `key` | Press a named key (Return, Tab, Escape, arrows, ...) |
| `scroll_and_traverse` | `pid`, `direction`, `amount` | Scroll in a direction |
| `refresh_traversal` | `pid` | Re-read the a11y tree without acting |

Every tool returns an accessibility-tree snapshot of the target app: structured UI elements with roles, titles, positions, and identifiers. No pixels are involved; the model reasons over the tree, not over screenshots.

## Install

Three steps, one-time:

```bash
# 1. Build the Swift binary (~35s)
bash skills/macos-use/scripts/build.sh

# 2. Register with Claude Code's MCP config (writes ~/.claude.json)
bash skills/macos-use/scripts/install-mcp.sh

# 3. Grant Accessibility permission
#    System Settings → Privacy & Security → Accessibility
#    Click +, navigate to ~/.macos-use-mcp/.build/release/mcp-server-macos-use, enable.
```

Restart Claude Code after install for the MCP tools to appear.

## Gotchas

- **Swift 6 build fragility**: the transitive `swift-sdk` dependency has data-race errors that Swift 6.3+ strict-concurrency checking rejects. As a workaround, `build.sh` passes `-Xswiftc -swift-version -Xswiftc 5`. Remove the flag once upstream fixes this.
- **Accessibility permission**: the binary must be added under System Settings → Privacy & Security → Accessibility, or every tool call returns "not authorized". The first-run error is obvious; the owner only has to click through once.
- **Apps without good a11y trees**: Canvas, Electron, and game UIs degrade badly. For those, fall back to `screen-capture.sh` + Claude vision.
- **Build dependencies pulled from GitHub**: air-gapped Macs won't work, and there are no prebuilt releases yet.
- **Multi-node**: each node builds its own binary. Binaries are machine-specific, so they are not synced via `sutando-memory.git`. Run `build.sh` + `install-mcp.sh` separately on the Mac Mini and the MacBook.
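The Install section leaves two checkable artifacts behind on each machine: the built binary under `~/.macos-use-mcp/.build/release/` and the MCP entry written to `~/.claude.json`. The sketch below is a minimal preflight check for those two, written in bash. It is hypothetical (`preflight.sh`, `check_binary`, `check_registered`, and `preflight` are not part of the skill) and it assumes `install-mcp.sh` registers the server under the name `macos-use`, which is inferred from the `mcp__macos-use__*` tool prefix rather than confirmed. It does not try to check the Accessibility permission; the "not authorized" error remains the signal for that step.

```bash
#!/usr/bin/env bash
# preflight.sh: hypothetical helper, not shipped with the macos-use skill.
# Checks the two install steps that leave files behind.

# Default locations taken from the Install section; override via env for testing.
MACOS_USE_BIN="${MACOS_USE_BIN:-$HOME/.macos-use-mcp/.build/release/mcp-server-macos-use}"
CLAUDE_CONFIG="${CLAUDE_CONFIG:-$HOME/.claude.json}"

# Step 1 (build.sh): the release binary exists as a regular, executable file.
check_binary() {
  [ -f "$1" ] && [ -x "$1" ]
}

# Step 2 (install-mcp.sh): the MCP config mentions the server.
# Assumption: the server is registered as "macos-use" (inferred from the
# mcp__macos-use__* tool prefix). This is a coarse text match, not a JSON parse.
check_registered() {
  [ -f "$1" ] && grep -q '"macos-use"' "$1"
}

# Report each step and return non-zero if either one is missing.
preflight() {
  local status=0
  if check_binary "$MACOS_USE_BIN"; then
    echo "ok: binary found at $MACOS_USE_BIN"
  else
    echo "missing binary: run bash skills/macos-use/scripts/build.sh"
    status=1
  fi
  if check_registered "$CLAUDE_CONFIG"; then
    echo "ok: macos-use found in $CLAUDE_CONFIG"
  else
    echo "not registered: run bash skills/macos-use/scripts/install-mcp.sh"
    status=1
  fi
  # Step 3 (Accessibility permission) is not checked here; a tool call
  # returning "not authorized" shows that it is still missing.
  return "$status"
}

# Run only when executed directly, so the functions can also be sourced.
if [ "${BASH_SOURCE[0]}" = "$0" ]; then
  preflight
fi
```

Because binaries are machine-specific (see the Multi-node gotcha), such a check would be run on each node separately, for example `bash preflight.sh` on both the Mac Mini and the MacBook; it prints one line per step and exits non-zero if either step is missing.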
## Quick self-test

After install and restart:

```
Sutando, open Safari and navigate to https://github.com/sonichi/sutando
```

You should see Claude invoke `mcp__macos-use__open_application_and_traverse` with `identifier: "Safari"`, then `type_and_traverse` into the URL bar, then `press_key_and_traverse` with `Return`.

## Related

- Research + decision memo: `notes/issue-65-computer-use-research.md`
- Issue: [#65 Add Claude Computer Use support](https://github.com/sonichi/sutando/issues/65)
- Upstream: https://github.com/mediar-ai/mcp-server-macos-use
More from sonichi/sutando
- `agent-registry`: Local Agent Registry, a standalone, dependency-free service that tracks running Claude Code (and other) agent instances. Agents self-register on startup and heartbeat while alive; the Electron overlay and Sutando dashboard read the live list. Use when you need to know which coding agents are running, where, and since when.
- `bot2bot-post`: Post a coordination message from this bot to the shared bot2bot channel, @-mentioning the other Sutando node.
- `claude-codex`: Bash wrapper around the local Codex CLI for non-interactive runs from inside Sutando (bridges, cron, scripts). For interactive code review or task hand-off from this Claude Code session, prefer the official `/codex:*` plugin commands; this skill is the file-bridge-compatible path that `discord-bridge.py` invokes for team-tier sandboxed delegation.
- `claude-gemini`: Use the local Gemini CLI from Claude Code with the user's existing Gemini authentication or API configuration. Use for large-context repo scans, multimodal analysis, second-opinion planning, or structured Gemini runs in the current workspace.
- `claude-router`: Choose between the local Codex CLI and Gemini CLI from Claude Code. Use for automatic model selection when the user wants the best local delegate for code review, repo-wide analysis, planning, or implementation.
- `cross-node-sync`: Rsync-over-ssh sync between Sutando nodes (Mac Studio and MacBook) for shared memory + notes. Optional: core runs fine without it, and it enables automatic cross-bot learning and note propagation by running from the proactive-loop cron on each pass.
- `deal-finder`: Scan configured sources (Craigslist now; eBay + Facebook Marketplace planned) for used-item listings matching the owner's criteria. Currently configured for a Mac mini search (M2+, 16GB+, 512GB+, ≤$500, near 94566). Notify owner via SMS + Telegram on a match.
- `electron-overlay-dimming`: Reusable pattern for focus-based auto-dimming of Electron overlay windows: when the app loses focus, all overlay windows fade to a low opacity; when an overlay regains focus, they return to their configured opacity. Use when building always-on-top Electron overlays that should recede while the user works in other apps.
- `gemini-tts`: Render text to mp3 via Google Gemini Flash TTS. Free-tier eligible (1500 req/day). Use for video narration, demo voiceovers, audio notes. Parallels openai-tts; default for make-viral-video.
- `macos-tools`: macOS native integrations: screen capture, calendar, reminders, contacts, email (Mail.app), Spotlight search. Use when the user asks about their screen, schedule, to-do list, contacts, or wants to send email on macOS.