app-control
$ npx mdskill add vellum-ai/vellum-assistant/app-control
SKILL.md (.github/skills/app-control)
---
name: app-control
description: Drive a specific named macOS app via raw input bypassing the Accessibility tree
compatibility: "Designed for Vellum personal assistants"
metadata:
  emoji: "🎯"
  vellum:
    display-name: "App Control"
    feature-flag: "app-control"
    activation-hints:
      - "User explicitly directs the assistant to drive a specific named app via raw input (emulator, game, OpenGL canvas, custom-rendered Electron app)"
      - "User says the macOS Accessibility tree is unhelpful or empty for the target app"
    avoid-when:
      - "Task can be done via the computer-use skill (general macOS UI navigation)"
      - "Task can be done via a CLI / API alternative"
---
This skill exposes the `app_control_*` proxy tools for driving a single
named macOS application via raw input — keyboard, mouse, screenshot — that
bypasses the system Accessibility tree. Use it only when explicitly directed
to a specific app where the AX tree is unhelpful (emulators, games, OpenGL
canvases, custom-rendered Electron apps). For general macOS UI navigation
prefer the `computer-use` skill.
Tools in this skill are proxy tools — execution is forwarded to the connected
macOS client, never handled locally by the assistant.
## Cadence
Take 2-3 actions per turn, then yield with a short narration so the user can
interject. Do not chain long sequences without surfacing what you are doing.
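
One way to picture this cadence is to split a planned list of actions into per-turn batches. The sketch below is illustrative only: `batch_actions` and the action names are hypothetical and not part of the skill, and the batch size of 3 is simply the upper end of the "2-3 actions" guidance.

```python
from typing import List, Sequence


def batch_actions(actions: Sequence[str], max_per_turn: int = 3) -> List[List[str]]:
    """Split planned actions into batches of at most `max_per_turn` items.

    Each batch is meant to be followed by a short narration to the user
    before the next batch starts.
    """
    if max_per_turn < 1:
        raise ValueError("max_per_turn must be at least 1")
    return [
        list(actions[i:i + max_per_turn])
        for i in range(0, len(actions), max_per_turn)
    ]


# Hypothetical plan: seven steps become three turns (3 + 3 + 1).
plan = ["observe", "press up", "press up", "press return", "observe", "type name", "stop"]
turns = batch_actions(plan)
```
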
## Always observe before acting
Call `app_control_observe` before your first input action whenever the screen
state matters (e.g. you need to know what is on screen, where a UI element is,
or whether the app is even running). Re-observe after actions that may have
moved the window or changed visibility.
`observe` waits a short settle delay (default ~200ms) before capturing so the
target app and the WindowServer can flush pending input and composite a fresh
frame. If the captured screenshot looks one input behind the latest state
(common with emulators or other slow-feedback apps), pass a larger
`settle_ms`. For static UIs where you just want a quick snapshot, pass
`settle_ms: 0` to skip the wait.
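
A minimal sketch of how a caller might react to a screenshot that looks one input behind: re-observe with a growing settle delay until the frame differs from the previous one. Everything here is an assumption made for illustration. `observe` stands in for a hypothetical wrapper around `app_control_observe` that takes a settle delay and returns screenshot bytes, and the delay ladder of 200, 500 and 1000 ms is arbitrary; only `settle_ms` and the ~200ms default come from the text above.

```python
from typing import Callable, Optional, Sequence, Tuple


def observe_until_changed(
    observe: Callable[[int], bytes],
    previous: Optional[bytes],
    settle_steps_ms: Sequence[int] = (200, 500, 1000),
) -> Tuple[bytes, int]:
    """Observe with increasing settle delays until the frame changes.

    Returns the last captured frame and the settle delay used for it.
    If `previous` is None, the first capture is accepted as-is.
    """
    frame = b""
    used = 0
    for settle_ms in settle_steps_ms:
        frame = observe(settle_ms)  # hypothetical wrapper, see lead-in
        used = settle_ms
        if previous is None or frame != previous:
            break  # the frame reflects a new state; stop waiting longer
    return frame, used
```

Byte-for-byte comparison is a crude change detector; an app with animation or a blinking cursor will always look "changed", so treat this as a starting point rather than a reliable check.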
## Input choice
- Prefer `app_control_sequence` over multiple back-to-back `app_control_press`
calls when sending an ordered batch of presses (e.g. menu navigation,
repeated movement). Sequence runs in a single round-trip — the target app is
activated once at the start and the keys are sent serially without any
window for keyboard focus to drift to another app between presses. Each step
may carry its own `duration_ms` (hold) and `gap_ms` (pause after).
- Prefer `app_control_combo` over rapid sequential `app_control_press` for
simultaneous inputs (e.g. cmd+shift+4). `combo` holds every key down at once;
sequential presses release each key before the next one goes down, so the
target app never sees the keys held together.
- Use `app_control_type` for literal text into a focused field.
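
To make the per-step timing concrete, the sketch below builds a list of steps for a sequence call and estimates how long it will take to play back. Only `duration_ms` and `gap_ms` are named in the text above; the `key` field name, the 50 ms defaults, and the helper functions are hypothetical, so check the actual tool schema before relying on any of them.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Iterable, List, Union


@dataclass
class Step:
    key: str                # hypothetical field name for the key to press
    duration_ms: int = 50   # how long the key is held (assumed default)
    gap_ms: int = 50        # pause after the key is released (assumed default)


def build_sequence(
    keys: Iterable[str], duration_ms: int = 50, gap_ms: int = 50
) -> List[Dict[str, Union[str, int]]]:
    """Turn an ordered list of key names into step dictionaries."""
    return [asdict(Step(k, duration_ms, gap_ms)) for k in keys]


def total_time_ms(steps: List[Dict[str, Union[str, int]]]) -> int:
    """Estimate playback time: each step costs its hold plus its gap."""
    return sum(int(s["duration_ms"]) + int(s["gap_ms"]) for s in steps)


# Menu navigation: two presses down, then confirm.
steps = build_sequence(["down", "down", "return"], duration_ms=40, gap_ms=60)
```

An estimate like `total_time_ms(steps)` can help pick a settle delay for the next observation, since slow-feedback apps may need the whole batch to land before the screen is worth capturing.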
## Coordinate caveat
`app_control_click` and `app_control_drag` use **window-relative** coordinates.
The window may move or resize between observation and click — if you are
uncertain whether the window has shifted, re-observe first.
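
The arithmetic behind this caveat can be sketched as follows. The `Bounds` type and both helpers are hypothetical, and the skill is not stated to expose window bounds; the point is only that the same spot on screen maps to different window-relative coordinates once the window moves.

```python
from typing import NamedTuple, Tuple


class Bounds(NamedTuple):
    """Window frame in screen coordinates (hypothetical, for illustration)."""
    x: int
    y: int
    width: int
    height: int


def to_window_relative(screen_x: int, screen_y: int, window: Bounds) -> Tuple[int, int]:
    """Convert a screen point to coordinates relative to the window origin."""
    return screen_x - window.x, screen_y - window.y


def needs_reobserve(observed: Bounds, current: Bounds) -> bool:
    """True if the window moved or resized since the last observation."""
    return observed != current


at_observe = Bounds(x=100, y=50, width=800, height=600)
after_move = Bounds(x=120, y=50, width=800, height=600)

before = to_window_relative(300, 250, at_observe)
after = to_window_relative(300, 250, after_move)
```

Here `before` is `(200, 200)` and `after` is `(180, 200)`: a 20-pixel window shift is enough to miss a small target, which is why re-observing is the safer default when in doubt.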
## App targeting
Use bundle IDs (e.g. `com.example.app`) when possible — they are the most
reliable identifier. Fall back to localized process names if a bundle ID is
unavailable.
## Ending the session
Call `app_control_stop` when you are done. Do **not** auto-quit the controlled
app — `stop` only ends the app-control session, leaving the app running.