web-browser

$npx mdskill add megalithic/dotfiles/web-browser

Automates web page interactions by connecting to a running browser with authenticated sessions via agent-browser CLI.

  • Enables agents to perform browser tasks without manual intervention, such as accessing logged-in accounts.
  • Integrates with agent-browser CLI and requires a browser running on port 9222 for remote debugging.
  • Decides actions by checking for existing tabs to avoid disrupting user work and opens new ones only when necessary.
  • Presents results through command-line outputs and maintains session continuity by reusing authenticated browser connections.
SKILL.md
.github/skills/web-browserView on GitHub ↗
---
name: web-browser
description: "Interact with web pages using agent-browser CLI. MUST run 'browser connect 9222' FIRST to use existing browser with authenticated sessions."
---

# Web Browser Skill

Browser automation using `agent-browser` CLI connected to your running browser.

## 🚨 MANDATORY FIRST STEP

**EVERY browser session MUST start with:**

```bash
browser connect 9222
```

This connects to your running browser with all authenticated sessions (Asana, Figma, GitHub, etc.).

**WITHOUT THIS STEP:**
- Commands will fail or timeout
- You'll get isolated sessions without logins
- User will have to re-authenticate everything

## ⚠️ CRITICAL REQUIREMENTS

### 1. ALWAYS connect to port 9222 FIRST

Before ANY browser operation, you MUST connect to the remote debugging port:

```bash
browser connect 9222
```

This is REQUIRED for accessing authenticated sessions. Without this step, commands will fail or create isolated sessions without your logins.

### 2. NEVER take over existing tabs

When navigating to a URL:
- First check if tab already exists: `browser tab list`
- If found, switch to it: `browser tab <index>`
- If NOT found, open a NEW tab: `browser open <url>`

**NEVER navigate an existing tab to a different URL** - this destroys the user's work/context.

## Correct workflow

```bash
# 1. ALWAYS connect first (required every session)
browser connect 9222

# 2. Check for existing tab
browser tab list

# 3a. If tab exists for your URL, switch to it
browser tab 14

# 3b. If tab doesn't exist, open NEW tab
browser open https://app.asana.com/...

# 4. Interact
browser snapshot -i
browser click @e5
```

## Check if browser is listening

```bash
lsof -i :9222 -sTCP:LISTEN
```

## Common commands

After connecting, use standard agent-browser commands:

### Navigation & tabs
```bash
browser tab list                    # List all tabs
browser tab 14                      # Switch to tab by index
browser open https://example.com    # Open URL (NEW tab)
browser back                        # Go back
browser reload                      # Reload page
```

### Inspection
```bash
browser snapshot -i                 # Get interactive elements with @refs
browser screenshot                  # Take screenshot
browser get title                   # Get page title
browser get url                     # Get current URL
browser get text @e1                # Get text of element
```

### Interaction
```bash
browser click @e1                   # Click element
browser fill @e2 "search text"      # Clear and type
browser type @e3 "append text"      # Type without clearing
browser select @e4 "option"         # Select dropdown
browser press Enter                 # Press key
browser scroll down 500             # Scroll
```

### Waiting
```bash
browser wait @e1                    # Wait for element
browser wait 2000                   # Wait milliseconds
```

## Tab targeting by URL

Instead of remembering tab numbers, find tabs by URL:

```bash
browser tab list | rg -i asana
browser tab list | rg -i localhost:4000
```

## Notes

- Tabs are numbered by CDP, not visual order in browser
- `snapshot -i` gives @refs like @e1, @e2 for clicking
- After page changes (navigation, clicks), re-run `snapshot -i`
- Your browser must be running with `--remote-debugging-port=9222`
More from megalithic/dotfiles