Name: recording
Author: BuilderIO/agent-native

$ npx mdskill add BuilderIO/agent-native/recording

Manage screen and camera recording with permission handling and chunked upload.

Enables agents to capture media streams and handle permission prompts.
Integrates with custom API routes for binary streaming and server-side finalization.
Starts recordings from a user click or an agent-written record-intent, and reports progress through navigation state.
Delivers results by populating recording metadata and advancing the recording's status.

SKILL.md

.github/skills/recording

---
name: recording
description: >-
  How screen and camera recording works in Clips — MediaRecorder lifecycle,
  chunked upload, permission handling, pause/resume, camera bubble overlay,
  and error recovery. Use when adding or modifying the recorder UI, the
  upload endpoint, or permission prompts.
---

# Recording

## When to use

Reach for this skill any time you touch the recorder: the record button, the in-progress toolbar, permission prompts, the chunked upload flow, or the camera bubble. If you're adding support for a new source (e.g. tab capture, iPhone Continuity Camera) or changing how chunks are finalized server-side, this is your map.

## Data model touched

- **`recordings`** — the row gets created as soon as the user presses Record. `status` transitions `uploading` → `processing` → `ready` (or `failed`). `videoUrl`, `durationMs`, `videoSizeBytes`, `width`, `height`, `hasAudio`, `hasCamera` are populated as the upload streams in and once the server probes the stitched file during finalization.
- **`application_state.record-intent`** — the agent writes this when it wants to start a recording. The UI reads and clears it, then prompts for permission.
- **`application_state.navigation`** — set to `{ view: "record" }` while the recorder is active.
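
The `status` transitions above form a small state machine. The sketch below encodes them so a handler can reject illegal moves. Only the four status values come from this skill; the helper names are hypothetical, and treating `failed` as reachable from any non-terminal state (with `ready` and `failed` both terminal) is an assumption.

```typescript
// Status values from the `recordings` row; everything else here is illustrative.
export type RecordingStatus = "uploading" | "processing" | "ready" | "failed";

// Assumed transition table: forward progress, plus `failed` from any
// non-terminal state. `ready` and `failed` have no outgoing moves.
const NEXT: Record<RecordingStatus, readonly RecordingStatus[]> = {
  uploading: ["processing", "failed"],
  processing: ["ready", "failed"],
  ready: [],
  failed: [],
};

// True if the row may move from `from` to `to`.
export function canTransition(from: RecordingStatus, to: RecordingStatus): boolean {
  return NEXT[from].includes(to);
}

// True if no further status changes are expected.
export function isTerminal(status: RecordingStatus): boolean {
  return NEXT[status].length === 0;
}
```

A server-side finalize step could call `canTransition(row.status, "processing")` before writing, so that a duplicate `/api/uploads/complete` request cannot move a `ready` row backwards.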

Uploads hit the **custom API** routes (`/api/uploads/chunk`, `/api/uploads/complete`) rather than actions, because actions aren't the right tool for binary streaming bodies. See `server-plugins` for why.
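
The chunk route identifies each body by the `X-Recording-Id` and `X-Chunk-Index` headers. Below is a minimal sketch of validating them before the body is read. It uses the standard Fetch `Headers` type because this skill doesn't say which server framework the route runs on. `parseChunkHeaders` and `ChunkMeta` are hypothetical names.

```typescript
// Hypothetical shape of the metadata carried in the chunk headers.
export interface ChunkMeta {
  recordingId: string;
  chunkIndex: number;
}

// Validate the two headers sent with every chunk. Header lookup via
// `Headers.get` is case-insensitive, so "x-chunk-index" also matches.
export function parseChunkHeaders(headers: Headers): ChunkMeta {
  const recordingId = headers.get("X-Recording-Id");
  const rawIndex = headers.get("X-Chunk-Index");
  if (!recordingId) throw new Error("missing X-Recording-Id");
  // Accept only non-negative integers such as "0", "1", "42".
  if (rawIndex === null || !/^\d+$/.test(rawIndex)) {
    throw new Error("X-Chunk-Index must be a non-negative integer");
  }
  return { recordingId, chunkIndex: Number(rawIndex) };
}
```

Rejecting a malformed index up front keeps a bad request from being stored under a key the stitching step will never look up.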

## Lifecycle

1. **Intent.** Either the user clicks Record (or presses the global shortcut `Cmd+Shift+L`), or the agent runs `pnpm action start-recording --mode=screen`. The agent path writes `record-intent` to application state; the UI picks it up and starts the same flow as a user click.
2. **Permission.** Call `navigator.mediaDevices.getDisplayMedia({ video, audio })` for screen and `getUserMedia({ video, audio })` for camera (both for `screen+camera`). Do **not** prompt without a user gesture. The agent path still goes through the UI's Record button, so we never bypass the browser's permission model.
3. **Create row.** As soon as the stream is granted, call `create-recording` to insert the row with `status: "uploading"` and a pre-generated id. That id is used for every subsequent chunk upload.
4. **Record.** Start a `MediaRecorder` with `mimeType: "video/webm;codecs=vp9,opus"`, falling back to `vp8` and then the browser default (check each candidate with `MediaRecorder.isTypeSupported()`). Call `rec.start(2000)`, i.e. a 2000 ms timeslice, so a chunk arrives every 2 s.
5. **Upload each chunk.** `ondataavailable` POSTs the chunk bytes to `/api/uploads/chunk` with headers `X-Recording-Id` and `X-Chunk-Index`. Don't block the recorder on retries: if a chunk still fails after the short backoff retries listed under Error recovery, buffer it in `IndexedDB` and let a background worker re-send it.
6. **Live transcription.** Alongside the MediaRecorder, `useLiveTranscription` runs the Web Speech API to accumulate transcript text in real time. On stop, the client calls `save-browser-transcript` to persist the result immediately — no API key needed.
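7. **Finalize.** On stop, wait for in-flight chunk uploads to settle (including the last chunk, which `MediaRecorder` emits when stopped), then call `/api/uploads/complete`. The server stitches the chunks, probes for duration and dimensions, transitions `status` to `processing`, then kicks off `request-transcript` for higher-quality output (see `ai-video-tools`).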
8. **Navigate.** Once the row is `ready` the UI navigates to `/r/:id`.
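
Steps 3, 5 and 7 imply a server-side contract: chunks are keyed by recording id and index, may arrive out of order, and are concatenated in index order on finalize. The sketch below illustrates that contract with a hypothetical in-memory `ChunkAssembler`. The real route presumably persists chunks to disk or object storage, and the probing and status update are left out.

```typescript
// Hypothetical in-memory stand-in for the server's chunk storage.
export class ChunkAssembler {
  private chunks = new Map<string, Map<number, Uint8Array>>();

  // Step 3: the recording must be registered before any chunk arrives.
  create(recordingId: string): void {
    this.chunks.set(recordingId, new Map());
  }

  // Step 5: chunks can arrive out of order (parallel or retried uploads).
  // Storing by index makes a re-sent chunk overwrite itself, so retries are idempotent.
  addChunk(recordingId: string, index: number, bytes: Uint8Array): void {
    const parts = this.chunks.get(recordingId);
    if (!parts) throw new Error(`unknown recording ${recordingId}`);
    parts.set(index, bytes);
  }

  // Step 7: concatenate chunks 0..n-1 in order, refusing if any index is missing.
  complete(recordingId: string): Uint8Array {
    const parts = this.chunks.get(recordingId);
    if (!parts) throw new Error(`unknown recording ${recordingId}`);
    const count = parts.size === 0 ? 0 : Math.max(...Array.from(parts.keys())) + 1;
    const ordered: Uint8Array[] = [];
    let total = 0;
    for (let i = 0; i < count; i++) {
      const part = parts.get(i);
      if (!part) throw new Error(`missing chunk ${i}`);
      ordered.push(part);
      total += part.length;
    }
    const out = new Uint8Array(total);
    let offset = 0;
    for (const part of ordered) {
      out.set(part, offset);
      offset += part.length;
    }
    this.chunks.delete(recordingId);
    return out;
  }
}
```

Concatenating in index order reproduces the byte stream `MediaRecorder` produced, which is why the client must assign `X-Chunk-Index` in the order chunks are emitted rather than the order uploads finish.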

## Pause / resume

`MediaRecorder.pause()` / `.resume()` are supported in all evergreen browsers. Keep a single `MediaRecorder` instance across pauses — don't tear down the stream, or the permission prompt will fire again. While paused, the upload worker keeps draining its buffer so we catch up before the user stops.
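
A minimal sketch of pause/resume on a single instance. It is typed against a small structural interface (a hypothetical name) so it doesn't depend on DOM typings; a real `MediaRecorder` satisfies it, since its `state` is `"inactive"`, `"recording"` or `"paused"` and `pause()`/`resume()` update that state synchronously.

```typescript
// The subset of MediaRecorder this helper needs.
interface PausableRecorder {
  readonly state: "inactive" | "recording" | "paused";
  pause(): void;
  resume(): void;
}

// Toggle on the *same* instance. Never stop() and create a new recorder here:
// that tears down the capture and would force a fresh permission prompt.
export function togglePause(rec: PausableRecorder): "inactive" | "recording" | "paused" {
  if (rec.state === "recording") rec.pause();
  else if (rec.state === "paused") rec.resume();
  // An "inactive" recorder is left alone; there is nothing to pause.
  return rec.state;
}
```

The toolbar's pause button can call `togglePause(rec)` and render from the returned state, with no need to keep a separate flag in sync.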

## Camera bubble

When mode is `screen+camera`, we composite a circular camera feed in the corner. Render the bubble in a separate `<video>` element and record it into a second `MediaRecorder`; the server side stitches them with ffmpeg.wasm during `processing`. Do **not** try to pre-composite in the browser — that burns GPU and drops frames.
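
This skill doesn't show the server's compositing command. The sketch below is one plausible shape for the ffmpeg arguments, using ffmpeg's standard `scale` and `overlay` filters to place the camera recording in the bottom-right corner of the screen recording. Assumptions: the function name, the 240 px bubble size and 24 px margin, that the screen recording carries the audio, and that both files start at the same moment. The circular mask is omitted, so this draws a square bubble.

```typescript
// Hypothetical builder for ffmpeg arguments that overlay the camera file on the screen file.
export function bubbleOverlayArgs(
  screenFile: string,
  cameraFile: string,
  outFile: string,
  size = 240, // bubble edge in px (assumed)
  margin = 24, // gap from the bottom-right corner in px (assumed)
): string[] {
  // [1:v] is the camera video, [0:v] the screen video.
  // In overlay, W/H are the main video's size and w/h the overlay's size.
  const filter =
    `[1:v]scale=${size}:${size}[cam];` +
    `[0:v][cam]overlay=W-w-${margin}:H-h-${margin}[out]`;
  return [
    "-i", screenFile,
    "-i", cameraFile,
    "-filter_complex", filter,
    "-map", "[out]",
    "-map", "0:a?", // screen recording's audio, if it has any
    outFile,
  ];
}
```

With `@ffmpeg/ffmpeg` 0.12, an argument array like this would be passed to `ffmpeg.exec(...)` after the inputs have been written to its virtual file system.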

## Error recovery

| Failure                        | Handling                                                                    |
| ------------------------------ | --------------------------------------------------------------------------- |
| Permission denied              | Mark the recording row `status: "failed"`, `failureReason: "permission"`.   |
| Chunk upload fails (5xx)       | Retry 3× with backoff; if still failing, park the chunk in IndexedDB.       |
| `MediaRecorder` error event    | Stop, finalize what we have, set `failureReason`; let the user retry.       |
| User closes tab mid-recording  | On reload, check for unflushed chunks in IndexedDB and resume upload.       |
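
The chunk-upload and tab-close rows can be sketched together: retry with exponential backoff, park the chunk if it still fails, and drain parked chunks later. `send`, `ChunkStore` and both function names are hypothetical seams. In the browser, `send` would wrap `fetch("/api/uploads/chunk", ...)` and `ChunkStore` would wrap IndexedDB. Reading "Retry 3×" as three attempts in total is an assumption.

```typescript
// A chunk waiting to be (re-)sent.
export interface ParkedChunk {
  recordingId: string;
  index: number;
  bytes: Uint8Array;
}

// Hypothetical persistence seam; an IndexedDB object store in the browser.
export interface ChunkStore {
  put(chunk: ParkedChunk): Promise<void>;
  all(): Promise<ParkedChunk[]>;
  remove(recordingId: string, index: number): Promise<void>;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Try `attempts` times with delays of baseDelayMs, 2x, 4x, ... between tries,
// then park the chunk for a background re-send.
export async function uploadWithRetry(
  chunk: ParkedChunk,
  send: (chunk: ParkedChunk) => Promise<boolean>, // true on 2xx, false on 5xx
  store: ChunkStore,
  attempts = 3,
  baseDelayMs = 500,
): Promise<"sent" | "parked"> {
  for (let i = 0; i < attempts; i++) {
    // A rejected promise (network error) counts as a failed attempt.
    if (await send(chunk).catch(() => false)) return "sent";
    if (i < attempts - 1) await sleep(baseDelayMs * 2 ** i);
  }
  await store.put(chunk);
  return "parked";
}

// On reload or from a background worker: re-send everything parked and
// remove each chunk only once it has been accepted. Returns how many were flushed.
export async function flushParked(
  send: (chunk: ParkedChunk) => Promise<boolean>,
  store: ChunkStore,
): Promise<number> {
  let flushed = 0;
  for (const chunk of await store.all()) {
    if (await send(chunk).catch(() => false)) {
      await store.remove(chunk.recordingId, chunk.index);
      flushed++;
    }
  }
  return flushed;
}
```

Because each parked chunk keeps its original index, a late re-send lands in the right slot when the server stitches, regardless of when it arrives.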

## Code sketch

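```ts
// app/hooks/use-recorder.ts

// The app's action client. Declared here (signature assumed) so the sketch
// type-checks; use the project's real import instead.
declare function callAction<T>(name: string, args: Record<string, unknown>): Promise<T>;

type Mode = "screen" | "camera" | "screen+camera";

// Prefer VP9, then VP8; returning undefined lets the browser pick its default.
function pickMimeType(): string | undefined {
  const candidates = ["video/webm;codecs=vp9,opus", "video/webm;codecs=vp8,opus"];
  return candidates.find((type) => MediaRecorder.isTypeSupported(type));
}

export function useRecorder() {
  // Must be called from a user gesture (e.g. the Record button's click handler).
  const start = async (mode: Mode) => {
    // "screen+camera" also needs a getUserMedia stream and a second recorder
    // for the camera bubble; omitted here for brevity.
    const stream =
      mode === "camera"
        ? await navigator.mediaDevices.getUserMedia({ video: true, audio: true })
        : await navigator.mediaDevices.getDisplayMedia({ video: true, audio: true });

    // The row must exist before the first chunk is sent.
    const { id } = await callAction<{ id: string }>("create-recording", {
      title: "Untitled recording",
    });

    const mimeType = pickMimeType();
    const rec = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);
    let chunkIndex = 0;
    const inFlight: Promise<unknown>[] = [];

    rec.ondataavailable = (e) => {
      if (!e.data.size) return;
      // Assign the index synchronously so it reflects emission order,
      // even if uploads finish out of order.
      const index = chunkIndex++;
      inFlight.push(
        fetch("/api/uploads/chunk", {
          method: "POST",
          headers: {
            "X-Recording-Id": id,
            "X-Chunk-Index": String(index),
            "Content-Type": "application/octet-stream",
          },
          body: e.data,
        }),
      );
    };

    rec.onstop = async () => {
      // Release the capture so the browser's sharing indicator goes away.
      stream.getTracks().forEach((track) => track.stop());
      // The final chunk is emitted just before "stop"; wait for every upload
      // to settle before finalizing (failed ones go to the retry/park path).
      await Promise.allSettled(inFlight);
      await fetch("/api/uploads/complete", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ id }),
      });
    };

    rec.start(2000); // 2000 ms timeslice: one chunk every 2 s
    return { id, stop: () => rec.stop(), pause: () => rec.pause(), resume: () => rec.resume() };
  };

  return { start };
}
```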

## Rules

- **Never** start a `MediaRecorder` without a user gesture (or a user-initiated `record-intent`).
- **Never** re-prompt for permissions on pause/resume — reuse the stream.
- **Never** fire uploads from the main thread for long recordings: anything longer than ~60 s should upload from a web worker.
- The `recordings` row must exist **before** the first chunk is sent.
- On every lifecycle change, update `navigation` (from `{ view: "record" }` to `{ view: "recording", recordingId }`) so the agent can see what's happening.
- All AI generation during or after recording goes through the agent chat (see `ai-video-tools`).

## Related skills

- `ai-video-tools` — transcription kicks off when upload completes.
- `video-editing` — after recording, users edit via non-destructive `editsJson`.
- `server-plugins` — why the upload is an `/api/` route, not an action.
- `real-time-sync` — how the UI learns about `status` transitions from `uploading` → `ready`.