recording

$npx mdskill add BuilderIO/agent-native/recording

Manage screen and camera recording with permission and chunked upload.

  • Enables agents to capture media streams and handle permission prompts.
  • Integrates with custom API routes for binary streaming and server-side finalization.
  • Decides execution based on user intent flags and navigation state changes.
  • Delivers results by populating recording metadata and transitioning status states.
SKILL.md
.github/skills/recordingView on GitHub ↗
---
name: recording
description: >-
  How screen and camera recording works in Clips — MediaRecorder lifecycle,
  chunked upload, permission handling, pause/resume, camera bubble overlay,
  and error recovery. Use when adding or modifying the recorder UI, the
  upload endpoint, or permission prompts.
---

# Recording

## When to use

Reach for this skill any time you touch the recorder: the record button, the in-progress toolbar, permission prompts, chunked upload flow, or the camera bubble. If you're adding support for a new source (e.g. tab capture, iPhone continuity camera) or changing how chunks are finalized server-side, this is your map.

## Data model touched

- **`recordings`** — the row gets created as soon as the user presses Record. `status` transitions `uploading` → `processing` → `ready` (or `failed`). `videoUrl`, `durationMs`, `videoSizeBytes`, `width`, `height`, `hasAudio`, `hasCamera` are populated as the upload streams in.
- **`application_state.record-intent`** — the agent writes this when it wants to start a recording. The UI reads and clears it, then prompts for permission.
- **`application_state.navigation`** — set to `{ view: "record" }` while the recorder is active.

Uploads hit the **custom API** routes (`/api/uploads/chunk`, `/api/uploads/complete`) rather than actions, because actions aren't the right tool for binary streaming bodies. See `server-plugins` for why.

## Lifecycle

1. **Intent.** Either the user clicks Record (global `Cmd+Shift+L`) or the agent calls `pnpm action start-recording --mode=screen`. The agent version writes `record-intent` to application state; the UI picks it up and initiates the same flow as a user click.
2. **Permission.** Call `navigator.mediaDevices.getDisplayMedia({ video, audio })` for screen, `getUserMedia({ video, audio })` for camera. Do **not** prompt without a user gesture. The agent path relies on the UI's button — we never bypass the browser's permission model.
3. **Create row.** As soon as the stream is granted, call `create-recording` to insert the row with `status: "uploading"` and a pre-generated id. That id is used for every subsequent chunk upload.
4. **Record.** Start a `MediaRecorder` with `mimeType: "video/webm;codecs=vp9,opus"` (fallback to vp8, then browser default). Use `timeslice: 2000` so chunks arrive every 2s.
5. **Upload each chunk.** `ondataavailable` POSTs the chunk bytes to `/api/uploads/chunk` with headers `X-Recording-Id` and `X-Chunk-Index`. Don't retry inline — buffer failed chunks in `IndexedDB` and let a background worker re-send.
6. **Live transcription.** Alongside the MediaRecorder, `useLiveTranscription` runs the Web Speech API to accumulate transcript text in real time. On stop, the client calls `save-browser-transcript` to persist the result immediately — no API key needed.
7. **Finalize.** On stop, call `/api/uploads/complete`. Server stitches chunks, probes for duration/dimensions, transitions `status` to `processing`, then kicks off `request-transcript` for higher-quality output (see `ai-video-tools`).
8. **Navigate.** Once the row is `ready` the UI navigates to `/r/:id`.

## Pause / resume

`MediaRecorder.pause()` / `.resume()` are supported in all evergreen browsers. Keep a single `MediaRecorder` instance across pauses — don't tear down the stream, or the permission prompt will fire again. While paused, the upload worker keeps draining its buffer so we catch up before the user stops.

## Camera bubble

When mode is `screen+camera`, we composite a circular camera feed in the corner. Render the bubble in a separate `<video>` element and record it into a second `MediaRecorder`; the server side stitches them with ffmpeg.wasm during `processing`. Do **not** try to pre-composite in the browser — that burns GPU and drops frames.

## Error recovery

| Failure                        | Handling                                                                    |
| ------------------------------ | --------------------------------------------------------------------------- |
| Permission denied              | Mark the recording row `status: "failed"`, `failureReason: "permission"`.   |
| Chunk upload fails (5xx)       | Retry 3× with backoff; if still failing, park the chunk in IndexedDB.       |
| `MediaRecorder` error event    | Stop, finalize what we have, set `failureReason`; let the user retry.       |
| User closes tab mid-recording  | On reload, check for unflushed chunks in IndexedDB and resume upload.       |

## Code sketch

```ts
// app/hooks/use-recorder.ts
export function useRecorder() {
  const start = async (mode: "screen" | "camera" | "screen+camera") => {
    const stream =
      mode === "camera"
        ? await navigator.mediaDevices.getUserMedia({ video: true, audio: true })
        : await navigator.mediaDevices.getDisplayMedia({ video: true, audio: true });

    const { id } = await callAction("create-recording", { title: "Untitled recording" });

    const rec = new MediaRecorder(stream, { mimeType: "video/webm;codecs=vp9,opus" });
    let chunkIndex = 0;
    rec.ondataavailable = async (e) => {
      if (!e.data.size) return;
      await fetch("/api/uploads/chunk", {
        method: "POST",
        headers: {
          "X-Recording-Id": id,
          "X-Chunk-Index": String(chunkIndex++),
          "Content-Type": "application/octet-stream",
        },
        body: e.data,
      });
    };
    rec.onstop = async () => {
      await fetch("/api/uploads/complete", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ id }),
      });
    };
    rec.start(2000);
    return { id, stop: () => rec.stop(), pause: () => rec.pause(), resume: () => rec.resume() };
  };

  return { start };
}
```

## Rules

- **Never** start a `MediaRecorder` without a user gesture (or a user-initiated `record-intent`).
- **Never** re-prompt for permissions on pause/resume — reuse the stream.
- **Never** fire the upload from the main thread if the chunks are large — prefer a web worker for anything longer than ~60s.
- The `recordings` row must exist **before** the first chunk is sent.
- On every lifecycle change, write `navigation` → `{ view: "record" }` → `{ view: "recording", recordingId }` so the agent can see what's happening.
- All AI generated during/after recording goes through the agent chat — see `ai-video-tools`.

## Related skills

- `ai-video-tools` — transcription kicks off when upload completes.
- `video-editing` — after recording, users edit via non-destructive `editsJson`.
- `server-plugins` — why the upload is an `/api/` route, not an action.
- `real-time-sync` — how the UI learns about `status` transitions from `uploading` → `ready`.
More from BuilderIO/agent-native