---
name: ai-media
description: "Generates images, videos, and audio via AI models (fal-ai MCP): cheap iteration models, expensive production finals, cost-estimate before generation. Trigger for 'generate an image', 'create a thumbnail', 'make a voiceover', 'AI video', 'text to speech for'. Not for design composition; use /ai-visual instead. Not for animation specs; use /ai-animation instead."
effort: mid
argument-hint: "image|video|audio [description]"
mode: agent
tags: [media, generation, fal-ai]
requires:
  mcp:
    - fal-ai
model_tier: sonnet
mirror_family: copilot-skills
generated_by: ai-eng sync
canonical_source: .claude/skills/ai-media/SKILL.md
edit_policy: generated-do-not-edit
---
# Media
## Purpose
Generate images, videos, and audio using fal.ai models via MCP. Progressive quality pattern: iterate cheap, finalize expensive. Covers text-to-image, text/image-to-video, text-to-speech, and video-to-audio.
## When to Use
- `image`: generating images from text prompts (thumbnails, hero images, insert shots)
- `video`: creating videos from text or images (demos, b-roll, social clips)
- `audio`: generating speech, music, or sound effects (voiceover, background music, SFX)
## Process
### Step 1 -- Gate Check (MCP Required)
Verify the fal.ai MCP server is configured. If it is not available, tell the user and provide this server entry for their MCP client configuration:
```json
"fal-ai": {
  "command": "npx",
  "args": ["-y", "fal-ai-mcp-server"],
  "env": { "FAL_KEY": "YOUR_FAL_KEY_HERE" }
}
```
Get an API key at [fal.ai](https://fal.ai).
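The gate check can be sketched as a small predicate over the MCP server mapping, assuming the config has already been parsed into a dict of server name to server entry shaped like the `fal-ai` entry shown above. `fal_mcp_configured` is a hypothetical helper, and it only inspects the entry's `env` block; a `FAL_KEY` exported in the shell would not be seen by it.

```python
from typing import Any, Dict

# Placeholder value from the setup snippet; an unedited copy means setup is incomplete.
PLACEHOLDER_KEY = "YOUR_FAL_KEY_HERE"


def fal_mcp_configured(servers: Dict[str, Any]) -> bool:
    """Return True if `servers` holds a usable "fal-ai" entry.

    Hypothetical helper: `servers` is assumed to map server names to
    entries shaped like the fal-ai snippet in Step 1.
    """
    entry = servers.get("fal-ai")
    if not isinstance(entry, dict) or not entry.get("command"):
        return False
    # Only the entry's own env block is checked, not the shell environment.
    key = (entry.get("env") or {}).get("FAL_KEY", "")
    return bool(key) and key != PLACEHOLDER_KEY
```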
### Step 2 -- Estimate Cost
Before generating, always check estimated cost:
```
estimate_cost(model_name: "fal-ai/...", input: {...})
```
Inform the user of the estimate before proceeding with expensive generations (video especially).
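One way to decide when to pause for approval, sketched under two assumptions: the estimate has already been converted to a USD float, and `budget_usd` is a hypothetical per-generation threshold (not a fal.ai setting). The tiers are copied from the Model Table in this skill; unknown models are treated as High so they always get confirmed.

```python
# Cost tiers copied from the Model Table in this skill.
COST_TIER = {
    "fal-ai/nano-banana-2": "Low",
    "fal-ai/nano-banana-pro": "Medium",
    "fal-ai/seedance-1-0-pro": "High",
    "fal-ai/kling-video/v3/pro": "High",
    "fal-ai/veo-3": "High",
    "fal-ai/csm-1b": "Low",
    "fal-ai/thinksound": "Medium",
}


def needs_confirmation(model_name: str, estimated_usd: float, budget_usd: float = 0.50) -> bool:
    """Return True when the estimate should be shown and approved before generating.

    `budget_usd` is a hypothetical threshold chosen for illustration.
    """
    if estimated_usd > budget_usd:
        return True
    # High-tier models (all video models here) are always confirmed;
    # unknown models default to High to stay on the safe side.
    return COST_TIER.get(model_name, "High") == "High"
```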
### Step 3 -- ElevenLabs Gate Check
Before using ElevenLabs, verify `ELEVENLABS_API_KEY` is set. If not, fall back to csm-1b or inform the user.
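The ElevenLabs gate reduces to a backend choice. This sketch assumes the key is read from the process environment (or an injected mapping for testing); `pick_tts_backend` is a hypothetical helper, and the string `"elevenlabs"` is just a label, not a model ID.

```python
import os
from typing import Mapping, Optional


def pick_tts_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "elevenlabs" when its API key is set, else the fal.ai fallback."""
    env = os.environ if env is None else env
    if env.get("ELEVENLABS_API_KEY", "").strip():
        return "elevenlabs"
    # No ElevenLabs credentials: fall back to csm-1b (or tell the user).
    return "fal-ai/csm-1b"
```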
### Step 4 -- Generate with Progressive Quality
Start with cheaper models for prompt iteration, then switch to production models for finals.
### Step 5 -- Deliver
Provide the generated media with:
- file path or URL
- model used and parameters
- cost incurred
- suggestions for iteration if quality is not satisfactory
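The four delivery items can be kept together in one record so none is forgotten. `DeliveryReport` and its field names are hypothetical; `render()` produces a bullet list in the same style as this skill.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DeliveryReport:
    """Items Step 5 asks for; field names are illustrative only."""
    location: str                    # file path or URL
    model: str                       # model used
    params: Dict[str, Any]           # parameters passed to the model
    cost_usd: float                  # cost incurred
    suggestions: List[str] = field(default_factory=list)  # ideas if quality falls short

    def render(self) -> str:
        lines = [
            f"- Output: {self.location}",
            f"- Model: {self.model}",
            f"- Parameters: {self.params}",
            f"- Cost: ${self.cost_usd:.2f}",
        ]
        if self.suggestions:
            lines.append("- If quality is not satisfactory:")
            lines.extend(f"  - {s}" for s in self.suggestions)
        return "\n".join(lines)
```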
## Quick Reference
### Model Table
| Model | Type | Best For | Cost Tier |
| --------------------------- | ----- | ----------------------------------------------- | --------- |
| `fal-ai/nano-banana-2` | Image | Quick iterations, drafts, image editing | Low |
| `fal-ai/nano-banana-pro` | Image | Production images, realism, typography | Medium |
| `fal-ai/seedance-1-0-pro` | Video | Text-to-video, image-to-video, high motion | High |
| `fal-ai/kling-video/v3/pro` | Video | Text/image-to-video with native audio | High |
| `fal-ai/veo-3` | Video | Video with generated sound, high visual quality | High |
| `fal-ai/csm-1b` | Audio | Conversational text-to-speech | Low |
| `fal-ai/thinksound` | Audio | Video-to-audio (matching sounds from video) | Medium |
### Image Parameters
| Param | Type | Options | Notes |
| ---------------- | ------ | ---------------------------------------------------------------------------- | -------------------------------------------------------- |
| `prompt` | string | required | Describe what you want |
| `image_size` | string | `square`, `portrait_4_3`, `landscape_16_9`, `portrait_16_9`, `landscape_4_3` | Aspect ratio |
| `num_images` | number | 1-4 | How many to generate |
| `seed` | number | any integer | Reproducibility |
| `guidance_scale` | number | 1-20 | How closely to follow the prompt (higher = more literal) |
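A pre-flight check built only from the table above can catch bad inputs before any cost is incurred. This is a hypothetical helper; individual models may accept different ranges, so confirm with the `find` tool before relying on these limits.

```python
from typing import Any, Dict, List

# Allowed values copied from the Image Parameters table.
IMAGE_SIZES = {"square", "portrait_4_3", "landscape_16_9", "portrait_16_9", "landscape_4_3"}


def validate_image_input(params: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    errors = []
    prompt = params.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        errors.append("prompt is required")
    if "image_size" in params and params["image_size"] not in IMAGE_SIZES:
        errors.append(f"unknown image_size: {params['image_size']!r}")
    n = params.get("num_images", 1)  # treat a missing value as a single image
    if not isinstance(n, int) or not 1 <= n <= 4:
        errors.append("num_images must be an integer from 1 to 4")
    if "seed" in params and not isinstance(params["seed"], int):
        errors.append("seed must be an integer")
    g = params.get("guidance_scale")
    if g is not None and not (isinstance(g, (int, float)) and 1 <= g <= 20):
        errors.append("guidance_scale must be between 1 and 20")
    return errors
```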
### Video Parameters
| Param | Type | Options | Notes |
| -------------- | ------ | --------------------------- | ------------------------------- |
| `prompt` | string | required | Describe the video |
| `duration` | string | `"5s"`, `"10s"` | Video length |
| `aspect_ratio` | string | `"16:9"`, `"9:16"`, `"1:1"` | Frame ratio |
| `seed` | number | any integer | Reproducibility |
| `image_url` | string | URL | Source image for image-to-video |
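The same idea applies to video inputs, and the presence of `image_url` also tells you which mode you are in. This is a hypothetical sketch; the URL-scheme check assumes `upload` returns an http(s) URL for local files.

```python
from typing import Any, Dict, List

# Allowed values copied from the Video Parameters table.
DURATIONS = {"5s", "10s"}
ASPECT_RATIOS = {"16:9", "9:16", "1:1"}


def validate_video_input(params: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    errors = []
    prompt = params.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        errors.append("prompt is required")
    if "duration" in params and params["duration"] not in DURATIONS:
        errors.append(f"duration must be one of {sorted(DURATIONS)}")
    if "aspect_ratio" in params and params["aspect_ratio"] not in ASPECT_RATIOS:
        errors.append(f"aspect_ratio must be one of {sorted(ASPECT_RATIOS)}")
    if "seed" in params and not isinstance(params["seed"], int):
        errors.append("seed must be an integer")
    url = params.get("image_url")
    # Assumption: local files must go through `upload`, which yields an http(s) URL.
    if url is not None and not str(url).startswith(("http://", "https://")):
        errors.append("image_url must be a URL (upload local files first)")
    return errors


def video_mode(params: Dict[str, Any]) -> str:
    """Image-to-video when a source image is given, otherwise text-to-video."""
    return "image-to-video" if params.get("image_url") else "text-to-video"
```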
### MCP Tools Available
| Tool | Purpose |
| --------------- | -------------------------------- |
| `search` | Find available models by keyword |
| `find` | Get model details and parameters |
| `generate` | Run a model with parameters |
| `result`        | Fetch an async generation's output |
| `status` | Check job status |
| `cancel` | Cancel a running job |
| `estimate_cost` | Estimate generation cost |
| `models` | List popular models |
| `upload` | Upload files for use as inputs |
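For long jobs (video especially), `generate`, `status`, `result`, and `cancel` combine into a submit-poll-fetch loop. This is a sketch under loudly stated assumptions: `call_tool` is a hypothetical adapter around your MCP client, and the `request_id` and `status` response fields and the `"COMPLETED"` value are guesses to check against the server's real responses. Some servers may return the output from `generate` directly.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical adapter: call_tool(tool_name, arguments) -> response dict.
CallTool = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def generate_and_wait(call_tool: CallTool, model_name: str, model_input: Dict[str, Any],
                      poll_s: float = 2.0, timeout_s: float = 300.0,
                      sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Submit a job, poll `status`, fetch `result`, and `cancel` on timeout."""
    job = call_tool("generate", {"model_name": model_name, "input": model_input})
    request_id = job["request_id"]  # assumed field name
    waited = 0.0
    while waited < timeout_s:
        state = call_tool("status", {"request_id": request_id})["status"]
        if state == "COMPLETED":  # assumed terminal value
            return call_tool("result", {"request_id": request_id})
        sleep(poll_s)
        waited += poll_s
    # Stop paying for a job we are no longer waiting on.
    call_tool("cancel", {"request_id": request_id})
    raise TimeoutError(f"job {request_id} did not finish within {timeout_s}s")
```

Injecting `sleep` keeps the loop testable without real delays.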
## Progressive Quality Pattern
Iterate on a low-cost model, then finalize on its high-cost counterpart:
- Image: `nano-banana-2` -> `nano-banana-pro`
- Video: `seedance-1-0-pro` -> `veo-3`
- Audio: `csm-1b` -> ElevenLabs

Set `seed` while iterating so results are reproducible; once the composition works, lock the seed and switch to the production model.
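The upgrade step can be sketched as a lookup from draft model to production model that reuses the accepted draft's input. `plan_final` is hypothetical, and so is the idea that the same input dict fits both models; check the production model's parameters with `find`. A seed pins output only within one model, so carrying it over makes the final run reproducible, not identical to the draft.

```python
from typing import Any, Dict, Tuple

# Iteration -> production pairs from the Progressive Quality Pattern.
PRODUCTION_UPGRADE = {
    "fal-ai/nano-banana-2": "fal-ai/nano-banana-pro",
    "fal-ai/seedance-1-0-pro": "fal-ai/veo-3",
    "fal-ai/csm-1b": "elevenlabs",  # not a fal.ai model; needs ELEVENLABS_API_KEY
}


def plan_final(draft_model: str, accepted_input: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Return (production model, input) for the final run.

    `accepted_input` is the input of the draft the user approved, seed included.
    """
    final_model = PRODUCTION_UPGRADE[draft_model]  # KeyError for unknown drafts
    final_input = dict(accepted_input)  # copy so the draft record stays untouched
    if not final_model.startswith("fal-ai/"):
        # The fal.ai `seed` parameter does not apply to the ElevenLabs call.
        final_input.pop("seed", None)
    return final_model, final_input
```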
## Image Editing
Use Nano Banana 2 with an input image for inpainting, outpainting, or style transfer:
```
upload(file_path: "/path/to/image.png")
generate(model_name: "fal-ai/nano-banana-2", input: {
  "prompt": "same scene but in watercolor style",
  "image_url": "<uploaded_url>",
  "image_size": "landscape_16_9"
})
```
For non-MCP integrations (ElevenLabs, VideoDB), follow `handlers/external-apis.md`.
## Common Mistakes
Avoid:
- skipping `estimate_cost` before generating
- using production models for first-pass iteration
- ignoring `seed` while iterating
- choosing pure text-to-video when image-to-video would give more control
- assuming fal.ai access covers ElevenLabs credentials
## Examples
### Example 1 — generate a hero image for a blog post
User: "create a hero image for the blog post about parallel agent planning"
```
/ai-media image hero for parallel agent planning blog
```
Iterates with `nano-banana-2` (cheap), locks composition with `seed`, switches to `nano-banana-pro` for the production final, returns URL + cost.
### Example 2 — voiceover for a demo video
User: "make a 30-second voiceover for the v1.0 demo"
```
/ai-media audio voiceover for v1.0 demo
```
Iterates with `csm-1b` for cheap previews, finalizes with ElevenLabs for production-quality output.
## Integration
Called by: user directly, `/ai-build`, `ai-video-editing` (Layer 5 generated assets). Calls: fal.ai MCP, ElevenLabs API, VideoDB API. See also: `/ai-visual` (composed visuals), `/ai-slides` (deck visuals), `/ai-animation`.
$ARGUMENTS