nano-banana

$npx mdskill add vm0-ai/vm0-skills/nano-banana

Generate and edit images using Google's Gemini models.

  • Creates images from text prompts or modifies existing visuals.
  • Depends on the Gemini API via a dedicated connector.
  • Executes requests based on user intent keywords.
  • Delivers results through the standard generateContent endpoint.

SKILL.md

.github/skills/nano-bananaView on GitHub ↗
---
name: nano-banana
description: Google Gemini image generation (Nano Banana) via the Gemini API. Use when user mentions "Nano Banana", "Gemini image generation", "gemini-3-pro-image", "gemini-2.5-flash-image", or wants to generate/edit images with Google's native image model.
---

# Nano Banana (Gemini Image Generation)

Generate and edit images using Google's Gemini native image models. Supports text-to-image, image editing, and multi-image composition via the standard `generateContent` endpoint.

> Official docs: `https://ai.google.dev/gemini-api/docs/image-generation`

---

## When to Use

Use this skill when you need to:

- Generate images from text prompts
- Edit an existing image with a text instruction (inpaint / restyle / add-remove)
- Compose multiple input images into one output (e.g. put a product into a scene)
- Iterate on an image conversationally with fine-grained control

---

## Prerequisites

Connect the **Nano Banana** connector at [app.vm0.ai/connectors](https://app.vm0.ai/connectors). Enabling the connector provisions `NANO_BANANA_TOKEN` — no Google Cloud account or user-supplied key is required.

> **Troubleshooting:** If requests fail, run `zero doctor check-connector --env-name NANO_BANANA_TOKEN` or `zero doctor check-connector --url https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image:generateContent --method POST`

---

## How to Use

All calls hit `POST https://generativelanguage.googleapis.com/v1beta/models/<model>:generateContent` with header `x-goog-api-key: $NANO_BANANA_TOKEN`. The output image comes back Base64-encoded in `candidates[0].content.parts[*].inline_data.data`.

### 1. Text-to-Image (Flash — fast, cheap default)

Write to `/tmp/nano_banana_request.json`:

```json
{
  "contents": [
    {
      "parts": [
        { "text": "A golden retriever puppy wearing a tiny chef hat, studio lighting, photorealistic" }
      ]
    }
  ]
}
```

```bash
curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image:generateContent" --header "x-goog-api-key: $NANO_BANANA_TOKEN" --header "Content-Type: application/json" -d @/tmp/nano_banana_request.json > /tmp/nano_banana_response.json
```

### 2. Text-to-Image (Pro — highest quality)

```bash
curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" --header "x-goog-api-key: $NANO_BANANA_TOKEN" --header "Content-Type: application/json" -d @/tmp/nano_banana_request.json > /tmp/nano_banana_response.json
```

### 3. Extract and Save the Image

The response contains one or more parts; the image part has `inline_data.mime_type` starting with `image/`. Extract and decode:

```bash
jq -r '.candidates[0].content.parts[] | select(.inline_data != null) | .inline_data.data' /tmp/nano_banana_response.json | base64 -d > /tmp/nano_banana_output.png
```

### 4. Edit an Existing Image (Image-to-Image)

Pass the input image as a second part. Use a local file or URL → Base64:

```bash
base64 -w0 /path/to/input.jpg > /tmp/nano_banana_input_b64.txt
```

Write to `/tmp/nano_banana_request.json`:

```json
{
  "contents": [
    {
      "parts": [
        { "text": "Replace the background with a snowy mountain range at sunset. Keep the subject unchanged." },
        {
          "inline_data": {
            "mime_type": "image/jpeg",
            "data": "<PASTE_CONTENTS_OF_/tmp/nano_banana_input_b64.txt>"
          }
        }
      ]
    }
  ]
}
```

Or build the JSON with `jq` to avoid pasting:

```bash
jq -n --rawfile img /tmp/nano_banana_input_b64.txt '{
  contents: [{
    parts: [
      { text: "Replace the background with a snowy mountain range at sunset. Keep the subject unchanged." },
      { inline_data: { mime_type: "image/jpeg", data: $img } }
    ]
  }]
}' > /tmp/nano_banana_request.json
```

```bash
curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image:generateContent" --header "x-goog-api-key: $NANO_BANANA_TOKEN" --header "Content-Type: application/json" -d @/tmp/nano_banana_request.json > /tmp/nano_banana_response.json
```

### 5. Multi-Image Composition

Combine multiple input images into one output — e.g. put a product (image A) into a scene (image B):

```bash
jq -n \
  --rawfile a /tmp/product_b64.txt \
  --rawfile b /tmp/scene_b64.txt \
  '{
    contents: [{
      parts: [
        { text: "Place the product from the first image onto the wooden table in the second image. Match the lighting and shadows." },
        { inline_data: { mime_type: "image/png", data: $a } },
        { inline_data: { mime_type: "image/jpeg", data: $b } }
      ]
    }]
  }' > /tmp/nano_banana_request.json

curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" --header "x-goog-api-key: $NANO_BANANA_TOKEN" --header "Content-Type: application/json" -d @/tmp/nano_banana_request.json > /tmp/nano_banana_response.json
```

### 6. Control Output Modalities and Aspect Ratio

Gemini can return text alongside images. To request image-only output and a specific aspect ratio, add `generationConfig`:

```json
{
  "contents": [
    { "parts": [{ "text": "A minimalist poster for a jazz festival" }] }
  ],
  "generationConfig": {
    "responseModalities": ["IMAGE"],
    "imageConfig": {
      "aspectRatio": "16:9",
      "imageSize": "2K"
    }
  }
}
```

```bash
curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image:generateContent" --header "x-goog-api-key: $NANO_BANANA_TOKEN" --header "Content-Type: application/json" -d @/tmp/nano_banana_request.json > /tmp/nano_banana_response.json
```

### 7. Conversational Editing (Multi-Turn Refinement)

Continue refining by appending the previous model turn and a new user message. Reuse the Base64 image the model returned so you don't re-upload:

```bash
PREV_IMG=$(jq -r '.candidates[0].content.parts[] | select(.inline_data != null) | .inline_data.data' /tmp/nano_banana_response.json)

jq -n --arg img "$PREV_IMG" '{
  contents: [
    { role: "user",  parts: [{ text: "A minimalist poster for a jazz festival" }] },
    { role: "model", parts: [{ inline_data: { mime_type: "image/png", data: $img } }] },
    { role: "user",  parts: [{ text: "Make the typography bolder and shift the palette to deep blue and gold." }] }
  ]
}' > /tmp/nano_banana_request.json

curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image:generateContent" --header "x-goog-api-key: $NANO_BANANA_TOKEN" --header "Content-Type: application/json" -d @/tmp/nano_banana_request.json > /tmp/nano_banana_response.json
```

### 8. Inspect Any Text the Model Returns

The model may include a short text caption/explanation alongside the image:

```bash
jq -r '.candidates[0].content.parts[] | select(.text != null) | .text' /tmp/nano_banana_response.json
```

---

## Model Reference

| Model | Tier | Notes |
|---|---|---|
| `gemini-2.5-flash-image` | Fast | Default — good quality, low latency |
| `gemini-3.1-flash-image-preview` | Fast (newer) | Latest Flash preview |
| `gemini-3-pro-image-preview` | Pro | Highest quality, higher latency/cost |

## Aspect Ratios

`1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `4:5`, `5:4`, `9:16`, `16:9`, `21:9`, `1:4`, `4:1`, `1:8`, `8:1`.

## Image Size

`generationConfig.imageConfig.imageSize` — `"512"`, `"1K"` (default), `"2K"`, `"4K"`. Larger sizes cost more and are only relevant to final renders; keep iteration at `1K`.

## Response Shape

```json
{
  "candidates": [{
    "content": {
      "parts": [
        { "text": "Optional caption..." },
        { "inline_data": { "mime_type": "image/png", "data": "<base64>" } }
      ]
    },
    "finishReason": "STOP"
  }]
}
```

## Guidelines

1. **Endpoint is per-model** — the URL ends with `<model>:generateContent`. Don't try `/v1beta/models:generateContent` with a `model` field in the body; the firewall only allows the per-model endpoints.
2. **Use JSON files for request bodies** — write to `/tmp/nano_banana_*.json` to avoid shell quoting issues with long prompts and Base64 payloads.
3. **Always `base64 -w0`** when preparing Linux image input — `base64` without `-w0` inserts newlines that break JSON escaping.
4. **Output is Base64, never a URL** — decode `inline_data.data` and write bytes directly to disk. The `mime_type` tells you the extension (`png` / `jpeg` / `webp`).
5. **Prefer Flash** for iteration, switch to Pro for finals — Flash turns around in a few seconds; Pro is noticeably slower but sharper on text, hands, and fine detail.
6. **Keep prompts concrete** — describe subject, style, lighting, composition, and mood. For edits, say what to change and what to keep.
7. **Input image size** — downscale very large inputs before Base64-encoding; the full round-trip cost scales with payload size.

More from vm0-ai/vm0-skills

SkillDescription
account-reconciliationPerform account reconciliations comparing general ledger balances against subledgers, bank statements, or external records. Use for bank reconciliation, GL-to-subledger reconciliation, intercompany reconciliation, balance sheet reconciliation, reconciling item analysis, outstanding item aging, or clearing open items.
agentphoneBuild AI phone agents with AgentPhone API. Use when the user wants to make phone calls, send/receive SMS, manage phone numbers, create voice agents, set up webhooks, or check usage — anything related to telephony, phone numbers, or voice AI.
ahrefsAhrefs SEO API for backlink and keyword analysis. Use when user mentions
amplitudeAmplitude product analytics API. Use when user mentions "Amplitude",
analysis-qaQuality-check a data analysis before sharing — verify joins, aggregations, denominators, time ranges, and metric definitions. Detect pitfalls like survivorship bias, average-of-averages, join explosion, timezone mismatches, incomplete periods, and selection bias. Includes documentation templates for reproducible analyses.
anthropic-managed-agentsAnthropic Managed Agents API for programmatically creating, running, and streaming AI agents on Anthropic's cloud infrastructure. Use when the user mentions "Managed Agents", "Anthropic agent sessions", or needs to create/run/stream an Anthropic agent with tool use (bash, git, web), attach GitHub repositories, or inject secrets via Vault. Do NOT use for standard Claude Messages API — use the Claude API skill instead.
apifyApify web scraping platform. Use when user mentions "scrape website",
asanaAsana API for tasks and projects. Use when user mentions "Asana", "asana.com",
atlassianAtlassian API for Confluence and Jira. Use when user mentions "Confluence
attioAttio REST API for AI-native CRM operations — manage companies, people, deals, and custom objects, plus notes, tasks, lists, and comments. Use when the user mentions "Attio", "CRM record", "create company", "add person", "list entry", "CRM note", or "CRM task".