synthesize

$npx mdskill add terrylica/cc-skills/synthesize

Synthesize text into speech using Kokoro TTS with customizable voices and parameters

  • Convert written text into natural-sounding audio for accessibility or automation
  • Relies on Kokoro TTS CLI tool and supported voice models
  • Accepts text input and optional parameters like voice, language, and speed
  • Generates WAV audio files or streams output for playback
SKILL.md
.github/skills/synthesizeView on GitHub ↗
---
name: synthesize
description: "Synthesize text to speech with Kokoro TTS. TRIGGERS - speak this, kokoro tts, text to speech, synthesize voice, say this."
allowed-tools: Read, Bash, Glob, AskUserQuestion
argument-hint: "[text to speak]"
---

# Synthesize Speech

Generate speech from text using the Kokoro TTS CLI tool. Supports single WAV output or chunked streaming for long text.

> **Self-Evolving Skill**: This skill improves through use. If instructions are wrong, parameters drifted, or a workaround was needed — fix this file immediately, don't defer. Only update for real, reproducible issues.

## Quick Usage

```bash
# Single WAV
~/.local/share/kokoro/.venv/bin/python ~/.local/share/kokoro/tts_generate.py \
  --text "Hello from Kokoro TTS" --voice af_heart --lang en-us --speed 1.0 \
  --output /tmp/kokoro-tts-$$.wav

# Play it
afplay /tmp/kokoro-tts-$$.wav
```

## Parameters

| Parameter  | Default    | Description                          |
| ---------- | ---------- | ------------------------------------ |
| `--text`   | (required) | Text to synthesize                   |
| `--voice`  | `af_heart` | Voice name (see voice catalog)       |
| `--lang`   | `en-us`    | Language code (en-us, zh, ja, etc.)  |
| `--speed`  | `1.0`      | Speech speed multiplier              |
| `--output` | (required) | Output WAV path                      |
| `--chunk`  | off        | Chunked streaming mode for long text |

## Voice Catalog

See [Voice Catalog](./references/voice-catalog.md) for all available voices with quality grades.

**Top voices**:

| Voice ID  | Name   | Grade | Gender |
| --------- | ------ | ----- | ------ |
| af_heart  | Heart  | A     | Female |
| af_bella  | Bella  | A-    | Female |
| af_nicole | Nicole | B-    | Female |

## Chunked Streaming

For long text, use `--chunk` to get progressive playback:

```bash
~/.local/share/kokoro/.venv/bin/python ~/.local/share/kokoro/tts_generate.py \
  --text "Long text here..." --voice af_heart --lang en-us --speed 1.0 \
  --output /tmp/kokoro-tts-$$.wav --chunk
```

Each chunk WAV path is printed to stdout as it becomes ready. The final line is `DONE <ms>`.

## Troubleshooting

| Issue            | Cause            | Solution                        |
| ---------------- | ---------------- | ------------------------------- |
| No audio output  | Model not loaded | Run `/kokoro-tts:install` first |
| Empty text error | Input was blank  | Provide non-empty `--text`      |
| Slow generation  | First-run warmup | Normal — subsequent runs faster |


## Post-Execution Reflection

After this skill completes, check before closing:

1. **Did the command succeed?** — If not, fix the instruction or error table that caused the failure.
2. **Did parameters or output change?** — If the underlying tool's interface drifted, update Usage examples and Parameters table to match.
3. **Was a workaround needed?** — If you had to improvise (different flags, extra steps), update this SKILL.md so the next invocation doesn't need the same workaround.

Only update if the issue is real and reproducible — not speculative.
More from terrylica/cc-skills