aliyun-cosyvoice-voice-clone

`npx mdskill add cinience/alicloud-skills/aliyun-cosyvoice-voice-clone`

Clones voices using Alibaba Cloud Model Studio CosyVoice for later TTS reuse.

- Solves the problem of creating reusable custom voice models from reference audio
- Uses the Alibaba Cloud DashScope API with CosyVoice customization models
- Selects the target model based on deployment region and available model versions
- Returns a `voice_id` to use in subsequent text-to-speech synthesis requests
---
name: aliyun-cosyvoice-voice-clone
description: Use when creating cloned voices with Alibaba Cloud Model Studio CosyVoice customization models, especially cosyvoice-v3.5-plus or cosyvoice-v3.5-flash, from reference audio and then reusing the returned voice_id in later TTS calls.
version: 1.0.0
---

Category: provider

# Model Studio CosyVoice Voice Clone

Use the CosyVoice voice enrollment API to create cloned voices from reference audio hosted at a publicly accessible URL.

## Critical model names

Use `model="voice-enrollment"` and one of these `target_model` values:
- `cosyvoice-v3.5-plus`
- `cosyvoice-v3.5-flash`
- `cosyvoice-v3-plus`
- `cosyvoice-v3-flash`
- `cosyvoice-v2`

Recommended default in this repo:
- `target_model="cosyvoice-v3.5-plus"`

## Region and compatibility

- `cosyvoice-v3.5-plus` and `cosyvoice-v3.5-flash` are available only in China mainland deployment mode (Beijing endpoint).
- In international deployment mode (Singapore endpoint), `cosyvoice-v3-plus` and `cosyvoice-v3-flash` do not support voice cloning or voice design.
- The `target_model` used during enrollment must match the model used later in speech synthesis; otherwise, synthesis fails.
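The rules above can be checked locally before spending an enrollment call. The following is a minimal sketch, not part of this skill's scripts: `check_enrollment_model`, `check_synthesis_model`, and the region labels `"cn"` and `"intl"` are hypothetical names. It rejects only the combinations stated in this section, so a combination it accepts is not thereby confirmed to work.

```python
# Hypothetical pre-flight checks encoding only the region rules listed above.
# "cn" (China mainland, Beijing) and "intl" (international, Singapore) are
# illustrative labels, not DashScope identifiers.

SUPPORTED_TARGET_MODELS = {
    "cosyvoice-v3.5-plus",
    "cosyvoice-v3.5-flash",
    "cosyvoice-v3-plus",
    "cosyvoice-v3-flash",
    "cosyvoice-v2",
}
# Available only in China mainland deployment mode.
CN_ONLY_MODELS = {"cosyvoice-v3.5-plus", "cosyvoice-v3.5-flash"}
# No voice clone/design support in international deployment mode.
INTL_NO_CLONE_MODELS = {"cosyvoice-v3-plus", "cosyvoice-v3-flash"}


def check_enrollment_model(target_model: str, region: str) -> None:
    """Raise ValueError if the documented rules rule out this combination."""
    if target_model not in SUPPORTED_TARGET_MODELS:
        raise ValueError(f"unknown target_model: {target_model!r}")
    if region not in ("cn", "intl"):
        raise ValueError(f"unknown region: {region!r}")
    if region == "intl" and target_model in CN_ONLY_MODELS:
        raise ValueError(f"{target_model} is available only in China mainland (Beijing)")
    if region == "intl" and target_model in INTL_NO_CLONE_MODELS:
        raise ValueError(f"{target_model} does not support voice cloning in international mode")


def check_synthesis_model(enrolled_target_model: str, synthesis_model: str) -> None:
    """The enrollment target_model must equal the model used for synthesis."""
    if enrolled_target_model != synthesis_model:
        raise ValueError(
            f"voice was enrolled for {enrolled_target_model}, "
            f"but synthesis uses {synthesis_model}"
        )
```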

## Endpoints

- China mainland (Beijing): `https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization`
- International (Singapore): `https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization`

## Prerequisites

- Set `DASHSCOPE_API_KEY` in your environment, or add `dashscope_api_key` to `~/.alibabacloud/credentials`.
- Provide a publicly accessible audio URL for the enrollment sample.
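A minimal sketch of resolving the key in the order listed above. `resolve_api_key` is a hypothetical helper, and it assumes `~/.alibabacloud/credentials` is an INI-style file. The section that holds `dashscope_api_key` is not specified here, so the sketch tries `[default]` first and then any other section.

```python
import configparser
import os
from pathlib import Path
from typing import Optional


def resolve_api_key(credentials_path: Optional[Path] = None) -> str:
    """Return the DashScope API key from the environment or a credentials file."""
    # 1. The environment variable takes precedence.
    key = os.environ.get("DASHSCOPE_API_KEY")
    if key:
        return key
    # 2. Fall back to the credentials file (INI format is an assumption).
    path = credentials_path or Path.home() / ".alibabacloud" / "credentials"
    parser = configparser.ConfigParser()
    parser.read(path, encoding="utf-8")  # a missing file is silently skipped
    # Try a [default] section first, then the remaining sections in file order.
    sections = sorted(parser.sections(), key=lambda name: name != "default")
    for section in sections:
        value = parser.get(section, "dashscope_api_key", fallback="").strip()
        if value:
            return value
    raise RuntimeError(
        "DashScope API key not found: set DASHSCOPE_API_KEY or add "
        f"dashscope_api_key to {path}"
    )
```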

## Normalized interface (cosyvoice.voice_clone)

### Request
- `model` (string, optional): fixed to `voice-enrollment`
- `target_model` (string, optional): default `cosyvoice-v3.5-plus`
- `prefix` (string, required): letters and digits only, at most 10 characters
- `voice_sample_url` (string, required): publicly accessible audio URL
- `language_hints` (array[string], optional): only the first item is used
- `max_prompt_audio_length` (float, optional): supported only for `cosyvoice-v3.5-plus`, `cosyvoice-v3.5-flash`, `cosyvoice-v3-flash`
- `enable_preprocess` (bool, optional): supported only for `cosyvoice-v3.5-plus`, `cosyvoice-v3.5-flash`, `cosyvoice-v3-flash`
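The request rules above can be enforced locally before any API call. This is a sketch of a hypothetical `normalize_clone_request` function, not the repo's `prepare_cosyvoice_clone_request.py`. It assumes "letters and digits" means ASCII letters and digits, and it truncates `language_hints` to its first item because only that item is used.

```python
import re
from typing import Any, Dict, List, Optional

# Models that accept max_prompt_audio_length and enable_preprocess.
ADVANCED_OPTION_MODELS = {"cosyvoice-v3.5-plus", "cosyvoice-v3.5-flash", "cosyvoice-v3-flash"}
# 1 to 10 ASCII letters or digits (ASCII-only is an assumption).
PREFIX_RE = re.compile(r"[A-Za-z0-9]{1,10}")


def normalize_clone_request(
    prefix: str,
    voice_sample_url: str,
    target_model: str = "cosyvoice-v3.5-plus",
    language_hints: Optional[List[str]] = None,
    max_prompt_audio_length: Optional[float] = None,
    enable_preprocess: Optional[bool] = None,
) -> Dict[str, Any]:
    """Validate inputs and return a dict using the normalized field names."""
    if not PREFIX_RE.fullmatch(prefix):
        raise ValueError("prefix must be 1-10 letters or digits")
    if not voice_sample_url.startswith(("http://", "https://")):
        raise ValueError("voice_sample_url must be a public http(s) URL")
    request: Dict[str, Any] = {
        "model": "voice-enrollment",  # fixed value
        "target_model": target_model,
        "prefix": prefix,
        "voice_sample_url": voice_sample_url,
    }
    if language_hints:
        request["language_hints"] = language_hints[:1]  # only the first item is used
    optional = (
        ("max_prompt_audio_length", max_prompt_audio_length),
        ("enable_preprocess", enable_preprocess),
    )
    for name, value in optional:
        if value is None:
            continue
        if target_model not in ADVANCED_OPTION_MODELS:
            raise ValueError(f"{name} is supported only for {sorted(ADVANCED_OPTION_MODELS)}")
        request[name] = value
    return request
```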

### Response
- `voice_id` (string): use this as the `voice` parameter in later TTS calls
- `request_id` (string)
- `usage.count` (number, optional)
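Mapping a raw API response onto these fields might look like the sketch below. The raw shape (`output.voice_id`, `usage.count`, top-level `request_id`) is an assumption based on the usual DashScope response envelope; check it against `references/api_reference.md`. `normalize_clone_response` is a hypothetical name.

```python
from typing import Any, Dict


def normalize_clone_response(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Map an assumed raw enrollment response onto the normalized fields."""
    # Assumed raw shape:
    # {"output": {"voice_id": ...}, "usage": {"count": ...}, "request_id": ...}
    voice_id = (raw.get("output") or {}).get("voice_id")
    if not voice_id:
        raise ValueError(f"no voice_id in response (request_id={raw.get('request_id')!r})")
    result: Dict[str, Any] = {"voice_id": voice_id, "request_id": raw.get("request_id")}
    count = (raw.get("usage") or {}).get("count")
    if count is not None:  # usage.count is optional
        result["usage"] = {"count": count}
    return result
```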

## Operational guidance

- For Chinese dialect reference audio, keep `language_hints=["zh"]`; control dialect style later in synthesis via text or `instruct`.
- For `cosyvoice-v3.5-plus`, supported `language_hints` include `zh`, `en`, `fr`, `de`, `ja`, `ko`, `ru`, `pt`, `th`, `id`, `vi`.
- Avoid frequent enrollment calls; each call creates a new custom voice and consumes quota.
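One way to follow the last point is to cache the returned `voice_id` locally and enroll only on a cache miss. In this sketch, `get_or_enroll`, the JSON cache file, and the cache key (`target_model`, `prefix`, and sample URL) are hypothetical choices; `enroll` stands for any callable that performs the real enrollment and returns a `voice_id`.

```python
import json
from pathlib import Path
from typing import Callable, Dict


def get_or_enroll(
    cache_path: Path,
    target_model: str,
    prefix: str,
    voice_sample_url: str,
    enroll: Callable[[], str],
) -> str:
    """Return a cached voice_id, calling enroll() only on a cache miss."""
    key = "|".join((target_model, prefix, voice_sample_url))
    cache: Dict[str, str] = {}
    if cache_path.exists():
        cache = json.loads(cache_path.read_text(encoding="utf-8"))
    if key in cache:
        return cache[key]
    voice_id = enroll()  # the only place a new custom voice is created
    cache[key] = voice_id
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(cache, indent=2), encoding="utf-8")
    return voice_id
```

Including `target_model` in the key matters because a voice enrolled for one model cannot be used for synthesis with another.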

## Local helper script

Prepare a normalized request JSON:

```bash
python skills/ai/audio/aliyun-cosyvoice-voice-clone/scripts/prepare_cosyvoice_clone_request.py \
  --target-model cosyvoice-v3.5-plus \
  --prefix myvoice \
  --voice-sample-url https://example.com/voice.wav \
  --language-hint zh
```
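The helper script above only prepares the request. Sending it over HTTP could look like the following standard-library sketch. The wire format (an `input` object with `action="create_voice"` and the sample URL under `url`) is an assumption about the DashScope HTTP API, as is the idea that the prepared JSON uses the normalized field names; verify both against `references/api_reference.md`. `build_enrollment_request`, `send_enrollment`, and the `"cn"`/`"intl"` labels are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict

# Customization endpoints listed in this document.
ENDPOINTS = {
    "cn": "https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization",
    "intl": "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization",
}
OPTIONAL_FIELDS = ("language_hints", "max_prompt_audio_length", "enable_preprocess")


def build_enrollment_request(
    normalized: Dict[str, Any], api_key: str, region: str = "cn"
) -> urllib.request.Request:
    """Translate a normalized request into an HTTP POST (wire format assumed)."""
    body_input: Dict[str, Any] = {
        "action": "create_voice",  # assumed action name for enrollment
        "target_model": normalized.get("target_model", "cosyvoice-v3.5-plus"),
        "prefix": normalized["prefix"],
        "url": normalized["voice_sample_url"],  # assumed wire name for the sample URL
    }
    for field in OPTIONAL_FIELDS:  # passed through under the same names (assumption)
        if field in normalized:
            body_input[field] = normalized[field]
    payload = {"model": normalized.get("model", "voice-enrollment"), "input": body_input}
    return urllib.request.Request(
        ENDPOINTS[region],
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def send_enrollment(request: urllib.request.Request, timeout: float = 60.0) -> Dict[str, Any]:
    """Send the request and decode the JSON body (HTTPError is raised on 4xx/5xx)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A typical call chain is `send_enrollment(build_enrollment_request(prepared, api_key))`, followed by extracting the `voice_id` from the decoded response.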

## Validation

```bash
mkdir -p output/aliyun-cosyvoice-voice-clone
for f in skills/ai/audio/aliyun-cosyvoice-voice-clone/scripts/*.py; do
  python3 -m py_compile "$f"
done
echo "py_compile_ok" > output/aliyun-cosyvoice-voice-clone/validate.txt
```

Pass criteria: command exits 0 and `output/aliyun-cosyvoice-voice-clone/validate.txt` is generated.

## Output and evidence

- Save artifacts, command outputs, and API response summaries under `output/aliyun-cosyvoice-voice-clone/`.
- Include `target_model`, `prefix`, and sample URL in the evidence file.
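A minimal sketch of an evidence writer, assuming the request dict uses the normalized field names. `write_evidence` and the `evidence.json` file name are hypothetical. The sketch records `target_model`, `prefix`, and the sample URL as required above, plus `voice_id` and `request_id` when a normalized response is available.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, Optional


def write_evidence(
    request: Dict[str, Any],
    response: Optional[Dict[str, Any]] = None,
    out_dir: Path = Path("output/aliyun-cosyvoice-voice-clone"),
) -> Path:
    """Write a JSON evidence summary and return its path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    evidence: Dict[str, Any] = {
        "target_model": request.get("target_model"),
        "prefix": request.get("prefix"),
        "voice_sample_url": request.get("voice_sample_url"),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    if response is not None:  # summarize the API response, not the full payload
        evidence["voice_id"] = response.get("voice_id")
        evidence["request_id"] = response.get("request_id")
    path = out_dir / "evidence.json"  # hypothetical file name
    path.write_text(json.dumps(evidence, indent=2, ensure_ascii=False), encoding="utf-8")
    return path
```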

## References

- `references/api_reference.md`
- `references/sources.md`