aliyun-liveportrait

$npx mdskill add cinience/alicloud-skills/aliyun-liveportrait

Generates lightweight talking-head videos using Alibaba Cloud LivePortrait from portrait images and speech audio

  • Solves the need for long-form or broadcast-style portrait animation with speech
  • Uses Alibaba Cloud Model Studio LivePortrait and DashScope APIs
  • Runs liveportrait-detect first to verify portrait quality before generation
  • Saves output videos and logs under output/aliyun-liveportrait with detailed parameters
SKILL.md
.github/skills/aliyun-liveportraitView on GitHub ↗
---
name: aliyun-liveportrait
description: Use when generating lightweight talking-head portrait videos with Alibaba Cloud Model Studio LivePortrait (`liveportrait`) from a detected portrait image and speech audio. Use when you need long-form or simple broadcast-style portrait animation beyond the typical short expressive models.
version: 1.0.0
---

Category: provider

# Model Studio LivePortrait

## Validation

```bash
mkdir -p output/aliyun-liveportrait
python -m py_compile skills/ai/video/aliyun-liveportrait/scripts/prepare_liveportrait_request.py && echo "py_compile_ok" > output/aliyun-liveportrait/validate.txt
```

Pass criteria: command exits 0 and `output/aliyun-liveportrait/validate.txt` is generated.

## Output And Evidence

- Save normalized request payloads, template choice, and task polling snapshots under `output/aliyun-liveportrait/`.
- Record the exact portrait/audio URLs and motion-strength related parameters.

Use LivePortrait when the job is lightweight portrait animation with speech audio, especially for longer clips or simpler presenter-style motion.

## Critical model names

Use these exact model strings:
- `liveportrait-detect`
- `liveportrait`

Selection guidance:
- Run `liveportrait-detect` first to verify the portrait image.
- Use `liveportrait` for the actual video generation task.

## Prerequisites

- China mainland (Beijing) only.
- Set `DASHSCOPE_API_KEY` in your environment, or add `dashscope_api_key` to `~/.alibabacloud/credentials`.
- Input image and audio must be public HTTP/HTTPS URLs.

## Normalized interface (video.liveportrait)

### Detect Request
- `model` (string, optional): default `liveportrait-detect`
- `image_url` (string, required)

### Generate Request
- `model` (string, optional): default `liveportrait`
- `image_url` (string, required)
- `audio_url` (string, required)
- `template_id` (string, optional): `normal`, `calm`, or `active`
- `eye_move_freq` (number, optional): `0` to `1`
- `video_fps` (int, optional): `15` to `30`
- `mouth_move_strength` (number, optional): `0` to `1.5`
- `paste_back` (bool, optional)
- `head_move_strength` (number, optional): `0` to `1`

### Response
- `task_id` (string)
- `task_status` (string)
- `video_url` (string, when finished)

## Quick start

```bash
python skills/ai/video/aliyun-liveportrait/scripts/prepare_liveportrait_request.py \
  --image-url "https://example.com/portrait.png" \
  --audio-url "https://example.com/speech.mp3" \
  --template-id calm \
  --video-fps 24 \
  --paste-back
```

## Operational guidance

- Use a clear, front-facing portrait with low occlusion.
- Keep the audio clean and voice-dominant.
- `paste_back=false` outputs only the generated face region; keep it `true` for standard talking-head output.
- LivePortrait is a better fit than EMO when you need longer, simpler presenter-style clips.

## Output location

- Default output: `output/aliyun-liveportrait/request.json`
- Override base dir with `OUTPUT_DIR`.

## References

- `references/sources.md`
More from cinience/alicloud-skills