aliyun-pixverse-generation

$ npx mdskill add cinience/alicloud-skills/aliyun-pixverse-generation

Generates videos using Alibaba Cloud Model Studio PixVerse models

  • Solves video generation tasks using text, images, or keyframes as input
  • Depends on Alibaba Cloud Model Studio and PixVerse model variants
  • Chooses appropriate model based on input type and user intent
  • Saves request data, model choices, and task progress in output directory
---
name: aliyun-pixverse-generation
description: Use when generating videos with Alibaba Cloud Model Studio PixVerse models (`pixverse/pixverse-v5.6-t2v`, `pixverse/pixverse-v5.6-it2v`, `pixverse/pixverse-v5.6-kf2v`, `pixverse/pixverse-v5.6-r2v`). Use when building non-Wan text-to-video, first-frame image-to-video, keyframe-to-video, or multi-image reference-to-video workflows on Model Studio.
version: 1.0.0
---

Category: provider

# Model Studio Aishi Video Generation

## Validation

```bash
mkdir -p output/aliyun-pixverse-generation
python -m py_compile skills/ai/video/aliyun-pixverse-generation/scripts/prepare_aishi_request.py && echo "py_compile_ok" > output/aliyun-pixverse-generation/validate.txt
```

Pass criteria: command exits 0 and `output/aliyun-pixverse-generation/validate.txt` is generated.

## Output And Evidence

- Save normalized request payloads, chosen model variant, and task polling snapshots under `output/aliyun-pixverse-generation/`.
- Record region, resolution/size, duration, and whether audio generation was enabled.
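As a minimal sketch of the evidence record described above, the fields can be collected in one small structure before they are written to disk. The class name `EvidenceRecord` and its field names are hypothetical and are not part of the skill's scripts.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class EvidenceRecord:
    """Hypothetical container for the run details this skill asks you to record."""
    model: str
    region: str
    duration: int
    audio: bool
    size: Optional[str] = None        # set for t2v / r2v
    resolution: Optional[str] = None  # set for it2v / kf2v

    def to_json(self) -> str:
        # Sorted keys keep successive snapshots easy to diff.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


record = EvidenceRecord(
    model="pixverse/pixverse-v5.6-t2v",
    region="cn-beijing",  # assumed label for the Beijing region
    duration=5,
    audio=False,
    size="1280*720",
)
print(record.to_json())
```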

Use the Aishi (PixVerse) models when the user explicitly asks for the non-Wan PixVerse family for video generation.

## Critical model names

Use one of these exact model strings:
- `pixverse/pixverse-v5.6-t2v`
- `pixverse/pixverse-v5.6-it2v`
- `pixverse/pixverse-v5.6-kf2v`
- `pixverse/pixverse-v5.6-r2v`

Selection guidance:
- Use `pixverse/pixverse-v5.6-t2v` for text-only generation.
- Use `pixverse/pixverse-v5.6-it2v` for first-frame image-to-video.
- Use `pixverse/pixverse-v5.6-kf2v` for first-frame + last-frame transitions.
- Use `pixverse/pixverse-v5.6-r2v` for multi-image character/style consistency.
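
The selection guidance above can be sketched as a small helper. The function name `select_model`, its parameters, and the precedence between cases (reference images win over frames) are assumptions made for illustration.

```python
def select_model(has_first_frame=False, has_last_frame=False, reference_count=0):
    """Pick a PixVerse model string from the kind of input supplied.

    Hypothetical helper: the precedence order is an assumption, not
    something the skill prescribes.
    """
    prefix = "pixverse/pixverse-v5.6-"
    if has_last_frame and not has_first_frame:
        raise ValueError("a last frame needs a matching first frame (kf2v)")
    if reference_count > 0:
        return prefix + "r2v"   # multi-image character/style consistency
    if has_first_frame and has_last_frame:
        return prefix + "kf2v"  # first-frame + last-frame transition
    if has_first_frame:
        return prefix + "it2v"  # first-frame image-to-video
    return prefix + "t2v"       # text-only generation
```

For example, `select_model(has_first_frame=True)` picks the `it2v` variant, while calling it with no arguments falls back to `t2v`.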

## Prerequisites

- This model family is currently available only in the China mainland (Beijing) region.
- Install the DashScope SDK, or call the HTTP API directly:

```bash
python3 -m venv .venv
. .venv/bin/activate
python -m pip install dashscope
```

- Set `DASHSCOPE_API_KEY` in your environment, or add `dashscope_api_key` to `~/.alibabacloud/credentials`.
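
A minimal sketch of the key lookup order described above: environment variable first, then the credentials file. It assumes the credentials file is INI-formatted with a `[default]` section, which this document does not confirm. The function name `resolve_api_key` is hypothetical.

```python
import configparser
import os
from pathlib import Path


def resolve_api_key(credentials_path=None, profile="default"):
    """Return the DashScope API key from the environment or a credentials file.

    Assumption: the file is INI-style and the key sits under [default].
    """
    key = os.environ.get("DASHSCOPE_API_KEY")
    if key:
        return key

    path = (
        Path(credentials_path)
        if credentials_path
        else Path.home() / ".alibabacloud" / "credentials"
    )
    parser = configparser.ConfigParser()
    parser.read(path, encoding="utf-8")  # a missing file is skipped silently
    if parser.has_option(profile, "dashscope_api_key"):
        return parser.get(profile, "dashscope_api_key")

    raise RuntimeError(
        "Set DASHSCOPE_API_KEY or add dashscope_api_key to " + str(path)
    )
```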

## Normalized interface (video.generate)

### Request
- `model` (string, required)
- `prompt` (string, optional for `it2v`, required for other variants)
- `media` (array<object>, optional)
- `size` (string, optional): direct pixel size such as `1280*720`, used by `t2v` and `r2v`
- `resolution` (string, optional): `360P`/`540P`/`720P`/`1080P`, used by `it2v` and `kf2v`
- `duration` (int, required): `5`/`8`/`10` seconds; 1080P supports only `5`/`8`
- `audio` (bool, optional)
- `watermark` (bool, optional)
- `seed` (int, optional)
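
The field rules above can be checked before a request is submitted. The following is a sketch only: `validate_request` is a hypothetical helper, and it applies the 1080P duration limit to the `resolution` field alone, since the list does not say how the limit maps to pixel `size` values.

```python
# Which sizing field each variant uses, per the request fields listed above.
MODELS = {
    "pixverse/pixverse-v5.6-t2v": "size",
    "pixverse/pixverse-v5.6-it2v": "resolution",
    "pixverse/pixverse-v5.6-kf2v": "resolution",
    "pixverse/pixverse-v5.6-r2v": "size",
}
RESOLUTIONS = {"360P", "540P", "720P", "1080P"}


def validate_request(req):
    """Return a list of problems; an empty list means no rule was violated."""
    model = req.get("model")
    if model not in MODELS:
        return [f"unknown model: {model!r}"]

    problems = []
    # prompt is optional only for the it2v variant
    if not req.get("prompt") and not model.endswith("-it2v"):
        problems.append("prompt is required for this variant")

    expected = MODELS[model]
    wrong = "resolution" if expected == "size" else "size"
    if req.get(wrong) is not None:
        problems.append(f"{model} uses {expected!r}, not {wrong!r}")

    resolution = req.get("resolution")
    if resolution is not None and resolution not in RESOLUTIONS:
        problems.append(f"unsupported resolution: {resolution!r}")

    allowed = (5, 8) if resolution == "1080P" else (5, 8, 10)
    if req.get("duration") not in allowed:
        problems.append(f"duration must be one of {allowed}")
    return problems
```

Returning a list instead of raising on the first error lets a caller report every problem in one pass.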

### Response
- `task_id` (string)
- `task_status` (string)
- `video_url` (string, when finished)

## Endpoint and execution model

- Submit task: `POST https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis`
- Poll task: `GET https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}`
- HTTP calls are async only and must set header `X-DashScope-Async: enable`.
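
A hedged sketch of the submit-then-poll flow using only the standard library. The two URLs and the `X-DashScope-Async` header come from the list above. The `Authorization: Bearer` header, the `output.task_status` response path, and the terminal status names are assumptions based on common DashScope conventions and should be checked against the official API reference. The payload is passed through unchanged because its wire format is not specified here.

```python
import json
import time
import urllib.request

BASE = "https://dashscope.aliyuncs.com/api/v1"
SUBMIT_URL = BASE + "/services/aigc/video-generation/video-synthesis"
TERMINAL = {"SUCCEEDED", "FAILED", "CANCELED", "UNKNOWN"}  # assumed status names


def build_submit_request(api_key, payload):
    """Build the async POST request; the payload shape is left to the caller."""
    return urllib.request.Request(
        SUBMIT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
            "X-DashScope-Async": "enable",
        },
        method="POST",
    )


def build_poll_request(api_key, task_id):
    """Build the GET request that reads the task state."""
    return urllib.request.Request(
        f"{BASE}/tasks/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_json(request):
    """Send a request and decode the JSON body."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


def wait_for_task(api_key, task_id, fetch=fetch_json, interval=10.0, max_polls=60):
    """Poll until the task reaches a terminal status or the poll budget runs out."""
    for _ in range(max_polls):
        body = fetch(build_poll_request(api_key, task_id))
        if body.get("output", {}).get("task_status") in TERMINAL:
            return body
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
```

The `fetch` parameter is injectable so the polling loop can be exercised with canned responses instead of live network calls.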

## Quick start

Text-to-video:

```bash
python skills/ai/video/aliyun-pixverse-generation/scripts/prepare_aishi_request.py \
  --model pixverse/pixverse-v5.6-t2v \
  --prompt "A compact robot walks through a rainy neon alley." \
  --size "1280*720" \
  --duration 5
```

Image-to-video:

```bash
python skills/ai/video/aliyun-pixverse-generation/scripts/prepare_aishi_request.py \
  --model pixverse/pixverse-v5.6-it2v \
  --prompt "The turtle swims slowly as the camera rises." \
  --media image_url=https://example.com/turtle.webp \
  --resolution 720P \
  --duration 5
```

## Operational guidance

- `t2v` and `r2v` use `size`; `it2v` and `kf2v` use `resolution`.
- For `kf2v`, provide exactly one `first_frame` and one `last_frame`.
- For `r2v`, you can pass up to 7 reference images.
- Aishi returns task IDs first; do not treat the initial response as the final video result.
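
The media rules above for `kf2v` and `r2v` can be sketched as a pre-flight check. The media object shape used here (`{"type": ..., "url": ...}`) is an assumption for illustration; only the `first_frame` and `last_frame` names and the limit of 7 come from this document.

```python
from collections import Counter

MAX_REFERENCE_IMAGES = 7


def check_media(model, media):
    """Return a list of media-count problems for the given model variant.

    Assumes each media item is a dict with a "type" key (hypothetical shape).
    """
    problems = []
    counts = Counter(item.get("type") for item in media)
    if model.endswith("-kf2v"):
        # kf2v needs exactly one of each keyframe role
        for role in ("first_frame", "last_frame"):
            if counts[role] != 1:
                problems.append(f"kf2v needs exactly one {role}, got {counts[role]}")
    elif model.endswith("-r2v"):
        if len(media) > MAX_REFERENCE_IMAGES:
            problems.append(
                f"r2v accepts up to {MAX_REFERENCE_IMAGES} images, got {len(media)}"
            )
    return problems
```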

## Output location

- Default output: `output/aliyun-pixverse-generation/request.json`
- Override base dir with `OUTPUT_DIR`.
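
A minimal sketch of the output-path rule. It assumes `OUTPUT_DIR` replaces the whole `output/aliyun-pixverse-generation` directory, which is one reading of "base dir" and may not match the script's actual behavior. The helper names are hypothetical.

```python
import json
import os
from pathlib import Path

DEFAULT_DIR = "output/aliyun-pixverse-generation"


def request_path():
    """Resolve where request.json is written, honoring OUTPUT_DIR if set."""
    # Assumption: OUTPUT_DIR points at the directory that holds request.json.
    return Path(os.environ.get("OUTPUT_DIR", DEFAULT_DIR)) / "request.json"


def save_request(payload):
    """Write the normalized request payload and return the file path."""
    path = request_path()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        json.dumps(payload, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return path
```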

## References

- `references/sources.md`