aliyun-qwen-vl
$
npx mdskill add cinience/alicloud-skills/aliyun-qwen-vlAnalyzes images using Alibaba Cloud Qwen VL models for visual understanding tasks
- Solves image Q&A, OCR, chart/table reading, and screenshot analysis tasks
- Uses DashScope-compatible API with Qwen3 VL models like qwen3-vl-plus and qwen3-vl-flash
- Selects appropriate model versions based on task requirements and aliases
- Returns raw model responses and normalized results for traceable, reproducible outputs
SKILL.md
.github/skills/aliyun-qwen-vlView on GitHub ↗
---
name: aliyun-qwen-vl
description: Use when understanding images with Alibaba Cloud Model Studio Qwen VL models (qwen3-vl-plus/qwen3-vl-flash and latest aliases). Use when building image Q&A, visual analysis, OCR-like extraction, chart/table reading, or screenshot understanding workflows.
version: 1.0.0
---
Category: provider
# Model Studio Qwen VL (Image Understanding)
## Validation
```bash
mkdir -p output/aliyun-qwen-vl
python -m py_compile skills/ai/multimodal/aliyun-qwen-vl/scripts/analyze_image.py && echo "py_compile_ok" > output/aliyun-qwen-vl/validate.txt
```
Pass criteria: command exits 0 and `output/aliyun-qwen-vl/validate.txt` is generated.
## Output And Evidence
- Save raw model responses and normalized extraction results to `output/aliyun-qwen-vl/`.
- Include input image reference and prompt for traceability.
Use Qwen VL models for image input + text output understanding tasks via DashScope compatible-mode API.
## Prerequisites
- Install dependencies (recommended in a venv):
```bash
python3 -m venv .venv
. .venv/bin/activate
python -m pip install requests
```
- Set `DASHSCOPE_API_KEY` in environment, or add `dashscope_api_key` to `~/.alibabacloud/credentials`.
## Critical model names
Prefer the Qwen3 VL family:
- `qwen3-vl-plus`
- `qwen3-vl-flash`
When you need explicit "latest" routing or reproducible snapshots, use supported aliases/snapshots from the official model list, such as:
- `qwen3-vl-plus-latest`
- `qwen3-vl-plus-2025-12-19`
- `qwen3-vl-flash-2026-01-22`
- `qwen3-vl-flash-latest`
Legacy names still seen in some workloads:
- `qwen-vl-max-latest`
- `qwen-vl-plus-latest`
For OCR-specialized extraction, prefer `skills/ai/multimodal/aliyun-qwen-ocr/` instead of using the general VL skill.
## Normalized interface (multimodal.chat)
### Request
- `prompt` (string, required): user question/instruction about image.
- `image` (string, required): HTTPS URL, local path, or `data:` URL.
- `model` (string, optional): default `qwen3-vl-plus`.
- `max_tokens` (int, optional): default `512`.
- `temperature` (float, optional): default `0.2`.
- `detail` (string, optional): `auto`/`low`/`high`, default `auto`.
- `json_mode` (bool, optional): return JSON-only response when possible.
- `schema` (object, optional): JSON Schema for structured extraction.
- `max_retries` (int, optional): retry count for `429/5xx`, default `2`.
- `retry_backoff_s` (float, optional): exponential backoff base seconds, default `1.5`.
### Response
- `text` (string): primary model answer.
- `model` (string): model actually used.
- `usage` (object): token usage if returned by backend.
## Quickstart
```bash
python skills/ai/multimodal/aliyun-qwen-vl/scripts/analyze_image.py \
--request '{"prompt":"Summarize the main content in this image","image":"https://example.com/demo.jpg"}' \
--print-response
```
Using local image:
```bash
python skills/ai/multimodal/aliyun-qwen-vl/scripts/analyze_image.py \
--request '{"prompt":"Extract key information from the image","image":"./samples/invoice.png","model":"qwen3-vl-plus"}' \
--print-response
```
Structured extraction (JSON mode):
```bash
python skills/ai/multimodal/aliyun-qwen-vl/scripts/analyze_image.py \
--request '{"prompt":"Extract fields: title, amount, date","image":"./samples/invoice.png"}' \
--json-mode \
--print-response
```
Structured extraction (JSON Schema):
```bash
python skills/ai/multimodal/aliyun-qwen-vl/scripts/analyze_image.py \
--request '{"prompt":"Extract invoice fields","image":"./samples/invoice.png"}' \
--schema skills/ai/multimodal/aliyun-qwen-vl/references/examples/invoice.schema.json \
--print-response
```
## cURL (compatible mode)
```bash
curl -sS https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model":"qwen3-vl-plus",
"messages":[
{
"role":"user",
"content":[
{"type":"image_url","image_url":{"url":"https://example.com/demo.jpg"}},
{"type":"text","text":"Describe this image and list executable actions"}
]
}
],
"max_tokens":512,
"temperature":0.2
}'
```
## Output location
- If `--output` is set, JSON response is saved to that file.
- Default output dir convention: `output/aliyun-qwen-vl/`.
## Smoke test
```bash
python tests/ai/multimodal/aliyun-qwen-vl-test/scripts/smoke_test_qwen_vl.py \
--image ./tmp/vl_test_cat.png
```
## Error handling
| Error | Likely cause | Action |
| --- | --- | --- |
| 401/403 | Missing or invalid key | Check `DASHSCOPE_API_KEY` and account permissions. |
| 400 | Invalid request schema or unsupported image source | Validate `messages` content and image URL/path format. |
| 429 | Rate limit | Retry with exponential backoff and lower concurrency. |
| 5xx | Temporary backend issue | Retry with backoff and idempotent request design. |
## Operational guidance
- For stable production behavior, pin snapshot model IDs instead of pure `-latest`.
- Compress very large images before upload to reduce latency and cost.
- Add explicit extraction constraints in prompt (fields, JSON shape, language).
- For OCR-like output, ask for confidence notes and unresolved text markers.
## Workflow
1) Confirm user intent, region, identifiers, and whether the operation is read-only or mutating.
2) Run one minimal read-only query first to verify connectivity and permissions.
3) Execute the target operation with explicit parameters and bounded scope.
4) Verify results and save output/evidence files.
## References
- Source list: `references/sources.md`
- API notes: `references/api_reference.md`
More from cinience/alicloud-skills
- aliyun-adb-mysqlUse when managing Alibaba Cloud AnalyticDB for MySQL (ADB) via OpenAPI/SDK, including the user needs AnalyticDB resource lifecycle and configuration operations, status checks, or troubleshooting ADB API and cluster workflow issues.
- aliyun-adb-mysql-testSmoke test for aliyun-adb-mysql. Validate minimal authentication, API reachability, and one read-only query path.
- aliyun-aicontent-generateUse when managing Alibaba Cloud AIContent (AiContent) via OpenAPI/SDK, including the user needs AI content generation or content workflow operations in Alibaba Cloud, including listing assets, creating/updating generation configurations, checking task status, or troubleshooting failed content jobs.
- aliyun-aicontent-generate-testSmoke test for aliyun-aicontent-generate. Validate minimal authentication, API reachability, and one read-only query path.
- aliyun-aimiaobi-generateUse when managing Alibaba Cloud Quan Miao (AiMiaoBi) via OpenAPI/SDK, including the user asks for Alibaba Cloud MiaoBi content operations, including listing resources, creating/updating configurations, querying runtime status, and diagnosing API or workflow failures.
- aliyun-aimiaobi-generate-testSmoke test for aliyun-aimiaobi-generate. Validate minimal authentication, API reachability, and one read-only query path.
- aliyun-airec-manageUse when managing Alibaba Cloud AIRec (Airec) via OpenAPI/SDK, including the user needs recommendation-engine resource operations in Alibaba Cloud, including list/create/update flows, status inspection, and troubleshooting AIRec configuration or runtime issues.
- aliyun-airec-manage-testSmoke test for aliyun-airec-manage. Validate minimal authentication, API reachability, and one read-only query path.
- aliyun-alb-manageUse when managing and troubleshoot Alibaba Cloud ALB (Application Load Balancer), including the user asks to inspect, create, change, or debug ALB instances, listeners, server groups, rules, certificates, ACLs, security policies, or health checks in Alibaba Cloud.
- aliyun-alb-manage-testSmoke test for Alibaba Cloud ALB skill. Validates SDK auth, script compilation, list instances, and health check flows.