aliyun-qwen-ocr
```bash
npx mdskill add cinience/alicloud-skills/aliyun-qwen-ocr
```
Extracts text and structure from documents using Alibaba Cloud Qwen OCR models.
- Solves document parsing, table extraction, and multilingual OCR tasks
- Uses Alibaba Cloud Model Studio Qwen OCR models for processing
- Chooses an appropriate OCR model: the stable channel, the newest features, or a pinned versioned snapshot
- Delivers structured text, formulas, and key information from visual inputs
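The model choice in the bullets above can be sketched as a small lookup. This is a hypothetical helper, not part of the skill's scripts. The model strings are the exact ones listed under "Critical model names" in SKILL.md, and the `need` labels (`"stable"`, `"latest"`, `"reproducible"`) are illustrative assumptions.

```python
from typing import Optional

# Exact model strings from "Critical model names" in SKILL.md.
STABLE_MODEL = "qwen-vl-ocr"
LATEST_MODEL = "qwen-vl-ocr-latest"
SNAPSHOTS = (
    "qwen-vl-ocr-2025-11-20",
    "qwen-vl-ocr-2025-08-28",
    "qwen-vl-ocr-2025-04-13",
    "qwen-vl-ocr-2024-10-28",
)


def choose_ocr_model(need: str = "stable", snapshot: Optional[str] = None) -> str:
    """Map a selection need to a model string (hypothetical helper).

    "stable"       -> the stable channel
    "latest"       -> newest OCR behavior (not pinned to a snapshot)
    "reproducible" -> a pinned dated snapshot, 2025-11-20 unless one is given
    """
    if need == "stable":
        return STABLE_MODEL
    if need == "latest":
        return LATEST_MODEL
    if need == "reproducible":
        model = snapshot or SNAPSHOTS[0]
        # Only dated snapshots pin behavior, so reject anything else here.
        if model not in SNAPSHOTS:
            raise ValueError(f"not a known dated snapshot: {model}")
        return model
    raise ValueError(f"unknown selection need: {need!r}")


print(choose_ocr_model())                # qwen-vl-ocr
print(choose_ocr_model("reproducible"))  # qwen-vl-ocr-2025-11-20
```

Raising on an unknown snapshot keeps a typo from silently falling back to an unpinned model.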
SKILL.md (`.github/skills/aliyun-qwen-ocr`)

---
name: aliyun-qwen-ocr
description: Use when OCR-specialized extraction is needed with Alibaba Cloud Model Studio Qwen OCR models (`qwen-vl-ocr`, `qwen-vl-ocr-latest`, and snapshots), including document parsing, table parsing, multilingual OCR, formula recognition, and key information extraction.
version: 1.0.0
---

Category: provider

# Model Studio Qwen OCR

## Validation

```bash
mkdir -p output/aliyun-qwen-ocr
python -m py_compile skills/ai/multimodal/aliyun-qwen-ocr/scripts/prepare_ocr_request.py && echo "py_compile_ok" > output/aliyun-qwen-ocr/validate.txt
```

Pass criteria: the command exits 0 and `output/aliyun-qwen-ocr/validate.txt` is generated.

## Output And Evidence

- Save request payloads, the selected OCR task name, and normalized output expectations under `output/aliyun-qwen-ocr/`.
- Keep the exact model, image source, and task configuration with each saved run.

Use Qwen OCR when the task is primarily text extraction or document structure parsing rather than broad visual reasoning.

## Critical model names

Use one of these exact model strings:

- `qwen-vl-ocr`
- `qwen-vl-ocr-latest`
- `qwen-vl-ocr-2025-11-20`
- `qwen-vl-ocr-2025-08-28`
- `qwen-vl-ocr-2025-04-13`
- `qwen-vl-ocr-2024-10-28`

Selection guidance:

- Use `qwen-vl-ocr` for the stable channel.
- Use `qwen-vl-ocr-latest` only when you explicitly want the newest OCR behavior.
- Pin `qwen-vl-ocr-2025-11-20` when you need reproducible document parsing based on the Qwen3-VL OCR upgrade.

## Prerequisites

- Install dependencies (a virtual environment is recommended):

  ```bash
  python3 -m venv .venv
  . .venv/bin/activate
  python -m pip install requests
  ```

- Set `DASHSCOPE_API_KEY` in the environment, or add `dashscope_api_key` to `~/.alibabacloud/credentials`.

## Normalized interface (ocr.extract)

### Request

- `image` (string, required): HTTPS URL, local path, or `data:` URL.
- `model` (string, optional): defaults to `qwen-vl-ocr`.
- `prompt` (string, optional): custom extraction instructions.
- `task` (string, optional): built-in OCR task.
- `task_config` (object, optional): configuration for the built-in task, such as extraction fields.
- `enable_rotate` (bool, optional): defaults to `false`.
- `min_pixels` (int, optional)
- `max_pixels` (int, optional)
- `max_tokens` (int, optional)
- `temperature` (float, optional): keep at the default or a low value.

### Response

- `text` (string): extracted text, or structured Markdown/HTML-style output.
- `model` (string)
- `usage` (object, optional)

## Built-in OCR tasks

Use one of these values in `task`:

- `text_recognition`
- `key_information_extraction`
- `document_parsing`
- `table_parsing`
- `formula_recognition`
- `multi_lan`
- `advanced_recognition`

## Quick start

Custom prompt:

```bash
python skills/ai/multimodal/aliyun-qwen-ocr/scripts/prepare_ocr_request.py \
  --image "https://example.com/invoice.png" \
  --prompt "Extract seller name, invoice date, amount, and tax number in JSON."
```

Built-in task:

```bash
python skills/ai/multimodal/aliyun-qwen-ocr/scripts/prepare_ocr_request.py \
  --image "https://example.com/table.png" \
  --task table_parsing \
  --model qwen-vl-ocr-2025-11-20
```

## Operational guidance

- Prefer built-in OCR tasks for standard parsing jobs because they use official task prompts.
- For critical business fields, add downstream validation rules after OCR.
- `qwen-vl-ocr` and older snapshots default to a maximum of `4096` output tokens unless Alibaba Cloud approves a higher limit; `qwen-vl-ocr-2025-11-20` follows the model maximum.
- Increase `max_pixels` only when small text is missed; doing so raises token cost.

## Output location

- Default output: `output/aliyun-qwen-ocr/request.json`
- Override the base directory with `OUTPUT_DIR`.

## References

- `references/api_reference.md`
- `references/sources.md`
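Pulling together the normalized `ocr.extract` interface, the built-in task list, and the output location from SKILL.md, a request could be assembled and saved roughly as follows. This is a hypothetical sketch: `normalize_request`, `save_request`, and `image_to_source` are illustrative names, not functions from `prepare_ocr_request.py`. Encoding a local path as a base64 `data:` URL is an assumption about how local images are passed, and the sketch assumes `OUTPUT_DIR` replaces the `output` prefix. It only builds and saves the request; it does not call the Model Studio API.

```python
import base64
import json
import mimetypes
import os
from pathlib import Path
from typing import Optional

# Built-in task names listed in SKILL.md.
BUILTIN_TASKS = frozenset({
    "text_recognition",
    "key_information_extraction",
    "document_parsing",
    "table_parsing",
    "formula_recognition",
    "multi_lan",
    "advanced_recognition",
})


def image_to_source(image: str) -> str:
    """Keep HTTPS and data: URLs; encode a local file as a base64 data: URL (assumption)."""
    if image.startswith(("https://", "data:")):
        return image
    path = Path(image)
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def normalize_request(
    image: str,
    model: str = "qwen-vl-ocr",
    prompt: Optional[str] = None,
    task: Optional[str] = None,
    task_config: Optional[dict] = None,
    enable_rotate: bool = False,
    min_pixels: Optional[int] = None,
    max_pixels: Optional[int] = None,
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
) -> dict:
    """Build a hypothetical ocr.extract request dict with the documented defaults."""
    if task is not None and task not in BUILTIN_TASKS:
        raise ValueError(f"unknown built-in task: {task!r}")
    request = {
        "image": image_to_source(image),
        "model": model,
        "enable_rotate": enable_rotate,
    }
    optional = {
        "prompt": prompt,
        "task": task,
        "task_config": task_config,
        "min_pixels": min_pixels,
        "max_pixels": max_pixels,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    # Copy optional fields only when the caller set them.
    request.update({k: v for k, v in optional.items() if v is not None})
    return request


def save_request(request: dict) -> Path:
    """Write request.json under $OUTPUT_DIR (default: output)/aliyun-qwen-ocr/."""
    out_dir = Path(os.environ.get("OUTPUT_DIR", "output")) / "aliyun-qwen-ocr"
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / "request.json"
    out_file.write_text(json.dumps(request, indent=2, ensure_ascii=False), encoding="utf-8")
    return out_file


req = normalize_request(
    "https://example.com/table.png",
    task="table_parsing",
    model="qwen-vl-ocr-2025-11-20",
)
print(json.dumps(req, indent=2))
```

Rejecting unknown task names up front catches typos before a request is saved. Keeping the model, image source, and task in the same saved file matches the evidence guidance in SKILL.md.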
More from cinience/alicloud-skills
- aliyun-adb-mysql: Use when managing Alibaba Cloud AnalyticDB for MySQL (ADB) via OpenAPI/SDK, including when the user needs AnalyticDB resource lifecycle and configuration operations, status checks, or troubleshooting of ADB API and cluster workflow issues.
- aliyun-adb-mysql-test: Smoke test for aliyun-adb-mysql. Validates minimal authentication, API reachability, and one read-only query path.
- aliyun-aicontent-generate: Use when managing Alibaba Cloud AIContent (AiContent) via OpenAPI/SDK, including when the user needs AI content generation or content workflow operations in Alibaba Cloud: listing assets, creating/updating generation configurations, checking task status, or troubleshooting failed content jobs.
- aliyun-aicontent-generate-test: Smoke test for aliyun-aicontent-generate. Validates minimal authentication, API reachability, and one read-only query path.
- aliyun-aimiaobi-generate: Use when managing Alibaba Cloud Quan Miao (AiMiaoBi) via OpenAPI/SDK, including when the user asks for Alibaba Cloud MiaoBi content operations: listing resources, creating/updating configurations, querying runtime status, and diagnosing API or workflow failures.
- aliyun-aimiaobi-generate-test: Smoke test for aliyun-aimiaobi-generate. Validates minimal authentication, API reachability, and one read-only query path.
- aliyun-airec-manage: Use when managing Alibaba Cloud AIRec (Airec) via OpenAPI/SDK, including when the user needs recommendation-engine resource operations in Alibaba Cloud: list/create/update flows, status inspection, and troubleshooting AIRec configuration or runtime issues.
- aliyun-airec-manage-test: Smoke test for aliyun-airec-manage. Validates minimal authentication, API reachability, and one read-only query path.
- aliyun-alb-manage: Use when managing and troubleshooting Alibaba Cloud ALB (Application Load Balancer), including when the user asks to inspect, create, change, or debug ALB instances, listeners, server groups, rules, certificates, ACLs, security policies, or health checks in Alibaba Cloud.
- aliyun-alb-manage-test: Smoke test for the Alibaba Cloud ALB skill. Validates SDK auth, script compilation, instance listing, and health check flows.