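Quick Assessment
Extract text from images using the GLM-OCR API. It handles images and PDFs, with high-accuracy OCR, table recognition, formula extraction, and handwriting recognition. Use this skill whenever the user wants to extract text from an image, run OCR on a picture, scan a document, convert an image to text, or process any image file to get its text content.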
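Suitable Tasks
- Complete platform, development, or workflow tasks as described in the ModelScope listing.
- Save the Skill content offline via the download package.
- Assess priority using download, visit, and like counts.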
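Inputs and Outputs
Input: task goal, context material, platform information, file paths, constraints, or the content to be processed.
Output: documents, code, check results, plans, recommendations, or steps generated according to the Skill instructions.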
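Example Tasks
- Use glmocr to help me complete the current task, confirming the necessary context first.
- Following the glmocr instructions, list the steps and the risk checkpoints.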
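Installation
- Download the Skill ZIP provided by this site and unzip it.
- Place the unzipped Skill directory into the skills directory supported by your AI tool.
- To view the original content online, open SKILL.md on GitHub.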
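Risks and Boundaries
Before use, check permissions, external dependencies, and the types of data to be processed. For anything involving third-party platform data, payments, deployment, accounts, or keys, verify against the official documentation first.
SKILL.md Documentation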
GLM-OCR Text Extraction Skill
Extract text from images and PDFs using the GLM-OCR layout parsing API.
When to Use
- Extract text from images (PNG, JPG) and PDFs
- Convert screenshots to text
- Process scanned documents
- OCR photos containing text (including handwritten text)
- Recognize tables and formulas in documents
- User mentions "OCR", "文字识别" (text recognition), or "文档解析" (document parsing)
Key Features
- Table recognition: Detects and converts tables to Markdown format
- Formula extraction: LaTeX format output
- Handwriting support: Strong recognition for handwritten text
- Local file & URL: Supports both local files and remote URLs
Resource Links
| Resource | Link |
|----------|------|
| Get API Key | https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys |
| GitHub | https://github.com/zai-org/GLM-OCR |
Prerequisites
- ZHIPU_API_KEY configured (see Setup below)
Security Notes
- No runtime package installation is performed by the scripts.
- OCR requests use the fixed official GLM endpoint and do not accept custom API URLs.
- Only ZHIPU_API_KEY (and an optional timeout) is read from environment variables.
⛔ MANDATORY RESTRICTIONS - DO NOT VIOLATE ⛔
1. ONLY use GLM-OCR API - Execute the script python scripts/glm_ocr_cli.py
2. NEVER parse documents directly - Do NOT try to extract text yourself
3. NEVER offer alternatives - Do NOT suggest "I can try to analyze it" or similar
4. IF API fails - Display the error message and STOP immediately
5. NO fallback methods - Do NOT attempt text extraction any other way
Setup
1. Get your API key: https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys
2. Configure:
python scripts/config_setup.py setup --api-key YOUR_KEY
How to Use
Extract from URL
python scripts/glm_ocr_cli.py --file-url "URL provided by user"
Extract from Local File
python scripts/glm_ocr_cli.py --file /path/to/image.jpg
Save result to file (recommended)
python scripts/glm_ocr_cli.py --file-url "URL" --output result.json
CLI Reference
python {baseDir}/scripts/glm_ocr_cli.py (--file-url URL | --file PATH) [--output FILE] [--pretty]
| Parameter | Required | Description |
|-----------|----------|-------------|
| --file-url | One of | URL to image/PDF |
| --file | One of | Local file path to image/PDF |
| --output, -o | No | Save result JSON to file |
| --pretty | No | Pretty-print JSON output |
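As a minimal sketch of how a caller might assemble this command line, the helper below builds the argument list from the parameters in the table. The function name `build_ocr_command` and the default script path are illustrative assumptions, not part of the skill; only the flags come from the reference above.

```python
import sys
from typing import List, Optional


def build_ocr_command(
    file_url: Optional[str] = None,
    file: Optional[str] = None,
    output: Optional[str] = None,
    pretty: bool = False,
    script: str = "scripts/glm_ocr_cli.py",  # assumed relative path, as in the examples
) -> List[str]:
    """Return the argv list for the documented CLI (hypothetical helper)."""
    # Exactly one of --file-url / --file must be given, per the CLI reference.
    if (file_url is None) == (file is None):
        raise ValueError("pass exactly one of file_url or file")

    cmd = [sys.executable, script]
    cmd += ["--file-url", file_url] if file_url is not None else ["--file", file]
    if output is not None:
        cmd += ["--output", output]
    if pretty:
        cmd.append("--pretty")
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, capture_output=True, text=True)`. Passing a list rather than a shell string avoids quoting problems with user-supplied URLs and paths.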
Response Format
{
  "ok": true,
  "text": "# Extracted text in Markdown...",
  "layout_details": [[...]],
  "result": { "raw_api_response": "..." },
  "error": null,
  "source": "/path/to/file.jpg",
  "source_type": "file"
}
Key fields:
- ok: whether extraction succeeded
- text: extracted text in Markdown (use this for display)
- layout_details: layout analysis details
- result: raw API response
- error: error details on failure
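To illustrate how these fields might be consumed, here is a small sketch that reads a result in this format and returns the Markdown text, raising instead of falling back when `ok` is false. The names `OcrError` and `text_from_result` are hypothetical, and the sample payload is made up for the example.

```python
import json


class OcrError(RuntimeError):
    """Raised when the result reports a failed extraction (hypothetical name)."""


def text_from_result(raw: str) -> str:
    """Return the `text` field of a result JSON string, or raise OcrError."""
    result = json.loads(raw)
    if not result.get("ok"):
        # Surface the reported error as-is; no alternative extraction is attempted.
        raise OcrError(str(result.get("error")))
    return result["text"]


# Made-up sample in the documented shape, for illustration only.
sample = json.dumps({
    "ok": True,
    "text": "# Invoice\n\n| Item | Qty |\n|------|-----|\n| Pen | 2 |",
    "layout_details": [],
    "result": {},
    "error": None,
    "source": "/path/to/file.jpg",
    "source_type": "file",
})
print(text_from_result(sample))
```

The same function works on the contents of a file written with `--output result.json`, assuming that file holds the JSON shown in this section.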
Error Handling
API key not configured:
Error: ZHIPU_API_KEY not configured. Get your API key at: https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys
→ Show the exact error to the user and guide them to configure the key
Authentication failed (401/403): API key invalid/expired → reconfigure
Rate limit (429): Quota exhausted → inform user to wait
File not found: Local file missing → check path
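These cases can be summarized as a lookup from failure condition to next step. The sketch below is an assumption about how a caller might encode that. The function name `advice_for`, its parameters, and the wording of the messages are illustrative; only the status codes and the actions come from the list in this section.

```python
from typing import Optional


def advice_for(status: Optional[int] = None, *, key_missing: bool = False,
               file_missing: bool = False) -> str:
    """Map a documented failure condition to the suggested next step."""
    if key_missing:
        return "Show the exact error and guide the user to configure ZHIPU_API_KEY."
    if file_missing:
        return "Check the local file path."
    if status in (401, 403):
        return "API key invalid or expired: reconfigure it."
    if status == 429:
        return "Quota exhausted: ask the user to wait."
    # Anything else: report the error and stop, per the mandatory restrictions.
    return "Display the error message and stop."
```

For example, `advice_for(429)` returns the "ask the user to wait" message, while an unlisted status such as 500 falls through to the report-and-stop default.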
Reference
references/output_schema.md: detailed output format specification