PaddleOCR UI Test - Work Journal

Open-source link

2026-04-06

Security Check

Architecture Review & Improvement Planning

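Borrowing architectural ideas from the ui-ux-pro-max skill, we defined 5 improvement directions:

Data-driven rules: extract hard-coded thresholds into JSON rule files
Industry profile presets: differentiated test configurations for different product types
Baseline persistence + incremental diff: the core regression-testing scenario
Checklists turned into executable assertions: stronger L4/L6 detection
CLI tooling improvements: CI mode, standalone tool scripts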
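
As a minimal sketch of what "data-driven rules" can look like, the Python below loads every rules/*.json file and layers per-file overrides (for example, from a profile) on top with a recursive merge. The names load_rules() and apply_rule_overrides() appear in the ui_test.py refactor listed in Phases 2, 3 & 5, but the signatures here, the deep_merge() helper, and the per-file override layout are assumptions for illustration, not the project's actual code.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict in which values from `override` win; nested dicts merge recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_rules(rules_dir: Path) -> dict[str, dict[str, Any]]:
    """Load each rules/*.json file, keyed by file stem (e.g. "layout-anomaly")."""
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(rules_dir.glob("*.json"))
    }


def apply_rule_overrides(
    rules: dict[str, dict[str, Any]],
    overrides: dict[str, dict[str, Any]],
) -> dict[str, dict[str, Any]]:
    """Apply overrides per rule file; files without an override pass through unchanged."""
    return {name: deep_merge(body, overrides.get(name, {})) for name, body in rules.items()}
```

For example, a form profile could lower only one threshold while keeping every other default: apply_rule_overrides(load_rules(Path("rules")), {"layout-anomaly": {"overlap": {"iou_threshold": 0.05}}}). The overlap and iou_threshold keys are hypothetical.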

Phase 1: Base File Creation

Created the rules/ directory with 6 rule JSON files:
- text-consistency.json (L1)
- layout-anomaly.json (L2)
- dom-ocr-crossval.json (L3)
- accessibility.json (L4)
- i18n.json (L5)
- dynamic-content.json (L6)
Created the profiles/ directory with 6 preset JSON files:
- saas.json, ecommerce.json, form.json, content.json, dashboard.json, mobile.json
Created scripts/annotate_screenshot.py (annotated screenshot generation)
Created scripts/baseline_diff.py (baseline regression comparison engine)
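
The core idea behind a baseline regression engine such as scripts/baseline_diff.py can be sketched as a set comparison over issue fingerprints: issues present now but absent from the saved baseline are new regressions, and issues only in the baseline are resolved. This minimal sketch assumes issues are dicts with level, type, and text keys; the fingerprint scheme, BaselineDiff, and diff_against_baseline() are hypothetical names, not the script's actual interface.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass


def fingerprint(issue: dict) -> str:
    """Stable ID for an issue; volatile fields such as bbox are deliberately excluded (assumed key set)."""
    key = {k: issue.get(k) for k in ("level", "type", "text")}
    raw = json.dumps(key, sort_keys=True, ensure_ascii=False)
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()[:12]


@dataclass
class BaselineDiff:
    new: list[dict]        # regressions: only in the current run
    resolved: list[dict]   # fixed: only in the baseline
    unchanged: list[dict]  # known issues present in both


def diff_against_baseline(baseline: list[dict], current: list[dict]) -> BaselineDiff:
    """Compare two issue lists by fingerprint, preserving the input order of each list."""
    base = {fingerprint(i): i for i in baseline}
    cur = {fingerprint(i): i for i in current}
    return BaselineDiff(
        new=[issue for fp, issue in cur.items() if fp not in base],
        resolved=[issue for fp, issue in base.items() if fp not in cur],
        unchanged=[issue for fp, issue in cur.items() if fp in base],
    )
```

Because issues are keyed by fingerprint, two identical issues collapse into one; an engine that needs to catch "the same problem now appears twice" would count occurrences per fingerprint instead.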

Phases 2, 3 & 5: Core Script Refactor

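scripts/ui_test.py (full refactor):
- Added load_rules() / load_profile() / apply_rule_overrides() functions
- UITestEngine takes a new rules parameter; all L1-L6 detection methods are now data-driven
- L2 enhancements: element overlap detection (IoU), touch target size check
- L3 enhancements: LCS-based fuzzy matching replaces the exact set difference
- L4 enhancements: missing label detection, canvas-rendered text detection, emoji-as-icon detection
- L5 enhancements: extensible multi-language support (zh/en/ja/ko)
- L6 enhancements: --actions parameter supports automated action sequences (click/wait/type/screenshot)
- Integrated the --baseline / --baseline-file baseline workflow
- Integrated --annotate for annotated screenshots
- Integrated --source-map for source-location mapping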
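
The two L2 checks above reduce to simple box geometry. The sketch below computes intersection-over-union (IoU) for every pair of element boxes and flags touch targets smaller than a minimum size. The Box type, the 0.1 IoU threshold, and the 44 px minimum (borrowed from the 44x44 CSS pixel target in WCAG 2.5.5) are assumed defaults; in this project such values would presumably come from rules/layout-anomaly.json.

```python
from __future__ import annotations

from itertools import combinations
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned element box in CSS pixels; (x, y) is the top-left corner."""
    name: str
    x: float
    y: float
    w: float
    h: float


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area; 0.0 when the boxes do not overlap."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def find_overlaps(boxes: list[Box], iou_threshold: float = 0.1) -> list[tuple[str, str, float]]:
    """Return (name_a, name_b, iou) for every pair whose IoU exceeds the threshold."""
    hits = []
    for a, b in combinations(boxes, 2):
        score = iou(a, b)
        if score > iou_threshold:
            hits.append((a.name, b.name, round(score, 3)))
    return hits


def find_small_touch_targets(boxes: list[Box], min_size: float = 44.0) -> list[str]:
    """Names of boxes narrower or shorter than min_size pixels."""
    return [b.name for b in boxes if b.w < min_size or b.h < min_size]
```

IoU is normalized by the union, so a small icon sitting entirely inside a large card scores low; a rule that must catch containment could also compare the intersection to the smaller box's area.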
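scripts/compare_ocr_dom.py (enhancements):
- New --ci mode (exits with code 1 when issues are found)
- New --fail-on parameter sets the minimum severity that fails the run
- New --rules parameter loads rule overrides
- Added ignore_patterns filtering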
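
A minimal sketch of how --ci, --fail-on, and ignore_patterns can fit together: drop issues whose text matches any ignore pattern, then return exit code 1 only if a remaining issue is at or above the --fail-on severity. The three-level severity scale and the issue dict shape are assumptions; the journal does not state which severity levels compare_ocr_dom.py actually uses.

```python
from __future__ import annotations

import re

# Assumed severity scale, lowest to highest.
SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2}


def filter_ignored(issues: list[dict], ignore_patterns: list[str]) -> list[dict]:
    """Drop issues whose text matches any regex in ignore_patterns (e.g. timestamps, counters)."""
    compiled = [re.compile(p) for p in ignore_patterns]
    return [i for i in issues if not any(rx.search(i.get("text", "")) for rx in compiled)]


def ci_exit_code(issues: list[dict], fail_on: str = "error") -> int:
    """Return 1 if any issue is at or above the fail_on severity, else 0."""
    threshold = SEVERITY_RANK[fail_on]
    failing = [i for i in issues if SEVERITY_RANK.get(i.get("severity", "info"), 0) >= threshold]
    return 1 if failing else 0
```

A script entry point would end with something like sys.exit(ci_exit_code(filter_ignored(issues, patterns), fail_on)), so the CI job fails only on the severities the team has chosen to gate on.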

SKILL.md Iterations

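v1 → v2: streamlined around agent decision-making

333 lines → 141 lines
Restructured from a CLI manual for humans into a 5-control-knob layout
Each knob maps to one decision dimension (test scope, page type, rule tuning, specific expectations, regression comparison)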

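v2 → v3: flexible adaptation layer

Rewrote the Integration section to define input/output contracts
6 upstream adaptation modes (dogfood free text, ui-ux-pro-max design system, dev-browser page state, etc.)
Collaboration-mode decision tree (chosen automatically from user intent)
dev-browser session reuse (avoids relaunching the browser)
Three-skill collaboration flow: ui-ux-pro-max → dogfood → paddleocr-ui-test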

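Rules & Profiles Metadata Enhancements

Added an agent_hints field to every rules/*.json:
- when_to_change: when to adjust the rule
- safe_defaults: explanation of the safe default values
- tunable_params: list of tunable parameters
Added a when_to_use field to every profiles/*.json:
- describes the scenarios in which the profile should be used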
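
Because agents rely on these fields to decide what to tune, a small lint step can keep them from silently going missing. This is a hypothetical checker, not something the journal says exists; it assumes only the field names listed above, that tunable_params is a list, and that when_to_use is a non-empty string. EXAMPLE_RULE is invented content that just shows the expected shape.

```python
from __future__ import annotations

from typing import Any

REQUIRED_AGENT_HINTS = ("when_to_change", "safe_defaults", "tunable_params")


def check_rule(name: str, rule: dict[str, Any]) -> list[str]:
    """Return human-readable problems with a rule file's agent_hints block."""
    hints = rule.get("agent_hints")
    if not isinstance(hints, dict):
        return [f"{name}: missing agent_hints"]
    problems = [f"{name}: agent_hints.{key} missing" for key in REQUIRED_AGENT_HINTS if key not in hints]
    if "tunable_params" in hints and not isinstance(hints["tunable_params"], list):
        problems.append(f"{name}: agent_hints.tunable_params should be a list")
    return problems


def check_profile(name: str, profile: dict[str, Any]) -> list[str]:
    """A profile must say when to use it via a non-empty when_to_use string."""
    value = profile.get("when_to_use")
    if not isinstance(value, str) or not value.strip():
        return [f"{name}: when_to_use missing or empty"]
    return []


# Hypothetical rule content showing the expected shape.
EXAMPLE_RULE = {
    "agent_hints": {
        "when_to_change": "Raise iou_threshold if decorative overlaps are reported as bugs.",
        "safe_defaults": "Keep iou_threshold low unless false positives appear.",
        "tunable_params": ["iou_threshold", "min_touch_size"],
    }
}
```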

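Documentation Updates

README.md: added an architecture diagram, the 5 control knobs, the full project structure, and CI/CD examples
README.zh-CN.md: synced the Chinese version, with the same architecture diagram and content
Cross-linked the English and Chinese READMEs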

Internationalization

Translated all Chinese comments and descriptions in rules/ and profiles/ into English

Git Push

Remote repository: git@github.com:aotenjou/paddleOCR-UI-test.git
4 commits in total:
1. d332dd8 — refactor: data-driven rules, profiles, baseline testing, and flexible integration layer
2. 8225493 — docs: update Chinese README with new architecture diagram and 5 control knobs
3. 068c566 — docs: sync English README with Chinese version, add architecture diagram and 5 control knobs
4. 4f798a3 — refactor: translate all Chinese comments and descriptions in rules/profiles to English

Final Architecture

paddleOCR-UItest/
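├── SKILL.md                          # Agent instructions (5 control knobs + flexible adaptation layer)
├── skill.json                        # Skill metadata
├── README.md                         # English documentation
├── README.zh-CN.md                   # Chinese documentation
├── rules/                            # Data-driven rules (6 files, with agent_hints)
│   ├── text-consistency.json         # L1: matching strategy, thresholds, ignore patterns
│   ├── layout-anomaly.json           # L2: overflow, overlap, touch targets
│   ├── dom-ocr-crossval.json         # L3: fuzzy matching, count discrepancies
│   ├── accessibility.json            # L4: alt, label, canvas, emoji
│   ├── i18n.json                     # L5: language regexes, false-positive words
│   └── dynamic-content.json          # L6: state transitions, tracking count
├── profiles/                         # Industry presets (6 files, with when_to_use)
│   ├── saas.json                     # Admin back-office systems
│   ├── ecommerce.json                # E-commerce sites
│   ├── form.json                     # Forms / login pages
│   ├── content.json                  # Blogs / article pages
│   ├── dashboard.json                # Data dashboards / big-screen displays
│   └── mobile.json                   # Mobile H5 pages
├── scripts/
│   ├── ui_test.py                    # Main test script (rule loading, profiles, baseline, actions)
│   ├── compare_ocr_dom.py            # OCR vs DOM cross-validation (supports --ci)
│   ├── baseline_diff.py              # Baseline regression comparison engine
│   ├── annotate_screenshot.py        # Annotated screenshot generation
│   └── source_map_lookup.py          # Source-location mapping
├── references/
│   ├── ocr-api.md                    # PaddleOCR API configuration
│   ├── a11y-tree.md                  # Accessibility tree format
│   └── test-patterns.md              # Common test patterns + CI/CD examples
└── examples/
    └── test-config.json              # Example test configuration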

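5 Control Knobs

Knob	Parameter	When the agent adjusts it
Test scope	`--levels`	User says "check the text" → L1; "full check" → all levels
Page type	`--profile`	Auto-select levels/viewport/rules based on the page type
Rule tuning	`rules/*.json`	Edit the rule files directly when thresholds or strategies need adjusting
Specific expectations	`--config`	Generate a config when the user has specific expected text
Regression comparison	`--baseline`	User says "save a baseline" or "compare with the previous run"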
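
The flag-based knobs in this table can be sketched as a small argparse front end with one plausible precedence: an explicit --levels wins, otherwise the profile's levels apply, otherwise all six levels run. The flag names come from the table and the profile names from profiles/; the comma-separated --levels format, the store_true --baseline, the --config help text, and resolve_levels() are assumptions, not ui_test.py's documented interface.

```python
from __future__ import annotations

import argparse

PROFILES = ["saas", "ecommerce", "form", "content", "dashboard", "mobile"]
ALL_LEVELS = [f"L{i}" for i in range(1, 7)]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="ui_test.py")
    parser.add_argument("--levels", help="comma-separated test levels, e.g. L1,L3 (assumed format)")
    parser.add_argument("--profile", choices=PROFILES, help="page-type preset")
    parser.add_argument("--config", help="path to a JSON file of expected text (assumed)")
    parser.add_argument("--baseline", action="store_true", help="compare against the saved baseline (assumed semantics)")
    return parser


def resolve_levels(args: argparse.Namespace, profile_levels: list[str] | None) -> list[str]:
    """Explicit --levels beats the profile's levels, which beat the run-everything default."""
    if args.levels:
        return [lv.strip().upper() for lv in args.levels.split(",") if lv.strip()]
    if profile_levels:
        return list(profile_levels)
    return list(ALL_LEVELS)
```

Letting the explicit flag override the profile keeps the two knobs independent: the agent can pick a page-type preset and still narrow the scope when the user only asks about text.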

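Collaboration with Other Skills

ui-ux-pro-max  →  defines design intent (colors/typography/copy/accessibility requirements)
     ↓
dogfood        →  explores the actual page (finds issues/unexpected behavior)
     ↓
paddleocr-ui-test →  verifies and guards (turns findings into automated regression checks)

dev-browser is the execution engine: every skill can use it for page navigation and interaction
ui-ux-pro-max is the ideal state: it defines what the page "should" look like
dogfood is the discovery mechanism: it finds what is "actually" wrong
This skill is the verification layer: it turns findings into automated checks that can run continuously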