[Testing] Umi-OCR now supports math formula recognition

### Preview screenshot:

![image](https://github.com/hiroi-sora/Umi-OCR/assets/56373419/f4f14a94-e429-4b2c-8726-8bcaa84287a7)


### Preview output:

gradients in at least two (significantly) different orientations are the easiest to localiz, as shown schematically in Figure 7.4a.
These intuitions can be formalized by looking at the simplest possible matching criterion for comparing two image patches, i., their (weighted) summed square difference,

$`
E_{\mathrm{W S S D}} ( {\bf u} )=\sum_{i} w ( {\bf x}_{i} ) [ I_{1} ( {\bf x}_{i}+{\bf u} )-I_{0} ( {\bf x}_{i} ) ]^{2},
`$ (7.1)

where $I_{0}$ and $I_{1}$ are the two images being compared, ${\mathbf u}=( u, v )$ is the displacement vector,  $w ( {\bf x} )$ is a spatially varying weighting (or window) function, and the summation i is over all the pixels in the patch. Note that this is the same formulation we later use to estimate motion between complete images (Section 9.1).
When performing feature detection, we do not know which other image locations the feature will end up being matched against. Therefore, we can only compute how stable this

---

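# Introduction

[Pix2Text](https://github.com/breezedeus/Pix2Text) is an open-source OCR project that can recognize mixed images containing both text and **math formulas**.

I have packaged it as a plugin that can be imported into any version of Umi-OCR v2. It supports Windows 7 x64 and later.

The Pix2Text plugin is used the same way as the Paddle and Rapid plugins, and supports both screenshot OCR and batch OCR. You can import all of these plugins side by side, but only one can be enabled at a time; switch between them inside the application.

**The P2T plugin is currently in the testing stage** and may be unstable or buggy. If you run into any related problem, please report it in this thread.

Please note: after downloading the plugin, the first OCR run takes a long time (10-60 s) while the P2T plugin initializes and builds its cache, so please be patient. Subsequent OCR runs return to normal speed.

P2T works offline; no network connection is needed.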

# How to import the plugin

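1. Go to https://github.com/hiroi-sora/Umi-OCR_plugins/releases
2. Download **win7_x64_Pix2Text.v1.0.7z** (check the version number; the latest release is recommended)
3. Extract it and place it in `UmiOCR-data/plugins`
4. Open Umi-OCR, then go to Global Settings → Text Recognition → set the interface to `Pix2Text` → click **Apply Changes**

Be sure to click **Apply Changes**!

5. Then return to the Screenshot or Batch tab and use Umi as usual.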

![image](https://github.com/hiroi-sora/Umi-OCR/assets/56373419/e77a23cc-2eb1-41c4-b05b-daee98e95621)
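
If the `Pix2Text` interface does not show up in the settings, one possible cause is that the archive from step 3 was extracted to the wrong place. The following is a minimal Python sketch for checking this, not part of Umi-OCR itself: `list_plugins` is a hypothetical helper, and the only thing taken from this post is the `UmiOCR-data/plugins` path. The name of the extracted plugin folder is an assumption, so the script simply lists whatever folders it finds.

```python
from pathlib import Path


def list_plugins(umi_root):
    """Return the sorted names of the folders inside <umi_root>/UmiOCR-data/plugins.

    Hypothetical helper: it only inspects the directory layout described in
    step 3 and does not talk to Umi-OCR in any way.
    """
    plugins_dir = Path(umi_root) / "UmiOCR-data" / "plugins"
    if not plugins_dir.is_dir():
        # The plugins directory is missing, so nothing can be installed yet.
        return []
    # Plugins are extracted as folders; loose files are ignored.
    return sorted(p.name for p in plugins_dir.iterdir() if p.is_dir())


if __name__ == "__main__":
    import sys

    # Pass the Umi-OCR installation folder as the first argument
    # (defaults to the current directory).
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    names = list_plugins(root)
    if names:
        print("Plugin folders found:", ", ".join(names))
    else:
        print("No plugin folders found under UmiOCR-data/plugins")
```

Run it with the Umi-OCR installation folder as the argument. If the list does not contain a Pix2Text folder, repeat step 3 before changing the settings.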

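# P2T highlights

Compared with Paddle and Rapid, the P2T plugin has the following **advantages**:

1. Supports mixed Chinese + English + formula recognition.
2. In Chinese/English text, spaces are not dropped.
3. In Chinese/English text, recognition is fairly fast.

The P2T plugin also has these **drawbacks**:

1. Long initialization time. The first OCR task may need 10-60 s to load.
2. Large size. About 350 MB at maximum compression, and about 1.6 GB once deployed.
3. On images with complex layouts, text-box detection may be slightly less accurate than with the other OCR plugins.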
