Free Chinese OCR — 中文 PDF to Text

Free Chinese OCR supporting both Simplified (简体) and Traditional (繁體) characters. Extract text from PDFs, images, and scanned books. Runs in your browser, with no upload.

About Chinese OCR

Chinese OCR (中文 OCR) extracts both Simplified (简体中文) and Traditional (繁體中文) characters from scanned PDFs and images. Our engine handles the CJK Unified Ideographs block, which covers more than 20,000 Hanzi (汉字/漢字), plus full-width punctuation (,。!?「」『』), pinyin annotations, and the mixed Chinese-English text common in academic papers, technical manuals, and business contracts.
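As a rough illustration of what "the CJK Unified Ideographs block" means in practice, the sketch below checks whether a character falls in the base block U+4E00 to U+9FFF (20,992 code points) and measures how much of a string is Hanzi. The function names `is_cjk_unified` and `hanzi_ratio` are hypothetical helpers for this example, not part of the tool, and the check deliberately ignores the Unicode extension blocks.

```python
def is_cjk_unified(ch: str) -> bool:
    """Return True if ch is a single character in the base
    CJK Unified Ideographs block (U+4E00..U+9FFF)."""
    return len(ch) == 1 and 0x4E00 <= ord(ch) <= 0x9FFF


def hanzi_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are base-block Hanzi."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(is_cjk_unified(c) for c in chars) / len(chars)
```

For example, `hanzi_ratio("汉字 OCR")` counts two Hanzi among five non-space characters and returns 0.4. A ratio like this is one simple way to tell a mostly Chinese page from a mixed Chinese-English one.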

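We use the chi_sim and chi_tra LSTM models from Tesseract 5, tuned separately for Mainland Simplified (GB 18030) and Taiwan/Hong Kong Traditional (Big5). The whole process runs in your browser: your contracts, ID cards, and bank statements are never uploaded to a server. Free, with no registration and no watermark.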

Key Features

How to Use Free Chinese OCR — 中文 PDF to Text

  1. Drag and drop your Chinese PDF or image (JPG, PNG, and TIFF are supported).
  2. Select Simplified (简体) or Traditional (繁體). The script is detected automatically where possible.
  3. Add English as a secondary language for mixed bilingual documents.
  4. Click "提取" (Extract). Hanzi are recognised page by page with live progress.
  5. Copy the text, or download a .docx or a searchable PDF with a Chinese text layer.
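Steps 2 and 3 map onto how Tesseract selects languages: it accepts several language codes joined with "+", for example `chi_sim+eng`. The sketch below shows one way the two choices could be turned into that string. The function name `build_lang` and the `"simplified"` / `"traditional"` labels are assumptions made for this example, not the tool's actual code.

```python
# Tesseract language codes for the two Chinese scripts.
SCRIPT_CODES = {"simplified": "chi_sim", "traditional": "chi_tra"}


def build_lang(script: str, add_english: bool = False) -> str:
    """Build a Tesseract language string such as 'chi_sim+eng'."""
    if script not in SCRIPT_CODES:
        raise ValueError(f"unknown script: {script!r}")
    parts = [SCRIPT_CODES[script]]
    if add_english:
        # The secondary language is appended after the primary one.
        parts.append("eng")
    return "+".join(parts)
```

With this helper, `build_lang("simplified", add_english=True)` gives `"chi_sim+eng"`, and `build_lang("traditional")` gives `"chi_tra"`.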

Who Uses This Tool

Why Choose PDF AI Tools

We've built PDF AI Tools to replace expensive desktop software like Adobe Acrobat for 95% of common document workflows — at zero cost to you. Unlike competitors who gate features behind paywalls, add watermarks, or limit file sizes, our tools are genuinely free and genuinely unlimited. Your privacy matters: files processed client-side in your browser never touch our servers, and even AI-powered features use encrypted, auto-deleting processing pipelines.

Frequently Asked Questions

What is the difference between Simplified (简体) and Traditional (繁體)? Which should I choose?

Simplified (简体) is used in Mainland China and Singapore. Traditional (繁體) is used in Taiwan, Hong Kong, and Macau. A Traditional document processed with the Simplified model will produce garbled output, so pick the right one. If unsure, compare pairs such as 国/國, 学/學, and 语/語: if the text uses the simpler left-hand form, it is Simplified.
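The eyeball test above can be sketched as a small heuristic: count how often the Simplified and the Traditional form of a few marker characters appear, and go with the majority. The marker sets below contain only the three pairs named in this answer, and `guess_script` is a hypothetical name. A practical detector would need a much larger marker list, and this is not a description of how the tool's own auto-detection works.

```python
# Marker characters taken from the pairs 国/國, 学/學, 语/語.
SIMPLIFIED_MARKERS = set("国学语")
TRADITIONAL_MARKERS = set("國學語")


def guess_script(text: str) -> str:
    """Return 'simplified', 'traditional', or 'unknown' by majority vote."""
    simplified = sum(ch in SIMPLIFIED_MARKERS for ch in text)
    traditional = sum(ch in TRADITIONAL_MARKERS for ch in text)
    if simplified > traditional:
        return "simplified"
    if traditional > simplified:
        return "traditional"
    # No markers found, or an exact tie: do not guess.
    return "unknown"
```

For instance, `guess_script("中国语言学")` returns `"simplified"`, while `guess_script("中國語言學")` returns `"traditional"`.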

Can it read handwritten Chinese (手写体)?

No. Tesseract's Chinese model is trained on printed text only. Handwritten Hanzi (especially cursive 行书/草书) requires dedicated handwriting models (e.g., Baidu's OCR or Google Vision) which we're evaluating for a future release.

Is vertical classical Chinese (竖排古文) supported?

Yes. Vertical layout is auto-detected, and scans of classical texts such as 史记, 论语, and 四书五经 are processed correctly. Accuracy on pre-1950 fonts and stone-carving rubbings may be lower due to glyph variance.
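For readers running Tesseract themselves, vertical text is usually handled with the separate `chi_sim_vert` and `chi_tra_vert` models together with page segmentation mode 5 (a single uniform block of vertically aligned text). The sketch below builds the matching command-line arguments. The function name `tesseract_args` is hypothetical, and whether this tool uses the same models internally is an assumption.

```python
from typing import List

# Horizontal model names; the vertical variants add a "_vert" suffix.
BASE_MODELS = {"simplified": "chi_sim", "traditional": "chi_tra"}


def tesseract_args(script: str, vertical: bool = False) -> List[str]:
    """Return Tesseract CLI arguments for the given script and layout."""
    if script not in BASE_MODELS:
        raise ValueError(f"unknown script: {script!r}")
    model = BASE_MODELS[script]
    if vertical:
        # --psm 5: assume a single uniform block of vertically aligned text.
        return ["-l", model + "_vert", "--psm", "5"]
    return ["-l", model]
```

A vertical Traditional scan would then use `tesseract_args("traditional", vertical=True)`, which yields `["-l", "chi_tra_vert", "--psm", "5"]`.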

How accurate is it on Chinese newspapers?

On clean printed Simplified newspapers (人民日报, 南方周末), accuracy is 93-96%. Traditional Chinese newspapers (聯合報, 蘋果日報) reach 91-95%. Very small fonts (below 8pt) or low-contrast scans drop accuracy to 80-85%.
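The answer above does not say how accuracy is measured. A common convention, assumed here for illustration only, is character-level accuracy: one minus the edit distance between the OCR output and a hand-checked reference, divided by the reference length. The names `edit_distance` and `char_accuracy` are hypothetical helpers for this sketch.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def char_accuracy(reference: str, ocr_output: str) -> float:
    """Character accuracy in [0, 1] relative to a trusted reference."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))
```

If the engine misreads 日 as 曰 in a four-character reference, `char_accuracy("人民日报", "人民曰报")` returns 0.75. Measuring your own scans this way against a proofread sample is a reasonable check on whether the quoted ranges hold for your material.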

Will my ID card or contract be uploaded?

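Absolutely not. All OCR processing happens locally in your browser, and no file is uploaded to our servers. This is especially important for sensitive Chinese documents such as ID cards (身份证), household registers (户口本), bank statements (银行对账单), and employment contracts (劳动合同), where privacy is non-negotiable.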