Free Chinese OCR: Chinese (中文) PDF to Text

Free Chinese OCR for both Simplified (简体) and Traditional (繁體) characters. Extract text from PDFs, images, and scanned books. Runs in your browser with no upload.

About Chinese OCR

Chinese OCR (中文 OCR) extracts both Simplified (简体中文) and Traditional (繁體中文) characters from scanned PDFs and images. Our engine handles the full CJK Unified Ideographs range, over 20,000 Hanzi (汉字/漢字), plus punctuation (，。！？「」『』), pinyin annotations, and the mixed Chinese-English documents common in academic papers, technical manuals, and business contracts.
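To make the character coverage concrete, here is a minimal sketch that profiles a piece of extracted text by character type. It assumes "Hanzi" means the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) and uses only the punctuation marks quoted above; the names `classify` and `profile` are illustrative and not part of our tool.

```python
from collections import Counter

# Assumption: "Hanzi" here means the basic CJK Unified Ideographs block only.
HANZI_FIRST, HANZI_LAST = 0x4E00, 0x9FFF

# The full-width punctuation marks quoted in the paragraph above.
CJK_PUNCTUATION = set("，。！？「」『』")


def classify(ch: str) -> str:
    """Return a coarse category for a single character."""
    if HANZI_FIRST <= ord(ch) <= HANZI_LAST:
        return "hanzi"
    if ch in CJK_PUNCTUATION:
        return "cjk_punct"
    if ch.isascii() and ch.isalpha():
        return "latin"
    if ch.isspace():
        return "space"
    return "other"


def profile(text: str) -> Counter:
    """Count how many characters of each category appear in the text."""
    return Counter(classify(ch) for ch in text)


if __name__ == "__main__":
    # A mixed Chinese-English line of the kind found in bilingual contracts.
    print(profile("本合同 (Contract) 自签署之日起生效。"))
```

A profile like this is a quick way to see whether a document is mostly Chinese, mostly English, or genuinely mixed before choosing languages.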

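We use the chi_sim and chi_tra LSTM models from Tesseract 5, tuned separately for Mainland Simplified (GB 18030) and Taiwan/Hong Kong Traditional (Big5). The entire process runs in your browser: your contracts, ID cards, and bank statements are never uploaded to a server. Free, with no registration and no watermarks.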

Key Features

Simplified Chinese (简体) — mainland China standard, 20,000+ Hanzi covered
Traditional Chinese (繁體) — Taiwan / Hong Kong / Macau standard
Vertical text layout detection — classical texts, manga-style documents, old newspapers
Mixed Chinese-English — bilingual contracts, academic papers, software manuals
CJK punctuation preserved — 全角 (full-width) ，。；：「」『』 kept intact
Pinyin (拼音) and Zhuyin (注音) annotations captured when present
In-browser processing, so 身份证 (ID cards), 合同 (contracts), and 户口本 (household registration booklets) are never uploaded

How to Use Free Chinese OCR: Chinese (中文) PDF to Text

Step 1: Drag and drop your Chinese PDF or image (JPG/PNG/TIFF supported)
Step 2: Select Simplified (简体) or Traditional (繁體); the script is detected automatically where possible
Step 3: Add English as a secondary language for mixed bilingual documents
Step 4: Click "Extract" (提取); Hanzi are recognised page by page with live progress
Step 5: Copy the text, or download a .docx or a searchable PDF with a Chinese text layer
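For readers who want to reproduce Steps 2 and 3 with Tesseract directly, the choices map onto a language string. The sketch below is an illustration rather than our actual implementation: `chi_sim`, `chi_tra`, and `eng` are Tesseract's own language codes and `+` is how Tesseract combines them, but the function name and its parameters are hypothetical.

```python
# Tesseract language codes for the two Chinese scripts.
LANG_CODES = {"simplified": "chi_sim", "traditional": "chi_tra"}


def tesseract_lang(script: str, with_english: bool = False) -> str:
    """Build a Tesseract language string such as 'chi_sim+eng'."""
    if script not in LANG_CODES:
        raise ValueError(f"unknown script: {script!r}")
    codes = [LANG_CODES[script]]       # Step 2: primary script
    if with_english:
        codes.append("eng")            # Step 3: English as secondary language
    return "+".join(codes)             # Tesseract joins languages with '+'


if __name__ == "__main__":
    print(tesseract_lang("simplified", with_english=True))
```

Listing the Chinese model first keeps it as the primary language, which matches the "English as secondary" wording in Step 3.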

Who Uses This Tool

Lawyers (律师) digitising Chinese legal contracts and court judgments
International students extracting text from 教科书 (textbooks) for translation
Researchers processing 古籍 (classical Chinese texts) and historical archives
Business professionals converting scanned 合同 (contracts) to editable format
Taiwanese / Hong Kong users working with 繁體 government forms and legal filings

Why Choose PDF AI Tools

We've built PDF AI Tools to replace expensive desktop software like Adobe Acrobat for 95% of common document workflows — at zero cost to you. Unlike competitors who gate features behind paywalls, add watermarks, or limit file sizes, our tools are genuinely free and genuinely unlimited. Your privacy matters: files processed client-side in your browser never touch our servers, and even AI-powered features use encrypted, auto-deleting processing pipelines.

Frequently Asked Questions

What is the difference between Simplified (简体) and Traditional (繁體)? Which should I choose?

Simplified (简体) is used in Mainland China and Singapore. Traditional (繁體) is used in Taiwan, Hong Kong, and Macau. A 繁體 document processed with the 简体 model will produce garbled output, so pick the right one. If unsure, look for character pairs such as 国/國, 学/學, 语/語: if the document uses the simpler left-hand form of each pair, it is Simplified.
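The character-pair check can also be automated on a short typed sample or a first-pass OCR result. The following is a rough heuristic sketch, not the detection our tool uses: the pair list is a small hand-picked sample and `guess_script` is a hypothetical helper.

```python
# A small, hand-picked sample of Simplified -> Traditional pairs (illustrative only).
PAIRS = {"国": "國", "学": "學", "语": "語", "书": "書", "门": "門", "车": "車"}
SIMPLIFIED = set(PAIRS.keys())
TRADITIONAL = set(PAIRS.values())


def guess_script(text: str) -> str:
    """Guess the script by counting known Simplified vs Traditional forms."""
    simp = sum(ch in SIMPLIFIED for ch in text)
    trad = sum(ch in TRADITIONAL for ch in text)
    if simp == trad:
        return "unknown"  # no evidence, or evenly mixed
    return "simplified" if simp > trad else "traditional"


if __name__ == "__main__":
    print(guess_script("我在学校学习汉语"))
    print(guess_script("我在學校學習漢語"))
```

A real detector would need a much larger pair table, since many characters are identical in both scripts and give no signal either way.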

Can it read handwritten Chinese (手写体)?

No. Tesseract's Chinese models are trained on printed text only. Handwritten Hanzi, especially cursive 行书/草书, require dedicated handwriting recognition services (e.g., Baidu OCR or Google Cloud Vision), which we are evaluating for a future release.

Is vertical classical Chinese (竖排古文) supported?

Yes. Vertical layout is auto-detected, and scans of classical texts such as 史记 (Records of the Grand Historian), 论语 (Analects), and the 四书五经 (Four Books and Five Classics) process correctly. Accuracy on pre-1950 typefaces and stone-rubbing scans may be lower because of glyph variation.
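If you want to try vertical text with a local Tesseract install, one plausible setup is the vertical Chinese models together with page segmentation mode 5 (a single uniform block of vertically aligned text). This is a sketch under that assumption and does not describe how our in-browser detection works; `vertical_ocr_command` is a hypothetical helper and the file names are placeholders.

```python
from typing import List


def vertical_ocr_command(image_path: str, output_base: str,
                         traditional: bool = True) -> List[str]:
    """Build a Tesseract CLI command for a vertically typeset page."""
    # chi_tra_vert / chi_sim_vert are Tesseract's vertical-text Chinese models.
    lang = "chi_tra_vert" if traditional else "chi_sim_vert"
    # --psm 5: assume a single uniform block of vertically aligned text.
    return ["tesseract", image_path, output_base, "-l", lang, "--psm", "5"]


if __name__ == "__main__":
    # Placeholder file names; pass the list to subprocess.run() to execute it.
    print(" ".join(vertical_ocr_command("page.png", "page_out")))
```

Building the command as a list rather than a single string avoids shell-quoting problems with file names that contain spaces or Chinese characters.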

How accurate is it on Chinese newspapers?

On clean printed Simplified Chinese newspapers (人民日报, 南方周末), accuracy is 93-96%. Traditional Chinese newspapers (聯合報, 蘋果日報) reach 91-95%. Very small type (< 8pt) or low-contrast scans drop accuracy to 80-85%.
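To check accuracy on your own scans, one common approach is character-level accuracy: one minus the edit distance between a hand-corrected reference and the OCR output, divided by the reference length. The sketch below assumes that metric; the figures above are not necessarily computed this way, and the function names are illustrative.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # reference character missing from output
                curr[j - 1] + 1,     # extra character in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def char_accuracy(ref: str, hyp: str) -> float:
    """Character accuracy in [0, 1], relative to the reference length."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


if __name__ == "__main__":
    # 日 misread as the look-alike 曰: one error in four characters.
    print(char_accuracy("人民日报", "人民曰报"))
```

Counting by character rather than by word suits Chinese, which is written without spaces between words.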

Will my ID card or contract be uploaded?

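Absolutely not. All OCR processing happens locally in your browser, and no files are uploaded to our servers. This is especially important for sensitive Chinese documents such as 身份证 (ID cards), 户口本 (household registration booklets), 银行对账单 (bank statements), and 劳动合同 (employment contracts), where privacy is non-negotiable.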