👉 Project Official Website: https://www.python-office.com/ 👈

👉 Communication Group for This Open Source Project 👈


Hello everyone, this is Programmer Wanfeng. I am currently all in on hands-on AI programming.

1. DeepSeek-OCR Released

On October 20, the DeepSeek AI team officially released DeepSeek-OCR, a new multimodal model.

With only 3 billion parameters, the model compresses text information efficiently through contextual optical compression. It reportedly maintains about 97% recognition accuracy at a compression ratio of roughly 10x, which cuts computing costs substantially. A single A100-40G graphics card can process more than 200,000 pages of documents per day, well beyond what traditional OCR tools typically manage.
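As a rough sanity check on the throughput figure, the arithmetic can be worked out directly. This is only a back-of-the-envelope sketch: it assumes the card runs continuously for 24 hours and takes the 200,000 pages per day figure quoted above at face value.

```python
# Back-of-the-envelope check of the quoted throughput.
# ASSUMPTION: the GPU runs continuously for a full 24-hour day.
PAGES_PER_DAY = 200_000          # figure quoted in the text
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds

pages_per_second = PAGES_PER_DAY / SECONDS_PER_DAY
seconds_per_page = SECONDS_PER_DAY / PAGES_PER_DAY

print(f"{pages_per_second:.2f} pages/s, {seconds_per_page:.3f} s/page")
```

Under these assumptions the figure works out to a little over two pages per second on a single card.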

The model comes in five size configurations: Tiny, Small, Base, Large, and Gundam. The Gundam version is optimized for ultra-high-resolution documents: it supports mixed 1024×640 processing and is aimed at professional scenarios such as multi-column layouts and mixed text and graphics.

All output natively supports Markdown. With built-in bounding box detection, the model can locate text blocks, tables, and illustrations in the original image, addressing a long-standing pain point of traditional OCR: it recognizes text but does not understand layout.
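To make the bounding box idea concrete, here is a minimal sketch of pulling region labels and boxes out of grounded output. The tag format (`<|ref|>…<|/ref|><|det|>…<|/det|>`) and the 0 to 999 coordinate normalization are assumptions on my part, and `parse_regions` is a hypothetical helper, not part of the model's API. Check both against real output before relying on them.

```python
import ast
import re

# ASSUMPTION: grounded output wraps each region as
#   <|ref|>label<|/ref|><|det|>[[x1, y1, x2, y2]]<|/det|>
# with coordinates normalized to the range 0..999. Verify against real output.
REGION_PATTERN = re.compile(
    r"<\|ref\|>(.*?)<\|/ref\|><\|det\|>(.*?)<\|/det\|>", re.DOTALL
)


def parse_regions(raw_output, image_width, image_height, scale=999):
    """Return a list of (label, [pixel boxes]) found in the raw output text."""
    regions = []
    for label, boxes_text in REGION_PATTERN.findall(raw_output):
        boxes = ast.literal_eval(boxes_text)  # e.g. [[x1, y1, x2, y2]]
        pixel_boxes = [
            (
                round(x1 / scale * image_width),
                round(y1 / scale * image_height),
                round(x2 / scale * image_width),
                round(y2 / scale * image_height),
            )
            for x1, y1, x2, y2 in boxes
        ]
        regions.append((label, pixel_boxes))
    return regions


# Hand-written sample string, not real model output.
sample = "<|ref|>title<|/ref|><|det|>[[0, 0, 999, 100]]<|/det|>\n# Invoice\n"
print(parse_regions(sample, image_width=1998, image_height=999))
```

Converting to pixel coordinates makes it straightforward to crop or highlight each region in the original image.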

The model is fully open source on GitHub and HuggingFace under the MIT license, which permits free commercial use. Developers can load it directly through the transformers library. The official repository also provides auxiliary tools such as PDF-to-image conversion and batch processing scripts, so even non-specialists can deploy it quickly.

Code Example

from transformers import AutoModel, AutoTokenizer
import torch
import os
os.environ["CUDA_VISIBLE_DEVICES"] = '0'
model_name = 'deepseek-ai/DeepSeek-OCR'

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, _attn_implementation='flash_attention_2', trust_remote_code=True, use_safetensors=True)
model = model.eval().cuda().to(torch.bfloat16)

# prompt = "<image>\nFree OCR. "
prompt = "<image>\n<|grounding|>Convert the document to markdown. "
image_file = 'your_image.jpg'
output_path = 'your/output/dir'

res = model.infer(
    tokenizer,
    prompt=prompt,
    image_file=image_file,
    output_path=output_path,
    base_size=1024,
    image_size=640,
    crop_mode=True,
    save_results=True,
    test_compress=True,
)
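The call above uses base_size=1024, image_size=640, and crop_mode=True, which corresponds to the Gundam configuration. As a convenience, the five sizes can be kept in a small lookup table. The values below are as I recall them from the project README, so treat them as assumptions and verify them there. `SIZE_PRESETS` and `preset_kwargs` are hypothetical names introduced only for this sketch.

```python
# ASSUMPTION: preset values recalled from the project README; verify before use.
SIZE_PRESETS = {
    "tiny":   {"base_size": 512,  "image_size": 512,  "crop_mode": False},
    "small":  {"base_size": 640,  "image_size": 640,  "crop_mode": False},
    "base":   {"base_size": 1024, "image_size": 1024, "crop_mode": False},
    "large":  {"base_size": 1280, "image_size": 1280, "crop_mode": False},
    "gundam": {"base_size": 1024, "image_size": 640,  "crop_mode": True},
}


def preset_kwargs(name):
    """Look up a preset by name (case-insensitive) and return a copy of its kwargs."""
    try:
        return dict(SIZE_PRESETS[name.lower()])
    except KeyError:
        choices = ", ".join(sorted(SIZE_PRESETS))
        raise ValueError(f"unknown preset {name!r}; choose from: {choices}") from None


print(preset_kwargs("Gundam"))

# Hypothetical usage with the objects defined in the example above:
# res = model.infer(tokenizer, prompt=prompt, image_file=image_file,
#                   output_path=output_path, save_results=True,
#                   **preset_kwargs("base"))
```

Smaller presets trade accuracy on dense pages for speed, so it is worth trying more than one on a few representative documents.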

2. poocr for Complete Beginners: Batch Invoice Recognition in 3 Lines of Code

Faced with professional models like DeepSeek-OCR, non-technical users may be intimidated.

But with poocr, a tool that wraps the Tencent Cloud OCR API, anyone can run batch invoice recognition with only 3 lines of Python code. The free quota of 1,000 calls per month is enough for personal office use.

Preparation: Complete Environment Configuration in 3 Minutes

First, install the poocr library from the Alibaba Cloud mirror:

pip install -i https://mirrors.aliyun.com/pypi/simple/ poocr -U

Register a Tencent Cloud account, activate the OCR service, and create a key on the API key management page to obtain a SecretId and SecretKey. Keep the key safe; a leaked key is a security risk.
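One common way to keep the key out of source files is to read it from environment variables. The sketch below is only an illustration: the variable names TENCENT_SECRET_ID and TENCENT_SECRET_KEY and the helper `load_tencent_credentials` are hypothetical and are not defined by poocr or Tencent Cloud.

```python
import os


def load_tencent_credentials(id_var="TENCENT_SECRET_ID", key_var="TENCENT_SECRET_KEY"):
    """Read SecretId/SecretKey from environment variables.

    The variable names are hypothetical; pick whatever suits your setup.
    """
    secret_id = os.environ.get(id_var)
    secret_key = os.environ.get(key_var)
    missing = [
        name
        for name, value in ((id_var, secret_id), (key_var, secret_key))
        if not value
    ]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return secret_id, secret_key


# Usage, once both variables are set in your shell:
# r_id, r_key = load_tencent_credentials()
```

This way the script can be shared or committed without exposing the key.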

Core Code: Batch Recognition in One Command

Create a Python file and enter the following code:

import poocr

# Replace with your Tencent Cloud key
r_id = 'Your SecretId'
r_key = 'Your SecretKey'

# Batch recognize invoices in the specified folder and export to Excel
poocr.ocr2excel.VatInvoiceOCR2Excel(
    input_path=r'C:\Invoice Image Folder',  # Directory storing invoice images
    output_path=r'C:\Recognition Results',  # Save path for exported Excel
    id=r_id,
    key=r_key
)

The input_path supports multiple formats, including JPG, PNG, and PDF, and the program automatically traverses all files in the directory. When recognition finishes, an Excel sheet containing 18 key fields (invoice code, number, date, amount, tax amount, and so on) is generated at the location given by output_path, with a stated accuracy of over 99%.
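Before spending any of the free quota, it can help to preview which files in the folder would be picked up. The sketch below is a standalone check and not how poocr itself traverses the directory. The extension list is an assumption based on the formats named above (with .jpeg added), and `list_invoice_files` is a hypothetical helper.

```python
from pathlib import Path

# ASSUMPTION: extensions inferred from the formats named in the text, plus .jpeg.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf"}


def list_invoice_files(folder):
    """Return supported files directly inside `folder`, sorted by path."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )


# Usage:
# for path in list_invoice_files(r'C:\Invoice Image Folder'):
#     print(path.name)
```

The file count gives a quick estimate of how many API calls a run will consume, although multi-page PDFs may count for more than one.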

For electronic invoices in PDF format, the program automatically splits and processes them page by page. Duplicate invoices are deduplicated by invoice number to avoid redundant data. The resulting Excel file includes a confidence score, so users can quickly check low-confidence fields.
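The deduplication and confidence check described above can be illustrated with plain Python. This is a sketch of the idea only, not poocr's implementation: the field names invoice_number and confidence, the 0.9 threshold, and the sample rows are all made up for the example.

```python
def deduplicate_by_number(rows, number_field="invoice_number"):
    """Keep the first row seen for each invoice number, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        number = row[number_field]
        if number not in seen:
            seen.add(number)
            unique.append(row)
    return unique


def low_confidence(rows, threshold=0.9, confidence_field="confidence"):
    """Return the rows whose confidence falls below the threshold."""
    return [row for row in rows if row[confidence_field] < threshold]


# Made-up rows standing in for recognition results (field names are hypothetical).
rows = [
    {"invoice_number": "001", "amount": "100.00", "confidence": 0.99},
    {"invoice_number": "002", "amount": "58.50", "confidence": 0.72},
    {"invoice_number": "001", "amount": "100.00", "confidence": 0.99},
]

unique_rows = deduplicate_by_number(rows)
to_review = low_confidence(unique_rows)
print(len(unique_rows), [row["invoice_number"] for row in to_review])
```

Rows flagged this way are the ones worth checking by hand against the original invoice image.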

This open source tool, maintained by Programmer Wanfeng, integrates more than 100 Tencent Cloud OCR recognition scenarios. Besides value-added tax invoices, it supports structured extraction from documents such as train tickets, ID cards, and business licenses. With the desktop version, users with no coding background can also run batch processing by drag and drop, making the technology accessible to everyone.

From the technical breakthrough of DeepSeek-OCR to the usability improvements of poocr, OCR technology is shifting from a specialist tool to an everyday application.

Whether for document archiving in corporate finance or expense reimbursement for individuals, these open source tools lower the barrier to digitalization and bring AI productivity tools into daily life.





Programmer Wanfeng focuses on AI programming training. Beginners can start building AI projects after watching "30 Lectures · AI Programming Training Camp", the tutorial he produced with Turing Community.

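🎓 AI Programming Practice Course

Want to learn AI programming systematically? Programmer Wanfeng's AI Programming Practice Course helps you get started from scratch!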