

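Hi everyone, this is 程序员晚枫, and I'm currently all in on hands-on AI programming.
Lesson 18: AI Image Processing, Teaching AI to Understand Images
What can AI do with images?
- Recognize image content (objects, text, faces)
- Generate image descriptions
- Extract text with OCR
- Convert image styles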
1. Install the image processing libraries

    pip install openai python-office Pillow

2. AI image understanding

    from openai import OpenAI

    # Replace the key with your own. The endpoint and model you point at
    # must accept image input for this request to succeed.
    client = OpenAI(api_key="你的Key", base_url="https://api.deepseek.com")

    response = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "user", "content": [
                # Prompt: "Please describe the content of this image"
                {"type": "text", "text": "请描述这张图片的内容"},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ]}
        ],
    )
    print(response.choices[0].message.content)

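The request above points at a public image URL. For a photo on your own disk, OpenAI-compatible vision endpoints commonly accept a base64 data URL in the same `url` field. Here is a minimal sketch of that encoding step. The helper names `image_to_data_url` and `image_part` are made up for this illustration, and whether a given provider accepts data URLs is an assumption you should check against its documentation.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL."""
    # Guess the MIME type from the file extension, e.g. image/png.
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognised image type: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def image_part(path: str) -> dict:
    """Build the image entry for a chat message's content list (hypothetical helper)."""
    return {"type": "image_url", "image_url": {"url": image_to_data_url(path)}}
```

To use it, replace the `{"type": "image_url", ...}` entry in the message with `image_part("photo.jpg")`.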
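3. OCR text recognition

    import office

    # Run OCR on a local image ("发票.jpg" is an invoice photo)
    text = office.ocr.read_image("发票.jpg")
    print(f"识别结果:{text}")

    # Ask the model to proofread the OCR output.
    # This reuses the `client` created in section 2.
    response = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            # System prompt: "You are a proofreading expert"
            {"role": "system", "content": "你是文字校对专家"},
            # User prompt: "Proofread the following OCR result and fix the typos"
            {"role": "user", "content": f"请校对以下OCR识别结果,修正错别字:\n{text}"},
        ],
    )
    corrected = response.choices[0].message.content
    print(f"校正后:{corrected}")
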
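A proofreading model can also rewrite more than it should, so it helps to see exactly what changed between the raw OCR text and the corrected text. The sketch below uses the standard library's `difflib` for that. The function names `list_corrections` and `change_ratio` are hypothetical, and this is one possible check, not part of python-office or the OpenAI client.

```python
import difflib


def list_corrections(ocr_text: str, corrected: str) -> list[tuple[str, str]]:
    """Return (before, after) pairs for every span the correction changed."""
    matcher = difflib.SequenceMatcher(a=ocr_text, b=corrected, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "equal" spans are untouched; everything else is an edit.
        if tag != "equal":
            changes.append((ocr_text[i1:i2], corrected[j1:j2]))
    return changes


def change_ratio(ocr_text: str, corrected: str) -> float:
    """Fraction of the text that differs, from 0.0 (identical) to 1.0."""
    matcher = difflib.SequenceMatcher(a=ocr_text, b=corrected, autojunk=False)
    return 1.0 - matcher.ratio()
```

If `change_ratio(text, corrected)` comes back large, the model probably did more than fix typos, and the original OCR text is the safer one to keep.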
4. Batch image renaming

    import os

    import office
    from openai import OpenAI

    client = OpenAI(api_key="你的Key", base_url="https://api.deepseek.com")

    img_dir = "照片文件夹"
    for filename in os.listdir(img_dir):
        if filename.endswith((".jpg", ".png")):
            filepath = os.path.join(img_dir, filename)
            text = office.ocr.read_image(filepath)
            response = client.chat.completions.create(
                model="deepseek-chat",
                messages=[
                    # Prompt: "Based on the text below, give this image a short
                    # file name (Chinese, at most 10 characters)"
                    {"role": "user", "content": f"根据以下内容,给这张图片取一个简短的文件名(中文,不超过10个字):\n{text[:200]}"}
                ],
            )
            new_name = response.choices[0].message.content.strip()
            # Keep the original extension. This only previews the new name;
            # no file is renamed.
            ext = os.path.splitext(filename)[1]
            print(f"{filename} -> {new_name}{ext}")

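The loop above only prints the proposed names. Before actually renaming files, the model's reply should be cleaned up, since it may contain quotes, slashes, an extension, or extra lines, and two images may get the same name. A minimal sketch of that cleanup follows. `safe_stem` and `unique_target` are hypothetical helpers written for this example.

```python
import re
from pathlib import Path


def safe_stem(reply: str, max_len: int = 10) -> str:
    """Turn a model reply into a short name that is safe to use as a file stem."""
    stripped = reply.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    # Drop any extension the model added, then remove whitespace and
    # characters that are not allowed in file names on common systems.
    stem = re.sub(r"\.(jpg|jpeg|png)$", "", first_line, flags=re.IGNORECASE)
    stem = re.sub(r'[\\/:*?"<>|\s]+', "", stem)
    return stem[:max_len] or "unnamed"


def unique_target(src: Path, stem: str) -> Path:
    """Pick a path next to src that keeps its extension and does not exist yet."""
    target = src.with_name(stem + src.suffix)
    counter = 1
    while target.exists() and target != src:
        target = src.with_name(f"{stem}_{counter}{src.suffix}")
        counter += 1
    return target
```

With these in place, the rename itself would be `src.rename(unique_target(src, safe_stem(new_name)))`, where `src = Path(filepath)`. Running the preview first and renaming only after checking the output is the safer order.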
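5. Batch image format conversion

    import office

    # Convert the images in ./images to PDF, written to ./pdfs
    office.image.img2pdf("./images", output_dir="./pdfs")
    # Add the text watermark "晚枫出品" to a photo and save it under a new name
    office.image.add_watermark("照片.jpg", "晚枫出品", "水印版.jpg")
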
6. Hands-on project: invoice information extractor

    import json

    import office
    from openai import OpenAI

    client = OpenAI(api_key="你的Key", base_url="https://api.deepseek.com")

    def extract_invoice(image_path):
        """Extract key fields from an invoice image."""
        text = office.ocr.read_image(image_path)
        response = client.chat.completions.create(
            model="deepseek-chat",
            messages=[{
                "role": "user",
                # Prompt: "Extract the fields from the invoice text below
                # and return them as JSON"
                "content": f"""从以下发票文本中提取信息,以JSON格式返回:
    {{
        "invoice_number": "发票号码",
        "date": "开票日期",
        "amount": "金额",
        "seller": "销售方",
        "buyer": "购买方"
    }}
    文本:{text}"""
            }],
            temperature=0,  # keep the output as deterministic as possible
        )
        # Assumes the reply is pure JSON with no surrounding text
        return json.loads(response.choices[0].message.content)

    result = extract_invoice("发票.jpg")
    print(f"发票号:{result['invoice_number']}")
    print(f"金额:{result['amount']}")

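The extractor above hands the model's reply straight to `json.loads`, which fails if the model wraps the JSON in a Markdown code fence or adds a sentence before or after it. A more tolerant parser might look like the sketch below. `parse_json_reply`, `missing_fields`, and `REQUIRED_KEYS` are names invented for this example; the key list simply mirrors the fields requested in the prompt.

```python
import json
import re

# Fields requested in the prompt of extract_invoice.
REQUIRED_KEYS = ("invoice_number", "date", "amount", "seller", "buyer")


def parse_json_reply(reply: str) -> dict:
    """Extract the first JSON object from a model reply."""
    # Remove Markdown code fence markers (three backticks, optionally "json").
    cleaned = re.sub(r"`{3}(?:json)?", "", reply, flags=re.IGNORECASE).strip()
    start = cleaned.find("{")
    if start == -1:
        raise ValueError("no JSON object found in reply")
    # raw_decode parses one JSON value and ignores whatever follows it.
    obj, _ = json.JSONDecoder().raw_decode(cleaned[start:])
    return obj


def missing_fields(invoice: dict) -> list[str]:
    """List the requested fields that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not invoice.get(key)]
```

Inside `extract_invoice`, the last line would then become `return parse_json_reply(response.choices[0].message.content)`, and `missing_fields(result)` tells you which values still need a manual look.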
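Next lesson preview
Now that you have learned AI image processing, the next lesson covers AI speech processing: getting AI to understand what you say.
Stay tuned!
程序员晚枫 focuses on AI programming training. After finishing 《30讲 · AI编程训练营》, the course he created together with 图灵社区, beginners can start building AI projects.
The first 3 lessons are free to preview: https://www.bilibili.com/cheese/play/ss982042944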