
Hi everyone, this is 程序员晚枫, currently going all in on hands-on AI programming.

Lecture 18: AI Image Processing: Teaching AI to Understand Images

What can AI do with images?

  • Recognize image content (objects, text, faces)
  • Generate image descriptions
  • OCR text recognition
  • Image style transfer

1. Install the image processing libraries

pip install openai python-office Pillow
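
Before moving on, it can help to confirm that the install worked. The following is a minimal sketch, not part of the original lecture: `missing_modules` and `REQUIRED_MODULES` are hypothetical names, and the import names (`openai`, `office`, `PIL`) are assumed to correspond to the three pip packages above.

```python
from importlib.util import find_spec

# Import names assumed for the three pip packages above:
# openai -> "openai", python-office -> "office", Pillow -> "PIL".
REQUIRED_MODULES = ["openai", "office", "PIL"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If anything is reported missing, rerunning the pip command in the same Python environment is usually the first thing to try.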

2. AI image understanding

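import base64
from openai import OpenAI

# Replace YOUR_KEY with your own API key
client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

# Method 1: recognize an image by URL
# This assumes the model behind the endpoint accepts image input;
# if it does not, switch to a vision-capable model.
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "user", "content": [
            {"type": "text", "text": "Please describe the content of this image"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
        ]}
    ]
)
print(response.choices[0].message.content)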
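
The snippet above is labeled "Method 1" and imports `base64`, but no second method is shown. A common alternative for local files is to embed the image as a base64 data URL in the same `image_url` field. The sketch below only builds the message and makes no API call; `image_to_data_url` and `build_image_message` are hypothetical helper names, and whether a given endpoint accepts data URLs is an assumption to verify against its documentation.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path):
    """Encode a local image file as a base64 data URL."""
    mime, _ = mimetypes.guess_type(str(path))
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Not a recognized image type: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_image_message(path, prompt):
    """Build a user message in the same shape as the URL example (assumed format)."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_to_data_url(path)}},
        ],
    }
```

The returned dict can be passed as an item of `messages` in `client.chat.completions.create(...)`, exactly where the URL-based message sits in Method 1.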

3. OCR text recognition

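import office
from openai import OpenAI

# Same client setup as in the image understanding section
client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

# Run OCR with python-office
text = office.ocr.read_image("invoice.jpg")
print(f"OCR result: {text}")

# Let the AI interpret and correct the OCR output
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are an expert proofreader"},
        {"role": "user", "content": f"Please proofread the following OCR result and fix any wrong characters:\n{text}"}
    ]
)
corrected = response.choices[0].message.content
print(f"Corrected: {corrected}")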
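
Since the model may rewrite more than intended, it is worth checking what actually changed between the raw OCR text and the corrected version. Here is a small sketch using the standard library's `difflib`; `changed_spans` is a hypothetical helper, not something the lecture defines.

```python
import difflib


def changed_spans(original, corrected):
    """Return (old, new) pairs for every span that differs between the two texts."""
    matcher = difflib.SequenceMatcher(None, original, corrected)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((original[i1:i2], corrected[j1:j2]))
    return changes


if __name__ == "__main__":
    # A typical OCR confusion: the digit 1 read in place of the letter l
    print(changed_spans("Tota1 amount", "Total amount"))  # [('1', 'l')]
```

A long list of changes is a hint that the model paraphrased instead of proofreading, in which case the correction probably should not be trusted as-is.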

4. Batch image renaming

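import os

import office
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

img_dir = "photos"
for filename in os.listdir(img_dir):
    # Compare in lower case so .JPG and .PNG files are included too
    if filename.lower().endswith((".jpg", ".png")):
        filepath = os.path.join(img_dir, filename)
        # Run OCR on the image
        text = office.ocr.read_image(filepath)
        # Ask the AI to suggest a file name
        response = client.chat.completions.create(
            model="deepseek-chat",
            messages=[
                {"role": "user", "content": f"Based on the following content, suggest a short file name for this image (in Chinese, at most 10 characters):\n{text[:200]}"}
            ]
        )
        new_name = response.choices[0].message.content.strip()
        # Keep the original extension instead of always using .jpg
        ext = os.path.splitext(filename)[1]
        # This only previews the mapping; nothing is renamed on disk
        print(f"{filename} -> {new_name}{ext}")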
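
A model-suggested name can contain characters that are not valid in file names, or collide with a name already used in the same folder. Before turning the preview into a real rename, something like the following sketch could clean up each suggestion. `safe_filename` and its parameters are hypothetical, and the 10-character limit simply mirrors the prompt.

```python
import re

# Characters not allowed in Windows file names, plus ASCII control characters
_ILLEGAL = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_filename(suggestion, ext, taken, max_len=10):
    """Turn a model suggestion into a usable file name that is unique within `taken`."""
    stem = _ILLEGAL.sub("", suggestion).strip().strip(".")
    stem = stem[:max_len].strip() or "untitled"
    candidate = f"{stem}{ext}"
    counter = 1
    # Append _1, _2, ... until the name is not already in use
    while candidate in taken:
        candidate = f"{stem}_{counter}{ext}"
        counter += 1
    taken.add(candidate)
    return candidate
```

Inside the loop, `taken` would be a set shared across all files, and the result could then be passed to `os.rename(filepath, os.path.join(img_dir, result))` once the preview looks right.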

5. Batch image format conversion

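import office

# Batch-convert images (here, the images in ./images to PDF in ./pdfs)
office.image.img2pdf("./images", output_dir="./pdfs")
# Add a text watermark to a photo and save it under a new name
office.image.add_watermark("photo.jpg", "晚枫出品", "watermarked.jpg")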
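
Pillow was installed in step 1 but is not used in the snippets so far. For converting between image formats (as opposed to producing PDFs), a sketch with Pillow might look like this. `plan_conversions` and `convert_all` are hypothetical names, and PNG to JPEG is just one example pair of formats.

```python
from pathlib import Path


def plan_conversions(src_dir, dst_dir, target_ext=".jpg"):
    """Map each .png file in src_dir to its output path in dst_dir."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    return [
        (src, dst_dir / (src.stem + target_ext))
        for src in sorted(src_dir.glob("*.png"))
    ]


def convert_all(src_dir, dst_dir):
    """Convert every PNG in src_dir to a JPEG in dst_dir using Pillow."""
    from PIL import Image  # imported here so the planning step works without Pillow

    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    for src, dst in plan_conversions(src_dir, dst_dir):
        with Image.open(src) as img:
            # JPEG has no alpha channel, so convert to RGB before saving
            img.convert("RGB").save(dst, "JPEG")
```

Calling `convert_all("./images", "./jpgs")` would write one `.jpg` per `.png`; printing the result of `plan_conversions` first gives a dry run of which files would be touched.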

6. Hands-on: invoice information extractor

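import json

import office
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

def extract_invoice(image_path):
    """Extract key fields from an invoice image."""
    # Step 1: OCR the image into plain text
    text = office.ocr.read_image(image_path)

    # Step 2: ask the model to return the fields as JSON
    response = client.chat.completions.create(
        model="deepseek-chat",
        messages=[{
            "role": "user",
            "content": f"""Extract the information from the invoice text below and return it as JSON:
{{
    "invoice_number": "invoice number",
    "date": "issue date",
    "amount": "amount",
    "seller": "seller",
    "buyer": "buyer"
}}
Text: {text}"""
        }],
        temperature=0  # keep the output as deterministic as possible
    )
    # Step 3: parse the reply (this fails if the reply is not pure JSON)
    return json.loads(response.choices[0].message.content)

result = extract_invoice("invoice.jpg")
print(f"Invoice number: {result['invoice_number']}")
print(f"Amount: {result['amount']}")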
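
Calling `json.loads` directly on a model reply can fail when the model wraps the JSON in a Markdown code fence or adds a sentence around it. A more forgiving parsing step might look like the sketch below. `parse_invoice_json` and `REQUIRED_KEYS` are hypothetical names, and the key list is taken from the prompt in the extractor.

```python
import json
import re

# Field names copied from the prompt in the invoice extractor
REQUIRED_KEYS = {"invoice_number", "date", "amount", "seller", "buyer"}


def parse_invoice_json(reply):
    """Parse a model reply into a dict, tolerating text or a code fence around the JSON."""
    # Grab everything from the first "{" to the last "}" in the reply
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        raise ValueError("No JSON object found in the reply")
    data = json.loads(match.group(0))
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"Missing fields: {sorted(missing)}")
    return data
```

In `extract_invoice`, the final `json.loads(...)` call could be swapped for `parse_invoice_json(response.choices[0].message.content)`, so that a malformed reply raises a clear `ValueError` instead of a `KeyError` later on.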

Next lecture preview

Now that you have learned AI image processing, the next lecture covers AI speech processing: teaching AI to understand what you say.

Stay tuned!


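程序员晚枫 focuses on AI programming training. After finishing 《30讲 · AI编程训练营》 (30 Lectures: AI Programming Bootcamp), the course he produced together with 图灵社区 (Turing Community), beginners can start building AI projects.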

The first 3 lectures are free to preview: https://www.bilibili.com/cheese/play/ss982042944