I. The one-line trick

📍 1.1 Install the library

pobd is a Python library that uses Baidu's API to recognize various certificates and documents and write the results to an Excel file.

pip install pobd

📍 1.2 One line of code

import pobd
pobd.ocr2excel.divorce_certificate(img_path=input_file, output_excel_path=output_file, api_key=api_key, secret_key=secret_key)

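Those two steps are all it takes: install the library, then make one call. As for our old friends api_key and secret_key, if you don't know how to apply for them, see you in the comments!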

II. The curious among you will ask: how?

1. Call the API

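import requests

# Excerpt from inside a method of the library (hence `self` and `return`).
# Baidu's OCR endpoint takes the image as a base64 string in a form-encoded body.
base64_image = self.image_to_base64(img_path)
request_url = f"https://aip.baidubce.com/rest/2.0/ocr/v1/divorce_certificate?access_token={self.access_token}"
headers = {'Content-Type': 'application/x-www-form-urlencoded'}
data = {
    "image": base64_image
}
# POST the image and return the parsed JSON response.
response = requests.post(request_url, headers=headers, data=data)
return response.json()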
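
The excerpt calls self.image_to_base64 without showing it. Below is a minimal sketch of what such a helper might look like, assuming it simply reads the file and base64-encodes the bytes. The standalone function form and its body are assumptions for illustration; pobd's actual implementation may differ.

```python
import base64
from pathlib import Path


def image_to_base64(img_path):
    """Read an image file and return its contents as a base64 string.

    Hypothetical stand-in for the helper referenced in the excerpt.
    """
    raw_bytes = Path(img_path).read_bytes()
    # b64encode returns bytes; decode to str so it can go into the form body.
    return base64.b64encode(raw_bytes).decode("ascii")
```

Because the string is passed to requests.post through the data dict, requests takes care of URL-encoding it for the form body.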

This returns data like the following:

{
  "words_result": {
    "姓名_男": [{"word": "王帆"}],
    "姓名_女": [{"word": "杨丹"}],
    "登记日期": [{"word": "2021年10月25日"}],
    ...
  }
}

2. Clean the data

{
    "姓名_男": res['words_result'].get("姓名_男", [{}])[0].get("word", ""),
    "姓名_女": res['words_result'].get("姓名_女", [{}])[0].get("word", ""),
    "登记日期": res['words_result'].get("登记日期", [{}])[0].get("word", ""),
    ...
}
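
The same pattern can be wrapped in a small function so that each field does not have to be spelled out by hand. This is a sketch only: clean_result and its fields parameter are hypothetical names, not part of pobd, and the sample response reuses the values shown earlier.

```python
def clean_result(res, fields):
    """Flatten a Baidu OCR response into {field name: recognized text}.

    Hypothetical helper; missing or empty fields become an empty string.
    """
    words = res.get("words_result", {})
    cleaned = {}
    for name in fields:
        # `or [{}]` covers both a missing key and a present-but-empty list.
        entries = words.get(name) or [{}]
        cleaned[name] = entries[0].get("word", "")
    return cleaned


sample = {
    "words_result": {
        "姓名_男": [{"word": "王帆"}],
        "登记日期": [{"word": "2021年10月25日"}],
    }
}
row = clean_result(sample, ["姓名_男", "姓名_女", "登记日期"])
print(row)
```

One design difference from the snippet above: .get(key, [{}])[0] raises IndexError if the API returns an empty list for a field, whereas the `or [{}]` fallback treats that case like a missing field.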

Extract the fields into a structured record → convert it to a DataFrame
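
As a rough illustration of the conversion, assuming pandas is installed and each certificate has already been cleaned into one flat dict, a list of such dicts maps directly onto DataFrame rows. The variable names here are made up for the example, and the values are the sample ones shown earlier.

```python
import pandas as pd

# One cleaned record per recognized certificate (sample values from above).
rows = [
    {"姓名_男": "王帆", "姓名_女": "杨丹", "登记日期": "2021年10月25日"},
]

# Dict keys become column names; each dict becomes one row.
df = pd.DataFrame(rows)
print(df)
```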

3. Build the spreadsheet