LayoutParser: Detection-Related APIs

2022-03-01 10:32:45 Author: Internet

LayoutParser: a toolkit for document layout analysis

Author: elfin. Reference source: GitHub

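1. Layout object detection
2. Interval layout queries
3. OCR text recognition in LP
4. Drawing bounding boxes
5. Model selection
6. Submitting your own model to the LP project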

LayoutParser is a layout analysis toolkit that provides APIs for layout detection, OCR, and layout analysis. The project is available at https://github.com/Layout-Parser/layout-parser

1. Layout object detection

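import warnings

import layoutparser as lp
from PIL import Image

warnings.filterwarnings("ignore")  # silence library warnings in the output

# Config string for the EfficientDet-D0 model trained on PubLayNet
config = "lp://PubLayNet/tf_efficientdet_d0/config"
model = lp.EfficientDetLayoutModel(
    config_path=config,
    model_path="./publaynet-tf_efficientdet_d0.pth.tar",  # local weights file
)
image = Image.open("images_1.png")
layout = model.detect(image)  # run detection on the page image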

layout holds a list of TextBlock objects. Each TextBlock exposes area, block (the underlying bounding box), coordinates, height, width, id, next, points (the corner coordinates), score (the confidence), text, and type.
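
To make the geometric attributes concrete, here is a minimal pure-Python sketch of how they relate to the four box coordinates. `SketchBlock` is a hypothetical stand-in written for this illustration, not LayoutParser's TextBlock, and the corner order used for `points` (clockwise from the top-left) is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SketchBlock:
    """Hypothetical stand-in for a detected block (not the LayoutParser class)."""

    x_1: float  # left edge
    y_1: float  # top edge
    x_2: float  # right edge
    y_2: float  # bottom edge
    type: Optional[str] = None
    score: Optional[float] = None
    id: Optional[int] = None

    @property
    def coordinates(self) -> Tuple[float, float, float, float]:
        return (self.x_1, self.y_1, self.x_2, self.y_2)

    @property
    def width(self) -> float:
        return self.x_2 - self.x_1

    @property
    def height(self) -> float:
        return self.y_2 - self.y_1

    @property
    def area(self) -> float:
        return self.width * self.height

    @property
    def points(self) -> List[Tuple[float, float]]:
        # Assumed order: top-left, top-right, bottom-right, bottom-left
        return [
            (self.x_1, self.y_1),
            (self.x_2, self.y_1),
            (self.x_2, self.y_2),
            (self.x_1, self.y_2),
        ]


block = SketchBlock(10, 20, 110, 70, type="Title", score=0.98, id=0)
print(block.coordinates, block.width, block.height, block.area)
print(block.points)
```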

2. Interval layout queries

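This class describes an interval along one axis; along the other axis it spans the underlying canvas. Its parameters are:

start: numeric, the start coordinate on the chosen axis
end: numeric, the end coordinate on the same axis
axis: the axis along which the interval is defined; must be "x" or "y"
canvas_height: the height of the canvas
canvas_width: the width of the canvas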

Usage example

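image_width, image_height = image.size  # a PIL image reports (width, height)
left_column = lp.Interval(0, image_width / 2, axis="x")
left = layout.filter_by(left_column, center=True)  # keep blocks centered in the left half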

These two calls return all TextBlock objects located on the left side of the page.
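
The idea behind the center-based filter can be sketched in plain Python: a block is kept when its center point falls inside the interval on the chosen axis. The helper `filter_by_center` and the tuple layout `(x_1, y_1, x_2, y_2)` are assumptions made for this sketch; it illustrates the concept and is not LayoutParser's implementation.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_1, y_1, x_2, y_2)


def filter_by_center(
    boxes: List[Box], start: float, end: float, axis: str = "x"
) -> List[Box]:
    """Keep the boxes whose center lies within [start, end] on the given axis."""
    if axis not in ("x", "y"):
        raise ValueError('axis must be "x" or "y"')
    kept = []
    for x_1, y_1, x_2, y_2 in boxes:
        center = (x_1 + x_2) / 2 if axis == "x" else (y_1 + y_2) / 2
        if start <= center <= end:
            kept.append((x_1, y_1, x_2, y_2))
    return kept


image_width = 1000  # hypothetical page width in pixels
boxes = [
    (50, 100, 450, 300),   # left column, center x = 250
    (550, 100, 950, 300),  # right column, center x = 750
    (300, 400, 650, 500),  # crosses the midline, center x = 475
]
left_boxes = filter_by_center(boxes, 0, image_width / 2, axis="x")
print(left_boxes)
```

The third box crosses the midline but is still kept, because only its center is tested. A check that required the whole box to lie inside the interval would drop it.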

3. OCR text recognition in LP

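TesseractAgent and GCVAgent are the two OCR agents that ship with the project. The former requires a separate local installation of Tesseract; the latter calls Google Cloud Vision, so connections may time out when it is used from mainland China. In my experience TesseractAgent's results are mediocre, so I recommend using your own OCR model.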

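import numpy as np

ocr_agent = lp.TesseractAgent()
image_array = np.array(image)  # crop_image slices the image, so it needs an array
for left_region in left:
    img_seg = left_region.crop_image(image_array)
    text = ocr_agent.detect(img_seg)
    left_region.set(text=text, inplace=True)  # store the recognized text on the block

Note: if the image was loaded with PIL's Image.open and is passed to crop_image directly, the call fails with 'PngImageFile' object is not subscriptable. Convert it with np.array(image) first.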
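
The error arises because cropping relies on subscripting the image like an array, which a PIL image object does not support. Below is a minimal sketch of that slicing, using a nested list as a hypothetical stand-in for the pixel array; `crop` is an illustrative helper, not a LayoutParser function.

```python
def crop(pixels, x_1, y_1, x_2, y_2):
    """Return rows y_1..y_2 and columns x_1..x_2 of a row-major pixel grid."""
    return [row[int(x_1):int(x_2)] for row in pixels[int(y_1):int(y_2)]]


# 4 rows x 5 columns; each value encodes its position as row * 10 + column
pixels = [[r * 10 + c for c in range(5)] for r in range(4)]
segment = crop(pixels, 1, 1, 3, 3)
print(segment)
```

With a NumPy array the same crop is written as `image[y_1:y_2, x_1:x_2]`, which is why converting the PIL image with `np.array(image)` avoids the error.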

4. Drawing bounding boxes

lp.draw_box(image, layout, box_width=1, show_element_id=True, box_alpha=0.25)

Drawing the boxes and labels yourself:

import cv2 as cv

# image must be a NumPy array here, e.g. np.array(image) for an image opened with PIL
for bbox in layout:
    x_1, y_1, x_2, y_2 = (int(v) for v in bbox.coordinates)
    # Green rectangle, 2 px thick
    cv.rectangle(image, (x_1, y_1), (x_2, y_2), (0, 255, 0), 2)
    # Label the block with its id at the top-left corner
    cv.putText(image, str(bbox.id), (x_1, y_1),
               cv.FONT_HERSHEY_SIMPLEX, 1.5, (0, 0, 255), 2)

5. Model selection

Reference: https://layout-parser.github.io/platform/

Any model config string listed there can be copied as is:

lp.AutoModel("lp://detectron2/TableBank/faster_rcnn_R_50_FPN_3x")
lp.AutoModel("lp://detectron2/PubLayNet/mask_rcnn_X_101_32x8d_FPN_3x")
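
The two strings above appear to follow an `lp://<backend>/<dataset>/<model>` pattern. The sketch below splits such a string into its parts; the field names and the helper `split_lp_path` are assumptions for illustration and are not part of LayoutParser.

```python
from typing import NamedTuple


class LpPath(NamedTuple):
    """Assumed components of a three-part lp:// model string."""

    backend: str
    dataset: str
    model: str


def split_lp_path(path: str) -> LpPath:
    """Split 'lp://<backend>/<dataset>/<model>' into its three components."""
    prefix = "lp://"
    if not path.startswith(prefix):
        raise ValueError(f"expected a string starting with {prefix!r}: {path!r}")
    parts = path[len(prefix):].split("/")
    if len(parts) != 3:
        raise ValueError(f"expected three components after {prefix!r}: {path!r}")
    return LpPath(*parts)


parsed = split_lp_path("lp://detectron2/TableBank/faster_rcnn_R_50_FPN_3x")
print(parsed.backend, parsed.dataset, parsed.model)
```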

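6. Submitting your own model to the LP project

Detailed instructions are available at https://github.com/Layout-Parser/platform. The main steps are:

Train your own detection model
Submit a pull request
Submit the model details to the open platform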

The end!

Source: https://www.cnblogs.com/dan-baishucaizi/p/15949028.html