
LPRNet Translation

Author: Internet

Links

https://www.52cv.net/?p=854
https://zhuanlan.zhihu.com/p/144530956

Abstract

Paragraph 1
This paper proposes LPRNet, an end-to-end method for Automatic License Plate Recognition without preliminary character segmentation.
Our approach is inspired by recent breakthroughs in Deep Neural Networks and works in real time with recognition accuracy up to 95% for Chinese license plates: 3 ms/plate on an NVIDIA GeForce GTX 1080 GPU and 1.3 ms/plate on an Intel Core i7-6700K CPU.

Paragraph 2
LPRNet consists of a lightweight Convolutional Neural Network, so it can be trained in an end-to-end way.
To the best of our knowledge, LPRNet is the first real-time License Plate Recognition system that does not use RNNs.
As a result, the LPRNet algorithm may be used to create embedded solutions for LPR that feature a high level of accuracy even on challenging Chinese license plates.

1. Introduction
Paragraph 1
Automatic License Plate Recognition is a challenging and important task which is used in traffic management, digital security surveillance, vehicle recognition, and parking management in big cities.
This task is a complex problem due to many factors which include but are not limited to: blurry images, poor lighting conditions, variability of license plate numbers (including special characters, e.g. logograms for China and Japan), physical impact (deformations), and weather conditions (see some examples in Fig. 1).

Paragraph 3
This paper tackles the License Plate Recognition problem and introduces the LPRNet algorithm, which is designed to work without pre-segmentation and consequent recognition of characters.
In the present paper, we do not consider the License Plate Detection problem; however, for our particular case it can be done with an LBP cascade.
Paragraph 4
LPRNet is based on a Deep Convolutional Neural Network.
Recent studies proved the effectiveness and superiority of Convolutional Neural Networks in many Computer Vision tasks such as image classification, object detection and semantic segmentation.
However, running most of them on embedded devices still remains a challenging problem.

Paragraph 5
LPRNet is a very efficient neural network, which takes only 0.34 GFLOPs to make a single forward pass.
Also, our model runs in real time on an Intel Core i7-6700K Skylake CPU with high accuracy on challenging Chinese license plates and can be trained end-to-end.
Moreover, LPRNet can be partially ported to an FPGA, which can free up CPU power for other parts of the pipeline.
Our main contributions can be summarized as follows:

● LPRNet is a real-time framework for high-quality license plate recognition that supports template- and character-independent variable-length license plates, performs LPR without character pre-segmentation, and is trainable end-to-end from scratch for different national license plates.
● LPRNet is the first real-time approach that does not use Recurrent Neural Networks and is lightweight enough to run on a variety of platforms, including embedded devices.
● Application of LPRNet to real traffic surveillance video shows that our approach is robust enough to handle difficult cases, such as perspective and camera-dependent distortions, hard lighting conditions, change of viewpoint, etc.

The rest of the paper is organized as follows. Section 2 describes the related work. In Section 3 we review the details of the LPRNet model. Section 4 reports the results on Chinese license plates and includes an ablation study of our algorithm. We summarize and conclude our work in Section 5.

2. Related Work

Paragraph 1
In earlier works on general LP recognition, the pipeline consists of character segmentation and character classification stages:
Character segmentation typically uses different handcrafted algorithms, combining projections, connectivity and contour-based image components.
It takes a binary image or intermediate representation as input, so character segmentation quality is highly affected by input image noise, low resolution, blur or deformations.

Character classification typically utilizes one of the optical character recognition (OCR) methods, adapted to the LP character set.
Paragraph 2
Since classification follows the character segmentation, end-to-end recognition quality depends heavily on the applied segmentation method.
In order to solve the problem of character segmentation, end-to-end solutions based on Convolutional Neural Networks (CNNs) were proposed, taking the whole LP image as input and producing the output character sequence.

Paragraph 3
The segmentation-free model in [2] is based on variable-length sequence decoding driven by connectionist temporal classification (CTC) loss [3, 4].
[2] H. Li and C. Shen, "Reading Car License Plates Using Deep Convolutional Neural Networks and LSTMs," arXiv:1601.05610 [cs], Jan. 2016.
It uses hand-crafted LBP features built on a binarized image as CNN input to produce character class probabilities.
Applied to all input image positions via the sliding window approach, it produces the input sequence for the decoder based on a bi-directional Long Short-Term Memory (LSTM) [5].
Since the decoder output and target character sequence lengths are different, CTC loss is used for pre-segmentation-free end-to-end training.
Paragraph 4
The model in [6] mostly follows the approach described in [2], except that the sliding window method was replaced by spatial splitting of the CNN output into the RNN input sequence ("sliding window" over the feature map instead of the input).
[6] T. K. Cheang, Y. S. Chong, and Y. H. Tay, "Segmentation-free Vehicle License Plate Recognition using ConvNet-RNN," arXiv:1701.06439 [cs], Jan. 2017.
Paragraph 5
In contrast, [7] uses a CNN-based model on the whole LP image to produce a global LP embedding, which is decoded to an 11-character-length sequence via 11 fully connected model heads.
[7] V. Jain, Z. Sasindran, A. Rajagopal, S. Biswas, H. S. Bharadwaj, and K. R. Ramakrishnan, "Deep Automatic License Plate Recognition System," in Proceedings of the Tenth Indian Conference on Computer Vision, Graphics and Image Processing, ser. ICVGIP '16. New York, NY, USA: ACM, 2016, pp. 6:1-6:8.

Each of the heads is trained to classify the i-th character of the target string (which is assumed to be padded to the predefined fixed length), so the whole recognition can be done in a single feed-forward pass.
It also utilizes the Spatial Transformer Network (STN) [8] to reduce the effect of input image deformations.
[8] M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu, "Spatial Transformer Networks," arXiv:1506.02025 [cs], Jun. 2015.

Paragraph 6
The algorithm in [9] attempts to solve both the license plate detection and the license plate recognition problems with a single Deep Neural Network.
[9] H. Li, P. Wang, and C. Shen, "Towards End-to-End Car License Plates Detection and Recognition with Deep Neural Networks," ArXiv e-prints, Sep. 2017.

Recent work [10] tries to exploit a synthetic data generation approach based on Generative Adversarial Networks [11] to obtain a large representative license plate dataset.
[10] X. Wang, Z. Man, M. You, and C. Shen, "Adversarial Generation of Training Examples: Applications to Moving Vehicle License Plate Recognition," ArXiv e-prints, Jul. 2017.
[11] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative Adversarial Networks," ArXiv e-prints, Jun. 2014.
Paragraph 7
In our approach, we avoided using hand-crafted features over a binarized image; instead we used raw RGB pixels as CNN input.
The LSTM-based sequence decoder working on the outputs of a sliding window CNN was replaced with a fully convolutional model whose output is interpreted as a sequence of character probabilities for CTC loss training and for greedy or prefix search string inference.
For better performance, the pre-decoder intermediate feature map was augmented with the global context embedding as described in [12].
Also, the backbone CNN model was reduced significantly using a low-computation-cost basic building block inspired by SqueezeNet Fire Blocks [13] and the Inception Blocks of [14, 15, 16].
Batch Normalization [17] and Dropout [18] techniques were used for regularization.

Paragraph 8
LP image input size affects both the computational cost and the recognition quality [19]; as a result, there is a tradeoff between using high [6] or moderate [7, 2] resolution.
[19] S. Agarwal, D. Tran, L. Torresani, and H. Farid, "Deciphering Severely Degraded License Plates," San Francisco, CA, 2017.

3. LPRNet

Architecture Design
Paragraph 1
In this section we describe our LPRNet network architecture design in detail.
Recent studies tend to use parts of powerful classification networks such as VGG, ResNet or GoogLeNet as a 'backbone' for their tasks by applying transfer learning techniques.
However, this is not the best option for building fast and lightweight networks, so in our case we redesigned the main 'backbone' network by applying recently discovered architecture tricks.
Paragraph 2
The basic building block of our CNN model backbone (Table 2) was inspired by SqueezeNet Fire Blocks [13] and the Inception Blocks of [14, 15, 16].
We also followed research best practices and used Batch Normalization [17] and ReLU activation after each convolution operation.

Paragraph 3
In a nutshell, our design consists of:
• location network with Spatial Transformer Layer [8] (optional)
• light-weight convolutional neural network (backbone)
• per-position character classification head
• character probabilities for further sequence decoding
• post-filtering procedure

Paragraph 4
First, the input image is preprocessed by the Spatial Transformer Layer, as proposed in [8].
This step is optional, but it allows exploring how one can transform the input image to have better characteristics for recognition.
The original LocNet architecture (see Table 1) was used to estimate optimal transformation parameters.

Paragraph 5
The backbone network architecture is described in Table 3.

The backbone takes a raw RGB image as input and calculates spatially distributed rich features.
A wide convolution (with a 1 × 13 kernel) utilizes the local character context instead of an LSTM-based RNN.
The backbone subnetwork output can be interpreted as a sequence of character probabilities whose length corresponds to the input image pixel width.
Since the decoder output and the target character sequence have different lengths, we apply CTC loss [20] for segmentation-free end-to-end training.
CTC loss is a well-known approach for situations where input and output sequences are misaligned and have variable lengths.
Moreover, CTC provides an efficient way to go from probabilities at each time step to the probability of an output sequence.
More detailed explanation about CTC loss can be found in .
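To make the step from per-position probabilities to the probability of a whole output sequence concrete, here is a minimal sketch of the standard CTC forward recursion in plain Python. It is an illustration only, not the paper's TensorFlow code: the function name `ctc_sequence_probability`, the choice of index 0 for the blank class, and the toy probabilities are all assumptions made for this example.

```python
def ctc_sequence_probability(probs, target, blank=0):
    """Probability of `target` under CTC (illustrative sketch).

    probs[t][k] is the probability of class k at position t.
    target is a list of class indices without blanks.
    """
    # Interleave the target with blanks: blank, c1, blank, c2, ..., blank
    ext = [blank]
    for c in target:
        ext += [c, blank]
    S, T = len(ext), len(probs)

    # alpha[s]: total probability of all partial alignments ending in ext[s]
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]                      # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]             # advance by one symbol
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]             # skip the blank in between
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last character or on the trailing blank
    return alpha[S - 1] + (alpha[S - 2] if S > 1 else 0.0)


# Toy example: two positions, classes {0: blank, 1: "A"}
toy = [[0.4, 0.6], [0.3, 0.7]]
p_a = ctc_sequence_probability(toy, [1])      # "AA", "A-", "-A"
p_empty = ctc_sequence_probability(toy, [])   # "--"
```

In this toy case the two possible outputs ("A" and the empty string) cover every alignment, so their probabilities sum to one. The training loss would be the negative logarithm of the probability assigned to the ground-truth plate string.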
Paragraph 6
To further improve performance, the pre-decoder intermediate feature map was augmented with the global context embedding as in [12].
It is computed via a fully-connected layer over the backbone output, tiled to the desired size and concatenated with the backbone output.
In order to adjust the depth of the feature map to the number of character classes, an additional 1 × 1 convolution is applied.
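The tile-and-concatenate step can be sketched on a tiny one-dimensional feature map. This is a simplified illustration under stated assumptions, not the paper's implementation: the feature map is a plain list of per-position channel vectors, the helper names `global_context_concat` and `conv1x1` are hypothetical, and the weights are hand-picked rather than learned.

```python
def global_context_concat(features, fc_weights, fc_bias):
    """Append a global context vector to every position.

    features: list of positions, each a list of C channel values.
    fc_weights: G rows, each of length (positions * C); fc_bias: G values.
    """
    flat = [v for pos in features for v in pos]
    # Fully-connected layer over the whole backbone output
    context = [sum(w * x for w, x in zip(row, flat)) + b
               for row, b in zip(fc_weights, fc_bias)]
    # Tile the context to every position and concatenate along channels
    return [pos + context for pos in features]


def conv1x1(features, weights):
    """A 1 x 1 convolution is a per-position linear map over channels."""
    return [[sum(w * x for w, x in zip(row, pos)) for row in weights]
            for pos in features]


# Two positions with two channels each; one global context channel
backbone_out = [[1.0, 2.0], [3.0, 4.0]]
augmented = global_context_concat(backbone_out,
                                  fc_weights=[[0.25, 0.25, 0.25, 0.25]],
                                  fc_bias=[0.0])
# Map the 3 augmented channels to 2 hypothetical character classes
logits = conv1x1(augmented, weights=[[1, 0, 0], [0, 0, 1]])
```

Every position now carries the same global summary next to its local features, and the 1 × 1 convolution then brings the channel depth to the number of character classes.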

For the decoding procedure at the inference stage we considered two options: greedy search and beam search.
While greedy search takes the maximum of class probabilities in each position, beam search maximizes the total probability of the output sequence [3, 4].
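The difference between the two decoders can be seen on a toy example. The sketch below is a plain-Python illustration of CTC greedy decoding and CTC prefix beam search, not the decoder used in the paper; the function names, the blank index 0 and the toy probabilities are assumptions made here.

```python
from collections import defaultdict


def ctc_greedy_decode(probs, blank=0):
    """Take the argmax at each position, collapse repeats, drop blanks."""
    out, prev = [], None
    for p in probs:
        k = max(range(len(p)), key=p.__getitem__)
        if k != prev and k != blank:
            out.append(k)
        prev = k
    return out


def ctc_beam_search(probs, beam_width=3, blank=0):
    """Prefix beam search: returns (sequence, probability), best first."""
    # prefix -> (probability of ending in blank, probability of ending in non-blank)
    beams = {(): (1.0, 0.0)}
    for p in probs:
        nxt = defaultdict(lambda: [0.0, 0.0])
        for prefix, (pb, pnb) in beams.items():
            for k, pk in enumerate(p):
                if k == blank:
                    nxt[prefix][0] += (pb + pnb) * pk
                elif prefix and prefix[-1] == k:
                    nxt[prefix + (k,)][1] += pb * pk   # new char only after a blank
                    nxt[prefix][1] += pnb * pk         # repeat collapses into prefix
                else:
                    nxt[prefix + (k,)][1] += (pb + pnb) * pk
        ranked = sorted(nxt.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = {pre: (v[0], v[1]) for pre, v in ranked[:beam_width]}
    ranked = sorted(beams.items(), key=lambda kv: sum(kv[1]), reverse=True)
    return [(list(pre), pb + pnb) for pre, (pb, pnb) in ranked]


# Two positions, classes {0: blank, 1: "A"}; blank wins at every position
toy = [[0.6, 0.4], [0.6, 0.4]]
greedy = ctc_greedy_decode(toy)
best_seq, best_prob = ctc_beam_search(toy)[0]
```

Here greedy search returns the empty string because blank is the most likely class at both positions, while beam search returns "A", whose three alignments together are more probable than the single all-blank alignment.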

For post-filtering we use a task-oriented language model implemented as a set of LP templates for the target country.
Note that post-filtering is applied together with beam search.
The post-filtering procedure takes the top-N most probable sequences found by beam search and returns the first one that matches the set of predefined templates, which depends on the country's LP regulations.
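A post-filter of this kind could look like the sketch below. The regular-expression template, the candidate strings and the fallback to the top candidate when nothing matches are all assumptions for illustration; the source does not specify the template format or what happens when no candidate matches.

```python
import re


def post_filter(candidates, templates):
    """Return the first candidate that fully matches any template.

    candidates: strings ordered from most to least probable (top-N of beam search).
    templates: regular expressions describing valid plate layouts (hypothetical).
    """
    compiled = [re.compile(t) for t in templates]
    for text in candidates:
        if any(rx.fullmatch(text) for rx in compiled):
            return text
    # Assumption: fall back to the most probable candidate if nothing matches
    return candidates[0] if candidates else None


# Hypothetical layout: two letters followed by four digits
templates = [r"[A-Z]{2}[0-9]{4}"]
top_n = ["AB12O4", "AB1204", "A81204"]
plate = post_filter(top_n, templates)
```

The most probable candidate confuses the digit 0 with the letter O and is rejected by the template, so the second candidate is returned.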

Training Details
All training experiments were done with the help of TensorFlow [21].
We train our model with the 'Adam' optimizer using a batch size of 32, an initial learning rate of 0.001 and a gradient noise scale of 0.001.
We drop the learning rate by a factor of 10 after every 100k iterations and train our network for 250k iterations in total.
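The step schedule described above can be written as a small function. This is a sketch of the schedule only, with the hypothetical name `learning_rate`; how it is wired into the optimizer is not shown in the source.

```python
def learning_rate(step, base_lr=0.001, drop_every=100_000, factor=10.0):
    """Step decay: divide the rate by `factor` after every `drop_every` iterations."""
    return base_lr / (factor ** (step // drop_every))


# Rates seen during the 250k-iteration run
schedule = {step: learning_rate(step) for step in (0, 99_999, 100_000, 200_000, 249_999)}
```

Over the 250k iterations this gives roughly 1e-3 for the first 100k, 1e-4 for the next 100k and 1e-5 for the final 50k.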
In our experiments we use data augmentation by random affine transformations, e.g. rotation, scaling and shift.
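A random affine transformation combining rotation, scaling and shift can be sampled as a 2 × 3 matrix, as in the sketch below. The parameter ranges are placeholders chosen for this example; the source does not state the ranges that were actually used.

```python
import math
import random


def random_affine(rng, max_angle_deg=5.0, scale_range=(0.9, 1.1), max_shift=2.0):
    """Sample a 2 x 3 affine matrix (rotation, uniform scaling, shift).

    All ranges are illustrative assumptions, not values from the paper.
    """
    angle = math.radians(rng.uniform(-max_angle_deg, max_angle_deg))
    scale = rng.uniform(*scale_range)
    tx = rng.uniform(-max_shift, max_shift)
    ty = rng.uniform(-max_shift, max_shift)
    a, b = scale * math.cos(angle), scale * math.sin(angle)
    return [[a, -b, tx],
            [b, a, ty]]


def apply_affine(m, x, y):
    """Map a pixel coordinate (x, y) through the affine matrix m."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


m = random_affine(random.Random(0))
shifted_origin = apply_affine(m, 0.0, 0.0)
```

In practice the matrix would be passed to an image-warping routine for every training sample, so that each plate is seen under slightly different geometry.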
It is worth mentioning that applying LocNet from the beginning of training leads to degraded results, because LocNet cannot get reasonable gradients from a recognizer, which is typically too weak during the first few iterations.
So, in our experiments, we turn LocNet on only after 5k iterations.
All other hyper-parameters were chosen by cross-validation over the target dataset.

Results and Experiments

The LPRNet baseline network, from which we started our experiments with different architectures, was inspired by [2].
It is mainly based on Inception blocks followed by a bidirectional LSTM (biLSTM) decoder and trained with CTC loss.
We first performed some experiments aimed at replacing biLSTM with biGRU cells, but we did not observe any clear benefits of using biGRU over biLSTM.
Then, we focused on eliminating the complicated biLSTM decoder, because most modern embedded devices still do not have sufficient compute and memory to efficiently execute biLSTM.
Importantly, our LSTM is applied to a spatial sequence rather than to a temporal one.
Thus all LSTM inputs are known upfront, both at the training stage and at the inference stage.
Therefore we believe that the RNN can be replaced by spatial convolutions without a significant drop in accuracy.
The RNN-less model with some backbone modifications is referred to as LPRNet basic and was described in detail in Section 3.
To improve runtime performance we also modified LPRNet basic by using 2 × 2 strides for all pooling layers.
This modification (the LPRNet reduced model) significantly reduces the size of intermediate feature maps and the total inference computational cost (see the GFLOPs column of Table 4).

Ablation study

It is of vital importance to conduct an ablation study to identify the correlation between various enhancements and the respective accuracy/performance improvements.
This helps other researchers adopt ideas from the paper and reuse the most promising architecture approaches.
Table 5 shows a summary of architecture approaches and their impact on accuracy.

As one can see, the largest accuracy gain (36%) was achieved using the global context.
The data augmentation techniques also help to improve accuracy significantly (by 28.6%).
Without using data augmentation and the global context we could not train the model from scratch.
The STN-based alignment subnetwork provides a noticeable improvement of 2.8-5.2%.
Beam search with post-filtering further improves recognition accuracy by 0.4-0.6%.

Outlook

In this work, we have shown that for License Plate Recognition one can utilize pretty small convolutional neural networks.
The LPRNet model was introduced, which can be used for challenging data, achieving up to 95% recognition accuracy.
We described the architecture details and their motivation and conducted an ablation study.
We showed that LPRNet can perform inference in real time on a variety of hardware architectures including CPU, GPU and FPGA.
We have no doubt that LPRNet could attain real-time performance even on more specialized embedded low-power devices.
LPRNet can likely be compressed using modern pruning and quantization techniques, which would potentially help to reduce the computational complexity even further.
As a future direction of research, the LPRNet work can be extended by merging a CNN-based detection part into our algorithm, so that both detection and recognition tasks will be evaluated as a single network in order to outperform the quality of the LBP-based cascaded detector.

Source: https://www.cnblogs.com/starc/p/16079974.html