Recognizing CAPTCHAs with a CNN

2021-10-16 23:31:10 Author: Internet

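I. Basic Approach
Generating the data (CAPTCHA samples)

1. CAPTCHA type

The CAPTCHAs generated here are the most common kind today: text CAPTCHAs made up of the 26 English letters in both upper and lower case plus the ten digits 0 to 9, which gives 62 possible characters.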
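To get a feel for the size of the problem, the short sketch below counts the character set and the number of distinct CAPTCHAs. The length of 4 is taken from CAPTCHA_LEN in the code later in this article; the variable names are only illustrative.

```python
import string

# 10 digits + 26 lowercase letters + 26 uppercase letters
charset = string.digits + string.ascii_lowercase + string.ascii_uppercase
captcha_len = 4  # same value as CAPTCHA_LEN in captcha_create.py

num_chars = len(charset)
num_captchas = num_chars ** captcha_len   # each position is chosen independently
label_size = num_chars * captcha_len      # length of the one-hot label vector

print(num_chars, num_captchas, label_size)  # 62 14776336 248
```

With almost 15 million possible texts, the network cannot memorize whole CAPTCHAs; it has to learn to read the individual characters.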

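2. Generation method

There are two ways to produce the training data. One is to generate tens of thousands of images up front and save them to disk; the other is to define a data generator that creates samples on the fly without saving them. Each has its merits. The first keeps GPU utilization high during training, and if you tune hyperparameters often you can generate once and reuse the data many times. The second needs no large dataset on disk, lets the CPU produce data while training runs, and can supply an unlimited number of samples. This article uses the second approach.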
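The second approach can be sketched with a Python generator. This is a minimal illustration that only produces the label texts, not the images; the name captcha_text_batches and the seed parameter are assumptions made for the example, and the article's own next_batch function (shown later) plays this role in the real code.

```python
import itertools
import random
import string

CHARSET = string.digits + string.ascii_letters

def captcha_text_batches(batch_size, length=4, seed=None):
    """Yield lists of random CAPTCHA texts forever; nothing is written to disk."""
    rng = random.Random(seed)
    while True:
        yield ["".join(rng.choice(CHARSET) for _ in range(length))
               for _ in range(batch_size)]

# Take three batches from the endless stream.
for batch in itertools.islice(captcha_text_batches(batch_size=2, seed=0), 3):
    print(batch)
```

Because the stream never ends, the training loop decides when to stop, which is exactly how the training code in this article is organized.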

3. Sample CAPTCHA image: [figure: sample CAPTCHA image, not preserved in this copy]

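Processing the data

1. Color carries little information in a CAPTCHA, so the color image is converted to grayscale: the three color channels are averaged into a single channel, which removes distracting data.

2. The grayscale image and its text label are converted to numeric data.

3. The images are grouped into batches so that the data can be fed to training batch by batch.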
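The first two steps can be illustrated on a tiny image. This sketch uses plain Python lists so that it runs without extra packages; the article's own convert2gray does the same averaging with np.mean(img, -1) on a NumPy array, and the function name to_gray_vector is made up for this example.

```python
def to_gray_vector(rgb_image):
    """Average the channels of each pixel, scale to [0, 1], and flatten row by row."""
    return [sum(pixel) / len(pixel) / 255
            for row in rgb_image
            for pixel in row]

# A tiny 2x2 RGB "image": each pixel is (R, G, B) with values 0..255.
tiny = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (30, 60, 90)],
]
print(to_gray_vector(tiny))
```

A real 60x160 CAPTCHA becomes a vector of 9600 numbers in the same way, one row of the training batch.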

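Building the model

The network has 5 layers: the first 3 are convolutional layers and the 4th and 5th are fully connected layers. Dropout is applied to all 4 hidden layers. The structure is: input -> conv -> pool -> dropout -> conv -> pool -> dropout -> conv -> pool -> dropout -> fully connected layer -> dropout -> fully connected layer -> output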
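To see what this structure means in numbers, the sketch below traces the feature-map shape through the three conv/pool stages and counts the trainable parameters. The layer sizes (3x3 kernels, 32/64/64 filters, 1024 hidden units, a 60x160 input, 62 x 4 outputs) are taken from the code later in this article; the script itself is only an illustration.

```python
import math

height, width, channels = 60, 160, 1   # grayscale input, CAPTCHA_HEIGHT x CAPTCHA_WIDTH
conv_filters = [32, 64, 64]            # output channels of the three conv layers
fc_units, num_outputs = 1024, 62 * 4

total_params = 0
for filters in conv_filters:
    total_params += 3 * 3 * channels * filters + filters   # 3x3 kernels + biases
    # 'SAME' convolution with stride 1 keeps height and width; 2x2 pooling with
    # stride 2 and 'SAME' padding halves them, rounding up.
    height, width, channels = math.ceil(height / 2), math.ceil(width / 2), filters
    print("after conv+pool:", (height, width, channels))

flat = height * width * channels
total_params += flat * fc_units + fc_units              # first fully connected layer
total_params += fc_units * num_outputs + num_outputs    # output layer
print("flattened size:", flat, "parameters:", total_params)
```

Almost all of the roughly 10.8 million parameters sit in the first fully connected layer, which is one reason training on a CPU is slow.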

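Training

Cross-entropy is used as the loss, specifically sigmoid cross-entropy (tf.nn.sigmoid_cross_entropy_with_logits). It treats every output unit as an independent yes/no label, so it suits targets whose classes are independent but not mutually exclusive; a CAPTCHA, for example, can contain both letters and digits at once. Each batch contains 64 training samples, and recognition accuracy is measured once every 100 training steps. The model is saved when accuracy exceeds 95%, and training stops once accuracy exceeds 99%. The model here was trained on a CPU, and it took about 7 hours to reach 95% accuracy. When would it reach 99%? Sorry: I got excited as soon as it passed 0.95 and did not wait for 0.99.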
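For reference, the per-unit quantity behind sigmoid cross-entropy can be written out in a few lines. This is a plain-Python sketch of the numerically stable form max(x, 0) - x*z + log(1 + exp(-|x|)) for a logit x and a 0/1 label z; the function names are illustrative and the sample numbers are made up.

```python
import math

def sigmoid_cross_entropy(logit, label):
    """Numerically stable -[z*log(s(x)) + (1-z)*log(1-s(x))] for one output unit."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def mean_loss(logits, labels):
    """Average the per-unit loss over all output units."""
    losses = [sigmoid_cross_entropy(x, z) for x, z in zip(logits, labels)]
    return sum(losses) / len(losses)

logits = [4.0, -3.0, 0.0, -5.0]   # raw network outputs
labels = [1.0, 0.0, 1.0, 0.0]     # one-hot style targets
print(round(mean_loss(logits, labels), 4))
```

A confident correct output (logit 4.0 with label 1) contributes almost nothing, while the undecided logit 0.0 contributes log 2, about 0.693.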

Testing the model

Generate a CAPTCHA -> load the saved model -> recognize the CAPTCHA -> print the result.

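II. Main Tools
Anaconda

Anaconda is an open-source Python distribution that bundles conda, Python, and more than 180 scientific packages together with their dependencies.

Python3.6

Python is a well-organized and powerful object-oriented programming language, comparable to Perl, Ruby, Scheme, and Java.

TensorFlow

TensorFlow is Google's second-generation machine learning system, developed as the successor to DistBelief. It is a general-purpose deep learning framework that can be applied to many machine learning and deep learning tasks such as speech recognition and image recognition.

Captcha

captcha is a Python library for generating CAPTCHAs. It supports both image and audio CAPTCHAs; only the image generation is used here.

PyCharm (any other IDE works too)

PyCharm is a Python IDE with a full set of tools that help developers work more efficiently in Python.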

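III. Preparation
Installing Anaconda

1. Download Anaconda from the official website:

https://www.anaconda.com/download/

2. Run the installer (nothing special compared with an ordinary installation):

One thing to note: [figure not preserved in this copy]

3. After installation, list the conda environments: open Anaconda Prompt and enter conda info --envs

4. Check that Anaconda was installed successfully: open Anaconda Prompt and enter conda --version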

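Installing TensorFlow (with Python 3.6)

1. See which environments currently exist: open Anaconda Prompt and enter conda info --envs

2. Create an environment with its own Python interpreter (in other words, choose the Python version): enter conda create --name tensorflow python=3.6

3. Check that the tensorflow environment has been added to Anaconda: enter conda info --envs

4. Activate the tensorflow environment: in Anaconda Prompt, enter activate tensorflow (newer conda versions use conda activate tensorflow)

5. Install TensorFlow itself: inside the tensorflow environment, enter pip install --upgrade --ignore-installed tensorflow

6. A note on usage: when creating a new Python project, the interpreter must be the one in the directory of the tensorflow environment created above; otherwise the tensorflow module cannot be used.

7. Test the installation: inside the tensorflow environment, start python and run a simple program. It uses tf.Session, which belongs to the TensorFlow 1.x API used by all the code in this article.

import tensorflow as tf

hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))

When this program runs, a message appears (and keeps appearing until it is dealt with) saying that the installed TensorFlow build was not compiled to use the CPU's AVX2 instructions. It is generally harmless and can be ignored. A fix does exist; see https://blog.csdn.net/Fourierrr_/article/details/79749899.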
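If pip installs a TensorFlow 2.x release, tf.Session is no longer available at the top level of the package. The sketch below shows one way to run the same check through the tensorflow.compat.v1 module, assuming a TensorFlow version that ships it. The hello_tensorflow wrapper and the try/except are additions for this example, not part of the original article.

```python
def hello_tensorflow():
    """Run the hello-world check through the 1.x compatibility API.

    Returns the evaluated constant, or None if TensorFlow (with compat.v1)
    is not installed in the current environment.
    """
    try:
        import tensorflow.compat.v1 as tf
    except ImportError:
        return None
    tf.disable_v2_behavior()        # graph mode, as in TensorFlow 1.x
    hello = tf.constant('Hello, TensorFlow!')
    with tf.Session() as sess:      # the session is closed automatically
        return sess.run(hello)

print(hello_tensorflow())
```

The same import line (import tensorflow.compat.v1 as tf, followed by tf.disable_v2_behavior()) is a common way to run 1.x-style scripts such as the ones in this article on a 2.x installation, though this has not been verified against the article's code here.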

8. Note: any module missing from the tensorflow environment must be installed inside that environment, for example captcha and the other Python libraries below.

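Installing Captcha

1. Activate the tensorflow environment: in Anaconda Prompt, enter activate tensorflow
2. Install captcha inside the tensorflow environment: enter pip install captcha

Installing the other Python libraries

random, numpy, matplotlib, os, datetime: numpy and matplotlib are installed the same way as above with pip; random, os, and datetime are part of the Python standard library and need no installation.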
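A quick way to see what is still missing in the active environment is to ask Python whether each module can be found. This is a small sketch; the module list is collected from the import statements of the four scripts in this article, and missing_modules is a helper name invented for the example.

```python
import importlib.util

# Modules imported by the scripts in this article. "PIL" is the import name of
# the Pillow package, which captcha_create.py uses to open images.
modules = ["random", "os", "datetime", "numpy", "matplotlib", "PIL",
           "captcha", "tensorflow"]

def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

print("missing:", missing_modules(modules))
```

Anything reported as missing can then be installed with pip inside the tensorflow environment.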

IV. Full Code
The code is split into 4 files: captcha_create.py, captcha_process.py, cnn_train.py, and cnn_test.py. Put all of them in a Python package named captchaCnn.

captcha_create.py (generating the data)

import random

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from captcha.image import ImageCaptcha

# Basic CAPTCHA settings
NUMBER = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
LOW_CASE = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u',
            'v', 'w', 'x', 'y', 'z']
UP_CASE = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U',
           'V', 'W', 'X', 'Y', 'Z']
CAPTCHA_LIST = NUMBER + LOW_CASE + UP_CASE
CAPTCHA_LEN = 4
CAPTCHA_HEIGHT = 60
CAPTCHA_WIDTH = 160


# Generate random CAPTCHA text
def random_captcha_text(char_set=CAPTCHA_LIST, captcha_size=CAPTCHA_LEN):
    '''
    :param char_set: characters to draw from
    :param captcha_size: number of characters in the CAPTCHA
    :return: the CAPTCHA text as a string
    '''
    captcha_text = [random.choice(char_set) for _ in range(captcha_size)]
    return ''.join(captcha_text)


# Generate a random CAPTCHA image
def gen_captcha_text_and_image(width=CAPTCHA_WIDTH, height=CAPTCHA_HEIGHT, save=None):
    '''
    :param width: image width in pixels
    :param height: image height in pixels
    :param save: if truthy, also write the image to <text>.jpg in the current directory
    :return: (CAPTCHA text, image as an np array)
    '''
    image = ImageCaptcha(width=width, height=height)
    # CAPTCHA text
    captcha_text = random_captcha_text()
    captcha = image.generate(captcha_text)
    # Save to disk
    if save:
        image.write(captcha_text, captcha_text + '.jpg')
    captcha_image = Image.open(captcha)
    # Convert to an np array
    captcha_image = np.array(captcha_image)
    return captcha_text, captcha_image


if __name__ == '__main__':
    a = gen_captcha_text_and_image(CAPTCHA_WIDTH, CAPTCHA_HEIGHT, save=False)
    print(a[0])
    plt.imshow(a[1])
    plt.show()

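captcha_process.py (processing the data)

import numpy as np

from captchaCnn.captcha_create import gen_captcha_text_and_image
from captchaCnn.captcha_create import CAPTCHA_LIST, CAPTCHA_LEN, CAPTCHA_HEIGHT, CAPTCHA_WIDTH


# Convert the image to grayscale by averaging the color channels: (H, W, 3) -> (H, W)
def convert2gray(img):
    '''
    :param img: image as an np array, shape (H, W, 3) or (H, W)
    :return: grayscale image, shape (H, W)
    '''
    if len(img.shape) > 2:
        img = np.mean(img, -1)
    return img


# Convert CAPTCHA text to a vector
def text2vec(text, captcha_len=CAPTCHA_LEN, captcha_list=CAPTCHA_LIST):
    '''
    :param text: CAPTCHA text
    :param captcha_len: maximum number of characters
    :param captcha_list: character set
    :return: one-hot vector of length captcha_len * len(captcha_list)
    '''
    text_len = len(text)
    if text_len > captcha_len:
        raise ValueError('CAPTCHA text can be at most %d characters long' % captcha_len)
    vector = np.zeros(captcha_len * len(captcha_list))
    for i in range(text_len):
        vector[captcha_list.index(text[i]) + i * len(captcha_list)] = 1
    return vector


# Convert per-position character indices back to text
def vec2text(vec, captcha_list=CAPTCHA_LIST, size=CAPTCHA_LEN):
    '''
    :param vec: sequence of character indices, one per position (not the one-hot vector)
    :param captcha_list: character set
    :param size: number of characters (unused)
    :return: CAPTCHA text
    '''
    vec_idx = vec
    text_list = [captcha_list[v] for v in vec_idx]
    return ''.join(text_list)


# Return an image with the expected shape
def wrap_gen_captcha_text_and_image(shape=(CAPTCHA_HEIGHT, CAPTCHA_WIDTH, 3)):
    '''
    :param shape: required image shape
    :return: (CAPTCHA text, image) where the image has exactly this shape
    '''
    while True:
        t, im = gen_captcha_text_and_image()
        if im.shape == shape:
            return t, im


# Get one batch of training images
def next_batch(batch_count=60, width=CAPTCHA_WIDTH, height=CAPTCHA_HEIGHT):
    '''
    :param batch_count: number of samples in the batch
    :param width: image width
    :param height: image height
    :return: (batch_x, batch_y), one flattened image and one label vector per row
    '''
    batch_x = np.zeros([batch_count, width * height])
    batch_y = np.zeros([batch_count, CAPTCHA_LEN * len(CAPTCHA_LIST)])
    for i in range(batch_count):
        text, image = wrap_gen_captcha_text_and_image()
        image = convert2gray(image)
        # Flatten the image; the image and its label go on the same row of the two 2-D arrays
        batch_x[i, :] = image.flatten() / 255
        batch_y[i, :] = text2vec(text)
    # Return this training batch
    return batch_x, batch_y


if __name__ == '__main__':
    x, y = next_batch(batch_count=1)
    print(x, '\n\n', y)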
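The label encoding used by text2vec can be tried without NumPy. The sketch below is a plain-Python stand-in, with encode and decode as illustrative names: each character position owns its own block of 62 entries, and decoding takes the largest entry of each block. Note that the article's vec2text expects the per-position indices, not the one-hot vector itself.

```python
import string

# Same order as CAPTCHA_LIST: digits, lowercase letters, uppercase letters
CHARSET = string.digits + string.ascii_lowercase + string.ascii_uppercase
N = len(CHARSET)

def encode(text):
    """One-hot encode: position i owns the slice [i*N, (i+1)*N)."""
    vector = [0] * (len(text) * N)
    for i, ch in enumerate(text):
        vector[i * N + CHARSET.index(ch)] = 1
    return vector

def decode(vector):
    """Take the index of the largest entry in each slice and map it to a character."""
    chars = []
    for start in range(0, len(vector), N):
        block = vector[start:start + N]
        chars.append(CHARSET[block.index(max(block))])
    return "".join(chars)

vec = encode("aB3z")
print([i for i, v in enumerate(vec) if v])   # [10, 99, 127, 221]
print(decode(vec))                           # aB3z
```

The same block-wise argmax is what the accuracy and prediction code does with tf.reshape and tf.argmax on the network output.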

cnn_train.py (building the model, training)

# Written against the TensorFlow 1.x API (tf.placeholder, tf.Session)
import os
from datetime import datetime

import tensorflow as tf

from captchaCnn.captcha_process import next_batch
from captchaCnn.captcha_create import CAPTCHA_HEIGHT, CAPTCHA_WIDTH, CAPTCHA_LEN, CAPTCHA_LIST


# Randomly initialize weights
def weight_variable(shape, w_alpha=0.01):
    '''
    :param shape: shape of the weight tensor
    :param w_alpha: scale applied to the standard normal samples
    :return: weight variable
    '''
    initial = w_alpha * tf.random_normal(shape)
    return tf.Variable(initial)


# Randomly initialize biases
def bias_variable(shape, b_alpha=0.1):
    '''
    :param shape: shape of the bias tensor
    :param b_alpha: scale applied to the standard normal samples
    :return: bias variable
    '''
    initial = b_alpha * tf.random_normal(shape)
    return tf.Variable(initial)


# Convolution with stride 1; 'SAME' zero-pads the input so the output keeps its height and width
def conv2d(x, w):
    '''
    :param x: input tensor, shape [batch, height, width, channels]
    :param w: filter tensor, shape [k, k, in_channels, out_channels]
    :return: convolution output
    '''
    return tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')


# Max pooling: keep the largest value in each 2x2 region, halving height and width
def max_pool_2x2(x):
    '''
    :param x: input tensor
    :return: pooled tensor
    '''
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')


# Computation graph of the CNN with three convolutional layers
def cnn_graph(x, keep_prob, size, captcha_list=CAPTCHA_LIST, captcha_len=CAPTCHA_LEN):
    '''
    :param x: placeholder for flattened images, shape [batch, height * width]
    :param keep_prob: dropout keep probability
    :param size: (height, width) of the images
    :param captcha_list: character set
    :param captcha_len: number of characters per CAPTCHA
    :return: output logits, shape [batch, len(captcha_list) * captcha_len]
    '''
    # Reshape the images into a 4-D tensor
    image_height, image_width = size
    x_image = tf.reshape(x, shape=[-1, image_height, image_width, 1])

    # Layer 1
    # 3x3x1 filters with 32 output features, i.e. 32 filters
    w_conv1 = weight_variable([3, 3, 1, 32])
    b_conv1 = bias_variable([32])
    # ReLU activation
    h_conv1 = tf.nn.relu(tf.nn.bias_add(conv2d(x_image, w_conv1), b_conv1))
    # Pooling
    h_pool1 = max_pool_2x2(h_conv1)
    # Dropout to reduce overfitting
    h_drop1 = tf.nn.dropout(h_pool1, keep_prob)

    # Layer 2
    w_conv2 = weight_variable([3, 3, 32, 64])
    b_conv2 = bias_variable([64])
    h_conv2 = tf.nn.relu(tf.nn.bias_add(conv2d(h_drop1, w_conv2), b_conv2))
    h_pool2 = max_pool_2x2(h_conv2)
    h_drop2 = tf.nn.dropout(h_pool2, keep_prob)

    # Layer 3
    w_conv3 = weight_variable([3, 3, 64, 64])
    b_conv3 = bias_variable([64])
    h_conv3 = tf.nn.relu(tf.nn.bias_add(conv2d(h_drop2, w_conv3), b_conv3))
    h_pool3 = max_pool_2x2(h_conv3)
    h_drop3 = tf.nn.dropout(h_pool3, keep_prob)

    # Fully connected layer
    image_height = int(h_drop3.shape[1])
    image_width = int(h_drop3.shape[2])
    w_fc = weight_variable([image_height * image_width * 64, 1024])
    b_fc = bias_variable([1024])
    h_drop3_re = tf.reshape(h_drop3, [-1, image_height * image_width * 64])
    h_fc = tf.nn.relu(tf.add(tf.matmul(h_drop3_re, w_fc), b_fc))
    h_drop_fc = tf.nn.dropout(h_fc, keep_prob)

    # Fully connected layer (output layer)
    w_out = weight_variable([1024, len(captcha_list) * captcha_len])
    b_out = bias_variable([len(captcha_list) * captcha_len])
    y_conv = tf.add(tf.matmul(h_drop_fc, w_out), b_out)
    return y_conv


# Minimize the loss
def optimize_graph(y, y_conv):
    '''
    Build the optimization part of the graph
    :param y: label placeholder
    :param y_conv: output logits of the network
    :return: training op
    '''
    # Cross-entropy loss
    # Sigmoid cross-entropy suits classes that are independent but not mutually exclusive,
    # e.g. a CAPTCHA can contain both letters and digits
    loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=y_conv, labels=y))
    # Optimizer that minimizes the loss
    optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)
    return optimizer


# Accuracy
def accuracy_graph(y, y_conv, width=len(CAPTCHA_LIST), height=CAPTCHA_LEN):
    '''
    :param y: label placeholder
    :param y_conv: output logits of the network
    :param width: size of the character set
    :param height: number of characters per CAPTCHA
    :return: per-character accuracy
    '''
    # Predictions
    predict = tf.reshape(y_conv, [-1, height, width])
    max_predict_idx = tf.argmax(predict, 2)
    # Labels
    label = tf.reshape(y, [-1, height, width])
    max_label_idx = tf.argmax(label, 2)
    correct_p = tf.equal(max_predict_idx, max_label_idx)
    accuracy = tf.reduce_mean(tf.cast(correct_p, tf.float32))
    return accuracy


# Train the CNN
def train(height=CAPTCHA_HEIGHT, width=CAPTCHA_WIDTH, y_size=len(CAPTCHA_LIST) * CAPTCHA_LEN):
    '''
    :param height: image height
    :param width: image width
    :param y_size: length of the label vector
    :return: None
    '''
    acc_rate = 0.95
    # Placeholders sized to the images
    x = tf.placeholder(tf.float32, [None, height * width])
    y = tf.placeholder(tf.float32, [None, y_size])
    # Dropout keep probability: dropout is active during training and disabled during testing
    keep_prob = tf.placeholder(tf.float32)
    # CNN model
    y_conv = cnn_graph(x, keep_prob, (height, width))
    # Optimizer
    optimizer = optimize_graph(y, y_conv)
    # Accuracy
    accuracy = accuracy_graph(y, y_conv)
    # Start the session and begin training
    saver = tf.train.Saver()
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    step = 0
    while 1:
        # 64 samples per batch
        batch_x, batch_y = next_batch(64)
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y, keep_prob: 0.75})
        # Test once every 100 training steps
        if step % 100 == 0:
            batch_x_test, batch_y_test = next_batch(100)
            acc = sess.run(accuracy, feed_dict={x: batch_x_test, y: batch_y_test, keep_prob: 1.0})
            print(datetime.now().strftime('%c'), ' step:', step, ' accuracy:', acc)
            # Accuracy is high enough: save the model
            if acc > acc_rate:
                model_path = os.getcwd() + os.sep + str(acc_rate) + "captcha.model"
                saver.save(sess, model_path, global_step=step)
                # Raise the threshold; round() keeps floating-point drift out of the comparison
                acc_rate = round(acc_rate + 0.01, 2)
                if acc_rate > 0.99:
                    break
        step += 1
    sess.close()


if __name__ == '__main__':
    train()

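Accuracy is very low at first, but that alone does not mean the program is wrong; it has to run for a while before the trend shows. For roughly the first 3,000 training steps accuracy stays low and barely moves. After that it starts to improve and climbs slowly. Around step 13,000 it reaches about 0.9, and around step 20,000 about 0.95.

On a CPU it took me 7 hours and 20,600 training steps (each a batch of 64 samples) to reach 95% accuracy. The saved model consists of 4 files. [figure: the saved model files, not preserved in this copy]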
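One detail worth keeping in mind when reading these numbers: accuracy_graph averages over individual characters, so the 95% is a per-character accuracy, not the share of CAPTCHAs read completely right. A minimal sketch of the difference, using made-up predictions and illustrative helper names:

```python
def char_accuracy(labels, predictions):
    """Fraction of character positions predicted correctly (what accuracy_graph measures)."""
    pairs = [(l, p) for label, pred in zip(labels, predictions)
             for l, p in zip(label, pred)]
    return sum(l == p for l, p in pairs) / len(pairs)

def captcha_accuracy(labels, predictions):
    """Fraction of CAPTCHAs with every character correct."""
    return sum(l == p for l, p in zip(labels, predictions)) / len(labels)

labels      = ["aB3z", "Q7kd", "0xYz", "mN42"]
predictions = ["aB3z", "Q7kb", "0xYz", "nN42"]   # two CAPTCHAs have one wrong character

print(char_accuracy(labels, predictions))      # 0.875
print(captcha_accuracy(labels, predictions))   # 0.5

# If character errors were independent, 95% per character would correspond to
# roughly 0.95 ** 4 of 4-character CAPTCHAs being fully correct.
print(round(0.95 ** 4, 4))                     # 0.8145
```

The independence assumption in the last line is only a rough guide; real errors tend to cluster on hard images.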

cnn_test.py (testing the model)

import tensorflow as tf

from captchaCnn.cnn_train import cnn_graph
from captchaCnn.captcha_create import gen_captcha_text_and_image
from captchaCnn.captcha_process import vec2text, convert2gray
from captchaCnn.captcha_process import CAPTCHA_LIST, CAPTCHA_WIDTH, CAPTCHA_HEIGHT, CAPTCHA_LEN


# Convert CAPTCHA images to text
def captcha2text(image_list, height=CAPTCHA_HEIGHT, width=CAPTCHA_WIDTH):
    '''
    :param image_list: list of flattened grayscale images scaled to [0, 1]
    :param height: image height
    :param width: image width
    :return: list of recognized texts
    '''
    x = tf.placeholder(tf.float32, [None, height * width])
    keep_prob = tf.placeholder(tf.float32)
    y_conv = cnn_graph(x, keep_prob, (height, width))
    saver = tf.train.Saver()
    with tf.Session() as sess:
        # Restore the most recent checkpoint found in the current directory
        saver.restore(sess, tf.train.latest_checkpoint('.'))
        predict = tf.argmax(tf.reshape(y_conv, [-1, CAPTCHA_LEN, len(CAPTCHA_LIST)]), 2)
        vector_list = sess.run(predict, feed_dict={x: image_list, keep_prob: 1})
        vector_list = vector_list.tolist()
        text_list = [vec2text(vector) for vector in vector_list]
        return text_list


if __name__ == '__main__':
    text, image = gen_captcha_text_and_image()
    image = convert2gray(image)
    image = image.flatten() / 255
    pre_text = captcha2text([image])
    print('Label:', text, ' Predict:', pre_text)

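V. Summary
Advantages of recognizing CAPTCHAs with a CNN

1. The model is efficient

A fairly short training run is enough to reach high recognition accuracy.

2. The model is self-contained

No extra OCR software or CAPTCHA-recognition API is needed.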

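Disadvantages of recognizing CAPTCHAs with a CNN

1. Samples are hard to obtain

Applying the model in practice raises the problem of getting training samples. Either the generator behind the target CAPTCHAs cannot be found, or the CAPTCHAs that can be collected carry no text labels, so the labels have to be produced by manual labeling or through a labeling platform.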
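Once labels exist, a simple convention is to store each label in the image's file name, the way gen_captcha_text_and_image does when save is set (it writes <text>.jpg). The sketch below reads labels back from such names. The helper name labels_from_filenames is an assumption for this example, and the temporary folder only stands in for a real sample directory.

```python
import tempfile
from pathlib import Path

def labels_from_filenames(directory, suffix=".jpg"):
    """Map each image file name in the directory to the label stored in its stem."""
    return {path.name: path.stem
            for path in sorted(Path(directory).glob("*" + suffix))}

# Demonstration on a throwaway folder with two "images" and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("aB3z.jpg", "Q7kd.jpg", "notes.txt"):
        (Path(tmp) / name).touch()
    print(labels_from_filenames(tmp))
```

One caveat with this convention: on case-insensitive file systems, labels that differ only in letter case (for example aB3z and Ab3Z) would collide, so a separate label file may be the safer choice for mixed-case CAPTCHAs.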

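2. Limited flexibility

Unlike methods that first segment the image into individual characters, the CNN learns from the whole image directly. As a result, as soon as the CAPTCHA style differs from that of the training samples, recognition accuracy drops considerably.

In this setup the convolutional layers place no restriction on image size: a convolution depends only on its own kernel size and channel dimensions, not on the size of its input. The input of a fully connected layer, however, has a fixed size. If the dimension of the input vector varied, the number of weights in the fully connected layer would vary too, the network would change from input to input, and its parameters could not be trained. So even for the same kind of CAPTCHA, changing the image size makes the program fail with an error. This problem does seem to be solvable, though; see https://blog.csdn.net/zhangjunhit/article/details/53909548.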
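A simple workaround, separate from whatever the linked post proposes, is to resize every input image to the size the network was trained on before flattening it. The sketch below shows nearest-neighbor resizing on a grayscale image stored as a list of rows; resize_nearest is an illustrative helper, not part of the article's code.

```python
def resize_nearest(image, new_height, new_width):
    """Resize a 2-D grayscale image (list of rows) by nearest-neighbor sampling."""
    old_height, old_width = len(image), len(image[0])
    return [
        [image[r * old_height // new_height][c * old_width // new_width]
         for c in range(new_width)]
        for r in range(new_height)
    ]

# A 2x3 image stretched to 4x6: every source pixel becomes a 2x2 block.
small = [[1, 2, 3],
         [4, 5, 6]]
for row in resize_nearest(small, 4, 6):
    print(row)
```

In practice a library routine such as Pillow's Image.resize would do this job. Resizing removes the shape error, but if the aspect ratio changes a lot the characters are distorted and accuracy may still suffer.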

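VI. Closing Note
This is my first attempt at this kind of project, and my knowledge and energy are limited; corrections for anything I have missed are welcome.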
————————————————
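Copyright notice: this is an original article by CSDN blogger "It will be", released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
Original link: https://blog.csdn.net/qq_40155500/article/details/82858996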

Source: https://blog.csdn.net/lizz2276/article/details/120805880