
Reading notes on "Deep Learning for Computer Vision with Python" - Volume 3, Chapter 9: Kaggle Competition: Emotion Recognition


Volume 3, Chapter 9: Kaggle Competition: Emotion Recognition

In this chapter, we will tackle Kaggle's Facial Expression Recognition challenge. To do so, we will train a VGG-like network from scratch on the training data, keeping in mind that the network needs to be small and fast enough to run in real time on a CPU.

Human emotions are blended together. When experiencing "surprise" we may also feel "happy" (a surprise birthday party, for example) or "scared" (if the surprise is not a welcome one). Even while feeling "scared", we may sense a hint of "anger".

When working on emotion recognition, it is important not to fixate on a single class label (as we sometimes do in other classification problems). Instead, it is more useful to look at the probability of each emotion and characterize the distribution. As we will see later in this chapter, examining the distribution of emotion probabilities gives us a more accurate gauge of emotion than simply picking the single emotion with the highest probability.

1. The Kaggle Facial Expression Recognition Challenge

The Kaggle Emotion and Facial Expression Recognition challenge training dataset consists of 28,709 images, each a 48×48 grayscale image (Figure). The faces have been automatically aligned so that they are roughly the same size in every image. Given these images, our goal is to classify the emotion expressed on each face into seven distinct classes: angry, disgust, fear, happy, sad, surprise, and neutral.

Figure: Examples of facial expressions from the Kaggle Facial Expression Recognition Challenge. We will train a CNN to recognize and distinguish each of these emotions. The CNN will also be able to run in real time on your CPU, allowing you to recognize emotions in video streams.

1.1 The FER13 Dataset

This facial expression dataset is known as the FER13 dataset and can be found and downloaded from the official Kaggle competition page:

Challenges in Representation Learning: Facial Expression Recognition Challenge | Kaggle: https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data

It can also be downloaded from Baidu Netdisk:

Link: https://pan.baidu.com/s/1ODT8nfO9aGzfKkLrVVWjxg
Extraction code: ei3l

After downloading the dataset, you will find a file named fer2013.csv containing three columns:

emotion: the class label.

pixels: a flattened list of 48×48 = 2304 grayscale pixel values representing the face itself.

usage: whether the image is used for Training, PrivateTest (validation), or PublicTest (testing).

Our goal is now to take this .csv file and convert it into HDF5 format so that we can more easily train a Convolutional Neural Network on it.

FER13 has seven classes in total: angry, disgust, fear, happy, sad, surprise, and neutral. However, there is a severe class imbalance for the "disgust" class, which has only 113 image samples (every other class has over 1,000 images). After doing some research, I came across the Mememoji project, which suggests merging "disgust" into "anger" (since the two emotions are visually similar), turning FER13 into a 6-class problem, as sketched below.
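For reference, FER2013 encodes the emotion column as 0=angry, 1=disgust, 2=fear, 3=happy, 4=sad, 5=surprise, 6=neutral. Below is a minimal sketch of the 7-class to 6-class remapping that build_dataset.py will apply when NUM_CLASSES = 6; the remap_label helper is only for illustration and is not part of the book's code.

# illustrative sketch of the 7-class -> 6-class label remapping used
# later in build_dataset.py: fold "disgust" (1) into "angry" (0), then
# shift the remaining labels down so they stay sequential
def remap_label(label, num_classes=6):
    if num_classes == 6:
        # merge "disgust" into "angry"
        if label == 1:
            label = 0

        # keep the labels sequential after the merge
        if label > 0:
            label -= 1

    return label

# original order: 0=angry, 1=disgust, 2=fear, 3=happy, 4=sad, 5=surprise, 6=neutral
print([remap_label(l) for l in range(7)])   # [0, 0, 1, 2, 3, 4, 5]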

1.2 Building the FER13 Dataset

Create a file named emotion_config.py. This is where we store configuration variables, including the path to the input dataset, the output HDF5 files, the batch size, and so on.

# import the necessary packages
from os import path

# define the base path to the emotion dataset
BASE_PATH = "D:/Project/ml_toolset/emotion_recognition/raid/datasets/"
BASE_PATH1 = "D:/Project/ml_toolset/emotion_recognition/"

# use the base path to define the path to the input emotions file
INPUT_PATH = path.sep.join([BASE_PATH, "fer2013/fer2013.csv"])

# define the number of classes (set to 6 if you are ignoring the
# "disgust" class)
# NUM_CLASSES = 7
NUM_CLASSES = 6

# define the path to the output training, validation, and testing
# HDF5 files
TRAIN_HDF5 = path.sep.join([BASE_PATH1, "hdf5/train.hdf5"])
VAL_HDF5 = path.sep.join([BASE_PATH1, "hdf5/val.hdf5"])
TEST_HDF5 = path.sep.join([BASE_PATH1, "hdf5/test.hdf5"])

# define the batch size
BATCH_SIZE = 128

# define the path to where output logs will be stored
OUTPUT_PATH = path.sep.join([BASE_PATH1, "output"])

CHECKPOINTS_PATH = path.sep.join([BASE_PATH1, "checkpoints"])

MODEL_PATH = path.sep.join([BASE_PATH1, "model"])

Create a file named build_dataset.py. It is responsible for ingesting the fer2013.csv dataset file and outputting a set of HDF5 files, one each for the training, validation, and testing splits.

# import the necessary packages
import emotion_config as config
from customize.tools.hdf5DatasetWriter import HDF5DatasetWriter
import numpy as np

# open the input file for reading (skipping the header), then
# initialize the list of data and labels for the training,
# validation, and testing sets
print("[INFO] loading input data...")
f = open(config.INPUT_PATH)
f.__next__() # f.next() for Python 2.7
(trainImages, trainLabels) = ([], [])
(valImages, valLabels) = ([], [])
(testImages, testLabels) = ([], [])

# loop over the rows in the input file
for row in f:
    # extract the label, image, and usage from the row
    (label, image, usage) = row.strip().split(",")
    label = int(label)

    # if we are ignoring the "disgust" class there will be 6 total
    # class labels instead of 7
    if config.NUM_CLASSES == 6:
        # merge together the "anger" and "disgust" classes
        if label == 1:
            label = 0

        # if label has a value greater than zero, subtract one from
        # it to make all labels sequential (not required, but helps
        # when interpreting results)
        if label > 0:
            label -= 1

    # reshape the flattened pixel list into a 48x48 (grayscale)
    # image
    image = np.array(image.split(" "), dtype="uint8")
    image = image.reshape((48, 48))

    # check if we are examining a training image
    if usage == "Training":
        trainImages.append(image)
        trainLabels.append(label)
    # check if this is a validation image
    elif usage == "PrivateTest":
        valImages.append(image)
        valLabels.append(label)
    # otherwise, this must be a testing image
    else:
        testImages.append(image)
        testLabels.append(label)

# construct a list pairing the training, validation, and testing
# images along with their corresponding labels and output HDF5
# files
datasets = [
    (trainImages, trainLabels, config.TRAIN_HDF5),
    (valImages, valLabels, config.VAL_HDF5),
    (testImages, testLabels, config.TEST_HDF5)]

# loop over the dataset tuples
for (images, labels, outputPath) in datasets:
    # create HDF5 writer
    print("[INFO] building {}...".format(outputPath))
    writer = HDF5DatasetWriter((len(images), 48, 48), outputPath)

    # loop over the image and add them to the dataset
    for (image, label) in zip(images, labels):
        writer.add([image], [label])

    # close the HDF5 writer
    writer.close()

# close the input file
f.close()
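The HDF5DatasetWriter imported from customize.tools above is a utility class from the book's toolkit and is not reproduced in this post. As a rough idea of the interface the script relies on, here is a minimal sketch built on h5py; the constructor arguments match how it is called above, but the internals (no buffering, fixed dataset names "images" and "labels") are assumptions rather than the book's exact implementation.

# minimal sketch of an HDF5DatasetWriter-style class (assumed interface)
import h5py

class HDF5DatasetWriter:
    def __init__(self, dims, outputPath, dataKey="images"):
        # create the output HDF5 file along with two datasets: one for
        # the images and one for the integer class labels
        self.db = h5py.File(outputPath, "w")
        self.data = self.db.create_dataset(dataKey, dims, dtype="float")
        self.labels = self.db.create_dataset("labels", (dims[0],), dtype="int")
        self.idx = 0  # index of the next free row

    def add(self, rows, labels):
        # store the supplied rows and labels at the current index
        for (row, label) in zip(rows, labels):
            self.data[self.idx] = row
            self.labels[self.idx] = label
            self.idx += 1

    def close(self):
        # close the underlying HDF5 file handle
        self.db.close()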

Once the python build_dataset.py command finishes executing, you can verify that the HDF5 files were generated by checking the contents of the directory where you told emotion_config.py to store them.
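If you prefer to sanity-check the splits programmatically rather than just eyeballing file sizes, a short h5py snippet such as the following will print the number of samples in each file (it assumes the writer stored its datasets under the keys "images" and "labels"):

# quick sanity check of the generated HDF5 splits
import h5py
import emotion_config as config

for path in (config.TRAIN_HDF5, config.VAL_HDF5, config.TEST_HDF5):
    with h5py.File(path, "r") as db:
        print("{}: {} images, {} labels".format(
            path, db["images"].shape[0], db["labels"].shape[0]))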

2. Implementing a VGG-like Network

The network we will implement to recognize various emotions and facial expressions is inspired by the VGG family of networks:

1. The CONV layers in the network will only use 3×3 filters.

2. The number of filters learned by each CONV layer will double the deeper we go in the network.

Table: A summary of the EmotionVGGNet architecture. Each layer lists its output volume size, along with the convolutional filter size/pool size where relevant.

Create a file named emotionvggnet.py.

# import the necessary packages
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import ELU
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Dense
from tensorflow.keras import backend as K

class EmotionVGGNet:
    @staticmethod
    def build(width, height, depth, classes):
        # initialize the model along with the input shape to be
        # "channels last" and the channels dimension itself
        model = Sequential()
        inputShape = (height, width, depth)
        chanDim = -1

        # if we are using "channels first", update the input shape
        # and channels dimension
        if K.image_data_format() == "channels_first":
            inputShape = (depth, height, width)
            chanDim = 1

        # Block #1: first CONV => ELU => CONV => ELU => POOL
        # layer set
        model.add(Conv2D(32, (3, 3), padding="same", kernel_initializer = "he_normal", input_shape = inputShape))
        model.add(ELU())
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(32, (3, 3), kernel_initializer="he_normal", padding = "same"))
        model.add(ELU())
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))

        # Block #2: second CONV => ELU => CONV => ELU => POOL
        # layer set
        model.add(Conv2D(64, (3, 3), kernel_initializer="he_normal", padding = "same"))
        model.add(ELU())
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(64, (3, 3), kernel_initializer="he_normal", padding = "same"))
        model.add(ELU())
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))

        # Block #3: third CONV => ELU => CONV => ELU => POOL
        # layer set
        model.add(Conv2D(128, (3, 3), kernel_initializer="he_normal", padding = "same"))
        model.add(ELU())
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(128, (3, 3), kernel_initializer="he_normal", padding = "same"))
        model.add(ELU())
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))

        # Block #4: first set of FC => ELU layers
        model.add(Flatten())
        model.add(Dense(64, kernel_initializer="he_normal"))
        model.add(ELU())
        model.add(BatchNormalization())
        model.add(Dropout(0.5))

        # Block #5: second set of FC => ELU layers
        model.add(Dense(64, kernel_initializer="he_normal"))
        model.add(ELU())
        model.add(BatchNormalization())
        model.add(Dropout(0.5))

        # Block #6: softmax classifier
        model.add(Dense(classes, kernel_initializer="he_normal"))
        model.add(Activation("softmax"))
        # return the constructed network architecture
        return model
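As a quick check that the architecture assembles correctly for 48×48 single-channel inputs, you can build the model and print its layer summary. This is just an illustrative usage example:

# build EmotionVGGNet for 48x48 grayscale inputs and 6 emotion classes,
# then print the per-layer summary to inspect the output volume sizes
from emotionvggnet import EmotionVGGNet

model = EmotionVGGNet.build(width=48, height=48, depth=1, classes=6)
model.summary()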

3. Training Our Facial Expression Recognizer

Create a file named train_recognizer.py.

# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")

# import the necessary packages
import emotion_config as config
from customize.tools.imagetoarraypreprocessor import ImageToArrayPreprocessor
from customize.tools.epochcheckpoint import EpochCheckpoint
from customize.tools.trainingmonitor import TrainingMonitor
from customize.tools.hdf5datasetgenerator import HDF5DatasetGenerator
from emotionvggnet import EmotionVGGNet
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import load_model
import tensorflow.keras.backend as K
import argparse
import os

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-c", "--checkpoints", required=False, help="path to output checkpoint directory", default=config.CHECKPOINTS_PATH)
ap.add_argument("-m", "--model", type=str, help="path to *specific* model checkpoint to load")#, default=config.MODEL_PATH
ap.add_argument("-s", "--start-epoch", type=int, default=0, help="epoch to restart training at")
args = vars(ap.parse_args())

# construct the training and testing image generators for data
# augmentation, then initialize the image preprocessor
trainAug = ImageDataGenerator(rotation_range=10, zoom_range=0.1, horizontal_flip=True, rescale=1 / 255.0, fill_mode="nearest")
valAug = ImageDataGenerator(rescale=1 / 255.0)
iap = ImageToArrayPreprocessor()

# initialize the training and validation dataset generators
trainGen = HDF5DatasetGenerator(config.TRAIN_HDF5, config.BATCH_SIZE, aug=trainAug, preprocessors=[iap], classes=config.NUM_CLASSES)
valGen = HDF5DatasetGenerator(config.VAL_HDF5, config.BATCH_SIZE, aug=valAug, preprocessors=[iap], classes=config.NUM_CLASSES)

# if there is no specific model checkpoint supplied, then initialize
# the network and compile the model
if args["model"] is None:
    print("[INFO] compiling model...")
    model = EmotionVGGNet.build(width=48, height=48, depth=1, classes=config.NUM_CLASSES)
    opt = Adam(lr=1e-3)
    model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])
# otherwise, load the checkpoint from disk
else:
    print("[INFO] loading {}...".format(args["model"]))
    model = load_model(args["model"])

    # update the learning rate
    print("[INFO] old learning rate: {}".format(K.get_value(model.optimizer.lr)))
    K.set_value(model.optimizer.lr, 1e-3)
    print("[INFO] new learning rate: {}".format(K.get_value(model.optimizer.lr)))

# construct the set of callbacks
figPath = os.path.sep.join([config.OUTPUT_PATH, "vggnet_emotion.png"])
jsonPath = os.path.sep.join([config.OUTPUT_PATH, "vggnet_emotion.json"])
callbacks = [
    EpochCheckpoint(args["checkpoints"], every=5, startAt=args["start_epoch"]),
    TrainingMonitor(figPath, jsonPath=jsonPath, startAt=args["start_epoch"])]


# train the network
model.fit_generator(
    trainGen.generator(),
    steps_per_epoch=trainGen.numImages // config.BATCH_SIZE,
    validation_data=valGen.generator(),
    validation_steps=valGen.numImages // config.BATCH_SIZE,
    epochs=15,
    max_queue_size=config.BATCH_SIZE * 2,
    callbacks=callbacks, verbose=1)

# close the databases
trainGen.close()
valGen.close()
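The EpochCheckpoint and TrainingMonitor callbacks imported from customize.tools are custom utilities from the book's toolkit and are not reproduced here. To give a rough idea of what EpochCheckpoint does, below is a minimal sketch; the constructor signature matches how it is called above, but the body is an assumption rather than the book's exact code.

# minimal sketch of an EpochCheckpoint-style callback: serialize the
# model to disk every `every` epochs so training can be stopped and
# resumed later via the --model/--start-epoch switches
import os
from tensorflow.keras.callbacks import Callback

class EpochCheckpoint(Callback):
    def __init__(self, outputPath, every=5, startAt=0):
        super(EpochCheckpoint, self).__init__()
        self.outputPath = outputPath
        self.every = every
        self.intEpoch = startAt  # epoch counter that survives restarts

    def on_epoch_end(self, epoch, logs=None):
        # checkpoint the model every `every` epochs
        if (self.intEpoch + 1) % self.every == 0:
            p = os.path.sep.join([self.outputPath,
                "epoch_{}.hdf5".format(self.intEpoch + 1)])
            self.model.save(p, overwrite=True)

        # increment the internal epoch counter
        self.intEpoch += 1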

(1) Use ELU activations instead of ReLU.

(2) Merge "anger" and "disgust" into a single label: set NUM_CLASSES = 6 in emotion_config.py and re-run build_dataset.py to rebuild the HDF5 files.

(3) Use the Adam optimizer with a base learning rate of 1e-3.

When we inspect the output at epoch 75, we now see that EmotionVGGNet reaches 68.51% accuracy. (The training script runs 15 epochs at a time; reaching epoch 75 involves stopping and restarting training from saved checkpoints via the --model and --start-epoch switches, typically lowering the learning rate along the way.)

Figure: Our final experiment training EmotionVGGNet on FER2013. Here, by merging the "anger" and "disgust" classes into a single label, we reach 68.51% accuracy.

4. Evaluating Our Facial Expression Recognizer

Create a file named test_recognizer.py.

# import the necessary packages
import emotion_config as config
from customize.tools.imagetoarraypreprocessor import ImageToArrayPreprocessor
from customize.tools.hdf5datasetgenerator import HDF5DatasetGenerator
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import load_model
import argparse

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-m", "--model", type=str, help="path to model checkpoint to load")
args = vars(ap.parse_args())

# initialize the testing data generator and image preprocessor
testAug = ImageDataGenerator(rescale=1 / 255.0)
iap = ImageToArrayPreprocessor()

# initialize the testing dataset generator
testGen = HDF5DatasetGenerator(config.TEST_HDF5, config.BATCH_SIZE, aug=testAug, preprocessors=[iap], classes=config.NUM_CLASSES)

# load the model from disk
print("[INFO] loading {}...".format(args["model"]))
model = load_model(args["model"])

# evaluate the network
(loss, acc) = model.evaluate_generator(
    testGen.generator(),
    steps=testGen.numImages // config.BATCH_SIZE,
    max_queue_size=config.BATCH_SIZE * 2)
print("[INFO] accuracy: {:.2f}".format(acc * 100))

# close the testing database
testGen.close()

To evaluate EmotionVGGNet on FER2013, open a terminal and run test_recognizer.py, supplying the path to a saved model checkpoint via the --model switch.

As my results show, we are able to obtain 66.96% accuracy on the test set. Note that this 66.96% classification result is on the 6-class variant of FER2013 rather than the original 7-class version used in the Kaggle recognition challenge, but we could easily retrain the network on the 7-class version and obtain similar accuracy.

5. Real-Time Emotion Detection

Create a file named emotion_detector.py.

The --cascade switch is the path to the Haar cascade used for face detection.

# import the necessary packages
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.models import load_model
import numpy as np
import argparse
import imutils
import cv2

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-c", "--cascade", required=True, help="path to where the face cascade resides")
ap.add_argument("-m", "--model", required=True, help="path to pre-trained emotion detector CNN")
ap.add_argument("-v", "--video", help="path to the (optional) video file")
args = vars(ap.parse_args())

# load the face detector cascade, emotion detection CNN, then define
# the list of emotion labels
detector = cv2.CascadeClassifier(args["cascade"])
model = load_model(args["model"])
EMOTIONS = ["angry", "scared", "happy", "sad", "surprised", "neutral"]

# if a video path was not supplied, grab a reference to the webcam
# (camera index 1 is used here; use 0 for the default built-in webcam)
if not args.get("video", False):
    camera = cv2.VideoCapture(1)

# otherwise, load the video
else:
    camera = cv2.VideoCapture(args["video"])

# keep looping
while True:
    # grab the current frame
    (grabbed, frame) = camera.read()

    # if we are viewing a video and we did not grab a
    # frame, then we have reached the end of the video
    if args.get("video") and not grabbed:
        break

    # resize the frame and convert it to grayscale
    frame = imutils.resize(frame, width=300)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # initialize the canvas for the visualization, then clone
    # the frame so we can draw on it
    canvas = np.zeros((220, 300, 3), dtype="uint8")
    frameClone = frame.copy()
    # detect faces in the input frame, then clone the frame so that
    # we can draw on it
    rects = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30), flags=cv2.CASCADE_SCALE_IMAGE)

    # ensure at least one face was found before continuing
    if len(rects) > 0:
        # determine the largest face area (detectMultiScale returns
        # boxes as (x, y, w, h), so the area is w * h)
        rect = sorted(rects, reverse=True, key=lambda x: x[2] * x[3])[0]
        (fX, fY, fW, fH) = rect
        # extract the face ROI from the image, then pre-process
        # it for the network
        roi = gray[fY:fY + fH, fX:fX + fW]
        roi = cv2.resize(roi, (48, 48))
        roi = roi.astype("float") / 255.0
        roi = img_to_array(roi)
        roi = np.expand_dims(roi, axis=0)

        # make a prediction on the ROI, then lookup the class
        # label
        preds = model.predict(roi)[0]
        label = EMOTIONS[preds.argmax()]
        # loop over the labels + probabilities and draw them
        for (i, (emotion, prob)) in enumerate(zip(EMOTIONS, preds)):
            # construct the label text
            text = "{}: {:.2f}%".format(emotion, prob * 100)
            # draw the label + probability bar on the canvas
            w = int(prob * 300)
            cv2.rectangle(canvas, (5, (i * 35) + 5), (w, (i * 35) + 35), (0, 0, 255), -1)
            cv2.putText(canvas, text, (10, (i * 35) + 23), cv2.FONT_HERSHEY_SIMPLEX, 0.45, (255, 255, 255), 2)

        # draw the label on the frame
        cv2.putText(frameClone, label, (fX, fY - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 0, 255), 2)
        cv2.rectangle(frameClone, (fX, fY), (fX + fW, fY + fH), (0, 0, 255), 2)

    # show our classifications + probabilities
    cv2.imshow("Face", frameClone)
    cv2.imshow("Probabilities", canvas)

    # if the 'q' key is pressed, stop the loop
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

# cleanup the camera and close any open windows
camera.release()
cv2.destroyAllWindows()
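Since the chapter stresses examining the full probability distribution rather than just the argmax, a small helper like the one below can be useful for logging the most likely emotions per face. It is a hypothetical addition for illustration, not part of the book's script:

# hypothetical helper: report the two most likely emotions for a face,
# illustrating why looking at the distribution beats a single argmax
import numpy as np

def top_two_emotions(preds, emotions):
    # indices of the two highest probabilities, most likely first
    idxs = np.argsort(preds)[::-1][:2]
    return [(emotions[i], float(preds[i])) for i in idxs]

# example: a face that is mostly "surprised" with a hint of "happy"
preds = np.array([0.05, 0.02, 0.30, 0.03, 0.55, 0.05])
EMOTIONS = ["angry", "scared", "happy", "sad", "surprised", "neutral"]
print(top_two_emotions(preds, EMOTIONS))   # [('surprised', 0.55), ('happy', 0.3)]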

6. Summary

In this chapter, we learned how to implement a Convolutional Neural Network capable of predicting emotions and facial expressions. We trained a VGG-like CNN named EmotionVGGNet. The network consists of blocks of two stacked CONV layers, with the number of filters doubling in each block. It was important that our CNN be:

1. Deep enough to obtain high accuracy.

2. But not so deep that it cannot run in real time on a CPU.

We then trained our CNN on the FER2013 dataset, part of the Kaggle Emotion and Facial Expression Recognition challenge. Overall, we were able to obtain 66.96% accuracy; even higher accuracy could likely be obtained by applying more aggressive data augmentation, deepening the network, adding more layers, and adding regularization.

Finally, we created a Python script that can (1) detect faces in a video stream and (2) apply our pre-trained CNN to recognize the dominant facial expression in real time. We also displayed the probability of every emotion, making it easier to interpret the network's output.

Furthermore, as humans, our emotions are always some mixture of feelings. Therefore, when attempting to label a given person's facial expression, it is important to examine the probability distribution returned by EmotionVGGNet.

Source: https://blog.csdn.net/bashendixie5/article/details/122181227