CS231n_2020 Assignment Implementation 1.3: Softmax
CS231n_2020 Assignment 1
- 准备工作
- k-Nearest Neighbor (kNN)
- Support Vector Machine (SVM)
- Softmax
- Two-Layer Neural Network
- Image Features
CS231n is a well-known Stanford computer science course on convolutional neural networks for visual recognition; its slides and assignments are updated every year. This series records the author's study and implementation of the three assignments of the 2020 offering. It is provided for reference only; discussion, criticism, and corrections are welcome.
CS231n_2020 official website
The theme of Assignment 1 is Image Classification (see the Assignment 1 page). This post covers the study and implementation of Assignment 1.3.
The series implements the assignment subtasks one by one; the code is available via the GitHub link.
Preparation
Following the course recommendation, all of the author's code runs on Google Colaboratory. Basic usage of the platform can be looked up online (it is similar to Jupyter Notebook).
First read the task requirements on the Assignment 1 page, then download the starter code package for Colab and upload it to Google Drive.
Assignment 1 consists of 5 subtasks, each with its own .ipynb file in the code package, which makes editing, debugging, and inspecting results convenient.
k-Nearest Neighbor (kNN)
CS231n_2020 Assignment Implementation 1.1: k-Nearest Neighbor
Support Vector Machine (SVM)
CS231n_2020 Assignment Implementation 1.2: Support Vector Machine
Softmax
Background
The course provides an online linear classifier demo, which is a good way to build intuition for what Softmax means.
Key concepts:
- Cross-entropy loss:
  $$L_i = -f_{y_i} + \log\sum_j e^{f_j}$$
  where $f(x_i; W) = W x_i$, the same score function as in SVM.
For a more detailed explanation, see the official course notes on linear classifiers and the official course notes on optimization.
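Writing $p_j = \frac{e^{f_j}}{\sum_k e^{f_k}}$ for the softmax probability of class $j$, differentiating $L_i$ gives the gradient that both implementations below compute:
$$\frac{\partial L_i}{\partial f_j} = p_j - \mathbb{1}(j = y_i), \qquad \frac{\partial L_i}{\partial W_{:,j}} = \big(p_j - \mathbb{1}(j = y_i)\big)\, x_i$$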
Completing softmax.py
The softmax_loss_naive function
Compute the softmax loss and its gradient using explicit loops.
Hint: avoid numeric instability.
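A tiny illustration of the instability (not part of the assignment code): subtracting the maximum score before exponentiating, as done in the code below, leaves the softmax probabilities unchanged but keeps the exponentials in a safe range.
import numpy as np

scores = np.array([1000.0, 1001.0, 1002.0])
print(np.exp(scores))                    # overflows to [inf, inf, inf]
print(np.exp(scores - np.max(scores)))   # [0.135, 0.368, 1.0], same softmax after normalizing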
def softmax_loss_naive(W, X, y, reg):
    loss = 0.0
    dW = np.zeros_like(W)
    num_classes = W.shape[1]
    num_train = X.shape[0]
    for i in range(num_train):
        scores = X[i].dot(W)
        # Shift scores so the maximum is 0 to avoid numeric instability.
        scores -= np.max(scores)
        loss += np.log(np.sum(np.exp(scores))) - scores[y[i]]
        dW[:, y[i]] -= X[i]
        for j in range(num_classes):
            dW[:, j] += X[i] * np.exp(scores[j]) / np.sum(np.exp(scores))
    # Average over the training set and add L2 regularization.
    loss /= num_train
    dW /= num_train
    loss += reg * np.sum(W * W)
    dW += 2 * reg * W
    return loss, dW
The computation largely parallels the SVM loss.
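As a sanity check, the analytic gradient can be compared against a numerically estimated one. A sketch along the lines of the notebook's gradient-check cell, assuming the small development split X_dev, y_dev and the grad_check_sparse helper shipped with the assignment code:
import numpy as np
from cs231n.gradient_check import grad_check_sparse
from cs231n.classifiers.softmax import softmax_loss_naive

# Small random weight matrix (3072 pixels + 1 bias, 10 classes).
W = np.random.randn(3073, 10) * 0.0001
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 0.0)

# Numerically estimate the gradient at a few random coordinates and
# compare it with the analytic gradient.
f = lambda w: softmax_loss_naive(w, X_dev, y_dev, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad, 10)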
The softmax_loss_vectorized function
Rewrite the softmax loss and its gradient with vectorized code.
Hint: avoid numeric instability.
def softmax_loss_vectorized(W, X, y, reg):
    loss = 0.0
    dW = np.zeros_like(W)
    num_train = X.shape[0]
    scores = X.dot(W)
    # Shift each row so its maximum is 0 to avoid numeric instability.
    scores -= np.max(scores, axis=1).reshape(-1, 1)
    loss = np.sum(np.log(np.sum(np.exp(scores), axis=1))) - np.sum(scores[np.arange(num_train), y])
    loss = loss / num_train + reg * np.sum(W * W)
    # Softmax probabilities; subtracting 1 at the correct class gives dL/dscores.
    scores_ = np.exp(scores) / np.sum(np.exp(scores), axis=1).reshape(-1, 1)
    scores_[np.arange(num_train), y] -= 1
    dW = X.T.dot(scores_) / num_train + 2 * reg * W
    return loss, dW
This again largely parallels the vectorized SVM loss.
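For reference, the timing and correctness comparison reported in the Results section below comes from a cell roughly like the following (a sketch, assuming the development split X_dev, y_dev):
import time
import numpy as np
from cs231n.classifiers.softmax import softmax_loss_naive, softmax_loss_vectorized

W = np.random.randn(3073, 10) * 0.0001

tic = time.time()
loss_naive, grad_naive = softmax_loss_naive(W, X_dev, y_dev, 0.000005)
print('naive loss: %e computed in %fs' % (loss_naive, time.time() - tic))

tic = time.time()
loss_vec, grad_vec = softmax_loss_vectorized(W, X_dev, y_dev, 0.000005)
print('vectorized loss: %e computed in %fs' % (loss_vec, time.time() - tic))

print('Loss difference: %f' % np.abs(loss_naive - loss_vec))
print('Gradient difference: %f' % np.linalg.norm(grad_naive - grad_vec, ord='fro'))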
Completing Validation
Train and validate with different learning rates and regularization strengths $\lambda$ to find hyperparameters that work well.
from cs231n.classifiers import Softmax
results = {}
best_val = -1
best_softmax = None
learning_rates = np.linspace(0.8e-7, 1.2e-7, 5)
regularization_strengths = np.linspace(2e4, 3e4, 3)
for lr in learning_rates:
    for rs in regularization_strengths:
        softmax = Softmax()
        loss_hist = softmax.train(X_train, y_train, learning_rate=lr, reg=rs,
                                  num_iters=1500, verbose=False)
        y_train_pred = softmax.predict(X_train)
        train_acc = np.mean(y_train == y_train_pred)
        y_val_pred = softmax.predict(X_val)
        val_acc = np.mean(y_val == y_val_pred)
        results[(lr, rs)] = (train_acc, val_acc)
        if val_acc > best_val:
            best_val = val_acc
            best_softmax = softmax
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
        lr, reg, train_accuracy, val_accuracy))
print('best validation accuracy achieved during cross-validation: %f' % best_val)
This largely mirrors the SVM validation procedure.
Inline Questions
Inline Question 1
Why do we expect our loss to be close to -log(0.1)? Explain briefly.
Answer: the softmax loss should be close to $-\log(0.1)$. In $L_i = -\log\left(\frac{e^{f_{y_i}}}{\sum_j e^{f_j}}\right)$, the scores $f$ of the 10 classes differ very little under the small random initial $W$, and when all scores are exactly equal we get $L_i = -\log(0.1)$.
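A quick numeric check of that value (illustration only):
import numpy as np
# 10 classes with a near-uniform softmax distribution => probability ~1/10 each.
print(-np.log(0.1))  # ~2.3026, consistent with the ~2.38 loss reported in Results below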
Inline Question 2
True or False
Suppose the overall training loss is defined as the sum of the per-datapoint loss over all training examples. It is possible to add a new datapoint to a training set that would leave the SVM loss unchanged, but this is not the case with the Softmax classifier loss.
Answer: True.
Explanation: adding a new datapoint can leave the total SVM training loss unchanged, because its hinge loss can be exactly 0; the total Softmax training loss must change, because the cross-entropy loss of any datapoint is strictly greater than 0.
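A toy illustration with made-up scores (not part of the assignment):
import numpy as np

scores = np.array([10.0, 1.0, 1.0])  # the correct class (index 0) wins by a wide margin
correct = 0

# Multiclass hinge loss with margin 1: every margin is satisfied, so the loss is exactly 0,
# and adding this datapoint leaves the total SVM loss unchanged.
hinge = np.sum(np.maximum(0, scores - scores[correct] + 1)) - 1
print(hinge)  # 0.0

# Cross-entropy loss: tiny, but strictly positive, so the total Softmax loss must change.
p = np.exp(scores - np.max(scores))
p /= p.sum()
print(-np.log(p[correct]))  # ~0.00025 > 0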
Results
Runtime comparison of the two loss implementations
naive loss: 2.383276e+00 computed in 0.224043s
vectorized loss: 2.383276e+00 computed in 0.015238s
Loss difference: 0.000000
Gradient difference: 0.000000
The vectorized implementation runs about an order of magnitude faster than the loop-based one, showing the payoff of vectorized code.
Validation over the learning rate and $\lambda$
Initial parameters:
learning_rates = [1e-7, 5e-7]
regularization_strengths = [2.5e4, 5e4]
num_iters=1500
Results:
lr 1.000000e-07 reg 2.500000e+04 train accuracy: 0.333837 val accuracy: 0.348000
lr 1.000000e-07 reg 5.000000e+04 train accuracy: 0.306347 val accuracy: 0.323000
lr 5.000000e-07 reg 2.500000e+04 train accuracy: 0.315959 val accuracy: 0.331000
lr 5.000000e-07 reg 5.000000e+04 train accuracy: 0.296837 val accuracy: 0.316000
best validation accuracy achieved during cross-validation: 0.348000
The hyperparameters were then refined in successive rounds to push the accuracy up:
Round 1 parameters:
learning_rates = np.linspace(0.8e-7, 1.2e-7, 5)
regularization_strengths = np.linspace(2e4, 3e4, 3)
num_iters=1500
Results:
lr 8.000000e-08 reg 2.000000e+04 train accuracy: 0.333980 val accuracy: 0.344000
lr 8.000000e-08 reg 2.500000e+04 train accuracy: 0.325898 val accuracy: 0.346000
lr 8.000000e-08 reg 3.000000e+04 train accuracy: 0.317714 val accuracy: 0.326000
lr 9.000000e-08 reg 2.000000e+04 train accuracy: 0.335449 val accuracy: 0.351000
lr 9.000000e-08 reg 2.500000e+04 train accuracy: 0.327327 val accuracy: 0.347000
lr 9.000000e-08 reg 3.000000e+04 train accuracy: 0.319755 val accuracy: 0.336000
lr 1.000000e-07 reg 2.000000e+04 train accuracy: 0.335286 val accuracy: 0.338000
lr 1.000000e-07 reg 2.500000e+04 train accuracy: 0.331980 val accuracy: 0.351000
lr 1.000000e-07 reg 3.000000e+04 train accuracy: 0.319490 val accuracy: 0.337000
lr 1.100000e-07 reg 2.000000e+04 train accuracy: 0.339408 val accuracy: 0.351000
lr 1.100000e-07 reg 2.500000e+04 train accuracy: 0.329388 val accuracy: 0.351000
lr 1.100000e-07 reg 3.000000e+04 train accuracy: 0.318633 val accuracy: 0.330000
lr 1.200000e-07 reg 2.000000e+04 train accuracy: 0.340551 val accuracy: 0.358000
lr 1.200000e-07 reg 2.500000e+04 train accuracy: 0.324265 val accuracy: 0.346000
lr 1.200000e-07 reg 3.000000e+04 train accuracy: 0.318918 val accuracy: 0.328000
best validation accuracy achieved during cross-validation: 0.358000
These trials satisfy the assignment's requirement of a validation accuracy above 35%.
Final accuracy
softmax on raw pixels final test set accuracy: 0.352000
Evaluating the Softmax model that performed best on the validation set against the test set gives an accuracy of 35.2%, slightly lower than the SVM model.
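The evaluation cell is roughly the following (a sketch, assuming the test split X_test, y_test prepared at the start of the notebook):
y_test_pred = best_softmax.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print('softmax on raw pixels final test set accuracy: %f' % test_accuracy)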
Weight visualization
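The weight-visualization figure itself is not reproduced here; the notebook cell that generates the per-class weight templates looks roughly like this:
import numpy as np
import matplotlib.pyplot as plt

w = best_softmax.W[:-1, :]          # strip the bias row
w = w.reshape(32, 32, 3, 10)        # one 32x32x3 template per class
w_min, w_max = np.min(w), np.max(w)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in range(10):
    plt.subplot(2, 5, i + 1)
    # Rescale the weights into the displayable 0..255 range.
    wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])
plt.show()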
Two-Layer Neural Network
Image Features