CS231N Course Notes (assignment1)
1. Image classification
The core idea is a data-driven approach: split the data into train_data, val_data, and test_data; try different hyperparameters on the training set, evaluate each setting on the validation set, and then apply the hyperparameters that perform best on the validation set to the test set (a minimal sketch of this split is shown below).
Keywords: image classifier, data-driven approach.
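A minimal sketch of the split, with toy sizes and made-up names (X_all/y_all are my own placeholders; in the assignment the CIFAR-10 training batch is split into roughly 49000 training and 1000 validation images, and the test batch is kept separate):

import numpy as np

X_all = np.random.randn(100, 3072)          # stand-in for the raw training images (32*32*3 = 3072 features)
y_all = np.random.randint(0, 10, size=100)  # stand-in for their labels

num_training, num_validation = 90, 10
X_train, y_train = X_all[:num_training], y_all[:num_training]
X_val, y_val = X_all[num_training:num_training + num_validation], y_all[num_training:num_training + num_validation]
# hyperparameters are tuned on (X_val, y_val); only the best setting is evaluated on the test set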
Example 1: kNN (k-nearest neighbors)
The code has five parts: loading the data (CIFAR-10), preprocessing, training the model, testing, and cross-validation.
Let's focus on the parts that implement the kNN algorithm itself:
(1) Computing distances with two loops:
def compute_distances_two_loops(self, X):
    num_test = X.shape[0]
    num_train = self.x_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):
        for j in range(num_train):
            # Euclidean distance between test point i and training point j
            dists[i, j] = np.sqrt(np.sum((X[i, :] - self.x_train[j, :]) ** 2))
    return dists
(2) Computing distances with one loop:
def compute_distances_one_loop(self, X):
    num_test = X.shape[0]
    num_train = self.x_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):
        # Euclidean distances from test point i to every training point, using one loop
        dists[i] = np.sqrt(np.sum(np.square(X[i, :] - self.x_train), axis=1))
    return dists
(3) Computing distances with matrix operations (no loops), using the expansion (a - b)² = a² - 2ab + b²:
def compute_distances_no_loops(self, X):
    num_test = X.shape[0]
    num_train = self.x_train.shape[0]
    dists = np.zeros((num_test, num_train))
    x_2 = np.sum(np.square(X), axis=1)                   # squared norms of the test points
    x_train_2 = np.sum(np.square(self.x_train), axis=1)  # squared norms of the training points
    x_xtrain = np.dot(X, self.x_train.T)                 # cross term X . x_train^T
    dists = np.sqrt(x_2.reshape(-1, 1) - 2 * x_xtrain + x_train_2)  # broadcast the expansion
    return dists
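A small self-contained check (not assignment code; the array names here are made up) that the no-loop expansion reproduces the looped distances:

import numpy as np

X_tr = np.random.randn(5, 4)   # stand-in for self.x_train
X_te = np.random.randn(3, 4)   # stand-in for the test batch X

loop = np.array([[np.sqrt(np.sum((t - tr) ** 2)) for tr in X_tr] for t in X_te])
vec = np.sqrt(np.sum(X_te ** 2, axis=1).reshape(-1, 1)
              - 2 * X_te.dot(X_tr.T)
              + np.sum(X_tr ** 2, axis=1))
print(np.allclose(loop, vec))  # True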
(4) Prediction (majority voting):
def predict_labels(self, dists, k=1):
    num_test = dists.shape[0]
    y_pred = np.zeros(num_test)
    for i in range(num_test):
        order_dists = np.argsort(dists[i, :])         # indices sorted by increasing distance
        target_k = self.y_train[order_dists[:k]]      # labels of the k nearest neighbors
        y_pred[i] = np.argmax(np.bincount(target_k))  # majority vote
    return y_pred
NumPy functions used in the voting step (see the short example after this list):
1. np.argsort — returns the indices that would sort an array (with axis=0 it sorts down the columns)
2. np.argmax — returns the index of the largest value in an array
3. np.bincount — counts the occurrences of each non-negative integer
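A toy illustration (not assignment code; the values are made up) of how these three functions combine into a vote:

import numpy as np

dists_row = np.array([0.9, 0.1, 0.5, 0.3, 0.7])  # distances from one test point
y_train = np.array([2, 0, 0, 1, 2])              # labels of the training points
k = 3

nearest = np.argsort(dists_row)[:k]      # indices of the k smallest distances -> [1, 3, 2]
votes = np.bincount(y_train[nearest])    # count label occurrences -> [2, 1]
print(np.argmax(votes))                  # predicted label: 0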
(5) Cross-validation
num_folds = 5
k_choices = [1, 3, 5, 8, 10, 12, 15, 20, 50, 100]
x_train_folds = []
y_train_folds = []
y_train = y_train.reshape(-1, 1)
x_train_folds = np.array_split(x_train, num_folds)
y_train_folds = np.array_split(y_train, num_folds)
k_to_accuracies = {}
for k in k_choices:
    k_to_accuracies.setdefault(k, [])  # create an empty accuracy list for each k
classifier = KNearestNeighbor()
for i in range(num_folds):
    # training set: all folds except fold i, which is held out for validation
    x_train = np.vstack(x_train_folds[0:i] + x_train_folds[i + 1:])
    y_train = np.vstack(y_train_folds[0:i] + y_train_folds[i + 1:])
    y_train = y_train[:, 0]
    classifier.train(x_train, y_train)
    for k in k_choices:
        x_pred = x_train_folds[i]                 # validation fold
        y_pred = classifier.predict(x_pred, k=k)  # predicted labels
        num_correct = np.sum(y_pred == y_train_folds[i][:, 0])
        accuracy = float(num_correct) / len(y_pred)            # accuracy on this fold
        k_to_accuracies[k] = k_to_accuracies[k] + [accuracy]   # store the accuracy for this k
# Print out the computed accuracies
for k in sorted(k_to_accuracies):
    for accuracy in k_to_accuracies[k]:
        print('k = %d, accuracy = %f' % (k, accuracy))
Dictionary operations
An accuracy dictionary, k_to_accuracies, is built here, mapping each k to its list of accuracies. The setdefault function is used: unlike get, it inserts the queried key (with a default value) into the dictionary when the key is not yet present.
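A tiny illustration (not assignment code) of the difference between setdefault and get:

k_to_accuracies = {}
k_to_accuracies.setdefault(5, [])  # key 5 is created with the default value []
k_to_accuracies[5].append(0.28)
print(k_to_accuracies.get(10))     # get only reads: returns None, nothing is inserted
print(k_to_accuracies)             # {5: [0.28]}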
Example 2: SVM (support vector machine)
The code is split into: data loading, data preprocessing, the SVM classifier, and hyperparameter tuning.
Let's focus on the SVM loss itself.
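For reference, the multiclass SVM (hinge) loss from the course for one sample i, with scores s = X[i].dot(W), is

    L_i = sum over j != y_i of max(0, s_j - s_{y_i} + delta),  with delta = 1

and the total loss averages L_i over the training set and adds the regularization term 0.5 * reg * sum(W * W), which is what both versions of the code below compute.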
with loops
def svm_loss_naive(W, X, y, reg):
    dW = np.zeros(W.shape)  # initialize the gradient as zero
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)
        correct_class_score = scores[y[i]]  # score of the correct class
        for j in range(num_classes):
            if j == y[i]:
                # skip the correct class
                continue
            margin = scores[j] - correct_class_score + 1  # note delta = 1, as in the formula above
            if margin > 0:
                loss += margin
                dW[:, j] += X[i].T
                dW[:, y[i]] -= X[i].T
    loss /= num_train
    dW /= num_train
    # Add regularization to the loss.
    loss += 0.5 * reg * np.sum(W * W)
    dW += reg * W
    return loss, dW
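A minimal numeric gradient check sketch for one entry of W (my own version using centered differences, not the assignment's grad_check_sparse utility; the shapes and data are made up for illustration):

import numpy as np

def check_one_entry(W, X, y, reg, i, j, h=1e-5):
    _, dW = svm_loss_naive(W, X, y, reg)
    W1 = W.copy(); W1[i, j] += h
    W2 = W.copy(); W2[i, j] -= h
    numeric = (svm_loss_naive(W1, X, y, reg)[0] - svm_loss_naive(W2, X, y, reg)[0]) / (2 * h)
    print('analytic %f, numeric %f' % (dW[i, j], numeric))

# small random data: W is (D, C), X is (N, D), y holds labels in [0, C)
W = np.random.randn(10, 3) * 0.01
X = np.random.randn(5, 10)
y = np.random.randint(0, 3, size=5)
check_one_entry(W, X, y, reg=0.1, i=2, j=1)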
no loops
def svm_loss_vectorized(W, X, y, reg):
    loss = 0.0
    dW = np.zeros(W.shape)  # initialize the gradient as zero
    num_classes = W.shape[1]
    num_train = X.shape[0]
    scores = X.dot(W)
    correct_class_scores = scores[np.arange(num_train), list(y)].reshape(-1, 1)
    margins = scores - correct_class_scores + 1
    margins[margins < 0] = 0
    margins[np.arange(num_train), y] = 0  # the correct class contributes no margin
    loss = np.sum(margins) / num_train
    loss += 0.5 * reg * np.sum(W * W)
    margins[margins > 0] = 1
    e_number = np.sum(margins, axis=1)    # number of classes with a positive margin per sample
    margins[np.arange(num_train), y] -= e_number
    dW = np.dot(X.T, margins) / num_train
    dW += reg * W
    return loss, dW
Some useful NumPy idioms: axis=1 operates along each row; reshape(-1, 1) turns a 1-D array into a single column; for an array y, y.shape[0] is the number of rows (essentially the same as len(y)) and y.shape[1] is the number of columns.
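A quick illustration (not assignment code) of these idioms:

import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
print(np.sum(a, axis=1))                   # [ 6 15]  -> one value per row
print(np.array([1, 2, 3]).reshape(-1, 1))  # column vector of shape (3, 1)
print(a.shape[0], a.shape[1], len(a))      # 2 3 2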
Hyperparameter tuning
learning_rates = [1e-7, 5e-5]
regularization_strengths = [2.5e4, 5e4]
results = {}
best_val = -1  # The highest validation accuracy that we have seen so far.
best_svm = None
from cs231n.classifiers import LinearSVM
for lr, reg in zip(learning_rates, regularization_strengths):
    svm = LinearSVM()
    svm.train(X_train, y_train, learning_rate=lr, reg=reg, num_iters=1000, verbose=False)
    y_train_pred = svm.predict(X_train)
    train_accuracy = np.mean(y_train == y_train_pred)
    y_val_pred = svm.predict(X_val)
    val_accuracy = np.mean(y_val == y_val_pred)
    results[(lr, reg)] = (train_accuracy, val_accuracy)
    if best_val < val_accuracy:
        best_val = val_accuracy
        best_svm = svm
Here zip(learning_rates, regularization_strengths) pairs the two lists element-wise, which gives a few (lr, reg) combinations to try. Note that zip does not produce the full grid of combinations; a sketch of a full grid search is shown below.
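If the full Cartesian product of learning rates and regularization strengths is wanted, one option is itertools.product (a sketch, reusing the lists defined above):

from itertools import product

for lr, reg in product(learning_rates, regularization_strengths):
    # train and evaluate exactly as in the loop above, for every (lr, reg) pair
    ...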
Example 3: Softmax loss
The code is split into data loading, data preprocessing, the softmax classifier, and hyperparameter tuning.
The softmax classifier differs from the SVM only in its loss function; the rest of the training pipeline follows the same basic ideas.
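For reference, the softmax (cross-entropy) loss for one sample i, with scores s = X[i].dot(W), is

    L_i = -log( e^{s_{y_i}} / sum over j of e^{s_j} )

averaged over the training set and combined with the same 0.5 * reg * sum(W * W) regularization term as the SVM; this is what the two versions below compute.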
with loops
import numpy as np

def softmax_loss_naive(W, X, y, reg):
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)
    num_class = W.shape[1]
    num_train = X.shape[0]
    for i in range(num_train):
        scores = X[i].dot(W)
        shift_scores = scores - np.max(scores)  # shift by the max for numerical stability
        loss -= np.log(np.exp(shift_scores[y[i]]) / np.sum(np.exp(shift_scores)))
        for j in range(num_class):
            softmax_output = np.exp(shift_scores[j]) / np.sum(np.exp(shift_scores))
            if j == y[i]:
                dW[:, j] += (-1 + softmax_output) * X[i, :]  # gradient from backpropagation
            else:
                dW[:, j] += softmax_output * X[i, :]
    loss /= num_train
    loss += 0.5 * reg * np.sum(W * W)
    dW /= num_train
    dW += reg * W
    return loss, dW
no loops
def softmax_loss_vectorized(W, X, y, reg):
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)
    num_class = W.shape[1]
    num_train = X.shape[0]
    scores = X.dot(W)
    # shift each row by its max to keep e^x from overflowing (which would give nan)
    shift_scores = scores - np.max(scores, axis=1).reshape(-1, 1)
    softmax_out = np.exp(shift_scores) / np.sum(np.exp(shift_scores), axis=1).reshape((-1, 1))
    loss = np.sum(-1 * np.log(softmax_out[range(num_train), y]))
    loss /= num_train
    loss += 0.5 * reg * np.sum(W * W)
    dS = softmax_out.copy()
    dS[range(num_train), list(y)] += -1
    dW = (X.T).dot(dS)
    dW = dW / num_train + reg * W
    return loss, dW
Computing dW here uses the chain rule from backpropagation; I did not fully understand it at first, but it became clear after watching more of the course. The scores are shifted (by subtracting the row maximum) to keep e^x from becoming too large, which would otherwise turn the loss into nan.
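A small demonstration (not assignment code; the score values are made up) of why the shift matters:

import numpy as np

scores = np.array([123.0, 456.0, 789.0])
print(np.exp(scores) / np.sum(np.exp(scores)))  # exp(789) overflows -> [0. 0. nan]
shift = scores - np.max(scores)
print(np.exp(shift) / np.sum(np.exp(shift)))    # stable -> approximately [0. 0. 1.]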