
CS231N Course Study Notes (assignment1)

Author: Internet (repost)

1. Image classification

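The core idea is a data-driven approach: split the data into train_data, val_data and test_data. Train models with different hyperparameters on the training set, evaluate them on the validation set, and then apply the hyperparameters that performed best on the validation set to the test set.
Key terms: image classifier, data-driven approach.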
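A minimal sketch of the split-then-select workflow described above, assuming the data are NumPy arrays. `split_train_val_test` and `select_hyperparameter` are hypothetical helpers written for illustration; they are not part of the assignment code.

```python
import numpy as np

def split_train_val_test(X, y, num_val, num_test, seed=0):
    """Shuffle once, then carve off disjoint test, validation and training subsets (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(X.shape[0])
    test_idx = idx[:num_test]
    val_idx = idx[num_test:num_test + num_val]
    train_idx = idx[num_test + num_val:]
    return (X[train_idx], y[train_idx]), (X[val_idx], y[val_idx]), (X[test_idx], y[test_idx])

def select_hyperparameter(candidates, train_and_score):
    """Return the candidate with the highest validation score (hypothetical helper).

    train_and_score(h) is assumed to train on the training split with hyperparameter h
    and return the accuracy on the validation split; the test split is never touched here.
    """
    scores = {h: train_and_score(h) for h in candidates}
    best = max(scores, key=scores.get)
    return best, scores
```

Only after `best` is chosen would the model be evaluated once on the test split, so the test accuracy stays an unbiased estimate.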

Example 1: kNN (k-nearest neighbors)

The code has five parts: loading the data (CIFAR-10), preprocessing the data, training the model, testing, and cross-validation.

Let's focus on the parts that implement the kNN algorithm:

(1) Computing distances with two loops:

def compute_distances_two_loops(self, X):
    num_test = X.shape[0]
    num_train = self.x_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):
        for j in range(num_train):
            # Euclidean (L2) distance between test sample i and training sample j
            dists[i, j] = np.sqrt(np.sum((X[i, :] - self.x_train[j, :]) ** 2))
    return dists

(2) Computing distances with one loop:

def compute_distances_one_loop(self, X):
    num_test = X.shape[0]
    num_train = self.x_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):
        # Broadcast X[i] against all training rows at once, so only the test loop remains
        dists[i, :] = np.sqrt(np.sum(np.square(X[i, :] - self.x_train), axis=1))
    return dists

(3) Computing distances with matrix operations (no loops):

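def compute_distances_no_loops(self, X):
    # Expand ||x - t||^2 = ||x||^2 - 2 x.t + ||t||^2 and evaluate it for all pairs at once
    x_2 = np.sum(np.square(X), axis=1)                   # (num_test,): squared norm of each test row
    x_train_2 = np.sum(np.square(self.x_train), axis=1)  # (num_train,): squared norm of each training row
    x_xtrain = np.dot(X, self.x_train.T)                 # (num_test, num_train): all pairwise dot products
    sq_dists = x_2.reshape(-1, 1) - 2 * x_xtrain + x_train_2  # broadcasting builds the full matrix
    # Rounding can make tiny squared distances slightly negative; clip before sqrt to avoid nan
    dists = np.sqrt(np.maximum(sq_dists, 0))
    return dists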
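To check that the vectorized expansion really matches the brute-force loop, here is a standalone sketch. `dists_two_loops` and `dists_no_loops` are hypothetical free-function versions of the methods above that take the training matrix as an argument instead of reading `self.x_train`.

```python
import numpy as np

def dists_two_loops(X, X_train):
    """Brute-force pairwise Euclidean distances."""
    dists = np.zeros((X.shape[0], X_train.shape[0]))
    for i in range(X.shape[0]):
        for j in range(X_train.shape[0]):
            dists[i, j] = np.sqrt(np.sum((X[i] - X_train[j]) ** 2))
    return dists

def dists_no_loops(X, X_train):
    """Same distances via ||x||^2 - 2 x.t + ||t||^2 and broadcasting."""
    sq = (np.sum(X ** 2, axis=1).reshape(-1, 1)   # column: one squared norm per test row
          - 2 * X.dot(X_train.T)                  # matrix of dot products
          + np.sum(X_train ** 2, axis=1))         # row: one squared norm per training row
    return np.sqrt(np.maximum(sq, 0))             # clip tiny negative rounding errors

# Usage: a 3-4-5 right triangle gives distance 5
print(dists_no_loops(np.array([[0.0, 0.0]]), np.array([[3.0, 4.0]])))  # [[5.]]
```

The no-loop version replaces num_test × num_train small vector operations with one matrix product, which is why it is much faster on CIFAR-10-sized data.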

(4) Prediction (majority voting)

def predict_labels(self, dists, k=1):
    num_test = dists.shape[0]
    y_pred = np.zeros(num_test)
    for i in range(num_test):
        order_dists = np.argsort(dists[i, :])         # training indices, nearest first
        target_k = self.y_train[order_dists[:k]]      # labels of the k nearest neighbors
        y_pred[i] = np.argmax(np.bincount(target_k))  # majority vote; ties go to the smallest label
    return y_pred

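NumPy functions used in the voting step:

1. np.argsort: returns the indices that would sort an array in ascending order. On a 2-D array, axis=0 sorts down each column; here it is applied to the 1-D row dists[i,:], so it ranks the training samples from nearest to farthest.
2. np.argmax: returns the index of the maximum value (the first such index if there are ties).
3. np.bincount: counts occurrences of each non-negative integer; counts[c] is how many times c appears.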
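The three functions combine into one vote as follows. `majority_vote` is a hypothetical standalone helper that mirrors the body of the loop in `predict_labels` for a single test sample; labels are assumed to be non-negative integers, as `np.bincount` requires.

```python
import numpy as np

def majority_vote(dist_row, y_train, k):
    nearest = np.argsort(dist_row)[:k]   # indices of the k smallest distances
    labels = y_train[nearest]            # their class labels
    counts = np.bincount(labels)         # counts[c] = number of neighbors with label c
    return np.argmax(counts)             # most frequent label; ties go to the smallest label

dist_row = np.array([0.4, 0.1, 0.5, 0.2, 0.3])
y_train = np.array([2, 2, 2, 0, 0])
for k in (1, 3, 4, 5):
    print(k, majority_vote(dist_row, y_train, k))
# 1 2   (nearest neighbor has label 2)
# 3 0   (labels 2, 0, 0)
# 4 0   (labels 2, 0, 0, 2: a tie, broken toward label 0)
# 5 2   (labels 2, 0, 0, 2, 2)
```

The tie at k = 4 shows a side effect of `np.argmax(np.bincount(...))`: ties are always resolved in favor of the smaller class index.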

(5)cross-validation

num_folds = 5
k_choices = [1, 3, 5, 8, 10, 12, 15, 20, 50, 100]
# Split the training data into num_folds roughly equal folds
x_train_folds = np.array_split(x_train, num_folds)
y_train_folds = np.array_split(y_train, num_folds)
k_to_accuracies = {}
for k in k_choices:
    k_to_accuracies.setdefault(k, [])  # one list of per-fold accuracies for each k
classifier = KNearestNeighbor()
for i in range(num_folds):
    # Train on every fold except fold i (new names, so the full x_train/y_train are not overwritten)
    x_train_cv = np.concatenate(x_train_folds[:i] + x_train_folds[i + 1:])
    y_train_cv = np.concatenate(y_train_folds[:i] + y_train_folds[i + 1:])
    classifier.train(x_train_cv, y_train_cv)
    x_val = x_train_folds[i]  # fold i is the validation set
    y_val = y_train_folds[i]
    for k in k_choices:
        y_pred = classifier.predict(x_val, k=k)  # predict labels for the held-out fold
        num_correct = np.sum(y_pred == y_val)
        accuracy = float(num_correct) / len(y_pred)  # accuracy on this fold
        k_to_accuracies[k].append(accuracy)  # record it under this k
# Print out the computed accuracies
for k in sorted(k_to_accuracies):
    for accuracy in k_to_accuracies[k]:
        print('k = %d, accuracy = %f' % (k, accuracy))
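After the loop, k is usually chosen by its mean accuracy across the folds. A minimal sketch, assuming `k_to_accuracies` has the shape built above; `best_k_from_cv` is a hypothetical helper, and breaking ties toward the smaller k is a design choice of this sketch.

```python
import numpy as np

def best_k_from_cv(k_to_accuracies):
    """Return the k with the highest mean cross-validation accuracy, plus all means."""
    means = {k: np.mean(accs) for k, accs in k_to_accuracies.items()}
    # max() keeps the first maximum it sees, so iterating over sorted keys prefers smaller k on ties
    best_k = max(sorted(means), key=lambda k: means[k])
    return best_k, means

# Usage with made-up fold accuracies (illustration only, not real results)
best_k, means = best_k_from_cv({1: [0.20, 0.30], 3: [0.30, 0.30], 5: [0.30, 0.30]})
print(best_k)  # 3
```

The chosen k would then be used to retrain on the full training set before the single evaluation on the test set.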
Dictionary operations

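Here we build a dictionary of accuracies, k_to_accuracies, that maps each k to its list of accuracies. It uses setdefault: whereas get only returns a default value when the key is missing, setdefault also inserts the missing key with that default.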
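The difference between `get` and `setdefault` is easy to see on an empty dictionary. `collections.defaultdict(list)` is shown as a common alternative; it is not what the original code uses. The accuracy value below is made up for illustration.

```python
from collections import defaultdict

k_choices = [1, 3, 5]

acc_get = {}
print(acc_get.get(1))           # None: get() only reads, nothing is inserted
print(len(acc_get))             # 0

acc = {}
for k in k_choices:
    acc.setdefault(k, [])       # inserts k -> [] only if k is missing
acc.setdefault(1, ["ignored"])  # key already present: the existing list is kept (and returned)
acc[1].append(0.27)             # hypothetical accuracy for k = 1

acc_dd = defaultdict(list)      # alternative: a missing key gets [] on first access
acc_dd[3].append(0.30)
```

With `defaultdict`, the explicit `setdefault` loop is unnecessary, but keys only appear once they are accessed.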

Example 2: SVM (support vector machine)

The code has four parts: data loading, data preprocessing, the SVM classifier, and hyperparameter tuning.

Let's focus on the SVM algorithm itself.

with loops

def svm_loss_naive(W, X, y, reg):
  dW = np.zeros(W.shape)  # initialize the gradient as zero
  num_classes = W.shape[1]
  num_train = X.shape[0]
  loss = 0.0
  for i in range(num_train):
    scores = X[i].dot(W)
    correct_class_score = scores[y[i]]  # score of the correct class
    for j in range(num_classes):
      if j == y[i]:
        continue  # skip the correct class
      margin = scores[j] - correct_class_score + 1  # note delta = 1: score - correct score + 1, per the hinge-loss formula
      if margin > 0:
        loss += margin
        dW[:, j] += X[i]     # class j's score increases the loss
        dW[:, y[i]] -= X[i]  # the correct class's score decreases it
  loss /= num_train
  dW /= num_train
  # Add regularization: 0.5 * reg * ||W||^2, whose gradient is reg * W.
  loss += 0.5 * reg * np.sum(W * W)
  dW += reg * W
  return loss, dW
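For reference, the loss and gradient accumulated by the loop can be written in closed form. The notation is assumed here: $s_{ij}$ is the score of class $j$ for sample $i$ (row $i$ of $XW$), $N$ is the number of training samples, $\lambda$ is `reg`, and $\Delta = 1$ is the margin used in the code.

```latex
L = \frac{1}{N}\sum_{i=1}^{N}\sum_{j \neq y_i}\max\left(0,\; s_{ij} - s_{iy_i} + \Delta\right) + \frac{\lambda}{2}\sum_{k,l} W_{kl}^{2}

M_{ij} = \mathbb{1}[j \neq y_i]\;\mathbb{1}[s_{ij} - s_{iy_i} + \Delta > 0], \qquad
G_{ij} = \begin{cases} M_{ij}, & j \neq y_i \\ -\sum_{c} M_{ic}, & j = y_i \end{cases}

\frac{\partial L}{\partial W} = \frac{1}{N} X^{\top} G + \lambda W
```

Each positive margin adds $x_i$ to column $j$ and subtracts it from column $y_i$, which is exactly what the two `dW` updates inside `if margin > 0` do; the matrix $G$ is what the vectorized version builds in `margins`.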

no loops

def svm_loss_vectorized(W, X, y, reg):
  num_train = X.shape[0]
  scores = X.dot(W)                                                      # (N, C)
  correct_class_scores = scores[np.arange(num_train), y].reshape(-1, 1)  # (N, 1)
  margins = scores - correct_class_scores + 1                            # delta = 1
  margins[margins < 0] = 0                                               # hinge: max(0, .)
  margins[np.arange(num_train), y] = 0                                   # the correct class adds no loss
  loss = np.sum(margins) / num_train
  loss += 0.5 * reg * np.sum(W * W)
  margins[margins > 0] = 1                      # 1 wherever a margin is positive
  e_number = np.sum(margins, axis=1)            # number of classes that contribute, per sample
  margins[np.arange(num_train), y] -= e_number  # the correct class gets minus that count
  dW = np.dot(X.T, margins) / num_train
  dW += reg * W
  return loss, dW
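A standard way to validate an analytic gradient like this is a numerical gradient check with centered differences. The sketch below is self-contained: it restates the vectorized loss compactly (same math as above) and adds a hypothetical `numerical_gradient` helper; the random shapes and `reg=0.1` are arbitrary choices for the check.

```python
import numpy as np

def svm_loss_vectorized(W, X, y, reg):
    """Compact restatement of the vectorized SVM loss above (delta = 1, 0.5 * reg * ||W||^2)."""
    n = X.shape[0]
    scores = X.dot(W)
    correct = scores[np.arange(n), y].reshape(-1, 1)
    margins = np.maximum(0, scores - correct + 1)
    margins[np.arange(n), y] = 0
    loss = np.sum(margins) / n + 0.5 * reg * np.sum(W * W)
    G = (margins > 0).astype(float)
    G[np.arange(n), y] = -np.sum(G, axis=1)  # correct class: minus the number of positive margins
    dW = X.T.dot(G) / n + reg * W
    return loss, dW

def numerical_gradient(f, W, h=1e-5):
    """Centered-difference gradient of scalar function f at W (W is restored after each probe)."""
    grad = np.zeros_like(W)
    for ix in np.ndindex(W.shape):
        old = W[ix]
        W[ix] = old + h
        f_plus = f(W)
        W[ix] = old - h
        f_minus = f(W)
        W[ix] = old
        grad[ix] = (f_plus - f_minus) / (2 * h)
    return grad

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 5))
y = rng.integers(0, 3, size=8)
W = rng.normal(size=(5, 3))
loss, dW = svm_loss_vectorized(W, X, y, reg=0.1)
num_dW = numerical_gradient(lambda w: svm_loss_vectorized(w, X, y, 0.1)[0], W)
print(np.max(np.abs(dW - num_dW)))  # should be tiny
```

Because the hinge loss has kinks at margin = 0, an occasional larger mismatch can appear if a margin lies within `h` of zero; that is expected behavior of the check, not necessarily a bug.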

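Some handy NumPy idioms: axis=1 applies an operation along each row (for example, np.sum(m, axis=1) returns one value per row), and reshape(-1, 1) turns a 1-D array into a column vector, where -1 means "infer this dimension".
For a 2-D array y, y.shape[0] is the number of rows (the same as len(y)) and y.shape[1] is the number of columns.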
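A small demonstration of these idioms on a 2×3 array, including the broadcasting pattern (subtracting a per-row column vector) that the vectorized SVM and softmax code rely on:

```python
import numpy as np

y = np.array([[1, 2, 3],
              [4, 5, 6]])           # shape (2, 3): 2 rows, 3 columns
row_sums = np.sum(y, axis=1)        # one value per row: [6, 15]
col_sums = np.sum(y, axis=0)        # one value per column: [5, 7, 9]
col_vec = row_sums.reshape(-1, 1)   # (2,) -> (2, 1); -1 lets NumPy infer the row count
print(y.shape[0], len(y), y.shape[1])  # 2 2 3

# A (2, 1) column broadcasts across the 3 columns, e.g. subtracting each row's maximum
centered = y - np.max(y, axis=1).reshape(-1, 1)
print(centered)                     # [[-2 -1  0]
                                    #  [-2 -1  0]]
```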

Hyperparameter tuning

learning_rates = [1e-7, 5e-5]
regularization_strengths = [2.5e4, 5e4]
results = {}
best_val = -1   # The highest validation accuracy that we have seen so far.
best_svm = None
from cs231n.classifiers import LinearSVM
# zip pairs the two lists element-wise: (1e-7, 2.5e4) and (5e-5, 5e4)
for lr, reg in zip(learning_rates, regularization_strengths):
    svm = LinearSVM()
    svm.train(X_train, y_train, learning_rate=lr, reg=reg, num_iters=1000, verbose=False)
    y_train_pred = svm.predict(X_train)
    train_accuracy = np.mean(y_train == y_train_pred)
    y_val_pred = svm.predict(X_val)
    val_accuracy = np.mean(y_val == y_val_pred)
    results[(lr, reg)] = (train_accuracy, val_accuracy)
    if best_val < val_accuracy:  # keep the model with the best validation accuracy
        best_val = val_accuracy
        best_svm = svm

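Here zip(learning_rates, regularization_strengths) pairs the two lists element-wise, so only two (lr, reg) combinations are tried: (1e-7, 2.5e4) and (5e-5, 5e4). A full grid search, which tries every learning rate with every regularization strength, needs something like itertools.product instead.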
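The difference between element-wise pairing and a full grid is easiest to see by printing both. `itertools.product` is a standard-library function; swapping it in for `zip` in the tuning loop is a suggestion, not what the original code does.

```python
from itertools import product

learning_rates = [1e-7, 5e-5]
regularization_strengths = [2.5e4, 5e4]

paired = list(zip(learning_rates, regularization_strengths))
grid = list(product(learning_rates, regularization_strengths))
print(paired)  # [(1e-07, 25000.0), (5e-05, 50000.0)]
print(grid)    # [(1e-07, 25000.0), (1e-07, 50000.0), (5e-05, 25000.0), (5e-05, 50000.0)]
```

With `for lr, reg in product(learning_rates, regularization_strengths):` the loop trains len(learning_rates) × len(regularization_strengths) models, which grows quickly as more values are added.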

Example 3: Softmax loss

The code has four parts: data loading, data preprocessing, the softmax classifier, and hyperparameter tuning.

Softmax differs from the SVM only in its loss function; the rest of the training pipeline follows the same approach.
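For reference, the loss and gradient that the softmax functions compute can be written as follows, using the same assumed notation as for the SVM ($s_{ij}$ = row $i$, column $j$ of $XW$, $N$ samples, $\lambda$ = `reg`), plus $P$ for the matrix of softmax probabilities and $Y$ for the one-hot label matrix.

```latex
P_{ij} = \frac{e^{s_{ij}}}{\sum_{c} e^{s_{ic}}}, \qquad
L = -\frac{1}{N}\sum_{i=1}^{N}\log P_{iy_i} + \frac{\lambda}{2}\sum_{k,l} W_{kl}^{2}

\frac{\partial L}{\partial s_{ij}} = \frac{1}{N}\left(P_{ij} - \mathbb{1}[j = y_i]\right)
\quad\Rightarrow\quad
\frac{\partial L}{\partial W} = \frac{1}{N} X^{\top}(P - Y) + \lambda W

\frac{e^{s_{ij} - m_i}}{\sum_{c} e^{s_{ic} - m_i}} = P_{ij} \quad \text{for any shift } m_i \text{, e.g. } m_i = \max_c s_{ic}
```

The middle line is the chain-rule step: the gradient with respect to the scores is "probability minus one-hot label" (`dS` in the vectorized code), and multiplying by $X^{\top}$ carries it back to $W$. The last line is why subtracting the row maximum does not change the result.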

with loops

import numpy as np

def softmax_loss_naive(W, X, y, reg):
  # Initialize the loss and gradient to zero.
  loss = 0.0
  dW = np.zeros_like(W)
  num_class = W.shape[1]
  num_train = X.shape[0]
  for i in range(num_train):
    scores = X[i].dot(W)
    shift_scores = scores - np.max(scores)  # shift for numerical stability
    exp_scores = np.exp(shift_scores)
    sum_exp = np.sum(exp_scores)
    loss -= np.log(exp_scores[y[i]] / sum_exp)
    for j in range(num_class):
      softmax_output = exp_scores[j] / sum_exp
      if j == y[i]:
        dW[:, j] += (-1 + softmax_output) * X[i, :]  # backpropagation: dL/ds_j = p_j - 1 for the correct class
      else:
        dW[:, j] += softmax_output * X[i, :]         # dL/ds_j = p_j for the other classes

  loss /= num_train
  loss += 0.5 * reg * np.sum(W * W)
  dW /= num_train
  dW += reg * W
  return loss, dW

no loops

def softmax_loss_vectorized(W, X, y, reg):
  # Initialize the loss and gradient to zero.
  loss = 0.0
  dW = np.zeros_like(W)
  num_class=W.shape[1]
  num_train=X.shape[0]
  scores=X.dot(W)
  shift_scores=scores-np.max(scores,axis=1).reshape(-1,1)#subtract each row's max so exp() cannot overflow (which easily produces nan)
  softmax_out=np.exp(shift_scores)/np.sum(np.exp(shift_scores),axis=1).reshape((-1,1))
  loss = np.sum( -1 * np.log( softmax_out[range(num_train),y] ) )
  loss /= num_train
  loss += 0.5 * reg * np.sum(W * W)
  dS = softmax_out.copy()
  dS[range(num_train), list(y)] += -1
  dW = (X.T).dot(dS)
  dW = dW / num_train + reg * W  
  return loss, dW

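When computing dW, the chain rule from backpropagation is used. I did not fully understand it at first, but it became clear after watching the later lectures. The scores are shifted (each row's maximum is subtracted) because e^x overflows for large x and the result turns into nan; the shift leaves the softmax output unchanged, since the common factor e^(-max) cancels between the numerator and the denominator.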
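The overflow problem and the effect of the shift can be reproduced in a few lines. `stable_softmax` is a hypothetical helper that applies the same row-max shift as `softmax_loss_vectorized`; `np.errstate` only silences the overflow and invalid-value warnings of the deliberately naive version.

```python
import numpy as np

def stable_softmax(scores):
    """Row-wise softmax with the max-shift trick."""
    shifted = scores - np.max(scores, axis=1, keepdims=True)  # each row's max becomes 0
    exp = np.exp(shifted)                                     # all values now lie in (0, 1]
    return exp / np.sum(exp, axis=1, keepdims=True)

big = np.array([[1000.0, 1001.0, 1002.0]])
with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(big) / np.sum(np.exp(big), axis=1, keepdims=True)  # exp overflows: inf / inf
stable = stable_softmax(big)
print(naive)   # [[nan nan nan]]
print(stable)  # same as softmax of [1, 2, 3]: about [[0.0900 0.2447 0.6652]]
```

Shifting by a constant changes nothing mathematically, so scores of [1000, 1001, 1002] and [1, 2, 3] give identical probabilities; only the unshifted computation breaks.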

Source: https://blog.csdn.net/moliaochunfeng/article/details/99693044