
Global Top-University Course Assignments Series (8) -- Stanford CS231n Computer Vision and Deep Learning: TensorFlow in Practice

Author: reposted from the internet

Original article: https://blog.csdn.net/yaoqiang2011/article/details/79278930

Original assignment: CS231n Assignment 1
Solutions and write-up: @邓妍蕾 && @郭承坤 && @寒小阳
Date: February 2018.
Source: http://blog.csdn.net/han_xiaoyang/article/details/79278930

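In the previous assignments you wrote a lot of code implementing many neural network features. Dropout, Batch Norm, and 2D convolution are some of the workhorses of deep learning in computer vision. You have also worked hard to make your code efficient and vectorized.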

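For the last part of this assignment, we will leave behind the code you have been writing and switch to one of two popular deep learning frameworks. This notebook focuses on TensorFlow (the other notebook covers PyTorch).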

What is TensorFlow?

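TensorFlow is a system for executing computational graphs over Tensor objects, with native support for automatic differentiation (backpropagation) of its Variables. In TensorFlow we work with Tensors, n-dimensional arrays analogous to the numpy ndarray.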
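To make "automatic differentiation" concrete, here is a minimal NumPy sketch (not TensorFlow code) of the gradient a framework would derive for you from a tiny graph: a linear layer followed by a squared-error loss. The function names are hypothetical. The backward pass is written by hand and checked against central finite differences, which is exactly the work TensorFlow spares you.

```python
import numpy as np

def forward(X, W, Y):
    """Tiny graph: scores = X @ W, loss = 0.5 * sum((scores - Y)**2)."""
    scores = X @ W
    return 0.5 * np.sum((scores - Y) ** 2)

def backward(X, W, Y):
    """Hand-derived gradient of the loss with respect to W."""
    return X.T @ (X @ W - Y)

def numeric_grad(f, W, h=1e-5):
    """Central finite differences, perturbing one entry of W at a time."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        old = W[idx]
        W[idx] = old + h
        f_plus = f(W)
        W[idx] = old - h
        f_minus = f(W)
        W[idx] = old  # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
W = rng.standard_normal((3, 2))
Y = rng.standard_normal((4, 2))

analytic = backward(X, W, Y)
numeric = numeric_grad(lambda w: forward(X, w, Y), W)
print(np.max(np.abs(analytic - numeric)))  # tiny: only rounding error remains
```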

Why TensorFlow?

How will I learn TensorFlow?

TensorFlow has many excellent tutorials available, including those from Google itself.

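In addition, this notebook walks you through much of what you need to train models in TensorFlow. If you want to learn more or need more detail, see the end of this notebook, where you will find links to some useful tutorials.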

Load Datasets

import tensorflow as tf
import numpy as np
import math
import timeit
import matplotlib.pyplot as plt
%matplotlib inline
from cs231n.data_utils import load_CIFAR10

def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=10000):
    """
    Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
    it for the two-layer neural net classifier. These are the same steps as
    we used for the SVM, but condensed to a single function.
    """
    # Load the raw CIFAR-10 data
    cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

    # Subsample the data
    mask = list(range(num_training, num_training + num_validation))
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = list(range(num_training))
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = list(range(num_test))
    X_test = X_test[mask]
    y_test = y_test[mask]

    # Normalize the data: subtract the mean image
    mean_image = np.mean(X_train, axis=0)
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image

    return X_train, y_train, X_val, y_val, X_test, y_test

# Invoke the above function to get our data.
X_train, y_train, X_val, y_val, X_test, y_test = get_CIFAR10_data()
print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)

Train data shape:  (49000, 32, 32, 3)
Train labels shape:  (49000,)
Validation data shape:  (1000, 32, 32, 3)
Validation labels shape:  (1000,)
Test data shape:  (10000, 32, 32, 3)
Test labels shape:  (10000,)

Example Model

Some useful tips

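Our image data is formatted as N x H x W x C, where N is the number of images, H and W are the height and width of each image in pixels, and C is the number of channels (3 for RGB).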

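This is the right way to represent the data for operations like 2D convolution, which need to know which pixels are spatially adjacent. When we feed image data into fully connected affine layers, however, we want each example to be represented by a single vector, so separating the data into channels, rows, and columns is no longer useful.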
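A minimal NumPy sketch of that flattening step, assuming a random batch shaped like CIFAR-10 (N x 32 x 32 x 3): each image becomes one row of length H*W*C = 3072, ready for an affine layer. The TensorFlow model uses the same idea in `tf.reshape(h1, [-1, 5408])`, where `-1` likewise lets the batch dimension be inferred.

```python
import numpy as np

N, H, W, C = 5, 32, 32, 3
images = np.random.rand(N, H, W, C).astype(np.float32)  # N x H x W x C batch

# Collapse H, W and C into one axis; -1 lets NumPy infer the batch size N.
flat = images.reshape(-1, H * W * C)
print(flat.shape)  # (5, 3072)

# Row i is simply image i read in row-major (H, W, C) order.
print(np.array_equal(flat[2], images[2].ravel()))  # True
```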

The example model itself

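The first step to training your own model is defining its architecture.
Here is an example of a convolutional neural network defined in TensorFlow. Try to understand what each line is doing, remembering that each line builds on the previous one. We haven't trained anything yet (that comes next); for now, we want you to understand how everything gets set up.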

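In this example you will see 2D convolutional layers, ReLU activation layers, and fully connected (linear) layers. You will also see how the hinge loss function and the Adam optimizer are used.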
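The Adam optimizer used in the model below (`tf.train.AdamOptimizer(5e-4)`) keeps running averages of each parameter's gradient and squared gradient. Here is a minimal NumPy sketch of the update from the Adam paper, assuming TensorFlow 1.x's default hyperparameters (beta1=0.9, beta2=0.999, epsilon=1e-8). TensorFlow's own implementation folds the bias correction into the learning rate, so its epsilon enters slightly differently. The function name `adam_step` is hypothetical.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count used for bias correction."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (per-parameter scale)
    m_hat = m / (1 - beta1 ** t)              # bias-corrected estimates
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimize f(w) = (w - 3)^2 starting from w = 0.
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 501):
    grad = 2 * (w - 3)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.1)
print(w)  # close to 3
```

The early steps have size close to `lr` regardless of the gradient's scale, which is one reason Adam tends to be forgiving about the choice of learning rate.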

Make sure you understand why the parameters of the linear layer are 5408 and 10.
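The 10 is the number of CIFAR-10 classes; the 5408 comes from the convolution's output size. A small hypothetical helper, `conv_output_size`, applies floor((W - F + 2P)/S) + 1 with P = 0 for 'VALID' padding, and ceil(W / S) for 'SAME' padding (the rule TensorFlow documents for 'SAME'). It reproduces the numbers used in this notebook:

```python
import math

def conv_output_size(in_size, filter_size, stride, padding):
    """Spatial output size of a 2D convolution along one axis."""
    if padding == 'VALID':
        # no padding: floor((W - F + 2*0) / S) + 1
        return (in_size - filter_size) // stride + 1
    if padding == 'SAME':
        # TensorFlow pads so that the output size is ceil(W / S)
        return math.ceil(in_size / stride)
    raise ValueError("padding must be 'VALID' or 'SAME'")

# simple_model: 7x7 filters, stride 2, VALID padding, 32 output channels
side = conv_output_size(32, 7, 2, 'VALID')
print(side, side * side * 32)  # 13 5408 -> the affine layer is 5408 x 10

# complex_model's first layer: 7x7 filters, stride 1, VALID padding
print(conv_output_size(32, 7, 1, 'VALID'))  # 26
```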

TensorFlow Details

In TensorFlow, as in our previous notebooks, we first initialize our variables and then our model.

# clear old variables
tf.reset_default_graph()

# setup input (e.g. the data that changes every batch)
# The first dim is None, and gets set automatically based on the batch size fed in

X = tf.placeholder(tf.float32, [None, 32, 32, 3])
y = tf.placeholder(tf.int64, [None])
is_training = tf.placeholder(tf.bool)

def simple_model(X,y):
    # define our weights (e.g. init_two_layer_convnet)
    # setup variables
    Wconv1 = tf.get_variable("Wconv1", shape=[7, 7, 3, 32])
    bconv1 = tf.get_variable("bconv1", shape=[32])
    W1 = tf.get_variable("W1", shape=[5408, 10])
    b1 = tf.get_variable("b1", shape=[10])

    # define our graph (e.g. two_layer_convnet)

    # This uses conv2d; read the official documentation carefully:
    # tf.nn.conv2d()  https://www.tensorflow.org/api_docs/python/tf/nn/conv2d
    # conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, data_format=None, name=None)
    # input: [batch, in_height, in_width, in_channels]
    # filter/kernel: [filter_height, filter_width, in_channels, out_channels]
    # strides: 1-D tensor of length 4, the step of the sliding window along each dimension;
    #          for horizontal/vertical sliding this is usually strides = [1, stride, stride, 1]
    # padding: 'VALID' or 'SAME'
    # data_format: layout of the input data, 'NHWC' by default

    # Output size formula: floor((W - F + 2P) / S) + 1
    # W: image width        32
    # F: filter width       7
    # P: amount of padding  0
    #    padding='VALID' means no padding; padding='SAME' pads enough rows/columns
    #    that the output size is ceil(W / S) (the input size when S = 1)
    # S: stride             2

    a1 = tf.nn.conv2d(X, Wconv1, strides=[1,2,2,1], padding='VALID') + bconv1
    # floor((32 - 7 + 2*0) / 2) + 1 = floor(12.5) + 1 = 13
    # so each output feature map is 13 x 13, and with 32 of them: 13 * 13 * 32 = 5408
    # (Wconv1 has 32 output channels, i.e. 32 filters)

    h1 = tf.nn.relu(a1)  # apply ReLU elementwise to a1
    h1_flat = tf.reshape(h1,[-1,5408])  # flatten each feature map stack to batch_size x 5408
    y_out = tf.matmul(h1_flat,W1) + b1  # output logits y_out
    return y_out

y_out = simple_model(X,y)

# define our loss
total_loss = tf.losses.hinge_loss(tf.one_hot(y,10),logits=y_out)
mean_loss = tf.reduce_mean(total_loss) # average the loss

# define our optimizer
optimizer = tf.train.AdamOptimizer(5e-4) # select optimizer and set learning rate
train_step = optimizer.minimize(mean_loss)
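`tf.losses.hinge_loss` treats each of the 10 class scores as a separate binary problem rather than computing the multiclass SVM loss used earlier in the course. As we understand the TF 1.x implementation, it maps one-hot labels {0, 1} to {-1, +1}, computes max(0, 1 - label * logit) elementwise, and (with its default reduction) averages over all N x 10 entries, so the following `tf.reduce_mean` acts on an already-scalar loss. A NumPy sketch under those assumptions (the function name is hypothetical, and 3 classes are used instead of 10 to keep it readable):

```python
import numpy as np

def binary_hinge_loss_mean(labels_onehot, logits):
    """Elementwise hinge on {-1, +1} labels, averaged over every entry."""
    signs = 2.0 * labels_onehot - 1.0             # {0, 1} -> {-1, +1}
    losses = np.maximum(0.0, 1.0 - signs * logits)
    return losses.mean()                          # mean over all N * C entries

labels = np.array([2, 0])                 # true classes for a batch of 2
onehot = np.eye(3)[labels]                # one-hot rows, like tf.one_hot(y, 3)
logits = np.array([[0.5, -1.0,  2.0],
                   [0.0,  0.5, -2.0]])
print(binary_hinge_loss_mean(onehot, logits))  # 4.0 / 6 = 0.666...
```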

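TensorFlow supports many other layer types, loss functions, and optimizers; you will experiment with them later. Here is the official API documentation (if any of the parameters used above were unclear, this resource will be very helpful).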

Training the model for one epoch

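We defined the graph of operations above. To actually execute the computations in a TensorFlow graph, we first need to create a tf.Session object. A session holds the TensorFlow runtime state. For more, see the TensorFlow guide Getting started.

We can also specify a device, such as /cpu:0 or /gpu:0. For this kind of operation, see this TensorFlow guide.

Below you should see a validation loss of around 0.4 to 0.6 and an accuracy of around 0.3 to 0.35.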

def run_model(session, predict, loss_val, Xd, yd,
              epochs=1, batch_size=64, print_every=100,
              training=None, plot_losses=False):
    '''
    run_model drives the whole training loop. It takes a session; calling
    session.run(variables) returns the value of each tensor in variables.
    In training mode (training is not None), training is the train op defined
    earlier; running it makes TensorFlow backpropagate automatically and update
    all model parameters.
    When training is None, we only predict on the validation set (or at test time),
    so no backpropagation is needed and the train op is not added to variables.
    '''
    # have tensorflow compute accuracy
    correct_prediction = tf.equal(tf.argmax(predict,1), y)
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # shuffle indices
    train_indices = np.arange(Xd.shape[0])
    np.random.shuffle(train_indices)

    training_now = training is not None

    # setting up variables we want to compute (and optimizing)
    # if we have a training function, add that to things we compute
    variables = [loss_val,correct_prediction,accuracy]
    if training_now:
        variables[-1] = training

    # counter
    iter_cnt = 0
    for e in range(epochs):
        # keep track of losses and accuracy
        correct = 0
        losses = []
        # make sure we iterate over the dataset once
        for i in range(int(math.ceil(Xd.shape[0]/batch_size))):
            # generate indices for the batch
            start_idx = (i*batch_size)%Xd.shape[0]
            idx = train_indices[start_idx:start_idx+batch_size]

            # create a feed dictionary for this batch
            feed_dict = {X: Xd[idx,:],
                         y: yd[idx],
                         is_training: training_now }
            # get batch size (the last batch may be smaller)
            actual_batch_size = yd[idx].shape[0]

            # have tensorflow compute loss and correct predictions
            # and (if given) perform a training step
            loss, corr, _ = session.run(variables,feed_dict=feed_dict)

            # aggregate performance stats
            losses.append(loss*actual_batch_size)
            correct += np.sum(corr)

            # print every now and then
            if training_now and (iter_cnt % print_every) == 0:
                print("Iteration {0}: with minibatch training loss = {1:.3g} and accuracy of {2:.2g}"\
                      .format(iter_cnt,loss,np.sum(corr)/actual_batch_size))
            iter_cnt += 1
        total_correct = correct/Xd.shape[0]
        total_loss = np.sum(losses)/Xd.shape[0]
        print("Epoch {2}, Overall loss = {0:.3g} and accuracy of {1:.3g}"\
              .format(total_loss,total_correct,e+1))
        if plot_losses:
            plt.plot(losses)
            plt.grid(True)
            plt.title('Epoch {} Loss'.format(e+1))
            plt.xlabel('minibatch number')
            plt.ylabel('minibatch loss')
            plt.show()
    return total_loss,total_correct
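The bookkeeping inside run_model does not depend on TensorFlow. Here is a NumPy sketch of the same pattern: shuffle the indices once per epoch, walk through ceil(N / batch_size) minibatches (the last may be short), weight each minibatch loss by its actual size, and divide the totals by N. The `run_epoch` name, the fixed linear classifier, and the squared-error stand-in loss are all hypothetical, chosen only so the example runs without a session.

```python
import math
import numpy as np

def run_epoch(predict_fn, loss_fn, Xd, yd, batch_size=64, seed=0):
    """One pass over (Xd, yd); returns (mean loss, accuracy) like run_model."""
    n = Xd.shape[0]
    indices = np.arange(n)
    np.random.default_rng(seed).shuffle(indices)
    losses, correct = [], 0
    for i in range(int(math.ceil(n / batch_size))):
        start = i * batch_size
        idx = indices[start:start + batch_size]    # last batch may be short
        scores = predict_fn(Xd[idx])
        # weight by the actual batch size so the short batch is not over-counted
        losses.append(loss_fn(scores, yd[idx]) * len(idx))
        correct += np.sum(np.argmax(scores, axis=1) == yd[idx])
    return np.sum(losses) / n, correct / n

rng = np.random.default_rng(1)
Xd = rng.standard_normal((150, 4))
W = rng.standard_normal((4, 3))
yd = np.argmax(Xd @ W, axis=1)                      # labels this linear model gets right
mse = lambda s, t: np.mean((s - np.eye(3)[t]) ** 2)  # stand-in per-batch mean loss
loss, acc = run_epoch(lambda x: x @ W, mse, Xd, yd, batch_size=64)
print(loss, acc)  # acc is 1.0 by construction
```

Because each batch loss is a per-batch mean, multiplying by the batch size before summing makes the epoch loss equal to the mean over all N examples, even with a short final batch.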

with tf.Session() as sess:
    # note: tf.device only sets the device for ops created inside this block
    with tf.device("/cpu:0"): # "/cpu:0" or "/gpu:0"
        sess.run(tf.global_variables_initializer())
        print('Training')
        run_model(sess,y_out,mean_loss,X_train,y_train,1,64,100,train_step,True)
        print('Validation')
        run_model(sess,y_out,mean_loss,X_val,y_val,1,64)

Training
Iteration 0: with minibatch training loss = 14.5 and accuracy of 0.078
Iteration 100: with minibatch training loss = 0.89 and accuracy of 0.34
Iteration 200: with minibatch training loss = 0.678 and accuracy of 0.33
Iteration 300: with minibatch training loss = 0.832 and accuracy of 0.16
Iteration 400: with minibatch training loss = 0.524 and accuracy of 0.33
Iteration 500: with minibatch training loss = 0.487 and accuracy of 0.44
Iteration 600: with minibatch training loss = 0.467 and accuracy of 0.33
Iteration 700: with minibatch training loss = 0.399 and accuracy of 0.41
Epoch 1, Overall loss = 0.771 and accuracy of 0.31

(Figure: Epoch 1 Loss, minibatch loss vs. minibatch number)

Validation
Epoch 1, Overall loss = 0.472 and accuracy of 0.373

Training a specific model

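In this section, we specify a model for you to build. The goal here isn't to get good performance (that comes later), but to get you comfortable reading the TensorFlow documentation and configuring your own model.
Using the code above as a guide, together with the relevant TensorFlow documentation, build a model with the following architecture: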

The convolution, activation, and fully connected layers here are similar to the earlier code.

For the batch normalization part below, the author borrowed from the batch normalization code in https://github.com/ry/tensorflow-resnet/blob/master/resnet.py

Batch normalization here relies mainly on two functions:
tf.nn.moments() computes the mean and variance;
tf.nn.batch_normalization() normalizes the data using the precomputed mean and variance.
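Put together, tf.nn.moments over axes [0, 1, 2] yields one (biased) mean and variance per channel, and tf.nn.batch_normalization computes scale * (x - mean) / sqrt(variance + variance_epsilon) + offset. A NumPy sketch of that spatial batch norm forward pass on an NHWC array, with a hypothetical function name and the same epsilon (0.001) as the model code:

```python
import numpy as np

def spatial_batchnorm_nhwc(x, gamma, beta, eps=1e-3):
    """Normalize each channel of an N x H x W x C array over axes (0, 1, 2)."""
    mean = x.mean(axis=(0, 1, 2))        # shape (C,), like tf.nn.moments
    var = x.var(axis=(0, 1, 2))          # biased variance, shape (C,)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta          # gamma = scale, beta = offset

rng = np.random.default_rng(0)
a1 = rng.normal(loc=5.0, scale=2.0, size=(8, 26, 26, 32))  # like the conv output
out = spatial_batchnorm_nhwc(a1, gamma=np.ones(32), beta=np.zeros(32))
print(out.mean(axis=(0, 1, 2))[:3], out.std(axis=(0, 1, 2))[:3])  # ~0 and ~1 per channel
```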

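Also, the beta and gamma from the lecture slides correspond to offset and scale, respectively, in tf.nn.batch_normalization; the documentation explains this in detail.
Note that at test time, the mean and variance we use are not those of the current test batch; they should be accumulated iteratively during training. Here, that accumulation uses a decay: each new batch's mean and variance nudge the global mean and variance a little.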
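A NumPy sketch of that train/test split, using the update rule from the code comments below (moving = moving * decay + value * (1 - decay)); TensorFlow's assign_moving_average may additionally apply bias correction, which this sketch omits. Training batches are normalized with their own statistics and nudge the running estimates; inference uses only the running estimates. The class name is hypothetical, and a smaller decay than the model's 0.9997 is used so the estimates settle within a short demo.

```python
import numpy as np

class RunningBatchNormStats:
    """Tracks per-channel moving mean/variance for test-time batch norm."""

    def __init__(self, channels, decay=0.9):
        self.decay = decay
        self.moving_mean = np.zeros(channels)   # same initial values as the TF code
        self.moving_var = np.ones(channels)

    def normalize(self, x, training, eps=1e-3):
        if training:
            mean = x.mean(axis=(0, 1, 2))
            var = x.var(axis=(0, 1, 2))
            # variable * decay + value * (1 - decay)
            self.moving_mean = self.decay * self.moving_mean + (1 - self.decay) * mean
            self.moving_var = self.decay * self.moving_var + (1 - self.decay) * var
        else:
            mean, var = self.moving_mean, self.moving_var  # no update at test time
        return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
stats = RunningBatchNormStats(channels=4)
for _ in range(200):                        # "training" batches drawn from N(3, 2^2)
    stats.normalize(rng.normal(3.0, 2.0, size=(16, 8, 8, 4)), training=True)
print(stats.moving_mean, stats.moving_var)  # approach 3 and 4

test_batch = rng.normal(3.0, 2.0, size=(2, 8, 8, 4))
test_out = stats.normalize(test_batch, training=False)  # uses moving stats only
```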
Also, because we are updating the global mean and variance, we need to add

tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, update_moving_mean)
tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, update_moving_variance)

these two operations, and our train_step needs a small modification:

# batch normalization in tensorflow requires this extra dependency
extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(extra_update_ops):
    train_step = optimizer.minimize(mean_loss)
from tensorflow.python.training import moving_averages
from tensorflow.python.ops import control_flow_ops
# clear old variables
tf.reset_default_graph()

# define our input (e.g. the data that changes every batch)
# The first dim is None, and gets set automatically based on the batch size fed in
X = tf.placeholder(tf.float32, [None, 32, 32, 3])
y = tf.placeholder(tf.int64, [None])
is_training = tf.placeholder(tf.bool)

# define model
def complex_model(X,y,is_training):
    # parameters (constants)
    MOVING_AVERAGE_DECAY = 0.9997
    BN_DECAY = MOVING_AVERAGE_DECAY
    BN_EPSILON = 0.001

    # 7x7 Convolutional Layer with 32 filters and stride of 1
    Wconv1 = tf.get_variable("Wconv1", shape=[7, 7, 3, 32])
    bconv1 = tf.get_variable("bconv1", shape=[32])
    h1 = tf.nn.conv2d(X, Wconv1, strides=[1,1,1,1], padding='VALID') + bconv1
    # ReLU Activation Layer
    a1 = tf.nn.relu(h1)  # a1 has shape [batch_size, 26, 26, 32]
    # Spatial Batch Normalization Layer (trainable parameters, with scale and centering)
    # For so-called "global normalization" of activations shaped [batch, height, width, depth],
    # pass axes=[0,1,2] so each channel is normalized over the batch and both spatial axes
    axis = list(range(len(a1.get_shape()) - 1))  # axis = [0,1,2]
    mean, variance = tf.nn.moments(a1, axis)  # mean and variance for each feature map (channel)

    params_shape = a1.get_shape()[-1:]  # last dimension: channels (depth)
    # each feature map should have one beta and one gamma
    beta = tf.get_variable('beta',
                           params_shape,
                           initializer=tf.zeros_initializer())

    gamma = tf.get_variable('gamma',
                            params_shape,
                            initializer=tf.ones_initializer())

    # mean and variance during training are recorded and saved as moving_mean and moving_variance
    # moving_mean and moving_variance are used as mean and variance in testing
    moving_mean = tf.get_variable('moving_mean',
                                  params_shape,
                                  initializer=tf.zeros_initializer(),
                                  trainable=False)
    moving_variance = tf.get_variable('moving_variance',
                                      params_shape,
                                      initializer=tf.ones_initializer(),
                                      trainable=False)

    # update variable by variable * decay + value * (1 - decay)
    update_moving_mean = moving_averages.assign_moving_average(moving_mean,
                                                               mean, BN_DECAY)
    update_moving_variance = moving_averages.assign_moving_average(
        moving_variance, variance, BN_DECAY)
    tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, update_moving_mean)
    tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, update_moving_variance)

mean<span class="token punctuation">,</span> variance <span class="token operator">=</span> control_flow_ops<span class="token punctuation">.</span>cond<span class="token punctuation">(</span>
    is_training<span class="token punctuation">,</span> <span class="token keyword">lambda</span><span class="token punctuation">:</span> <span class="token punctuation">(</span>mean<span class="token punctuation">,</span> variance<span class="token punctuation">)</span><span class="token punctuation">,</span>
    <span class="token keyword">lambda</span><span class="token punctuation">:</span> <span class="token punctuation">(</span>moving_mean<span class="token punctuation">,</span> moving_variance<span class="token punctuation">)</span><span class="token punctuation">)</span>


a1_b <span class="token operator">=</span> tf<span class="token punctuation">.</span>nn<span class="token punctuation">.</span>batch_normalization<span class="token punctuation">(</span>a1<span class="token punctuation">,</span> mean<span class="token punctuation">,</span> variance<span class="token punctuation">,</span> beta<span class="token punctuation">,</span> gamma<span class="token punctuation">,</span> BN_EPSILON<span class="token punctuation">)</span>
    # 2x2 max pooling layer with a stride of 2
    m1 = tf.nn.max_pool(a1_b, ksize=[1,2,2,1], strides=[1,2,2,1], padding='VALID')
    # m1 has shape batch_size x 26/2 x 26/2 x 32, i.e. batch_size x 5408 once flattened
    # Affine layer with 1024 output units
    m1_flat = tf.reshape(m1, [-1, 5408])
    W1 = tf.get_variable("W1", shape=[5408, 1024])
    b1 = tf.get_variable("b1", shape=[1024])
    h2 = tf.matmul(m1_flat, W1) + b1
    # ReLU activation layer
    a2 = tf.nn.relu(h2)
    # Affine layer from 1024 input units to 10 outputs
    W2 = tf.get_variable("W2", shape=[1024, 10])
    b2 = tf.get_variable("b2", shape=[10])
    y_out = tf.matmul(a2, W2) + b2
    return y_out

y_out = complex_model(X,y,is_training)
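The batch-norm part of complex_model switches between batch statistics (training) and moving averages (testing), and updates the moving averages with `variable * decay + value * (1 - decay)`. The following pure-Python sketch mimics that logic for a single channel so the bookkeeping can be followed by hand. It is only an illustration: the `decay` and `eps` values are assumptions, because BN_DECAY and BN_EPSILON are defined outside this excerpt, and `ToyBatchNorm` is a hypothetical name.

```python
import math

class ToyBatchNorm:
    """Batch norm for a single channel, mirroring the graph in complex_model:
    batch statistics while training, moving averages at test time."""

    def __init__(self, decay=0.9, eps=1e-3):
        # decay/eps stand in for BN_DECAY/BN_EPSILON, whose real values are not shown here
        self.decay, self.eps = decay, eps
        self.beta, self.gamma = 0.0, 1.0              # zeros / ones initializers
        self.moving_mean, self.moving_var = 0.0, 1.0  # zeros / ones initializers

    def __call__(self, xs, is_training):
        if is_training:
            mean = sum(xs) / len(xs)
            var = sum((x - mean) ** 2 for x in xs) / len(xs)  # biased, as in tf.nn.moments
            # variable * decay + value * (1 - decay), without zero-debiasing
            self.moving_mean = self.moving_mean * self.decay + mean * (1 - self.decay)
            self.moving_var = self.moving_var * self.decay + var * (1 - self.decay)
        else:
            mean, var = self.moving_mean, self.moving_var
        # same formula as tf.nn.batch_normalization(x, mean, var, beta, gamma, eps)
        return [self.gamma * (x - mean) / math.sqrt(var + self.eps) + self.beta for x in xs]

bn = ToyBatchNorm(decay=0.9)
train_out = bn([1.0, 2.0, 3.0, 4.0], is_training=True)  # normalized with mean 2.5, var 1.25
print(bn.moving_mean, bn.moving_var)                     # moved 10% of the way toward 2.5 and 1.25
test_out = bn([2.5], is_training=False)                  # uses the moving averages instead
```

A single training step only moves the moving averages a little. This is why a network evaluated with `is_training=False` right after very few updates can behave quite differently from the same network in training mode.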

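To make sure you're doing this right, use the check below to verify the output dimensions. They should be 64 x 10, because our batch size is 64 and the final affine layer has 10 output neurons, one per class.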
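The 5408 in the reshape and the final 64 x 10 can be checked by hand. The sketch below assumes the part of complex_model not shown in this excerpt is a 7x7 convolution with 32 filters, stride 1 and VALID padding, which is consistent with the 26x26 size in the code comments. It applies the output-size rules TensorFlow uses for VALID and SAME padding. `out_size` is a hypothetical helper name.

```python
import math

def out_size(size, kernel, stride=1, padding="VALID"):
    """Spatial output size of a conv or pooling layer under TensorFlow's padding rules:
    VALID -> ceil((size - kernel + 1) / stride), SAME -> ceil(size / stride)."""
    if padding == "VALID":
        return math.ceil((size - kernel + 1) / stride)
    if padding == "SAME":
        return math.ceil(size / stride)
    raise ValueError("padding must be 'VALID' or 'SAME'")

# Assumed complex_model layout: 7x7 conv (32 filters, stride 1, VALID),
# then 2x2 max pooling with stride 2 (VALID).
conv_hw = out_size(32, kernel=7, stride=1)       # 26
pool_hw = out_size(conv_hw, kernel=2, stride=2)  # 13
flat = pool_hw * pool_hw * 32                    # width of the reshaped m1
print(conv_hw, pool_hw, flat)                    # prints: 26 13 5408
```

The two affine layers then map 5408 -> 1024 -> 10, so a batch of 64 images produces logits of shape (64, 10).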

# Now we're going to feed a random batch into the model
# and make sure the output is the right size
x = np.random.randn(64, 32, 32, 3)
with tf.Session() as sess:
    with tf.device("/cpu:0"): # "/cpu:0" or "/gpu:0"
        tf.global_variables_initializer().run()
    ans = sess.run(y_out, feed_dict={X: x, is_training: True})
    %timeit sess.run(y_out, feed_dict={X: x, is_training: True})
    print(ans.shape)
    print(np.array_equal(ans.shape, np.array([64, 10])))

Out:

10 loops, best of 3: 118 ms per loop
(64, 10)
True

You should see the following from the run above

(64, 10)

True
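`%timeit` in the cell above is an IPython magic, so it only works inside Jupyter or IPython. Outside a notebook, the standard library's `timeit.repeat` gives a rough equivalent of the "10 loops, best of 3" line in the output. This sketch fixes the loop count at 10, whereas IPython picks it automatically. `fake_forward` is a hypothetical placeholder. In the notebook you would pass something like `lambda: sess.run(y_out, feed_dict={X: x, is_training: True})` instead.

```python
import timeit

def best_of(fn, number=10, repeat=3):
    """Run `fn` `number` times per trial for `repeat` trials and return the
    best per-call time in milliseconds (roughly what %timeit reports)."""
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    return min(totals) / number * 1000.0

def fake_forward():
    # hypothetical stand-in for one forward pass of the model
    return sum(i * i for i in range(10000))

ms = best_of(fake_forward)
print("10 loops, best of 3: %.3g ms per loop" % ms)
```

Reporting the best trial rather than the mean reduces the influence of other processes briefly competing for the CPU.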

GPU!

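Now we're going to try the model on a GPU device. The rest of the code stays unchanged, and all our variables and operations will be computed using accelerated code paths. However, if there is no GPU, we get a Python exception and have to rebuild our graph. On a dual-core CPU you might see around 50-80 ms per batch, while on Google Cloud GPUs it should be around 2-5 ms per batch.

Author's note: I ran the code below on a CPU, so the timings shown are CPU timings. If you are using a GPU, you can ignore the per-batch timing results below.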

try:
    with tf.Session() as sess:
        with tf.device("/cpu:0") as dev: # "/cpu:0" or "/gpu:0"
            tf.global_variables_initializer().run()
        ans = sess.run(y_out, feed_dict={X: x, is_training: True})
        %timeit sess.run(y_out, feed_dict={X: x, is_training: True})

except tf.errors.InvalidArgumentError:
    print("no gpu found, please use Google Cloud if you want GPU acceleration")
    # rebuild the graph
    # trying to start a GPU throws an exception
    # and also trashes the original graph
    tf.reset_default_graph()
    X = tf.placeholder(tf.float32, [None, 32, 32, 3])
    y = tf.placeholder(tf.int64, [None])
    is_training = tf.placeholder(tf.bool)
    y_out = complex_model(X, y, is_training)

Out:

10 loops, best of 3: 115 ms per loop

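You should see that even a simple forward pass like this is significantly faster on a GPU. So for the rest of the assignment (and when you build models for assignment 3 and your project), you should use a GPU device. However, TensorFlow places operations on the GPU by default when one is available and falls back to the CPU otherwise, so from here on we can skip specifying the device explicitly.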

Training the model

Now that you've seen how to define a model and run a forward pass, let's use the complex model you created above to train for one epoch on the training set.

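Make sure you understand how each of the TensorFlow functions below is used (they correspond to what you implemented in your own neural network code).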

First, create an RMSprop optimizer (with a learning rate of 1e-3) and a cross-entropy loss function. See the TensorFlow documentation for more information.
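To see what `tf.train.RMSPropOptimizer` does to each parameter, here is a scalar pure-Python sketch of the RMSProp update as written in the TF 1.x documentation. The defaults decay=0.9, momentum=0.0 and epsilon=1e-10 are that optimizer's documented defaults. Starting the running mean square at zero is an assumption of this sketch, since TensorFlow's own slot initialization may differ. The toy objective f(x) = x² and the name `rmsprop_minimize` are hypothetical.

```python
import math

def rmsprop_minimize(grad_fn, x0, lr=1e-3, decay=0.9, momentum=0.0, eps=1e-10, steps=100):
    """Scalar RMSProp:
        ms  = decay * ms + (1 - decay) * g**2
        mom = momentum * mom + lr * g / sqrt(ms + eps)
        x  -= mom
    ms starts at 0 here (an assumption of this sketch)."""
    x, ms, mom = x0, 0.0, 0.0
    for _ in range(steps):
        g = grad_fn(x)
        ms = decay * ms + (1 - decay) * g * g
        mom = momentum * mom + lr * g / math.sqrt(ms + eps)
        x -= mom
    return x

# Toy problem: minimize f(x) = x**2 (gradient 2x), starting from x = 5.
x_final = rmsprop_minimize(lambda x: 2 * x, x0=5.0, lr=0.1, steps=300)
print(x_final)
```

Dividing by the running RMS of the gradient normalizes the step size, so each step is roughly `lr` in magnitude regardless of how large the raw gradient is.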

# Inputs
#     y_out: what your model computes
#     y: your TensorFlow variable with label information
# Outputs
#    mean_loss: a TensorFlow variable (scalar) with numerical loss
#    optimizer: a TensorFlow optimizer
# This should be ~3 lines of code!
total_loss = tf.nn.softmax_cross_entropy_with_logits(logits=y_out, labels=tf.one_hot(y, 10))
mean_loss = tf.reduce_mean(total_loss)

# define our optimizer
optimizer = tf.train.RMSPropOptimizer(1e-3) # select optimizer and set learning rate

# batch normalization in tensorflow requires this extra dependency:
# the moving-average updates in UPDATE_OPS must run with every training step
extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(extra_update_ops):
    train_step = optimizer.minimize(mean_loss)
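The loss above converts the integer labels with `tf.one_hot(y, 10)` and applies softmax cross entropy. For a single example this reduces to `logsumexp(logits) - logits[label]`. TensorFlow's `tf.nn.sparse_softmax_cross_entropy_with_logits` computes the same value directly from integer labels. The pure-Python sketch below (function names are illustrative) spells out the computation, including the usual max-subtraction for numerical stability.

```python
import math

def softmax_cross_entropy(logits, label):
    """Cross entropy of softmax(logits) against the one-hot vector for `label`."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))  # stable logsumexp
    one_hot = [1.0 if k == label else 0.0 for k in range(len(logits))]
    # -sum_k onehot_k * log softmax_k, which equals log_sum_exp - logits[label]
    return -sum(t * (z - log_sum_exp) for t, z in zip(one_hot, logits))

def batch_mean_loss(batch_logits, labels):
    """Average of the per-example losses, like tf.reduce_mean(total_loss)."""
    losses = [softmax_cross_entropy(z, y) for z, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)

# Uniform logits over 10 classes give a loss of ln(10), about 2.303.
print(batch_mean_loss([[0.0] * 10, [0.0] * 10], [3, 7]))
```

ln(10) is a useful reference point for the training logs. A loss well above it means the network is confidently wrong, and a loss well below it means it is doing better than chance.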

Train the model

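Below we create a session and train the model for one epoch. You should see a loss of 1.4 to 2.0 and an accuracy of 0.4 to 0.5. The exact values will vary somewhat because of initialization and random seeds.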

sess = tf.Session()

sess.run(tf.global_variables_initializer())
print('Training')
run_model(sess,y_out,mean_loss,X_train,y_train,1,64,100,train_step)

Out:

Training
Iteration 0: with minibatch training loss = 3.39 and accuracy of 0.078
Iteration 100: with minibatch training loss = 3.18 and accuracy of 0.14
Iteration 200: with minibatch training loss = 1.78 and accuracy of 0.41
Iteration 300: with minibatch training loss = 1.86 and accuracy of 0.39
Iteration 400: with minibatch training loss = 1.32 and accuracy of 0.48
Iteration 500: with minibatch training loss = 1.2 and accuracy of 0.66
Iteration 600: with minibatch training loss = 1.27 and accuracy of 0.59
Iteration 700: with minibatch training loss = 1.32 and accuracy of 0.48
Epoch 1, Overall loss = 1.67 and accuracy of 0.452

Out:

(1.6708081902873759, 0.45230612244897961)
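`run_model` is defined earlier in the notebook and not reproduced in this excerpt. The sketch below is a hypothetical illustration of the minibatch loop such a function typically runs: shuffle the indices once per epoch, then walk through them in slices of `batch_size`, where the last slice may be smaller. With 49,000 training images and a batch size of 64 this gives 766 iterations per epoch. That matches the logs, where epoch 1 prints iterations 0-700 and the two-epoch run later prints up to iteration 1500.

```python
import random

def minibatch_indices(n, batch_size, shuffle=True, seed=0):
    """Yield lists of indices covering range(n) in minibatches; the last one may be smaller."""
    idx = list(range(n))
    if shuffle:
        random.Random(seed).shuffle(idx)
    for start in range(0, n, batch_size):
        yield idx[start:start + batch_size]

batches = list(minibatch_indices(49000, 64))
print(len(batches), len(batches[-1]))  # prints: 766 40
```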

Check the accuracy of the model

Let's look at the training and test code. Feel free to use it to evaluate the models you build below. You should see a loss of 1.3-2.0 and an accuracy of 0.45-0.55.

print('Validation')
run_model(sess,y_out,mean_loss,X_val,y_val,1,64)

Out:

Validation
Epoch 1, Overall loss = 1.44 and accuracy of 0.538

Out:

(1.4403997488021851, 0.53800000000000003)
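The "Overall loss" and accuracy printed per epoch summarize all minibatches. On 1,000 validation images, an accuracy of 0.538 corresponds to 538 correct predictions. The sketch below assumes that `run_model` (not shown here) weights each minibatch's mean loss by its size and counts correct predictions across batches, so the smaller final batch does not skew the result. `epoch_summary` is a hypothetical name.

```python
def epoch_summary(batch_results, n_total):
    """Combine per-minibatch (mean_loss, num_correct, batch_size) tuples into an
    epoch-level (loss, accuracy), weighting each batch's loss by its size."""
    total_loss = sum(loss * size for loss, _, size in batch_results)
    total_correct = sum(correct for _, correct, _ in batch_results)
    return total_loss / n_total, total_correct / n_total

# Two batches: 64 examples with mean loss 1.0 (32 correct), 16 examples with mean loss 2.0 (8 correct).
loss, acc = epoch_summary([(1.0, 32, 64), (2.0, 8, 16)], 80)
print(loss, acc)
```

A plain average of the two batch losses would give 1.5. Size weighting gives (64 * 1.0 + 16 * 2.0) / 80 = 1.2, which is the true per-example average.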

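Now experiment with different architectures, hyperparameters, loss functions and optimizers to train a model that reaches at least 70% accuracy on CIFAR-10. You can use the run_model function above.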

Things you can try

Tips for training

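For each network architecture you try, you should tune the learning rate and regularization strength. When doing this, keep a few important things in mind:
- If the parameters are working well, you should see improvement within a few hundred iterations.
- Remember the coarse-to-fine approach to choosing hyperparameters: start with a wide range of hyperparameters and iterate to find the combinations that perform well.
- Once you have found a few sets of parameters that seem to work, search more finely around them. At this stage you may need to train for more epochs.
- Use the validation set to search for hyperparameters. We will evaluate your model on the test set using the best parameters found on the validation set.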
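The coarse-to-fine strategy in the list above can be sketched as two rounds of random search with log-uniform sampling. The first round uses wide ranges and many cheap runs. The second narrows the range to about one decade around the best coarse point. `toy_val_score` is a hypothetical stand-in for "train briefly and return validation accuracy", and it simply peaks at lr = 1e-3, reg = 1e-4. In practice each call would be a short run_model training run.

```python
import math
import random

def toy_val_score(lr, reg):
    """Hypothetical validation score; higher is better, best at lr=1e-3, reg=1e-4."""
    return -((math.log10(lr) + 3) ** 2 + (math.log10(reg) + 4) ** 2)

def random_search(lr_exp_range, reg_exp_range, trials, rng):
    """Sample (lr, reg) log-uniformly from base-10 exponent ranges; best result first."""
    results = []
    for _ in range(trials):
        lr = 10 ** rng.uniform(*lr_exp_range)
        reg = 10 ** rng.uniform(*reg_exp_range)
        results.append((toy_val_score(lr, reg), lr, reg))
    return sorted(results, reverse=True)

rng = random.Random(0)
# Coarse stage: several decades per hyperparameter, many short runs.
coarse = random_search((-6, -1), (-7, -1), trials=100, rng=rng)
_, lr0, reg0 = coarse[0]
# Fine stage: one decade around the best coarse point (train longer here in practice).
e_lr, e_reg = math.log10(lr0), math.log10(reg0)
fine = random_search((e_lr - 0.5, e_lr + 0.5), (e_reg - 0.5, e_reg + 0.5), trials=30, rng=rng)
best = max(coarse[0], fine[0])
print("best lr=%.2e reg=%.2e" % (best[1], best[2]))
```

Sampling exponents rather than raw values spreads the trials evenly across orders of magnitude, which is what matters for learning rates and regularization strengths.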

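Going above and beyond

If you're feeling adventurous, there are many other features you can implement to try to improve performance. You are not required to implement all of them, but implementing some of them can earn extra credit.

If you do decide to implement something extra, describe it in the "Extra Credit Description" cell below.

What we expect

At the very least, you should be able to train a ConvNet that reaches at least 70% accuracy on the validation set. This is just a lower bound.

Use the space below to experiment and train your network. The final cell in this notebook should contain your model's training and validation set accuracy.

Have fun and happy training!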

# Feel free to play with this cell

def my_model(X, y, is_training):
    def conv_relu_pool(X, num_filter=32, conv_strides=1, kernel_size=[3,3], pool_size=[2,2], pool_strides=2):
        conv1 = tf.layers.conv2d(inputs=X, filters=num_filter, kernel_size=kernel_size, strides=conv_strides, padding="same", activation=tf.nn.relu)
        pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=pool_size, strides=pool_strides)
        return pool1

    def conv_relu_conv_relu_pool(X, num_filter1=32, num_filter2=32, conv_strides=1, kernel_size=[5,5], pool_size=[2,2], pool_strides=2):
        conv1 = tf.layers.conv2d(inputs=X, filters=num_filter1, kernel_size=kernel_size, strides=conv_strides, padding="same", activation=tf.nn.relu)
        conv2 = tf.layers.conv2d(inputs=conv1, filters=num_filter2, kernel_size=kernel_size, strides=conv_strides, padding="same", activation=tf.nn.relu)
        # Pooling Layer #1
        pool1 = tf.layers.max_pooling2d(inputs=conv2, pool_size=pool_size, strides=pool_strides)
        return pool1

    def affline(X, num_units, act):
        # fully connected (affine) layer
        return tf.layers.dense(inputs=X, units=num_units, activation=act)

    def batchnorm_relu_conv(X, num_filters=32, conv_strides=2, kernel_size=[5,5], is_training=True):
        # pre-activation block: batch norm -> ReLU -> conv
        bat1 = tf.layers.batch_normalization(X, training=is_training)
        act1 = tf.nn.relu(bat1)
        #conv1 = tf.layers.conv2d(inputs=act1, filters=num_filters,
        #                         kernel_size=kernel_size, strides=conv_strides, padding="same", activation=None,
        #                         kernel_regularizer=tf.contrib.layers.l2_regularizer(scale=0.1),
        #                         bias_regularizer=tf.contrib.layers.l2_regularizer(scale=0.1))
        conv1 = tf.layers.conv2d(inputs=act1, filters=num_filters,
                                 kernel_size=kernel_size, strides=conv_strides, padding="same", activation=None) # without regularization

        return conv1

    N = 3 # number of conv blocks
    M = 1 # number of hidden affine layers
    conv = tf.layers.conv2d(inputs=X, filters=64, kernel_size=[5,5], strides=1, padding="same", activation=None)


    for i in range(N):
        print(conv.get_shape())
        conv = batchnorm_relu_conv(conv, is_training=is_training)
        #conv = conv_relu_conv_relu_pool(conv)

    print(conv.get_shape())
    global_average_shape = conv.get_shape()[1:3] # 4,4

    # just flatten the output
    #avg_layer = tf.reshape(conv, (-1, 512))

    # global average pooling method 1
    #avg_layer = tf.layers.average_pooling2d(conv, pool_size=global_average_shape.as_list(), strides=1, padding='valid')
    #avg_layer = tf.squeeze(avg_layer, axis=[1,2]) # remove the two size-1 axes

    # global average pooling method 2
    avg_layer = tf.reduce_mean(conv, [1,2]) # average over height and width (global average pooling, not max pooling)

    print(avg_layer.get_shape())

    fc = avg_layer
    #keep_prob = tf.constant(0.5)
    for i in range(M):
        fc = affline(fc, 100, tf.nn.relu)
        #fc = tf.nn.dropout(fc, keep_prob)

    fc = affline(fc, 10, None)

    return fc

tf.reset_default_graph()

X = tf.placeholder(tf.float32, [None, 32, 32, 3])
y = tf.placeholder(tf.int64, [None])
is_training = tf.placeholder(tf.bool)

y_out = my_model(X,y,is_training)
total_loss = tf.nn.softmax_cross_entropy_with_logits(logits=y_out, labels=tf.one_hot(y,10))
mean_loss = tf.reduce_mean(total_loss)

global_step = tf.Variable(0, trainable=False, name="Global_Step")
starter_learning_rate = 1e-2
learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
                                           750, 0.96, staircase=True)

#learning_rate = starter_learning_rate
# define our optimizer
optimizer = tf.train.AdamOptimizer(learning_rate) # select optimizer and set learning rate

# batch normalization in tensorflow requires this extra dependency
extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(extra_update_ops):
    train_step = optimizer.minimize(mean_loss, global_step=global_step)

print([x.name for x in tf.global_variables()])

Out:

(?, 32, 32, 64)
(?, 16, 16, 32)
(?, 8, 8, 32)
(?, 4, 4, 32)
(?, 32)
['conv2d/kernel:0', 'conv2d/bias:0', 'batch_normalization/beta:0', 'batch_normalization/gamma:0', 'batch_normalization/moving_mean:0', 'batch_normalization/moving_variance:0', 'conv2d_1/kernel:0', 'conv2d_1/bias:0', 'batch_normalization_1/beta:0', 'batch_normalization_1/gamma:0', 'batch_normalization_1/moving_mean:0', 'batch_normalization_1/moving_variance:0', 'conv2d_2/kernel:0', 'conv2d_2/bias:0', 'batch_normalization_2/beta:0', 'batch_normalization_2/gamma:0', 'batch_normalization_2/moving_mean:0', 'batch_normalization_2/moving_variance:0', 'conv2d_3/kernel:0', 'conv2d_3/bias:0', 'dense/kernel:0', 'dense/bias:0', 'dense_1/kernel:0', 'dense_1/bias:0', 'Global_Step:0', 'beta1_power:0', 'beta2_power:0', 'conv2d/kernel/Adam:0', 'conv2d/kernel/Adam_1:0', 'conv2d/bias/Adam:0', 'conv2d/bias/Adam_1:0', 'batch_normalization/beta/Adam:0', 'batch_normalization/beta/Adam_1:0', 'batch_normalization/gamma/Adam:0', 'batch_normalization/gamma/Adam_1:0', 'conv2d_1/kernel/Adam:0', 'conv2d_1/kernel/Adam_1:0', 'conv2d_1/bias/Adam:0', 'conv2d_1/bias/Adam_1:0', 'batch_normalization_1/beta/Adam:0', 'batch_normalization_1/beta/Adam_1:0', 'batch_normalization_1/gamma/Adam:0', 'batch_normalization_1/gamma/Adam_1:0', 'conv2d_2/kernel/Adam:0', 'conv2d_2/kernel/Adam_1:0', 'conv2d_2/bias/Adam:0', 'conv2d_2/bias/Adam_1:0', 'batch_normalization_2/beta/Adam:0', 'batch_normalization_2/beta/Adam_1:0', 'batch_normalization_2/gamma/Adam:0', 'batch_normalization_2/gamma/Adam_1:0', 'conv2d_3/kernel/Adam:0', 'conv2d_3/kernel/Adam_1:0', 'conv2d_3/bias/Adam:0', 'conv2d_3/bias/Adam_1:0', 'dense/kernel/Adam:0', 'dense/kernel/Adam_1:0', 'dense/bias/Adam:0', 'dense/bias/Adam_1:0', 'dense_1/kernel/Adam:0', 'dense_1/kernel/Adam_1:0', 'dense_1/bias/Adam:0', 'dense_1/bias/Adam_1:0']
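The printed shapes show my_model reducing the final (?, 4, 4, 32) feature map to (?, 32) with `tf.reduce_mean(conv, [1, 2])`. That is global average pooling: each channel is averaged over height and width. Global max pooling would use `tf.reduce_max` instead. The pure-Python sketch below (nested lists in NHWC order, illustrative function names) shows the difference on a tiny 2x2 input with 2 channels. Compared with flattening (4 * 4 * 32 = 512 features), global average pooling feeds only 32 features into the 100-unit dense layer, so that layer needs 3,200 weights instead of 51,200.

```python
def global_avg_pool(x):
    """x: nested lists shaped [N][H][W][C]. Returns [N][C] by averaging over H and W,
    like tf.reduce_mean(x, [1, 2])."""
    out = []
    for img in x:
        h, w, c = len(img), len(img[0]), len(img[0][0])
        out.append([sum(img[i][j][k] for i in range(h) for j in range(w)) / (h * w)
                    for k in range(c)])
    return out

def global_max_pool(x):
    """Same shape contract, but takes the maximum over H and W, like tf.reduce_max(x, [1, 2])."""
    return [[max(img[i][j][k] for i in range(len(img)) for j in range(len(img[0])))
             for k in range(len(img[0][0]))] for img in x]

# One 2x2 "image" with 2 channels: pixels (0,0)=[1,0], (0,1)=[3,4], (1,0)=[5,8], (1,1)=[7,0]
x = [[[[1.0, 0.0], [3.0, 4.0]],
      [[5.0, 8.0], [7.0, 0.0]]]]
print(global_avg_pool(x))  # [[4.0, 3.0]]
print(global_max_pool(x))  # [[7.0, 8.0]]
```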
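# Feel free to play with this cell
# This default code creates a session
# and trains your model (for 2 epochs here),
# then prints the validation set accuracy.
# tf.reset_default_graph() was called above, so the first time you run this cell,
# uncomment the next two lines to create a session for the new graph; they were
# commented out here so that later runs keep training the same session.
#sess = tf.Session()
#sess.run(tf.global_variables_initializer())
print('Training')
run_model(sess, y_out, mean_loss, X_train, y_train, 2, 64, 100, train_step, True)
print('Validation')
run_model(sess, y_out, mean_loss, X_val, y_val, 1, 64)

# The losses below were obtained by training for 2 more epochs after an earlier 5-epoch run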

Out:

Training
Iteration 0: with minibatch training loss = 0.614 and accuracy of 0.81
Iteration 100: with minibatch training loss = 0.653 and accuracy of 0.77
Iteration 200: with minibatch training loss = 0.852 and accuracy of 0.75
Iteration 300: with minibatch training loss = 0.868 and accuracy of 0.75
Iteration 400: with minibatch training loss = 0.517 and accuracy of 0.81
Iteration 500: with minibatch training loss = 0.744 and accuracy of 0.69
Iteration 600: with minibatch training loss = 0.547 and accuracy of 0.78
Iteration 700: with minibatch training loss = 0.692 and accuracy of 0.75
Epoch 1, Overall loss = 0.714 and accuracy of 0.745

[Figure: minibatch training loss for epoch 1]

Iteration 800: with minibatch training loss = 0.533 and accuracy of 0.8
Iteration 900: with minibatch training loss = 0.907 and accuracy of 0.69
Iteration 1000: with minibatch training loss = 0.595 and accuracy of 0.73
Iteration 1100: with minibatch training loss = 0.518 and accuracy of 0.83
Iteration 1200: with minibatch training loss = 0.837 and accuracy of 0.73
Iteration 1300: with minibatch training loss = 0.723 and accuracy of 0.73
Iteration 1400: with minibatch training loss = 0.923 and accuracy of 0.67
Iteration 1500: with minibatch training loss = 0.612 and accuracy of 0.8
Epoch 2, Overall loss = 0.656 and accuracy of 0.768

[Figure: minibatch training loss for epoch 2]

Validation
Epoch 1, Overall loss = 0.901 and accuracy of 0.708

Out:

(0.9011877479553223, 0.70799999999999996)
# Test your model here, and make sure
# the output of this cell is the accuracy
# of your best model on the training and val sets
# We're looking for >= 70% accuracy on Validation
print('Training')
run_model(sess, y_out, mean_loss, X_train, y_train, 1, 64)
print('Validation')
run_model(sess, y_out, mean_loss, X_val, y_val, 1, 64)

Out:

Training
Epoch 1, Overall loss = 0.607 and accuracy of 0.783
Validation
Epoch 1, Overall loss = 0.901 and accuracy of 0.708

Out:

(0.90118774318695072, 0.70799999999999996)

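Describe what you did here

In this cell, describe what you did, any additional features you implemented, and any visualizations or plots you made while training and evaluating your network.

I implemented a few of the blocks suggested above, tried each one to see its effect, and also used learning rate decay. Readers are encouraged to try more combinations and to read the official documentation to deepen their understanding of TensorFlow. When building the model, it also helps to print the shape of each layer's output so you know what every step produces. If you run into problems during training, first try the CIFAR-10 model architecture from the official TensorFlow tutorial to check whether your training setup runs end to end.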
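Both tips above (learning rate decay and checking each layer's output shape) can be illustrated without TensorFlow. The sketch below is plain Python under stated assumptions and is not the author's code. `exponential_decay` follows the formula documented for TF1's `tf.train.exponential_decay`, `lr * decay_rate ** (global_step / decay_steps)`, with integer division of the exponent when `staircase=True`. `conv_out_size` follows TensorFlow's padding rules: `ceil(n / stride)` for `'SAME'` and `ceil((n - k + 1) / stride)` for `'VALID'`. The helper names, the layer stack and the decay settings are all hypothetical.

```python
import math

def exponential_decay(lr, step, decay_steps, decay_rate, staircase=False):
    """Decayed learning rate, following the tf.train.exponential_decay formula."""
    exponent = step // decay_steps if staircase else step / decay_steps
    return lr * decay_rate ** exponent

def conv_out_size(size, kernel, stride, padding):
    """Spatial output size of a conv or pooling layer under TF padding rules."""
    if padding == "SAME":
        return math.ceil(size / stride)
    if padding == "VALID":
        return math.ceil((size - kernel + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

# Hypothetical schedule: halve the rate every 500 steps (staircase).
for step in (0, 500, 1000):
    print(step, exponential_decay(1e-3, step, 500, 0.5, staircase=True))

# Hypothetical layer stack (name, kernel, stride, padding) on 32x32 CIFAR-10 images.
layers = [("conv 3x3", 3, 1, "SAME"), ("maxpool 2x2", 2, 2, "VALID"),
          ("conv 3x3", 3, 1, "VALID"), ("maxpool 2x2", 2, 2, "VALID")]
size = 32
for name, k, s, pad in layers:
    size = conv_out_size(size, k, s, pad)
    print(f"{name:12s} -> {size}x{size}")
```

In the actual TF1 graph, the equivalent check is printing `x.get_shape()` after each layer while building the model; the numbers should match what these formulas predict.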

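Test set: we only run this once

Now that we have a result we are happy with, we evaluate the final model on the test set. This is the score we would get if we submitted it to a competition. Think about how it compares with your validation accuracy.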

print('Test')
run_model(sess,y_out,mean_loss,X_test,y_test,1,64)

Out:

Test
Epoch 1, Overall loss = 0.899 and accuracy of 0.696

Out:

(0.89940066184997558, 0.69550000000000001)
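The "Overall" numbers are averages over the whole set, and the returned tuple is the same (loss, accuracy) pair at full precision. The sketch below assumes that `run_model` (defined earlier in the notebook) weights each minibatch loss by its batch size. This matters because the last batch is smaller: 10000 test images split into 156 full batches of 64 plus one batch of 16. The helper `aggregate` and the toy batch numbers are hypothetical; only the three accuracies are taken from the outputs above.

```python
def aggregate(batch_results):
    """Combine per-minibatch (batch_size, mean_loss, num_correct) into overall metrics.

    Each minibatch mean loss is weighted by its batch size, so a smaller
    final batch does not count as much as a full one.
    """
    total = sum(n for n, _, _ in batch_results)
    loss = sum(n * l for n, l, _ in batch_results) / total
    acc = sum(c for _, _, c in batch_results) / total
    return loss, acc

# Toy example (made-up numbers): two full batches of 64 and a final batch of 16.
batches = [(64, 0.90, 45), (64, 0.80, 47), (16, 1.00, 10)]
loss, acc = aggregate(batches)
print(f"overall loss = {loss:.3f}, accuracy = {acc:.3f}")

# Gaps between the accuracies reported above (train 0.783, val 0.708, test 0.6955).
train_acc, val_acc, test_acc = 0.783, 0.708, 0.6955
print(f"train - val: {100 * (train_acc - val_acc):.2f} points")
print(f"val - test:  {100 * (val_acc - test_acc):.2f} points")
```

With the returned values, test accuracy (0.6955) is about 1.25 percentage points below validation accuracy (0.708), while training accuracy (0.783) is about 7.5 points above it.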

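Going further with TensorFlow

The later assignments will all rely on TensorFlow, and you may find it useful for your own projects as well.

Extra credit description

If you implemented any additional features for extra credit, point out here where the relevant code or other files are located.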

Source: https://blog.csdn.net/u013771019/article/details/102724490