Multi-GPU Training with TensorFlow
Author: Internet
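Problem
Training on a single machine with multiple GPUs. For multi-machine training, please consult the references.
Solution
Use tf.distribute.MirroredStrategy.
How tf.distribute.MirroredStrategy works: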
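- Before training starts, the strategy places a full copy of the model on each of the N GPUs.
- Each time a batch is fed in, the batch is split into N parts, one for each of the N GPUs.
- Each of the N GPUs uses its local variables to compute the gradients for its own part of the data.
- A distributed All-reduce operation efficiently exchanges the gradients between the GPUs and sums them.
- Each GPU updates its local variables with the summed gradients.
- Once every device has updated its local variables, the next training step begins.
- By default, MirroredStrategy in TensorFlow uses NVIDIA NCCL for the All-reduce operation.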
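The steps above can be sketched in plain Python, without TensorFlow or real GPUs. This is only an illustration under stated assumptions: the "replicas" are list entries, the model is a hypothetical one-parameter regression y = w * x, and the helper names (shard, local_gradient, all_reduce_sum, train_step) are invented for this example and are not TensorFlow APIs. Each local gradient is divided by the global batch size, so that the summed result equals the mean gradient over the whole batch; this mirrors how Keras is generally understood to scale the loss under a distribution strategy, but treat that detail as an assumption.

```python
def shard(batch, num_replicas):
    """Split a batch into num_replicas equal, contiguous shards."""
    size = len(batch) // num_replicas
    return [batch[i * size:(i + 1) * size] for i in range(num_replicas)]


def local_gradient(w, shard_data, global_batch_size):
    """Gradient of the squared error of y = w * x on one shard,
    scaled by the global batch size (not the shard size)."""
    return sum(2.0 * (w * x - y) * x for x, y in shard_data) / global_batch_size


def all_reduce_sum(values):
    """Stand-in for All-reduce: every replica receives the sum of all values."""
    total = sum(values)
    return [total for _ in values]


def train_step(replica_weights, batch, learning_rate):
    """One synchronous data-parallel step over all replicas."""
    num_replicas = len(replica_weights)
    shards = shard(batch, num_replicas)
    # Each replica computes gradients on its own shard with its local copy of w.
    grads = [local_gradient(w, s, len(batch))
             for w, s in zip(replica_weights, shards)]
    # Gradients are exchanged and summed across replicas.
    summed = all_reduce_sum(grads)
    # Every replica applies the same update, so the copies stay identical.
    return [w - learning_rate * g for w, g in zip(replica_weights, summed)]


# Toy data: 8 examples generated with the true weight w = 3.
batch = [(float(x), 3.0 * x) for x in range(1, 9)]
weights = [0.0, 0.0, 0.0, 0.0]  # one copy of w per simulated replica
for _ in range(50):
    weights = train_step(weights, batch, learning_rate=0.01)
print(weights)
```

Because every replica starts from the same value and applies the same summed gradient, the four copies of w never diverge, which is the property that makes the strategy "mirrored".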
Installation
pip install tensorflow-datasets --upgrade
Before using MirroredStrategy
import tensorflow as tf
import tensorflow_datasets as tfds


def resize(image, label):
    """Preprocess an image: resize to 224x224 and scale pixels to [0, 1]."""
    image = tf.image.resize(image, [224, 224]) / 255.0
    return image, label


batch_size = 64
dataset = tfds.load('cats_vs_dogs', split=tfds.Split.TRAIN, as_supervised=True)
dataset = dataset.map(resize).shuffle(1024).batch(batch_size)

model = tf.keras.applications.MobileNetV2(weights=None, classes=2)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss=tf.keras.losses.sparse_categorical_crossentropy,
    metrics=[tf.keras.metrics.sparse_categorical_accuracy]
)
model.fit(dataset, epochs=5)
# Epoch 1/5
# 364/364 [==============================] - 110s 303ms/step - loss: 0.6229 - sparse_categorical_accuracy: 0.6500
# Epoch 2/5
# 364/364 [==============================] - 111s 305ms/step - loss: 0.4781 - sparse_categorical_accuracy: 0.7690
# Epoch 3/5
# 364/364 [==============================] - 110s 301ms/step - loss: 0.3919 - sparse_categorical_accuracy: 0.8202
# Epoch 4/5
# 364/364 [==============================] - 113s 311ms/step - loss: 0.3171 - sparse_categorical_accuracy: 0.8602
# Epoch 5/5
# 364/364 [==============================] - 113s 311ms/step - loss: 0.2532 - sparse_categorical_accuracy: 0.8919
After using MirroredStrategy
import tensorflow as tf
import tensorflow_datasets as tfds


def resize(image, label):
    """Preprocess an image: resize to 224x224 and scale pixels to [0, 1]."""
    image = tf.image.resize(image, [224, 224]) / 255.0
    return image, label


strategy = tf.distribute.MirroredStrategy()
batch_size = 64 * strategy.num_replicas_in_sync  # per-device batch size × number of devices
dataset = tfds.load('cats_vs_dogs', split=tfds.Split.TRAIN, as_supervised=True)
dataset = dataset.map(resize).shuffle(1024).batch(batch_size)

# Build and compile the model inside the strategy scope so that its
# variables are mirrored on every GPU.
with strategy.scope():
    model = tf.keras.applications.MobileNetV2(weights=None, classes=2)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss=tf.keras.losses.sparse_categorical_crossentropy,
        metrics=[tf.keras.metrics.sparse_categorical_accuracy]
    )
print('Number of devices: {}'.format(strategy.num_replicas_in_sync))
model.fit(dataset, epochs=5)
# Number of devices: 4
# Epoch 1/5
# 91/91 [==============================] - 35s 390ms/step - loss: 0.6459 - sparse_categorical_accuracy: 0.6374
# Epoch 2/5
# 91/91 [==============================] - 34s 377ms/step - loss: 0.5499 - sparse_categorical_accuracy: 0.7225
# Epoch 3/5
# 91/91 [==============================] - 34s 373ms/step - loss: 0.4560 - sparse_categorical_accuracy: 0.7826
# Epoch 4/5
# 91/91 [==============================] - 35s 382ms/step - loss: 0.3811 - sparse_categorical_accuracy: 0.8285
# Epoch 5/5
# 91/91 [==============================] - 34s 379ms/step - loss: 0.3274 - sparse_categorical_accuracy: 0.8558
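The step counts and timings in the two logs can be cross-checked with a little arithmetic. The sketch below assumes the cats_vs_dogs training split holds 23,262 examples (a figure not stated in this article, though it is consistent with the 364 and 91 steps in the logs); the helper name steps_per_epoch is made up for this example. The epoch times are copied from the logged output above.

```python
import math
from statistics import mean


def steps_per_epoch(num_examples, per_device_batch, num_devices):
    """Number of batches per epoch when the global batch size is
    per_device_batch * num_devices and the last partial batch is kept."""
    global_batch = per_device_batch * num_devices
    return math.ceil(num_examples / global_batch)


NUM_EXAMPLES = 23262  # assumed size of the cats_vs_dogs train split

single_gpu_steps = steps_per_epoch(NUM_EXAMPLES, 64, 1)
four_gpu_steps = steps_per_epoch(NUM_EXAMPLES, 64, 4)

# Seconds per epoch, taken from the two training logs.
single_gpu_epoch_s = [110, 111, 110, 113, 113]
four_gpu_epoch_s = [35, 34, 34, 35, 34]

speedup = mean(single_gpu_epoch_s) / mean(four_gpu_epoch_s)
efficiency = speedup / 4  # fraction of ideal linear scaling on 4 GPUs

print(single_gpu_steps, four_gpu_steps)
print(round(speedup, 2), round(efficiency, 2))
```

Under these assumptions, four GPUs cut the epoch time by roughly a factor of 3.2, which is about 81% of ideal linear scaling. Each step takes somewhat longer (about 380 ms versus about 305 ms) because it processes four times as many examples and includes the gradient exchange. Note also that after five epochs the multi-GPU run reports a slightly lower accuracy (0.8558 versus 0.8919); it performed a quarter as many weight updates with the same learning rate.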
References
Source: https://blog.csdn.net/lly1122334/article/details/118931338