A broadcasting pitfall when training PyTorch models: the loss fails to converge
While training a very simple curve-fitting model, I ran into the following warning:
C:/Users/user/Desktop/test/test.py:58: UserWarning: Using a target size (torch.Size([30000])) that is different to the input size (torch.Size([30000, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
loss = F.mse_loss(out, yt)
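The warning can be reproduced with a minimal sketch (shapes shrunk from 30000 to 5 for readability; the variable names `out` and `yt` follow the snippet above):

```python
import torch
import torch.nn.functional as F

out = torch.randn(5, 1)   # model output: shape [5, 1]
yt = torch.randn(5)       # target:       shape [5]

# This call emits the UserWarning above: the shapes differ, and the
# subtraction inside mse_loss broadcasts both tensors to [5, 5].
loss = F.mse_loss(out, yt)
assert (out - yt).shape == (5, 5)
```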
The two inputs to mse_loss, input and target, have different shapes: one is torch.Size([30000, 1]) and the other torch.Size([30000]). When the element-wise computation runs, both tensors are broadcast to torch.Size([30000, 30000]), and that is where things go wrong.
With the default reduction, mse_loss returns the sum of the squared element-wise differences (target - input) divided by the total number of elements (width x height), i.e. it averages over both the batch and feature dimensions. But after broadcasting, the two tensors look like this:
# Original tensor: yt
yt tensor([ 96.6252, -83.4613, -1.6751, ..., 1.8656, -15.8007, -30.5789]) torch.Size([30000])
# After broadcasting: yt
yt tensor([[ 96.6252, -83.4613, -1.6751, ..., 1.8656, -15.8007, -30.5789],
[ 96.6252, -83.4613, -1.6751, ..., 1.8656, -15.8007, -30.5789],
[ 96.6252, -83.4613, -1.6751, ..., 1.8656, -15.8007, -30.5789],
...,
[ 96.6252, -83.4613, -1.6751, ..., 1.8656, -15.8007, -30.5789],
[ 96.6252, -83.4613, -1.6751, ..., 1.8656, -15.8007, -30.5789],
[ 96.6252, -83.4613, -1.6751, ..., 1.8656, -15.8007, -30.5789]])
###############################################################################
# Original tensor: out
out tensor([[62.2171],
[34.9442],
[92.2927],
...,
[16.6877],
[35.8723],
[60.5973]], grad_fn=<MmBackward>) torch.Size([30000, 1])
# After broadcasting: out
out tensor([[62.2171, 62.2171, 62.2171, ..., 62.2171, 62.2171, 62.2171],
[34.9442, 34.9442, 34.9442, ..., 34.9442, 34.9442, 34.9442],
[92.2927, 92.2927, 92.2927, ..., 92.2927, 92.2927, 92.2927],
...,
[16.6877, 16.6877, 16.6877, ..., 16.6877, 16.6877, 16.6877],
[35.8723, 35.8723, 35.8723, ..., 35.8723, 35.8723, 35.8723],
[60.5973, 60.5973, 60.5973, ..., 60.5973, 60.5973, 60.5973]],
grad_fn=<ExpandBackward>)
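The expansion shown in the dumps can be reproduced directly with torch.broadcast_tensors (a small-scale sketch using the first three values from the dumps above): yt is repeated along rows, out along columns.

```python
import torch

yt = torch.tensor([96.6252, -83.4613, -1.6751])        # shape [3]
out = torch.tensor([[62.2171], [34.9442], [92.2927]])  # shape [3, 1]

# Broadcasting aligns [3] against [3, 1], expanding both to [3, 3].
yt_b, out_b = torch.broadcast_tensors(yt, out)
assert yt_b.shape == out_b.shape == (3, 3)

assert torch.equal(yt_b[0], yt_b[1])          # rows of yt_b are identical copies
assert torch.equal(out_b[:, 0], out_b[:, 1])  # columns of out_b are identical copies
```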
When the element-wise computation runs on these broadcast tensors, every prediction is compared against every target instead of its own, so the computed loss is not what was intended, and the model cannot converge.
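The fix is to make the two shapes match before calling mse_loss, for example by unsqueezing the target (or, equivalently, squeezing the output). A minimal sketch with 3 samples instead of 30000:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
out = torch.randn(3, 1)   # model output: shape [3, 1]
yt = torch.randn(3)       # target:       shape [3]

# Align the shapes: target becomes [3, 1], matching the output.
loss_fixed = F.mse_loss(out, yt.unsqueeze(1))

# Sanity check against an explicit per-sample MSE in shape [3].
loss_manual = ((out.squeeze(1) - yt) ** 2).mean()
assert torch.allclose(loss_fixed, loss_manual)
```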
Source: https://blog.csdn.net/weixin_42149550/article/details/117373799