one of the variables needed for gradient computation has been modified by an inplace operation

Author: Internet

A note on a bug I ran into during multi-GPU training with PyTorch.
The error was:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512]] is at version 30; expected version 29 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

This only happened with multi-GPU training; single-GPU training ran fine.

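Following advice found online, I first wrapped the failing code in a with torch.autograd.set_detect_anomaly(True): block. When the error fires, this also prints the stack trace of the forward operation whose gradient could not be computed. In my case the trace pointed at BatchNorm.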
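To show what anomaly detection looks like in practice, here is a minimal sketch that triggers the same class of error on the CPU. It is not my training code: the tensor x and the function name run_with_anomaly_detection are made up for illustration, and the in-place add_ stands in for whatever modified the BatchNorm tensor in the real run.

```python
import torch


def run_with_anomaly_detection():
    """Trigger an in-place modification error and return its message."""
    x = torch.ones(3, requires_grad=True)
    with torch.autograd.set_detect_anomaly(True):
        y = x.exp()   # exp() saves its output because backward needs it
        y.add_(1)     # in-place edit bumps the version counter of y
        try:
            y.sum().backward()
        except RuntimeError as err:
            # Anomaly mode additionally emits a warning with the forward
            # stack trace of the operation that failed in backward.
            return str(err)
    return None


message = run_with_anomaly_detection()
print(message)
```

Anomaly detection slows training down noticeably, so it is best enabled only while debugging.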

A search turned up this solution: https://discuss.pytorch.org/t/ddp-sync-batch-norm-gradient-computation-modified/82847/5

The fix is to pass broadcast_buffers=False when constructing the DDP (DistributedDataParallel) wrapper.
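A minimal sketch of where the argument goes. My understanding is that by default DDP broadcasts module buffers (such as the BatchNorm running statistics) from rank 0 at the start of each forward pass, and that this write is the in-place modification autograd complains about; treat that explanation as my reading rather than something confirmed above. The toy model, the single-process gloo group, and the file-based rendezvous are assumptions made only so the snippet runs on one CPU machine. A real job would be launched with one process per GPU.

```python
import os
import tempfile

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process group on CPU, only so this sketch is self-contained.
tmp_dir = tempfile.TemporaryDirectory()
init_file = os.path.join(tmp_dir.name, "ddp_init")
dist.init_process_group(
    backend="gloo",
    init_method=f"file://{init_file}",
    rank=0,
    world_size=1,
)

# Toy model with a BatchNorm layer; its buffers have shape [512].
model = nn.Sequential(
    nn.Linear(8, 512),
    nn.BatchNorm1d(512),
    nn.ReLU(),
    nn.Linear(512, 1),
)

# broadcast_buffers=False stops DDP from overwriting the buffers
# with rank 0's copy at the start of every forward pass.
ddp_model = DDP(model, broadcast_buffers=False)

out = ddp_model(torch.randn(4, 8))
out.mean().backward()
grads_ok = all(p.grad is not None for p in ddp_model.parameters())
print("all parameters received gradients:", grads_ok)

dist.destroy_process_group()
tmp_dir.cleanup()
```

One trade-off to keep in mind: with broadcasting disabled, each process keeps its own BatchNorm running statistics, so they can drift apart between ranks.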

Source: https://www.cnblogs.com/jiading/p/14842397.html