
An Investigation of GPU Memory Allocation During PyTorch Training

2021-11-19 11:32:19  Author: the Internet

Reference:

https://blog.csdn.net/qq_37189298/article/details/110945128

========================================

Code:

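import torch
from torch import cuda
import time

# float32 tensor with 1*1024*1024*256 elements = 1024 MiB
x = torch.zeros([1,1024,1024,128*2], requires_grad=True, device='cuda:0')

print("1", cuda.memory_allocated()/1024**2)  # MiB occupied by tensors

y = 5 * x  # non-leaf tensor, another 1024 MiB
# y.retain_grad()
print("2", cuda.memory_allocated()/1024**2)


torch.mean(y).backward()  # populates x.grad; the gradient of y is not kept
print("3", cuda.memory_allocated()/1024**2)
print(cuda.memory_summary())


time.sleep(60)  # keep the process alive so its GPU usage can be inspected (e.g. with nvidia-smi)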
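To read the three printed values, it helps to know how large each tensor is. The sketch below is plain arithmetic, not a PyTorch call; it assumes float32 (4 bytes per element, the default dtype of torch.zeros), and tensor_mib is a hypothetical helper. Because the script divides by 1024**2, its printed numbers are MiB.

```python
# Minimal sketch: expected tensor memory at each print in the script above.
# Assumption: float32, i.e. 4 bytes per element (torch.zeros' default dtype).
from math import prod

MIB = 1024 ** 2

def tensor_mib(shape, bytes_per_element=4):
    """Hypothetical helper: size of a dense tensor in MiB."""
    return prod(shape) * bytes_per_element / MIB

t = tensor_mib([1, 1024, 1024, 128 * 2])

# Live tensors at each print, ignoring the allocator's small size rounding.
checkpoints = {
    "1": t,       # x
    "2": 2 * t,   # x, y
    "3": 3 * t,   # x, y, x.grad (the gradient of y is freed after backward)
}
for label, mib in checkpoints.items():
    print(label, mib)
```

This prints 1024.0, 2048.0 and 3072.0; the values reported by cuda.memory_allocated() on a real GPU may differ slightly because the allocator rounds block sizes.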

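As can be seen, PyTorch occupies 4777 MB of GPU memory in total, of which tensors and PyTorch's cache together account for 4096 MB. Of that, 1024 MB is cache (memory PyTorch has reserved but no tensor is currently using), and it can be released manually. The modified code: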
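The split described above can be written as simple arithmetic. In PyTorch, tensor memory is what torch.cuda.memory_allocated() reports, torch.cuda.memory_reserved() additionally counts the caching allocator's cache, and whatever remains of the process total lies outside the caching allocator. The sketch plugs in the numbers from this post (4777 MB total, 4096 MB reserved, 3 × 1024 MB of tensors); memory_breakdown is a hypothetical helper, not a PyTorch API.

```python
# Hypothetical helper: split a process's GPU usage into three parts (MiB).
def memory_breakdown(process_total_mib, reserved_mib, allocated_mib):
    """process_total: the whole process (e.g. as shown by nvidia-smi)
    reserved: held by PyTorch's caching allocator (like memory_reserved())
    allocated: occupied by live tensors (like memory_allocated())"""
    if not (allocated_mib <= reserved_mib <= process_total_mib):
        raise ValueError("expected allocated <= reserved <= process total")
    return {
        "tensors": allocated_mib,
        # upper bound on what empty_cache() can hand back
        "cache": reserved_mib - allocated_mib,
        # not managed by the caching allocator (e.g. the CUDA context)
        "outside allocator": process_total_mib - reserved_mib,
    }

parts = memory_breakdown(4777, 4096, 3072)
print(parts)  # {'tensors': 3072, 'cache': 1024, 'outside allocator': 681}
```

The 681 MB left over matches the roughly 700 MB of "other memory" that the reference article attributes to non-tensor usage.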

import torch
from torch import cuda
import time

x = torch.zeros([1,1024,1024,128*2], requires_grad=True, device='cuda:0')

print("1", cuda.memory_allocated()/1024**2)

y = 5 * x
# y.retain_grad()
print("2", cuda.memory_allocated()/1024**2)


torch.mean(y).backward()
print("3", cuda.memory_allocated()/1024**2)


torch.cuda.empty_cache()  # release cached blocks that no tensor is using
print(cuda.memory_summary())


time.sleep(60)

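According to the reference article, 1024*3 MB is tensor memory (x, y and x.grad) and the remaining 700 MB or so is other memory; of the tensor memory, 1024 MB is x.grad. In addition, the peak GPU memory allocated during the run is 4096 MB, as shown in the figure below: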

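This peak includes 1024 MB each for x.grad and y.grad (here y.grad means the gradient with respect to y, which exists only temporarily during the backward pass).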
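The interplay of the 4096 MB peak, the 3072 MB left in tensors and the 1024 MB cache can be illustrated with a toy model. The class below is a hypothetical, heavily simplified stand-in for PyTorch's CUDA caching allocator (the real one also rounds, splits and pools blocks): freed blocks are kept in a cache instead of being returned to the driver, so reserved memory stays at the peak until empty_cache() is called. The order of events is an assumption mirroring the description above; sizes are in MiB.

```python
class ToyCachingAllocator:
    """Hypothetical, heavily simplified caching allocator (sizes in MiB)."""

    def __init__(self):
        self.allocated = 0        # occupied by live tensors
        self.peak_allocated = 0   # analogous to max_memory_allocated()
        self.cache = []           # freed blocks kept for reuse

    @property
    def reserved(self):
        # analogous to memory_reserved(): live tensors plus cached blocks
        return self.allocated + sum(self.cache)

    def malloc(self, size):
        if size in self.cache:    # cache hit: reuse a freed block of this size
            self.cache.remove(size)
        # on a miss, a real allocator would request new memory from the driver
        self.allocated += size
        self.peak_allocated = max(self.peak_allocated, self.allocated)

    def free(self, size):
        self.allocated -= size
        self.cache.append(size)   # keep the block instead of releasing it

    def empty_cache(self):
        self.cache.clear()        # hand all unused cached blocks back


alloc = ToyCachingAllocator()
alloc.malloc(1024)   # x
alloc.malloc(1024)   # y = 5 * x
alloc.malloc(1024)   # backward: gradient with respect to y
alloc.malloc(1024)   # backward: gradient with respect to x, kept as x.grad
alloc.free(1024)     # y is not a leaf, so its gradient is dropped
print(alloc.peak_allocated, alloc.allocated, alloc.reserved)  # 4096 3072 4096
alloc.empty_cache()
print(alloc.allocated, alloc.reserved)                        # 3072 3072
```

The model only captures the bookkeeping idea; it says nothing about how PyTorch's autograd actually schedules these buffers.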

If we keep the gradient of the non-leaf node, i.e. retain y.grad, and run:

import torch
from torch import cuda
import time

x = torch.zeros([1,1024,1024,128*2], requires_grad=True, device='cuda:0')

print("1", cuda.memory_allocated()/1024**2)

y = 5 * x
y.retain_grad()  # keep the gradient of the non-leaf tensor y as y.grad
print("2", cuda.memory_allocated()/1024**2)


torch.mean(y).backward()
print("3", cuda.memory_allocated()/1024**2)


torch.cuda.empty_cache()
print(cuda.memory_summary())


time.sleep(60)

the GPU runs out of memory; in other words, once y.grad is retained, total GPU memory usage comes close to 5.9 GB. So the same code was run on a Titan:

Total GPU memory:

Output:

================================================

Source: https://www.cnblogs.com/devilmaycry812839668/p/15576357.html