RuntimeError: ProcessGroupNCCL is only supported with GPUs, no GPUs found
Author: Internet
Today, when I ran my job on the GPU, it failed with:
pg = ProcessGroupNCCL(prefix_store, rank, world_size, pg_options)
RuntimeError: ProcessGroupNCCL is only supported with GPUs, no GPUs found!
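The error is raised when `torch.distributed` creates a process group with the NCCL backend, which requires visible CUDA devices. While you are still diagnosing the cause, one option is to fall back to the CPU-capable `gloo` backend when no GPU is visible. This is only a minimal sketch: the helper names `choose_backend` and `detect_backend` are hypothetical, and the fallback merely hides the symptom. In this post the GPU was fine and the real problem was still the driver/PyTorch mismatch.

```python
# Hypothetical helpers: pick "nccl" only when CUDA and NCCL are both usable,
# otherwise fall back to "gloo" so the process group can still be created on CPU.
try:
    import torch
    import torch.distributed as dist
except ImportError:  # lets the helpers run even without PyTorch installed
    torch = None
    dist = None


def choose_backend(cuda_available: bool, nccl_available: bool = True) -> str:
    """Return "nccl" only when CUDA devices and NCCL are both usable, else "gloo"."""
    if cuda_available and nccl_available:
        return "nccl"
    return "gloo"


def detect_backend() -> str:
    """Inspect the local PyTorch install and pick a process-group backend."""
    if torch is None or not torch.cuda.is_available():
        return "gloo"
    nccl_ok = dist.is_available() and dist.is_nccl_available()
    return choose_backend(True, nccl_ok)


if __name__ == "__main__":
    backend = detect_backend()
    print(f"selected backend: {backend}")
    # With MASTER_ADDR, MASTER_PORT, RANK and WORLD_SIZE set in the environment,
    # the process group could then be created with:
    # dist.init_process_group(backend=backend)
```

If `detect_backend()` prints `gloo` on a machine that definitely has a GPU, that is itself the clue: PyTorch cannot see the device, which is exactly what happened here.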
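At first this error made me suspect the GPU had died (-_-||), but my labmates were sure the GPU was fine! And so began the bug hunt...
Poking around in the Python shell on the command line finally exposed the culprit: it looked like PyTorch was the problem. Ugh!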
>>> import torch
>>> print(torch.cuda.is_available())
/home/xutianjiao/anaconda3/envs/py36/lib/python3.6/site-packages/torch/cuda/__init__.py:80: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 9020). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:112.)
  return torch._C._cuda_getDeviceCount() > 0
False
>>> print(torch.cuda.get_device_name(0))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xutianjiao/anaconda3/envs/py36/lib/python3.6/site-packages/torch/cuda/__init__.py", line 326, in get_device_name
    return get_device_properties(device).name
  File "/home/xutianjiao/anaconda3/envs/py36/lib/python3.6/site-packages/torch/cuda/__init__.py", line 356, in get_device_properties
    _lazy_init() # will define _get_device_properties
  File "/home/xutianjiao/anaconda3/envs/py36/lib/python3.6/site-packages/torch/cuda/__init__.py", line 214, in _lazy_init
    torch._C._cuda_init()
RuntimeError: The NVIDIA driver on your system is too old (found version 9020). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.
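The "found version 9020" in that message appears to be the CUDA version reported by the driver, which CUDA encodes as `1000 * major + 10 * minor`; decoding it gives 9.2. The sketch below decodes such a number and collects the PyTorch-side values worth comparing against it. The function names are hypothetical; `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()` are standard PyTorch attributes.

```python
# Sketch: decode the driver's CUDA version number and report what PyTorch was built with.
try:
    import torch
except ImportError:  # report() degrades gracefully without PyTorch
    torch = None


def decode_cuda_version(version: int) -> str:
    """Decode a CUDA version integer encoded as 1000*major + 10*minor."""
    major = version // 1000
    minor = (version % 1000) // 10
    return f"{major}.{minor}"


def report() -> dict:
    """Collect the values worth comparing when torch.cuda.is_available() is False."""
    if torch is None:
        return {"torch": None}
    return {
        "torch": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(decode_cuda_version(9020))  # the value from the error message
    print(report())
```

For 9020 this prints `9.2`, so any torch build compiled against a newer CUDA than 9.2 would trigger the "driver too old" error on this machine.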
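Searching for this error showed that the CUDA and torch versions did not match: the driver reports version 9020, i.e. it only supports CUDA up to 9.2, which is older than the CUDA version the installed torch was built against.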
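As a rough rule, a PyTorch build compiled against CUDA X.Y needs a driver that supports at least X.Y. The check below encodes that simplified rule. It is an assumption-laden sketch: `driver_supports` and `parse_version` are hypothetical names, and the rule ignores the minor-version compatibility that CUDA 11 and later allow within the same major version.

```python
# Simplified compatibility check between the driver's CUDA version and a torch build.
def parse_version(text: str) -> tuple:
    """Turn a string such as "10.2" into a comparable (major, minor) tuple."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def driver_supports(driver_version: int, torch_cuda: str) -> bool:
    """Return True if the driver's CUDA version is >= the build's CUDA version.

    driver_version uses the 1000*major + 10*minor encoding seen in the error
    ("found version 9020"); torch_cuda is a string like torch.version.cuda.
    Ignores the minor-version compatibility that CUDA 11+ allows.
    """
    driver = (driver_version // 1000, (driver_version % 1000) // 10)
    return driver >= parse_version(torch_cuda)


if __name__ == "__main__":
    # Hypothetical example: a torch wheel built for CUDA 10.2 on a CUDA 9.2 driver.
    print(driver_supports(9020, "10.2"))
```

Tuples compare element by element, so `(9, 2) >= (10, 2)` is `False`, which matches the "driver too old" situation above.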
Next I checked the PyTorch version: 1.10+. Right, so let's try installing an older torch!
pip install torch==1.7.0
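Before relaunching the distributed job, it is worth repeating the REPL check from above against the reinstalled build. The sketch below does that and also runs one small operation on the GPU; `gpu_smoke_test` is a hypothetical name, and it returns a status string instead of raising so it can be run on any machine.

```python
# Sketch: confirm the reinstalled PyTorch can actually use the GPU.
try:
    import torch
except ImportError:
    torch = None


def gpu_smoke_test() -> str:
    """Return a short status string instead of raising, so it is safe to run anywhere."""
    if torch is None:
        return "torch not installed"
    if not torch.cuda.is_available():
        return f"torch {torch.__version__}: CUDA not available"
    # Allocate a small tensor on the first GPU and do one operation on it.
    x = torch.ones(2, 2, device="cuda:0")
    total = (x @ x).sum().item()  # 2x2 ones matmul -> every entry is 2, sum is 8
    name = torch.cuda.get_device_name(0)
    return f"torch {torch.__version__} on {name}: ok (sum={total})"


if __name__ == "__main__":
    print(gpu_smoke_test())
```

Which older version works depends on your driver; 1.7.0 is simply what worked on this machine.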
Problem solved!
Source: https://blog.csdn.net/xu823508091/article/details/122340013