RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1556653215914/work/torch/lib/c10d/ProcessG
Author: Internet
Error in PyTorch distributed (dist) training:
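```python
import datetime
import torch.distributed as dist

def init_distributed(rank):
    # rank: this process's index, 0..world_size-1 (wrapper name is illustrative)
    dist.init_process_group(
        backend="nccl",
        init_method="file://./sharefile",  # file-based rendezvous shared by all processes
        world_size=3,
        rank=rank,
        timeout=datetime.timedelta(seconds=300))
```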
RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1556653215914/work/torch/lib/c10d/ProcessGroupNCCL.cpp:272, unhandled system error
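With the file-sharing init method (file://), the sharefile has to be deleted before every new run. But even after I deleted it, the same error kept appearing inside tmux.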
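Deleting the leftover file can be scripted. The sketch below is a minimal helper under stated assumptions: the function names rendezvous_file_path and remove_stale_rendezvous_file are hypothetical (not part of PyTorch), and it is meant to run once in the launcher before the worker processes start, so that no worker deletes a file another worker has already written. One detail worth checking: urllib.parse.urlparse treats the "." in file://./sharefile as the host, so the path component is /sharefile, not ./sharefile. Whether your PyTorch version resolves the store file the same way is an assumption to verify; an absolute URL such as file:///tmp/sharefile avoids the ambiguity.

```python
import os
from urllib.parse import urlparse


def rendezvous_file_path(init_method):
    """Return the path component of a file:// init_method URL (hypothetical helper)."""
    parsed = urlparse(init_method)
    if parsed.scheme != "file":
        raise ValueError(f"not a file:// URL: {init_method!r}")
    # For "file://./sharefile", urlparse puts "." in netloc and "/sharefile" in path.
    return parsed.path


def remove_stale_rendezvous_file(init_method):
    """Delete the store file left by a previous run; return True if one was removed."""
    path = rendezvous_file_path(init_method)
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        # Nothing left over (or the directory does not exist): nothing to clean up.
        return False
```

For example, a launcher could call remove_stale_rendezvous_file("file:///tmp/sharefile") once, then start the three ranks with the same init_method string.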
Solution:
Open a new tmux session and run the job there.
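The post does not say why a fresh tmux session helps. One guess (not confirmed by the author) is that the old session carried a stale or different environment. A quick way to test that guess is to dump the distributed-related environment variables in both sessions and compare them; the sketch below uses only the standard library, and the prefix list and function names are illustrative assumptions. Separately, exporting NCCL_DEBUG=INFO before launching training makes NCCL log more detail about what the "unhandled system error" actually was.

```python
import os

# Illustrative (not exhaustive) prefixes of variables that can affect
# NCCL / torch.distributed startup. str.startswith accepts a tuple.
WATCHED_PREFIXES = ("NCCL_", "CUDA_", "MASTER_", "WORLD_SIZE", "RANK", "LOCAL_RANK")


def distributed_env(environ=None):
    """Return only the watched variables from environ (defaults to os.environ)."""
    environ = os.environ if environ is None else environ
    return {k: v for k, v in environ.items() if k.startswith(WATCHED_PREFIXES)}


def env_diff(old, new):
    """Map each key whose value differs to (old_value, new_value); None means unset."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


if __name__ == "__main__":
    # Run this in the old and in the new tmux session, then compare the output.
    for key, value in sorted(distributed_env().items()):
        print(f"{key}={value}")
```

Saving each session's output and feeding the two dictionaries to env_diff shows only the variables that changed.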
Source: https://blog.csdn.net/Rlin_by/article/details/117827671