PyTorch error: CUDA error: device-side assert triggered

Author: Internet

Error message:

RuntimeError: CUDA error: device-side assert triggered
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [1,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [2,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [4,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [5,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [8,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [14,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [15,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [16,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [17,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [18,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [0,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [1,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [2,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [4,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [5,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [8,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [11,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [14,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [15,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [16,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [19,0,0] Assertion `t >= 0 && t < n_classes` failed.
terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: device-side assert triggered
Exception raised from create_event_internal at /pytorch/c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7fcb23d5c8b2 in /home/*/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xad2 (0x7fcb23fae952 in /home/*/anaconda3/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7fcb23d47b7d in /home/*/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #3: <unknown function> + 0x5ff5ba (0x7fcb727405ba in /home/*/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x5ff666 (0x7fcb72740666 in /home/*/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #35: __libc_start_main + 0xe7 (0x7fcb75ab7b97 in /lib/x86_64-linux-gnu/libc.so.6)

terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: device-side assert triggered
Exception raised from create_event_internal at /pytorch/c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7fceacad88b2 in /home/*/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xad2 (0x7fceacd2a952 in /home/*/anaconda3/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7fceacac3b7d in /home/*/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #3: <unknown function> + 0xaac8b9 (0x7fceeac528b9 in /home/*/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x27cfba6 (0x7fceec975ba6 in /home/*/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #5: torch::autograd::deleteNode(torch::autograd::Node*) + 0x112 (0x7fceecda57b2 in /home/*/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #6: <unknown function> + 0x485319 (0x7fcefb342319 in /home/*/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #7: c10::TensorImpl::release_resources() + 0x20 (0x7fceacac3b50 in /home/*/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #8: <unknown function> + 0x5ff5ba (0x7fcefb4bc5ba in /home/*/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #9: <unknown function> + 0x5ff666 (0x7fcefb4bc666 in /home/*/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #10: <unknown function> + 0x1a184f (0x55f609aa184f in /home/*/anaconda3/bin/python)
frame #11: <unknown function> + 0x10db9b (0x55f609a0db9b in /home/*/anaconda3/bin/python)
frame #12: <unknown function> + 0x1a184f (0x55f609aa184f in /home/*/anaconda3/bin/python)
frame #13: <unknown function> + 0x123b73 (0x55f609a23b73 in /home/*/anaconda3/bin/python)
frame #14: _PyGC_CollectNoFail + 0x2a (0x55f609b1bfda in /home/*/anaconda3/bin/python)
frame #15: PyImport_Cleanup + 0x29c (0x55f609aa7bcc in /home/*/anaconda3/bin/python)
frame #16: Py_FinalizeEx + 0x67 (0x55f609b23087 in /home/*/anaconda3/bin/python)
frame #17: <unknown function> + 0x235f93 (0x55f609b35f93 in /home/*/anaconda3/bin/python)
frame #18: _Py_UnixMain + 0x3c (0x55f609b362bc in /home/*/anaconda3/bin/python)
frame #19: __libc_start_main + 0xe7 (0x7fcefe833b97 in /lib/x86_64-linux-gnu/libc.so.6)
frame #20: <unknown function> + 0x1db062 (0x55f609adb062 in /home/*/anaconda3/bin/python)

Aborted (core dumped)

Cause:

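The failed assertion `t >= 0 && t < n_classes` shows that every target label t passed to the loss must be an integer in the range 0 to n_classes - 1, where n_classes is the size of the class dimension of the model output. In my case the labels for a binary classification task were 0 and 2, so the label 2 was out of range for a two-class output. Changing the labels to 0 and 1 fixed the error.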
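The fix can be sketched in plain Python as a label remapping step followed by a range check, run before the labels are turned into a target tensor. The helper names `remap_labels` and `check_labels` are hypothetical and not part of PyTorch; this is only an illustration of the idea, assuming the labels are available as a list of integers.

```python
def remap_labels(labels):
    """Map arbitrary integer labels to contiguous class indices 0..K-1.

    Hypothetical helper, not a PyTorch API.
    """
    classes = sorted(set(labels))
    index_of = {c: i for i, c in enumerate(classes)}
    return [index_of[c] for c in labels], index_of


def check_labels(labels, n_classes):
    """Raise ValueError if any label falls outside [0, n_classes - 1].

    Mirrors the device-side assertion `t >= 0 && t < n_classes`,
    but fails on the host with a readable message.
    """
    bad = sorted({t for t in labels if not 0 <= t < n_classes})
    if bad:
        raise ValueError(f"labels {bad} are outside [0, {n_classes - 1}]")


# Binary labels encoded as 0 and 2, as in the failing case described above.
raw = [0, 2, 2, 0, 2]
mapped, index_of = remap_labels(raw)
check_labels(mapped, n_classes=2)  # passes: every label is now 0 or 1

print(mapped)    # [0, 1, 1, 0, 1]
print(index_of)  # {0: 0, 2: 1}
```

The remapped list can then be converted to an integer target tensor as usual. Keeping `index_of` around makes it possible to translate predicted class indices back to the original label values.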

Source: https://blog.csdn.net/veritasalice/article/details/111917185