Training a DeepLabv3 model (PyTorch version) on Cityscapes on a remote Linux server
Author: Internet
References
https://blog.csdn.net/qq_45389690/article/details/111591713?utm_medium=distribute.pc_relevant_download.none-task-blog-baidujs-2.nonecase&depth_1-utm_source=distribute.pc_relevant_download.none-task-blog-baidujs-2.nonecase
https://blog.csdn.net/weixin_41919571/article/details/107906066
Code
https://github.com/jfzhang95/pytorch-deeplab-xception
Problem
ImportError: No module named pycocotools.coco
Solution
https://blog.csdn.net/u011961856/article/details/77676461
https://blog.csdn.net/joejeanjean/article/details/78839318?utm_medium=distribute.pc_relevant.none-task-blog-baidujs_title-6&spm=1001.2101.3001.4242
https://blog.csdn.net/haiyonghao/article/details/80472713?utm_medium=distribute.pc_relevant.none-task-blog-baidujs_title-11&spm=1001.2101.3001.4242
Be sure to copy everything in the PythonAPI directory except setup.py into the pytorch-deeplab-xception-master folder.
Problem
ImportError: No module named 'Queue'
Solution
https://blog.csdn.net/DarrenXf/article/details/82962412
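The fix itself is only given as a link above. One common cause of this error is the Python 2 to Python 3 rename of the Queue module to queue. Assuming that is the cause here, a compatibility import along the following lines lets the same code run on both versions. This is a sketch, not necessarily what the linked post recommends.

```python
# Try the Python 3 module name first, then fall back to the Python 2 name.
try:
    import queue  # Python 3
except ImportError:
    import Queue as queue  # Python 2

# Either way, the class is reachable as queue.Queue from here on.
q = queue.Queue()
q.put("item")
print(q.get())
```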
Problem
from utils.loss import SegmentationLosses
ImportError: No module named loss
References
https://blog.csdn.net/Diliduluw/article/details/103742766
Solution
Create an empty __init__.py in the utils folder so that Python treats it as a package.
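On a remote server without a file browser, the empty file can also be created from Python. A minimal sketch follows; the helper name ensure_package is hypothetical, and the path passed to it should be the utils folder inside the project checkout.

```python
import os


def ensure_package(directory):
    """Create an empty __init__.py in `directory` if it does not exist yet."""
    init_path = os.path.join(directory, "__init__.py")
    # Append mode creates the file when missing and leaves existing content alone.
    with open(init_path, "a"):
        pass
    return init_path
```

For example, ensure_package("utils") when run from the project root.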
Problem
AttributeError: 'module' object has no attribute 'kaiming_normal_'
References
https://blog.csdn.net/songchunxiao1991/article/details/83104893?utm_medium=distribute.pc_relevant_t0.none-task-blog-BlogCommendFromMachineLearnPai2-1.control&depth_1-utm_source=distribute.pc_relevant_t0.none-task-blog-BlogCommendFromMachineLearnPai2-1.control
https://blog.csdn.net/Aug0st/article/details/42707709
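Next, I deleted all the .pyc files (mainly because I did not dare to replace Python27\Lib\urllib2.pyc, for fear of affecting other algorithms).
Delete command: find /dir -name "*.pyc" | xargs rm -rf
That did not help.
So I set up a separate environment with Python 3.5 and PyTorch 0.4.1. The underscore-suffixed initializers such as kaiming_normal_ were introduced in PyTorch 0.4, which suggests the older install simply predated them.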
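Before rebuilding an environment, it can help to check whether the installed PyTorch actually provides the function named in the error. A small sketch, with a hypothetical helper name has_initializer, that returns None instead of failing when torch is not installed:

```python
import importlib


def has_initializer(name):
    """Return True/False if torch.nn.init has `name`, or None if torch is missing."""
    try:
        init = importlib.import_module("torch.nn.init")
    except ImportError:
        return None
    return hasattr(init, name)


print(has_initializer("kaiming_normal_"))
```

If this prints False, the installed PyTorch is too old for the code, and cleaning .pyc files will not change that.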
Problem
RuntimeError: CUDA error: out of memory
Solution
In train.py, change the default of the batch-size argument to 2.
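The change amounts to editing one argparse default. The sketch below is a stand-alone illustration, not the repository's actual train.py: the option name --batch-size comes from the text above, while the description and help strings are assumptions.

```python
import argparse


def build_parser():
    """Build a parser with a reduced default batch size (illustrative only)."""
    parser = argparse.ArgumentParser(description="training options (sketch)")
    parser.add_argument(
        "--batch-size",
        type=int,
        default=2,  # small default to fit in limited GPU memory
        metavar="N",
        help="input batch size for training",
    )
    return parser


# Parse an empty argument list so the default is used.
args = build_parser().parse_args([])
print(args.batch_size)
```

If the script already exposes this option, passing --batch-size 2 on the command line should have the same effect without editing the file.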
Problem
ValueError: Expected more than 1 value per channel when training, got input size [1, 256, 1, 1]
References
https://blog.csdn.net/weixin_43925119/article/details/109755329
https://www.cnblogs.com/zmbreathing/p/pyTorch_BN_error.html
https://blog.csdn.net/sinat_39307513/article/details/87917537
https://blog.csdn.net/qq_42079689/article/details/102587401?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-2.control&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-2.control
https://blog.csdn.net/jining11/article/details/111478935?utm_medium=distribute.pc_relevant.none-task-blog-baidujs_title-2&spm=1001.2101.3001.4242
https://blog.csdn.net/qq_36321330/article/details/108954588?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-2.control&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-2.control
https://blog.csdn.net/qq_34124009/article/details/109100053?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-4.control&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-4.control
https://blog.csdn.net/qq_21230831/article/details/103711545?utm_medium=distribute.pc_relevant.none-task-blog-OPENSEARCH-7.control&depth_1-utm_source=distribute.pc_relevant.none-task-blog-OPENSEARCH-7.control
https://blog.csdn.net/lrs1353281004/article/details/108262018?utm_medium=distribute.pc_relevant.none-task-blog-baidujs_title-7&spm=1001.2101.3001.4242
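The model contains nn.BatchNorm layers. In training mode they need more than one value per channel to compute the current batch's mean and std, which also update the running statistics. When the number of samples divided by batch_size leaves a remainder of exactly 1, the last batch contains a single sample and this error is raised.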
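The remainder condition is easy to check ahead of time. The sketch below assumes the training split has 2975 images, which is the size of the standard Cityscapes fine-annotation training set; substitute the real count if a different split is used. The helper name last_batch_size is hypothetical.

```python
def last_batch_size(num_samples, batch_size):
    """Size of the final batch when the incomplete batch is NOT dropped."""
    remainder = num_samples % batch_size
    return remainder if remainder else batch_size


NUM_TRAIN = 2975  # assumption: Cityscapes fine-annotation training images

for bs in (2, 3, 4, 5):
    # A final batch of size 1 is the case that triggers the BatchNorm error.
    print(bs, last_batch_size(NUM_TRAIN, bs))
# 2 1
# 3 2
# 4 3
# 5 5
```

Under that assumption, a batch size of 2 leaves a single-sample final batch, which matches the error appearing right after the batch size was lowered to 2.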
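Solution
Change batch_size so that the division does not leave a remainder of 1. However, when I raised it to 5, the GPU ran out of memory again and training failed.
So I located /usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py and set drop_last=True in it.
The file could not be opened directly on the remote server, so I copied it to my local machine, edited it, and copied it back.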
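Editing the installed dataloader.py changes the default for every program that uses that PyTorch install. DataLoader also accepts drop_last as a constructor argument, so a less invasive option is to pass it wherever the project builds its training loader. The sketch below uses a toy TensorDataset as a stand-in for the real dataset; the tensor shapes are arbitrary.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in: 5 samples, so batch_size=2 would leave a final batch of size 1.
images = torch.randn(5, 3, 8, 8)
labels = torch.zeros(5, dtype=torch.long)
dataset = TensorDataset(images, labels)

# drop_last=True discards the incomplete final batch instead of yielding it.
loader = DataLoader(dataset, batch_size=2, shuffle=True, drop_last=True)

batch_sizes = [batch_images.size(0) for batch_images, _ in loader]
print(batch_sizes)  # [2, 2]
```

The cost is that up to batch_size - 1 samples are skipped per epoch; with shuffling enabled, different samples are skipped each time.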
Source: https://blog.csdn.net/weixin_49636863/article/details/113673775