Training Log Record

Author: Internet

Training log

 

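The GPUs could not be used. The CUDA libraries that this TensorFlow build looks for (libcudart.so.10.0, libcudnn.so.7 and the others listed in the "Could not load dynamic library" warnings in the log below) were not found, so TensorFlow skipped registering the GPU devices. I will deal with it later when I have time.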
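As a quick way to confirm which libraries are missing, here is a minimal sketch that tries to open each shared library named in the warnings. The library names are copied from the log below. The helper name find_missing_libs is hypothetical, and ctypes.CDLL is only a stand-in for TensorFlow's own loader, so the result is a hint rather than a definitive diagnosis.

```python
import ctypes

# Library names copied from the "Could not load dynamic library" warnings.
REQUIRED_LIBS = [
    "libcudart.so.10.0",
    "libcublas.so.10.0",
    "libcufft.so.10.0",
    "libcurand.so.10.0",
    "libcusolver.so.10.0",
    "libcusparse.so.10.0",
    "libcudnn.so.7",
]


def find_missing_libs(names):
    """Return the names that the dynamic loader cannot open."""
    missing = []
    for name in names:
        try:
            # On Linux this goes through dlopen, which honours LD_LIBRARY_PATH.
            ctypes.CDLL(name)
        except OSError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in find_missing_libs(REQUIRED_LIBS):
        print("missing:", name)
```

In the log, LD_LIBRARY_PATH points at /usr/local/cuda-11.4/lib64, which presumably holds the 11.x versions of these libraries rather than the 10.0 versions requested here.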

 

(mask_rcnn) bim@bim-PowerEdge-R730:~/project/object_detection/pythons/Mask_RCNN/samples/coco$ python csc.py train --dataset=/home/bim/project/object_detection/DatasetV3 --model=/home/bim/project/object_detection/DatasetV3/mask_rcnn_coco.h5
Using TensorFlow backend.
Command:  train
Model:  /home/bim/project/object_detection/DatasetV3/mask_rcnn_coco.h5
Dataset:  /home/bim/project/object_detection/DatasetV3
Year:  2014
Logs:  /home/bim/project/object_detection/pythons/Mask_RCNN/samples/coco/D:\object_detection\DatasetV3/logs
Auto Download:  False
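The Logs path above has a Windows-style path (D:\object_detection\DatasetV3) embedded in a Linux path. One plausible cause is that csc.py resolves a hard-coded Windows path on Linux, where it counts as a relative path and is appended to the working directory. This is an assumption, since csc.py is not shown. The sketch below reproduces the effect with posixpath and ntpath so that it behaves the same on any platform; the variable names are illustrative.

```python
import ntpath
import posixpath

# Working directory and Windows-style path as they appear in the log.
cwd = "/home/bim/project/object_detection/pythons/Mask_RCNN/samples/coco"
windows_style = r"D:\object_detection\DatasetV3"

# Under POSIX rules the string has no leading "/", so it is relative.
print(posixpath.isabs(windows_style))  # False
# Under Windows rules the same string is an absolute path.
print(ntpath.isabs(windows_style))     # True

# Joining it onto the working directory yields the path printed after "Logs:".
logs_dir = posixpath.join(cwd, windows_style, "logs")
print(logs_dir)
```

If this is indeed the cause, passing a Linux path explicitly should avoid the odd directory, for example through the --logs option, assuming csc.py keeps that option from the upstream Mask_RCNN sample script.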

Configurations:
BACKBONE                       resnet101
BACKBONE_STRIDES               [4, 8, 16, 32, 64]
BATCH_SIZE                     16
BBOX_STD_DEV                   [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE         None
DETECTION_MAX_INSTANCES        100
DETECTION_MIN_CONFIDENCE       0.7
DETECTION_NMS_THRESHOLD        0.3
FPN_CLASSIF_FC_LAYERS_SIZE     1024
GPU_COUNT                      2
GRADIENT_CLIP_NORM             5.0
IMAGES_PER_GPU                 8
IMAGE_CHANNEL_COUNT            3
IMAGE_MAX_DIM                  1024
IMAGE_META_SIZE                14
IMAGE_MIN_DIM                  800
IMAGE_MIN_SCALE                0
IMAGE_RESIZE_MODE              square
IMAGE_SHAPE                    [1024 1024    3]
LEARNING_MOMENTUM              0.9
LEARNING_RATE                  0.001
LOSS_WEIGHTS                   {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE                 14
MASK_SHAPE                     [28, 28]
MAX_GT_INSTANCES               100
MEAN_PIXEL                     [123.7 116.8 103.9]
MINI_MASK_SHAPE                (56, 56)
NAME                           coco
NUM_CLASSES                    2
POOL_SIZE                      7
POST_NMS_ROIS_INFERENCE        1000
POST_NMS_ROIS_TRAINING         2000
PRE_NMS_LIMIT                  6000
ROI_POSITIVE_RATIO             0.33
RPN_ANCHOR_RATIOS              [0.5, 1, 2]
RPN_ANCHOR_SCALES              (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE              1
RPN_BBOX_STD_DEV               [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD              0.7
RPN_TRAIN_ANCHORS_PER_IMAGE    256
STEPS_PER_EPOCH                1000
TOP_DOWN_PYRAMID_SIZE          256
TRAIN_BN                       False
TRAIN_ROIS_PER_IMAGE           200
USE_MINI_MASK                  True
USE_RPN_ROIS                   True
VALIDATION_STEPS               50
WEIGHT_DECAY                   0.0001
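Some of the values printed above appear to be derived from others: BATCH_SIZE matches IMAGES_PER_GPU times GPU_COUNT, IMAGE_SHAPE matches IMAGE_MAX_DIM and IMAGE_CHANNEL_COUNT in square resize mode, and IMAGE_META_SIZE matches 12 plus NUM_CLASSES. The sketch below re-derives them from the printed inputs. TrainConfigSketch is a hypothetical stand-in, not the Mask_RCNN Config class, and the breakdown of the 12 fixed meta fields in the comment is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TrainConfigSketch:
    """Hypothetical stand-in holding a few values printed in the log."""
    gpu_count: int = 2
    images_per_gpu: int = 8
    image_max_dim: int = 1024
    image_channel_count: int = 3
    num_classes: int = 2

    @property
    def batch_size(self) -> int:
        # Every GPU processes images_per_gpu images per step.
        return self.images_per_gpu * self.gpu_count

    @property
    def image_shape(self):
        # Square resize mode pads every image to max_dim x max_dim.
        return [self.image_max_dim, self.image_max_dim, self.image_channel_count]

    @property
    def image_meta_size(self) -> int:
        # Assumed layout: id (1) + original shape (3) + resized shape (3)
        # + window (4) + scale (1) + one flag per class.
        return 1 + 3 + 3 + 4 + 1 + self.num_classes


cfg = TrainConfigSketch()
print(cfg.batch_size, cfg.image_shape, cfg.image_meta_size)
```

Eight 1024x1024 images per GPU is likely demanding for a 12 GB card, so IMAGES_PER_GPU may need lowering once the GPUs are usable.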


WARNING:tensorflow:From /home/bim/anaconda3/envs/mask_rcnn/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:504: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /home/bim/anaconda3/envs/mask_rcnn/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:68: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING:tensorflow:From /home/bim/anaconda3/envs/mask_rcnn/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:3828: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING:tensorflow:From /home/bim/anaconda3/envs/mask_rcnn/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:3652: The name tf.nn.max_pool is deprecated. Please use tf.nn.max_pool2d instead.

WARNING:tensorflow:From /home/bim/anaconda3/envs/mask_rcnn/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:1937: The name tf.image.resize_nearest_neighbor is deprecated. Please use tf.compat.v1.image.resize_nearest_neighbor instead.

WARNING:tensorflow:From /home/bim/anaconda3/envs/mask_rcnn/lib/python3.7/site-packages/tensorflow_core/python/ops/array_ops.py:1475: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /home/bim/project/object_detection/pythons/Mask_RCNN/samples/coco/mrcnn/model.py:553: The name tf.random_shuffle is deprecated. Please use tf.random.shuffle instead.

WARNING:tensorflow:From /home/bim/project/object_detection/pythons/Mask_RCNN/samples/coco/mrcnn/utils.py:202: The name tf.log is deprecated. Please use tf.math.log instead.

WARNING:tensorflow:From /home/bim/project/object_detection/pythons/Mask_RCNN/samples/coco/mrcnn/model.py:600: calling crop_and_resize_v1 (from tensorflow.python.ops.image_ops_impl) with box_ind is deprecated and will be removed in a future version.
Instructions for updating:
box_ind is deprecated, use box_indices instead
WARNING:tensorflow:From /home/bim/anaconda3/envs/mask_rcnn/lib/python3.7/site-packages/tensorflow_core/python/ops/variables.py:2825: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
Loading weights  /home/bim/project/object_detection/DatasetV3/mask_rcnn_coco.h5
WARNING:tensorflow:From /home/bim/anaconda3/envs/mask_rcnn/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:166: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

WARNING:tensorflow:From /home/bim/anaconda3/envs/mask_rcnn/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:171: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /home/bim/anaconda3/envs/mask_rcnn/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:176: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2022-05-17 18:04:11.226652: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2022-05-17 18:04:11.255268: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2399985000 Hz
2022-05-17 18:04:11.256626: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5571a4cee720 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-05-17 18:04:11.256648: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2022-05-17 18:04:11.259625: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2022-05-17 18:04:12.618884: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5571a4d68950 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2022-05-17 18:04:12.618946: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla P100-PCIE-12GB, Compute Capability 6.0
2022-05-17 18:04:12.618965: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (1): Tesla P100-PCIE-12GB, Compute Capability 6.0
2022-05-17 18:04:12.621074: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: Tesla P100-PCIE-12GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:04:00.0
2022-05-17 18:04:12.622743: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties: 
name: Tesla P100-PCIE-12GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:82:00.0
2022-05-17 18:04:12.623031: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/bim/anaconda3/envs/mask_rcnn/lib/python3.7/site-packages/cv2/../../lib64:/usr/local/cuda-11.4/lib64::/usr/local/cuda/lib64:/usr/local/cuda/lib64
2022-05-17 18:04:12.623239: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/bim/anaconda3/envs/mask_rcnn/lib/python3.7/site-packages/cv2/../../lib64:/usr/local/cuda-11.4/lib64::/usr/local/cuda/lib64:/usr/local/cuda/lib64
2022-05-17 18:04:12.623423: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/bim/anaconda3/envs/mask_rcnn/lib/python3.7/site-packages/cv2/../../lib64:/usr/local/cuda-11.4/lib64::/usr/local/cuda/lib64:/usr/local/cuda/lib64
2022-05-17 18:04:12.623611: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/bim/anaconda3/envs/mask_rcnn/lib/python3.7/site-packages/cv2/../../lib64:/usr/local/cuda-11.4/lib64::/usr/local/cuda/lib64:/usr/local/cuda/lib64
2022-05-17 18:04:12.623792: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/bim/anaconda3/envs/mask_rcnn/lib/python3.7/site-packages/cv2/../../lib64:/usr/local/cuda-11.4/lib64::/usr/local/cuda/lib64:/usr/local/cuda/lib64
2022-05-17 18:04:12.623970: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/bim/anaconda3/envs/mask_rcnn/lib/python3.7/site-packages/cv2/../../lib64:/usr/local/cuda-11.4/lib64::/usr/local/cuda/lib64:/usr/local/cuda/lib64
2022-05-17 18:04:12.624152: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/bim/anaconda3/envs/mask_rcnn/lib/python3.7/site-packages/cv2/../../lib64:/usr/local/cuda-11.4/lib64::/usr/local/cuda/lib64:/usr/local/cuda/lib64
2022-05-17 18:04:12.624181: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2022-05-17 18:04:12.624220: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-05-17 18:04:12.624242: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 1 
2022-05-17 18:04:12.624261: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N N 
2022-05-17 18:04:12.624277: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 1:   N N 
WARNING:tensorflow:From /home/bim/anaconda3/envs/mask_rcnn/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:180: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
WARNING:tensorflow:From /home/bim/anaconda3/envs/mask_rcnn/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:189: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.

2022-05-17 18:04:13.777799: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
  /job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices: 
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
IsVariableInitialized: CPU 
Identity: CPU XLA_CPU XLA_GPU 
VariableV2: CPU 
Assign: CPU 

Colocation members, user-requested devices, and framework assigned devices, if any:
  tower_0/mask_rcnn/anchors/Variable (VariableV2) /device:GPU:0
  tower_0/mask_rcnn/anchors/Variable/Assign (Assign) /device:GPU:0
  tower_0/mask_rcnn/anchors/Variable/read (Identity) /device:GPU:0
  IsVariableInitialized_692 (IsVariableInitialized) /device:GPU:0

2022-05-17 18:04:13.778241: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
  /job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices: 
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
IsVariableInitialized: CPU 
Identity: CPU XLA_CPU XLA_GPU 
VariableV2: CPU 
Assign: CPU 

Colocation members, user-requested devices, and framework assigned devices, if any:
  tower_0/mask_rcnn/Variable (VariableV2) /device:GPU:0
  tower_0/mask_rcnn/Variable/Assign (Assign) /device:GPU:0
  tower_0/mask_rcnn/Variable/read (Identity) /device:GPU:0
  IsVariableInitialized_693 (IsVariableInitialized) /device:GPU:0

2022-05-17 18:04:13.791112: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
  /job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices: 
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:1' assigned_device_name_='' resource_device_name_='/device:GPU:1' supported_device_types_=[CPU] possible_devices_=[]
IsVariableInitialized: CPU 
Identity: CPU XLA_CPU XLA_GPU 
VariableV2: CPU 
Assign: CPU 

Colocation members, user-requested devices, and framework assigned devices, if any:
  tower_1/mask_rcnn/anchors/Variable (VariableV2) /device:GPU:1
  tower_1/mask_rcnn/anchors/Variable/Assign (Assign) /device:GPU:1
  tower_1/mask_rcnn/anchors/Variable/read (Identity) /device:GPU:1
  IsVariableInitialized_694 (IsVariableInitialized) /device:GPU:1

2022-05-17 18:04:13.791479: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
  /job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices: 
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:1' assigned_device_name_='' resource_device_name_='/device:GPU:1' supported_device_types_=[CPU] possible_devices_=[]
IsVariableInitialized: CPU 
Identity: CPU XLA_CPU XLA_GPU 
VariableV2: CPU 
Assign: CPU 

Colocation members, user-requested devices, and framework assigned devices, if any:
  tower_1/mask_rcnn/Variable (VariableV2) /device:GPU:1
  tower_1/mask_rcnn/Variable/Assign (Assign) /device:GPU:1
  tower_1/mask_rcnn/Variable/read (Identity) /device:GPU:1
  IsVariableInitialized_695 (IsVariableInitialized) /device:GPU:1

WARNING:tensorflow:From /home/bim/anaconda3/envs/mask_rcnn/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:196: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Training network heads

Starting at epoch 0. LR=0.001

Checkpoint Path: /home/bim/project/object_detection/pythons/Mask_RCNN/samples/coco/D:\object_detection\DatasetV3/logs/coco20220517T1804/mask_rcnn_coco_{epoch:04d}.h5
Selecting layers to train
fpn_c5p5               (Conv2D)
fpn_c4p4               (Conv2D)
fpn_c3p3               (Conv2D)
fpn_c2p2               (Conv2D)
fpn_p5                 (Conv2D)
fpn_p2                 (Conv2D)
fpn_p3                 (Conv2D)
fpn_p4                 (Conv2D)
In model:  rpn_model
    rpn_conv_shared        (Conv2D)
    rpn_class_raw          (Conv2D)
    rpn_bbox_pred          (Conv2D)
mrcnn_mask_conv1       (TimeDistributed)
mrcnn_mask_bn1         (TimeDistributed)
mrcnn_mask_conv2       (TimeDistributed)
mrcnn_mask_bn2         (TimeDistributed)
mrcnn_class_conv1      (TimeDistributed)
mrcnn_class_bn1        (TimeDistributed)
mrcnn_mask_conv3       (TimeDistributed)
mrcnn_mask_bn3         (TimeDistributed)
mrcnn_class_conv2      (TimeDistributed)
mrcnn_class_bn2        (TimeDistributed)
mrcnn_mask_conv4       (TimeDistributed)
mrcnn_mask_bn4         (TimeDistributed)
mrcnn_bbox_fc          (TimeDistributed)
mrcnn_mask_deconv      (TimeDistributed)
mrcnn_class_logits     (TimeDistributed)
mrcnn_mask             (TimeDistributed)
WARNING:tensorflow:From /home/bim/anaconda3/envs/mask_rcnn/lib/python3.7/site-packages/keras/optimizers.py:744: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

/home/bim/anaconda3/envs/mask_rcnn/lib/python3.7/site-packages/tensorflow_core/python/framework/indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
/home/bim/anaconda3/envs/mask_rcnn/lib/python3.7/site-packages/tensorflow_core/python/framework/indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
/home/bim/anaconda3/envs/mask_rcnn/lib/python3.7/site-packages/tensorflow_core/python/framework/indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
WARNING:tensorflow:From /home/bim/anaconda3/envs/mask_rcnn/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:973: The name tf.assign_add is deprecated. Please use tf.compat.v1.assign_add instead.

WARNING:tensorflow:From /home/bim/anaconda3/envs/mask_rcnn/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:960: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.

/home/bim/anaconda3/envs/mask_rcnn/lib/python3.7/site-packages/keras/engine/training.py:2033: UserWarning: Using a generator with `use_multiprocessing=True` and multiple workers may duplicate your data. Please consider using the`keras.utils.Sequence class.
  UserWarning('Using a generator with `use_multiprocessing=True`'
WARNING:tensorflow:From /home/bim/anaconda3/envs/mask_rcnn/lib/python3.7/site-packages/keras/callbacks.py:714: The name tf.summary.merge_all is deprecated. Please use tf.compat.v1.summary.merge_all instead.

WARNING:tensorflow:From /home/bim/anaconda3/envs/mask_rcnn/lib/python3.7/site-packages/keras/callbacks.py:717: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

csc.py:267: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  mask = np.stack(instance_masks, axis=2).astype(np.bool)
csc.py:267: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  mask = np.stack(instance_masks, axis=2).astype(np.bool)
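The DeprecationWarning above points at line 267 of csc.py, which casts the stacked masks with np.bool. As the warning itself says, the builtin bool can be used instead without changing behaviour. A minimal sketch, with two small made-up arrays standing in for instance_masks:

```python
import numpy as np

# Two tiny made-up instance masks standing in for the real ones.
instance_masks = [
    np.array([[0, 1], [1, 0]], dtype=np.uint8),
    np.array([[1, 1], [0, 0]], dtype=np.uint8),
]

# Same expression as in the warning, with np.bool replaced by bool.
mask = np.stack(instance_masks, axis=2).astype(bool)

print(mask.shape, mask.dtype)  # (2, 2, 2) bool
```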

 

 

Source: https://www.cnblogs.com/herd/p/16282077.html