[Zero Shot Detection] Paper Reading Notes
Author: Internet
As we move towards large-scale object detection, it is unrealistic to expect annotated training data, in the form of bounding box annotations around objects, for all object classes at sufficient scale, and so methods capable of unseen object detection are required. We propose a novel zero-shot method based on training an end-to-end model that fuses semantic attribute prediction with visual features to propose object bounding boxes for seen and unseen classes. While we utilize semantic features during training, our method is agnostic to semantic information for unseen classes at test-time. Our method retains the efficiency and effectiveness of YOLOv2 [1] for objects seen during training, while improving its performance for novel and unseen objects. The ability of state-of-the-art detection methods to learn discriminative object features to reject background proposals also limits their performance for unseen objects. We posit that, to detect unseen objects, we must incorporate semantic information into the visual domain so that the learned visual features reflect this information and lead to improved recall rates for unseen objects. We test our method on the PASCAL VOC and MS COCO datasets and observe significant improvements in the average precision of unseen classes.
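As object detection scales up, it is unrealistic to expect enough annotated training data (bounding boxes drawn around objects) for every object class, so methods that can detect unseen objects are needed.
We propose a new zero-shot method: train an end-to-end network that fuses semantic attribute prediction with visual features to find bounding boxes for both seen and unseen classes. Semantic features are used during training, but at test time the method is agnostic to the semantic information of unseen classes.
Our method keeps the efficiency and effectiveness of YOLOv2 on classes seen during training, while improving its performance on unseen classes.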
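[[State-of-the-art detectors learn discriminative object features in order to reject background proposals, and this same ability limits their performance on unseen objects.]] (I did not understand this sentence.)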
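We posit that, to detect unseen objects, semantic information must be incorporated into the visual domain so that the learned visual features reflect it, which in turn improves recall on unseen classes. We tested the method on the PASCAL VOC and MS COCO datasets and observed a significant improvement in average precision on unseen classes.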
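To make the idea of "fusing semantic attribute prediction with visual features" more concrete, here is a minimal toy sketch in plain Python. It is not the paper's architecture: the function names (`matvec`, `fused_objectness`), the linear attribute predictor, the concatenation step, and all weights are assumptions chosen only for illustration. It shows one plausible reading of the abstract, in which attributes are predicted from a cell's visual features, appended to those features, and the fused vector is scored for objectness.

```python
import math
from typing import List, Sequence


def matvec(matrix: Sequence[Sequence[float]], vector: Sequence[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(z: float) -> float:
    """Squash a raw score into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))


def fused_objectness(
    visual: Sequence[float],
    attr_weights: Sequence[Sequence[float]],
    conf_weights: Sequence[float],
    conf_bias: float = 0.0,
) -> float:
    """Hypothetical fusion step (an assumption, not the paper's exact layer).

    1. Predict semantic attributes from the visual features (linear map).
    2. Concatenate visual features and predicted attributes.
    3. Score the fused vector to get an objectness confidence.
    """
    attributes = matvec(attr_weights, visual)
    fused = list(visual) + attributes
    if len(fused) != len(conf_weights):
        raise ValueError("conf_weights must match the fused feature length")
    score = sum(w * x for w, x in zip(conf_weights, fused)) + conf_bias
    return sigmoid(score)


# Toy values, made up for illustration only.
visual = [0.5, -1.0, 2.0]                          # visual features of one grid cell
attr_weights = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.0]]  # predicts 2 semantic attributes
conf_weights = [0.1, 0.1, 0.1, 1.0, 1.0]           # weights over the fused vector
confidence = fused_objectness(visual, attr_weights, conf_weights)
print(f"objectness confidence: {confidence:.3f}")
```

In this sketch the confidence depends on the predicted attributes as well as on the raw visual features. That is the intuition behind the claim above: a box around an object of an unseen class can still score well if it exhibits attributes shared with seen classes, instead of being rejected as background.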
Tags: training, Shot, object, Detection, Zero, objects, unseen, semantic, method Source: https://blog.csdn.net/m0_45682738/article/details/122015909