
PointNetVLAD Paper Notes

2022-03-25 22:32:20 Author: Internet

Abstract
Term
Problem Definition
Pipeline
Data Processing and Result Analysis

Abstract

Unlike its image based counterpart, point cloud based retrieval for place recognition has remained as an unexplored and unsolved problem. This is largely due to the difficulty in extracting local feature descriptors from a point cloud that can subsequently be encoded into a global descriptor for the retrieval task. In this paper, we propose the PointNetVLAD where we leverage on (make use of) the recent success of deep networks to solve point cloud based retrieval for place recognition. Specifically, our PointNetVLAD is a combination/modification of the existing PointNet and NetVLAD, which allows end-to-end training and inference to extract the global descriptor from a given 3D point cloud. Furthermore, we propose the "lazy triplet and quadruplet" loss functions that can achieve more discriminative and generalizable global descriptors to tackle (handle, solve) the retrieval task. We create benchmark datasets for point cloud based retrieval for place recognition, and the experimental results on these datasets show the feasibility of our PointNetVLAD. Our code and datasets are publicly available on the project website.

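The difficulty in point cloud retrieval lies in extracting local feature descriptors that can then be encoded into a global descriptor for the retrieval task (a slightly convoluted way to put it).

What does this paper propose?

A deep network built on PointNet and NetVLAD.
The lazy triplet and lazy quadruplet loss functions.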
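To make the lazy triplet idea concrete, here is a minimal sketch as I understand the paper's formulation: compute the hinge term \([\alpha + \delta_{pos} - \delta_{neg_j}]_+\) for every negative and keep only the largest one. The function names, the squared Euclidean distance, the default margin, and the toy descriptors are all assumptions made for illustration; this is not the authors' code.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lazy_triplet_loss(anchor, positive, negatives, margin=0.5):
    """Hinge loss taken over the hardest negative only.

    `margin` plays the role of alpha; 0.5 is a placeholder value.
    """
    d_pos = squared_distance(anchor, positive)
    hinge_terms = [
        max(0.0, margin + d_pos - squared_distance(anchor, neg))
        for neg in negatives
    ]
    return max(hinge_terms)


# Toy 2-D descriptors, hypothetical values.
anchor = [0.0, 0.0]
positive = [0.1, 0.0]
negatives = [[1.0, 0.0], [0.5, 0.0]]
loss = lazy_triplet_loss(anchor, positive, negatives)
```

Only the closest negative (`[0.5, 0.0]`) contributes here, which is the "lazy" part: easy negatives that already satisfy the margin are ignored.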

Term

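LiDAR (Light Detection and Ranging): laser radar

SfM (Structure-from-Motion): recovering 3D structure from camera motion

circumvent: to avoid, to get around

benchmark datasets: standard reference datasets used for evaluation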

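\(\mathcal{M}\): the reference map, a 3D point cloud database defined in a fixed reference frame.

AOC (area of coverage): the region that a point cloud covers.

The reference map \(\mathcal{M}\) is further divided into \(M\) submaps.

That gives \(\mathcal{M} = \bigcup_{i=1}^{M} m_i\), with \(\verb|AOC|(m_i) \approx \verb|AOC|(m_j)\) for all submaps.

We also want each submap \(m_i\) to be small, that is, \(|m_i| \ll |\mathcal{M}|\).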

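\(\mathcal{G}(\cdot)\): a downsampling filter. In practice the authors apply it ahead of time as a preprocessing step. After downsampling, every submap point cloud has the same number of points.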
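The point of \(\mathcal{G}(\cdot)\) for these notes is that every cloud ends up with the same size. A rough sketch of that property is below. It uses plain random sampling as a stand-in, which is an assumption on my part and not necessarily the filter the authors use; the function name and the target size are also hypothetical.

```python
import random


def downsample_to_fixed_size(points, n_points, seed=0):
    """Return exactly n_points points from a cloud (list of (x, y, z) tuples).

    Stand-in for G(.): random sampling without replacement when the cloud
    is large enough, otherwise pad by repeating randomly chosen points.
    """
    rng = random.Random(seed)
    if len(points) >= n_points:
        return rng.sample(points, n_points)
    padding = [rng.choice(points) for _ in range(n_points - len(points))]
    return list(points) + padding


# Two toy submaps of different sizes, hypothetical data.
submap_a = [(float(i), 0.0, 0.0) for i in range(10)]
submap_b = [(0.0, float(i), 0.0) for i in range(3)]
down_a = downsample_to_fixed_size(submap_a, 4)
down_b = downsample_to_fixed_size(submap_b, 4)
```

After this step `len(down_a) == len(down_b)`, which is the condition \(|\mathcal{G}(q)| = |\mathcal{G}(m_i)|\) used in the problem definition.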

\(f(\cdot)\): a function that maps a given (downsampled) point cloud \(\bar{p}\) to a fixed-size global descriptor vector.

Problem Definition

Given a query 3D point cloud denoted as \(q\), where \(\verb|AOC|(q) \approx \verb|AOC|(m_i)\) and \(|\mathcal{G}(q)|= |\mathcal{G}(m_i)|\), our goal is to retrieve the submap \(m_*\) from the database \(\mathcal{M}\) that is structurally most similar to \(q\).

In other words: map each point cloud to an \(m\)-dimensional vector, then run a \(\text{KNN}\) search over those vectors.
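The retrieval step can be sketched as a brute-force nearest-neighbour search over precomputed descriptors. This is only an illustration: the function names and the toy descriptors are made up, and a real system would more likely use an indexed search structure than a full sort.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def retrieve(query_desc, submap_descs, k=1):
    """Return the indices of the k submaps closest to the query descriptor."""
    order = sorted(
        range(len(submap_descs)),
        key=lambda i: squared_distance(query_desc, submap_descs[i]),
    )
    return order[:k]


# Hypothetical descriptors f(m_i) for three submaps, and one query f(q).
database = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
query = [0.5, 0.85]
nearest = retrieve(query, database, k=2)  # indices, best match first
```

The first index in `nearest` plays the role of \(m_*\), the submap judged most similar to \(q\).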

Pipeline

PointNet

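Skipping this part for now, since there is a separate article dedicated to PointNet. For now it is enough to know that PointNet supplies the per-point local feature descriptors.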

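NetVLAD (needs repeated reading)

There is a lot of background to fill in, for example:

NetVLAD
Hinge Loss and SVM (probably need a textbook to learn this)
triplet loss ✅

Some reference links:

NetVLAD notes

NetVLAD on Zhihu

How to understand the various losses ✅ (very well written)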

In short, NetVLAD aggregates the local descriptors produced by PointNet into a single \(D\times K\) global vector.
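A stripped-down sketch of the aggregation may help show where the \(D\times K\) size comes from. It follows the standard NetVLAD recipe (soft-assign each local descriptor to \(K\) cluster centres, then sum the weighted residuals per cluster) and leaves out the normalization steps. All names and numbers are hypothetical, and in the real layer the centres and assignment weights are learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def netvlad_aggregate(local_descs, centers, assign_w, assign_b):
    """Aggregate N local D-dim descriptors into a flat vector of length D*K."""
    K, D = len(centers), len(centers[0])
    vlad = [[0.0] * D for _ in range(K)]
    for x in local_descs:
        # Soft assignment of x to each of the K clusters.
        scores = [
            sum(w * xi for w, xi in zip(assign_w[k], x)) + assign_b[k]
            for k in range(K)
        ]
        a = softmax(scores)
        # Accumulate the weighted residual (x - c_k) for every cluster.
        for k in range(K):
            for d in range(D):
                vlad[k][d] += a[k] * (x[d] - centers[k][d])
    # Flatten cluster by cluster.
    return [value for row in vlad for value in row]


# Toy input: N=3 local descriptors, D=2, K=2. Zero weights give uniform assignment.
local_descs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
centers = [[0.0, 0.0], [1.0, 1.0]]
assign_w = [[0.0, 0.0], [0.0, 0.0]]
assign_b = [0.0, 0.0]
global_vec = netvlad_aggregate(local_descs, centers, assign_w, assign_b)
```

However many local descriptors go in, the output length is always \(D \times K\) (4 in this toy case), which is what makes it usable as a fixed-size global vector.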

One question: why do we need this conversion at all?

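However, the \(D\times K\) vector is too high-dimensional, so a fully connected layer is used to reduce its dimension, and L2 normalization then produces the final global descriptor (\(f(P) \in \mathbb{R}^{\mathcal{O}}\), where \(\mathcal{O} \ll (D\times K)\)).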
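As a rough illustration of this last step: a fully connected layer is just a matrix multiply plus a bias, so a weight matrix with \(\mathcal{O}\) rows and \(D \times K\) columns maps the long vector to \(\mathcal{O}\) values. The sizes, the random weights, and the function names below are placeholders chosen for the sketch; the trained layer would have learned weights.

```python
import math
import random


def fully_connected(x, weights, bias):
    """y = W x + b, where W has one row per output dimension."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


def l2_normalize(v, eps=1e-12):
    """Scale v to unit Euclidean length (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


random.seed(0)
D, K, O = 4, 3, 2  # hypothetical sizes with O much smaller than D*K
vlad_vector = [random.gauss(0.0, 1.0) for _ in range(D * K)]
weights = [[random.gauss(0.0, 0.1) for _ in range(D * K)] for _ in range(O)]
bias = [0.0] * O

descriptor = l2_normalize(fully_connected(vlad_vector, weights, bias))
```

`descriptor` has length `O` and unit length, so descriptors from different point clouds are compared on the same scale.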

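A few questions remain, though:

Why can a fully connected layer reduce dimensionality, and how is that implemented? (Probably need to read the paper.)

What is L2 normalization, and is there an L1 norm too? Why is it used here, and what problem does it solve?

Links about L2 normalization:

Kaggle: not especially well written, but the comparison is very detailed

Zhihu: very well written

One with a bit of code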

\(l_2\) (\(\Vert v \Vert_2\)) is the Euclidean norm. Compared with L1 it is less robust to outliers and does not produce sparse outputs.

\[\Vert v \Vert_2 = \sqrt{\sum_i v_i^2}\]
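A quick numeric check of the two norms, plus one reason unit-length descriptors are convenient: for unit vectors the squared Euclidean distance equals \(2 - 2\,a \cdot b\), so ranking by distance and ranking by cosine similarity agree. The helper names and vectors are illustrative only.

```python
import math


def l1_norm(v):
    """Sum of absolute values."""
    return sum(abs(x) for x in v)


def l2_norm(v):
    """Euclidean length."""
    return math.sqrt(sum(x * x for x in v))


v = [3.0, -4.0]
l1 = l1_norm(v)  # 7.0
l2 = l2_norm(v)  # 5.0

# L2 normalization: divide by the Euclidean norm to get a unit vector.
a = [x / l2 for x in v]
b = [1.0, 0.0]  # another unit vector

squared_dist = sum((x - y) ** 2 for x, y in zip(a, b))
dot = sum(x * y for x, y in zip(a, b))
# For unit vectors, squared_dist and 2 - 2 * dot agree up to rounding.
```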

Metric Learning

What is metric learning?

Zhihu
Source: https://www.cnblogs.com/Last--Whisper/p/16057001.html