[Paper Notes] [CVPR2019] Hand + Object (H+O)
Paper information
H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions
Bugra Tekin, Federica Bogo, and Marc Pollefeys. H+O: Unified egocentric recognition of 3D hand-object poses and interactions. In CVPR, pages 4511–4520, 2019.
Introduction
Target
Propose a unified method to recognize 3D hand-object poses and their interactions from egocentric (first-person), monocular (single-camera) color images.
Limitations of previous work
- rely on active depth sensors or multi-camera systems
- do not reason about the action the subject is performing
Contributions
- a unified framework for recognizing 3D hand-object interactions.
- a novel single-shot neural network framework that jointly solves the 3D articulated (skeletal) and rigid (rigid-body) pose estimation problems within the same architecture.
- a temporal model that merges and propagates information in the temporal domain, explicitly models interactions, and infers relations between hands and objects directly in 3D.
Method
Notation | Meaning |
---|---|
$\textbf{I}^t \ (1 \le t \le N)$ | input sequence of color frames |
$N_c$ | number of 3D control points |
$N_a$ | number of actions |
$N_o$ | number of object classes |
$N_{ia}$ | number of interactions |
$\textbf{y}_i \in \mathbb{R}^{3N_c}$ | control points |
$\textbf{p}^a_i \in \mathbb{R}^{N_a}$ | action probability |
$\textbf{p}^o_i \in \mathbb{R}^{N_o}$ | object class probability |
$\textbf{p}^{ia} \in \mathbb{R}^{N_{ia}}$ | probability vector over interaction classes |
$c_i \in [0,1]$ | overall confidence |
Network
- (a) $\textbf{I}^t$ -> FCN -> RNN
  - the RNN takes the high-confidence hand and object predictions as input and outputs a probability vector $\textbf{p}^{ia}$
- (b) divide the input image into a regular grid $\textbf{G}^t$ containing $H \times W \times D$ cells
- (c-d) keep the target values for hands and objects
- (e) for each cell $i$ of the $H \times W \times D$ grid, store two vectors: $\textbf{v}^h_i$ for the hand and $\textbf{v}^o_i$ for the object
  - $\textbf{v}^h_i$ stores the control points, action probabilities, and confidence
  - $\textbf{v}^o_i$ stores the control points, object class probabilities, and confidence
  - low-confidence predictions are pruned
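The per-cell storage in (e) and the pruning step can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the order in which the parts of each vector are concatenated, the tiny grid size, the class counts, the threshold of 0.5, and the helper names `split_hand_vector` and `prune_cells` are all assumptions made for the example. Only the value 21 for the number of control points comes from these notes.

```python
import random

# Sizes assumed for illustration (only N_C = 21 comes from the notes).
N_C = 21            # number of 3D control points
N_A = 4             # number of action classes (hypothetical)
N_O = 3             # number of object classes (hypothetical)
H, W, D = 2, 2, 2   # tiny grid so the example stays readable (hypothetical)

# Each vector holds 3D control points, class probabilities, and one confidence.
HAND_LEN = 3 * N_C + N_A + 1
OBJECT_LEN = 3 * N_C + N_O + 1


def split_hand_vector(v):
    """Split a flat hand vector into (control points, action probs, confidence).

    The ordering [points, probabilities, confidence] is an assumption.
    """
    assert len(v) == HAND_LEN
    points = [tuple(v[3 * k:3 * k + 3]) for k in range(N_C)]
    action_probs = v[3 * N_C:3 * N_C + N_A]
    confidence = v[-1]
    return points, action_probs, confidence


def prune_cells(cells, threshold=0.5):
    """Keep only the cells whose confidence (last entry) reaches the threshold."""
    return {idx: v for idx, v in cells.items() if v[-1] >= threshold}


# Stand-in for the network output: one random hand vector per grid cell.
rng = random.Random(0)
hand_cells = {
    (h, w, d): [rng.random() for _ in range(HAND_LEN)]
    for h in range(H) for w in range(W) for d in range(D)
}
kept = prune_cells(hand_cells, threshold=0.5)
```

The object vectors would follow the same pattern with `OBJECT_LEN` entries, so one pruning helper can serve both. In the notes, it is the surviving high-confidence predictions that are passed on to the RNN.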
What are control points?
Why control points?
A 3D bounding box alone cannot represent an articulated 3D pose.
How?
Parametrize both hand and object poses jointly with 3D control points.
- Choose $N_c = 21$
- For an object, 21 = 8 bounding-box corners + 12 edge midpoints + 1 centroid; for the hand, the 21 points are the skeleton joints
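The 8 + 12 + 1 count can be checked with a small sketch that builds the control points of a box. It assumes an axis-aligned box given by two opposite corners, which is a simplification for illustration (a real object box would generally be oriented in the camera frame). The function name `box_control_points` is hypothetical.

```python
from itertools import product


def box_control_points(min_corner, max_corner):
    """Return 21 control points of an axis-aligned 3D box:
    8 corners, then 12 edge midpoints, then 1 centroid."""
    # Each corner is identified by three bits: 0 picks the min, 1 the max.
    bits = list(product((0, 1), repeat=3))
    corners = [
        tuple(max_corner[k] if b[k] else min_corner[k] for k in range(3))
        for b in bits
    ]

    # Two corners share an edge when their bit patterns differ in one axis.
    midpoints = []
    for i in range(8):
        for j in range(i + 1, 8):
            if sum(p != q for p, q in zip(bits[i], bits[j])) == 1:
                a, b = corners[i], corners[j]
                midpoints.append(tuple((p + q) / 2 for p, q in zip(a, b)))

    centroid = tuple((lo + hi) / 2 for lo, hi in zip(min_corner, max_corner))
    return corners + midpoints + [centroid]


points = box_control_points((0.0, 0.0, 0.0), (2.0, 2.0, 2.0))
# Flatten into a single vector of length 3 * N_c, matching the notation table.
y = [coord for point in points for coord in point]
```

Here `points` has 21 entries and `y` has 63, which matches a control-point vector in $\mathbb{R}^{3N_c}$ with $N_c = 21$.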
Source: https://blog.csdn.net/weixin_42441466/article/details/121336946