[Paper Notes] [CVPR2019] Hand + Object (H+O)

2021-11-16 11:34:29 Author: Internet

Table of contents

Paper information
Introduction
Method

Paper information

H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions

Bugra Tekin, Federica Bogo, and Marc Pollefeys. H+o: Unified egocentric recognition of 3d hand-object poses and interactions. In CVPR, pages 4511–4520, 2019.

Introduction

Target

Propose a unified method to recognize 3D hand-object poses and their interactions from egocentric (first-person) monocular (single-camera) color images.

Example Result

Limitations of previous work

rely on active depth sensors or multi-camera systems
do not reason about the action the subject is performing

Contributions

a unified framework for recognizing 3D hand-object interactions.
a novel single-shot neural network framework that jointly solves the 3D articulated (skeletal) and rigid (rigid-body) pose estimation problems within the same architecture.
a temporal model to merge and propagate information in the temporal domain, explicitly model interactions, and infer relations between hands and objects directly in 3D.

Method

Notation	Meaning
$\textbf{I}^t \ (1 \le t \le N)$	input sequence of color frames
$N_c$	number of 3D control points
$N_a$	number of actions
$N_o$	number of object classes
$N_{ia}$	number of interactions
$\textbf{y}_i \in \mathbb{R}^{3N_c}$	control points
$\textbf{p}^a_i \in \mathbb{R}^{N_a}$	action probability
$\textbf{p}^o_i \in \mathbb{R}^{N_o}$	object class probability
$\textbf{p}^{ia} \in \mathbb{R}^{N_{ia}}$	probability vector over interaction classes
$c_i \in [0,1]$	overall confidence

Network

[Figure: network overview, panels (a) to (e)]

(a) $\textbf{I}^t$ -> FCN -> RNN
- the RNN takes the high-confidence hand and object predictions as input and outputs a probability vector $\textbf{p}^{ia}$ over interaction classes
(b) divide the input image into a regular grid $\textbf{G}^t$ containing $H \times W \times D$ cells
(c-d) keep the target values for hands and objects
(e) for each cell $i$ of the $H \times W \times D$ grid, store two vectors, one for the hand and one for the object: $\textbf{v}^h_i$ and $\textbf{v}^o_i$
- $\textbf{v}^h_i$ stores control points, action probabilities and confidence
- $\textbf{v}^o_i$ stores control points, object class probabilities and confidence
- predictions with low confidence are pruned
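
The per-cell storage and pruning described in (e) can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the flat layout `[3*N_c coordinates | class probabilities | confidence]`, the names `CellPrediction`, `split_cell_vector` and `prune_low_confidence`, and the threshold of 0.5 are all assumptions made for the example. Under this layout a hand vector would have $3N_c + N_a + 1$ entries and an object vector $3N_c + N_o + 1$.

```python
from typing import List, NamedTuple

N_C = 21  # number of 3D control points, as chosen in these notes


class CellPrediction(NamedTuple):
    control_points: List[List[float]]  # N_C rows of (x, y, z)
    class_probs: List[float]           # action probs (hand) or object-class probs (object)
    confidence: float                  # overall confidence c_i in [0, 1]


def split_cell_vector(v: List[float], n_classes: int) -> CellPrediction:
    """Split a flat per-cell vector into its three parts.

    Assumed (hypothetical) layout: 3*N_C coordinates, then n_classes
    probabilities, then one confidence value.
    """
    expected = 3 * N_C + n_classes + 1
    if len(v) != expected:
        raise ValueError(f"expected {expected} values, got {len(v)}")
    coords = v[:3 * N_C]
    points = [coords[k:k + 3] for k in range(0, 3 * N_C, 3)]
    probs = v[3 * N_C:3 * N_C + n_classes]
    return CellPrediction(points, probs, v[-1])


def prune_low_confidence(cells: List[CellPrediction],
                         threshold: float = 0.5) -> List[CellPrediction]:
    """Keep only the cells whose confidence reaches the (assumed) threshold."""
    return [c for c in cells if c.confidence >= threshold]
```

With `n_classes` set to $N_a$ the same function reads a hand vector $\textbf{v}^h_i$, and with $N_o$ an object vector $\textbf{v}^o_i$; only the surviving high-confidence cells would be passed on to the RNN.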

What are control points?

Why control points?

A 3D bounding box cannot represent an articulated 3D pose.

How?

parametrize both hand and object poses jointly with 3D control points

Choose $N_c = 21$
For objects: 21 = 8 corners of the 3D bounding box + 12 edge midpoints + 1 centroid
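
To make the 8 + 12 + 1 count concrete, here is a minimal sketch that enumerates the 21 control points of a box. It assumes an axis-aligned box given by two opposite corners, which is a simplification for illustration; the function name `box_control_points` and the ordering of the points are hypothetical and not taken from the paper.

```python
from itertools import combinations, product
from typing import List, Tuple

Point = Tuple[float, float, float]


def box_control_points(min_corner: Point, max_corner: Point) -> List[Point]:
    """Return 8 corners, 12 edge midpoints and 1 centroid of an axis-aligned box."""
    bounds = list(zip(min_corner, max_corner))  # [(xmin, xmax), (ymin, ymax), (zmin, zmax)]
    # Each corner is identified by three bits choosing min (0) or max (1) per axis.
    bits = list(product((0, 1), repeat=3))
    corners = [tuple(float(bounds[axis][b]) for axis, b in enumerate(bit))
               for bit in bits]

    # An edge joins two corners whose bit patterns differ along exactly one axis.
    midpoints = []
    for a, b in combinations(range(8), 2):
        if sum(x != y for x, y in zip(bits[a], bits[b])) == 1:
            midpoints.append(tuple((p + q) / 2
                                   for p, q in zip(corners[a], corners[b])))

    centroid = tuple((lo + hi) / 2 for lo, hi in bounds)
    return corners + midpoints + [centroid]
```

For a real object the box would normally be defined in the object's own frame and then moved by the estimated rigid pose, so the same 21 points encode both position and orientation.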

Source: https://blog.csdn.net/weixin_42441466/article/details/121336946