Temporal RoI Align for Video Object Recognition: Reading Notes

Author: Internet

You can use translation software to translate these notes.

Temporal RoI Align for Video Object Recognition

TL;DR

Introduction

Can only utilize nearby frames within 1 sec (30 frames).

RoI Align

Temporal RoI Align

Extract the support-frame features that correspond to the target-frame RoI using a similarity (affinity) map, instead of taking the features at the same RoI positions in the support frames.

Notations

Most Similar RoI Align (Top-K + concatenation)

Operates at the pixel level.

A deformable alignment: features are sampled according to SIMILARITY rather than from the BBOX REGION, as in the original RoI Align.
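
The similarity-based sampling above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the paper's implementation: the function names `cosine` and `most_similar_features` are hypothetical, cosine similarity is assumed as the similarity measure, and the K selected features are simply concatenated, following the "Top-K + concatenation" note.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def most_similar_features(roi_feats, support_feats, k):
    """For each target RoI location, gather the K most similar support features.

    roi_feats:     list of C-dim vectors, one per location in the target RoI
    support_feats: list of C-dim vectors, one per location in the support frame
    Returns one concatenated (K*C)-dim vector per RoI location.
    """
    aligned = []
    for q in roi_feats:
        sims = [cosine(q, f) for f in support_feats]
        # Search the whole support map, not just the box region (assumption
        # based on the note "SIMILARITY rather than BBOX REGION").
        top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
        concat = []
        for i in top:
            concat.extend(support_feats[i])
        aligned.append(concat)
    return aligned

# Toy data: 2 RoI locations, 4 support-frame locations, C = 2 channels.
roi = [[1.0, 0.0], [0.0, 1.0]]
support = [[0.9, 0.1], [0.0, 2.0], [-1.0, 0.0], [0.5, 0.5]]
aligned = most_similar_features(roi, support, k=2)
```

Because the search ranges over every support-frame location, an object that has moved out of the original box can still contribute features, which is the sense in which the alignment is "deformable".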

Temporal Feature Aggregation

How can the T aligned feature blocks be used to help detection in the target frame?

The result is an enhanced target feature \(\bar{X}_{t}\).
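
One way to picture the aggregation step is a per-location weighted average over the T aligned blocks. The sketch below is an assumption, not the formula from the paper (which these notes do not record): weights are a softmax over cosine similarity to the target feature, and the names `cosine` and `aggregate` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def aggregate(target, aligned_blocks):
    """Fuse T aligned feature blocks into one enhanced block.

    target:         list of C-dim vectors, one per RoI location
    aligned_blocks: T blocks, each with the same layout as `target`
    """
    enhanced = []
    for pos, x in enumerate(target):
        sims = [cosine(x, block[pos]) for block in aligned_blocks]
        # Numerically stable softmax over the temporal axis.
        m = max(sims)
        exps = [math.exp(s - m) for s in sims]
        total = sum(exps)
        weights = [e / total for e in exps]
        enhanced.append([
            sum(w * block[pos][d] for w, block in zip(weights, aligned_blocks))
            for d in range(len(x))
        ])
    return enhanced

# Toy data: one RoI location, C = 2, T = 2 aligned blocks.
target = [[1.0, 0.0]]
blocks = [[[1.0, 0.0]], [[0.0, 1.0]]]
x_bar = aggregate(target, blocks)
```

In this toy case the block that matches the target receives the larger weight, so the enhanced feature stays closer to it than a plain average would.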

Pipeline

Experiments

Difference from Non-local Network

How the Non-local Operation works
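
For reference, a simplified sketch of a non-local operation in its embedded-Gaussian (softmax) form. To keep it short, the learned embeddings are assumed to be identity maps, so this is an illustration of the computation pattern rather than a trainable block; the function name `non_local` is hypothetical.

```python
import math

def non_local(feats):
    """Apply a simplified non-local operation with a residual connection.

    feats: list of N C-dim vectors (the H*W positions of a feature map,
           flattened). Returns a list with the same shape.
    """
    out = []
    for xi in feats:
        # Pairwise dot-product affinity between position i and every position j.
        scores = [sum(a * b for a, b in zip(xi, xj)) for xj in feats]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        attn = [e / total for e in exps]
        # Weighted sum over all positions: the receptive field is the whole map.
        agg = [
            sum(a * xj[d] for a, xj in zip(attn, feats))
            for d in range(len(xi))
        ]
        # Residual connection, as in a non-local block.
        out.append([x + y for x, y in zip(xi, agg)])
    return out

# Toy data: two identical positions, C = 2.
feats = [[1.0, 2.0], [1.0, 2.0]]
result = non_local(feats)
```

Every output position depends on every input position, with weights computed from the data itself.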

It is essentially the same idea: introducing a dynamic, non-local receptive field as large as the whole image.

However, I think the problem lies in the target frame.

Source: https://www.cnblogs.com/zxyfrank/p/16500877.html