
Temporal RoI Align for Video Object Recognition: Reading Notes

2022-07-21 10:04:50 Author: Internet



Pipeline:
RPN -> proposals
proposal -> deformable, similarity-based attention along the time axis -> aggregate temporal features into the current frame
-> classification and box regression

Image-level aggregation
- D&T, DFF, FGFA, MANet, STSN
- the performance of these methods degrades quickly as the temporal interval grows

As a result, they can only use nearby frames within about 1 s (30 frames at 30 fps).

Proposal-level aggregation
- MANet, SELSA, Leveraging Long-Range Temporal Relationships Between Proposals for Video Object Detection

Core idea: for a target-frame RoI, extract the corresponding features from each support frame based on a feature affinity (similarity) map, rather than from the same box positions in the support frames.

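\(T\): number of support frames
\(F_{t} \in \mathbb{R}^{H\times W \times C}\): feature map of frame \(t\) (full image)
\(X_{t} \in \mathbb{R}^{h\times w \times C}\)
- RoI-aligned feature of a proposal in frame \(t\)
- Note: in a two-stage detector, RoI Align is what turns a proposal into detector input: it crops the proposal's region from \(F_{t}\) and bilinearly resamples it to a fixed \(h\times w\) grid, so proposals of any size yield features of the same shape for the detection head.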
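To make the RoI Align step concrete, here is a minimal NumPy sketch that takes one bilinear sample at the centre of each of the \(h\times w\) bins. The names `bilinear` and `roi_align` are hypothetical, and the single sample per bin and the integer-pixel-centre convention are simplifying assumptions; library implementations such as `torchvision.ops.roi_align` average several samples per bin (`sampling_ratio`) and handle the half-pixel offset (`aligned`).

```python
import numpy as np


def bilinear(feature, y, x):
    """Bilinearly interpolate feature (H, W, C) at real-valued (y, x).

    Integer coordinates are taken to be pixel centres (an assumption).
    """
    H, W, _ = feature.shape
    # Clamp so that all four neighbours lie inside the map.
    y = min(max(y, 0.0), H - 1.0)
    x = min(max(x, 0.0), W - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * feature[y0, x0]
            + (1 - dy) * dx * feature[y0, x1]
            + dy * (1 - dx) * feature[y1, x0]
            + dy * dx * feature[y1, x1])


def roi_align(feature, box, h, w):
    """Resample box = (top, left, bottom, right), given in feature-map
    coordinates, to a fixed (h, w, C) grid: one sample per bin centre."""
    top, left, bottom, right = box
    bin_h = (bottom - top) / h
    bin_w = (right - left) / w
    out = np.empty((h, w, feature.shape[2]), dtype=float)
    for i in range(h):
        for j in range(w):
            out[i, j] = bilinear(feature,
                                 top + (i + 0.5) * bin_h,
                                 left + (j + 0.5) * bin_w)
    return out
```

Whatever the box size, the output always has shape \((h, w, C)\), which is what lets a single detection head handle every proposal.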

The alignment is done at the pixel level.

Deformable-style alignment, driven by feature SIMILARITY rather than by the BBOX REGION used in the original RoI Align.

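Input
- RoI feature of the current (target) frame \(X_{t}\)
- feature maps of the support frames \(\{F_{t+i}\}_{i = -\frac{T}{2}}^{\frac{T}{2}}\) (this index range has \(T+1\) entries, including \(i = 0\), the target frame itself)
Output
- \(\{X_{t+i}\}_{i = -\frac{T}{2}}^{\frac{T}{2}}\): the aligned RoI feature for every support frame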
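The similarity-based alignment can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: it uses a plain dot product as the affinity, keeps the top-\(k\) most similar support-frame locations for each RoI location, and averages them with softmax weights. `similarity_align`, `temporal_roi_align` and the default `k=4` are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def similarity_align(x_t, f_support, k=4):
    """Align the target RoI feature to one support frame by similarity.

    x_t:       (h, w, C) RoI feature of the target frame (the queries).
    f_support: (H, W, C) full feature map of one support frame.
    Returns an (h, w, C) feature: for each RoI location, a softmax-weighted
    average of the k most similar support-frame locations.
    """
    h, w, C = x_t.shape
    q = x_t.reshape(-1, C)            # (h*w, C)
    kv = f_support.reshape(-1, C)     # (H*W, C)
    sim = q @ kv.T                    # (h*w, H*W) dot-product affinity map
    k = min(k, kv.shape[0])
    # Indices of the k largest similarities per query (order within k is irrelevant).
    top = np.argpartition(-sim, k - 1, axis=1)[:, :k]      # (h*w, k)
    top_sim = np.take_along_axis(sim, top, axis=1)          # (h*w, k)
    weights = softmax(top_sim, axis=1)                      # (h*w, k)
    gathered = kv[top]                                      # (h*w, k, C)
    out = (weights[..., None] * gathered).sum(axis=1)       # (h*w, C)
    return out.reshape(h, w, C)


def temporal_roi_align(x_t, support_maps, k=4):
    """Apply similarity_align to every support feature map F_{t+i}."""
    return [similarity_align(x_t, f, k) for f in support_maps]
```

Unlike plain RoI Align, the sampled locations are chosen by feature content, so they can lie anywhere in the support frame, including outside the original box.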

How can the \(T\) aligned feature blocks help detection in the current frame?

query: \(X_{t}\)
key: \(\{X_{t+i}\}_{i = -\frac{T}{2}}^{\frac{T}{2}}\)
value: \(\{X_{t+i}\}_{i = -\frac{T}{2}}^{\frac{T}{2}}\)
multi-head
- split the feature along the channel dimension into \(N\) parts, each in \(\mathbb{R}^{h\times w\times \frac{C}{N}}\)
- apply \(N\) attention heads, one per part

The result is an enhanced RoI feature \(\bar{X}_{t}\) for the target frame.
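A minimal sketch of this multi-head temporal attention, assuming scaled dot-product attention computed independently at each spatial location over the aligned frames. The learned query/key/value projections a real module would have are omitted, and `temporal_attention` is a hypothetical name.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def temporal_attention(x_t, aligned, num_heads):
    """Aggregate aligned RoI features into an enhanced target feature.

    x_t:     (h, w, C) query, the target-frame RoI feature X_t.
    aligned: (T', h, w, C) keys/values, the aligned features X_{t+i}.
    Channels are split into num_heads groups of C // num_heads; each head,
    at each spatial location, attends over the T' frames independently.
    """
    Tp, h, w, C = aligned.shape
    assert C % num_heads == 0, "C must be divisible by the number of heads"
    d = C // num_heads
    q = x_t.reshape(h, w, num_heads, d)              # (h, w, N, d)
    kv = aligned.reshape(Tp, h, w, num_heads, d)     # (T', h, w, N, d)
    # Scaled dot-product score of the query against every frame, per head.
    scores = np.einsum("hwnd,thwnd->thwn", q, kv) / np.sqrt(d)
    attn = softmax(scores, axis=0)                   # normalise over time
    out = np.einsum("thwn,thwnd->hwnd", attn, kv)     # weighted sum of values
    return out.reshape(h, w, C)
```

If every aligned frame carries the same feature, the attention weights sum to 1 over identical values and the output is that feature unchanged.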

Relation to non-local operation works

It's essentially the same idea: a dynamic, non-local receptive field that can be as large as the whole image.
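For comparison, here is a dot-product non-local operation sketched in NumPy: each output position attends to every position of the feature map. This simplified version drops the learned embeddings and output projection of the original non-local block and keeps only the residual connection; `non_local` is a hypothetical name.

```python
import numpy as np


def non_local(feature):
    """Dot-product non-local operation over a whole (H, W, C) feature map.

    Every output position is a softmax-weighted sum over ALL H*W input
    positions (plus a residual), so its receptive field is the whole image.
    """
    H, W, C = feature.shape
    x = feature.reshape(-1, C)                     # (H*W, C)
    scores = x @ x.T                               # pairwise affinities
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over positions j
    y = weights @ x                                # (H*W, C)
    return feature + y.reshape(H, W, C)            # residual connection
```

Setting a single pixel in the bottom-right corner of an all-zero map already changes the output at the top-left corner, which is exactly the "receptive field as big as the whole image" property.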

However, I think the real problem lies in the target frame:

The RPN cannot propose regions for severely distorted objects, so in that case there is no RoI to align in the first place.
Nor should we assume that a distorted object can be matched using single-pixel affinity alone.

Source: https://www.cnblogs.com/zxyfrank/p/16500877.html