Channel-temporal attention for first-person video domain adaptation

Abstract

Unsupervised Domain Adaptation (UDA) can transfer knowledge from labeled source data to unlabeled target data of the same categories. However, UDA for first-person action recognition is an under-explored problem, with a lack of datasets and limited consideration of first-person video characteristics. This paper focuses on addressing this problem. Firstly, we propose two small-scale first-person video domain adaptation datasets: ADLsmall and GTEA-KITCHEN. Secondly, we introduce channel-temporal attention blocks to capture channel-wise and temporal-wise relationships and model the inter-dependencies important to first-person vision. Finally, we propose a Channel-Temporal Attention Network (CTAN) to integrate these blocks into existing architectures. CTAN outperforms baselines on the two proposed datasets and one existing dataset, EPICcvpr20.
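To make the idea of channel-temporal attention concrete, below is a minimal NumPy sketch of one plausible form of such a block: channel and temporal descriptors are pooled from a (channels × time) feature map, passed through hypothetical learned weights (`w_c`, `w_t` here are placeholders, not the paper's parameters), and used to gate the features along both axes. This is an illustrative assumption, not CTAN's actual implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_temporal_attention(feat, w_c, w_t):
    """Gate a (C, T) feature map along channels and time.

    feat : (C, T) array of frame-level channel features
    w_c  : (C, C) hypothetical learned channel-attention weights
    w_t  : (T, T) hypothetical learned temporal-attention weights
    """
    c_desc = feat.mean(axis=1)        # (C,) channel descriptor: pool over time
    t_desc = feat.mean(axis=0)        # (T,) temporal descriptor: pool over channels
    c_gate = sigmoid(w_c @ c_desc)    # (C,) channel attention weights in (0, 1)
    t_gate = sigmoid(w_t @ t_desc)    # (T,) temporal attention weights in (0, 1)
    # Re-weight every (channel, frame) entry by both gates.
    return feat * c_gate[:, None] * t_gate[None, :]

# Tiny demo on random features: output keeps the input shape.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4))
out = channel_temporal_attention(
    feat,
    rng.standard_normal((8, 8)),
    rng.standard_normal((4, 4)),
)
```

Because both gates lie in (0, 1), the block can only attenuate, never amplify, each feature entry, so it acts as a soft selector over channels and frames.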

Publication
arXiv preprint arXiv:2108.07846
Xianyuan Liu
Assistant Head of AI Research Engineering & Senior AI Research Engineer
Shuo Zhou
Academic Fellow at University of Sheffield (past PhD Student)
Haiping Lu
Director of the UK Open Multimodal AI Network, Professor of Machine Learning, and Head of AI Research Engineering