Temporal Contrastive Pretraining for Video Action Recognition

Abstract: In this paper, we propose a self-supervised method for video representation learning based on Contrastive Predictive Coding (CPC) [27]. Previously, CPC has been used to learn representations for different signals (audio, text or image). It benefits from autoregressive modeling and contrastive estimation to learn long-term relations within the raw signal while remaining robust to local noise. Our self-supervised task consists of predicting the latent representation of future segments of the video. Compared with generative modeling, predicting directly in the feature space is easier and avoids uncertainty problems for long-term predictions. Using CPC to learn representations for videos remains challenging due to the structure and high dimensionality of the signal. We demonstrate experimentally that the representations learned by the network are useful for action recognition. We test the method with different input types, such as optical flow, image differences and raw images, on two datasets (UCF-101 and HMDB51), and it gives consistent results across the modalities. Finally, we demonstrate the utility of our pre-training method by achieving competitive results for action recognition using few labeled data.
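
To illustrate the contrastive objective the abstract refers to, here is a minimal sketch of the InfoNCE loss commonly used with CPC, written in plain Python. The function name `info_nce_loss`, the dot-product scoring, and the use of the other clips in the batch as negatives are assumptions made for illustration; the paper's actual encoder, autoregressive model, and sampling of negatives are not described on this page.

```python
import math

def info_nce_loss(predictions, targets):
    """InfoNCE loss over a batch of embedding pairs (illustrative sketch).

    predictions[i]: embedding predicted for the future segment of clip i.
    targets[i]:     embedding of the true future segment of clip i.
    Every targets[j] with j != i serves as a negative for predictions[i].
    """
    n = len(predictions)
    total = 0.0
    for i in range(n):
        # Dot-product similarity between prediction i and every candidate.
        scores = [
            sum(p * t for p, t in zip(predictions[i], targets[j]))
            for j in range(n)
        ]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(scores)
        log_denominator = m + math.log(sum(math.exp(s - m) for s in scores))
        # Negative log-probability of picking the true future segment.
        total += log_denominator - scores[i]
    return total / n

# Toy batch: three orthogonal target embeddings (hypothetical values).
targets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned_loss = info_nce_loss(targets, targets)
shuffled_loss = info_nce_loss(targets[1:] + targets[:1], targets)
```

In this toy batch, predictions that match their own targets yield a lower loss than predictions paired with the wrong targets, which is the behavior the contrastive task rewards.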
Document type :
Conference papers

https://hal-normandie-univ.archives-ouvertes.fr/hal-03255934
Contributor : Samia Ainouz-Zemouche
Submitted on : Wednesday, June 9, 2021 - 10:57:33 PM
Last modification on : Friday, March 4, 2022 - 3:12:48 AM
Long-term archiving on : Friday, September 10, 2021 - 7:26:47 PM

File

LORRE_Temporal_Contrastive_Pre...
Publisher files allowed on an open archive

Citation

Guillaume Lorre, Jaonary Rabarisoa, Astrid Orcesi, Samia Ainouz-Zemouche, Stephane Canu. Temporal Contrastive Pretraining for Video Action Recognition. IEEE/CVF Winter Conference on Applications of Computer Vision, Mar 2020, Snowmass, United States. ⟨10.1109/WACV45572.2020.9093278⟩. ⟨hal-03255934⟩
