RESOUND: Towards Action Recognition without Representation Bias

Paper

Yingwei Li, Yi Li and Nuno Vasconcelos

Published in European Conference on Computer Vision (ECCV), 2018

Abstract: While large datasets have proven to be a key enabler for progress in computer vision, they can have biases that lead to erroneous conclusions. The notion of the representation bias of a dataset is proposed to combat this problem. It captures the fact that representations other than the ground-truth representation can achieve good performance on any given dataset. When this is the case, the dataset is said not to be well calibrated. Dataset calibration is shown to be a necessary condition for the standard state-of-the-art evaluation practice to converge to the ground-truth representation. A procedure, RESOUND, is proposed to quantify and minimize representation bias. Its application to the problem of action recognition shows that current datasets are biased towards static representations (objects, scenes and people). Two versions of RESOUND are studied. An Explicit RESOUND procedure is proposed to assemble new datasets by sampling existing datasets. An implicit RESOUND procedure is used to guide the creation of a new dataset, Diving48, of over 18,000 video clips of competitive diving actions, spanning 48 fine-grained dive classes. Experimental evaluation confirms the effectiveness of RESOUND to reduce the static biases of current datasets.