Weakly-Supervised Temporal Activity Localization and Classification With Web Videos
- Author(s): Dougherty, Thomas
- Advisor(s): Roy-Chowdhury, Amit K
In this thesis, weakly-supervised temporal activity localization and classification with web videos is considered. Most activity localization methods depend on frame-wise annotations, which are burdensome to collect. Learning from weak labels offers a way to reduce this manual labeling effort. Recently there has been a substantial influx of tagged videos on the Internet, and these can potentially serve as a rich source of weakly-supervised training data. Two questions are considered. First, given only the keyword of an action, can videos be retrieved online and used to train the Weakly-supervised Temporal Activity Localization and Classification (W-TALC) network? Second, can a re-ranking method be implemented to filter out noisy video data? The action categories of the THUMOS14 dataset are used to search for videos online with the YouTube Data API, and the retrieved videos form the training set for the W-TALC network. Given only video-level labels, the W-TALC network learns to both localize and classify actions in videos. A re-ranking strategy that removes noisy videos yields an increase in detection performance over training on the original, unfiltered web video dataset. Analysis of the web video dataset and of the detection results shows promise for the reliable use of web videos for training.
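The keyword-based retrieval step described above can be sketched as follows. This is a minimal illustration, not the thesis's actual collection code: the helper `build_search_params`, the example keywords, and the `API_KEY` placeholder are assumptions; only the general shape of a YouTube Data API v3 `search.list` request is taken from the real API.

```python
def build_search_params(keyword, max_results=50):
    """Build the query parameters for a YouTube Data API v3 search.list
    call that retrieves candidate videos for one action keyword
    (illustrative helper, not the thesis's implementation)."""
    return {
        "part": "id,snippet",
        "q": keyword,            # e.g. a THUMOS14 action name like "HighJump"
        "type": "video",         # restrict results to videos
        "maxResults": max_results,
    }

# With google-api-python-client installed, the actual request would look like:
#   from googleapiclient.discovery import build
#   youtube = build("youtube", "v3", developerKey=API_KEY)  # API_KEY is a placeholder
#   response = youtube.search().list(**build_search_params("HighJump")).execute()
#   video_ids = [item["id"]["videoId"] for item in response["items"]]
```

Each action category name is used as the search query, so the only supervision attached to a downloaded video is the keyword it was retrieved with.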
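The re-ranking idea can be illustrated with a simple confidence-based filter: score each retrieved video with the classifier's confidence for its query label, rank, and keep only the top fraction. The scoring source, the `keep_ratio` hyperparameter, and the function itself are illustrative assumptions, not the exact method used in the thesis.

```python
def rerank_and_filter(videos, scores, keep_ratio=0.8):
    """Sort candidate web videos by confidence for their query label and
    keep only the top fraction; the rest are treated as noise.
    `keep_ratio` is an illustrative hyperparameter, not the thesis's value."""
    ranked = sorted(zip(videos, scores), key=lambda vs: vs[1], reverse=True)
    n_keep = max(1, int(len(ranked) * keep_ratio))
    return [v for v, _ in ranked[:n_keep]]

# Example: five web videos retrieved for one action keyword.
videos = ["vid_a", "vid_b", "vid_c", "vid_d", "vid_e"]
scores = [0.91, 0.12, 0.77, 0.64, 0.05]   # confidence for the query class
clean = rerank_and_filter(videos, scores, keep_ratio=0.6)
# keeps the 3 highest-scoring videos: vid_a, vid_c, vid_d
```

Filtering before training means the W-TALC network never sees the lowest-confidence retrievals, which is how removing noisy web videos can improve detection performance over the unfiltered dataset.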