Context-Aware Informative Sample Selection and Image Forgery Detection
Most computer vision methods assume that labeled data are available beforehand to train a good recognition model. However, given the huge corpus of visual data generated daily, knowing all the labels in advance is infeasible and unrealistic. In most image and video analysis tasks, selecting the most informative samples from a huge pool of training data in order to learn a good recognition model is an active research problem. Selecting such samples also reduces annotation cost, since annotating unlabeled samples is time-consuming. In this thesis, we aim to design information-theoretic approaches that exploit inter-relationships between data instances in order to find informative samples in images or videos. Moreover, in recent years, the advent of high-tech journaling tools has made it possible to forge an image in a way that easily evades state-of-the-art image tampering detection approaches. The recent success of deep learning approaches in different recognition tasks inspires us to develop a high-confidence detection framework that can localize forged or manipulated regions in an image. Unlike semantic object segmentation, where all meaningful regions (objects) are segmented, the localization of image forgery focuses only on the possibly tampered regions, which makes the problem even more challenging.
We present two distinct information-theoretic approaches for selecting samples to learn recognition models, and a deep learning based method for localizing manipulations in images. In the first approach, we show how models for joint scene and object classification can be learned online. A major motivation for this approach is to exploit the hierarchical relationships between scenes and objects, represented as a graphical model, in an active learning framework. To select the samples on the graph that need to be labeled by a human, we formulate an optimization function that reduces the joint entropy of the scene and object variables. The second approach is motivated by theories of data compression and exploits the concept of typicality from information theory in order to find informative samples in videos. Typicality is a simple and powerful technique that can be applied to compress the training data used to learn a good classification model. Both approaches lead to a significant reduction in manual labeling effort while achieving similar or better performance compared with a model trained on the full dataset. In the final chapter, we explore a deep learning architecture to localize manipulated regions in an image. Our proposed framework utilizes resampling features, Long Short-Term Memory (LSTM) cells, and an encoder-decoder network to segment manipulated regions from non-manipulated ones. The overall framework is capable of detecting different types of image forgeries.
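As a rough illustration of the two selection ideas, the sketch below ranks unlabeled samples by the entropy of their predicted class posteriors and tests whether a symbol sequence is weakly typical for an i.i.d. source. It is a simplified stand-in, not the formulation developed in the thesis: the first approach optimizes joint entropy over a scene-object graphical model, whereas this sketch treats each sample independently. The function names (entropy, select_most_uncertain, is_typical), the per-sample posteriors, and the i.i.d. source model are all assumptions made for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_most_uncertain(posteriors, k):
    """Return indices of the k samples with the highest posterior entropy.

    Assumption: each entry of `posteriors` is a list of class probabilities
    predicted by the current model for one unlabeled sample.
    """
    ranked = sorted(
        range(len(posteriors)),
        key=lambda i: entropy(posteriors[i]),
        reverse=True,
    )
    return ranked[:k]


def is_typical(sequence, pmf, eps):
    """Weak typicality test for an i.i.d. source with distribution `pmf`.

    A sequence x^n is eps-typical if |-(1/n) log2 p(x^n) - H(X)| <= eps.
    Assumption: every symbol in `sequence` has nonzero probability in `pmf`.
    """
    n = len(sequence)
    log_prob = sum(math.log2(pmf[s]) for s in sequence)
    return abs(-log_prob / n - entropy(pmf.values())) <= eps
```

In an active learning loop of this kind, the indices returned by select_most_uncertain would be sent to a human annotator. A typicality test like is_typical is one possible way to separate sequences that the current model already explains well from atypical ones that may carry new information.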