Pattern-Based Data Mining on Diverse Multimedia and Time Series Data
- Author(s): Campana, Bilson Jake
- Advisor(s): Keogh, Eamonn, et al.
The ubiquity of patterns in data mining and knowledge discovery data sets is a binding characteristic across a diverse, and possibly otherwise unrelated, range of image, audio, video, and time series data. Despite the differences within and between these data sets, each usually carries some notion of a pattern. These patterns may manifest as macro and micro textures in images, n-grams in text, motifs in time series, and so on. Yet despite this recurring trait, research on these data sets has produced an expansive history of varied methods, with new algorithms continually introducing novel techniques and/or specialized parameters tailored to their particular data. Because of this growing algorithmic complexity, work on new data requires an in-depth review of a voluminous research background in order to optimize the selection of an algorithm's sub-functions, feature spaces, and parameters.
Rather than providing data-dependent approaches that cater to the variances in the data, this work leverages the existence of patterns in many, if not all, data sets by using the data's pattern as its atomic form of representation. Because our algorithms operate on the data's patterns, all that is necessary to apply these pattern-based methods to new, unseen data is an understanding of the data's patterns, a term well understood by human intuition and abundantly expressed in the literature of many fields.
We first present a framework that provides an extremely accurate, fast, and parameter-less method for measuring textural pattern similarity in images. We then demonstrate that this pattern-based method continues to perform well across a large variety of image data sets from very diverse fields, and show that it performs equally well in the realm of audio similarity. To further demonstrate the reach of pattern-based approaches, we present a novel method for the discovery of motif rules in time series, a pattern discovery problem where previous research efforts have been shown to deliver meaningless results. Finally, we demonstrate optimizations for time series similarity search, a core subroutine of time series rule discovery and many other time series data mining algorithms.