Skip to main content
eScholarship
Open Access Publications from the University of California

UC Riverside

UC Riverside Electronic Theses and Dissertations bannerUC Riverside

Learning from Time Series in the Resource-Limited Situations

Abstract

Many data mining tasks are based on datasets containing sequential characteristics, such as web search queries, medical monitoring data, motion capture records, and astronomical observations. In these and many other applications, a time series is a concise yet expressive representation. A wealth of current data mining research on time series is focused on providing exact solutions in such small datasets. However, advances in storage techniques and the increasing ubiquity of distributed systems make realistic time series datasets orders of magnitude larger than the size that most of those solutions can handle due to computational resource limitations. On the other hand, proposed approximate solutions such as dimensionality reduction and sampling, suffer from two drawbacks: they do not adapt to available computational resources and they often require complicated parameter tuning to produce high quality results.

In this dissertation, we discuss anytime/anyspace algorithms as a way to address these issues. Anytime/anyspace algorithms (after a small amount of setup time/space) are algorithms that always have a best-so-far answer available. The quality of these answers improves as more computational time/space is provided. We show that by framing problems as anytime/anyspace algorithms, we can extract the most benefit from the available computational resources and provide high-quality approximate solutions accordingly.

We further argue that it is not always effective and efficient to rely on whole datasets. When the data is noisy, using distinguishing local features rather than global features could mitigate the effect of noise. Moreover, building a concise model based on local features makes the computational time and space much less expensive. We introduce a new time series primitive, time series shapelets, as a distinguishing feature. Informally, shapelets are time series subsequences which are in some sense maximally representative of a class. As we shall show with extensive empirical evaluations in diverse domains, classification algorithms based on the time series shapelet primitives can be interpretable, more accurate, and significantly faster than state-of-the-art classifiers.

Main Content
For improved accessibility of PDF content, download the file to your device.
Current View