Developing Efficient Techniques for Clustering and Degradation Analysis in Time Series Data
Time series data mining is one of the most actively studied research areas. The need to mine time series data is driven by the proliferation of ubiquitous sensors that collect data on many aspects of our lives, from measuring equipment parameters at chemical plants to counting heartbeats as we exercise. Recent hardware advances make it possible to produce large amounts of complex data that describe objects’ behaviors and the changes in those behaviors.
One of the most basic and widely used primitives for data mining is clustering, which serves as a subroutine in many higher-order algorithms. In real-world applications, data often arrives as a stream, which restricts random access to data points and requires that incoming data be clustered by processing each element only once. In this work, we propose a novel clustering algorithm for streaming data that can be used with any distance measure (not only metrics) and that is parameter-lite and exact. We demonstrate its utility in diverse real-world domains such as cardiology, entomology, and biological audio processing.
It has been shown that for time series clustering it is often useful to ignore some of the data within each time series. A recently proposed technique, u-shapelets, addresses this issue by allowing us to consider only the data relevant for clustering. However, the original algorithms were intractable on large datasets. In this dissertation, we propose a speed-up technique that makes u-shapelet discovery up to two orders of magnitude faster than the original algorithms without significant loss of clustering quality. We also demonstrate the utility of our scheme on real datasets from diverse domains such as monitoring the physical activity of elderly people and bird song analysis.
Finally, to allow monitoring of physical systems for signs of aging and degradation, we propose a novel approach to extracting and analyzing the aging trend. Our algorithm can provide monitoring software with actionable information about changes in the system’s behavior over time. This is critical for ensuring timely repair of worn-out components and for maintaining uninterrupted operation of the whole system.