Length-Invariant Motif Discovery: Finding Similar Patterns at Different Subsequence Lengths
Many time series data mining problems can be reduced to frequent pattern mining. For time series specifically, motif discovery algorithms find repeated patterns in the data. Tools such as the Matrix Profile, time series chains, and time series consensus motifs discover such patterns. Although in principle they can be used with any distance measure, they have been optimized for Euclidean distance. This poses a problem: otherwise similar patterns of different lengths can have a very high Euclidean distance, making them difficult or impossible to discover with standard tools. In this thesis we discuss ways to find motifs that may exist at different subsequence lengths. We consider four algorithms, Iterative AB-STOMP, Appended STAMP, Pruned STOMP, and Piecewise STAMP, and compare them on efficiency and effectiveness. Finally, we demonstrate the utility of these algorithms through analysis of data from diverse domains, including insect behavior, bird song, and electrical usage.