UC San Diego
Automated Feature Design for Time Series Classification by Genetic Programming
- Author(s): Harvey, Dustin Yewell
- et al.
Time series classification (TSC) methods discover and exploit patterns in time series and other one-dimensional signals. Although many accurate, robust classifiers exist for multivariate feature sets, general approaches are needed to extend machine learning techniques to make use of signal inputs. Numerous applications of TSC can be found in structural engineering, especially in the areas of structural health monitoring and non-destructive evaluation. Additionally, the fields of process control, medicine, data analytics, econometrics, image and facial recognition, and robotics include TSC problems. This dissertation details, demonstrates, and evaluates Autofead, a novel approach to automated feature design for TSC. In Autofead, a genetic programming variant evolves a population of candidate solutions to optimize performance for the TSC or time series regression task based on training data. Solutions consist of features built from a library of mathematical and digital signal processing functions. Numerical optimization methods, included through a hybrid search approach, ensure that the fitness of candidate feature algorithms is measured using optimal parameter values. Experimental validation and evaluation of the method are carried out on a wide range of synthetic, laboratory, and real-world data sets with direct comparison to conventional solutions and state-of-the-art TSC methods. Autofead is shown to be competitively accurate and to produce highly interpretable solutions that are desirable for data mining and knowledge discovery tasks. The computational cost of the search is relatively high in the learning stage, when solutions are designed; however, the computational expense of classifying new time series is very low, making Autofead solutions suitable for embedded and real-time systems. Autofead represents a powerful, general tool for TSC and time series data mining researchers as well as industry practitioners.
Potential applications are numerous, including the monitoring of electrocardiogram signals for indications of heart failure, network traffic analysis for intrusion detection systems, vibration measurement for bearing condition determination in rotating machinery, and credit card activity analysis for fraud detection. In addition to the development of the overall method, this dissertation provides contributions in the areas of evolutionary computation, numerical optimization, digital signal processing, and uncertainty analysis for evaluating solution robustness.
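The hybrid search idea described in the abstract, a structural search over feature algorithms combined with numerical tuning of their parameters, can be illustrated with a small sketch. This is not Autofead's implementation: plain random sampling stands in for the genetic programming variant, an exhaustive sweep over a smoothing window stands in for the numerical optimizer, and the function library, the Fisher-style fitness score, the synthetic data, and all names (`STEPS`, `REDUCERS`, `feature`, `tune`, `search`, `make_data`) are assumptions made for illustration.

```python
import math
import random
import statistics


def moving_average(x, w):
    """Smooth a series with a window of w samples (w is the tunable parameter)."""
    return [sum(x[i:i + w]) / w for i in range(len(x) - w + 1)]


# Hypothetical function library: series-to-series steps and series-to-scalar reducers.
STEPS = {
    "diff": lambda x: [b - a for a, b in zip(x, x[1:])],
    "abs": lambda x: [abs(v) for v in x],
    "square": lambda x: [v * v for v in x],
}
REDUCERS = {"mean": statistics.fmean, "max": max}


def feature(series, steps, reducer, w):
    """Evaluate one candidate feature algorithm on one series, giving a scalar."""
    x = moving_average(series, w)
    for name in steps:
        x = STEPS[name](x)
    return REDUCERS[reducer](x)


def fitness(values, labels):
    """Two-class separation: gap between class means over the summed spreads."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    spread = statistics.pstdev(a) + statistics.pstdev(b) + 1e-12
    return abs(statistics.fmean(a) - statistics.fmean(b)) / spread


def tune(steps, reducer, data, labels, windows=range(1, 9)):
    """Inner search: score a structure at its best window, not an arbitrary one."""
    return max(
        (fitness([feature(s, steps, reducer, w) for s in data], labels), w)
        for w in windows
    )


def search(data, labels, n_candidates=30, seed=0):
    """Outer search: random sampling of structures (a stand-in for evolution)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_candidates):
        steps = tuple(rng.choice(sorted(STEPS)) for _ in range(rng.randint(0, 3)))
        reducer = rng.choice(sorted(REDUCERS))
        score, w = tune(steps, reducer, data, labels)
        if best is None or score > best[0]:
            best = (score, steps, reducer, w)
    return best


def make_data(n_per_class=20, length=64, seed=1):
    """Synthetic example: the same sine wave with low (class 0) or high (class 1) noise."""
    rng = random.Random(seed)
    data, labels = [], []
    for label, noise in ((0, 0.1), (1, 0.5)):
        for _ in range(n_per_class):
            data.append([math.sin(0.2 * t) + rng.gauss(0, noise) for t in range(length)])
            labels.append(label)
    return data, labels


if __name__ == "__main__":
    data, labels = make_data()
    score, steps, reducer, w = search(data, labels)
    print(f"best feature: window={w}, steps={steps}, reducer={reducer}, score={score:.2f}")
```

The point of the inner `tune` call is the one made in the abstract: each candidate structure is judged at its best parameter value, so a promising structure is not discarded because of a poorly chosen parameter. Once a feature has been selected, classifying a new series only requires evaluating that one short function chain, which is why the cost at prediction time is low even when the search is expensive.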