eScholarship
Open Access Publications from the University of California

UC Riverside

UC Riverside Electronic Theses and Dissertations

Image/Time Series Mining Algorithms: Applications to Developmental Biology, Document Processing and Data Streams

Abstract

Interdisciplinary research in computer science requires the development of computational techniques for practical application in different domains. This usually requires careful integration of different areas of technical expertise. This dissertation presents image and time series analysis algorithms, with practical interdisciplinary applications to developmental biology, historical manuscript processing, and data stream processing. Inspired by the NSF IGERT program, this dissertation presents algorithms for analysis of growth dynamics at the shoot apex of Arabidopsis thaliana. A robust understanding of the causal relationship between gene expression, cell behaviors, and organ growth requires the development of computational techniques for quantitative analysis of real-time, live-cell meristem growth data. This requires the development and application of image analysis tools and novel time series alignment algorithms. Image analysis is necessary for the computation of growth features, but it yields time series of unsynchronized growth data, which require a robust alignment method. Towards this end, we present two time series alignment algorithms. This dissertation further considers image mining in historical document processing. We apply the Minimum Description Length (MDL) principle to develop a symbol clustering algorithm. The developed algorithm is one of the first practical applications of MDL to real-world, real-valued data such as images. Moreover, we introduce a novel premise that a clustering algorithm should have the freedom to ignore some data. Extensive empirical results show that the MDL-based algorithm outperforms the popular K-means clustering algorithm, given the same input data, the same distance measure, and the correct value of K for K-means. The new algorithm could have significant impact, as clustering is a critical subroutine in almost all historical document processing systems.
Finally, we present an algorithm for detecting rare and approximately repeating sequences in unbounded real-valued data streams, given limited space. This algorithm uses a novel integration of the SAX time series representation with a Bloom filter to develop a robust cache maintenance policy, which allows us to overcome known challenges in a previously unsolved frequent pattern mining problem. Our contribution is that we solve this problem for real-valued data, whereas only the discrete-valued case has been considered in the literature.
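
To illustrate the general idea of pairing SAX with a Bloom filter, the following is a minimal sketch and not the dissertation's algorithm. It assumes an alphabet of four symbols, non-overlapping windows, and a toy Bloom filter. The names `sax_word`, `BloomFilter`, and `repeated_words` are hypothetical, and the cache maintenance policy described in the abstract is not reproduced here.

```python
import hashlib
from bisect import bisect_left
from statistics import mean, pstdev

# Breakpoints that cut the standard normal into 4 equiprobable regions
# (assumed alphabet size of 4; SAX allows other sizes).
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"


def sax_word(window, segments):
    """Convert a real-valued window into a SAX word of `segments` symbols."""
    mu, sigma = mean(window), pstdev(window)
    if sigma == 0:
        normed = [0.0] * len(window)  # constant window: nothing to scale
    else:
        normed = [(x - mu) / sigma for x in window]  # z-normalize
    size = len(normed) // segments  # assumes len(window) is divisible by segments
    # Piecewise Aggregate Approximation: one mean per segment.
    paa = [mean(normed[i * size:(i + 1) * size]) for i in range(segments)]
    return "".join(ALPHABET[bisect_left(BREAKPOINTS, v)] for v in paa)


class BloomFilter:
    """Toy Bloom filter: fixed memory, no false negatives, rare false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


def repeated_words(stream, window_len=8, segments=4):
    """Return SAX words that the filter reports as already seen."""
    seen = BloomFilter()
    repeats = []
    for start in range(0, len(stream) - window_len + 1, window_len):
        word = sax_word(stream[start:start + window_len], segments)
        if word in seen:
            repeats.append(word)  # approximate repeat (or a false positive)
        else:
            seen.add(word)
    return repeats


ramp = [0, 0, 1, 1, 2, 2, 3, 3]
noisy_ramp = [0.1, 0.0, 1.0, 1.1, 2.0, 2.1, 3.0, 2.9]
print(repeated_words(ramp + noisy_ramp))
```

Because SAX discretizes each window, two windows that differ only by small noise can map to the same word, so an exact-membership structure such as a Bloom filter can flag approximate repeats in real-valued data while using a fixed amount of memory.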
