Li, Haosong

A Scalable Association Rule Learning Algorithm for Large Datasets and Its Application to Microarray Datasets

2021

Advisor(s): Sheu, Phillip

Creative Commons 'BY-NC-ND' version 4.0 license

Abstract

Many algorithms have been proposed for the association rule learning problem. However, most of them scale poorly, because of either prohibitive time complexity or memory usage, especially when the dataset is large and the minimum support (minsup) is set low. Association rule learning algorithms have been applied, among other uses, to microarray datasets to find association rules among genes. As microarray technology has developed, larger datasets have recently been generated that challenge current association rule learning algorithms. Specifically, the large number of items per transaction significantly increases the running time and memory consumption of such tasks. In this dissertation, we address these problems by introducing a new approach that follows the divide-and-conquer paradigm and can exponentially reduce both time complexity and memory usage, even on a single machine. Comparative experiments show that the proposed heuristic approach achieves a significant speedup over existing algorithms. With some modification, the heuristic approach efficiently learns gene-disease and gene-gene association rules from large-scale microarray datasets. The rules are ranked by importance. Our experiments show that our algorithm outperforms the Apriori algorithm on microarray datasets by one to three orders of magnitude.
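To make the scalability problem concrete, the following is a minimal sketch of naive frequent-itemset counting. It is neither the Apriori algorithm nor the approach proposed in the dissertation. The function name `frequent_itemsets` and the gene labels `g1` to `g3` are hypothetical and chosen only for illustration. Support is taken here as the fraction of transactions that contain an itemset, which is the usual definition.

```python
from itertools import combinations


def frequent_itemsets(transactions, minsup):
    """Return {itemset: support} for every itemset with support >= minsup.

    Naive enumeration (illustrative only): every non-empty subset of
    every transaction is counted, so a transaction with m items
    contributes 2**m - 1 candidate itemsets.
    """
    n = len(transactions)
    counts = {}
    for t in transactions:
        items = sorted(set(t))  # canonical order so tuples are comparable keys
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[combo] = counts.get(combo, 0) + 1
    # Support = fraction of transactions containing the itemset.
    return {s: c / n for s, c in counts.items() if c / n >= minsup}


# Toy data: each transaction is the set of (hypothetical) genes
# flagged in one sample.
transactions = [
    {"g1", "g2", "g3"},
    {"g1", "g2"},
    {"g2", "g3"},
    {"g1", "g2", "g3"},
]

frequent = frequent_itemsets(transactions, minsup=0.5)
```

With three items per transaction the enumeration is trivial, but the number of subsets doubles with each additional item, and lowering `minsup` lets more of those subsets survive the filter. This is the growth in running time and memory that the abstract attributes to transactions with many items.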

UC Irvine
