eScholarship
Open Access Publications from the University of California

UCLA Electronic Theses and Dissertations

A Bootstrap Algorithm for Testing the Equality of Phi Correlation Matrices in Text Mining

Abstract

In text data analysis, researchers often use ϕ correlation matrices as inputs for network visualizations (Aruga et al., 2022; Buschken & Allenby, 2016; Lee et al., 2021). These visualizations are typically compared across groups, and differences are inferred by eye. This practice provides no formal statistical test of whether the correlation matrices differ significantly across groups. This dissertation introduces a parametric bootstrap algorithm for testing the equality of multiple ϕ correlation matrices across groups. The algorithm generates bootstrap samples under the null hypothesis, treating the observed sample statistics as population parameters. Two simulation studies assessed the algorithm's ability to control Type I error under conditions common in text data analysis. Study 1 focused on equal correlations with equal marginals. Study 2 examined more complex cases with varying correlations and unequal marginals. The algorithm maintained nominal Type I error rates given sufficient sample sizes. The method offers a practical tool for testing group differences in word co-occurrence structures. It can be integrated into text analysis pipelines to detect meaningful group-based differences before downstream tasks such as network plotting.
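The general idea can be illustrated with a minimal Python sketch. It is not the dissertation's algorithm, whose details are not given in the abstract. The sketch makes several assumptions: the function names (`phi_matrix`, `statistic`, `bootstrap_test`) are hypothetical, the test statistic (summed squared deviation of each group's ϕ matrix from the average matrix) is chosen only for illustration, and the null model pools all groups and draws word-presence patterns from the pooled pattern probabilities. That null model forces equal marginals as well as equal correlations, so it is stricter than the null hypothesis studied in the dissertation's second simulation.

```python
import numpy as np

def phi_matrix(X):
    """Phi correlation matrix of a binary (documents x words) array.

    For 0/1 variables, phi equals the Pearson correlation.
    Entries involving a constant column are undefined; they are set
    to 0 here as a simplifying assumption.
    """
    X = np.asarray(X, dtype=float)
    with np.errstate(invalid="ignore", divide="ignore"):
        R = np.corrcoef(X, rowvar=False)
    return np.nan_to_num(R, nan=0.0)

def statistic(mats):
    """Illustrative statistic (an assumption): summed squared deviation
    of each group's phi matrix from the element-wise mean matrix."""
    mean = np.mean(mats, axis=0)
    return float(sum(np.sum((m - mean) ** 2) for m in mats))

def bootstrap_test(groups, n_boot=500, seed=0):
    """Bootstrap p-value for equality of phi matrices across groups.

    Null model (an assumption): every group shares the pooled
    distribution over word-presence patterns, with the pooled sample
    proportions treated as population parameters.
    """
    rng = np.random.default_rng(seed)
    groups = [np.asarray(g) for g in groups]
    observed = statistic([phi_matrix(g) for g in groups])

    # Estimate pattern probabilities from the pooled data.
    pooled = np.vstack(groups)
    patterns, counts = np.unique(pooled, axis=0, return_counts=True)
    probs = counts / counts.sum()

    exceed = 0
    for _ in range(n_boot):
        # Simulate each group at its original size under the null.
        sims = [
            patterns[rng.choice(len(patterns), size=len(g), p=probs)]
            for g in groups
        ]
        if statistic([phi_matrix(s) for s in sims]) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_boot + 1)
```

Each element of `groups` is a binary matrix with one row per document and one column per word, so a call such as `bootstrap_test([X_a, X_b])` returns the observed statistic and a bootstrap p-value. A small p-value would suggest that the groups' word co-occurrence structures differ, subject to the assumptions above.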