eScholarship
Open Access Publications from the University of California

UCLA

UCLA Electronic Theses and Dissertations

Statistical Analyses of Clustering Patterns of Transcription Factor-DNA Binding in ChIP-seq Data

Abstract

The binding of transcription factors to specific sites on DNA is central to the regulation of gene expression. ChIP-seq is a novel technology that combines chromatin immunoprecipitation (ChIP) with next-generation DNA sequencing (seq) to identify transcription factor binding loci on DNA. ChIP-seq has revolutionized biological data acquisition for elucidating fundamental mechanisms of gene regulation. However, the large datasets it produces on transcription factor-DNA binding call for statistical analysis, which can provide predictions to guide wet-lab biological research. This research is part of the statistical modeling of transcription factor-DNA binding patterns; it analyzes the patterns of transcription factor co-clustering on DNA in a ChIP-seq dataset obtained from mouse embryonic stem cells for 15 transcription factors and coregulators. First, we used the chi-square goodness-of-fit test to determine whether the locations of binding sites for each transcription factor constitute a Poisson process. The results indicated that the binding-site locations are unlikely to follow a homogeneous Poisson process. Second, we studied the correlation among the binding of the various transcription factors. Third, we analyzed the patterns of clustered sites containing three transcription factors and found a total of 3353 such sites. The transcription factors Smad1, Tcfcp2l1, Stat3, Klf4, and Esrrb and the coregulator p300 are preferentially co-localized with Nanog, Oct4, and Sox2, while E2f1 and Zfx are preferentially co-localized with n-Myc and c-Myc.
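As an illustration of the first step, the following is a minimal sketch of a chi-square goodness-of-fit test against a homogeneous Poisson process. It is not the thesis's own code: the equal-width binning, the pooling of sparse count categories until the expected frequency reaches 5, the function names `poisson_pmf` and `poisson_chi_square`, and the simulated site positions are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def poisson_pmf(k, lam):
    """Poisson probability P(X = k) for mean lam > 0, computed in log space."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def poisson_chi_square(positions, genome_length, n_bins, min_expected=5.0):
    """Return (chi-square statistic, degrees of freedom) for per-bin site counts.

    Under a homogeneous Poisson process, the number of sites falling in each
    equal-width bin follows a Poisson distribution with a common mean.
    """
    if not positions:
        raise ValueError("at least one binding-site position is required")

    # Count binding sites in each equal-width bin (assumed binning scheme).
    bin_width = genome_length / n_bins
    counts = [0] * n_bins
    for p in positions:
        counts[min(int(p / bin_width), n_bins - 1)] += 1

    lam = len(positions) / n_bins      # estimated mean number of sites per bin
    observed = Counter(counts)         # how many bins contain exactly k sites

    # Pool neighbouring count categories until each expected frequency
    # reaches min_expected (a common rule of thumb, assumed here).
    groups = []
    obs_acc = exp_acc = cum_prob = 0.0
    for k in range(max(counts) + 1):
        prob = poisson_pmf(k, lam)
        cum_prob += prob
        obs_acc += observed.get(k, 0)
        exp_acc += n_bins * prob
        if exp_acc >= min_expected:
            groups.append((obs_acc, exp_acc))
            obs_acc = exp_acc = 0.0

    # Add the upper tail of the distribution, merging it into the last
    # group when it is too sparse to stand alone.
    exp_acc += n_bins * max(0.0, 1.0 - cum_prob)
    if groups and exp_acc < min_expected:
        last_obs, last_exp = groups.pop()
        groups.append((last_obs + obs_acc, last_exp + exp_acc))
    else:
        groups.append((obs_acc, exp_acc))

    stat = sum((o - e) ** 2 / e for o, e in groups)
    df = len(groups) - 2               # one constraint for the total, one for lam
    return stat, df


# Simulated data only: uniformly scattered sites versus tightly clustered sites.
rng = random.Random(0)
genome_length, n_bins = 1_000_000, 1000
uniform_sites = [rng.uniform(0, genome_length) for _ in range(2000)]
clustered_sites = []
for _ in range(200):
    center = rng.uniform(0, genome_length)
    for _ in range(10):
        site = center + rng.uniform(-500, 500)
        clustered_sites.append(min(max(site, 0.0), genome_length - 1))

stat_uniform, df_uniform = poisson_chi_square(uniform_sites, genome_length, n_bins)
stat_clustered, df_clustered = poisson_chi_square(clustered_sites, genome_length, n_bins)
print("uniform:  ", stat_uniform, df_uniform)
print("clustered:", stat_clustered, df_clustered)
```

The statistic would then be compared with a chi-square distribution with the returned degrees of freedom. In this simulation the clustered sites leave far more empty bins than a Poisson model predicts, so their statistic is much larger, which is the kind of departure from a homogeneous Poisson process described in the abstract.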
