With 33.3 million people presently infected with Human Immunodeficiency Virus-1 (HIV-1), along with 2.6 million new infections and 1.8 million AIDS-related deaths in 2009 alone, HIV-1 continues to be one of the biggest global pandemics and medical challenges of the new millennium. Although the development of antiretroviral drugs was a major advance in the treatment of patients infected with HIV-1, complete eradication of HIV-1 has not been possible due to two major obstacles. First, the high mutation rate of the virus, coupled with its rapid replication rate, has given rise to drug-resistant strains of HIV-1. Second, latent viral reservoirs that are not directly targeted by antiviral therapies or by the immune system can reactivate at a later time, preventing complete viral clearance from a patient. Compounding these difficulties is the global diversification of viral strains, or subtypes, that have widely differing sequences, resulting in unique gene regulation and pathogenesis. Following integration into the host genome, activation of viral gene expression results in the production of new progeny, whereas the inability to activate gene expression could initiate the establishment of viral latency. Thus, a better understanding of the mechanisms and factors that regulate viral transcription is critical for eliminating latent viral populations. Therefore, the focus of this work has been to investigate the role of both cellular and viral factors in regulating HIV-1 gene expression and latency using a combination of computational and experimental techniques. This work may help develop novel therapeutic targets and better treatment regimens for different HIV-1 subtypes while concurrently providing new insights into mammalian gene regulation.
In studying viral factors that regulate gene expression in HIV-1, we focused on the HIV-1 promoter, a viral protein called Tat, and an RNA hairpin called TAR. The error-prone nature of HIV-1 replication has resulted in highly diverse viral sequences, and it is not clear how Tat, which plays a critical role in viral gene expression and replication, retains its complex functions. Although several important amino acid positions in Tat are conserved, we hypothesized that it may also harbor functionally important residues that are not individually conserved yet appear as correlated pairs, and that knowledge of such evolutionary information could help elucidate the underlying mechanisms of Tat function. Using information theory-based approaches such as Mutual Information together with protein engineering approaches, we found a pair of sites in Tat that are strongly coevolving and that provided insight into Tat-mediated viral transcription. In contrast to most coevolving protein residues, which contribute to the same function, these two residues contribute to two mechanistically distinct steps in gene expression: binding the cellular protein positive transcription elongation factor b (P-TEFb) and promoting P-TEFb phosphorylation of the C-terminal domain of RNA Polymerase II (RNAPII). Moreover, Tat variants that mimic HIV-1 subtype B or C at these sites have evolved orthogonal strengths of P-TEFb binding vs. RNAPII phosphorylation, suggesting that subtypes have evolved alternate transcriptional strategies that could differentially impact latency while achieving similar gene expression levels.
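To illustrate the kind of Mutual Information calculation referred to above, the following is a minimal sketch and not the analysis pipeline used in this work. The function names and the toy alignment are hypothetical, and the sketch omits the corrections (for example, for small sample size or phylogenetic background) that an analysis of real sequence data would likely need.

```python
from collections import Counter
from math import log2


def column(alignment, idx):
    """Return the residues found at position idx across all aligned sequences."""
    return [seq[idx] for seq in alignment]


def mutual_information(col_i, col_j):
    """Mutual Information (in bits) between two alignment columns.

    MI = sum over residue pairs (a, b) of p(a, b) * log2(p(a, b) / (p(a) * p(b))),
    with probabilities estimated from observed frequencies.
    """
    n = len(col_i)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        mi += p_ab * log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


# Hypothetical toy alignment (NOT real Tat sequences): positions 0 and 1
# always change together, while position 2 is fully conserved.
toy_alignment = ["AKR", "AKR", "GER", "GER"]

mi_covarying = mutual_information(column(toy_alignment, 0), column(toy_alignment, 1))
mi_conserved = mutual_information(column(toy_alignment, 0), column(toy_alignment, 2))
print(mi_covarying, mi_conserved)
```

In this toy case the covarying pair scores high while the pair involving the conserved position scores zero, which is why Mutual Information can flag residues that are not individually conserved yet vary together.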
Interaction between Tat and the viral hairpin TAR is critical for efficient gene expression from the viral promoter, and we therefore hypothesized that sequence diversity within these elements may dramatically alter the gene expression and latency properties of different subtype viruses. We found large differences in gene expression between subtypes using a variety of experimental models and showed that subtype TARs and Tats act independently to set the level of gene expression from the viral promoter. Further, using Mutual Information and site-directed mutagenesis, we showed that nucleotides in TAR are not coevolving with residues in Tat, implying that HIV-1 has evolved a highly robust mechanism of activating gene expression in the face of rapid viral evolution.
Similarly, the promoters of different HIV-1 subtypes have evolved different architectures of transcription factor binding sites (TFBS) that result in widely varying levels of gene expression and viral replication. Within this large diversity of TFBS in the HIV-1 promoter, we used in vitro models of HIV-1 latency to identify the minimal set of TFBS that contribute to most of the observed differences in gene expression and latency at steady state. In contrast, we found that the dynamics of gene expression depend on both the minimal set of TFBS and other sites in the viral promoter. Identifying other targets within the viral promoter will provide a better mechanistic understanding of the establishment and reactivation of HIV-1 latency and may identify new molecular targets to counter latency.
While diversity in viral factors can contribute to differential regulation of viral gene expression, host factors can also play a significant role in this regulation. Since HIV-1 integrates semi-randomly within the human genome, another aspect of this thesis was to study the role of the cellular genomic location in regulating viral gene expression. We exploited the semi-random integration of HIV-1 both to quantitatively study how latent proviruses can be reactivated from different chromatin environments and to address a fundamental question in eukaryotic gene expression: how the placement of a gene in the genome impacts its responsiveness to an input transcription factor signal. Using a tunable overexpression system for the transcription factor NF-κB RelA, we quantified HIV-1 expression as a function of RelA levels and chromatin features at a panel of viral integration sites. We demonstrated that chromatin environments at different genomic loci decouple the threshold for transcription factor-mediated induction of gene expression from subsequent gene activation. We developed a functional relationship between gene expression, RelA levels, and chromatin accessibility that accurately predicted synergistic HIV-1 activation in response to combinatorial pharmacological perturbations. Thus, this quantitative study should help inform strategies for combinatorial therapies to combat latent HIV-1 and help unravel biological principles underlying selective gene expression in response to transcription factor inputs.
Finally, after HIV-1 integrates into the host genome, it can either activate gene expression, which leads to viral replication, or become transcriptionally silent, which can result in viral latency. Since stochastic fluctuations in HIV-1 gene expression are one of several factors that have been implicated in influencing this decision, and thus in the establishment of viral latency, we investigated the role of the local chromatin environment in regulating gene expression noise. We showed that among clones with similar mean gene expression levels, those integrated into more heterochromatic regions are associated with wider mRNA and protein distributions. Using a two-state stochastic model of gene expression, we showed that repressed chromatin gives rise to noisier gene expression by lowering the burst frequency. In addition to more clearly defining the role of the chromatin environment in regulating the establishment of viral latency, this study has implications for the role of chromatin in modulating transcriptional noise in eukaryotes and for its evolutionary consequences in the placement of genes within the genome.
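The link between burst frequency and noise can be illustrated with the standard two-state (telegraph) model of transcription, in which a promoter switches between an inactive and an active state and produces mRNA only when active. The sketch below uses the well-known steady-state expressions for the mean and Fano factor of that model; it is an assumption that this textbook form matches the model used in this work, and all parameter names and values are hypothetical and chosen only for illustration.

```python
def telegraph_moments(k_on, k_off, r, d):
    """Steady-state mRNA statistics for the two-state (telegraph) model.

    k_on  : rate of switching from inactive to active (burst frequency)
    k_off : rate of switching from active to inactive
    r     : transcription rate while active
    d     : mRNA degradation rate
    Returns (mean, Fano factor, squared coefficient of variation).
    """
    mean = (r / d) * k_on / (k_on + k_off)
    fano = 1.0 + r * k_off / ((k_on + k_off) * (k_on + k_off + d))
    cv2 = fano / mean  # variance / mean^2
    return mean, fano, cv2


def rate_for_mean(target_mean, k_on, k_off, d):
    """Transcription rate r that yields target_mean for the given switching rates."""
    return target_mean * d * (k_on + k_off) / k_on


# Hypothetical parameters: two "clones" with the same mean expression but
# different burst frequencies (k_on), e.g. an open vs. a repressed locus.
d, k_off, target_mean = 1.0, 10.0, 50.0
results = {}
for label, k_on in (("high burst frequency", 2.0), ("low burst frequency", 0.2)):
    r = rate_for_mean(target_mean, k_on, k_off, d)
    results[label] = telegraph_moments(k_on, k_off, r, d)
    mean, fano, cv2 = results[label]
    print(f"{label}: mean={mean:.1f}, Fano={fano:.1f}, CV^2={cv2:.2f}")
```

Holding the mean fixed while lowering the burst frequency forces each burst to be larger, so under these assumptions the clone with less frequent bursts shows the wider distribution, consistent with the qualitative result described above.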
Thus, these studies of the role of sequence variation within the viral genome and of its chromosomal integration site in regulating gene expression have resulted in a better understanding of the mechanisms of gene expression and the establishment of latency in HIV-1, while also helping to discern the role of chromatin in regulating mammalian gene expression.