Prokaryotes have evolved a diverse array of strategies to prevent or mitigate infection by phage. Among these, CRISPR-Cas systems (clustered regularly interspaced short palindromic repeats - CRISPR-associated) are unique in that they adapt to infections by generating an immunological memory that allows the host cell to mount a robust defense against subsequent infections. These systems are characterized by the presence of a genomic feature called a CRISPR array, which is made up of an AT-rich leader sequence followed by a series of direct repeat sequences of 20-50 base pairs alternating with variable viral-derived spacer sequences of similar length. When a cell is infected by a phage, a small fragment of the phage genome can be captured and inserted into the CRISPR array as a new spacer through a process called acquisition. The CRISPR array can then be transcribed to generate crRNAs (CRISPR RNAs) that assemble with interference Cas proteins to surveil the cell for complementary nucleic acid sequences. If a match is found, the Cas proteins degrade the nucleic acid. While the interference proteins of CRISPR-Cas systems are highly diverse, acquisition is broadly conserved. The proteins Cas1 and Cas2 carry out the integration of new spacers at the CRISPR locus and are found in nearly all identified active CRISPR systems. This work examines the mechanisms of spacer acquisition with a focus on how Cas1 and Cas2 from different CRISPR systems recognize and maintain specificity for the CRISPR array.
Cas1 and Cas2 function as a complex to capture fragments of foreign DNA, called protospacers prior to integration, and insert them at the leader-proximal repeat through an integrase-like mechanism that results in duplication of the repeat. We find that Cas1-Cas2 from the Streptococcus pyogenes type II CRISPR system integrates with high specificity in vitro into both plasmid and short linear targets, and we identify sequence motifs in the leader and repeat that are required for integration. We present the first evidence of full-site integration in vitro and show that the sequence requirements for full-site integration are stricter than those for half-site integration. Our biochemical data suggest that full-site integration acts as a checkpoint to ensure specificity, while half-site integration occurs more promiscuously because it has limited potential to introduce mutations at off-target sites.
Using X-ray crystallography and cryo-electron microscopy, we identify the structural basis of leader and repeat recognition by Cas1-Cas2 from the Escherichia coli type I system. Crystal structures of the proteins bound to substrates mimicking half-site and full-site products, supported by biochemical and bacterial genetic experiments, show that integration requires substantial distortion of the repeat DNA and that the repeat sequence is identified by its deformability. The EM structure of Cas1-Cas2 bound to an extended target and IHF, a host factor required for specificity, reveals that IHF bends the leader DNA by 180° to bring an upstream recognition sequence into contact with Cas1 for additional sequence-specific recognition. These structures and assays show that Cas1-Cas2 relies on structural constraints to restrict full-site integration to the CRISPR array.