Large-scale sequencing of the CD33-related Siglec gene cluster in five mammalian species reveals rapid evolution by multiple mechanisms
Siglecs are a recently discovered family of animal lectins that belong to the Ig superfamily and recognize sialic acids (Sias). CD33-related Siglecs (CD33rSiglecs) are a subgroup with as-yet-unknown functions, characterized by sequence homology, expression on innate immune cells, conserved cytosolic tyrosine-based signaling motifs, and a clustered localization of their genes. To better understand the biology and evolution of CD33rSiglecs, we sequenced and compared the CD33rSiglec gene cluster from multiple mammalian species. Within the sequenced region, the segments containing CD33rSiglec genes showed a lower degree of sequence conservation. In contrast to the adjacent conserved kallikrein-like genes, the CD33rSiglec genes showed extensive species differences, including expansions of gene subsets; gene deletions, including one human-specific loss of a novel functional primate Siglec (Siglec-13); exon shuffling, generating hybrid genes; accelerated accumulation of nonsynonymous substitutions in the Sia-recognition domain; and multiple instances of mutations of an arginine residue essential for Sia recognition in otherwise intact Siglecs. Nonsynonymous differences between human and chimpanzee orthologs showed uneven distribution between the two β sheets of the Sia-recognition domain, suggesting biased mutation accumulation. These data indicate that CD33rSiglec genes are undergoing rapid evolution via multiple genetic mechanisms, possibly due to an evolutionary "arms race" between hosts and pathogens involving Sia recognition. These studies, which reflect one of the most complete comparative sequence analyses of a rapidly evolving gene cluster, provide a clearer picture of the ortholog status of CD33rSiglecs among primates and rodents and also facilitate rational recommendations regarding their nomenclature.