- Veeramah, Krishna R;
- Connell, Bruce A;
- Pour, Naser;
- Powell, Adam;
- Plaster, Christopher A;
- Zeitlyn, David;
- Mendell, Nancy R;
- Weale, Michael E;
- Bradman, Neil;
- Thomas, Mark G
Abstract

Background
The Cross River region in Nigeria is linguistically extremely diverse, with over 60 distinct languages still spoken today. It is also a region of great historical importance, being (a) adjacent to the likely homeland from which Bantu-speaking people migrated across most of sub-Saharan Africa 3000-5000 years ago and (b) the location of Calabar, one of the largest centres during the Atlantic slave trade. Over 1000 DNA samples from 24 clans representing speakers of the six most prominent languages in the region were collected and typed for Y-chromosome markers (SNPs and microsatellites) and mtDNA markers (Hypervariable Segment 1) in order to examine whether there has been substantial gene flow between groups speaking different languages in the region. In addition, the Cross River region was analysed at a larger geographical scale by comparison with bordering Igbo-speaking groups, neighbouring Cameroon populations and more distant Ghanaian communities.

Results
The Cross River region was shown to be extremely homogeneous for both Y-chromosome and mtDNA markers, with language spoken having no noticeable effect on the genetic structure of the region. This is consistent with estimates of inter-language gene flow of 10% per generation based on sociological data. However, the groups in the region could clearly be differentiated from others in Cameroon and Ghana (and, to a lesser extent, from Igbo populations). Significant correlations between genetic distance and both geographic and linguistic distance were observed at this larger scale.

Conclusions
Previous studies have found significant correlations between genetic variation and language in Africa over large geographic distances, often across language families. However, the broad sampling strategies of these datasets have limited their utility for understanding the relationship within language families. This is the first study to show that, at very fine geographic and linguistic scales, language differences can be maintained in the presence of substantial gene flow over an extended period of time. It also demonstrates the value of dense sampling strategies and of having DNA of known and detailed provenance, a practice that is generally rare when investigating sub-Saharan African demographic processes using genetic data.