Signatures of long-term balancing selection in human genomes
January 1, 2018
Abstract
Balancing selection maintains advantageous diversity in populations through various mechanisms. Although it has been extensively explored from a theoretical perspective, empirical understanding of its prevalence and targets lags behind our knowledge of positive selection. Here we describe the Non-Central Deviation (NCD), a simple yet powerful statistic for detecting long-term balancing selection (LTBS). NCD quantifies how close allele frequencies are to expectations under LTBS and provides the basis for a neutrality test. NCD can be applied to single loci or genomic data, to populations with or without known demographic history, and can be implemented considering only polymorphisms (NCD1) or also considering fixed differences (NCD2). Both statistics have very high power to detect LTBS in humans under different frequencies of the balanced allele(s), with NCD2 having the highest power. Applied to genome-wide data from African and European human populations, NCD2 shows that, albeit not prevalent, LTBS affects a sizable portion of the genome: about 0.6% of analyzed genomic windows and 0.8% of analyzed positions. These windows overlap about 8% of the protein-coding genes, and these genes have a larger number of transcripts than expected by chance. Significant windows contain 1.6% of the SNPs in the genome; these SNPs disproportionately overlap exonic sites and sites that alter protein sequence, but not putatively regulatory sites. Our catalog of candidates includes known targets of LTBS, but most are novel. As expected, immune-related genes are among those with the strongest signatures, although most candidates are involved in other biological functions, suggesting that LTBS potentially influences diverse human phenotypes.
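To make the idea of a deviation from an expected frequency concrete, the following is a minimal sketch of how such a statistic might be computed for one genomic window. The abstract does not give the formula, so the root-mean-square form used here is an assumption, as are the function name `ncd`, the parameter names, and the convention of counting each fixed difference as a site with minor allele frequency 0.

```python
import math


def ncd(minor_allele_freqs, target_freq=0.5, n_fixed_differences=0):
    """Sketch of an NCD-like statistic for a single window.

    ASSUMPTION: the statistic is the root-mean-square deviation of the
    minor allele frequencies from a target frequency. The abstract does
    not state the formula; this form is illustrative only.

    minor_allele_freqs  -- minor allele frequency of each SNP in the window
    target_freq         -- frequency expected under balancing selection
    n_fixed_differences -- 0 for an NCD1-like value (polymorphisms only);
                           a positive count for an NCD2-like value, where
                           each fixed difference is treated as a site with
                           minor allele frequency 0 (also an assumption)
    """
    freqs = list(minor_allele_freqs) + [0.0] * n_fixed_differences
    if not freqs:
        raise ValueError("window has no informative sites")
    squared_deviations = ((p - target_freq) ** 2 for p in freqs)
    return math.sqrt(sum(squared_deviations) / len(freqs))


# Hypothetical windows, for illustration only (not data from the paper).
balanced_like = [0.45, 0.50, 0.48, 0.42]
neutral_like = [0.02, 0.05, 0.10, 0.01]

ncd1_balanced = ncd(balanced_like)
ncd1_neutral = ncd(neutral_like)
ncd2_balanced = ncd(balanced_like, n_fixed_differences=1)
```

Under this sketch, lower values mean that frequencies sit closer to the target, so `ncd1_balanced` is smaller than `ncd1_neutral`. Adding fixed differences raises the value, which is one way a window with many polymorphisms and few fixed differences could stand out in an NCD2-like comparison.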