Publications

Inferring Balancing Selection From Genome-Scale Data

Genome Biology and Evolution (2023)

Authors: Bárbara D Bitarello, Débora Y C Brandt, Diogo Meyer, Aida M Andrés

The identification of genomic regions and genes that have evolved under natural selection is a fundamental objective in the field of evolutionary genetics. While various approaches have been established for the detection of targets of positive selection, methods for identifying targets of balancing selection, a form of natural selection that preserves genetic and phenotypic diversity within populations, have yet to be fully developed. Despite this, balancing selection is increasingly acknowledged as a significant driver of diversity within populations, and the identification of its signatures in genomes is essential for understanding its role in evolution. In recent years, a plethora of sophisticated methods has been developed for the detection of patterns of linked variation produced by balancing selection, such as high levels of polymorphism, altered allele-frequency distributions, and polymorphism sharing across divergent populations. In this review, we provide a comprehensive overview of classical and contemporary methods, offer guidance on the choice of appropriate methods, and discuss the importance of avoiding artifacts and of considering alternative evolutionary processes. The increasing availability of genome-scale datasets holds the potential to assist in the identification of new targets and the quantification of the prevalence of balancing selection, thus enhancing our understanding of its role in natural populations.

Genome-wide analysis identifies genetic effects on reproductive success and ongoing natural selection at the FADS locus

Nature Human Behaviour (2023)

Authors: Iain Mathieson, Felix R Day, Nicola Barban, Felix C Tropf, David Brazel, eQTLgen Consortium, BIOS Consortium, Ahmad Vaez, Natalie van Zuydam, Bárbara D Bitarello, Eugene J Gardner, Evelina T Akimova, Ajuna Azad, Sven Bergmann, Lawrence F Bielak, Dorret I Boomsma, Kristina Bosak, Marco Brumat, Julie E Buring, David Cesarini, Daniel I Chasman, Jorge E Chavarro, Massimiliano Cocca, Maria Pina Concas, FinnGen Study, Lifelines Cohort Study, (...) John R B Perry

Identifying genetic determinants of reproductive success may highlight mechanisms underlying fertility and identify alleles under present-day selection. Using data in 785,604 individuals of European ancestry, we identified 43 genomic loci associated with either number of children ever born (NEB) or childlessness. These loci span diverse aspects of reproductive biology, including puberty timing, age at first birth, sex hormone regulation, endometriosis and age at menopause. Missense variants in ARHGAP27 were associated with higher NEB but shorter reproductive lifespan, suggesting a trade-off at this locus between reproductive ageing and intensity. Other genes implicated by coding variants include PIK3IP1, ZFP82 and LRP4, and our results suggest a new role for the melanocortin 1 receptor (MC1R) in reproductive biology. As NEB is one component of evolutionary fitness, our identified associations indicate loci under present-day natural selection. Integration with data from historical selection scans highlighted an allele in the FADS1/2 gene locus that has been under selection for thousands of years and remains so today. Collectively, our findings demonstrate that a broad range of biological mechanisms contribute to reproductive success.

Predicting skeletal stature using ancient DNA

American Journal of Physical Anthropology (2021)

Authors: Samantha L Cox, Hannah M. Moots, Jay T. Stock, Andrej Shbat, Bárbara D. Bitarello, Nicole Nicklisch, Kurt W. Alt, Wolfgang Haak, Eva Rosenstock, Christopher B. Ruff & Iain Mathieson

Ancient DNA provides an opportunity to separate the genetic and environmental bases of complex traits by allowing direct estimation of genetic values in ancient individuals. Here, we test whether genetic scores for height in ancient individuals are predictive of their actual height, as inferred from skeletal remains. We estimate the contributions of genetic and environmental variables to observed phenotypic variation as a first step towards quantifying individual sources of morphological variation.

Polygenic scores for height in admixed populations

G3: Genes, Genomes, Genetics (2020)

Authors: Bárbara D. Bitarello, Iain Mathieson

Polygenic risk scores (PRS) use the results of genome-wide association studies (GWAS) to predict quantitative phenotypes or disease risk at an individual level. This provides a potential route to the use of genetic data in personalized medical care. However, a major barrier to the use of PRS is that the majority of GWAS come from cohorts of European ancestry. The predictive power of PRS constructed from these studies is substantially lower in non-European ancestry cohorts, although the reasons for this are unclear. To address this question, we investigate the performance of PRS for height in cohorts with admixed African and European ancestry, allowing us to evaluate ancestry-related differences in PRS predictive accuracy while controlling for environment and cohort differences. We first show that the predictive accuracy of height PRS increases linearly with European ancestry and is largely explained by European ancestry segments of the admixed genomes. We show that differences in allele frequencies, recombination rate, and marginal effect sizes across ancestries all contribute to the decrease in predictive power, but none of these effects explain the decrease on its own. Finally, we demonstrate that prediction for admixed individuals can be improved by using a linear combination of PRS that includes ancestry-specific effect sizes, although this approach is at present limited by the small size of non-European ancestry discovery cohorts.
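
As a rough illustration of the linear combination mentioned in this abstract, the sketch below computes a polygenic score as a weighted sum of allele dosages and then mixes two scores built from different effect-size estimates. The function names, the single mixing weight `alpha`, and the toy numbers are assumptions made for illustration; the abstract does not specify how the paper weights or estimates these terms.

```python
def polygenic_score(dosages, effect_sizes):
    """Sum of per-variant effect sizes weighted by allele dosage (0, 1 or 2)."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(d * b for d, b in zip(dosages, effect_sizes))


def combined_score(dosages, eur_effects, afr_effects, alpha):
    """Linear combination of two scores built from ancestry-specific effects.

    alpha is a hypothetical mixing weight in [0, 1]; how to choose it is
    not described in the abstract.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    score_afr = polygenic_score(dosages, afr_effects)
    score_eur = polygenic_score(dosages, eur_effects)
    return alpha * score_afr + (1.0 - alpha) * score_eur


# Toy example: three variants, made-up effect sizes.
example = combined_score([0, 1, 2], [0.1, 0.2, 0.3], [0.2, 0.1, 0.0], alpha=0.5)
```

With `alpha=0` the combined score reduces to the score built from the European-ancestry effect sizes alone, which is the baseline the abstract describes as losing accuracy in admixed cohorts.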

Signatures of long-term balancing selection in human genomes

Genome Biology and Evolution (2018)

Authors: Bárbara Domingues Bitarello, Cesare de Filippo, João Carlos Teixeira, Philip Kleinert, Joshua M Schmidt, Diogo Meyer, Aida M Andrés

Balancing selection maintains advantageous diversity in populations through various mechanisms. While extensively explored from a theoretical perspective, an empirical understanding of its prevalence and targets lags behind our knowledge of positive selection. Here we describe the Non-Central Deviation (NCD), a simple yet powerful statistic to detect long-term balancing selection (LTBS) which quantifies how close frequencies are to expectations under LTBS, and provides the basis for a neutrality test. NCD can be applied to single loci or genomic data, to populations with or without known demographic history, and can be implemented considering only polymorphisms (NCD1) or also considering fixed differences (NCD2). Both statistics have very high power to detect LTBS in humans under different frequencies of the balanced allele(s), with NCD2 having the highest power. Applied to genome-wide data from African and European human populations, NCD2 shows that, albeit not prevalent, LTBS affects a sizable portion of the genome: about 0.6% of analyzed genomic windows and 0.8% of analyzed positions. These windows overlap about 8% of the protein-coding genes, which interestingly have a larger number of transcripts than expected by chance. Significant windows contain 1.6% of the SNPs in the genome, which disproportionately overlap sites within exons and sites that alter protein sequence, but not putatively regulatory sites. Our catalog of candidates includes known targets of LTBS, but a majority of them are novel. As expected, immune-related genes are among those with the strongest signatures, although most candidates are involved in other biological functions, suggesting that LTBS potentially influences diverse human phenotypes.
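
To make the idea of "how close frequencies are to expectations under LTBS" concrete, here is a minimal sketch of an NCD-style calculation for a single genomic window. It assumes the statistic is the root-mean-square deviation of minor allele frequencies from a target frequency, with fixed differences entering NCD2 as sites of frequency zero; the abstract itself does not give the formula, and the function name and parameters are hypothetical.

```python
import math


def ncd(minor_allele_freqs, target_freq=0.5, n_fixed_differences=0):
    """Root-mean-square deviation of site frequencies from a target frequency.

    minor_allele_freqs: minor allele frequencies of the polymorphic sites
        in one window (values in (0, 0.5]).
    target_freq: assumed equilibrium frequency of the balanced allele.
    n_fixed_differences: 0 gives an NCD1-style value; a positive count adds
        that many sites with frequency 0, giving an NCD2-style value.
    """
    freqs = list(minor_allele_freqs) + [0.0] * n_fixed_differences
    if not freqs:
        raise ValueError("window has no informative sites")
    squared_dev = sum((p - target_freq) ** 2 for p in freqs)
    return math.sqrt(squared_dev / len(freqs))


# Toy window: two SNPs, then the same window with one fixed difference added.
ncd1_value = ncd([0.1, 0.3], target_freq=0.5)
ncd2_value = ncd([0.1, 0.3], target_freq=0.5, n_fixed_differences=1)
```

Under this reading, smaller values mean the frequencies in a window sit closer to the target, so low values are the ones of interest when screening for balancing selection.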

A genomic perspective on HLA evolution

Immunogenetics (2017)

Authors: D. Meyer, V.R. C. Aguiar, B.D. Bitarello, D.Y. C. Brandt, K. Nunes

Several decades of research have convincingly shown that classical human leukocyte antigen (HLA) loci bear signatures of natural selection. Despite this conclusion, many questions remain regarding the type of selective regime acting on these loci, the time frame at which selection acts, and the functional connections between genetic variability and natural selection. In this review, we argue that genomic datasets, in particular those generated by next-generation sequencing (NGS) at the population scale, are transforming our understanding of HLA evolution. We show that genomewide data can be used to perform robust and powerful tests for selection, capable of identifying both positive and balancing selection at HLA genes. Importantly, these tests have shown that natural selection can be identified at both recent and ancient timescales. We discuss how findings from genomewide association studies impact the evolutionary study of HLA genes, and how genomic data can be used to survey adaptive change involving interaction at multiple loci. We discuss the methodological developments which are necessary to correctly interpret genomic analyses involving the HLA region. These developments include adapting the NGS analysis framework so as to deal with the highly polymorphic HLA data, as well as developing tools and theory to search for signatures of selection, quantify differentiation, and measure admixture within the HLA region. Finally, we show that high throughput analysis of molecular phenotypes for HLA genes—namely transcription levels—is now a feasible approach and can add another dimension to the study of genetic variation.
