
Ancestry Matters: Lack of Representation of Human Genetic Diversity in Genomic Databases
I was delighted to have been invited as one of the guest speakers for this series of talks, followed by an in-person discussion in Boston.
Listen here
Hosts Andrew Marderstein and Lucia Hindorff chat with Bárbara Bitarello about her work, "Polygenic Scores for Height in Admixed Populations", and what led to her career path. Check out the written interview by visiting the ASHG website.
Read the paper here:
“Polygenic Scores for Height in Admixed Populations”
Listen here
Polygenic risk scores (PRS) rely on genome-wide association studies (GWAS) to predict phenotype from genotype. However, prediction accuracy suffers when GWAS from one population are used to calculate PRS in a different population, which is a problem because the majority of GWAS are done on cohorts of European ancestry.
In this episode, Bárbara Bitarello helps us understand how PRS work and why they don’t transfer well across populations.
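To make the idea concrete, here is a minimal sketch of how a PRS is typically computed: a weighted sum of each person's allele counts, with GWAS effect sizes as the weights. The function name `polygenic_score`, the variable names, and the toy numbers are all hypothetical and chosen only for illustration; real pipelines also handle variant matching, allele flipping, and LD-based SNP selection, which are omitted here.

```python
import numpy as np

def polygenic_score(dosages, effect_sizes):
    """Return one score per individual: sum over variants of dosage * effect size.

    dosages      -- (n_individuals, n_variants) counts of the effect allele (0, 1, or 2)
    effect_sizes -- (n_variants,) per-allele effect estimates taken from a GWAS
    """
    dosages = np.asarray(dosages, dtype=float)
    effect_sizes = np.asarray(effect_sizes, dtype=float)
    if dosages.ndim != 2 or dosages.shape[1] != effect_sizes.shape[0]:
        raise ValueError("dosages must be (individuals x variants) and match effect_sizes")
    return dosages @ effect_sizes

# Toy data (invented for illustration): 3 individuals, 4 variants.
dosages = np.array([
    [0, 1, 2, 0],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
])
effect_sizes = np.array([0.5, -0.2, 0.1, 0.3])

scores = polygenic_score(dosages, effect_sizes)
print(scores)
```

Because the weights come from the GWAS cohort, any difference between that cohort and the target population (in effect sizes, allele frequencies, or LD) feeds directly into the score, which is the transferability problem discussed in the episode.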
Polygenic risk scores (PRS) summarize the results of GWAS into a single number that can predict a quantitative phenotype or disease risk. One barrier to the use of PRS in clinical practice is that the majority of GWAS come from cohorts of European ancestry, and predictive power is lower in non-European ancestry cohorts. There are many possible reasons for this decrease; here we show that differences in allele frequencies, LD patterns, and phenotypic variance across ancestries are unlikely to be driving this pattern. We focus on PRS for height in cohorts with admixed African and European ancestry, which allows us to test for ancestry-related differences in PRS prediction while controlling for environment. We first show that the predictive power of height PRS increases linearly with European ancestry (partial R² ranges from 0.02 to 0.12 for 0-100% European ancestry). We replicate this pattern with effect sizes re-estimated within sibling pairs, ruling out residual population structure. This pattern persists when PRS is computed using subsets of SNPs in regions of both high and low LD, and ancestry-related differences in effect size are not correlated with local recombination rate. This suggests that differences in LD are not a major driver of low transferability. Next, we show that frequency differences of associated variants between African and European ancestry backgrounds explain only up to 11% of the observed reduction in predictive power and that there is no association between ancestry and phenotypic variance, indicating that the reduction in PRS predictive power cannot be explained by causal variants that are specific to the African ancestry background. Finally, we see a modest improvement in prediction when using a multi-PRS approach that includes ancestry-specific effect sizes in the PRS. We conclude that the reduced predictive power in non-European ancestry populations is largely explained by differences in causal effect sizes across these ancestries.
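The abstract reports predictive power as partial R². As a rough sketch, one common definition is the fraction of the variance left unexplained by covariates alone that is then explained by adding the PRS: (SSE_reduced - SSE_full) / SSE_reduced. This is an assumption about the definition, not a reproduction of the paper's exact procedure, and the helper names (`_sse`, `partial_r2`) and the simulated data are hypothetical.

```python
import numpy as np

def _sse(X, y):
    """Sum of squared residuals from an ordinary least-squares fit with intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return float(resid @ resid)

def partial_r2(y, prs, covariates):
    """Share of covariate-adjusted residual variance in y explained by the PRS."""
    y = np.asarray(y, dtype=float)
    covariates = np.asarray(covariates, dtype=float).reshape(len(y), -1)
    prs = np.asarray(prs, dtype=float).reshape(len(y), 1)
    sse_reduced = _sse(covariates, y)                   # covariates only
    sse_full = _sse(np.hstack([covariates, prs]), y)    # covariates + PRS
    return (sse_reduced - sse_full) / sse_reduced

# Simulated data (invented for illustration, not from the study).
rng = np.random.default_rng(0)
n = 2000
age = rng.normal(size=n)
prs = rng.normal(size=n)
height = 0.5 * age + 0.3 * prs + rng.normal(size=n)

r2 = partial_r2(height, prs, age)
print(f"partial R^2 = {r2:.3f}")
```

In an analysis like the one described, a statistic of this kind would be computed separately within bins of European ancestry proportion, so that its trend across bins can be examined.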
Polygenic risk scores (PRS) can be used to summarize the results of genome-wide association studies (GWAS) into a single number representing the risk of disease. For some traits (for example, cardiovascular disease, breast cancer) PRS allows us to identify individuals with clinically actionable levels of risk in the tails of the PRS distribution. One barrier to the use of PRS in clinical practice is that the majority of GWAS come from cohorts of European ancestry, and predictive power is lower in non-European ancestry cohorts. There are many possible reasons for this decrease; here we investigate the performance of PRS in admixed cohorts to identify some of these reasons. We focus on the performance of PRS for height (a model polygenic trait) in cohorts with admixed African and European ancestry. Having multiple ancestry components in the same genome allows us to test for ancestry-related differences in PRS prediction while controlling for environmental differences. We first show that the predictive power of height PRS increases linearly with European ancestry (partial R² ranges from 0.015 to 0.15 for 0-100% European ancestry). This effect is unaltered when we re-estimate effect sizes using sibling pairs, ruling out residual population structure as an explanation. Second, we show that this pattern persists when PRS is computed using subsets of SNPs in regions of both high and low linkage disequilibrium (LD), indicating that differences in LD are not the only cause. Third, we show that frequency differences of associated variants between African and European ancestry backgrounds explain only up to 25% of the observed reduction in predictive power. Finally, we find that there is no association between ancestry and phenotypic variance, indicating that there is no relationship between ancestry and genetic variance, and that the reduction in PRS predictive power cannot be explained by causal variants that are specific to the African ancestry background.
In conclusion, no single factor we investigated can explain the difference in predictive power across ancestries, hinting that other factors – for example heterogeneity in effect size – or a combination of multiple factors is responsible for this pattern. This study further highlights the need for more diversity in GWAS, as well as a better understanding of the complexities of variant discovery and portability across cohorts and ancestries.