PMC Articles

Polygenic prediction of major depressive disorder and related traits in African ancestries UK Biobank participants

PMCID: PMC11649553

PMID: 39014000


Abstract

Genome-wide association studies (GWAS) over-represent European ancestries, neglecting all other ancestry groups and low-income nations. Consequently, polygenic risk scores (PRS) predict complex traits more accurately in Europeans than in African Ancestries groups. Very few studies have looked at the transferability of European-derived PRS for behavioural and mental health phenotypes to Africans. We assessed the comparative accuracy of depression PRS trained on European and African Ancestries GWAS for predicting major depressive disorder (MDD) and related traits in African ancestry participants from the UK Biobank. UK Biobank participants were selected based on principal component analysis clustering with an African genetic similarity reference population. MDD was assessed with the Composite International Diagnostic Interview (CIDI). PRS were computed with PRSice-2 using either European or African Ancestries GWAS summary statistics. PRS trained on European ancestry samples (246,363 cases) predicted case-control status in Africans of the UK Biobank with similar accuracy (R² = 2%, β = 0.32, empirical p-value = 0.002) to PRS trained on a much smaller sample of African Ancestries participants from 23andMe, Inc. (5045 cases, R² = 1.8%, β = 0.28, empirical p-value = 0.008). This suggests that prediction of MDD status from Africans to Africans had greater efficiency relative to discovery sample size than prediction of MDD from Europeans to Africans. Prediction of MDD status in African UK Biobank participants using GWAS findings of likely causal risk factors from European ancestries was non-significant. GWAS of MDD in European ancestries are inefficient for improving polygenic prediction in African samples; MDD studies in Africa are urgently needed.


Full Text

Depressive disorders are ranked as the third leading cause of disability, as measured by years lived with disability, with Major Depressive Disorder (MDD) being the most significant contributor to this burden. The World Health Organization estimates that more than 322 million individuals globally suffer from MDD, with at least 9% of these cases occurring in Africa [1, 2]. While lower rates of MDD have been reported in Africa compared to Europe and North America, recent studies suggest that MDD is under-reported in Africa and that most affected individuals go undiagnosed [3, 4].
MDD has a heritability of 30–40% [5], and better characterisation of its genetic architecture may provide both an improved mechanistic understanding and more accurate genetic prediction. So far, genome-wide association studies (GWAS) have identified over 243 variants associated with depression, focussing on participants of European ancestry [6]. Sirugo et al. showed that GWAS overrepresent European relative to other ancestry groups, with an approximately fivefold over-representation relative to their share of the global population [7–9]. The overrepresentation of Europeans in genetics research means that the potential benefits of these studies will disproportionately apply to people of European ancestry and deprive other ancestries and low-income countries of new treatments and diagnostics [10].
Due to the overrepresentation of Europeans in GWAS, polygenic risk scores (PRS) developed from these studies predict many complex traits more accurately in European than in African Ancestries samples [11, 12]. The difference in prediction may be due to differences in the phenotypes themselves, their genetic architectures or because of gene-by-environment interactions [13–15]. Very few studies have looked at the transferability of European-derived PRS for behavioural and mental health phenotypes to non-Europeans generally and Africans specifically. Consequently, the predictive accuracy of European-derived depression PRS in African samples remains uncertain. We looked at the transferability of MDD-PRS trained on European GWAS to African Ancestries participants from the UK Biobank within and across traits. Furthermore, we sought to compare the transferability of MDD-PRS trained in participants of African ancestries from 23andMe, Inc. (mainly from North America) with the transferability of MDD-PRS trained on Europeans to the African-ancestry participants from the UK Biobank.
The study focused on African participants in the UK Biobank who share genetic similarity with the 1000 Genomes Project’s African reference samples. The UK Biobank is a prospective cohort study of individuals of diverse ethnic backgrounds from across the United Kingdom [16]. We used principal component analysis (PCA) to identify participants of African ancestral background within the UK Biobank. Participants were initially selected based on self-report: we selected individuals who self-reported as being Black or Black British (Caribbean, African, Any other Black background), White and Black Caribbean or Black African, as well as participants whose self-identity was not specifically categorised (responses “Other ethnic group”, “Any other mixed background”, “Do not know”, or “Prefer not to answer”). Using the genotypes provided by UK Biobank, we derived ancestry-informative genetic principal components using the weights from the 1000 Genomes reference dataset to cluster the participants into their genetic similarity groupings. UK Biobank participants who clustered closely with the 1000 Genomes African (AFR) reference group were then selected for further analysis.
An online Mental Health Questionnaire that included a depression assessment, entitled ‘The thoughts and feelings questionnaire’, was sent to UK Biobank participants by email [17]. The questionnaire was offered to the 317,785 participants, out of the total 502,616 UK Biobank participants, who had agreed to email contact; 157,396 completed the online questionnaire by June 2018, of whom 1090 were of African ancestry. A depression phenotype was generated based on the CIDI-SF (Composite International Diagnostic Interview Short Form) [18]. Cases were defined as those participants who had at least one core symptom of depression (persistent sadness or loss of interest) for most of the day or all of the day. Symptoms had to be present for a period of over two weeks, together with another four non-core depressive symptoms that represent a change from usual occurring over the same timescale, with some or a lot of impairment. Cases that self-reported another mood disorder were excluded. Controls were defined as participants who did not meet symptom criteria for MDD [17, 19].
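The case and control definition above can be written as a small decision rule. The sketch below is illustrative only: the field names are hypothetical and do not correspond to UK Biobank field identifiers, and it reads "another four non-core depressive symptoms" as at least four.

```python
from dataclasses import dataclass


@dataclass
class Responses:
    # Hypothetical field names for illustration; not UK Biobank field IDs.
    core_symptom_most_of_day: bool  # persistent sadness or loss of interest, most or all of the day
    lasted_over_two_weeks: bool     # symptoms present for a period of over two weeks
    n_noncore_symptoms: int         # non-core symptoms representing a change from usual
    impairment: str                 # "none", "some", or "a lot"
    other_mood_disorder: bool       # self-reported another mood disorder


def classify(r: Responses) -> str:
    """Return "case", "control", or "excluded" following the text's definition."""
    meets_symptom_criteria = (
        r.core_symptom_most_of_day
        and r.lasted_over_two_weeks
        and r.n_noncore_symptoms >= 4  # assumption: "another four" means at least four
        and r.impairment in ("some", "a lot")
    )
    if not meets_symptom_criteria:
        return "control"
    if r.other_mood_disorder:
        return "excluded"  # cases reporting another mood disorder are dropped
    return "case"


print(classify(Responses(True, True, 5, "a lot", False)))
```

Returning a three-way label rather than a boolean keeps excluded participants distinct from controls, which matters when counting the final sample.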
Polygenic risk scores were computed using PRSice-2 software [20]. PRSice-2 uses the clumping and thresholding (C + T) method, in which clumping retains only SNPs that are weakly correlated with one another [20]. After clumping, SNPs with a P value larger than a specified level of significance were removed. PRS were then calculated as the sum of SNP allele effect sizes multiplied by the number of risk alleles. Both the base and target data sets underwent quality control (QC): ambiguous and duplicate SNPs were removed, as were SNPs with a minor allele frequency (MAF) of less than 1% and SNPs with a genotype missingness greater than 2%. We report standardised effect sizes. Additionally, we provide empirical P values after specifying 10,000 permutations in PRSice-2.
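The thresholding and scoring steps described above can be illustrated with a short sketch. This is not PRSice-2 code: the dictionary keys, function names and toy values are hypothetical, the SNP list is assumed to have been clumped and de-duplicated already, and strand-ambiguous SNPs are taken to be A/T and C/G pairs.

```python
from statistics import mean, pstdev

# Strand-ambiguous allele pairs (A/T and C/G).
AMBIGUOUS = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}


def passes_qc(snp, maf_min=0.01, missing_max=0.02):
    """Keep unambiguous SNPs with MAF >= 1% and genotype missingness <= 2%."""
    return (
        (snp["a1"], snp["a2"]) not in AMBIGUOUS
        and snp["maf"] >= maf_min
        and snp["missing"] <= missing_max
    )


def polygenic_score(dosages, snps, p_threshold):
    """Sum of effect size times risk-allele count over SNPs with P <= p_threshold."""
    kept = [s for s in snps if passes_qc(s) and s["p"] <= p_threshold]
    return sum(s["beta"] * dosages[s["id"]] for s in kept)


def standardise(scores):
    """Rescale scores to mean 0 and standard deviation 1."""
    mu, sd = mean(scores), pstdev(scores)
    return [(x - mu) / sd for x in scores]


# Toy summary statistics (made-up values, for illustration only).
toy_snps = [
    {"id": "rs1", "a1": "A", "a2": "G", "beta": 0.10, "p": 0.001, "maf": 0.20, "missing": 0.00},
    {"id": "rs2", "a1": "A", "a2": "T", "beta": 0.50, "p": 0.001, "maf": 0.30, "missing": 0.00},  # ambiguous
    {"id": "rs3", "a1": "C", "a2": "T", "beta": -0.05, "p": 0.15, "maf": 0.10, "missing": 0.01},
    {"id": "rs4", "a1": "G", "a2": "T", "beta": 0.30, "p": 0.60, "maf": 0.25, "missing": 0.00},
]
toy_dosages = {"rs1": 2, "rs2": 1, "rs3": 1, "rs4": 2}  # risk-allele counts for one person

print(polygenic_score(toy_dosages, toy_snps, p_threshold=0.2))
```

At a threshold of 0.2 only rs1 and rs3 contribute: rs2 is dropped as strand-ambiguous and rs4 exceeds the threshold. Repeating the calculation over a grid of thresholds and keeping the most predictive one gives the "best-fit" threshold reported in the results.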
To compute polygenic risk scores in African-clustered participants of the UK Biobank, we used GWAS summary statistics of depression from global European studies (246,363 cases and 561,190 controls), a predominantly African American study from 23andMe (5045 cases and 102,098 controls), a secondary dataset comprising summary statistics from a meta-analysis of 12 African cohorts (36,313 cases and 160,775 controls), and data from several traits known to be associated with depression from European-clustered studies, in addition to height, which we used as a negative control. All summary statistics used in this study are shown in Table 1. For each set of summary statistics, SNP-based heritability was calculated using linkage disequilibrium score regression as implemented in the LDSC software package [21, 22]. To calculate heritability with LDSC, we employed LD Scores derived from UK Biobank data. Specifically, we used European LD Scores for European datasets and African LD Scores for African datasets, both sourced from the UK Biobank.
We utilised principal component analysis (PCA) to project the genetic data of UK Biobank participants onto the PCA space defined by the reference 1000 Genomes dataset. This approach enabled the identification of individuals from the UK Biobank whose genetic profiles closely resemble those of the African ancestry samples within the 1000 Genomes dataset (Fig. 1).
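Projection onto a reference PCA space can be sketched as follows. The inputs (per-SNP reference means, standard deviations and loadings) and the selection rule (Euclidean distance to the centroid of the AFR reference samples, with an arbitrary cut-off) are assumptions made for illustration; the text does not state the exact clustering criterion that was used.

```python
import math


def project(genotypes, means, sds, loadings):
    """Project one sample onto reference principal components.

    genotypes: allele counts per SNP for the sample
    means, sds: per-SNP mean and standard deviation from the reference panel
    loadings:  per-SNP list of weights, one weight per component
    """
    n_pcs = len(loadings[0])
    z = [(g - m) / s for g, m, s in zip(genotypes, means, sds)]
    return [sum(zj * w[k] for zj, w in zip(z, loadings)) for k in range(n_pcs)]


def centroid(points):
    """Component-wise mean of a list of PC coordinate vectors."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]


def clusters_with(sample_pcs, reference_pcs, max_dist):
    """True if the sample lies within max_dist of the reference centroid."""
    return math.dist(sample_pcs, centroid(reference_pcs)) <= max_dist


# Toy example: two SNPs, two components (made-up numbers).
means, sds = [1.0, 1.0], [1.0, 1.0]
loadings = [[1.0, 0.0], [0.0, 1.0]]
afr_reference_pcs = [[1.0, -1.0], [1.2, -0.8]]

sample_a = project([2, 0], means, sds, loadings)
sample_b = project([0, 2], means, sds, loadings)
print(clusters_with(sample_a, afr_reference_pcs, max_dist=0.5))
print(clusters_with(sample_b, afr_reference_pcs, max_dist=0.5))
```

Using the reference panel's own means, standard deviations and loadings is what places target samples in the same coordinate system as the reference; recomputing the PCA on the target data alone would not.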
Depression GWAS results of African participants from 23andMe (5045 cases and 102,098 controls) were used to predict MDD status in African participants of the UK Biobank (see Fig. 2). The summary statistics from 23andMe African ancestry participants significantly predicted MDD status in UK Biobank African participants across all P-value thresholds, with the most predictive P-value threshold being 0.2, explaining 1.8% of variation in MDD liability. The prediction was associated with a beta coefficient of 0.28 (SE = 0.08, empirical P-value = 0.008). This prediction of MDD from an African sample to an African sample is comparable in accuracy to that of PRS trained on European ancestry samples of over 800,000 individuals (246,363 cases and 561,190 controls).
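The contrast in discovery sample size can be made concrete with a little arithmetic. The sketch below uses the common effective-sample-size formula for case-control studies, N_eff = 4 / (1/N_cases + 1/N_controls). Both this formula and the "variance explained per 10,000 effective samples" ratio are illustrative choices introduced here, not quantities reported in the study; the case and control counts and R² values are those given in the text.

```python
def effective_n(cases, controls):
    """Effective sample size of a case-control study: 4 / (1/cases + 1/controls)."""
    return 4.0 / (1.0 / cases + 1.0 / controls)


# Counts and R-squared values as reported in the text.
studies = {
    "European GWAS": {"cases": 246_363, "controls": 561_190, "r2": 0.020},
    "23andMe African ancestry GWAS": {"cases": 5_045, "controls": 102_098, "r2": 0.018},
}

for name, s in studies.items():
    n_eff = effective_n(s["cases"], s["controls"])
    # Crude efficiency measure (an assumption of this sketch, not a standard statistic).
    per_10k = s["r2"] / n_eff * 10_000
    print(f"{name}: N_eff ~ {n_eff:,.0f}; R2 per 10,000 effective samples = {per_10k:.5f}")
```

The ratio is only a rough guide, since variance explained does not scale linearly with sample size, but it shows why similar R² values from such different discovery samples imply a large gap in efficiency.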
We also used a secondary set of summary statistics from a meta-analysis of 12 African cohorts, with 36,313 depression cases and 160,775 controls, to predict MDD status in African participants of the UK Biobank. Of the cases in this meta-analysed dataset, 99.6% come from multiple African American studies and 0.4% come from South Africa. In contrast to the 23andMe result, PRS trained on this meta-analysed, predominantly African American dataset did not significantly predict MDD status in Africans of the UK Biobank, as shown in Fig. 3.
European-based GWAS results for depression, BMI, Neuroticism, education attainment and height were used to predict within the same trait in UK Biobank African Ancestries participants. PRS trained on European GWAS results significantly predicted MDD, BMI, education attainment, and height within trait in African participants of the UK Biobank as illustrated in Fig. 4.
Specifically, the European-based depression PRS explained 2% of the variation in MDD risk among individuals of African Ancestries in the UK Biobank, with a beta coefficient of 0.32 (SE = 0.09, empirical P-value = 0.002). The education attainment PRS explained 0.7% of the variation in education attainment within the same cohort, with a beta coefficient of 0.079 and a P-value of 0.0003 (Fig. 4). However, it is noteworthy that the European-based Neuroticism PRS did not significantly predict Neuroticism in African Ancestries participants. The PRS for height, a highly heritable trait used for comparison purposes, explained 3% of the variation in height among UK Biobank Africans.
Cross ancestry cross trait polygenic prediction of MDD in Africans of the UK Biobank using PRS estimated from European based GWAS summary statistics for traits known to be associated with MDD (namely: Bipolar Disorder, BMI, Schizophrenia, Neuroticism, and education attainment) did not show any significant association (see Fig. 5). Height summary statistics were used for comparison purposes.
In contrast to the findings made using 23andMe summary statistics, polygenic scores derived from a GWAS meta-analysis of several African ancestry studies within and outside of Africa showed limited predictive ability for Major Depressive Disorder (MDD) in African Ancestries UK Biobank participants. The dataset combined 36,313 MDD cases and 160,775 controls, predominantly comprising African Americans, with a small representation (139 cases and 346 controls) from continental African populations (Drakenstein Child Health Study). Although the larger dataset was expected to outperform the 23andMe dataset, various factors may have contributed to this underperformance. Firstly, while 23andMe used a single definition of MDD and a single genotyping quality-control pipeline, MDD phenotype definitions and methods varied across the cohorts included in the GWAS meta-analysis. Some studies employed stringent criteria while others used broader definitions. The inclusion of individuals with varying definitions of African Ancestries may also have increased genetic heterogeneity. The meta-analysed data primarily featured African Americans, who exhibit varying degrees of genetic admixture with other ancestral backgrounds, possibly influencing the accuracy of PRS predictions in UK Biobank. A 2023 study by Ding et al. showed that, for highly polygenic traits, PRS predictive accuracy tends to diminish with increasing genetic distance between populations [23].
Several European ancestry studies have shown that various traits have a shared genetic liability with MDD, some of which may be causally associated but little is known about the shared genetic liability of MDD with other traits across ancestries [24]. We looked at cross ancestry cross trait prediction of MDD using height, BMI, bipolar disorder, schizophrenia, neuroticism, and education attainment in people of African Ancestries using European based GWAS results. Height was used as a highly heritable control trait with no known causal relationship with MDD. While our study did not yield successful predictions of MDD status in Africans of the UK Biobank using European GWAS results of various traits, it is worth noting that previous investigations conducted within European populations have demonstrated a shared genetic liability between MDD and traits such as Bipolar Disorder, BMI, and neuroticism [25–27]. To advance our understanding of shared genetic liability in African populations, future research endeavours could explore this aspect by training PRS using GWAS data derived specifically from African cohorts. This approach has the potential to uncover novel insights into the shared genetic components between MDD and other traits within the context of African ancestral backgrounds.


Metadata

"{\"issue-copyright-statement\": \"\\u00a9 Springer Nature Limited 2025\"}"