
Integrated mitogenome and Y chromosome analysis untangles the complex origin of African pigs

PMCID: PMC12775879


Abstract

Summary

The genetic history of African indigenous pigs remains poorly documented due to scarce archaeological and genomic data. Here, we analyzed 473 mitogenomes and 202 Y chromosome sequences from indigenous pigs in Africa, alongside 901 published mitogenomes and 715 Y chromosome sequences from Eurasian pigs and wild boars. Our results reveal that African pigs predominantly descend from European (haplogroup E, 44.8%) and East Asian (haplogroup D, 53.3%) lineages. Interestingly, there was a novel detection of Asian wild boar haplogroup A∗ (1.9%) in Tanzania. This pattern is congruent with that of Y chromosome analysis. Further maternal analyses confirm a genetic link between western African and Iberian pigs dating to about 4.5 ka, and dispersal into eastern Africa coinciding with the Bantu expansion around 2 ka. Our findings demonstrate complex human-mediated dispersal routes, highlighting the role of Bantu societies in shaping the genetic architecture of African indigenous pigs.

Highlights

• African indigenous pigs show signatures of both European and Asian pigs
• Iberian-type pigs in Africa show a spatial correlation with the Bantu dispersal
• Rare mitochondrial DNA sub-haplogroup A1∗ is identified in pigs from Tanzania

Genomics; evolutionary ecology; evolutionary history


Full Text

Historically, indigenous pigs (Sus scrofa domesticus) were associated with the human populations of the Berber and ancient Egyptians (from Morocco to Egypt), the Sennar (further south along the Nile River up to Sudan), and populations from western-central Africa in Senegambia (Senegal and Gambia), along a west-east band ranging from Nigeria to Sudan (“West African Extension”), and along a north-south band ranging from southern Cameroon to Angola (“Angola Extension”). Currently, for cultural reasons, African indigenous pigs are popular with smallholder traditional farmers in remote regions of sub-Saharan Africa and are kept mainly for subsistence. According to the FAO’s Domestic Animal Diversity Information System, Africa is home to 49 indigenous breeds, with 5% considered endangered and the status of 54% remaining uncertain (http://dad.fao.org). Unlike exotic breeds, African indigenous pigs are tolerant of feed supply irregularities, fibrous and tannin-rich diets, pathogens, and tropical eco-climatic conditions. They are also highly appreciated by consumers for their marbled meat. Despite the cultural, economic, and historical importance of African indigenous pigs, their origins remain enigmatic. While Sus scrofa is native to areas of northern Africa, there is no archaeological or genetic evidence that these populations gave rise to indigenous African pig breeds. Additionally, the considerable geographic distance (more than 5000 km) between different breeds (such as that between the eastern and western portions of the continent, hereafter referred to as western African indigenous pigs and eastern African indigenous pigs) would likely have complicated a single geographic origin scenario for these breeds.
The origins of pig management and husbandry can be traced to the 10th millennium BCE in the Near East and eastern Asia. Archaeological findings in Egypt revealed pig remains dated to the end of the fifth millennium BCE. It has been speculated that, during this period, pigs spread from the Near East to the Nile Valley basin. This is partially supported by the finding of Near Eastern mtDNA haplotypes in pig specimens from Egypt’s museum. Ethnographic data from Sennar in Sudan and the Sudan-Ethiopia border indicate that pig farming might have been established in the Nile Valley as early as the Middle Ages, or earlier. This hypothesis is, however, not supported by archaeological evidence, as early medieval sites like Soba in central Sudan have not yielded any pig remains. In western-central Africa, pig bones have been discovered at Nkile, Democratic Republic of the Congo, dated to the ninth century. However, it is still hypothesized that western African indigenous pigs, also known as dwarf pigs (Bakosi, Ashanti, and Elede), originate from Iberian pigs following the introduction of the latter to Africa by Portuguese sailors between the 15th and 19th centuries. This hypothesis has been supported to an extent by the fact that regions in Africa colonized by the Portuguese adopted Lusophone terminology. For instance, the Portuguese word for pig, “porco,” has been integrated into numerous western African dialects. Linguistic studies have also reported the use of native pig names, suggesting an earlier, possibly pre-Portuguese, establishment of pigs in Senegambia and in the West African and Angola extensions.
Over the last 20 years, the two uniparentally inherited genetic marker systems, mitochondrial DNA and the Y chromosome, have been widely used to resolve genetic origins, prehistoric range expansions, and demographic processes in humans and in domestic and wild animals. Their non-recombining nature makes it possible to infer phylogeographic signatures, as reported for various livestock species. For instance, Wu et al. categorized European and Asian pig mtDNA into three major clades: E∗, representing Europe; A∗, Asian wild boar; and D∗, encompassing both eastern Asian wild boars and domestic pigs. These haplogroups were supported by the development of DomeTree (http://dometree.kiz.ac.cn) and MitoToolPy. Similarly, Ramirez et al. identified three distinct Y chromosome haplogroups (HY1, HY2, and HY3) in global pig populations. HY1 and HY2 are present in European and eastern Asian domestic pigs and in wild boars, while HY3 is unique to eastern Asian populations. The presence of HY1 and HY2 in eastern Asian pig populations has been postulated to have resulted from sex-biased gene flow between European male and eastern Asian female pigs around 200 years ago.
In African pig populations, genetic analysis revealed the presence of both mtDNA and Y chromosome markers typically associated with European and Asian pigs. The Y chromosome markers are categorized into three distinct haplogroups, highlighting clear genetic clades. However, the mtDNA markers in African pig populations are yet to be standardized into their respective (sub-)haplogroups. Nevertheless, phylogeographic analysis of both genetic markers shows that western African pigs share genetic affinity with European pigs, whereas eastern African pigs exhibit genetic similarities to Asian pigs. However, because of the small sample sizes from Africa and the limited resolution of phylogenies based on partial mtDNA cytochrome b and D-loop sequences and on Y chromosome SNPs in previous studies, a detailed picture of the origin and dispersal of domestic pigs in Africa has not yet been developed. Here, we analyzed 1,374 near-complete mitochondrial genomes and 917 sequences of the male-specific Y chromosome (MSY) genes DEAD-Box Helicase 3 Y-Linked (DDX3Y), Amelogenin Y-Linked (AMELY), and ubiquitously transcribed tetratricopeptide repeat containing Y-linked (UTY) to establish a more precise geographic and temporal framework of the evolutionary history and dynamics of modern African indigenous pigs.
All three major mtDNA haplogroups A∗, D∗, and E∗, observed in Eurasia, are present in African pigs. Haplogroups E∗ and D∗ are the most abundant, whereas haplogroup A∗ is restricted to eastern Africa, particularly Tanzania, but at a low frequency (Figures 1A, S1, S2, and Data S1A). The maximum likelihood (ML) phylogenetic tree (Figure 1B) nested African pigs within the European and eastern Asian clades, with nine Tanzanian samples representing African haplogroup A∗ positioned at branches just basally to eastern Asian wild boars. For the MSY concatenated gene sequences, the phylogenetic tree (Figures 1C and 1D) also nested African pigs within European and eastern Asian clades. However, the majority (197/202, 98%) of African pig samples are positioned within the European clade, while only about 2% (5/202) are assigned to the eastern Asian clade. The PCA plot for the entire mtDNA dataset (Figure 2A) groups most of the wild boars and domestic pigs according to their geographic origin, except for northern African (Morocco and Tunisia) wild boars, which are grouped with European wild boars. African domestic pigs are grouped within the European and eastern Asian clusters. In the PCA plot of the European cluster (Figure 2ai), local sub-structuring is evident for African pigs, and we designated it as cluster III. The eastern Asian cluster (Figure 2aii) is further divided into two, reflecting the two major clades corresponding to haplogroups A∗ and D∗. Notably, haplogroup A∗ mainly comprises eastern Asian wild boars and nine Tanzanian pigs, whereas haplogroup D∗ includes the remaining African pigs initially assigned to the eastern Asian clade on the ML tree. 
The PCA results of the entire MSY gene sequences (Figure 2B) reveal a similar structure to the ML tree, with the eastern Asian cluster (Figure 2bi), separating African pigs into those closer to northeastern Asian specimens (i.e., those from Kenya, Uganda, and Angola) and southeastern Asian pigs and wild boar clusters (which included two Nigerian pig samples). The expanded European PCA (Figure 2bii) groups the majority of African pigs with Duroc (D), Yorkshire (Y), and Landrace (L) breeds (referred to as “European commercial”), Mangalitza (M) and Pietrain (P) breeds (referred to as “European natives”), and European wild boars (referred to as “wild”). To further investigate the phylogenetic position of African indigenous pigs in respect to the Near Eastern individuals, we obtained CytB sequences of Near Eastern pigs from Ramirez et al. and analyzed those alongside ours, which were extracted from the mitogenome sequences. The resulting PCA and minimum joining network showed no substantial difference in the clustering pattern revealed by the initial phylogenetic analysis of the mitogenomes for African indigenous pigs (Figures S3 and S4).
The haplotype networks, shown in Figures 3A and 3C, are organized according to their respective haplogroups (Data S1A). Haplogroup E∗ (Figure 3A) comprises sub-haplogroups E1∗, E1a∗, E1a1a1∗, E1a1a1a∗, E1a1b1∗, E1a1b2∗, and E2∗, which represents African individuals (Figures S1 and S2). The majority of haplotypes fall under sub-haplogroups E1a1a1∗ and E2∗, with E2∗ showing a clear correlation with the local sub-structure denoted as cluster III in the PCA (Figure 2ai). Haplotypes within sub-haplogroup E1a1a1∗ are predominantly found in individuals from Nigeria, Benin, Gambia, and Tanzania, exhibiting close genetic ties to Duroc breed and Italian wild boar haplotypes (Data S1A). On the other hand, sub-haplogroup E2∗ includes haplotypes from individuals in Nigeria, Cameroon, Gambia, Benin, Angola, and Uganda, closely related to Iberian and Yucatan pigs. These haplotypes are exclusive to African pigs (Figure 3A).
Within haplogroup D∗ (Figure 3B), African haplotypes are most frequent in sub-haplogroups D1a1∗, D1e∗, D1h∗, D3a∗, D3a1∗, and D4a∗, while sub-haplogroups D1∗, D1a3∗, and D2∗ show the lowest frequency of African haplotypes. Haplotypes shared between African and non-African pigs are found in sub-haplogroups D1∗, D1a1∗, D1a3∗, D3a1∗, D1e∗, and D1h∗. Notably, a monophyletic group of eastern African pigs is observed within sub-haplogroup D3a∗, with the most common haplotype being shared among pigs from Kenya, Uganda, and Tanzania. This group’s connection to a median vector suggests the existence of unsampled or potentially extinct haplotypes within African pigs. Tanzanian individuals are particularly prominent within haplogroup D∗, with higher representation compared to other African pigs (as observed in D1∗, D1a1∗, D1e∗, D3a∗, D3a1∗, and D4a∗). For western African pig representatives, sub-haplogroup D3a1∗ is exclusive to Cameroonian and Gambian indigenous pigs, while Benin pigs are found in D1h∗ and D1a1∗. Sub-haplogroup D1e∗ has the highest frequency of Nigerian individuals, whereas D1∗, which also includes haplotypes from Uganda, Tanzania, and Europe, along with D1a3∗, D2∗, and D3a∗, mainly consists of singletons.
In haplogroup A∗ (Figure 3C), haplotypes are exclusive to Tanzanian pigs, predominantly within sub-haplogroup A1∗, which is likely the most ancient within this group. This basal position is congruent with the eastern Asian clade observed in the ML tree (Figure 1D). Sub-haplogroup A1a∗, which includes eastern Asian wild boars, clusters close to A1 despite significant mutational differences. Consistent with the ML tree (Figure 1D) and PCA results (Figures 2B and 2bii), the MSY haplotype network (Figure 3D) groups African haplotypes together with European and eastern Asian ones. All three European haplotypes (commercial, native, and wild) are present in African pigs, with the commercial haplotypes occurring at a higher frequency. A few individual African pigs (5/202, 2%) share haplotypes with eastern Asian pigs.
The earliest split was estimated at approximately 9.7–5.2 thousand years ago (ka) for the divergence of E1∗ (which is exclusive to European and northern African wild boars) and E2∗ (Iberian-type pigs) from haplogroup E∗ (Figure 4A), whereas the split between the Iberian pig populations and African indigenous pigs occurred approximately 5.2–2.6 ka. The average median divergence time between E2∗ and E1∗ was approximately 7.3 ka and falls within the reported time frame of pig entry into Europe from Anatolia. By around 5 ka, the European pig genome had lost nearly all traces of the Near Eastern lineage. At this point, 96% of the genetic variants found in European pigs were derived from European wild boar. African indigenous pigs of sub-haplogroup E2∗ are estimated to have diverged from the Iberian and Yucatan clade around 4.5 ka (Figure 4A). For A1 (Tanzanian pigs) and A1a (Chinese wild boars), the median divergence time was approximately 28 ka.
Population dynamics of African pigs were inferred from Bayesian skyline plots (BSPs) for the mitogenomes of sub-haplogroups E2∗ (Figure 4B), E1a1a1∗ (Figure S8), and D3a∗ (Figure S9). These plots depict changes in effective population size (Ne) over time (Figures 4B, S8, and S9). The E2∗ BSP (Figure 4B) reveals a stable Ne up to around 500 years ago, after which Ne increases rapidly but seems to stabilize again in recent times. The E1a1a1∗ African pigs (Figure S8) exhibited a prolonged stable Ne beginning around 3.5 ka, prior to which they had experienced a gradual increase. This stability persisted until around 500 years ago, when a decline commenced, followed by a rapid expansion from around 100 years ago that continues to date. The BSP for D3a∗ pigs shows a different pattern from those of E2∗ and E1a1a1∗ (Figure S9). There was a stable Ne up to around 6 ka, followed by a drastic increase until approximately 4 ka. The population then stabilized for a short period before declining until around 1 ka, after which it started to increase once again.
Using an approximate Bayesian computation (ABC) strategy, we constructed the most plausible scenario for the colonization of Africa by sub-haplogroup E2∗ based on 14,979 bps of the mitogenome (Figures 4C and 4D). The scenario with the highest posterior probability (0.5006; 95% CI 0.4908–0.5104) identifies pigs from the Iberian Peninsula as the source population. The founding population for African pigs diverged from their Iberian counterparts around 5.25 ka prior to dispersing into western Africa. Eastern African pigs diverged from western African ones around 1.98 ka and dispersed to their current location, likely through a terrestrial route (Figures 4C and 4D). Many parameter estimates show high values (>0.2) of the relative median of the absolute error (RMAE) (Data S2), which raises potential concerns about the reliability of this scenario, as has been suggested by Kamalakkannan et al. However, since our primary objective was to investigate the origin of African pig mtDNA sub-haplogroup E2 and its dispersal trajectories across Africa, the high RMAE values did not compromise the evaluation of the scenarios we proposed and tested, an observation also acknowledged in the study by Kamalakkannan et al.
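To make the RMAE criterion concrete, the following minimal sketch computes the statistic as it is commonly defined in ABC practice: the median, over simulated test datasets, of |estimate − true value| / true value. The paper does not state its formula, so this definition, the function name `rmae`, and all numbers below are assumptions for illustration only; they are not values from Data S2.

```python
from statistics import median
from typing import Sequence


def rmae(estimates: Sequence[float], true_values: Sequence[float]) -> float:
    """Relative median of the absolute error over paired test datasets."""
    if not estimates or len(estimates) != len(true_values):
        raise ValueError("need two non-empty sequences of equal length")
    return median(abs(e - t) / t for e, t in zip(estimates, true_values))


# Hypothetical divergence times (years) used to simulate five test datasets,
# and the point estimates recovered from each one.
true_times = [5000, 5200, 4800, 5100, 5300]
estimated_times = [5600, 4000, 4900, 6500, 5250]

value = rmae(estimated_times, true_times)
is_flagged = value > 0.2  # the >0.2 threshold is the one quoted in the text
print(f"RMAE = {value:.3f}, flagged as high: {is_flagged}")
```

Because the statistic is a median, a few badly estimated test datasets do not inflate it; a parameter is flagged only when at least half of the test datasets miss by more than 20%.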
Despite the wide variety of local names across the continent, African indigenous pigs are generally considered a single breed with strong ties to Iberian pigs. These pigs go by several regional names, such as the West African Dwarf pig in Nigeria, Ashanti Dwarf pig in Ghana, “Bush pig” in Togo, Mukota pig in Zimbabwe, Kolbroek in South Africa, Somo in Mali, Olongulu in Angola, and Busia pigs in Kenya. Our results from both mtDNA and MSY markers of African pigs demonstrate genetic affinities with both European and eastern Asian haplotypes (Figures 1A–1D). Interestingly, this pattern mirrors the phylogeographic structure first described by Ramirez et al., where most pigs from western Africa cluster with the European clade, while those from eastern Africa align with the East Asian clade in the phylogenetic tree.
The genogeographic distribution of mtDNA sub-haplogroup E2∗ pigs, found across the Iberian Peninsula, western Africa, and Uganda (Figures 5, S1, and S2), offers a compelling model for tracing the movement of Iberian pigs (and by extension humans) across Africa. The phylogenetically based origin of the African E2∗ pig clade, which dates back 4.5 ka (Figure 4B), coincides with archaeological evidence of pig domestication by Neolithic populations in Tangier, Morocco. In line with this, ABC model estimates suggest that Iberian pigs first entered northwestern Africa around 5.25 ka (Figure 4D and Data S2), marking the initial phase of their spread across the continent. The timing of Iberian pigs’ entry into Africa corresponds more closely with the arrival of the Neolithic cultural package in northern Africa and the Iberian Peninsula around 7 to 5 ka. This connection is further supported by studies from Linstädter et al., Zilhão, and Martínez-Sánchez et al., which highlight shared artifacts between northwestern Africa and the Iberian Peninsula. Moreover, genetic evidence of introgression between European and African goat populations in Italy and Spain further reinforces the long-standing historical genetic exchanges between northwestern Africa and the Iberian Peninsula. Pig farming has been an integral part of northern African culture since prehistoric times, particularly among the Berber people, and continued until the advent of Islam. It is likely that the Berber people introduced pig farming into western sub-Saharan Africa, an idea supported by genetic links between the Fulani people, whose presumed homeland lies in the Gambia and Senegal, and Berber populations from Morocco. These genetic connections are thought to date back between 1,190 and 670 years ago, with a confidence interval ranging from 2,090 BCE to CE 130.
The spread of sub-haplogroup E2 (Iberian-type pigs) from western to eastern Africa aligns with the domestication of African rice, Guinea fowl, and the great Bantu dispersal. In contrast, the presence of sub-haplogroup A1 in eastern Africa, linked to eastern Asian wild boars, indicates earlier connections between eastern Asia and coastal eastern Africa through Indian Ocean trade. The distribution of pigs aligns with ancient African civilizations, supporting the idea that pigs are indicative of a settled farming lifestyle. All silhouettes are sourced from https://www.phylopic.org/.
The expansion of Bantu-speaking populations stands as one of the most transformative demographic events in late Holocene Africa, significantly reshaping the continent’s linguistic, cultural, and biological landscapes. This movement, which is estimated to have occurred between 6 and 4 ka, began in western Africa and spread gradually through the Congo rainforest, eventually reaching eastern and southern Africa in a serial-founder fashion. While our study could not sample individuals from the Congo, the Ugandan pig samples we examined carry the E2∗ sub-haplogroup, which is prevalent in western Africa (Figure S2). ABC model estimates suggest that pigs with the E2∗ sub-haplogroup first dispersed from western Africa into eastern Africa around 1.98 ka (Figure 4D and Data S2). This dispersal coincides spatially—though not necessarily causally—with broader population movements and ecological changes in western and central Africa. Despite the uncertainty surrounding the exact timing of animal introductions due to the high RMAE value, the broader pattern of Bantu-speaking population dispersals—and the associated movement of domesticated species such as guinea fowl, pearl millet, African rice, and yam—may provide a geographic framework for understanding how pigs could have been incorporated into eastward translocations. The presence of pigs in this context is plausible, particularly given historical accounts of feral pigs in regions like Chad during the 19th century. While documentation of pig introduction during the early period of European contact remains scarce, it is widely accepted that the Portuguese played a key role in this process, particularly along Africa’s coastal regions. 
This is reflected in the widespread adoption of the Portuguese term porco for pigs in several local languages. Therefore, the notable increase in the Ne of the collective E2∗ sub-haplogroup in African pigs around 500 years ago in our study (Figure 4B) likely coincides with the onset of Portuguese colonization in Africa during the 16th and 17th centuries CE. It is plausible that Portuguese colonists encouraged local Bantu farmers to expand pig production, following the population bottleneck of pigs initially caused by the spread of Islam.
The high frequency of haplogroup D∗ haplotypes (Figures S1 and S2) suggests that the majority of pigs associated with Tanzania, and more broadly with coastal eastern Africa, likely originated from a source distinct from those in surrounding inland regions. In this context, the genetic connection between eastern Africa and Asia becomes particularly significant. Several Far-Eastern mitochondrial signatures, such as those embedded in D∗ sub-haplogroups (as defined by MitoToolPy), including D1a1∗, D1b∗, D1e∗, D1h∗, D3a∗, and D4∗, have been detected not only in eastern African pigs from Uganda and Kenya but also in pig populations across Europe and South Asia. This shared genetic footprint raises the possibility of at least two plausible routes of introduction into eastern Africa: (i) indirect introgression from improved British breeds brought during the colonial period, which had been extensively hybridized with Chinese sows in the 18th and 19th centuries to select for pigs with earlier reproductive maturity and increased fatness, and (ii) direct introduction to the East African coast, with India playing a critical role as a transit point, facilitating the movement of pigs together with other domesticates from eastern and southeastern Asia to Africa, likely during the peak of the Indian Ocean trade. Supporting the complexity of these historical processes, the high haplotype diversity coupled with low nucleotide diversity observed across all populations (Data S6) provides insight into demographic dynamics. Such a pattern typically suggests population expansion following a bottleneck or a period of low effective population size. At the same time, this pattern may also reflect the introduction of multiple, genetically distinct populations, consistent with both colonial and pre-colonial trade-driven introductions. Colonial-era introductions, particularly from Europe, appear to have had a more pronounced influence in western Africa. 
This is supported by both geographic proximity to Europe and a greater degree of haplogroup sharing with European pigs (Figure 3B). In contrast, coastal eastern Africa seems to have experienced a more complex admixture of sources, with contributions potentially spanning both European colonialism and earlier Indian Ocean trade influences. Further evidence of these dynamics is seen in the substantial increase in Ne observed in sub-haplogroup D3a∗, which begins around 4 ka (Figure S9) and encompasses all three eastern African pig populations (Uganda, Tanzania, and Kenya). Notably, this timeline aligns with archaeological evidence for the arrival of southeastern Asian domestic chickens at Zanzibar during the late fourth millennium, suggesting contemporaneous movements of other domesticates. Adding to the genetic complexity, we also report the first identification of the rare mitochondrial DNA sub-haplogroup A1∗ in pigs from eastern Africa, specifically in Tanzania (Figures 3C, S1, and S2). The deep divergence time of this haplotype (Figure S17), its basal phylogenetic position (Figure 1B), and its close clustering with Chinese wild boars (Figures S4, S5, and S7) all point to a potential introgression event involving wild pig populations. Given that A1∗ has historically been confined to Asia, its presence in eastern Africa likely reflects gene flow through domesticated pigs that carried introgressed East Asian wild boar genetic material. Alternatively, it may suggest a direct introduction of East Asian wild boars, which we speculate may have been brought to the region by Portuguese colonists during their presence along the coastal stretches of eastern Africa. However, this remains a hypothesis, as there is currently no direct historical or archaeological evidence to confirm it. We propose this idea as a direction for future research.
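The contrast drawn in this part of the discussion, high haplotype diversity alongside low nucleotide diversity (Data S6), can be illustrated with a short sketch. It assumes the standard estimators: Nei's haplotype diversity, Hd = n/(n − 1) × (1 − Σ p_i²), and nucleotide diversity π as the mean number of pairwise differences per site. The paper does not say which software or formulas produced Data S6, and the toy sequences below are invented for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Sequence


def haplotype_diversity(seqs: Sequence[str]) -> float:
    """Nei's haplotype diversity for a sample of aligned sequences."""
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    freq_sq = sum((count / n) ** 2 for count in Counter(seqs).values())
    return n / (n - 1) * (1 - freq_sq)


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """Mean pairwise differences per site (pi) for aligned sequences."""
    n = len(seqs)
    if n < 2 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need two or more sequences of equal length")
    diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) // 2
    return diffs / n_pairs / len(seqs[0])


def substitute(seq: str, pos: int, base: str) -> str:
    """Return seq with the base at position pos replaced."""
    return seq[:pos] + base + seq[pos + 1:]


# Toy alignment: four distinct haplotypes that differ by single substitutions.
ancestor = "A" * 100
sample = [
    ancestor,
    substitute(ancestor, 10, "C"),
    substitute(ancestor, 40, "G"),
    substitute(ancestor, 70, "T"),
]
hd = haplotype_diversity(sample)
pi = nucleotide_diversity(sample)
print(f"Hd = {hd:.3f}, pi = {pi:.4f}")
```

In this toy sample every sequence is a distinct haplotype, so Hd reaches its maximum, yet the haplotypes are separated by only one or two substitutions, so π stays small. Many closely related haplotypes of this kind are what a recent expansion from a small founding population would be expected to produce.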
While we observed the presence of all major European (commercial, native, and wild) and eastern Asian (northern and southern Chinese) MSY haplotypes in the studied African pig populations, consistent with the mtDNA dataset (Figure 3bii and 4D), European haplotypes were found at a significantly higher frequency than their eastern Asian counterparts. This contrasts with the mtDNA data, where eastern Asian alleles are especially prevalent, particularly in East Africa. One possible explanation for this discrepancy is that eastern Asian MSY haplotypes may have been replaced over time by European ones due to later waves of introduction. For example, Ramirez et al. reported a high frequency of the Asia-specific HY3 haplotype in Kenyan and Zimbabwean Mukota pigs (35% and 100%, respectively). However, a subsequent study by Noce et al. found no trace of HY3 in eastern African pigs, supporting the hypothesis that European male lineages have gradually supplanted earlier Asian ones. This pattern aligns with historical records of large-scale European pig imports into Africa, driven by colonial agricultural development and undocumented subsistence-level exchanges. This pattern is mirrored in our mtDNA results, which show a high frequency of the sub-haplogroup E1a1a1∗ (Figures 3A and S2), linked to the Duroc breed, an exotic European pig developed in Wisconsin from diverse lineages such as Berkshire, Iberian, Tamworth, and Red Guinea Hog. Crossbreeding between African indigenous pigs and imported European breeds is not uncommon, especially in remote rural areas across the continent. For instance, in eastern Africa (such as Uganda), the local commercial pig industry has largely transitioned to exotic breeds like Camborough, Landrace, and Large White, often resulting in widespread hybridization. This trend likely reflects both a recognition of the superior productivity traits associated with exotic breeds and the effects of loose breeding management practices. 
Accidental interbreeding may be especially common in regions with active or historical restocking programs. The persistence of mtDNA haplogroup D in African pig populations compared to HY3 may therefore be due to (i) European crossbreeds that trace their maternal lineages to Chinese breeds and (ii) the fact that very few farmers own boars making pigs carrying HY3 alleles particularly susceptible to genetic bottlenecks. Interestingly, we also detected MSY genetic signatures of European wild boars in several western African pig populations. Historical accounts confirm that European wild boars were introduced to northern, western, and southern Africa during the colonial era, often for use in hunting expeditions, providing a plausible explanation for their genetic imprint in these regions.
In conclusion, this study sheds light on the genetic diversity and population structure of African indigenous pig breeds, highlighting their importance not only as cultural and economic assets but also as critical reservoirs of genetic resources. Importantly, this research supports the formulation of evidence-based livestock policies in Africa that promote the protection and utilization of indigenous breeds, as highlighted by the African Union – Inter-African Bureau of Animal Resources. By aligning conservation efforts with international and regional agricultural development goals, policymakers can enhance food security, rural livelihoods, and climate resilience across the continent. A more detailed depiction of the evolutionary history of African indigenous pigs could be achieved by analyzing high-resolution Y chromosomal markers, whole-genome variation, structural variation, and ancient DNA. Integrating those molecular data with a thorough revision of the morphology of modern and archaeological specimens would further allow cross-validation of the timings of divergence and dispersal. Modern morphometric methods have proved useful for quantifying morphological variation in domesticated animals and their closely related wild taxa, notably in suids. We recommend that future research focus on identifying and characterizing the mechanisms of adaptive evolution in African indigenous pigs, particularly those conferring resistance to endemic diseases in Africa, to further inform strategies for African indigenous pig conservation, improved survivability, and enhanced meat production.
This paper analyzes existing, publicly available data. The accessions for this data can be found in Data S1A and S1B.
The Y chromosome variation data reported in this paper (Data S1B) have been deposited in the Genome Variation Map (GVM) in National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, under accession number GVM001001. The assembled mitogenomes from this study (Data S1A) have been submitted to GenBank under the accession ID PQ388328-PQ389421.
No other custom code/software was used for data analysis in the study. The publicly available software and algorithms used in the present study are listed in the key resources table.
We appreciate all the volunteers who contributed to this project. We thank the support from the Animal Branch of the Germplasm Bank of Wild Species; (the Large Research Infrastructure Funding); and West Africa Livestock Innovation Centre, The Gambia. L.A.O is supported by ANSO scholarship for young talent. A.C.A. is supported by the Yunnan Revitalization Talent Support Program: High-end Foreign Expert Project. A.S. is supported by the French government in the framework of the University of Bordeaux’s IdEx “Investments for the Future” program/GPR “Human Past”. This paper contributes to the results framework of CGIAR’s SAPLING Initiative. ICARDA wishes to acknowledge the China government’s contribution to its activities and support from donors to the Trust Fund. This work was supported by grants from the (SAJC202402 to M.-S.P.); the National Foreign Expert Project (H20240773 to A.C.A.); and the Chinese Academy of Sciences President’s International Fellowship Initiative, Special Expert (2024FSB0002 to A.C.A.).
In this study, we de novo assembled a total of 463 near-complete African pig mitogenomes (79 from Benin, 58 from Cameroon, 61 from The Gambia, 105 from Nigeria, 124 from Tanzania, and 36 from Uganda) from whole-genome paired-end Illumina resequencing reads (Data S1A). The sample collection represented pigs managed under traditional scavenging systems by rural smallholder farmers in sub-Saharan Africa with no known history of crossbreeding with commercial breeds. To minimize relatedness, efforts were made, through farmer questionnaires, to avoid sampling related animals up to the third generation. These were analyzed alongside mitogenomes of domestic pigs and wild boar populations from Europe (n = 212), Asia (n = 666), northern Africa (n = 5), and sub-Saharan African pigs from Angola (n = 3) and Kenya (n = 7), comprising data that are newly sequenced as well as derived from IAnimal, GenBank, and the Sequence Read Archive (SRA) repository (Figure S10 and Data S1A). We also generated sequences of three MSY genes (DDX3Y, AMELY, and UTY) of male pigs from Uganda (n = 11), Tanzania (n = 44), Benin (n = 24), The Gambia (n = 47), Nigeria (n = 46), and Cameroon (n = 21) and analyzed them together with male pig data from Europe (n = 452), eastern Asia (n = 263), and additional pigs from Nigeria (n = 5), Kenya (n = 2), and Angola (n = 2), comprising individuals derived from IAnimal, SRA, and newly sequenced data (Figure S10 and Data S1B and S1C).
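As a quick consistency check, the per-country counts in this paragraph can be tallied against the totals quoted elsewhere in the paper (463 newly assembled African mitogenomes, 473 African mitogenomes and 202 African MSY sequences in the Summary, and 917 MSY sequences overall). The counts are transcribed from the text; the variable names are ours.

```python
# Per-country sample counts transcribed from the paragraph above.
assembled_mitogenomes = {
    "Benin": 79, "Cameroon": 58, "The Gambia": 61,
    "Nigeria": 105, "Tanzania": 124, "Uganda": 36,
}
additional_african_mitogenomes = {"Angola": 3, "Kenya": 7}

generated_msy = {
    "Uganda": 11, "Tanzania": 44, "Benin": 24,
    "The Gambia": 47, "Nigeria": 46, "Cameroon": 21,
}
additional_african_msy = {"Nigeria": 5, "Kenya": 2, "Angola": 2}
eurasian_msy = {"Europe": 452, "eastern Asia": 263}


def total(counts: dict) -> int:
    """Sum the sample counts of one group."""
    return sum(counts.values())


n_assembled = total(assembled_mitogenomes)
n_african_mito = n_assembled + total(additional_african_mitogenomes)
n_african_msy = total(generated_msy) + total(additional_african_msy)
n_all_msy = n_african_msy + total(eurasian_msy)

print(n_assembled, n_african_mito, n_african_msy, n_all_msy)
```

The four totals come out as 463, 473, 202, and 917, matching the figures given in the Summary and in the description of the final MSY dataset.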
Raw resequencing reads were trimmed of adapters and low-quality bases (<10) using fastp (v0.23.0). Trimmed reads were retained for assembly if they were ≥150 base pairs (bps) in length and had a phred score ≥20. Reads were then aligned to the S. scrofa mitochondrial reference genome (GenBank number: NC_000845.1) with BWA-MEM (v0.7.17) using default parameters. Sequence Alignment Map (SAM) files were converted to Binary Alignment Map (BAM) files, sorted, and indexed, and duplicates were identified, using samtools (v1.3.1), picard (http://broadinstitute.github.io/picard/), and GATK (v4.1.4.1). The list of successfully mapped reads was retrieved using samtools (v1.3.1) and subsequently used to extract the mapped reads from the original fastq files with seqtk (https://github.com/lh3/seqtk/). The extracted mitogenome reads were assembled into contigs using MEGAHIT (v1.2.9) and then aligned against the mitochondrial reference genome using AliView (v1.28). The final dataset comprised 1,374 mitogenomes (1,333 newly assembled and 41 from GenBank) spanning a total length of 14,979 bps after excluding gaps and ambiguous bases (Data S1A).
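The read-retention rule described above can be illustrated with a minimal Python sketch. It is not the fastp implementation: the function names are hypothetical, Phred+33 quality encoding is assumed, and the threshold of 20 is interpreted here as a mean per-read Phred score, which the text does not state explicitly.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def keep_read(sequence: str, quality: str,
              min_length: int = 150, min_mean_phred: float = 20.0) -> bool:
    """Retain a trimmed read only if it passes both thresholds.

    Assumption: the >=20 criterion is applied to the mean read quality.
    """
    return len(sequence) >= min_length and mean_phred(quality) >= min_mean_phred


# Usage: a 150-bp read with uniform quality 'I' (Phred 40) is retained.
print(keep_read("A" * 150, "I" * 150))
```

Under these assumptions a read fails if it is shorter than 150 bp after trimming or if its mean quality falls below 20, whichever condition is violated first.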
Using GATK HaplotypeCaller in GVCF mode and the S. scrofa 11.1 reference genome assembly (https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000003025.6/), a total of 326,302 high-quality Y chromosomal variants were generated. The detected variants were filtered with the “VariantFiltration” tool using the parameters “QD < 2.0, FS > 60.0, MQ < 40.0, MQRankSum < −8.0”. To minimize false positives in the called variants, the average depth of sex chromosomes and autosomes was calculated for each individual using samtools (v1.3.1). We used the differences in depth of coverage of sex chromosomes between female and male individuals to determine the sexes of our African pig samples. We further determined the sexes of all WGS data and validated them against publicly available sample sex information (Figure S11 and Data S1B and S1C). Plink (v1.9) was used to filter MSY chromosomal variants as follows: (i) female individuals were removed, leaving 216,440 Y chromosome SNPs (Figure S12); (ii) SNP sites with a missing genotype rate of <5% were retained; (iii) SNP sites with minor allele frequencies <0.001 were removed; and (iv) only hemizygous sites were retained in the remaining male samples by removing all heterozygous sites (Figure S12). Lastly, similarly to Escouflaire and Capitan, we used beagle to impute missing genotypes, and the three MSY genes were extracted and concatenated using vcftools (v0.1.12b) and BCFtools, and then converted into fasta format using vcf2fasta (https://github.com/santiagosnchez/vcf2fasta). The final dataset comprised 917 male-specific concatenated gene sequences with a total length of 462 SNPs.
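As an illustration of the site-level criteria (ii) to (iv), the following Python sketch applies them to a single SNP site. It does not reproduce Plink; the genotype encoding (0, 1, "het", None) and the function name are assumptions made for this example, and the three criteria are combined into one check rather than applied as separate passes.

```python
def keep_site(genotypes, max_missing=0.05, min_maf=0.001):
    """Decide whether one Y-chromosome SNP site passes the filters.

    genotypes: one entry per retained male, encoded (hypothetically) as
    0 = reference allele, 1 = alternate allele, "het" = heterozygous call,
    None = missing genotype.
    """
    n = len(genotypes)
    if n == 0:
        return False
    # (iv) a hemizygous locus should never be heterozygous in a male
    if any(g == "het" for g in genotypes):
        return False
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    # (ii) missing genotype rate must be below 5%
    if (n - len(called)) / n >= max_missing:
        return False
    # (iii) minor allele frequency must be at least 0.001
    alt_freq = sum(called) / len(called)
    return min(alt_freq, 1 - alt_freq) >= min_maf


# Usage: one alternate allele among 100 fully genotyped males passes.
print(keep_site([0] * 99 + [1]))
```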
The online MAFFT server (https://mafft.cbrc.jp/alignment/server/index.html) was used to perform multiple sequence alignments of the newly assembled and NCBI mitogenomes and of the MSY sequences. MitoToolPy-seq.py was then used to classify mtDNA haplogroups, and their geographic distribution was visualized using ArcMap (v10.7.1). To assess population stratification based on mitogenomes and MSY gene sequences, we performed principal component analysis (PCA) using Tassel (v5). The PCA plot was visualized using the ggplot2 package in R.
We constructed a maximum likelihood (ML) tree using RAxML (v8) with the GTRCAT model and 1,000 bootstrap replicates. This model has been found to be efficient and sufficiently accurate for phylogenetic analysis while offering faster computation times (https://evomics.org/learning/phylogenetics/raxml/). African warthogs (Phacochoerus africanus), pygmy hogs (Porcula salvania), and the Malaysian bearded pig (S. barbatus) were used as outgroups. The ML tree was visualized with iTOL (v5). To generate the haplotype network, we used FastHaN and visualized it using tcsBU (TCS Beautifier).
Haplotype diversity, nucleotide diversity, and number of haplotypes for haplogroup D mitogenome sequences were calculated using Arlequin (v3.5). Arlequin was also used to perform Fu’s Fs and Tajima’s D neutrality tests to detect evidence of recent population expansion. The significance of the deviations from neutrality was assessed with 1,000 coalescent simulations.
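For readers unfamiliar with the neutrality statistic, the sketch below computes Tajima's D from a small alignment using the standard formula (Tajima, 1989). It is a simplified illustration, not Arlequin's implementation: the function names are hypothetical, sequences are assumed to be pre-aligned, of equal length, and free of gaps or ambiguous bases, and no significance testing is performed.

```python
from itertools import combinations
from math import sqrt


def segregating_sites(seqs):
    """Number of alignment columns with more than one allele."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs):
    """Average number of differences over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def tajimas_d(seqs):
    """Tajima's D for aligned sequences (needs n >= 4 and S > 0)."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if n < 4 or s == 0:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    pi = mean_pairwise_differences(seqs)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Usage: an excess of singleton variants gives a negative D, the
# pattern expected after a recent population expansion.
print(tajimas_d(["AAAA", "AAAT", "AATA", "ATAA"]) < 0)
```

A negative value reflects an excess of rare variants relative to the neutral expectation, whereas a positive value reflects an excess of intermediate-frequency variants.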
To infer the demographic dynamics of African-specific pig mitogenomes that clustered in sub-haplogroups E2∗, E1a1a1∗, and D3a∗ of the global haplotype networks, we implemented three separate Bayesian Skyline plot (BSP) analyses using the random starting tree model in BEAST (v2.0). The recently published pig mitogenome mutation rate of 1.2612 × 10−7 was used to calibrate the BSP. We performed each run three times, starting from a random tree, with a Markov Chain Monte Carlo (MCMC) simulation of 60 million generations sampled every 6,000 generations, and with the first 10% of generations discarded as burn-in. For this analysis, we used the HKY substitution model without invariant sites, which was determined to be the best-fit model using the IQ-TREE web server (http://iqtree.cibiv.univie.ac.at/). A strict molecular clock model was applied, given its suitability for phylogenies with shallow roots due to low rate variation among branches. Additionally, we used the coalescent Bayesian Skyline tree prior with 10 groups under a piecewise-constant skyline model to capture population size changes over time. Tracer (v1.7.1) was used to assess convergence across runs.
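The sampling scheme above implies a fixed number of posterior samples per run, which the following small sketch makes explicit. The function name is hypothetical, and the count ignores the initial state that some software also logs.

```python
def mcmc_sample_counts(chain_length, sample_every, burn_in_percent):
    """Return (total logged samples, samples kept after burn-in).

    Assumption: one sample is logged every `sample_every` generations and
    the initial state is not counted.
    """
    total = chain_length // sample_every
    discarded = total * burn_in_percent // 100
    return total, total - discarded


# Usage: 60 million generations, sampled every 6,000, with 10% burn-in.
total, kept = mcmc_sample_counts(60_000_000, 6_000, 10)
print(total, kept)  # 10000 9000
```

Each BSP run therefore contributes about 9,000 post-burn-in samples, which is the pool from which convergence diagnostics such as effective sample sizes are estimated.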
A Bayesian divergence tree for the mitogenome sub-haplogroups E1∗ and E2∗ was generated using BEAST (v2.0). We employed a random starting tree model with a constant population size coalescent tree prior, incorporating eastern Asian wild boars (S. scrofa) from Wu et al. as the outgroup. The mutation rate of 1.2612 × 10−7 was used to calibrate the divergence times. The analysis used an MCMC chain length of 10 million generations, sampling trees every 10,000 generations, with a strict clock and the HKY substitution model without invariant sites. We also calculated the divergence time between the mitogenome sub-haplogroups A1 and A1a. We tested both a strict clock and an uncorrelated lognormal relaxed clock under HKY+G+I. For both models, three MCMC runs of 30,000,000 iterations were performed, sampling every 10,000 iterations. The first 10% of the generations were discarded as burn-in. The estimated effective sample size (ESS) for all parameters was greater than 200, as determined using Tracer (v1.7.1). The two models were compared by marginal likelihood estimation using generalized stepping-stone sampling (GSS) (https://beast.community/model_selection_2). The strict clock provided higher ESS values because of earlier chain convergence; therefore, it was chosen to generate the divergence tree. Tree visualization was performed using FigTree (v1.4.4) (http://tree.bio.ed.ac.uk/software/figtree/).
To investigate possible colonization trajectories of the sub-haplogroup E2∗ in Africa, we used Approximate Bayesian Computation (ABC) to test multiple dispersal scenarios using the near-complete (14,979 bps) mitogenome sequences. Simulations were performed using DIYABC (v2.1). We constructed the dispersal scenarios based on the ML phylogenetic and divergence time tree results as well as archaeological and genetic inferences of European pig domestication from wild boars drawn from previous studies. We divided lineage E2∗ into three metapopulations according to geographic distribution: (i) a European-Iberian Peninsula group (EUR) comprising Iberian and Yucatan pigs; (ii) a western Africa (WA) group comprising pigs from Angola, Benin, Cameroon, Gambia, and Nigeria; and (iii) an eastern Africa (EA) group, which included only Ugandan pigs within the sub-haplogroup E2∗. To test our hypothesis of an African origin of the E2∗ lineage and determine its dispersal routes across the continent, we implemented an independent model invoking two scenarios that considered the two African pig groups (WA and EA) as the ones contributing to the main difference in the sub-haplogroup (Figures 4D and S13).
Starting with EUR (NEEUR) as the source population: (i) the first dispersal was postulated to be from EUR to WA (NEWA) at time t2 and subsequently from WA to EA (NEEA) at time t1 (Figure 4D), and (ii) the expansion began from EUR to EA at time t2 and subsequently from EA to WA at time t1 (Figure S11). The HKY substitution model was used, as it was selected as the best fit using the IQ-TREE web server. The mean mutation rate was set to 10−8 to 10−7 per site per generation. The summary statistics (SS) were selected after PCA (Figure S14) to pre-evaluate the similarity between the simulated and empirical datasets through the “evaluate scenario-prior combination” option, which checks whether the models, together with the chosen prior distributions, have the potential to generate a subset of summary statistics close to the observed summary statistics. The one-sample SS for the simulated dataset included the mean of pairwise differences, the mean number of the rarest nucleotides at segregating sites, and the variance of the number of the rarest nucleotides at segregating sites. The two-sample SS included the number of haplotypes, the number of segregating sites, the mean of pairwise differences, and FST. After simulating one million datasets for each of the two scenarios, we used both a direct and a logistic regression method to compare the posterior probabilities of the scenarios based on the selected summary statistics. The precision of each parameter estimate was evaluated by computing the relative median of the absolute error (RMAE). Finally, the model was verified by calculating the goodness-of-fit statistics of the winning scenario against the observed dataset and visualized using PCA (Figures S15 and S16).
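The direct approach to scenario comparison can be sketched as follows: the posterior probability of a scenario is approximated by its share among the simulated datasets whose summary statistics lie closest to the observed ones. This is a toy illustration, not the DIYABC implementation; the function name, the scenario labels, and the use of unnormalized Euclidean distance are assumptions made for this example.

```python
import math
from collections import Counter


def direct_posterior(observed, simulations, n_closest):
    """Approximate scenario posterior probabilities by the direct approach.

    observed: tuple of observed summary statistics.
    simulations: list of (scenario_label, summary_statistics) pairs.
    n_closest: number of nearest simulated datasets to retain.
    Assumption: distances are plain Euclidean, without normalization.
    """
    if not simulations or n_closest < 1:
        raise ValueError("need simulations and a positive n_closest")
    ranked = sorted(simulations, key=lambda sim: math.dist(observed, sim[1]))
    closest = ranked[:n_closest]
    counts = Counter(label for label, _ in closest)
    labels = {label for label, _ in simulations}
    return {label: counts[label] / len(closest) for label in labels}


# Usage with made-up summary statistics for two hypothetical scenarios.
sims = [
    ("EUR>WA>EA", (1.0, 2.1)), ("EUR>WA>EA", (0.9, 2.0)),
    ("EUR>EA>WA", (1.1, 2.0)), ("EUR>EA>WA", (5.0, 5.0)),
    ("EUR>WA>EA", (4.0, 4.0)), ("EUR>EA>WA", (6.0, 1.0)),
]
print(direct_posterior((1.0, 2.0), sims, n_closest=3))
```

In this toy input, two of the three nearest simulations come from the first scenario, so it receives a probability of about 0.67. The logistic regression method refines this estimate by regressing scenario membership on the differences between simulated and observed statistics.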


Sections

"[{\"pmc\": \"PMC12775879\", \"pmid\": \"\", \"reference_ids\": [\"bib1\", \"bib2\", \"bib2\", \"bib3\", \"bib4\", \"bib5\", \"bib6\", \"bib7\", \"bib8\", \"bib9\", \"bib9\", \"bib1\", \"bib10\", \"bib11\", \"bib12\", \"bib1\", \"bib13\", \"bib14\"], \"section\": \"Introduction\", \"text\": \"Historically, indigenous pigs (Sus scrofa domesticus) were associated with the human populations of the Berber and ancient Egyptians (from Morocco to Egypt), the Sennar (further south along the Nile River up to Sudan), and populations from western-central Africa in Senegambia (Senegal and Gambia), along a west-east band ranging from Nigeria to Sudan (\\u201cWest African Extension\\u201d), and along a north-south band ranging from southern Cameroon to Angola (\\u201cAngola Extension\\u201d)., Currently, due to cultural reasons, African indigenous pigs are popular with small holder traditional farmers in remote regions in Sub-Saharan Africa and are kept mainly for subsistence.,,, According to the FAO\\u2019s Domestic Animal Diversity Information System, Africa is home to 49 indigenous breeds, with 5% considered endangered and the status of 54% remaining uncertain (http://dad.fao.org). Unlike exotic breeds, African indigenous pigs are tolerant of feed supply irregularities, fibrous and tannin-rich diets, pathogens, and tropical eco-climatic conditions.,,, They are also highly appreciated by consumers for their marbled meat. Despite the cultural, economic, and historical importance of African indigenous pigs, their origins remain enigmatic., While Sus scrofa is native to areas of northern Africa, there is no archaeological or genetic evidence that these populations gave rise to indigenous African pig breeds. 
Additionally, the considerable geographic distance (more than 5000 km) between different breeds (such as that between the eastern and western portions of the continent: hereby referred to as western African indigenous pigs and eastern African indigenous pigs) would have likely complicated a single geographic origin scenario for these breeds.,,\"}, {\"pmc\": \"PMC12775879\", \"pmid\": \"\", \"reference_ids\": [\"bib15\", \"bib16\", \"bib17\", \"bib1\", \"bib1\", \"bib10\", \"bib18\", \"bib19\", \"bib20\", \"bib2\", \"bib10\", \"bib2\", \"bib1\", \"bib18\"], \"section\": \"Introduction\", \"text\": \"The origins of pig management and husbandry can be traced to the 10th millennium BCE in the Near Eastern and eastern Asia.,, Archaeological findings in Egypt revealed pig remains dated to the end of the fifth millennium BCE. It has been speculated that, during this period, pigs likely spread from the Near East to the Nile Valley basin. This is partially supported by the finding of Near Eastern mtDNA haplotypes in pig specimens from Egypt\\u2019s museum. Ethnographic data from Sennar in Sudan and Sudan-Ethiopian border indicate that pig farming might have been established in the Nile Valley as early as the Middle Ages, or earlier. This hypothesis is, however, not supported by archaeological evidence, as early medieval sites like Soba in central Sudan have not yielded any pig remains. In western-central Africa, pig bones have been discovered at Nkile, Democratic Republic of the Congo, dated to the ninth century. However, it is still hypothesized that western African indigenous pigs, also known as dwarf pigs (Bakosi, Ashanti, and Elede), originate from Iberian pigs following the introduction of the latter to Africa by Portuguese sailors between the 15th and 19th centuries., This hypothesis has been supported to an extent by the fact that regions in Africa colonized by the Portuguese adopted lusophone terminologies. 
For instance, the lusophone word for pig, i.e., \\u201cporco,\\u201d has been integrated into numerous western Africa dialects. Linguistic studies have also reported the use of native pig names suggesting earlier pig establishment in Senegambia, and the West African and Angola extension that may be pre-Portuguese.,\"}, {\"pmc\": \"PMC12775879\", \"pmid\": \"\", \"reference_ids\": [\"bib21\", \"bib22\", \"bib23\", \"bib24\", \"bib25\", \"bib26\", \"bib27\", \"bib28\", \"bib29\", \"bib30\", \"bib31\", \"bib32\", \"bib13\", \"bib27\"], \"section\": \"Introduction\", \"text\": \"Over the last 20 years, the two uniparentally inherited genetic marker systems, mitochondrial DNA and Y chromosome, have been widely used to resolve genetic origins, prehistorical range expansions, and demographic processes, in humans, domestic, and wild animals., Their non-recombinant nature makes it possible for phylogeographic signatures to be inferred as reported for various livestock species.,,,,,,, For instance, Wu et al. categorized European and Asian pig mtDNA into three major clades: E\\u2217, representing Europe; A\\u2217, Asian wild boar; and D\\u2217, encompassing both eastern Asia wild boars and domestic pigs. These haplogroups were supported by the development of DomeTree (http://dometree.kiz.ac.cn) and MitoToolPy. Similarly, Ramirez et al. identified three distinct Y chromosome haplogroups (HY1, HY2, and HY3) in global pig populations. HY1 and HY2 are present in European and eastern Asia domestic pigs and in wild boars, while HY3 is unique to eastern Asia populations. 
The presence of HY1 and HY2 in eastern Asia pig populations has been postulated to have resulted from sex-biased gene flow between European male and eastern Asia female pigs around 200 years ago.\"}, {\"pmc\": \"PMC12775879\", \"pmid\": \"\", \"reference_ids\": [\"bib13\", \"bib14\", \"bib33\", \"bib13\", \"bib13\", \"bib14\", \"bib33\", \"bib10\", \"bib12\", \"bib13\", \"bib34\", \"bib35\", \"bib36\"], \"section\": \"Introduction\", \"text\": \"In African pig populations, genetic analysis revealed the presence of both mtDNA, and Y chromosome markers typically associated with European and Asian pigs.,, The Y chromosome markers are categorized into three distinct haplogroups, highlighting clear genetic clades. However, the mtDNA markers in African pig populations are yet to be standardized into their respective (sub) haplogroups. Nevertheless, phylogeographic analysis of both genetic markers shows that western African pigs share genetic affinity with European pigs, whereas eastern African pigs exhibit genetic similarities to Asian pigs.,, But because of the small sample sizes from Africa and the limited resolution of phylogeny based on partial mtDNA cytochrome b, D loop sequences, and Y chromosome SNPs of pigs identified in previous studies,, a detailed illustration of the origin and dispersal of domestic pigs in Africa has not yet been developed. 
Here, we analyzed 1,374 near-complete mitochondrial genomes and 917 sequences of the male-specific Y chromosome (MSY) genes, the DEAD-Box Helicase 3 Y-Linked (DDX3Y), Amelogenin Y-Linked (AMELY), and ubiquitously transcribed tetratricopeptide repeat containing Y-linked (UTY),,, to establish a more precise geographic and temporal framework of the evolutionary history and dynamics of modern African indigenous pigs.\"}, {\"pmc\": \"PMC12775879\", \"pmid\": \"\", \"reference_ids\": [\"fig1\", \"mmc1\", \"mmc1\", \"mmc2\", \"fig1\", \"fig1\", \"fig2\", \"fig2\", \"fig2\", \"fig2\", \"fig2\", \"fig2\", \"bib13\", \"mmc1\", \"mmc1\"], \"section\": \"Phylogeny of African pigs based on mtDNA and MSY sequences\", \"text\": \"All three major mtDNA haplogroups A\\u2217, D\\u2217, and E\\u2217, observed in Eurasia, are present in African pigs. Haplogroups E\\u2217 and D\\u2217 are the most abundant, whereas haplogroup A\\u2217 is restricted to eastern Africa, particularly Tanzania, but at a low frequency (Figures 1A, S1, S2, and Data S1A). The maximum likelihood (ML) phylogenetic tree (Figure 1B) nested African pigs within the European and eastern Asian clades, with nine Tanzanian samples representing African haplogroup A\\u2217 positioned at branches just basally to eastern Asian wild boars. For the MSY concatenated gene sequences, the phylogenetic tree (Figures 1C and 1D) also nested African pigs within European and eastern Asian clades. However, the majority (197/202, 98%) of African pig samples are positioned within the European clade, while only about 2% (5/202) are assigned to the eastern Asian clade. The PCA plot for the entire mtDNA dataset (Figure 2A) groups most of the wild boars and domestic pigs according to their geographic origin, except for northern African (Morocco and Tunisia) wild boars, which are grouped with European wild boars. African domestic pigs are grouped within the European and eastern Asian clusters. 
In the PCA plot of the European cluster (Figure 2ai), local sub-structuring is evident for African pigs, and we designated it as cluster III. The eastern Asian cluster (Figure 2aii) is further divided into two, reflecting the two major clades corresponding to haplogroups A\\u2217 and D\\u2217. Notably, haplogroup A\\u2217 mainly comprises eastern Asian wild boars and nine Tanzanian pigs, whereas haplogroup D\\u2217 includes the remaining African pigs initially assigned to the eastern Asian clade on the ML tree. The PCA results of the entire MSY gene sequences (Figure 2B) reveal a similar structure to the ML tree, with the eastern Asian cluster (Figure 2bi), separating African pigs into those closer to northeastern Asian specimens (i.e., those from Kenya, Uganda, and Angola) and southeastern Asian pigs and wild boar clusters (which included two Nigerian pig samples). The expanded European PCA (Figure 2bii) groups the majority of African pigs with Duroc (D), Yorkshire (Y), and Landrace (L) breeds (referred to as \\u201cEuropean commercial\\u201d), Mangalitza (M) and Pietrain (P) breeds (referred to as \\u201cEuropean natives\\u201d), and European wild boars (referred to as \\u201cwild\\u201d). To further investigate the phylogenetic position of African indigenous pigs in respect to the Near Eastern individuals, we obtained CytB sequences of Near Eastern pigs from Ramirez et al. and analyzed those alongside ours, which were extracted from the mitogenome sequences. 
The resulting PCA and minimum joining network showed no substantial difference in the clustering pattern revealed by the initial phylogenetic analysis of the mitogenomes for African indigenous pigs (Figures S3 and S4).\"}, {\"pmc\": \"PMC12775879\", \"pmid\": \"\", \"reference_ids\": [\"fig3\", \"mmc2\", \"fig3\", \"mmc1\", \"mmc1\", \"fig2\", \"mmc2\", \"fig3\"], \"section\": \"Mitogenomes reveal genetic structure and ancestral lineages of African pigs\", \"text\": \"The haplotype networks, shown in Figures 3A and 3C, are organized according to their respective haplogroups (Data S1A). Haplogroup E\\u2217 (Figure 3A) comprises sub-haplogroups E1\\u2217, E1a\\u2217, E1a1a1\\u2217, E1a1a1a\\u2217, E1a1b1\\u2217, E1a1b2\\u2217, and E2\\u2217, which represents African individuals (Figures S1 and S2). The majority of haplotypes fall under sub-haplogroups E1a1a1\\u2217 and E2\\u2217, with E2\\u2217 showing a clear correlation with the local sub-structure denoted as cluster III in the PCA (Figure 2ai). Haplotypes within sub-haplogroup E1a1a1\\u2217 are predominantly found in individuals from Nigeria, Benin, Gambia, and Tanzania, exhibiting close genetic ties to Duroc breed and Italian wild boar haplotypes (Data S1A). On the other hand, sub-haplogroup E2\\u2217 includes haplotypes from individuals in Nigeria, Cameroon, Gambia, Benin, Angola, and Uganda, closely related to Iberian and Yucatan pigs. These haplotypes are exclusive to African pigs (Figure 3A).\"}, {\"pmc\": \"PMC12775879\", \"pmid\": \"\", \"reference_ids\": [\"fig3\"], \"section\": \"Mitogenomes reveal genetic structure and ancestral lineages of African pigs\", \"text\": \"Within haplogroup D\\u2217 (Figure 3B), African haplotypes are most frequent in sub-haplogroups D1a1\\u2217, D1e\\u2217, D1h\\u2217, D3a\\u2217, D3a1\\u2217, and D4a\\u2217, while sub-haplogroups D1\\u2217, D1a3\\u2217, and D2\\u2217 show the lowest frequency of African haplotypes. 
Haplotypes shared between African and non-African pigs are found in sub-haplogroups D1\\u2217, D1a1\\u2217, D1a3\\u2217, D3a1\\u2217, D1e\\u2217, and D1h\\u2217. Notably, a monophyletic group of eastern African pigs is observed within sub-haplogroup D3a\\u2217, with the most common haplotype being shared among pigs from Kenya, Uganda, and Tanzania. This group\\u2019s connection to a median vector suggests the absence of unsampled, or potentially extinct haplotypes within African pigs. Tanzanian individuals are particularly prominent within haplogroup D\\u2217, with higher representation compared to other African pigs (as observed in D1\\u2217, D1a1\\u2217, D1e\\u2217, D3a\\u2217, D3a1\\u2217, and D4a\\u2217). For western African pig representatives, sub-haplogroup D3a1\\u2217 is exclusive to Cameroonian and Gambian indigenous pigs, while Benin pigs are found in D1h\\u2217 and D1a1\\u2217. Sub-haplogroup D1e\\u2217 has the highest frequency of Nigerian individuals, whereas D1\\u2217, which also includes haplotypes from Uganda, Tanzania, and Europe, along with D1a3\\u2217, D2\\u2217, and D3a\\u2217, mainly consists of singletons.\"}, {\"pmc\": \"PMC12775879\", \"pmid\": \"\", \"reference_ids\": [\"fig3\", \"fig1\", \"fig1\", \"fig2\", \"fig3\"], \"section\": \"Mitogenomes reveal genetic structure and ancestral lineages of African pigs\", \"text\": \"In haplogroup A\\u2217 (Figure 3C), haplotypes are exclusive to Tanzanian pigs, predominantly within sub-haplogroup A1\\u2217, which is likely the most ancient within this group. This basal position is congruent with the eastern Asian clade observed in the ML tree (Figure 1D). Sub-haplogroup A1a\\u2217, which includes eastern Asian wild boars, clusters close to A1 despite significant mutational differences. Consistent with the ML tree (Figure 1D) and PCA results (Figures 2B and 2bii), the MSY haplotype network (Figure 3D) groups\\u2019 African haplotypes together with European and eastern Asian ones. 
All three European haplotypes (commercial, native, and wild) are present in African pigs, with the commercial haplotypes occurring at a higher frequency. A few individual African pigs (5/202, 2%) share haplotypes with eastern Asian pigs.\"}, {\"pmc\": \"PMC12775879\", \"pmid\": \"\", \"reference_ids\": [\"fig4\", \"bib37\", \"bib38\", \"bib24\", \"bib39\", \"fig4\"], \"section\": \"Divergence time within and between sub-haplogroups E1\\u2217 and E2\\u2217, A1 and A1a\", \"text\": \"The earliest split was estimated to be approximately 9.7\\u20135.2 thousand years (ka) for the divergence of E1\\u2217 (which is exclusive to European and northern African wild boars) and E2\\u2217 (Iberian type pigs) from haplogroup E\\u2217 (Figure 4A), whereas the split between the Iberian pig populations and African indigenous pigs occurred approximately 5.2\\u20132.6 ka. The average median divergence time between E2\\u2217 and E1\\u2217 was approximately 7.3 ka and falls within the reported time frame of pig entry into Europe from Anatolia., By around 5 ka, the European pig genome had lost nearly all traces of the Near Eastern lineage. At this point, 96% of the genetic variants found in European pigs were derived from European wild boar., African indigenous pigs of sub-haplogroup E2\\u2217 are estimated to have diverged from the Iberian and Yucatan clade around 4.5 ka (Figure 4A). For A1 (Tanzanian pigs) and A1a (Chinese wild boars), the median divergence time occurred at approximately 28 ka.\"}, {\"pmc\": \"PMC12775879\", \"pmid\": \"\", \"reference_ids\": [\"fig4\", \"mmc1\", \"mmc1\", \"fig4\", \"mmc1\", \"mmc1\", \"fig4\", \"mmc1\", \"mmc1\"], \"section\": \"Population dynamics of African indigenous pigs\", \"text\": \"Population dynamics of African pigs were inferred from Bayesian skyline plots (BSP) for sub-haplogroups E2\\u2217 (Figure 4B), E1a1a1\\u2217 (Figure S8), and D3a\\u2217 (Figure S9) mitogenomes. 
It depicts changes in effective population sizes (N) over time (Figures 4B, S8, and S9). The E2\\u2217 BSP (Figure 4B) reveals a stable N up to around 500 years ago, after which the N increases rapidly but seems to stabilize again in recent times. For the E1a1a1\\u2217 African pigs (Figure S8), they exhibited a prolonged stable N beginning around 3.5 ka prior to which they had experienced a gradual increase. The stability in N persisted until around 500 years ago when a decline commenced subsequently followed by a rapid expansion from around 100 years ago continuing to date. The BSP for D3a\\u2217 pigs show a different pattern to that of E2\\u2217 and E1a1a1\\u2217 (Figure S9). There was a stable N up to around 6 ka, followed by a very drastic increase until approximately 4 ka. The population stabilizes for a short period before a decline until around 1 ka from which time it starts to increase once again.\"}, {\"pmc\": \"PMC12775879\", \"pmid\": \"\", \"reference_ids\": [\"fig4\", \"fig4\", \"mmc3\", \"bib40\", \"bib40\"], \"section\": \"Dispersal of sub-haplogroup E2\\u2217 pigs into Africa\", \"text\": \"Using ABC strategy, we constructed the most plausible scenario for the colonization of Africa by sub-haplogroup E2\\u2217 based on 14,979 bps of the mitogenome (Figures 4C and 4D). The scenario with the highest posterior probability (0.5006; 95% CI 0.4908\\u20130.5104) identifies pigs from the Iberian Peninsula as the source population. The founding population for African pigs diverged from their Iberian counterparts around 5.25 ka prior to dispersing into western Africa. Eastern African pigs diverged from western Africa ones around 1.98 ka and dispersed to their current location likely through a terrestrial route (Figures 4C and 4D). 
Many parameter estimates show high values (>0.2) (Data S2) of relative median of the absolute error (RMAE), which highlights potential concerns regarding the reliability of this scenario, as has been suggested by Kamalakkannan et al. However, since our primary objective was to investigate the origin of African pig mtDNA subhaplogroup E2 and their dispersal trajectories across Africa, the high RMAE values did not compromise the evaluation of the scenarios we proposed and tested, an observation also acknowledged in the study by Kamalakkannan et al.\"}, {\"pmc\": \"PMC12775879\", \"pmid\": \"\", \"reference_ids\": [\"bib10\", \"bib41\", \"bib42\", \"bib2\", \"bib42\", \"fig1\", \"bib13\"], \"section\": \"General diversity and phylogeography of pigs in Africa\", \"text\": \"Despite the wide variety of local names across the continent, African indigenous pigs are generally considered a single breed with strong ties to Iberian pigs.,, These pigs go by several regional names, such as the West African Dwarf pig in Nigeria, Ashanti Dwarf pig in Ghana, \\u201cBush pig\\u201d in Togo, Mukota pig in Zimbabwe, Kolbroek in South Africa, Somo in Mali, Olongulu in Angola, and Busia pigs in Kenya., Our results from both mtDNA and MSY markers of African pigs demonstrate genetic affinities with both European and eastern Asian haplotypes (Figures 1A\\u20131D). Interestingly, this pattern mirrors the phylogeographic structure first described by Ramirez et al. 
where most pigs from western Africa cluster with the European clade, while those from eastern Africa align with the east Asian clade in the phylogenetic tree.\"}, {\"pmc\": \"PMC12775879\", \"pmid\": \"\", \"reference_ids\": [\"fig5\", \"mmc1\", \"fig4\", \"bib47\", \"fig4\", \"mmc3\", \"bib48\", \"bib49\", \"bib50\", \"bib51\", \"bib52\", \"bib53\", \"bib54\", \"bib1\", \"bib2\", \"bib55\", \"bib51\"], \"section\": \"Neolithic human dispersal and introduction of pigs into northwestern Africa\", \"text\": \"The genogeographic distribution of mtDNA sub-haplogroup E2\\u2217 pigs, found across the Iberian Peninsula, western Africa, and Uganda (Figures 5, S1, and S2), offers a compelling model for tracing the movement of Iberian pigs (and by extension humans) across Africa. The phylogenetically based origin of the African E2\\u2217 pig clade, which dates back 4.5 ka (Figure 4B), coincides with archaeological evidence of pig domestication by Neolithic populations in Tangier, Morocco. In line with this, ABC model estimates suggest that Iberian pigs first entered northwestern Africa around 5.25 ka (Figures 4D and Data S2), marking the initial phase of their spread across the continent. The timing of Iberian pigs\\u2019 entry into Africa corresponds more closely with the arrival of the Neolithic cultural package in northern Africa and the Iberian Peninsula around 7 to 5 ka., This connection is further supported by studies from Linst\\u00e4dter et al., Zilh\\u00e3o, and Mart\\u00ednez-S\\u00e1nchez et al., which highlight shared artifacts between northwestern Africa and Iberia Peninsula. Moreover, genetic evidence of introgression between European and African goat populations in Italy and Spain,, further reinforces the long-standing historical genetic exchanges between northwestern Africa and Iberia Peninsula. 
Pig farming has been an integral part of northern African culture since prehistoric times, particularly among the Berber people, and continued until the advent of Islam., It is likely that the Berber people introduced pig farming into western sub-Saharan Africa, an idea supported by genetic links between the Fulani people, whose presumed homeland lies in the Gambia and Senegal, and Berber populations from Morocco. These genetic connections are thought to date back between 1,190 and 670 years ago, with a confidence interval ranging from 2,090 BCE to CE 130.\"}, {\"pmc\": \"PMC12775879\", \"pmid\": \"\", \"reference_ids\": [\"bib43\", \"bib44\", \"bib45\", \"bib46\"], \"section\": \"\", \"text\": \"The spread of sub-haplogroup E2 (Iberian-type pigs) from western to eastern Africa, aligns with the domestication of African rice, Guinea fowl, and the great Bantu dispersal. In contrast, the presence of sub-haplogroup A1 in eastern Africa, linked to eastern Asian wild boars, indicates earlier connections between eastern Asia and coastal eastern Africa through Indian Ocean trade, see. The distribution of pigs aligns with ancient African civilizations, supporting the idea that pigs are indicative of a settled farming lifestyle. All silhouettes are sourced from https://www.phylopic.org/.\"}, {\"pmc\": \"PMC12775879\", \"pmid\": \"\", \"reference_ids\": [\"bib45\", \"bib45\", \"mmc1\", \"fig4\", \"mmc3\", \"bib45\", \"bib44\", \"bib43\", \"bib56\", \"bib1\", \"bib1\", \"bib1\", \"bib2\", \"fig4\", \"bib42\"], \"section\": \"Genetic legacy of Bantu expansion and distribution of domestic pigs across Sub-Saharan Africa\", \"text\": \"The expansion of Bantu-speaking populations stands as one of the most transformative demographic events in late Holocene Africa, significantly reshaping the continent\\u2019s linguistic, cultural, and biological landscapes. 
This movement, which is estimated to have occurred between 6 and 4 ka, began in western Africa and spread gradually through the Congo rainforest, eventually reaching eastern and southern Africa in a serial-founder fashion. While our study could not sample individuals from the Congo, the Ugandan pig samples we examined carry the E2\\u2217 sub-haplogroup, which is prevalent in western Africa (Figure S2). ABC model estimates suggest that pigs with the E2\\u2217 sub-haplogroup first dispersed from western Africa into eastern Africa around 1.98 ka (Figure 4D and Data S2). This dispersal coincides spatially\\u2014though not necessarily causally\\u2014with broader population movements and ecological changes in western and central Africa. Despite the uncertainty surrounding the exact timing of animal introductions due to the high RMAE value, the broader pattern of Bantu-speaking population dispersals\\u2014and the associated movement of domesticated species such as guinea fowl, pearl millet, African rice, and yam\\u2014may provide a geographic framework for understanding how pigs could have been incorporated into eastward translocations. The presence of pigs in this context is plausible, particularly given historical accounts of feral pigs in regions like Chad during the 19th century. While documentation of pig introduction during the early period of European contact remains scarce, it is widely accepted that the Portuguese played a key role in this process, particularly along Africa\\u2019s coastal regions. This is reflected in the widespread adoption of the Portuguese term porco for pigs in several local languages., Therefore, the notable increase in the Ne of the collective E2\\u2217 sub-haplogroup in African pigs around 500 years ago in our study (Figure 4B), likely coincides with the onset of Portuguese colonization in Africa during the 16th and 17th centuries AD. 
It is plausible that Portuguese colonists encouraged local Bantu farmers to expand pig production, following the population bottleneck of pigs initially caused by the spread of Islam.\"}, {\"pmc\": \"PMC12775879\", \"pmid\": \"\", \"reference_ids\": [\"mmc1\", \"mmc1\", \"bib31\", \"bib57\", \"bib58\", \"bib59\", \"bib60\", \"bib61\", \"bib62\", \"bib10\", \"bib14\", \"bib58\", \"bib60\", \"mmc7\", \"bib63\", \"bib63\", \"fig3\", \"mmc1\", \"bib46\", \"fig3\", \"mmc1\", \"mmc1\", \"fig1\", \"mmc1\", \"mmc1\", \"mmc1\"], \"section\": \"The possible impact of improved British breeds and Indian Ocean trade on African pigs\", \"text\": \"The high frequency of haplogroup D\\u2217 haplotypes (Figures S1 and S2) suggests that the majority of pigs associated with Tanzania\\u2014and more broadly, with coastal eastern Africa\\u2014likely originated from a source distinct from those in surrounding inland regions. In this context, the genetic connection between eastern Africa and Asia becomes particularly significant. 
Several Far-Eastern mitochondrial signatures, such as those embedded in D\\u2217 sub-haplogroups (as defined by MitoToolPy)\\u2014including D1a1\\u2217, D1b\\u2217, D1e\\u2217, D1h\\u2217, D3a\\u2217, and D4\\u2217\\u2014have been detected not only in eastern African pigs from Uganda and Kenya but also in pig populations across Europe and South Asia.,,,,, This shared genetic footprint raises the possibility of at least two plausible routes of introduction into eastern Africa: (i) indirect introgression from improved British breeds brought during the colonial period, which had been extensively hybridized with Chinese sows in the 18th and 19th centuries to select for pigs with earlier reproductive maturity and increased fatness and (ii) direct introduction to the East African coast with India playing a critical role as a transit point, facilitating the movement of pigs together with other domesticates from eastern and southeastern Asia to Africa, likely during the peak of the Indian Ocean trade.,,, Supporting the complexity of these historical processes, the high haplotype diversity coupled with low nucleotide diversity observed across all populations (Data S6) provides insight into demographic dynamics. Such a pattern typically suggests population expansion following a bottleneck or a period of low effective population size. At the same time, this pattern may also reflect the introduction of multiple, genetically distinct populations, consistent with both colonial and pre-colonial trade-driven introductions. Colonial-era introductions, particularly from Europe, appear to have had a more pronounced influence in western Africa. This is supported by both geographic proximity to Europe and a greater degree of haplogroup sharing with European pigs (Figure 3B). In contrast, coastal eastern Africa seems to have experienced a more complex admixture of sources, with contributions potentially spanning both European colonialism and earlier Indian Ocean trade influences. 
Further evidence of these dynamics is seen in the substantial increase in Ne observed in sub-haplogroup D3a\\u2217, which begins around 4 ka (Figure S9) and encompasses all three eastern African pig populations (Uganda, Tanzania, and Kenya). Notably, this timeline aligns with archaeological evidence for the arrival of southeastern Asian domestic chickens at Zanzibar during the late fourth millennium, suggesting contemporaneous movements of other domesticates. Adding to the genetic complexity, we also report the first identification of the rare mitochondrial DNA sub-haplogroup A1\\u2217 in pigs from eastern Africa, specifically in Tanzania (Figures 3C, S1, and S2). The deep divergence time of this haplotype (Figure S17), its basal phylogenetic position (Figure 1B), and its close clustering with Chinese wild boars (Figures S4, S5, and S7) all point to a potential introgression event involving wild pig populations. Given that A1\\u2217 has historically been confined to Asia, its presence in eastern Africa likely reflects gene flow through domesticated pigs that carried introgressed East Asian wild boar genetic material. Alternatively, it may suggest a direct introduction of East Asian wild boars, which we speculate may have been brought to the region by Portuguese colonists during their presence along the coastal stretches of eastern Africa. However, this remains a hypothesis, as there is currently no direct historical or archaeological evidence to confirm it. 
We propose this idea as a direction for future research.\"}, {\"pmc\": \"PMC12775879\", \"pmid\": \"\", \"reference_ids\": [\"fig3\", \"fig4\", \"bib13\", \"bib14\", \"bib12\", \"fig3\", \"mmc1\", \"bib2\", \"bib62\", \"bib5\", \"bib12\", \"bib12\", \"bib64\", \"bib65\", \"bib66\"], \"section\": \"The consequences of recent European colonization on African pigs\", \"text\": \"While we observed the presence of all major European (commercial, native, and wild) and eastern Asian (northern and southern Chinese) MSY haplotypes in the studied African pig populations\\u2014consistent with the mtDNA dataset (Figure 3bii and 4D)\\u2014European haplotypes were found at a significantly higher frequency than their eastern Asian counterparts. This contrasts with the mtDNA data, where eastern Asian alleles are especially prevalent, particularly in East Africa. One possible explanation for this discrepancy is that eastern Asian MSY haplotypes may have been replaced over time by European ones due to later waves of introduction. For example, Ramirez et al. reported a high frequency of the Asia-specific HY3 haplotype in Kenyan and Zimbabwean Mukota pigs (35% and 100%, respectively). However, a subsequent study by Noce et al. found no trace of HY3 in eastern African pigs, supporting the hypothesis that European male lineages have gradually supplanted earlier Asian ones. This pattern aligns with historical records of large-scale European pig imports into Africa, driven by colonial agricultural development and undocumented subsistence-level exchanges. 
This pattern is mirrored in our mtDNA results, which show a high frequency of the sub-haplogroup E1a1a1\\u2217 (Figures 3A and S2), linked to the Duroc breed\\u2014an exotic European pig developed in Wisconsin from diverse lineages such as Berkshire, Iberian, Tamworth, and Red Guinea Hog., Crossbreeding between African indigenous pigs and imported European breeds is not uncommon, especially in remote rural areas across the continent., For instance, in eastern Africa\\u2014such as Uganda\\u2014the local commercial pig industry has largely transitioned to exotic breeds like Camborough, Landrace, and Large White, often resulting in widespread hybridization. This trend likely reflects both a recognition of the superior productivity traits associated with exotic breeds and the effects of loose breeding management practices. Accidental interbreeding may be especially common in regions with active or historical restocking programs. The persistence of mtDNA haplogroup D in African pig populations compared to HY3 may therefore be due to (i) European crossbreeds that trace their maternal lineages to Chinese breeds and (ii) the fact that very few farmers own boars making pigs carrying HY3 alleles particularly susceptible to genetic bottlenecks. Interestingly, we also detected MSY genetic signatures of European wild boars in several western African pig populations. 
Historical accounts confirm that European wild boars were introduced to northern, western, and southern Africa during the colonial era, often for use in hunting expeditions, providing a plausible explanation for their genetic imprint in these regions.\"}, {\"pmc\": \"PMC12775879\", \"pmid\": \"\", \"reference_ids\": [\"bib12\", \"bib27\", \"bib67\", \"bib68\", \"bib24\", \"bib30\", \"bib38\", \"bib69\", \"bib17\", \"bib69\", \"bib70\"], \"section\": \"The consequences of recent European colonization on African pigs\", \"text\": \"In conclusion, this study sheds light on the genetic diversity and population structure of African indigenous pig breeds, highlighting their importance not only as cultural and economic assets but also as critical reservoirs of genetic resources. Importantly, this research supports the formulation of evidence-based livestock policies in Africa that promote the protection and utilization of indigenous breeds as highlighted by the African Union \\u2013 Inter-African Bureau of Animal Resources. By aligning conservation efforts with international and regional agricultural development goals, policymakers can enhance food security, rural livelihoods, and climate resilience across the continent. It is expected that a more detailed depiction of the evolutionary history of African indigenous pigs could be achieved by analyzing high-resolution Y chromosomal markers, whole-genome variations, structural variation, and ancient DNA., Integrating those molecular data with a thorough revision of the morphology of modern and archaeological specimens would further allow to cross validate timings of divergence and dispersals (e.g.,,). Modern morphometric methods have notably proved useful tools to quantify morphological variations in domesticated animals and their closely related wild taxa, and notably in suids (e.g.,,,). 
We recommend that future research focuses on identifying and characterizing the mechanisms of adaptive evolution in African indigenous pigs, particularly those conferring resistance to endemic diseases in Africa, to further inform strategies for African indigenous pig conservation, improved survivability, and enhanced meat production.\"}, {\"pmc\": \"PMC12775879\", \"pmid\": \"\", \"reference_ids\": [\"mmc2\"], \"section\": \"\", \"text\": \"This paper analyzes existing, publicly available data. The accessions for this data can be found in Data S1A and S1B.\"}, {\"pmc\": \"PMC12775879\", \"pmid\": \"\", \"reference_ids\": [\"mmc2\", \"bib71\", \"bib72\", \"mmc2\"], \"section\": \"\", \"text\": \"The Y chromosome variation data reported in this paper (Data S1B) have been deposited in the Genome Variation Map (GVM) in National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, under accession number GVM001001. The assembled mitogenomes from this study (Data S1A) have been submitted to GenBank under the accession ID PQ388328-PQ389421.\"}, {\"pmc\": \"PMC12775879\", \"pmid\": \"\", \"reference_ids\": [\"sec8.1\"], \"section\": \"\", \"text\": \"No other custom code/software was used for data analysis in the study. The publicly available software and algorithms used in the present study are listed in the key resources table.\"}, {\"pmc\": \"PMC12775879\", \"pmid\": \"\", \"reference_ids\": [\"gs14\", \"gs15\", \"gs16\"], \"section\": \"Acknowledgments\", \"text\": \"We appreciate all the volunteers who contributed to this project. We thank the support from the Animal Branch of the Germplasm Bank of Wild Species;  (the Large Research Infrastructure Funding); and West Africa Livestock Innovation Centre, The Gambia. L.A.O is supported by ANSO scholarship for young talent. A.C.A. is supported by the Yunnan Revitalization Talent Support Program: High-end Foreign Expert Project. A.S. 
is supported by the French government in the framework of the University of Bordeaux\\u2019s IdEx \\u201cInvestments for the Future\\u201d program/GPR \\u201cHuman Past\\u201d. This paper contributes to the results framework of CGIAR\\u2019s SAPLING Initiative. ICARDA wishes to acknowledge the Chinese government\\u2019s contribution to its activities and support from donors to the  Trust Fund. This work was supported by grants from the  (SAJC202402 to M.-S.P.); the National Foreign Expert Project (H20240773 to A.C.A.); and the Chinese Academy of Sciences President\\u2019s International Fellowship Initiative, Special Expert (2024FSB0002 to A.C.A.).\"}, {\"pmc\": \"PMC12775879\", \"pmid\": \"\", \"reference_ids\": [\"mmc2\", \"bib5\", \"bib98\", \"bib99\", \"mmc1\", \"mmc2\", \"mmc1\", \"mmc2\"], \"section\": \"Sample information\", \"text\": \"In this study, we de novo assembled a total of 463 near-complete African pig mitogenomes (79 from Benin, 58 from Cameroon, 61 from The Gambia, 105 from Nigeria, 124 from Tanzania, and 36 from Uganda) from whole-genome paired-end Illumina resequencing reads (Data S1A). The sample collection represented pigs managed under traditional scavenging systems by rural smallholder farmers in sub-Saharan Africa with no known history of crossbreeding with commercial breeds. To minimize relatedness, efforts were made\\u2014through farmer questionnaires\\u2014to avoid sampling related animals up to the third generation. These were analyzed alongside mitogenomes of domestic pigs and wild boar populations from Europe (n = 212), Asia (n = 666), and northern Africa (n = 5), and of sub-Saharan African pigs from Angola (n = 3) and Kenya (n = 7), comprising data that are newly sequenced as well as derived from the IAnimal, GenBank, and Sequence Read Archive (SRA) repositories (Figure S10 and Data S1A). 
We also generated sequences of three MSY genes (DDX3Y, AMELY, and UTY) of male pigs from Uganda (n = 11), Tanzania (n = 44), Benin (n = 24), The Gambia (n = 47), Nigeria (n = 46), and Cameroon (n = 21) and analyzed them together with male pig data from Europe (n = 452) and eastern Asia (n = 263), and with additional male pigs from Nigeria (n = 5), Kenya (n = 2), and Angola (n = 2), comprising individuals derived from IAnimal, SRA, and newly sequenced data (Figure S10 and Data S1B and S1C).\"}, {\"pmc\": \"PMC12775879\", \"pmid\": \"\", \"reference_ids\": [\"bib75\", \"bib74\", \"bib76\", \"bib77\", \"bib78\", \"bib77\", \"bib79\", \"bib80\", \"mmc2\"], \"section\": \"Mitogenome assembly\", \"text\": \"Raw resequencing reads were trimmed of adapters and low-quality bases (<10) using fastp (v0.23.0). Trimmed reads were retained for assembly if they were \\u2265150 base pairs (bps) in length and had a Phred score \\u226520. Reads were then aligned to the S. scrofa mitochondrial reference genome (GenBank number: NC_000845.1) using default parameters with BWA-MEM (v0.7.17). Sequence Alignment Map (SAM) files were converted to Binary Alignment Map (BAM) files, sorted, and indexed, and duplicates were identified using samtools (v1.3.1), picard (http://broadinstitute.github.io/picard/), and GATK (v4.1.4.1). The list of successfully mapped reads was retrieved by invoking other tools using samtools (v1.3.1), and subsequently used to extract mapped reads from the original fastq files using seqtk (https://github.com/lh3/seqtk/). The extracted mitogenome reads were assembled into contigs using megahit (v1.2.9) and then aligned against the mitochondrial reference genome using Aliview (v1.28). 
The final dataset comprised 1,374 (1,333 newly assembled and 41 from GenBank) mitogenomes spanning a total length of 14,979 bps after excluding gaps and ambiguous bases (Data S1A).\"}, {\"pmc\": \"PMC12775879\", \"pmid\": \"\", \"reference_ids\": [\"bib73\", \"bib77\", \"mmc1\", \"mmc2\", \"bib81\", \"bib100\", \"mmc1\", \"mmc1\", \"bib101\", \"bib82\", \"bib83\", \"bib84\"], \"section\": \"Y chromosome variant calling\", \"text\": \"Using GATK HaplotypeCaller in the GVCF mode and the S. scrofa 11.1 reference genome assembly (https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000003025.6/), a total of 326,302 high-quality Y chromosomal variants were generated. The detected variants were filtered by the \\u201cVariantFiltration\\u201d tool with parameters of \\u201cQD < 2.0, FS > 60.0, MQ < 40.0, MQRankSum < \\u22128.0\\u201d. To minimize false positives in the called variants, the average depth of sex chromosomes and autosomes was calculated individually using samtools (v.1.3.1). We utilized the differences in depth of coverage of sex chromosomes between female and male individuals to determine the sexes of our African pig samples. We further determined the sexes of all WGS data and validated this using publicly available sample sex information (Figure S11 and Data S1B and S1C). Plink (v1.9) was used to filter MSY chromosomal variants with the following criteria: (i) removed female individuals and obtained 216,440 Y chromosome SNPs (Figure S12); (ii) retained SNP sites with missing genotype rate of <5%; (iii) removed SNP sites with minor allele frequencies <0.001; and (iv) retained only hemizygous sites in the kept male samples by removing all heterozygous sites (Figure S12). Lastly, similarly to Escouflaire and Capitan, we used beagle to impute missing genotypes, and the three MSY genes were extracted and concatenated using vcftools (v0.1.12b) and BCFtools, and then converted into fasta format using vcf2fasta (https://github.com/santiagosnchez/vcf2fasta). 
The final dataset comprised 917 male-specific concatenated gene sequences with a total length of 462 SNPs.\"}, {\"pmc\": \"PMC12775879\", \"pmid\": \"\", \"reference_ids\": [\"bib85\", \"bib32\", \"bib86\", \"bib87\", \"bib88\"], \"section\": \"Haplogroup classification and population structure\", \"text\": \"The online MAFFT server (https://mafft.cbrc.jp/alignment/server/index.html) was used to perform multiple sequence alignments for the newly assembled and NCBI mitogenomes, and for the MSY sequences. MitoToolPy\\u2013seq.py was then used to classify mtDNA haplogroups, and their geographic distribution was visualized using ArcMap (v10.7.1). To assess population stratification based on mitogenomes and MSY-chromosome gene sequences, we performed principal component analysis (PCA) using Tassel (v5). The PCA plot was visualized using the ggplot2 package in R.\"}, {\"pmc\": \"PMC12775879\", \"pmid\": \"\", \"reference_ids\": [\"bib89\", \"bib90\", \"bib91\", \"bib92\"], \"section\": \"Phylogenetic analysis and network construction\", \"text\": \"We constructed a maximum likelihood (ML) tree using RAxML (v8) with the GTRCAT model and 1,000 bootstrap replicates. This model is reported to be efficient and sufficiently accurate for phylogenetic analysis, with faster computation times (https://evomics.org/learning/phylogenetics/raxml/). African warthogs (Phacochoerus africanus), pygmy hogs (Porcula salvania), and Malaysian bearded pigs (S. barbatus) were used as outgroups. The ML tree was visualized with iTOL (v5). To generate the haplotype network, we used FastHaN and visualized it using tcsBU (TCS Beautifier).\"}, {\"pmc\": \"PMC12775879\", \"pmid\": \"\", \"reference_ids\": [\"bib97\"], \"section\": \"Genetic diversity, demographic dynamics and divergence time\", \"text\": \"Haplotype diversity, nucleotide diversity, and number of haplotypes for haplogroup D mitogenome sequences were calculated using Arlequin (v3.5). 
Arlequin was also used to perform Fu\\u2019s Fs and Tajima\\u2019s D neutrality tests to detect evidence of recent population expansion. The significance of the deviations from neutrality was assessed by 1,000 coalescent simulations.\"}, {\"pmc\": \"PMC12775879\", \"pmid\": \"\", \"reference_ids\": [\"bib93\", \"bib30\", \"bib94\", \"bib102\", \"bib103\", \"bib95\"], \"section\": \"Genetic diversity, demographic dynamics and divergence time\", \"text\": \"To infer the demographic dynamics of African-specific pig mitogenomes that clustered in sub-haplogroups E2\\u2217, E1a1a1\\u2217, and D3a\\u2217 of the global haplotype networks, we implemented three separate analyses with the Bayesian Skyline plot (BSP) using the random starting tree model in BEAST (v2.0). The recently published pig mitogenome mutation rate of 1.2612 x 10\\u22127 was used to calibrate the BSP. We performed each run three times, starting from a random tree, with a Markov Chain Monte Carlo (MCMC) simulation of 60 million generations, sampled every 6,000 generations, with the first 10% of generations discarded as burn-in. For this analysis, we invoked the HKY substitution model without site invariants, which was determined using the IQ-TREE web server (http://iqtree.cibiv.univie.ac.at/) to be the best-fit model. A strict molecular clock model was applied, given its suitability for phylogenies with shallow roots due to low rate variation among branches. Additionally, we used the coalescent Bayesian Skyline tree prior with 10 groups under a piecewise-constant skyline model to capture population size changes over time. Tracer (v1.7.1) was used to assess convergence across runs.\"}, {\"pmc\": \"PMC12775879\", \"pmid\": \"\", \"reference_ids\": [\"bib31\"], \"section\": \"Genetic diversity, demographic dynamics and divergence time\", \"text\": \"A Bayesian divergence tree between the mitogenome sub-haplogroups E1\\u2217 and E2\\u2217 was generated using BEAST (v2.0). 
We employed a random starting tree model with a constant population size coalescent tree prior, incorporating eastern Asian wild boars (S. scrofa) from Wu et al. as the outgroup. The mutation rate of 1.2612 x 10\\u22127 was used to calibrate the divergence times. The analysis was conducted by employing an MCMC chain length of 10 million generations, sampling trees every 10,000 generations with a strict clock and using the HKY substitution model without site invariants. We also calculated the divergence time between the mitogenome sub-haplogroup A1 and A1a. We tested both a strict clock and an uncorrelated lognormal-distributed relaxed clock under HKY+G+I. For both models three MCMC runs with 30,000,000 iterations were run, with 10,000 sampling frequency. The first 10% of the generations were discarded as burn-in. The estimated effective sample size (ESS) for all parameters was greater than 200, as determined using Tracer (v1.7.1). The two models were compared with a marginal likelihood estimation using general stepping-stone sampling (GSS) (https://beast.community/model_selection_2). The strict clock provided higher effective sample size (ESS) values because of earlier chain convergence; therefore, it was the chosen model to generate divergence tree. Tree visualization was performed using FigTree (v1.4.4) (http://tree.bio.ed.ac.uk/software/figtree/).\"}, {\"pmc\": \"PMC12775879\", \"pmid\": \"\", \"reference_ids\": [\"bib96\", \"bib37\", \"bib104\", \"fig4\", \"mmc1\"], \"section\": \"Inference of mtDNA sub-haplogroup E2 colonization history in Africa\", \"text\": \"To investigate possible colonization trajectories of the sub-haplogroup E2\\u2217 in Africa, we used the Approximate Bayesian Computation (ABC) to test multiple scenarios of dispersal using the near complete (14,979 bps) mitogenome sequences. Simulations were performed using DIYABC (v2.1). 
We constructed the dispersal scenarios based on the ML phylogenetic and divergence time tree results as well as archaeological and genetic inferences of European pig domestication from wild boars discerned from previous studies. We divided lineage E2\\u2217 into three metapopulations according to geographic distributions: (i) European-Iberian Peninsula group (EUR) comprising Iberian and Yucatan pigs; (ii) western Africa (WA) group comprising pigs from Angola, Benin, Cameroon, Gambia, and Nigeria; and (iii) eastern Africa (EA) group which included only Ugandan pigs within the sub-haplogroup E2\\u2217. To test our hypothesis of an African origin of the E2\\u2217 lineage and determine its dispersal routes across the continent, we implemented an independent model invoking two scenarios that considered the two African pig groups (WA and EA) as the ones contributing to the main difference in the sub-haplogroup (Figures 4D and S13).\"}, {\"pmc\": \"PMC12775879\", \"pmid\": \"\", \"reference_ids\": [\"fig4\", \"mmc1\", \"bib105\", \"mmc1\", \"bib105\", \"bib105\", \"mmc1\", \"mmc1\"], \"section\": \"Inference of mtDNA sub-haplogroup E2 colonization history in Africa\", \"text\": \"Starting with EUR (NEEUR) as the source population: (i) the first dispersal was postulated to be from EUR to WA (NEWA) at time t2 and subsequently from WA to EA (NEEA) at time t1 (Figure 4D), and (ii) the expansion began from EUR to EA at time t2 and subsequently from EA to WA at time t1 (Figure S11). The HKY substitution model was used, as it was selected as the best fit using the IQ-TREE web server. The mean mutation rate was set as 10\\u22128 to 10\\u22127 per site per generation. 
The summary statistics (SS) were selected after PCA (Figure S14) to pre-evaluate the similarity between the simulated and empirical datasets through the \\u201cevaluate scenario-prior combination option\\u201d, which checks whether the models together with the chosen prior distributions have the potential to generate a subset of summary statistics close to the observed summary statistics. The SS for the simulated dataset, which were considered under the one-sample SS, included mean of pairwise differences, mean of number of rarest nucleotides at segregating sites, and variance of numbers of the rarest nucleotides at segregating sites. The two-sample SS included number of haplotypes, number of segregating sites, mean of pairwise differences and F. After simulating one million datasets for each of the two scenarios, we used both direct and logistic regression methods to compare the posterior probability of scenarios based on the selected summary statistics. The precision of each parameter estimate was evaluated by computing the relative median of the absolute error (RMAE). Finally, the model was verified by calculating the goodness-of-fit statistics of the winning scenario from the observed dataset and visualized using PCA (Figures S15 and S16).\"}]"
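
The RMAE precision measure named in the ABC section above can be illustrated with a minimal Python sketch. It assumes the definition commonly used when evaluating ABC parameter estimates: the median, over pseudo-observed test datasets, of |point estimate - true value| / true value. The function name `rmae` and the example numbers are hypothetical and are not taken from the study or from DIYABC itself.

```python
from statistics import median


def rmae(point_estimates, true_values):
    """Relative median of the absolute error (assumed definition).

    For each pseudo-observed dataset, take |estimate - true| / true,
    then return the median of those relative errors.
    """
    if len(point_estimates) != len(true_values) or not true_values:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in true_values):
        raise ValueError("true values must be non-zero")
    relative_errors = [
        abs(est - true) / abs(true)
        for est, true in zip(point_estimates, true_values)
    ]
    return median(relative_errors)


# Hypothetical dispersal-time estimates (in generations) against known
# simulated values; the relative errors are 0.1, 0.1 and 1.0.
example = rmae([110.0, 90.0, 200.0], [100.0, 100.0, 100.0])
```

A value near zero indicates that estimates sit close to the simulated truth, whereas a high value, such as the one the text reports for the timing of animal introductions, signals that the corresponding parameter should be interpreted with caution.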

Metadata

"{}"