A dataset of scientific dates from archaeological sites in eastern Africa spanning 5000 BCE to 1800 CE
PMCID: PMC12084580
PMID: 40379663
Abstract
Large collections of archaeological spatiotemporal data can reveal past cultural and demographic trends, land use strategies, and processes of environmental adaptation. Within Africa, archaeological Big Data can contribute to the study of the spread of agriculture, domesticated species, and specific artefacts and technologies, as well as their ecological impacts. Although reviews addressing these topics are available for different parts of the continent, existing mid-late Holocene archaeology datasets have yet to be compiled into a central, open-access, standardized, informatics-oriented dataset. Here we present Wanyika, a dataset of scientific dates from archaeological sites in eastern Africa spanning almost seven millennia, from ~5000 BCE to 1800 CE. This dataset compiles published scientific dates and associated botanical, faunal, iron, and ceramic finds from sites in Kenya, Tanzania, the Comoros Islands, and Madagascar. The records also include data on megafaunal extinctions in Madagascar. We describe the spatiotemporal coverage of the dataset, how the data were collected, the structure of the dataset, and the quality control measures applied.
Full Text
Since the early 20th century, there has been significant growth in available archaeological data for the mid-to-late Holocene in eastern Africa. These data are the outcome of a variety of approaches, including excavation and survey, as well as archaeobotanical, zooarchaeological, geoarchaeological, isotopic, palaeoproteomic, coring, and remote sensing methods. However, available records have yet to be compiled into a standardized dataset format. Here, we present Wanyika, a dataset of scientific dates and associated archaeological records from mid-late Holocene sites in four countries of eastern Africa (Kenya, Tanzania, Comoros, and Madagascar), plus a selection of sites in Rwanda (Fig. 1). The dataset focuses on these four countries because they possess some of the best-documented archaeological records in eastern Africa for this time period, in particular as a result of the application of radiocarbon dating. Wanyika is an informatics-oriented dataset that draws together data spanning almost seven millennia, from 5000 BCE to 1800 CE. The Bantu term ‘Wanyika’ translates as “people of the wilderness” and is used to refer to all inland ethnic groups of eastern Africa, as well as those that migrated to the littoral islands and Madagascar. The associated archaeological records include spatiotemporal data on botanical, faunal, iron, and ceramic finds from published archaeological sites, in addition to several unpublished sites, across key regions of mainland and island eastern Africa. We have included iron and ceramic finds because they are closely, although not exclusively, associated with the spread of food production in eastern Africa. Ceramic finds are particularly important because hunter-gatherers, pastoralists, and farmers produced distinct ceramic styles. Records of megafaunal persistence and coexistence with humans in Madagascar are also included. Rather than a comprehensive overview, the Wanyika dataset is a preliminary compilation intended as a foundation for future research.
The Wanyika dataset covers sites located in Kenya, Tanzania, Comoros, and Madagascar that date to the period between c. 5000 BCE and 1800 CE (Fig. 1). The dataset aims to cover all scientifically dated sites in these countries, providing details of the available dates as well as information about associated crop, faunal, iron, and ceramic finds. In addition, selected Rwandan sites with early evidence for domesticated crops and iron artefacts are included in light of their importance to the study of farming dispersals in eastern Africa.
To facilitate the exploration of geographical patterns in the data (e.g., Table 3), Kenya, Tanzania, and Madagascar were subdivided into smaller sub-regions (‘Country Regions’), such as southwestern and northwestern (see Table 1). These divisions are widely used in the archaeological literature but have not previously been formally defined using geographical coordinates. The boundaries used in this study are as follows. Mainland Kenya is divided into four broadly equal-sized regions demarcated by latitude 0.5° and longitude 37.7°, with the hinterland, coast, and islands forming a fifth region. Likewise, mainland Tanzania is divided into four regions using latitude −6° and longitude 35°, with the hinterland, coast, and islands forming a fifth region. Madagascar is also divided into four almost equal-sized regions using latitude −19° and longitude 47°. The predominant vegetation cover for each site and region has also been included, classified into six categories: forest/wood/grassland mosaic, montane forest, coastal forest mosaic, dry coastal wooded grassland, dry northern wooded grassland, and dry southern wooded grassland.
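To illustrate how these latitude/longitude boundaries partition each country into quadrants, the following Python sketch assigns a site to one of four regions. It is a hypothetical helper, not the authors' procedure: the function name, the quadrant labels, and the rule that points lying exactly on a boundary go to the northern/eastern side are all assumptions. The text gives no coordinates for the coastal fifth region of Kenya and Tanzania, so the sketch takes a caller-supplied `is_coastal` flag (also hypothetical) instead of deriving it.

```python
from typing import Dict, Tuple

# (latitude, longitude) split points as stated in the text.
SPLITS: Dict[str, Tuple[float, float]] = {
    "Kenya": (0.5, 37.7),
    "Tanzania": (-6.0, 35.0),
    "Madagascar": (-19.0, 47.0),
}


def country_region(country: str, lat: float, lon: float, is_coastal: bool = False) -> str:
    """Return a hypothetical 'Country Region' label for a site (sketch only)."""
    if country not in SPLITS:
        raise ValueError(f"no regional split defined for {country!r}")
    # Assumption: the caller decides whether a Kenyan/Tanzanian site belongs to
    # the hinterland/coast/islands region; the text does not give its bounds.
    if is_coastal and country in ("Kenya", "Tanzania"):
        return "Coast and islands"
    split_lat, split_lon = SPLITS[country]
    # Assumption: points exactly on a boundary go to the northern/eastern side.
    north_south = "North" if lat >= split_lat else "South"
    east_west = "eastern" if lon >= split_lon else "western"
    return f"{north_south}{east_west}"


print(country_region("Kenya", 1.0, 35.0))        # Northwestern
print(country_region("Madagascar", -22.0, 44.0))  # Southwestern
```

The labels produced here are illustrative; the region names actually used in the dataset are those listed in Table 1.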
Data collection followed the workflow summarized in Fig. 2. The authors drew on more than 500 scientific publications as data sources, identified through citations in major review articles and other seminal works on the study regions. Google Scholar was used to locate further articles with combinations of keywords such as specific country/region names, “archaeology”, “scientific dates”, “archaeobotany”, “zooarchaeology”, “iron”, and “ceramics”. The authors also screened all available volumes of Azania: Archaeological Research in Africa (1967 to 2023) to collect further scientific dates and information about associated archaeological finds. For just over half of the published dates, the associated assemblage evidence (archaeobotanical, zooarchaeological, and artefactual) was obtained from separately published specialist articles. Where necessary to clarify data issues or locate missing publication data, the original authors or expert archaeologists working in eastern Africa were consulted. Radiocarbon dating laboratories were also contacted for missing details on published dates and dated materials. Based on the references given in review articles and other seminal publications, and on discussions with researchers familiar with the study region, we estimate that at least 90% of the published scientific dates and associated archaeological records from the sampled eastern African countries are captured in the Wanyika dataset.
We use the term “scientific date” to refer to a date determined by scientific dating methods to establish the age of an artefact, feature, and/or site. These methods provide a quantifiable measure of time with an associated margin of error. Five types of scientific dating techniques were reported in the publications consulted to compile the Wanyika dataset: radiocarbon dating (¹⁴C), optically stimulated luminescence (OSL), infrared stimulated luminescence (IRSL), thermoluminescence (TL), and obsidian hydration (OH) (Fig. 1(b)). Radiocarbon dates are calculated from the abundance of the ¹⁴C isotope remaining in a sample (e.g., archaeobotanical remains or collagen extracted from zooarchaeological remains). Luminescence dating methods such as OSL, IRSL, and TL determine when mineral grains were last exposed to sunlight or to sufficiently high temperatures. Finally, OH dating estimates the age of an obsidian object from the hydration rind that forms as water is absorbed into a freshly exposed surface. Although OH dating can yield either absolute or relative dates, only absolute dates (i.e., dates expressible as intervals of calendar years, rather than simply as older or younger than other samples) are recorded in the Wanyika dataset. Relative dates based on material culture typologies, such as ceramics and beads, are also excluded.
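Because radiocarbon dates are derived from the measured ¹⁴C abundance, a short worked example of that relationship may help readers interpret the values. The sketch below applies the standard conventional-age formula, age = −8033 × ln(F¹⁴C), where 8033 years is the Libby mean life and F¹⁴C is the fractionation-corrected fraction of modern carbon, together with first-order error propagation. It is an illustration only: the function name is hypothetical, the dates in the Wanyika dataset are taken from publications rather than recomputed this way, and the result is an uncalibrated age in ¹⁴C years BP (before 1950) that still requires calibration against a calibration curve, which is not shown.

```python
import math
from typing import Tuple

# Libby mean life in years (5568-year Libby half-life divided by ln 2).
LIBBY_MEAN_LIFE = 8033.0


def conventional_age(f14c: float, f14c_sigma: float) -> Tuple[float, float]:
    """Return the conventional radiocarbon age (14C yr BP) and its 1-sigma error.

    f14c is the fraction modern carbon (F14C); the age is uncalibrated and
    counted back from AD 1950.
    """
    if f14c <= 0:
        raise ValueError("F14C must be positive")
    age = -LIBBY_MEAN_LIFE * math.log(f14c)
    # First-order error propagation: |d(age)/dF| = 8033 / F.
    sigma = LIBBY_MEAN_LIFE * f14c_sigma / f14c
    return age, sigma


age, sigma = conventional_age(0.5, 0.005)
print(round(age), round(sigma))  # 5568 80
```

A sample retaining half of the modern ¹⁴C level thus has a conventional age equal to the Libby half-life, about 5568 ¹⁴C years BP.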
We developed four chronometric quality control criteria for the scientific dates and used them to grade the dates in the dataset into classes. The first criterion combines stratigraphic integrity and reliability and is applied to all dates in the dataset, grading them into four classes (Class A–D, with A being the most secure and reliable and D the least secure and reliable) (Fig. 1 and Tables 2, 3). The other three criteria apply only to ¹⁴C dates and are based on (i) whether the date was obtained on short- or long-lived plant material (i.e., the potential presence of an old wood effect), (ii) the possible presence of an aquatic radiocarbon reservoir effect, and (iii) the accuracy of the chronometric determination (Fig. 1 and Table 4). For each of these three criteria, dates were assigned to three classes (Class A–C), again with A being the most reliable and C the least reliable. A description of how the quality control criteria were applied is provided below.
See Table 1 for information on demarcation of country regions.
The points assigned to each date for stratigraphic integrity and for reliability were summed and divided by two to give a mean score, which was used to assign dates to Classes A, B, C, and D for mean scores of 6, 4–5.5, 2–3.5, and 0–1.5, respectively. The application of the grading system produced comparable class qualities (Table 2). Table 3 summarizes the number of dates by country and region, dating method, and stratigraphic integrity and reliability grade.
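The mapping from the two scores to Classes A–D can be written as a small function. This sketch rests on an assumption that the text implies but does not state: that each of the two scores is an integer from 0 to 6, so the mean moves in steps of 0.5 and every possible mean falls into exactly one of the stated ranges. The function name is hypothetical.

```python
def strat_reliability_class(integrity: int, reliability: int) -> str:
    """Map stratigraphic-integrity and reliability points to Class A-D (sketch)."""
    for score in (integrity, reliability):
        # Assumption: each criterion is scored as an integer from 0 to 6.
        if not isinstance(score, int) or not 0 <= score <= 6:
            raise ValueError(f"score outside assumed 0-6 range: {score!r}")
    mean = (integrity + reliability) / 2
    if mean == 6:
        return "A"  # mean score 6
    if mean >= 4:
        return "B"  # mean score 4-5.5
    if mean >= 2:
        return "C"  # mean score 2-3.5
    return "D"      # mean score 0-1.5


print(strat_reliability_class(5, 6))  # B (mean 5.5)
```

Checking the boundaries this way makes it easy to see that, under the integer assumption, the four ranges leave no gaps.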
Radiocarbon dates underwent additional chronometric hygiene based on three further criteria (Table 4). The first two criteria consider the in-built age of the sample at death, which varies with the type of material selected for dating. In the case of plant materials, age offsets may arise from the selection of long-lived wood, often referred to as ‘the old wood effect’. Dates obtained on taxonomically identified short-lived plant parts (e.g., annual seeds, leaves, twigs) were graded as Class A, dates obtained on taxonomically identified long-lived plant parts were graded as Class B, and dates for which the taxonomic identification of the plant parts was not reported were graded as Class C (see Column BJ in the dataset).
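The old wood grading rule above can be expressed as a three-way decision. In this sketch the function name and its inputs (`taxon_identified` and a `lifespan` string of "short" or "long") are hypothetical encodings of the information described in the text. It covers only dates on plant material; how non-plant samples are coded in Column BJ is not described here, so they are not handled.

```python
from typing import Optional


def old_wood_class(taxon_identified: bool, lifespan: Optional[str]) -> str:
    """Grade a 14C date on plant material for a possible old wood effect (sketch)."""
    if not taxon_identified:
        return "C"  # taxonomic identification of the plant part not reported
    if lifespan == "short":
        return "A"  # identified short-lived part, e.g. annual seeds, leaves, twigs
    if lifespan == "long":
        return "B"  # identified long-lived part, e.g. wood of a long-lived taxon
    raise ValueError(f"unknown lifespan for an identified taxon: {lifespan!r}")


print(old_wood_class(True, "short"))  # A
```

Raising an error for an identified taxon with no lifespan, rather than silently defaulting, is a design choice that flags incompletely coded records.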
The Wanyika dataset has 75 fields organized into eight major categories (Table 5). We provide definitions of these 75 column fields below. Wanyika is a spatiotemporal, flat-file dataset in which each row represents a single scientific date and its associated archaeological records. The dataset contains 1792 records, each associated with one of 422 sites. The presence of domesticated crop, faunal, iron, and/or ceramic finds is marked by “Yes” in the relevant cell, while absence is represented by a blank cell.
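Because the fields are referred to by spreadsheet column letters (E, BF, BJ, and so on) and presence is coded as “Yes” versus blank, a few small helpers may be useful when loading an exported copy. The sketch below uses only the Python standard library. The `"Site Name"` header and the `"Iron"` column in the sample are hypothetical; the actual headers are defined in the dataset itself. Treating any non-blank value other than “yes” (case-insensitive) as an error is an assumption meant to surface unexpected codings early.

```python
import csv
import io
from typing import Dict, Iterable, Optional, Set, Tuple


def column_index(label: str) -> int:
    """Convert a spreadsheet column label ('E', 'BJ') to a zero-based index."""
    index = 0
    for ch in label.strip().upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"invalid column label: {label!r}")
        index = index * 26 + (ord(ch) - ord("A") + 1)
    if index == 0:
        raise ValueError("empty column label")
    return index - 1


def is_present(cell: Optional[str]) -> bool:
    """'Yes' marks presence of a find; a blank cell marks absence."""
    value = (cell or "").strip()
    if value == "":
        return False
    if value.lower() == "yes":
        return True
    raise ValueError(f"unexpected presence value: {cell!r}")


def count_records_and_sites(
    rows: Iterable[Dict[str, str]], site_field: str = "Site Name"
) -> Tuple[int, int]:
    """Return (number of date records, number of distinct sites)."""
    sites: Set[str] = set()
    n_records = 0
    for row in rows:
        n_records += 1
        sites.add(row[site_field])
    return n_records, len(sites)


# Usage with a tiny in-memory sample (hypothetical header names):
sample = io.StringIO("Site Name,Iron\nSite 1,Yes\nSite 1,\nSite 2,Yes\n")
rows = list(csv.DictReader(sample))
print(count_records_and_sites(rows))          # (3, 2)
print([is_present(r["Iron"]) for r in rows])  # [True, False, True]
print(column_index("BJ"))                     # 61
```

Since each row is one date, counting rows and distinct sites in this way offers a quick consistency check against the 1792 records and 422 sites reported above.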
E – Country Region: Name of the region where a site is located within a country, defined by administrative boundaries and latitude/longitude. Information on demarcation of country regions is provided in Table 1.
BF – Ceramic Phase (Pottery Ware): Lists the name(s) of the ceramic style(s) associated with the record. These include the 25 major ceramic/pottery traditions and wares in eastern Africa listed in Table 6.
BG – Regional Cultural Phase (Eastern Africa): Lists the regional cultural phase represented by the recorded cultural assemblage. These include (a) Prehistoric (applicable to Madagascar), LSA (Late Stone Age), PN (Pastoral Neolithic; subdivided into SPN, Savanna Pastoral Neolithic, and EPN, Elmenteitan Neolithic), and PIA (Pastoral Iron Age); and (b) EIA (Early Iron Age), MIA (Middle Iron Age), and LIA (Late Iron Age). See Table 6 for the time period and ceramic tradition associated with each regional cultural phase.
BJ – Chrono Hygiene 3: Grade reflecting the possibility that the ¹⁴C date is affected by long-lived plant material (long-/short-lived). See Table 4.
BK – Chrono Hygiene 4: Grade reflecting the possibility that the dated material is affected by an aquatic ¹⁴C reservoir effect. See Table 4.
BL – Chrono Hygiene 5: Grade based on the accuracy of the chronometric determination (pre-treatment protocol) for ¹⁴C dates. See Table 4.
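A common use of Columns BJ–BL is to restrict an analysis to the most secure ¹⁴C dates. The sketch below keeps only rows whose three radiocarbon hygiene grades all fall within an accepted set of classes (Class A by default). It assumes, hypothetically, that the exported headers are exactly “Chrono Hygiene 3”, “Chrono Hygiene 4”, and “Chrono Hygiene 5” and that each cell holds a single class letter; the `"Lab Code"` header in the sample is also hypothetical. Rows with blank grade cells (which we assume is how non-¹⁴C dates appear) are dropped by this filter.

```python
import csv
import io
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical header names, taken from the column definitions above.
HYGIENE_FIELDS = ("Chrono Hygiene 3", "Chrono Hygiene 4", "Chrono Hygiene 5")


def passes_hygiene(row: Dict[str, str], accepted: FrozenSet[str] = frozenset({"A"})) -> bool:
    """True if every 14C hygiene grade in the row is one of the accepted classes."""
    # A set is used (not a string) so that a blank grade never counts as a match.
    return all((row.get(field) or "").strip().upper() in accepted for field in HYGIENE_FIELDS)


def filter_dates(
    rows: Iterable[Dict[str, str]], accepted: FrozenSet[str] = frozenset({"A"})
) -> List[Dict[str, str]]:
    """Keep only rows that pass all three radiocarbon hygiene criteria."""
    return [row for row in rows if passes_hygiene(row, accepted)]


# Usage with a tiny in-memory sample (hypothetical header names):
sample = io.StringIO(
    "Lab Code,Chrono Hygiene 3,Chrono Hygiene 4,Chrono Hygiene 5\n"
    "X-1,A,A,A\n"
    "X-2,A,B,A\n"
    "X-3,B,A,A\n"
)
rows = list(csv.DictReader(sample))
print([r["Lab Code"] for r in filter_dates(rows)])                        # ['X-1']
print([r["Lab Code"] for r in filter_dates(rows, frozenset({"A", "B"}))])  # ['X-1', 'X-2', 'X-3']
```

Passing a wider accepted set, such as Classes A and B, lets an analysis trade some chronometric security for a larger sample.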