
SPIN enables high throughput species identification of archaeological bone by proteomics

PMCID: PMC9072323

PMID: 35513387


Abstract

Species determination based on genetic evidence is an indispensable tool in archaeology, forensics, ecology, and food authentication. Most available analytical approaches involve compromises with regard to the number of detectable species, high cost due to low throughput, or a labor-intensive manual process. Here, we introduce “Species by Proteome INvestigation” (SPIN), a shotgun proteomics workflow for analyzing archaeological bone capable of querying over 150 mammalian species by liquid chromatography-tandem mass spectrometry (LC-MS/MS). Rapid peptide chromatography and data-independent acquisition (DIA) with throughput of 200 samples per day reduce expensive MS time, whereas streamlined sample preparation and automated data interpretation save labor costs. We confirm the successful classification of known reference bones, including domestic species and great apes, beyond the taxonomic resolution of the conventional peptide mass fingerprinting (PMF)-based Zooarchaeology by Mass Spectrometry (ZooMS) method. In a blinded study of degraded Iron-Age material from Scandinavia, SPIN produces reproducible results between replicates, which are consistent with morphological analysis. Finally, we demonstrate the high throughput capabilities of the method in a high-degradation context by analyzing more than two hundred Middle and Upper Palaeolithic bones from Southern European sites with late Neanderthal occupation. While this initial study is focused on modern and archaeological mammalian bone, SPIN will be open and expandable to other biological tissues and taxa.

Available methods to identify species from fragmented archaeological bone and remains suffer a trade-off between cost and resolution. Here, the authors present a workflow that uses automated sample preparation, 10 to 20 times faster data acquisition, and computerized data interpretation to make the technology applicable to large-scale studies.


Full Text

To increase sample processing throughput, we developed a sample preparation protocol consisting of only a few manual steps for easy scale-up (Fig. 1a). A combination of hydrochloric acid and the non-ionic detergent NP-40 in the extraction buffer facilitated demineralization and protein extraction in the same mixture, which otherwise usually requires intermediate centrifugation and buffer exchange. Compared to other tested combinations, HCl and NP-40 were the only candidates that enabled efficient protein cleanup without precipitation (Table S1) and high peptide identification rates in the subsequent LC-MS/MS analysis (Fig. S1). We optimized the demineralization (Fig. S2) and extraction times (Fig. S3) and tested the effect of reduction and alkylation during the extraction step (Fig. S4). Contaminants, detergents, and minerals were removed using a modified protein aggregation capture (PAC) protocol with optimized solvent amounts (Fig. S5), which was automated on a magnetic bead-handling Kingfisher robot starting with only 5 mg of bone material. When parallelized with fast analysis, the automated sample preparation enables a single laboratory operator to continuously process and analyze 200 bone samples per day (Fig. S17). We benchmarked our new sample preparation workflow against the commonly used “in-solution” digestion protocol, “filter-aided sample preparation” (FASP), “gel-aided sample preparation” (GASP), and the more recent “S-trap” using 5 mg per replicate of the same Pleistocene Mammoth bone sample for all methods (Fig. 1b). The number of identified peptide precursors by LC-MS/MS was the lowest for FASP and GASP, which was probably due to losses in the filter or gel. “In-solution” and “S-trap” performed about two-fold better, although “in-solution” had a relatively poor digestion efficiency leading to more missed tryptic cleavages (Fig. S6). 
The protocol developed for SPIN produced significantly more peptide precursor identifications by almost a factor of two compared to S-trap and in-solution. When comparing robotic processing to manual sample preparation following the SPIN protocol, we observed similar numbers of peptide identifications, but better reproducibility with the robot. Importantly, from a practical standpoint, the sample preparation procedure in SPIN requires much less hands-on time than FASP and GASP, which cannot easily be scaled to 96-well format. Moreover, although the “in-solution” digestion protocol is relatively fast, the lack of protein cleanup makes it more susceptible to contamination problems, which complicates scale-up. Finally, from an economic standpoint, “S-trap” can be costly due to the requirement for proprietary filter devices, whereas the modified PAC workflow in SPIN can be performed with any type of magnetic beads and a simple magnet rack instead of a Kingfisher robot.
To maximize scalability and throughput, SPIN uses very short online LC gradients to speed up the chromatographic separation of the peptide mixtures in line with Orbitrap tandem MS. We achieved sufficient proteome coverage for species discrimination (Fig. 1c) at a throughput of 100 samples per day with data-dependent acquisition (DDA, Fig. S7) and 200 samples per day with data-independent acquisition (DIA, Fig. S8, Table S2). Consequently, data acquisition in SPIN is 10 to 20 times faster than common practice in palaeoproteomics (<10 samples per day) and about 3 times faster than high-throughput plasma proteomics methods (60 samples per day). In a modern bovine bone sample, fast-scanning DDA identified about 1200 peptide precursors in 11 min, while DIA reached the same number of identifications in 5.6 min when analyzed without a spectral library by directDIA (Fig. 1c). Almost twice as many peptide precursors were identified by analyzing the same file using a dedicated spectral library that was generated once for each species by DDA analysis of offline fractionated peptide mixtures (Figs. S9 and S10). However, as spectral library-based DIA yielded more overlapping peptide precursors, it did not result in a proportional gain in absolute sequence coverage (Fig. 1d). Across the 49 modern reference samples, the DDA method had a median coverage of 3678 amino acids and thereby outperformed the spectral library-free directDIA approach with 3226 amino acids (Fig. S11). The highest median coverage of 4480 amino acids was achieved with library-based DIA. As expected, sequence coverage was highest for the two most abundant bone proteins, COL1A1 and COL1A2.
Since there was almost no additional coverage gained by including more protein sequences for the database search than the top 20 most abundant protein-coding genes, we decided to focus the SPIN analysis on only those 20 genes and thereby reduce noise and simplify the protein sequence database assembly and alignment (Supplementary Data 1).
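The throughput figures quoted above can be checked with a few lines of arithmetic. The following is only a sketch that assumes uninterrupted 24 h instrument operation; the function names are illustrative and not part of the SPIN software.

```python
MINUTES_PER_DAY = 24 * 60

def minutes_per_sample(samples_per_day: float) -> float:
    """Total instrument time per sample, including any overhead between injections."""
    return MINUTES_PER_DAY / samples_per_day

def speedup(fast_per_day: float, slow_per_day: float) -> float:
    """Ratio of two throughputs given in samples per day."""
    return fast_per_day / slow_per_day

# Throughputs taken from the text: 100 samples/day (DDA) and 200 samples/day (DIA)
for label, rate in {"DDA": 100, "DIA": 200}.items():
    print(f"{label}: {minutes_per_sample(rate):.1f} min per sample")

# Comparison with the reference throughputs mentioned in the text
print(f"DDA vs. 10 samples/day: {speedup(100, 10):.0f}x")
print(f"DIA vs. 10 samples/day: {speedup(200, 10):.0f}x")
print(f"DIA vs. 60 samples/day: {speedup(200, 60):.1f}x")
```

Under this assumption, the per-sample budget (14.4 min for DDA, 7.2 min for DIA) is somewhat longer than the 11 min and 5.6 min analysis times reported for the bovine sample, leaving a margin for loading and equilibration between runs.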
The completeness and quality of protein sequence databases are vastly different between taxa (Fig. 2b). Missing genes, gaps, and stretches of incorrect amino acid sequences can introduce a bias towards well-annotated species when the proteomics-based taxonomic assignment is performed based on simple metrics like the number of identified peptides or protein groups. Therefore, we built a species inference algorithm encompassing site-based species-to-species comparisons (Fig. 2a). It is based on a gene-wise multiple sequence alignment (MSA) for all protein sequences across all available mammalian species for each of the 20 most highly expressed protein-coding genes in bone (Supplementary Data 1). Further manual refinement was needed to remove faulty sequence inserts or obvious prediction errors like frameshifts and the SPIN sequence inference algorithm was configured to automatically remove species (21 out of 177 in the current database) that lack more than 5 out of the 20 genes, because species with too many missing genes could not be reliably assigned. For all species identified in this study, sequences for all 20 genes were available for all taxa, except the white-tailed deer (19 genes), the European bison (15 genes), and the aurochs (15 genes). To confirm that the sequence information of the 20 genes is sufficient for resolving the taxonomy of all the 156 species in the database, we built a phylogenetic tree based on the refined sequences of the 20 most highly expressed bone proteins. Reassuringly, we observed that its topology matched the currently established phylogeny based on morphological and genomic data (Figs. 2b and S16). The protein sequence alignment was also the basis for creating a “site-specific difference matrix” by performing every possible species-to-species comparison. For all pairwise comparisons, this matrix only considers sites that are known for both species and have different amino acids. 
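As an illustration of the site-specific difference matrix, the sketch below works on a toy alignment of one gene. It assumes that gaps ("-") and unknown residues ("X") both count as "not known"; the species names, sequences, and function names are hypothetical and do not come from the SPIN database or code.

```python
from itertools import combinations

GAP = "-"      # alignment gap (assumed to mean "site not known")
UNKNOWN = "X"  # unknown residue (assumed convention)

def differing_sites(seq_a: str, seq_b: str) -> list:
    """Alignment columns where both residues are known and differ."""
    return [
        i
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a not in (GAP, UNKNOWN) and b not in (GAP, UNKNOWN) and a != b
    ]

def difference_matrix(alignment: dict) -> dict:
    """Map every species pair to the list of sites that separate the two species."""
    return {
        (s1, s2): differing_sites(alignment[s1], alignment[s2])
        for s1, s2 in combinations(sorted(alignment), 2)
    }

# Toy alignment of a single gene (hypothetical sequences)
toy_alignment = {
    "species_A": "GPAGPQGAR",
    "species_B": "GPSGPQGAR",
    "species_C": "GPSGP-GSR",
}
matrix = difference_matrix(toy_alignment)
for pair, sites in matrix.items():
    print(pair, sites)
```

In this toy case, the gap in species_C at column 5 is ignored in every comparison, so only columns 2 and 7 contribute to separating the three species.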
The aligned database was also used for mapping the LC-MS/MS-based peptide identifications to the correct genes and locations (“Mapping to alignment”, Fig. 2a). Once mapped, the peptide data could be converted to site-level by splitting peptide identifications into amino acid identifications. In this format, the data can be used to score every species-to-species comparison in the site difference matrix. To best integrate the multiple MS metrics in the scoring scheme, peptide intensity, precursor count, peptide count, and maximum score were scaled and combined into a normalized joined score (J-Score). The J-Score does not necessarily reflect the actual amino acid probability for each site but assigns a higher weight to amino acids with better underlying data. The summed J-Scores were used for determining the winner of every species-to-species comparison in the site-specific difference matrix (Fig. 2c). The algorithm allows for a single or multiple indistinguishable species to win most comparisons depending on the sequence coverage and the number of closely related species in the protein sequence database. We added an optional “fine grouping” step using a manually curated list of marker peptides to keep the phylogenetic placement between closely related species consistent, even at low sequence coverage. Up to this point, a species was assigned to every sample, even including the blanks. The algorithm therefore includes two mechanisms for controlling and minimizing the false-discovery rate (FDR): one to identify samples with too low signal, like blanks, and a second one to control for species that are not yet in the database. Samples with low peptide intensity were removed through quantitative comparison to the abundance of autolysis-derived protease peptides, which we treated like a spike-in standard. The threshold was automatically calibrated based on relative protease intensity in laboratory blanks.
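The scoring idea can be sketched as follows. The text does not specify how the four metrics are scaled and combined, so this example assumes min-max scaling followed by a plain mean; that formula, the data layout, and all names are assumptions made for illustration rather than the SPIN implementation.

```python
from collections import defaultdict

METRICS = ("intensity", "precursors", "peptides", "max_score")

def min_max_scale(values):
    """Scale values to [0, 1]; a constant column maps to 1.0 (assumed convention)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def j_scores(observations):
    """Mean of the min-max scaled metrics per observation (assumed formula)."""
    columns = [min_max_scale([obs[m] for obs in observations]) for m in METRICS]
    return [sum(col[i] for col in columns) / len(METRICS) for i in range(len(observations))]

def site_evidence(observations):
    """Sum the J-Scores for every (alignment site, amino acid) combination."""
    evidence = defaultdict(float)
    for obs, score in zip(observations, j_scores(observations)):
        evidence[(obs["site"], obs["residue"])] += score
    return evidence

def pairwise_winner(name_a, seq_a, name_b, seq_b, sites, evidence):
    """Return the species whose residues collect more summed J-Score at the differing sites."""
    score_a = sum(evidence.get((i, seq_a[i]), 0.0) for i in sites)
    score_b = sum(evidence.get((i, seq_b[i]), 0.0) for i in sites)
    if score_a == score_b:
        return None  # indistinguishable with the available data
    return name_a if score_a > score_b else name_b

# Hypothetical site-level observations for one sample
observations = [
    {"site": 2, "residue": "S", "intensity": 9e6, "precursors": 3, "peptides": 2, "max_score": 80},
    {"site": 7, "residue": "A", "intensity": 1e6, "precursors": 1, "peptides": 1, "max_score": 40},
    {"site": 7, "residue": "S", "intensity": 5e6, "precursors": 2, "peptides": 2, "max_score": 60},
]
evidence = site_evidence(observations)
winner = pairwise_winner(
    "species_A", "GPAGPQGAR", "species_C", "GPSGP-GSR", sites=[2, 7], evidence=evidence
)
print(winner)
```

Repeating such a comparison for every species pair and counting wins would yield the species, or group of indistinguishable species, that wins most comparisons.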
The second control mechanism was aimed at the identification of species with insufficient sequence coverage, as this would lead to unreliable classification. Therefore, we extended the database with an equally sized set of decoy species (randomly generated chimera species). The final results were then ranked by sequence coverage and a cutoff was applied to keep the number of decoy identifications below 1%. The comparison of site coverage and relative protease abundance demonstrated that most of the blanks along with the empty samples were successfully removed using the two thresholds and that the coverage decreases with sample age (Fig. 2d). To ensure global FDR control and for easy comparability between samples, all samples analyzed in the entire study were processed in the species inference pipeline, together. The species results were collected in a global result table (Supplementary Data 2) and the proteomics results converted to site-level were used to create a consensus of the identified sequences for each sample (Supplementary Data 3).
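A coverage cutoff of this kind could be derived roughly as shown below. This is a simplified sketch: it assumes each result is reduced to a site coverage and a decoy flag, and it ignores ties in coverage; the function name and the toy numbers are hypothetical.

```python
from typing import Optional

def coverage_cutoff(results, max_decoy_fraction: float = 0.01) -> Optional[int]:
    """Lowest coverage at which the decoy fraction among accepted results stays below the limit.

    `results` is a list of (site_coverage, is_decoy) tuples, one per identification.
    Returns None if no cutoff satisfies the limit.
    """
    ranked = sorted(results, key=lambda r: r[0], reverse=True)
    decoys = 0
    cutoff = None
    for n, (coverage, is_decoy) in enumerate(ranked, start=1):
        decoys += int(is_decoy)
        if decoys / n < max_decoy_fraction:
            cutoff = coverage
    return cutoff

# Toy data: 50 well-covered target hits, then decoys and poorly covered targets
results = [(1000 + i, False) for i in range(50)]
results += [(500, True), (400, True), (300, False), (200, False)]

cutoff = coverage_cutoff(results)
accepted = [r for r in results if cutoff is not None and r[0] >= cutoff]
print(cutoff, len(accepted))
```

In the toy data, the two low-coverage target hits are rejected together with the decoys, because accepting anything below the first decoy would push the decoy fraction above 1%.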
We optimized and assessed the performance of the different data acquisition and interpretation strategies using a set of 49 known reference bones from 13 species (Fig. 3a). We compared the sequence coverage (Fig. S11) and shared and unique peptide identifications (Fig. S12) between the three data types: library-based DIA, directDIA, and DDA. All samples were placed in the correct genus using library-based DIA, whereas spectral library-free directDIA could not differentiate human from chimpanzee, and DDA was not able to exclude goats for one of the eight sheep samples. Interestingly, all three methods performed equally well when it came to the placement of taxa within the families. Within bovines, the domestic cattle (Bos taurus) could be distinguished from European bison (Bison bonasus) but not from the aurochs (Bos primigenius). The European bison itself could not be discriminated from American bison (Bison bison) and yak (Bos mutus) and, in one case, from zebu (Bos taurus indicus) (Fig. 3b). The closely related goat (Capra hircus) and sheep (Ovis aries) were correctly identified in all DIA analyses and 10 out of 11 samples in DDA analysis. Within equines, domestic horse (Equus ferus caballus) was successfully discriminated from donkey (Equus africanus asinus), but not from the Mongolian wild horse (Equus ferus przewalskii) (Fig. 3c). Fine-grouping was not actually required to distinguish goat and sheep or horse and donkey, but it made the caballus/przewalskii classification more uniform.
Besides common domesticated animals and their wild relatives, we explored the potential to detect and correctly identify great apes. While all three peptide identification methods could successfully classify human (Homo sapiens), orangutan (Pongo abelii), and gorilla (Gorilla gorilla), only DDA and library-based DIA analyses could correctly separate chimpanzee (Pan troglodytes) from Homo sapiens (Fig. 3d). It is noteworthy that both DDA and library-based DIA analysis of two chimpanzee bones assigned one of them to chimpanzee and the other to bonobo (Pan paniscus). Unfortunately, the low quality of the available bonobo protein database prevented closer investigation. These results confirmed that the SPIN workflow can be used to classify great apes at the genus level.
As a proof-of-concept, we investigated the potential to detect species hybrids with SPIN by analyzing two samples from mules. Focusing on two peptides that are distinct between horse and donkey, one of the two mule samples showed high intensity for both sequence variants, as expected, but the second mule showed peptide intensities typical for a donkey (Fig. 3c). We concluded that SPIN is technically capable of hybrid detection, but an assessment of its reliability would require a larger study size.
To benchmark the SPIN analysis strategy against standard-practice bioarchaeological species determination based on bone morphology, we analyzed a set of 63 bone fragments related to human activities at the “Salpetermosen Syd 10” site (MNS50010, ZMK5/2013) in Denmark, which dates to the early Pre-Roman Iron-Age (380 BC–540 AD, Fig. 4a). Some of the bones showed strong signs of decay due to the age and the wet anoxic conditions in the Salpetermosen bog (Fig. 4b). Each specimen was morphologically analyzed by an experienced zooarchaeologist, and the SPIN analysis was conducted in technical duplicates starting on bone powder level. Variations in the input amount, peptide recoveries, and LC-MS/MS performance resulted in one experiment with higher and one with lower average MS intensity. The study was blinded by keeping the morphological species identification undisclosed until the SPIN analysis was finalized.
SPIN analysis using the 5 min DIA method and fine-grouping resulted in 49 exact and 3 approximate species identifications in the higher intensity experiment. In case of the lower-intensity replica experiment, we obtained 44 exact and 2 false species identifications (Fig. 4c). The remaining 11 (high-intensity replicate) and 17 (low-intensity replicate) samples were excluded by the algorithm due to high relative protease intensity. The laboratory blanks were also correctly excluded. The comparison of replicates showed perfect reproducibility between duplicates with site coverage >3000 amino acids. We observed four cases of missing identifications in the high-intensity replicate that were identified in the low-intensity experiment, which we attributed to variability in the bone chips, input amounts, and peptide recovery. Importantly, there were no contradicting species identifications between the two replicates. For 94 out of 98 identified samples from both replicates, the SPIN analysis was in agreement with the morphological analysis (Fig. 4c). Two of the inconsistencies were sheep identifications for a sample that had the appearance of a goat bone, while the other two had low peptide intensity and were classified as cattle and sheep by SPIN but morphologically closer to pig and cattle, respectively. Sheep and goat, which often cannot be discriminated morphologically, were unambiguously identified by SPIN in 39 cases and could not be distinguished in two. For cattle, only Bos is plausible at this time and location and the SPIN and morphological identifications showed good agreement. Nevertheless, we were interested in the performance of SPIN for distinguishing Bos and Bison in degraded material and therefore looked at the overall species distribution in the high-intensity replica experiment (Fig. 4d). 
Discriminating between Bos and Bison was only possible for 5 out of 14 bovine bones and became significantly more challenging with lower sequence coverage (Supplementary Data 2). Cattle and horses could not be distinguished morphologically in 10 cases, 7 of which could be resolved by SPIN. All three laboratory blanks were correctly excluded (“signal too low”) by the relative protease intensity threshold.
We compared the performance of the three different types of peptide identifications by library DIA, directDIA, and DDA, which had performed very similarly for the reference samples. Compared to the well-preserved reference samples, the differences became much more apparent in the more degraded Salpetermosen sample set. The pseudo ROC-curve analysis shows that the DIA-based methods outcompeted DDA, especially in the low-amount replicate, indicating higher sensitivity in DIA-based measurements (Fig. 4e). Between the two DIA methods, library-based DIA consistently produced more true species identifications than directDIA.
To challenge the SPIN workflow with highly degraded samples and demonstrate its scalability, we analyzed a set of 213 archaeological bone fragments from three Portuguese archaeological sites with early human occupation (Fig. 5a). To this end, we translated the output of the species inference algorithm to reflect the most likely ancestors that were present at the location and time (Table S3). Analogous to all other samples in this study, we analyzed the Portuguese bones with DDA and DIA, which took 52 h and 26 h of MS acquisition time, respectively. We used the library-based DIA results as the basis for species identification because of its higher resolution, as demonstrated with the Salpetermosen dataset. However, to allow the identification of species for which no spectral library is currently available, such as rodents, we replaced the result with the taxonomy identified by directDIA, whenever directDIA detected such a species. As both results were based on the same raw data, the relative protease threshold remained unaffected, but the sequence coverage was lower with directDIA. In addition, compared to the reference and Salpetermosen samples, these Southern European Middle and Upper Palaeolithic samples suffered from reduced protein sequence coverage across the proteome assembly (Figs. 5c, S13) and an increase in protein deamidation (Fig. S14).
For Lapa do Picareiro, 94 out of 95 samples could be confidently assigned a species. For these specimens, which were dated to approximately 38,000–41,000 BP (layers GG-II) and 45,000 BP (layer JJ), species composition was relatively similar for both layers (Fig. 5b). Of particular interest was the identification of one specimen of the now-extinct European wild ass (E. hemionus hydruntinus), alongside 37 caballine horses. Most of the ibex and chamois bones, which are not easy to distinguish morphologically, could be uniquely assigned to one of the two (18 out of 20). Finally, bovines and wild boar were exclusively identified in the older “JJ” layer.
For Vale Boi, dated between 31,500 and 29,000 BP, 60 out of 84 samples could be confidently identified. The remaining 24 samples failed to meet the abundance-based quality threshold and were therefore not assigned a species identity (Fig. 5b). The vast majority of the identified bones from layers 6 and 7 and all bones in layer 8 were classified as deer, which is in agreement with the previously reported numbers for large mammals. Equids, including one E. hydruntinus, were only identified in layer 6. With directDIA, four smaller bone fragments from layers 6 and 7 could be classified as rabbits, which were highly abundant at Vale Boi.
Finally, for Gruta da Companheira, which is expected to date to around 50,000–60,000 BP, 12 out of 34 samples could be assigned a confident species identification. Fourteen samples failed to meet the FDR threshold, while eight samples were excluded because they failed to meet the abundance-based quality threshold. Amongst the confidently identified species at Gruta da Companheira were bovines, deer, and rabbits, whereas the only ovicaprine sample could not be uniquely assigned to either ibex or chamois. Although below the relative protease cutoff, the two most interesting bones, which were both found in Galeria 2, matched best to great apes. The sample with the highest sequence coverage (1817 aa) was classified as human or chimpanzee (Fig. S15), whereas the sample with lower coverage (235 aa) matched equally well with all great apes. Here, the SPIN results can be used as a starting point for future in-depth protein and ancient DNA analyses to find out whether these are actually human remains and to eventually define their genetic profile.
To evaluate the SPIN method against the current method of choice for species identification by mass spectrometry, we compared it directly with MALDI-TOF MS-based species analysis of collagen type I PMF, i.e., ZooMS analysis. To ensure a fair comparison, we assembled a representative test set of peptides comprising all 46 samples from our reference set, 20 from Salpetermosen, and 21 from the Portuguese bone assemblages (total = 87 samples), as well as 3 extraction blanks. SPIN and ZooMS analysis were performed using the same amount of peptides per replicate that were generated following the protocol developed for SPIN (Methods, Fig. 1a).
Amongst the known references, SPIN and ZooMS obtained the same level of taxonomic identity in 14 cases (30%), SPIN was more specific in 28 cases (60%), and ZooMS could not determine the species of 4 samples (9%), all of which were pig bones readily identified by SPIN (Fig. 6a). In the case of the material from Salpetermosen (Fig. 6b), the same level of identification was achieved for 10 samples (50%), and a more specific classification was achieved with SPIN in 3 cases (15%) and with ZooMS in 1 case (5%). Samples were not identifiable by SPIN in 1 case (5%) and by ZooMS in 6 cases (30%). For the most degraded material from the Portuguese sites (Fig. 6c), species identifications were on the same level for 10 samples (48%), more specific with SPIN for 4 samples (19%), and exclusively identified by SPIN in 6 cases (29%). We found no case in any of the samples where the ZooMS and SPIN identifications were mutually exclusive. With ZooMS, all three laboratory blanks were marked as “unidentifiable”, whereas SPIN assigned one of them as Bos taurus (Fig. 6a). The majority of the ZooMS identifications (57 out of 87, 63.3%) provided a level of taxonomic specificity that cannot be improved further without adding more peptide markers beyond the nine we looked for. We therefore conclude that SPIN provides a level of taxonomic specificity unreachable by current ZooMS collagen PMF approaches, most likely due to the analysis of more divergent non-collagenous proteins, the higher dynamic range achieved by chromatography, confidence control for peptide identifications, and the higher resolution of the MS instrument used for SPIN.
Fig. 6. a Alluvial diagram showing species identification of 46 reference bone samples and 3 laboratory blanks. Small bars on the x-axis indicate individual samples. Color and position in the middle column represent the true species, whereas the left and right columns report the species identification by SPIN and PMF, respectively. Bars with color gradients indicate changing species assignments. b Alluvial diagram showing species identification of 20 representative samples from the Danish Salpetermosen site (Fig. 4). Left column indicates the species identification by SPIN, whereas the right column indicates the species identified by PMF. c Alluvial diagram showing species identification of 21 representative samples from the three Portuguese sites (Fig. 5). Left column indicates the species identification by SPIN, whereas the right column indicates the species identified by PMF.
So far, the SPIN data analysis is limited to 156 species that can be analyzed with directDIA, providing good species separation, or alternatively 13 spectral libraries for library-based DIA analysis, achieving the best possible species identification. Depending on the research question and archaeological context, studies may require an extension of the protein database and the set of spectral libraries. Although genomes are available for a plethora of extant mammals, the main bottleneck towards a more comprehensive protein database is, in our eyes, the genome annotation and sequence prediction for understudied species. With the appropriate sequence database and reference samples at hand, generating spectral libraries is a relatively straightforward task if required for a project. We envision that the amount of available reference data will grow in the future as more research groups share their protein sequence databases and spectral libraries. Furthermore, the SPIN workflow itself will likely be improved and expanded over time. We think of it as a modular protocol that can serve as the foundation for “SPIN-off” methods with custom building blocks, such as: (i) sample preparation modified to support protein extraction from heavily processed food products, (ii) data acquisition adapted for different instruments, and (iii) data interpretation including sex identification. In the future, it can also be adapted to resolve mixtures of proteins from multiple taxa or to quantify protein damage (Fig. S14). Finally, we anticipate that the SPIN workflow will make LC-MS/MS more accessible for everyone, due to the reduction of the analytical costs per sample and a high degree of automation.
Each bone sample in this study was taken with permission from the respective museum, curator, or institution and the impact was minimized by only removing necessary amounts. The list of all samples and museum identifiers can be found in Supplementary Data 4. A fragment of a Pleistocene mammoth bone from permafrost and dated to ~43,000 BP was used for optimizing methods. Reference samples for Bos taurus, Ovis aries, Sus scrofa (mandibles), and Equus caballus (phalanx) were from the mixed Viking-medieval deposits of the archaeological site Hotel Skandinavien (Århus Søndervold) (ZMK139/1964) in Århus, Denmark. The Laboratory of Biological Anthropology, Department of Forensic Medicine, at the University of Copenhagen provided the human reference sample, dentine from a previously described 200–400-year-old premolar from “Almindelig Hospitals cemetery on Østerbrogade” in Copenhagen, Denmark. Reference samples for Bos primigenius, Bison bonasus, Capra hircus, Equus asinus, Equus przewalskii, mule (Equus caballus X Equus asinus), Pongo pygmaeus, Gorilla gorilla, and Pan troglodytes were provided by the Natural History Museum of Denmark. The collection of 63 bone fragments dated to the Danish Early Iron-Age was sourced from the “Salpetermosen Syd 10” (MNS50010) site in Denmark. The set of 213 Upper Palaeolithic bone fragments from three Portuguese sites consisted of 84 samples from level 6 (29,300 BP), level 7 (30,400 BP), and level 8 (31,500 BP, unpublished) from Vale Boi, 95 samples from layers GG-II (38,000–41,000 BP) and JJ (45,000 BP) from Lapa do Picareiro, and 34 samples from Galeria 1 and 2 from Gruta da Companheira.
The database includes all protein sequences from UniProt knowledgebase release 2020_06 and NCBI RefSeq release 201 (July 2020) from mammalian species and matching to the top 20 genes (Fig. 2b) in .fasta format. The NCBI entries were reannotated with gene and species information using the respective GenPept files and the fasta-headers were changed to a pseudo-UniProt format: “>NCBI | [protein ID] | [protein ID]_[gene alias] [protein description] OS = [species name] OX = [species ID] GN = [gene name]”. Relevant UniProt entries with missing or false gene annotations were added by sequence similarity-based reannotation. The UniRef90 (release 2020_06) repository was used to annotate each “90% similarity” cluster with its most common gene name, followed by downloading all additional proteins matching the top 20 genes and updating the fasta-headers to include the correct gene name. Protein sequences from species missing in the databases, like Equus przewalskii, Bison bonasus, and the extinct Bos primigenius, were manually extracted from the available genomes using reference sequences of the closest living relatives, local BLAST searches, and visualization in UGENE. After combining all protein sequences, filtering for mammalian species, and removing duplicates, the sequences were split into 20 separate files, one for each gene.
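The header conversion can be illustrated with a short sketch. It assumes the compact UniProt-style spelling without spaces around "|" and "=", which may differ from the exact headers in the SPIN database; the accession, description, and helper names below are hypothetical placeholders.

```python
import re

def pseudo_uniprot_header(protein_id, gene_alias, description, species, taxon_id, gene_name):
    """Build a pseudo-UniProt FASTA header for an NCBI entry (assumed compact spelling)."""
    return (
        f">NCBI|{protein_id}|{protein_id}_{gene_alias} {description} "
        f"OS={species} OX={taxon_id} GN={gene_name}"
    )

HEADER_RE = re.compile(
    r"^>(?P<db>\w+)\|(?P<protein_id>[^|]+)\|(?P<entry_name>\S+) "
    r"(?P<description>.*?) OS=(?P<species>.+?) OX=(?P<taxon_id>\d+) GN=(?P<gene>\S+)"
)

def parse_header(header):
    """Split a header of the form above into its fields."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError(f"Unrecognized header: {header!r}")
    return match.groupdict()

# Hypothetical example entry (placeholder accession)
header = pseudo_uniprot_header(
    protein_id="XP_000000.1",
    gene_alias="COL1A1",
    description="collagen alpha-1(I) chain",
    species="Bos taurus",
    taxon_id=9913,
    gene_name="COL1A1",
)
fields = parse_header(header)
print(header)
print(fields["species"], fields["gene"])
```

Keeping the species (OS), taxonomy ID (OX), and gene name (GN) in fixed positions is what allows NCBI and UniProt entries to be filtered and split by gene with the same parsing logic.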
For the comparison between PMF- and LC-MS/MS-based species identification, 90 samples and 3 extraction blanks (Supplementary Data 5) were analyzed with both methods. Peptides generated using the SPIN protocol for protein extraction and digestion were split into a 10 µL aliquot for LC-MS/MS and a 30 µL aliquot for triplicate MALDI-TOF analysis. For desalting, the 30 µL were loaded and washed on Evotips as described in “Protein cleanup and digestion for SPIN” and subsequently eluted with two times 20 µL 50% ACN. After vacuum concentration to dryness, the peptides were reconstituted in 3.5 µL 50% ACN, and 1 µL was spotted on a MALDI target plate for each replicate and mixed with 1 µL of CHCA (α-cyano-4-hydroxycinnamic acid). MALDI-TOF MS data acquisition was performed using an AutoFlex LRF MALDI-TOF MS instrument (Bruker) at the Fraunhofer Institute (Leipzig, Germany) in reflector mode, positive polarity, matrix suppression up to 590 Da, and spectral collection in the range of 700–3500 m/z. Spectra were exported to .txt files, their baselines were removed, and the triplicate spectra of each sample were aligned to each other and subsequently merged in R (R Core Team, 2018). Masses of the nine common peptide markers observed for mammalian species were taken into account following the nomenclature of Brown and the peptide marker database presented by Welker. Instead of peptide marker COL1α1 586–618, which contains a tryptic cleavage site and was therefore cleaved during efficient PAC digestion, the fully tryptic peptide COL1α1 604–618 in the mass range 1281–1327 Da was used.
Further information on research design is available in the Nature Research Reporting Summary linked to this article.