PMC Articles

The global biogeography of tree leaf form and habit

PMCID: PMC10654052

PMID: 37872262


Abstract

Understanding what controls global leaf type variation in trees is crucial for comprehending their role in terrestrial ecosystems, including carbon, water and nutrient dynamics. Yet our understanding of the factors influencing forest leaf types remains incomplete, leaving us uncertain about the global proportions of needle-leaved, broadleaved, evergreen and deciduous trees. To address these gaps, we conducted a global, ground-sourced assessment of forest leaf-type variation by integrating forest inventory data with comprehensive leaf form (broadleaf vs needle-leaf) and habit (evergreen vs deciduous) records. We found that global variation in leaf habit is primarily driven by isothermality and soil characteristics, while leaf form is predominantly driven by temperature. Given these relationships, we estimate that 38% of global tree individuals are needle-leaved evergreen, 29% are broadleaved evergreen, 27% are broadleaved deciduous and 5% are needle-leaved deciduous. The aboveground biomass distribution among these tree types is approximately 21% (126.4 Gt), 54% (335.7 Gt), 22% (136.2 Gt) and 3% (18.7 Gt), respectively. We further project that, depending on future emissions pathways, 17–34% of forested areas will experience climate conditions by the end of the century that currently support a different forest type, highlighting the intensification of climatic stress on existing forests. By quantifying the distribution of tree leaf types and their corresponding biomass, and identifying regions where climate change will exert greatest pressure on current leaf types, our results can help improve predictions of future terrestrial ecosystem functioning and carbon cycling. Integrating inventory data with machine learning models reveals the global composition of tree types—needle-leaved evergreen individuals dominate, followed by broadleaved evergreen and deciduous trees—and climate change risks.


Full Text

Here we analyse the global distribution of needle-leaved, broadleaved, evergreen and deciduous tree species, by integrating ground-sourced information from 9,781 standardized forest inventory plots in the Global Forest Biodiversity initiative (GFBi) database with species-level leaf habit (evergreen vs deciduous) and leaf form (broadleaf vs needle-leaf) information accessed from the TRY plant trait database (Fig. 1a). The 9,781 forest inventory plots represented a subsample of the full GFBi data of >1 million records to ensure an equal representation of different forest biomes across the globe and avoid modelling bias caused by uneven spatial sampling (see Methods). Using information on both the occurrences of individual trees per plot and the basal-area weighted occurrences of each individual, we calculated plot-level leaf-type proportions (1) on the basis of the leaf type of each individual tree (tree-based leaf type) and (2) by weighting each tree by its basal area (area-based leaf type) (see Methods section 'Tree leaf-type data'). The first estimate allowed us to map the tree densities of each leaf type, while the second estimate allowed us to map the leaf types represented by the largest trees in a plot. To interpolate patterns across the globe, we combined our plot-level forest leaf information with 58 environmental variables, representing vegetation characteristics, climate, topography, vegetation, soil conditions and human-related features. We also tested the relative importance of 29 commonly studied variables on leaf-type variation and characterized the relationships between environmental features and leaf type.
To characterize the variation in forest leaf type, we first summarized the proportion of evergreen vs deciduous (leaf habit) and broadleaved vs needle-leaved (leaf form) individuals within each plot (Fig. 1a). Across all 9,781 forest plots, 13% contain exclusively broadleaved evergreen trees, 7% contain exclusively broadleaved deciduous trees, 12% contain exclusively needle-leaved evergreen trees, and the remaining 68% contain a mixture of leaf habits and forms (Fig. 1b).
Our random forest model predicted 75% of the global variation of forest leaf-type classes (10-fold cross-validation , see Methods). We also ran spatially buffered leave-one-out cross-validation (LOO-CV) to account for the potential effect of spatial autocorrelation on model evaluation statistics, which resulted in a coefficient of determination (R2) of 0.56 at a buffer radius of 300 km (see Methods and Supplementary Fig. 1). Within each class, our model explained 90%, 59%, 75% and 29% (10-fold cross-validation R2) of the global variation in the proportion of broadleaved evergreen, broadleaved deciduous, needle-leaved evergreen and needle-leaved deciduous trees within forests, respectively. These predictive relationships were then used to upscale the observations across the global extent of forest coverage (Fig. 2).
While considerable uncertainties exist for individual predictions at the pixel level, these uncertainties rapidly decrease as the model is projected to a larger area (<5% at 250 km2; Supplementary Fig. 2). Our model shows high confidence in tropical and boreal forests, whereas predictive confidence is lower in mixed forests and ecotones between different types of forest (Supplementary Figs. 3 and 4). For example, models for broadleaved evergreen and deciduous species share low predictive confidence in central African savanna regions. Similarly, low predictive confidence can be found across eastern Russian mixed forests. The low predictive confidence for these regions can be attributed to low sample size and mixed occurrence of broadleaved evergreen and deciduous species, as well as differences in the year of observation across forest survey plots, which may lead to larger uncertainties in ecotones where forest types can shift in relatively short time periods.
To assess the relative importance of environmental features on variation in leaf types, we ran random forest models including a range of environmental variables (see Methods). We combined these environmental factors into three groups (climate, soil and topography). To test for the relative importance of climate, soil and topographic characteristics, we ran a principal component analysis (PCA) for each of these variable groups and selected the first six principal components, which explained ≥90% of the total variation across all variables. Our analyses show that climate and soil characteristics jointly determine the global leaf-type distribution (Fig. 3a). With respect to variation in leaf habit, temperature fluctuations, that is, isothermality and temperature seasonality, were the most important variables (Fig. 3c). Yet, the entire combination of soil features (first six principal components of soil variables) was as important as climate for predicting leaf habit in our random forest model (Fig. 3a), suggesting that soil characteristics play an important role in the global distribution of evergreen ‘vs’ deciduous species. Especially soil texture, in combination with pH, appears to affect global variation in tree leaf habit. Acidic soils, commonly found in tropical and boreal regions, inhibit nutrient (N and P) supply by reducing cation availability and limiting tree growth rates. This might explain why broadleaved deciduous species that require high nutrient supply are less abundant in regions with acidic soil. Broadleaved evergreen species, by contrast, may better cope with nutrient poor, acidic soils. Similarly, needle-leaved evergreen species that can maintain growth even under low nutrient supply are more abundant in regions with acidic soil. The high tannin and phenol contents of needle leaves further contribute to the acidification of soils, probably creating a positive feedback towards coniferous dominance. Overall, our results point towards a feedback between tree leaf habits and soil conditions, highlighting the connection between physical soil features and soil water and nutrient availability.
Variation in leaf form was best predicted by climate variables (Fig. 3b), with the most important variable being temperature of the coldest quarter (Fig. 3d). By contrast to leaf habit, soil, topographic and vegetation features were less important in driving variation in leaf form, indicating adaptation to extreme climates, cold winters or extended dry seasons, as a major determinant of leaf form. This supports previous studies indicating that diverse leaf forms evolved to adapt to different climates.
To quantify the proportions of different leaf types across global forests, we combined a global tree density distribution map with our individual-based leaf type models (see Methods and Supplementary Fig. 5). At the global scale, we estimate that of the ∼3 trillion adult trees presently existing, 29.1% (95% CI = 27.5–30.7%) are broadleaved evergreen, 27.1% (23.8–30.6%) are broadleaved deciduous, 38.4% (35.2–41.6%) are needle-leaved evergreen and 5.4% (4.3–6.6%) are needle-leaved deciduous (Fig. 4 and Supplementary Fig. 6). Even though needle-leaved tree species comprise less than 2% of the world’s estimated 73,000 tree species, this small fraction of diversity represents around 44% of individual trees on Earth.
Of the 1.15 trillion needle-leaved evergreen trees growing worldwide, the majority (64.7%) are found in boreal regions, while 23.1%, 8.9%, 2.8% are found in temperate, arid and tropical regions, respectively (Fig. 4). In contrast, of the 0.87 trillion broadleaved evergreen trees growing worldwide, the majority exists in tropical (57.2%) and arid (29.8%) regions, with 6.3% and 4.3% existing in temperate and boreal regions, respectively (Fig. 4). Broadleaved deciduous trees show the widest range of occurrences. Of the 0.81 trillion broadleaved deciduous trees, 34.5% are found in boreal regions, 28.8% in temperate regions, 22.5% in arid regions and 11.5% in tropical regions (Fig. 4). We further estimate that there are 0.16 trillion needle-leaved deciduous trees across the globe, the vast majority of which grow in boreal regions.
Using our basal-area-based model of leaf types, we were able to estimate the biomass contribution of each leaf type within individual forest pixels by integrating our data with a recently published aboveground biomass map. Our analysis revealed that broadleaved evergreen trees store the largest proportion of global forest biomass, accounting for 54.4% (335.7 Gt) out of the total biomass of 617 Gt. Broadleaved deciduous trees contribute 22.1% (136.2 Gt), needle-leaved evergreen trees contribute 20.5% (126.4 Gt) and needle-leaved deciduous trees contribute 4% (18.7 Gt) (Supplementary Fig. 7). Interestingly, despite there being 42% more evergreen needle-leaved trees compared with broadleaved evergreen trees, their biomass contribution is 62% (209.3 Gt) lower. This distribution of biomass across different leaf types provides valuable insights into the carbon storage capacity of diverse forest ecosystems.
Climate change will strongly affect the functioning of terrestrial ecosystems by altering growth, mortality and reproduction of trees and their interactions with leaf form and habit. Our models allowed us to highlight areas of potential risk by identifying the regions where future climates will shift to conditions that currently support leaf types different from the prevailing ones. In these regions, trees are likely to experience more climatic stress in the future. To assess the extent and distribution of future changes in forest leaf-type climate envelopes, we projected our leaf-type models into the future using three climate change scenarios (low-emission scenario (SSP1–RCP2.6), business-as-usual scenario (SSP3–RCP7) and high-emission scenario (SSP5–RCP8.5)). To do so, we used our random forest models of present leaf type distributions and replaced all climate variables reflecting the 1979–2013 climate (see Supplementary Fig. 1) with climate model projections for 2071–2100 while keeping soil, topographic, vegetative and anthropogenic characteristics constant.
The results suggest that forests will experience substantial shifts in leaf-type climate conditions. Depending on future emissions pathways, 17 to 34% of future forested regions are projected to experience a climate by the end of the century that currently supports leaf types different from the prevailing ones (Fig. 5 and Supplementary Fig. 8; see Supplementary Fig. 9 for an alternative definition of forest types). The climate conditions that have historically supported evergreen forests are declining as global conditions shift towards those that have historically supported more deciduous forests, and this appears to be the case for both broadleaved and needle-leaved species (Supplementary Fig. 10). Specifically, 7–20% of the broadleaved evergreen forests are likely to experience changes towards conditions that currently support deciduous species (Fig. 5). Similarly, 29–67% of the needle-leaved evergreen forests will experience changes towards climate conditions that currently support mixed or deciduous forests (Fig. 5 and Supplementary Figs. 8–12). If these climate projections are realized, plants in those regions must either tolerate more stressful environmental conditions or shift their distributions, causing changes in forest composition. Previous studies predicting ecoregion shifts have also shown a heightened vulnerability to changing climate conditions, surpassing even the susceptibility of leaf types.
To address the uncertainties associated with our predictions, we employed a subsampling approach, running 100 independent models. This allowed us to assess the range of variability in our predictions and identify areas with lower predictive confidence. The resulting maps of model uncertainty (Supplementary Fig. 3) highlight regions where caution should be exercised when interpreting our predictions.
Incorporating local point observations into the model training was not feasible because the ground-based forest inventory data we used did not include field-measured environmental and soil characteristics. Although we examined additional point-level observations from the World Soil Information Service (WOSIS) dataset (see next paragraph), these observations often did not align spatially with the forest inventory plots, resulting in a reduced sample size (>80% reduction) and geographic bias (Fig. 1 and Supplementary Fig. 14). Moreover, training the model using point observations while relying on soil layers for model prediction could introduce further uncertainties and biases. To avoid these limitations, we conducted both model training and predictions using the same soil layers.
Nevertheless, we conducted analyses to determine the agreement between results based on global soil layers and point-level soil observations. We matched our forest inventory dataset with the WOSIS dataset, which contains local point observations of soil features. These analyses indicated that: (1) model predictions remained consistent when using point observations instead of global layers of soil characteristics (97–99% similarity in predictions), (2) the global soil layers exhibited good agreement with point observations for most soil variables, particularly the four most important variables (R2 = 0.42–0.62; Fig. 3c and Supplementary Fig. 13) and (3) the inferred importance of soil features in leaf type variation remained similar (<5% difference) when using point observations instead of soil layers as predictors (Supplementary Fig. 14). These analyses underscore the crucial role of soil features in determining global leaf-type variation.
In total, 58 global environmental layers reflecting climate, topography, vegetation, anthropogenic and soil (at 0 cm to 100 cm depth) characteristics were used as covariates in our analyses (Supplementary Fig. 1). Climate variables reflect the average climate between 1979–2013. All layers were standardized to 30-arc-second resolution (∼1 km2 at the equator).
To train global models of forest leaf types, we first ran a grid-search procedure, exploring the results of a suite of random forest models with different hyperparameters. The hyperparameters that were allowed to vary were the number of random trees (10, 20, 50, 100 and 250), the number of variables sampled at each split (1, 2, 4, 5, 8, 10, 15, 20 and 30) and the minimum sample size at the end of the nodes (1, 2, 5, 10, 15, 20 and 30); subsampling rate was constant at 0.632. To quantify predictive accuracy, we used the Bhattacharyya coefficient to compare the predicted and observed probabilities for each pixel, as is commonly used in image processing and feature extraction. Because four probability classes within a pixel are not independent, we cannot use standard loss functions to estimate predictive accuracy. Instead, we used the Bhattacharyya coefficient, given by , which quantifies the similarity between two vectors, O and P, with n categories. The Bhattacharyya coefficient falls within (0,1), equalling one only when for all i within a pixel (that is, when the predictions exactly match the observed) and zero when the vectors are completely disjoint. To evaluate overall model performance, we then adopted a similar approach to that of ref. for multinomial data, using the Bhattacharyya coefficients to calculate a pseudo-R2 () (equations 1∼5) on the basis of 10-fold cross-validation:in whichandwhere, MAE is the mean absolute error, O is the observed coverage of leaf type i, P is the predicted coverage of leaf type i (on out-of-fit data via cross-validation), is the average coverage of leaf type i across all the observations and n is the number of forest leaf types (here, n = 4). Note that the summation terms in equations (3) and (5) are the Bhattacharyya coefficients between the observed multinomial distribution and the predicted distribution (equation 3) and the average distribution (equation 5). Thus, equation (3) is the predictive loss term for each pixel, with the MAEmodel in equation (2) giving the average across all pixels, which equals zero only when the predictions perfectly match the observations. Similarly, equation (5) is the loss term for each pixel when using group-level means, with MAEmean in equation (4) giving the average loss across all pixels. By comparing MAEmodel to MAEmean, we follow the standard approach for computing R2 by quantifying performances relative to human predictions, with R2 =  equalling 1 only when our predictions are perfect (MAEmodel = 0) and R2 being 0 when our predictions are equal to or worse than the mean. Importantly, as suggested in ref. , equation (1) is estimated using out-of-fit cross-validated data, where the predicted values are estimated by omitting the corresponding observed values from the training data, with the resulting pseudo-R2 used to assess our four-element model output.
To create the final maps (Fig. 2 and Supplementary Fig. 5), we used the random forest model for each training dataset with the optimal suite of hyperparameters based on the from the grid search. Extrapolation of our predictions across global forest areas resulted in 100 four-band global layers, with each band representing the global probability of one forest leaf type. We averaged the predictions from these 100 model outputs by taking the mean for the final map. We calculated the 95% confidence intervals across the 100 model layers to represent sampling uncertainty.
As an alternative mapping approach, we used an independent tree-based classification and regression trees (CART) model (Supplementary Fig. 15). This approach was used to test whether model performance depends on model type. If the two models (random forest and CART) show similar results, this indicates that predictions are not biased by model selection. Using the same independent ‘Tallo’ dataset used for testing the robustness of the random forest model, the CART model had an explanatory power of 0.46, which is similar to the of 0.47 of the random forest model. When directly comparing both models, the CART model showed 87% similarity () with the random forest model. This suggests that our maps and predictions do not depend on the type of model, and we report the random forest model results throughout the main text since it showed slightly higher accuracy.
To generate global layers of soil features, the Soil Grids dataset relies on machine learning models informed by global, spatially explicit information on various climate variables. This global interpolation of soil information using climate data may thus lead to an overestimation of the covariation between climate and soil layers while reducing small-scale heterogeneity in soil features. To assess whether this potential caveat affects our results, we used point-level soil measurements from the WOSIS dataset, including clay content, silt content, pH and sand content. To spatially match this dataset with the full GFBi dataset containing more than 1.1 million plots across the globe, we used the ‘geosphere’ R package. We then selected the nearest soil observation that fell within 250 m or 1,000 m of the centre of each forest plot. This resulted in a spatial match between soil measurements and forest plots in 146 cases for the 250 m radius and in 1,893 cases for the 1,000 m radius (Supplementary Fig. 14a). To test whether model performance and predictions change when using point observations of soil features instead of global layers, we then trained random forest models using either WOSIS or Soil Grids soil data along with information on climate, topography, human and vegetation characteristics (Supplementary Fig. 14b–d). For both the 250 m and 1,000 m buffer radii, the models showed a high degree of agreement () between model predictions.
In a second step, we then tested whether the use of point observations vs global layers of soil features affects the estimated importance of variables in driving leaf-type variation. The analysis showed a slight reduction in the importance of soil variables of 5–6% when using point observations rather than Soil Grids data (Supplementary Fig. 13), which is probably driven by the slightly lower covariation of soil point data with climate variables (Supplementary Fig. 16). Nevertheless, the results remain similar, showing that this difference is unlikely to affect the conclusions of our study.
To evaluate how well our training dataset represents the full multivariate environmental covariate space, we performed a principal-component-analysis-based approach following refs. . We projected the covariate composite into the same space using the centreing values, scaling values and eigenvectors from the principal component analysis of the training data. We created the convex hulls for each of the bivariate combinations from the top principal components and classified whether each pixel falls in or outside each of these convex hulls. We used 24 principal components with 276 combinations for all covariates for the sampling dataset. This analysis showed that 99.2% of land pixels (778,975,911 of 785,150,461) cover at least 90% of the environmental variables present in our training data locations (Supplementary Fig. 17).
To account for the potential effect of spatial autocorrelation in model residuals on model validation statistics, we ran spatially buffered LOO-CV for a series of buffer radii from 10 m to 500 km. In LOO-CV, each observation is predicted on the basis of a model that includes all data outside the respective buffer radius. This results in 9,781 (=total number of observations) separate models for each buffer radius. Model performance was evaluated on the basis of . To assess the range of spatial autocorrelation, we ran semi-variograms for random cross-validation and LOO-CV model residuals in each forest type, showing that regardless of buffer radius or validation type, our residuals show weak spatial autocorrelation (Supplementary Fig. 1). Nevertheless, when eliminating any potential effects of spatial autocorrelation on model performance by applying large buffer radii of 300 km and 500 km, the out-of-sample remained high (0.56 and 0.53, respectively).
To evaluate the relative importance of environmental features on forest leaf-type variation, out of the total 58 environmental covariates, we tested the effects of 29 commonly used environmental characteristics using random forest models (see Supplementary Fig. 1). We separated the environmental features into four major groups: climate, soil, topography and vegetation. To equally represent each of the three groups in the model and minimize collinearity, we selected the first six principal components from climate, soil and topography groups. These six principal components explained ≥90% of the total variation across all group variables. We included all the three vegetation characteristics, which are forest age, tree density and canopy height, into the analysis without computing the principal components. We then computed the variance inflation factors (VIFs) across all 21 principal components using the R package HH. All VIFs were lower than 4, suggesting sufficient independence among covariates. The principal components were then used as predictors in random forest models with different combinations of hyperparameters (that is, 1 to 12 samples per split and a minimum sample size at the end of the nodes of 1 to 10), and we selected the ten best models with the highest coefficient of determining variation (R2). Variable importance was determined by calculating the relative influence of each variable, expressed by the variable’s attributed reduction in squared error. To quantify the importance of individual environmental factors, we used the same combinations of hyperparameters. Based on R2, the ten best random forest models were again chosen to explore the relative importance of each factor. The random forest models were run via the R package h2o.
To assess the extent and distribution of future changes in forest leaf-type climate envelopes, we projected our leaf-type models into the future using CMIP6 climate models for the time interval of 2071–2100 and three emission scenarios (SSP1–RCP2.6, SSP3–RCP7 and SSP5–RCP8.5). The CMIP6 climate models were extracted following the ISIMIP3b protocol and included five models (gfdl-esm4, ukesm1-0-II, mpi-esm1-2-hr, ipsl-cm6a-lr and mri-esm2-0). We projected the global forest leaf-type distribution for each emission scenario on the basis of each of the five climate models using our random forest models. To do so, we used our 100 random forest models of present leaf-type distributions and replaced the climate variables (‘bioclim’ variables from the CHELSA dataset, marked with hashtags in Supplementary Fig. 1) with CMIP6 climate model projections while keeping soil, topographic, vegetative and anthropogenic characteristics constant. For each emission scenario and CMIP6 model, we aggregated the 100 layers by taking the mean. We then aggregated the five CMIP6 model projections by taking the mean. For both evergreen vs deciduous and broadleaf vs needle-leaf proportions, we then subtracted the present predictions by the averaged model projections for the three climate change scenarios. We then summarized the predictions for each of the three climate change scenarios across latitude, aggregating predictions for each half degree. This allowed us to identify areas where future climates will shift to conditions that currently support leaf types different from the prevailing ones (Fig. 5 and Supplementary Fig. 9). To do so, we first obtained information on the forest type that currently is most abundant in each pixel (using area-based leaf-type proportions). Pixels in which > 60% of the forest area was covered by a single leaf type were assigned to that respective leaf type. Pixels in which none of the leaf types covered > 60% of the forest area were categorized as mixed forest (Fig. 5). The analysis was also conducted for a forest-area threshold of 80% to ensure that the results are not driven by the choice of the threshold (Supplementary Fig. 9). Following these definitions, we categorized forest pixels into groups using present and future climate scenarios. By scaling pixels by canopy cover, we could then calculate the total areas in which the climate is expected to shift to conditions that currently support a different forest type.
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.


Sections

"[{\"pmc\": \"PMC10654052\", \"pmid\": \"37872262\", \"reference_ids\": [\"Fig1\"], \"section\": \"Main\", \"text\": \"Here we analyse the global distribution of needle-leaved, broadleaved, evergreen and deciduous tree species, by integrating ground-sourced information from 9,781 standardized forest inventory plots in the Global Forest Biodiversity initiative (GFBi) database with species-level leaf habit (evergreen vs deciduous) and leaf form (broadleaf vs needle-leaf) information accessed from the TRY plant trait database (Fig. 1a). The 9,781 forest inventory plots represented a subsample of the full GFBi data of >1 million records to ensure an equal representation of different forest biomes across the globe and avoid modelling bias caused by uneven spatial sampling (see Methods). Using information on both the occurrences of individual trees per plot and the basal-area weighted occurrences of each individual, we calculated plot-level leaf-type proportions (1) on the basis of the leaf type of each individual tree (tree-based leaf type) and (2) by weighting each tree by its basal area (area-based leaf type) (see Methods section 'Tree leaf-type data'). The first estimate allowed us to map the tree densities of each leaf type, while the second estimate allowed us to map the leaf types represented by the largest trees in a plot. To interpolate patterns across the globe, we combined our plot-level forest leaf information with 58 environmental variables, representing vegetation characteristics, climate, topography, vegetation, soil conditions and human-related features. We also tested the relative importance of 29 commonly studied variables on leaf-type variation and characterized the relationships between environmental features and leaf type.\"}, {\"pmc\": \"PMC10654052\", \"pmid\": \"37872262\", \"reference_ids\": [\"Fig1\", \"Fig1\"], \"section\": \"Mapping global forest leaf types\", \"text\": \"To characterize the variation in forest leaf type, we first summarized the proportion of evergreen vs deciduous (leaf habit) and broadleaved vs needle-leaved (leaf form) individuals within each plot (Fig. 1a). Across all 9,781 forest plots, 13% contain exclusively broadleaved evergreen trees, 7% contain exclusively broadleaved deciduous trees, 12% contain exclusively needle-leaved evergreen trees, and the remaining 68% contain a mixture of leaf habits and forms (Fig. 1b).\"}, {\"pmc\": \"PMC10654052\", \"pmid\": \"37872262\", \"reference_ids\": [\"MOESM1\", \"Fig2\"], \"section\": \"Mapping global forest leaf types\", \"text\": \"Our random forest model predicted 75% of the global variation of forest leaf-type classes (10-fold cross-validation , see Methods). We also ran spatially buffered leave-one-out cross-validation (LOO-CV) to account for the potential effect of spatial autocorrelation on model evaluation statistics, which resulted in a coefficient of determination (R2) of 0.56 at a buffer radius of 300\\u2009km (see Methods and Supplementary Fig. 1). Within each class, our model explained 90%, 59%, 75% and 29% (10-fold cross-validation R2) of the global variation in the proportion of broadleaved evergreen, broadleaved deciduous, needle-leaved evergreen and needle-leaved deciduous trees within forests, respectively. These predictive relationships were then used to upscale the observations across the global extent of forest coverage (Fig. 2).\"}, {\"pmc\": \"PMC10654052\", \"pmid\": \"37872262\", \"reference_ids\": [\"MOESM1\", \"MOESM1\", \"MOESM1\"], \"section\": \"Mapping global forest leaf types\", \"text\": \"While considerable uncertainties exist for individual predictions at the pixel level, these uncertainties rapidly decrease as the model is projected to a larger area (<5% at 250\\u2009km2; Supplementary Fig. 2). Our model shows high confidence in tropical and boreal forests, whereas predictive confidence is lower in mixed forests and ecotones between different types of forest (Supplementary Figs. 3 and 4). For example, models for broadleaved evergreen and deciduous species share low predictive confidence in central African savanna regions. Similarly, low predictive confidence can be found across eastern Russian mixed forests. The low predictive confidence for these regions can be attributed to low sample size and mixed occurrence of broadleaved evergreen and deciduous species, as well as differences in the year of observation across forest survey plots, which may lead to larger uncertainties in ecotones where forest types can shift in relatively short time periods.\"}, {\"pmc\": \"PMC10654052\", \"pmid\": \"37872262\", \"reference_ids\": [\"Fig3\", \"Fig3\", \"Fig3\"], \"section\": \"Global environmental drivers of forest leaf-type variation\", \"text\": \"To assess the relative importance of environmental features on variation in leaf types, we ran random forest models including a range of environmental variables (see Methods). We combined these environmental factors into three groups (climate, soil and topography). To test for the relative importance of climate, soil and topographic characteristics, we ran a principal component analysis (PCA) for each of these variable groups and selected the first six principal components, which explained \\u226590% of the total variation across all variables. Our analyses show that climate and soil characteristics jointly determine the global leaf-type distribution (Fig. 3a). With respect to variation in leaf habit, temperature fluctuations, that is, isothermality and temperature seasonality, were the most important variables (Fig. 3c). Yet, the entire combination of soil features (first six principal components of soil variables) was as important as climate for predicting leaf habit in our random forest model (Fig. 3a), suggesting that soil characteristics play an important role in the global distribution of evergreen \\u2018vs\\u2019 deciduous species. Especially soil texture, in combination with pH, appears to affect global variation in tree leaf habit. Acidic soils, commonly found in tropical and boreal regions, inhibit nutrient (N and P) supply by reducing cation availability and limiting tree growth rates. This might explain why broadleaved deciduous species that require high nutrient supply are less abundant in regions with acidic soil. Broadleaved evergreen species, by contrast, may better cope with nutrient poor, acidic soils. Similarly, needle-leaved evergreen species that can maintain growth even under low nutrient supply are more abundant in regions with acidic soil. The high tannin and phenol contents of needle leaves further contribute to the acidification of soils, probably creating a positive feedback towards coniferous dominance. Overall, our results point towards a feedback between tree leaf habits and soil conditions, highlighting the connection between physical soil features and soil water and nutrient availability.\"}, {\"pmc\": \"PMC10654052\", \"pmid\": \"37872262\", \"reference_ids\": [\"Fig3\", \"Fig3\"], \"section\": \"Global environmental drivers of forest leaf-type variation\", \"text\": \"Variation in leaf form was best predicted by climate variables (Fig. 3b), with the most important variable being temperature of the coldest quarter (Fig. 3d). By contrast to leaf habit, soil, topographic and vegetation features were less important in driving variation in leaf form, indicating adaptation to extreme climates, cold winters or extended dry seasons, as a major determinant of leaf form. This supports previous studies indicating that diverse leaf forms evolved to adapt to different climates.\"}, {\"pmc\": \"PMC10654052\", \"pmid\": \"37872262\", \"reference_ids\": [\"MOESM1\", \"Fig4\", \"MOESM1\"], \"section\": \"Computation of tree densities with different leaf types\", \"text\": \"To quantify the proportions of different leaf types across global forests, we combined a global tree density distribution map with our individual-based leaf type models (see Methods and Supplementary Fig. 5). At the global scale, we estimate that of the \\u223c3 trillion adult trees presently existing, 29.1% (95% CI\\u2009=\\u200927.5\\u201330.7%) are broadleaved evergreen, 27.1% (23.8\\u201330.6%) are broadleaved deciduous, 38.4% (35.2\\u201341.6%) are needle-leaved evergreen and 5.4% (4.3\\u20136.6%) are needle-leaved deciduous (Fig. 4 and Supplementary Fig. 6). Even though needle-leaved tree species comprise less than 2% of the world\\u2019s estimated 73,000 tree species, this small fraction of diversity represents around 44% of individual trees on Earth.\"}, {\"pmc\": \"PMC10654052\", \"pmid\": \"37872262\", \"reference_ids\": [\"Fig4\", \"Fig4\", \"Fig4\"], \"section\": \"Computation of tree densities with different leaf types\", \"text\": \"Of the 1.15 trillion needle-leaved evergreen trees growing worldwide, the majority (64.7%) are found in boreal regions, while 23.1%, 8.9%, 2.8% are found in temperate, arid and tropical regions, respectively (Fig. 4). In contrast, of the 0.87 trillion broadleaved evergreen trees growing worldwide, the majority exists in tropical (57.2%) and arid (29.8%) regions, with 6.3% and 4.3% existing in temperate and boreal regions, respectively (Fig. 4). Broadleaved deciduous trees show the widest range of occurrences. Of the 0.81 trillion broadleaved deciduous trees, 34.5% are found in boreal regions, 28.8% in temperate regions, 22.5% in arid regions and 11.5% in tropical regions (Fig. 4). We further estimate that there are 0.16 trillion needle-leaved deciduous trees across the globe, the vast majority of which grow in boreal regions.\"}, {\"pmc\": \"PMC10654052\", \"pmid\": \"37872262\", \"reference_ids\": [\"MOESM1\"], \"section\": \"Computation of tree densities with different leaf types\", \"text\": \"Using our basal-area-based model of leaf types, we were able to estimate the biomass contribution of each leaf type within individual forest pixels by integrating our data with a recently published aboveground biomass map. Our analysis revealed that broadleaved evergreen trees store the largest proportion of global forest biomass, accounting for 54.4% (335.7\\u2009Gt) out of the total biomass of 617\\u2009Gt. Broadleaved deciduous trees contribute 22.1% (136.2\\u2009Gt), needle-leaved evergreen trees contribute 20.5% (126.4\\u2009Gt) and needle-leaved deciduous trees contribute 4% (18.7\\u2009Gt) (Supplementary Fig. 7). Interestingly, despite there being 42% more evergreen needle-leaved trees compared with broadleaved evergreen trees, their biomass contribution is 62% (209.3\\u2009Gt) lower. This distribution of biomass across different leaf types provides valuable insights into the carbon storage capacity of diverse forest ecosystems.\"}, {\"pmc\": \"PMC10654052\", \"pmid\": \"37872262\", \"reference_ids\": [\"MOESM1\"], \"section\": \"Climatic risk assessment of future leaf types\", \"text\": \"Climate change will strongly affect the functioning of terrestrial ecosystems by altering growth, mortality and reproduction of trees and their interactions with leaf form and habit. Our models allowed us to highlight areas of potential risk by identifying the regions where future climates will shift to conditions that currently support leaf types different from the prevailing ones. In these regions, trees are likely to experience more climatic stress in the future. To assess the extent and distribution of future changes in forest leaf-type climate envelopes, we projected our leaf-type models into the future using three climate change scenarios (low-emission scenario (SSP1\\u2013RCP2.6), business-as-usual scenario (SSP3\\u2013RCP7) and high-emission scenario (SSP5\\u2013RCP8.5)). To do so, we used our random forest models of present leaf type distributions and replaced all climate variables reflecting the 1979\\u20132013 climate (see Supplementary Fig. 1) with climate model projections for 2071\\u20132100 while keeping soil, topographic, vegetative and anthropogenic characteristics constant.\"}, {\"pmc\": \"PMC10654052\", \"pmid\": \"37872262\", \"reference_ids\": [\"Fig5\", \"MOESM1\", \"MOESM1\", \"MOESM1\", \"Fig5\", \"Fig5\", \"MOESM1\", \"MOESM1\"], \"section\": \"Climatic risk assessment of future leaf types\", \"text\": \"The results suggest that forests will experience substantial shifts in leaf-type climate conditions. Depending on future emissions pathways, 17 to 34% of future forested regions are projected to experience a climate by the end of the century that currently supports leaf types different from the prevailing ones (Fig. 5 and Supplementary Fig. 8; see Supplementary Fig. 9 for an alternative definition of forest types). The climate conditions that have historically supported evergreen forests are declining as global conditions shift towards those that have historically supported more deciduous forests, and this appears to be the case for both broadleaved and needle-leaved species (Supplementary Fig. 10). Specifically, 7\\u201320% of the broadleaved evergreen forests are likely to experience changes towards conditions that currently support deciduous species (Fig. 5). Similarly, 29\\u201367% of the needle-leaved evergreen forests will experience changes towards climate conditions that currently support mixed or deciduous forests (Fig. 5 and Supplementary Figs. 8\\u201312). If these climate projections are realized, plants in those regions must either tolerate more stressful environmental conditions or shift their distributions, causing changes in forest composition. Previous studies predicting ecoregion shifts have also shown a heightened vulnerability to changing climate conditions, surpassing even the susceptibility of leaf types.\"}, {\"pmc\": \"PMC10654052\", \"pmid\": \"37872262\", \"reference_ids\": [\"MOESM1\"], \"section\": \"Methodological considerations\", \"text\": \"To address the uncertainties associated with our predictions, we employed a subsampling approach, running 100 independent models. This allowed us to assess the range of variability in our predictions and identify areas with lower predictive confidence. The resulting maps of model uncertainty (Supplementary Fig. 3) highlight regions where caution should be exercised when interpreting our predictions.\"}, {\"pmc\": \"PMC10654052\", \"pmid\": \"37872262\", \"reference_ids\": [\"Fig1\", \"MOESM1\"], \"section\": \"Methodological considerations\", \"text\": \"Incorporating local point observations into the model training was not feasible because the ground-based forest inventory data we used did not include field-measured environmental and soil characteristics. Although we examined additional point-level observations from the World Soil Information Service (WOSIS) dataset (see next paragraph), these observations often did not align spatially with the forest inventory plots, resulting in a reduced sample size (>80% reduction) and geographic bias (Fig. 1 and Supplementary Fig. 14). Moreover, training the model using point observations while relying on soil layers for model prediction could introduce further uncertainties and biases. To avoid these limitations, we conducted both model training and predictions using the same soil layers.\"}, {\"pmc\": \"PMC10654052\", \"pmid\": \"37872262\", \"reference_ids\": [\"Fig3\", \"MOESM1\", \"MOESM1\"], \"section\": \"Methodological considerations\", \"text\": \"Nevertheless, we conducted analyses to determine the agreement between results based on global soil layers and point-level soil observations. We matched our forest inventory dataset with the WOSIS dataset, which contains local point observations of soil features. These analyses indicated that: (1) model predictions remained consistent when using point observations instead of global layers of soil characteristics (97\\u201399% similarity in predictions), (2) the global soil layers exhibited good agreement with point observations for most soil variables, particularly the four most important variables (R2\\u2009=\\u20090.42\\u20130.62; Fig. 3c and Supplementary Fig. 13) and (3) the inferred importance of soil features in leaf type variation remained similar (<5% difference) when using point observations instead of soil layers as predictors (Supplementary Fig. 14). These analyses underscore the crucial role of soil features in determining global leaf-type variation.\"}, {\"pmc\": \"PMC10654052\", \"pmid\": \"37872262\", \"reference_ids\": [\"MOESM1\"], \"section\": \"Environmental covariates\", \"text\": \"In total, 58 global environmental layers reflecting climate, topography, vegetation, anthropogenic and soil (at 0\\u2009cm to 100\\u2009cm depth) characteristics were used as covariates in our analyses (Supplementary Fig. 1). Climate variables reflect the average climate between 1979\\u20132013. All layers were standardized to 30-arc-second resolution (\\u223c1\\u2009km2 at the equator).\"}, {\"pmc\": \"PMC10654052\", \"pmid\": \"37872262\", \"reference_ids\": [\"Equ1\", \"Equ5\", \"Equ3\", \"Equ5\", \"Equ3\", \"Equ5\", \"Equ3\", \"Equ2\", \"Equ5\", \"Equ4\", \"Equ1\"], \"section\": \"Random forest modelling\", \"text\": \"To train global models of forest leaf types, we first ran a grid-search procedure, exploring the results of a suite of random forest models with different hyperparameters. The hyperparameters that were allowed to vary were the number of random trees (10, 20, 50, 100 and 250), the number of variables sampled at each split (1, 2, 4, 5, 8, 10, 15, 20 and 30) and the minimum sample size at the end of the nodes (1, 2, 5, 10, 15, 20 and 30); subsampling rate was constant at 0.632. To quantify predictive accuracy, we used the Bhattacharyya coefficient to compare the predicted and observed probabilities for each pixel, as is commonly used in image processing and feature extraction. Because four probability classes within a pixel are not independent, we cannot use standard loss functions to estimate predictive accuracy. Instead, we used the Bhattacharyya coefficient, given by , which quantifies the similarity between two vectors, O and P, with n categories. The Bhattacharyya coefficient falls within (0,1), equalling one only when  for all i within a pixel (that is, when the predictions exactly match the observed) and zero when the vectors are completely disjoint. To evaluate overall model performance, we then adopted a similar approach to that of ref.  for multinomial data, using the Bhattacharyya coefficients to calculate a pseudo-R2 () (equations 1\\u223c5) on the basis of 10-fold cross-validation:in whichandwhere, MAE is the mean absolute error, O is the observed coverage of leaf type i, P is the predicted coverage of leaf type i (on out-of-fit data via cross-validation),  is the average coverage of leaf type i across all the observations and n is the number of forest leaf types (here, n\\u2009=\\u20094). Note that the summation terms in equations (3) and (5) are the Bhattacharyya coefficients between the observed multinomial distribution and the predicted distribution (equation 3) and the average distribution (equation 5). Thus, equation (3) is the predictive loss term for each pixel, with the MAEmodel in equation (2) giving the average across all pixels, which equals zero only when the predictions perfectly match the observations. Similarly, equation (5) is the loss term for each pixel when using group-level means, with MAEmean in equation (4) giving the average loss across all pixels. By comparing MAEmodel to MAEmean, we follow the standard approach for computing R2 by quantifying performances relative to human predictions, with R2\\u2009=\\u2009 equalling 1 only when our predictions are perfect (MAEmodel\\u2009=\\u20090) and R2 being 0 when our predictions are equal to or worse than the mean. Importantly, as suggested in ref. , equation (1) is estimated using out-of-fit cross-validated data, where the predicted values are estimated by omitting the corresponding observed values from the training data, with the resulting pseudo-R2 used to assess our four-element model output.\"}, {\"pmc\": \"PMC10654052\", \"pmid\": \"37872262\", \"reference_ids\": [\"Fig2\", \"MOESM1\"], \"section\": \"Random forest modelling\", \"text\": \"To create the final maps (Fig. 2 and Supplementary Fig. 5), we used the random forest model for each training dataset with the optimal suite of hyperparameters based on the  from the grid search. Extrapolation of our predictions across global forest areas resulted in 100 four-band global layers, with each band representing the global probability of one forest leaf type. We averaged the predictions from these 100 model outputs by taking the mean for the final map. We calculated the 95% confidence intervals across the 100 model layers to represent sampling uncertainty.\"}, {\"pmc\": \"PMC10654052\", \"pmid\": \"37872262\", \"reference_ids\": [\"MOESM1\"], \"section\": \"Random forest modelling\", \"text\": \"As an alternative mapping approach, we used an independent tree-based classification and regression trees (CART) model (Supplementary Fig. 15). This approach was used to test whether model performance depends on model type. If the two models (random forest and CART) show similar results, this indicates that predictions are not biased by model selection. Using the same independent \\u2018Tallo\\u2019 dataset used for testing the robustness of the random forest model, the CART model had an explanatory power of 0.46, which is similar to the  of 0.47 of the random forest model. When directly comparing both models, the CART model showed 87% similarity () with the random forest model. This suggests that our maps and predictions do not depend on the type of model, and we report the random forest model results throughout the main text since it showed slightly higher accuracy.\"}, {\"pmc\": \"PMC10654052\", \"pmid\": \"37872262\", \"reference_ids\": [\"MOESM1\", \"MOESM1\"], \"section\": \"Cross-validation using external data\", \"text\": \"To generate global layers of soil features, the Soil Grids dataset relies on machine learning models informed by global, spatially explicit information on various climate variables. This global interpolation of soil information using climate data may thus lead to an overestimation of the covariation between climate and soil layers while reducing small-scale heterogeneity in soil features. To assess whether this potential caveat affects our results, we used point-level soil measurements from the WOSIS dataset, including clay content, silt content, pH and sand content. To spatially match this dataset with the full GFBi dataset containing more than 1.1 million plots across the globe, we used the \\u2018geosphere\\u2019 R package. We then selected the nearest soil observation that fell within 250\\u2009m or 1,000\\u2009m of the centre of each forest plot. This resulted in a spatial match between soil measurements and forest plots in 146 cases for the 250\\u2009m radius and in 1,893 cases for the 1,000\\u2009m radius (Supplementary Fig. 14a). To test whether model performance and predictions change when using point observations of soil features instead of global layers, we then trained random forest models using either WOSIS or Soil Grids soil data along with information on climate, topography, human and vegetation characteristics (Supplementary Fig. 14b\\u2013d). For both the 250\\u2009m and 1,000\\u2009m buffer radii, the models showed a high degree of agreement () between model predictions.\"}, {\"pmc\": \"PMC10654052\", \"pmid\": \"37872262\", \"reference_ids\": [\"MOESM1\", \"MOESM1\"], \"section\": \"Cross-validation using external data\", \"text\": \"In a second step, we then tested whether the use of point observations vs global layers of soil features affects the estimated importance of variables in driving leaf-type variation. The analysis showed a slight reduction in the importance of soil variables of 5\\u20136% when using point observations rather than Soil Grids data (Supplementary Fig. 13), which is probably driven by the slightly lower covariation of soil point data with climate variables (Supplementary Fig. 16). Nevertheless, the results remain similar, showing that this difference is unlikely to affect the conclusions of our study.\"}, {\"pmc\": \"PMC10654052\", \"pmid\": \"37872262\", \"reference_ids\": [\"MOESM1\"], \"section\": \"Interpolation vs extrapolation in model predictions\", \"text\": \"To evaluate how well our training dataset represents the full multivariate environmental covariate space, we performed a principal-component-analysis-based approach following refs. . We projected the covariate composite into the same space using the centreing values, scaling values and eigenvectors from the principal component analysis of the training data. We created the convex hulls for each of the bivariate combinations from the top principal components and classified whether each pixel falls in or outside each of these convex hulls. We used 24 principal components with 276 combinations for all covariates for the sampling dataset. This analysis showed that 99.2% of land pixels (778,975,911 of 785,150,461) cover at least 90% of the environmental variables present in our training data locations (Supplementary Fig. 17).\"}, {\"pmc\": \"PMC10654052\", \"pmid\": \"37872262\", \"reference_ids\": [\"MOESM1\"], \"section\": \"LOO-CV\", \"text\": \"To account for the potential effect of spatial autocorrelation in model residuals on model validation statistics, we ran spatially buffered LOO-CV for a series of buffer radii from 10\\u2009m to 500\\u2009km. In LOO-CV, each observation is predicted on the basis of a model that includes all data outside the respective buffer radius. This results in 9,781 (=total number of observations) separate models for each buffer radius. Model performance was evaluated on the basis of . To assess the range of spatial autocorrelation, we ran semi-variograms for random cross-validation and LOO-CV model residuals in each forest type, showing that regardless of buffer radius or validation type, our residuals show weak spatial autocorrelation (Supplementary Fig. 1). Nevertheless, when eliminating any potential effects of spatial autocorrelation on model performance by applying large buffer radii of 300\\u2009km and 500\\u2009km, the out-of-sample  remained high (0.56 and 0.53, respectively).\"}, {\"pmc\": \"PMC10654052\", \"pmid\": \"37872262\", \"reference_ids\": [\"MOESM1\"], \"section\": \"Environmental drivers of forest leaf-type variations\", \"text\": \"To evaluate the relative importance of environmental features on forest leaf-type variation, out of the total 58 environmental covariates, we tested the effects of 29 commonly used environmental characteristics using random forest models (see Supplementary Fig. 1). We separated the environmental features into four major groups: climate, soil, topography and vegetation. To equally represent each of the three groups in the model and minimize collinearity, we selected the first six principal components from climate, soil and topography groups. These six principal components explained \\u226590% of the total variation across all group variables. We included all the three vegetation characteristics, which are forest age, tree density and canopy height, into the analysis without computing the principal components. We then computed the variance inflation factors (VIFs) across all 21 principal components using the R package HH. All VIFs were lower than 4, suggesting sufficient independence among covariates. The principal components were then used as predictors in random forest models with different combinations of hyperparameters (that is, 1 to 12 samples per split and a minimum sample size at the end of the nodes of 1 to 10), and we selected the ten best models with the highest coefficient of determining variation (R2). Variable importance was determined by calculating the relative influence of each variable, expressed by the variable\\u2019s attributed reduction in squared error. To quantify the importance of individual environmental factors, we used the same combinations of hyperparameters. Based on R2, the ten best random forest models were again chosen to explore the relative importance of each factor. The random forest models were run via the R package h2o.\"}, {\"pmc\": \"PMC10654052\", \"pmid\": \"37872262\", \"reference_ids\": [\"MOESM1\", \"Fig5\", \"MOESM1\", \"Fig5\", \"MOESM1\"], \"section\": \"Forest types and their future climate envelopes\", \"text\": \"To assess the extent and distribution of future changes in forest leaf-type climate envelopes, we projected our leaf-type models into the future using CMIP6 climate models for the time interval of 2071\\u20132100 and three emission scenarios (SSP1\\u2013RCP2.6, SSP3\\u2013RCP7 and SSP5\\u2013RCP8.5). The CMIP6 climate models were extracted following the ISIMIP3b protocol and included five models (gfdl-esm4, ukesm1-0-II, mpi-esm1-2-hr, ipsl-cm6a-lr and mri-esm2-0). We projected the global forest leaf-type distribution for each emission scenario on the basis of each of the five climate models using our random forest models. To do so, we used our 100 random forest models of present leaf-type distributions and replaced the climate variables (\\u2018bioclim\\u2019 variables from the CHELSA dataset, marked with hashtags in Supplementary Fig. 1) with CMIP6 climate model projections while keeping soil, topographic, vegetative and anthropogenic characteristics constant. For each emission scenario and CMIP6 model, we aggregated the 100 layers by taking the mean. We then aggregated the five CMIP6 model projections by taking the mean. For both evergreen vs deciduous and broadleaf vs needle-leaf proportions, we then subtracted the present predictions by the averaged model projections for the three climate change scenarios. We then summarized the predictions for each of the three climate change scenarios across latitude, aggregating predictions for each half degree. This allowed us to identify areas where future climates will shift to conditions that currently support leaf types different from the prevailing ones (Fig. 5 and Supplementary Fig. 9). To do so, we first obtained information on the forest type that currently is most abundant in each pixel (using area-based leaf-type proportions). Pixels in which > 60% of the forest area was covered by a single leaf type were assigned to that respective leaf type. Pixels in which none of the leaf types covered > 60% of the forest area were categorized as mixed forest (Fig. 5). The analysis was also conducted for a forest-area threshold of 80% to ensure that the results are not driven by the choice of the threshold (Supplementary Fig. 9). Following these definitions, we categorized forest pixels into groups using present and future climate scenarios. By scaling pixels by canopy cover, we could then calculate the total areas in which the climate is expected to shift to conditions that currently support a different forest type.\"}, {\"pmc\": \"PMC10654052\", \"pmid\": \"37872262\", \"reference_ids\": [\"MOESM2\"], \"section\": \"Reporting summary\", \"text\": \"Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.\"}]"

Metadata

"{}"