Skip to contents
chooseGCM_SDM.knit

This Rmarkdown is part of the following article:

Esser, L.F., Bailly, D., Lima, M.R., Ré, R. 2025. chooseGCM: a toolkit to select General Circulation Models in R. Global Change Biology.

Introduction

One source of variability in climate change assessments is the choice of General Circulation Models (GCMs). There is a lack of consensus on how to properly choose GCMs for a given study. An ideal approach would be to encompass all GCMs, but this is exceedingly costly in terms of computational requirements. chooseGCM is a solution for GCMs selection in climate change research. It proposes a solution by supplying different algorithms, metrics and functions to deal with GCMs. We built this Rmarkdown to showcase the impact of the package on Species Distribution Modeling (SDMs), a field largely affected by GCM selection. Here we use the caretSDM package to build SDMs. At the end, we present the processing time to achieve each result and their correlation.

# Install and open CRAN libraries
for(x in c("devtools", "geodata", "terra", "tictoc") ){
  if (!require(x, character.only = TRUE)) {
      install.packages(x, dependencies = TRUE)
      library(x, character.only = TRUE)
  }
}

# Install and open chooseGCM
if (!require("chooseGCM", character.only = TRUE)) {
    devtools::install_github("luizesser/chooseGCM")
    library("chooseGCM", character.only = TRUE)
}

# Install and open caretSDM
if (!require("caretSDM", character.only = TRUE)) {
    devtools::install_github("luizesser/caretSDM")
    library("caretSDM", character.only = TRUE)
}

Methods

To make methods more reproducible, we set a seed, avoiding any random effect that can arise in the modeling process.

set.seed(1)

GCMs data

To predict the impact of climate change, we first downloaded the WorldClim 2.1 (Fick & Hijmans, 2017) data to current scenario and all available GCMs, reducing all other sources of variation by obtaining raster stacks only relative to the SSP5-8.5/2090 (10 arcmin resolution). This SSP is especially useful because it presents extreme scenarios, where variation between GCMs is higher (IPCC, 2022). Then, we imported GCM data to R by applying the function import_gcms, which returns a list of SpatRasters, each SpatRaster being a GCM. This structure will be used as input to all the other functions in the chooseGCM package.

# Download current data
geodata::worldclim_global("bio", 10, path="~/storage/WC_data/current")

# Download GCMs data
gcms <- c("ACCESS-CM2", "ACCESS-ESM1-5", "CanESM5-CanOE", "CMCC-ESM2", 
          "CNRM-CM6-1", "CNRM-CM6-1-HR", "CNRM-ESM2-1", "EC-Earth3-Veg", 
          "EC-Earth3-Veg-LR", "FIO-ESM-2-0", "GISS-E2-1-G", "GISS-E2-1-H", 
          "HadGEM3-GC31-LL", "INM-CM4-8", "INM-CM5-0", "IPSL-CM6A-LR", 
          "MIROC-ES2L", "MIROC6", "MPI-ESM1-2-HR", "MPI-ESM1-2-LR", 
          "MRI-ESM2-0", "UKESM1-0-LL")

for ( g in gcms ) {
  geodata::cmip6_world(model = g,
                       ssp = "585",
                       time = "2081-2100",
                       var = "bioc",
                       res = 10,
                       path = "~/storage/WC_data/future"
  )
}
# Import GCMs
s <- chooseGCM::import_gcms(path = "~/storage/WC_data/future/climate/wc2.1_10m", 
                            gcm_names = c("ACCESS-CM2", "ACCESS-ESM1-5", "CanESM5-CanOE",
                                          "CMCC-ESM2", "CNRM-CM6-1", "CNRM-CM6-1-HR", 
                                          "CNRM-ESM2-1", "EC-Earth3-Veg", "EC-Earth3-Veg-LR",
                                          "FIO-ESM-2-0", "GISS-E2-1-G", "GISS-E2-1-H",
                                          "HadGEM3-GC31-LL", "INM-CM4-8", "INM-CM5-0", 
                                          "IPSL-CM6A-LR", "MIROC-ES2L", "MIROC6", 
                                          "MPI-ESM1-2-HR", "MPI-ESM1-2-LR", "MRI-ESM2-0",
                                          "UKESM1-0-LL"))

Species data and study area

For this demonstration, we will use Araucaria angustifolia, the brazilian pine, as a study case. This flagship species is widely modeled and know, and is a key species for the Atlantic Rainforest biodiversity hotspot. Species records were obtained from GBIF and cleaned by excluding records from outside the native region and NAs. This set of records was then used to build a shape of accessible area by merging buffers of 500 km around each record.

occ_data <- geodata::sp_occurrence("Araucaria", "angustifolia")
occ_data <- occ_data |>
  dplyr::filter(lon >= -64) |> 
  dplyr::filter(lon <= -33) |> 
  dplyr::filter(lat <= -19.5) |> 
  dplyr::filter(lat >= -33)

study_area_Araucaria <- na.omit(occ_data[,c("lon", "lat")]) |>
  sf::st_as_sf(coords= c(1,2), crs = 4326) |> 
  sf::st_buffer(dist = 500000) |> 
  sf::st_union()
plot(study_area_Araucaria, axes=T)

Using the caretSDM package, we transformed the accessible area shape into a 20 km regular grid projected in the coordinate reference system (CRS) 6933, which is the most suitable metric CRS for Brazil. This CRS keeps the area of each cell equal all along the study area, avoiding meridian convergence that could influence area results. We then added predictors data (WorldClim’s current data previously downloaded) to the grid by averaging the values falling in the same cell. Finally, we excluded “quarter” variables to avoid statistical biases.

tictoc::tic("Building sdm_area object")
sa <- caretSDM::sdm_area(sf::st_as_sf(study_area_Araucaria), 
                            cell_size = 20000, 
                            crs = 6933, 
                            gdal = T) |> 
  caretSDM::add_predictors(stars::read_stars(list.files("~/storage/WC_data/current/climate/wc2.1_10m/", 
                                              full.names = T), 
                                   along = "band", 
                                   normalize_path = F)) |> 
  caretSDM::set_predictor_names(c("bio1",  "bio2",  "bio3",  "bio4",  "bio5", 
                                  "bio6",  "bio7",  "bio8",  "bio9",  "bio10", 
                                  "bio11", "bio12", "bio13", "bio14", "bio15",
                                  "bio16", "bio17", "bio18", "bio19"))  |>
  caretSDM::select_predictors(c("bio1", "bio2",  "bio3",  "bio4",  "bio5", "bio6", 
                                "bio7", "bio12", "bio13", "bio14", "bio15"))
## ! Making grid over study area is an expensive task. Please, be patient!
## ℹ Using GDAL to make the grid and resample the variables.
## ! Making grid over the study area is an expensive task. Please, be patient!
## ℹ Using GDAL to make the grid and resample the variables.
sa
##           caretSDM         
## ...........................
## Class                     : sdm_area
## Extent                    : -5893561 -4344152 -3753561 -1924152 (xmin, xmax, ymin, ymax)
## CRS                       : WGS 84 / NSIDC EASE- 
## Resolution                : (20000, 20000) (x, y)
## Number of Predictors      : 11 
## Predictors Names          : bio1, bio2, bio3, bio4, bio5, bio6, bio7, bio12, bio13, bio14, bio15
tictoc::toc()
## Building sdm_area object: 61.312 sec elapsed

To avoid the use of multiple records with the same environmental information, we included species records in the grid, filtering one record per cell. When using caretSDM, we inform the CRS of coordinates and it transforms internally the records to the grid’s CRS.

# Merge the grid data with occurrences data
tictoc::tic("Building occurrences_sdm object")
oc <- caretSDM::occurrences_sdm(na.omit(occ_data[c("species", "lon", "lat")]), crs = 4326) |> 
  caretSDM::join_area(sa)
## Warning: Some records from `occ` do not fall in `pred`.
## ℹ 1 elements from `occ` were excluded.
## ℹ If this seems too much, check how `occ` and `pred` intersect.
oc
##         caretSDM       
## .......................
## Class                 : occurrences
## Species Names         : Araucaria angustifolia 
## Number of presences   : 1706 
## =========================================
## Data:
## Simple feature collection with 6 features and 2 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -5071809 ymin: -3590793 xmax: -4166610 ymax: -2786901
## Projected CRS: WGS 84 / NSIDC EASE-Grid 2.0 Global
##   cell_id                species                  geometry
## 1   10058 Araucaria angustifolia POINT (-4830370 -3565372)
## 2    5333 Araucaria angustifolia POINT (-4166610 -2786901)
## 3    8094 Araucaria angustifolia POINT (-5071809 -3259770)
## 4    5443 Araucaria angustifolia POINT (-4397387 -2823145)
## 5    5443 Araucaria angustifolia POINT (-4397271 -2823148)
## 6   10176 Araucaria angustifolia POINT (-4908967 -3590793)
tictoc::toc()
## Building occurrences_sdm object: 0.231 sec elapsed

Data cleaning routine excluded records from NAs, coordinates in capitals, countries’ centroids, duplicated coordinates, identical latitude/longitude, invalid coordinates, non-terrestrial coordinates, and records falling within the same grid cell. Coordinates from scientific institutions were kept since there are multiple important institutions in the species native region and accessible area.

# Data cleaning routine
tictoc::tic("Data cleaning")
i <- caretSDM::input_sdm(oc, sa) |> 
     caretSDM::data_clean(institutions = FALSE)
## Testing country capitals
## Removed 0 records.
## Testing country centroids
## Removed 0 records.
## Testing duplicates
## Removed 361 records.
## Testing equal lat/lon
## Removed 0 records.
## Testing coordinate validity
## Removed 0 records.
## Testing sea coordinates
## Reading layer `ne_110m_land' from data source `/tmp/RtmpARk8db/ne_110m_land.shp' using driver `ESRI Shapefile'
## Simple feature collection with 127 features and 3 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## Geodetic CRS:  WGS 84
## Removed 1 records.
## [1] "Predictors identified, procceding with grid filter (removing NA and duplicated data)."
i
##             caretSDM           
## ...............................
## Class                         : input_sdm
## --------  Occurrences  --------
## Species Names                 : Araucaria angustifolia 
## Number of presences           : 364 
## Data Cleaning                 : NAs, Capitals, Centroids, Geographically Duplicated, Identical Lat/Long, Invalid, Non-terrestrial, Duplicated Cell (grid) 
## --------  Predictors  ---------
## Number of Predictors          : 11 
## Predictors Names              : bio1, bio2, bio3, bio4, bio5, bio6, bio7, bio12, bio13, bio14, bio15
tictoc::toc()
## Data cleaning: 3.267 sec elapsed

The final number of records for the species was 364.

Variable selection routine

Variable selection was performed in WorldClim 2.1 bioclimatic variables by previously excluding ‘quarter’ variables to avoid statistical biases (Booth, 2022). We also performed a variance inflation factor (VIF) selection routine where the variable with higher VIF in the variable pair with the highest Pearson correlation is excluded until all correlations are lower than 0.5.

# Variable selection routine
tictoc::tic("Variable selection")
i <- caretSDM::vif_predictors(i, th = 0.5)
i
##             caretSDM           
## ...............................
## Class                         : input_sdm
## --------  Occurrences  --------
## Species Names                 : Araucaria angustifolia 
## Number of presences           : 364 
## Data Cleaning                 : NAs, Capitals, Centroids, Geographically Duplicated, Identical Lat/Long, Invalid, Non-terrestrial, Duplicated Cell (grid) 
## --------  Predictors  ---------
## Number of Predictors          : 11 
## Predictors Names              : bio1, bio2, bio3, bio4, bio5, bio6, bio7, bio12, bio13, bio14, bio15 
## Area (VIF)                    : all
## Threshold                     : 0.5
## Selected Variables (VIF)      : bio3, bio4, bio5, bio12, bio15
tictoc::toc()
## Variable selection: 0.206 sec elapsed

Variables remaining were bio3 (isothermality), bio4 (temperature seasonality), bio5 (max temperature of warmest month), bio12 (annual precipitation) and bio15 (precipitation seasonality).

Pseudoabsence selection

We obtained 10 sets of pseudoabsences from outside a bioclimatic envelope built with the selected variables and projected in the accessible area. The number of pseudoabsences was always equal to the number of presences to avoid imbalance problems (Japkowicz & Stephen, 2002).

# Pseudoabsences selection
tictoc::tic("Pseudoabsence selection")
i <- caretSDM::pseudoabsences(i, n_set = 10, method = "bioclim", variables_selected = "vif")
i
##             caretSDM           
## ...............................
## Class                         : input_sdm
## --------  Occurrences  --------
## Species Names                 : Araucaria angustifolia 
## Number of presences           : 364 
## Pseudoabsence methods         :
##     Method to obtain PAs      : bioclim 
##     Number of PA sets         : 10 
##     Number of PAs in each set : 364 
## Data Cleaning                 : NAs, Capitals, Centroids, Geographically Duplicated, Identical Lat/Long, Invalid, Non-terrestrial, Duplicated Cell (grid) 
## --------  Predictors  ---------
## Number of Predictors          : 11 
## Predictors Names              : bio1, bio2, bio3, bio4, bio5, bio6, bio7, bio12, bio13, bio14, bio15 
## Area (VIF)                    : all
## Threshold                     : 0.5
## Selected Variables (VIF)      : bio3, bio4, bio5, bio12, bio15
tictoc::toc()
## Pseudoabsence selection: 0.105 sec elapsed

Modeling framework

As the aim of this modeling is to highlight the effect of GCMs and the GCMs selection, we used only one machine learning algorithm to build models: the naive bayes, which is a simple and quick algorithm. If we had a proper modeling hypothesis to test we should probably use more algorithms covering a variety of modeling approaches. We searched for optimal hyperparameters using a grid-based approach with 10 different combinations. To validate models, we used a cross-validation approach with 4-folds. In each fold, the area under the receiver operating characteristic curve was calculated (AUC).

# Modeling framework
tictoc::tic("Modeling framework")
suppressWarnings(
  i <- caretSDM::train_sdm(i, 
                           algo = c("naive_bayes"), 
                           crtl = caret::trainControl(method = "repeatedcv", 
                                          number = 4, 
                                          repeats = 10, 
                                          search = "grid",
                                          classProbs = TRUE, 
                                          returnResamp = "all", 
                                          summaryFunction = caretSDM::summary_sdm, 
                                          savePredictions = "all"),
                            variables_selected = "vif")
)
i
tictoc::toc()
##             caretSDM           
## ...............................
## Class                         : input_sdm
## --------  Occurrences  --------
## Species Names                 : Araucaria angustifolia 
## Number of presences           : 364 
## Pseudoabsence methods         :
##     Method to obtain PAs      : bioclim 
##     Number of PA sets         : 10 
##     Number of PAs in each set : 364 
## Data Cleaning                 : NAs, Capitals, Centroids, Geographically Duplicated, Identical Lat/Long, Invalid, Non-terrestrial, Duplicated Cell (grid) 
## --------  Predictors  ---------
## Number of Predictors          : 11 
## Predictors Names              : bio1, bio2, bio3, bio4, bio5, bio6, bio7, bio12, bio13, bio14, bio15 
## Area (VIF)                    : all
## Threshold                     : 0.5
## Selected Variables (VIF)      : bio3, bio4, bio5, bio12, bio15 
## -----------  Models  ----------
## Algorithms Names              : naive_bayes 
## Variables Names               : bio3 bio4 bio5 bio12 bio15 
## Model Validation              :
##     Method                    : repeatedcv 
##     Number                    : 4 
##     Metrics                   :
## $`Araucaria angustifolia`
##          algo       ROC       TSS Sensitivity Specificity
## 1 naive_bayes 0.9930081 0.9304945     0.95435    0.976075
## [1] "Modeling framework: 3.918 sec elapsed"

GCMs selection

We selected four sets of GCMs through chooseGCM: (1) the baseline, which is a global set including all GCMs; (2) the smallest subset of GCMs selected using a hierarchical clustering with mean distance closer to the global mean distance; (3) the smallest subset of GCMs selected using the K-means clustering algorithm with mean distance closer to the global mean distance; (4) the subset of GCMs automatically provided by Closestdist algorithm. After some pre-analysis, both subsets generated by the hierarchical and K-means clustering returned a subset of size three. In this sense, we also considered important to analize two more sets: (5) a random set, which includes three randomly selected GCMs; (6) the subset of GCMs selected using the Closestdist algorithm with size equals three.

Global set

# All GCMs - the baseline we are trying to replicate
tictoc::tic("All GCMs")
all_gcms <- c("ACCESS-CM2", "ACCESS-ESM1-5", "CanESM5-CanOE", "CMCC-ESM2", 
              "CNRM-CM6-1", "CNRM-CM6-1-HR", "CNRM-ESM2-1", 
              "EC-Earth3-Veg", "EC-Earth3-Veg-LR", "FIO-ESM-2-0",
              "GISS-E2-1-G", "GISS-E2-1-H", "HadGEM3-GC31-LL",
              "INM-CM4-8", "INM-CM5-0", "IPSL-CM6A-LR", 
              "MIROC-ES2L", "MIROC6", "MPI-ESM1-2-HR", 
              "MPI-ESM1-2-LR", "MRI-ESM2-0", "UKESM1-0-LL")
tictoc::toc()
## All GCMs: 0.002 sec elapsed

Random set

# Random subset
tictoc::tic("Random subset")
random_subset <- sample(all_gcms, 3)
random_subset
## [1] "HadGEM3-GC31-LL" "MIROC-ES2L"      "GISS-E2-1-G"
tictoc::toc()
## Random subset: 0.003 sec elapsed

Hclust set

# Hierarchical Clustering subset
tictoc::tic("Hierarchical Clustering subset")
mc_hclust <- montecarlo_gcms(s, 
                             var_names = caretSDM::selected_variables(i), 
                             study_area = study_area_Araucaria, 
                             scale = TRUE,
                             perm = 10000, 
                             dist_method = "euclidean", 
                             clustering_method = "hclust")
## Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
## of ggplot2 3.3.4.
## ℹ The deprecated feature was likely used in the factoextra package.
##   Please report the issue at <https://github.com/kassambara/factoextra/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
mc_hclust
## $montecarlo_plot

## 
## $suggested_gcms
## $suggested_gcms$k2
## [1] "ACCESS-ESM1-5" "CanESM5-CanOE"
## 
## $suggested_gcms$k3
## [1] "ACCESS-ESM1-5" "CanESM5-CanOE" "MPI-ESM1-2-HR"
## 
## $suggested_gcms$k4
## [1] "ACCESS-ESM1-5" "CanESM5-CanOE" "MPI-ESM1-2-HR" "EC-Earth3-Veg"
## 
## $suggested_gcms$k5
## [1] "FIO-ESM-2-0"   "ACCESS-ESM1-5" "CanESM5-CanOE" "MPI-ESM1-2-HR"
## [5] "EC-Earth3-Veg"
## 
## $suggested_gcms$k6
## [1] "FIO-ESM-2-0"   "ACCESS-ESM1-5" "CanESM5-CanOE" "MPI-ESM1-2-HR"
## [5] "EC-Earth3-Veg" "MPI-ESM1-2-LR"
## 
## $suggested_gcms$k7
## [1] "FIO-ESM-2-0"   "ACCESS-ESM1-5" "CanESM5-CanOE" "MPI-ESM1-2-HR"
## [5] "EC-Earth3-Veg" "GISS-E2-1-G"   "MPI-ESM1-2-LR"
## 
## $suggested_gcms$k8
## [1] "FIO-ESM-2-0"   "ACCESS-ESM1-5" "CanESM5-CanOE" "CMCC-ESM2"    
## [5] "MPI-ESM1-2-HR" "EC-Earth3-Veg" "GISS-E2-1-G"   "MPI-ESM1-2-LR"
## 
## $suggested_gcms$k9
## [1] "FIO-ESM-2-0"   "ACCESS-ESM1-5" "CanESM5-CanOE" "CMCC-ESM2"    
## [5] "MPI-ESM1-2-HR" "MIROC-ES2L"    "EC-Earth3-Veg" "GISS-E2-1-G"  
## [9] "MPI-ESM1-2-LR"
## 
## $suggested_gcms$k10
##  [1] "FIO-ESM-2-0"   "ACCESS-ESM1-5" "CanESM5-CanOE" "CMCC-ESM2"    
##  [5] "MPI-ESM1-2-HR" "INM-CM5-0"     "EC-Earth3-Veg" "GISS-E2-1-G"  
##  [9] "MIROC-ES2L"    "MPI-ESM1-2-LR"
## 
## $suggested_gcms$k11
##  [1] "FIO-ESM-2-0"   "ACCESS-ESM1-5" "CanESM5-CanOE" "CMCC-ESM2"    
##  [5] "MPI-ESM1-2-HR" "INM-CM5-0"     "EC-Earth3-Veg" "GISS-E2-1-G"  
##  [9] "IPSL-CM6A-LR"  "MIROC-ES2L"    "MPI-ESM1-2-LR"
## 
## $suggested_gcms$k12
##  [1] "FIO-ESM-2-0"   "ACCESS-ESM1-5" "CanESM5-CanOE" "CMCC-ESM2"    
##  [5] "CNRM-ESM2-1"   "INM-CM5-0"     "EC-Earth3-Veg" "GISS-E2-1-G"  
##  [9] "MPI-ESM1-2-HR" "IPSL-CM6A-LR"  "MIROC-ES2L"    "MPI-ESM1-2-LR"
## 
## $suggested_gcms$k13
##  [1] "FIO-ESM-2-0"   "ACCESS-ESM1-5" "CanESM5-CanOE" "CMCC-ESM2"    
##  [5] "CNRM-ESM2-1"   "INM-CM5-0"     "EC-Earth3-Veg" "GISS-E2-1-G"  
##  [9] "INM-CM4-8"     "IPSL-CM6A-LR"  "MIROC-ES2L"    "MPI-ESM1-2-HR"
## [13] "MPI-ESM1-2-LR"
## 
## $suggested_gcms$k14
##  [1] "ACCESS-CM2"    "ACCESS-ESM1-5" "CanESM5-CanOE" "CMCC-ESM2"    
##  [5] "CNRM-ESM2-1"   "INM-CM5-0"     "EC-Earth3-Veg" "FIO-ESM-2-0"  
##  [9] "GISS-E2-1-G"   "INM-CM4-8"     "IPSL-CM6A-LR"  "MIROC-ES2L"   
## [13] "MPI-ESM1-2-HR" "MPI-ESM1-2-LR"
## 
## $suggested_gcms$k15
##  [1] "ACCESS-CM2"    "ACCESS-ESM1-5" "CanESM5-CanOE" "CMCC-ESM2"    
##  [5] "CNRM-ESM2-1"   "INM-CM5-0"     "EC-Earth3-Veg" "FIO-ESM-2-0"  
##  [9] "GISS-E2-1-G"   "INM-CM4-8"     "IPSL-CM6A-LR"  "MIROC-ES2L"   
## [13] "MIROC6"        "MPI-ESM1-2-HR" "MPI-ESM1-2-LR"
## 
## $suggested_gcms$k16
##  [1] "ACCESS-CM2"    "ACCESS-ESM1-5" "CanESM5-CanOE" "CMCC-ESM2"    
##  [5] "CNRM-ESM2-1"   "CNRM-CM6-1-HR" "EC-Earth3-Veg" "FIO-ESM-2-0"  
##  [9] "GISS-E2-1-G"   "INM-CM4-8"     "INM-CM5-0"     "IPSL-CM6A-LR" 
## [13] "MIROC-ES2L"    "MIROC6"        "MPI-ESM1-2-HR" "MPI-ESM1-2-LR"
## 
## $suggested_gcms$k17
##  [1] "ACCESS-CM2"    "ACCESS-ESM1-5" "CanESM5-CanOE" "CMCC-ESM2"    
##  [5] "CNRM-ESM2-1"   "CNRM-CM6-1-HR" "EC-Earth3-Veg" "FIO-ESM-2-0"  
##  [9] "GISS-E2-1-G"   "INM-CM4-8"     "INM-CM5-0"     "IPSL-CM6A-LR" 
## [13] "MIROC-ES2L"    "MIROC6"        "MPI-ESM1-2-HR" "MPI-ESM1-2-LR"
## [17] "MRI-ESM2-0"   
## 
## $suggested_gcms$k18
##  [1] "ACCESS-CM2"    "ACCESS-ESM1-5" "CanESM5-CanOE" "CMCC-ESM2"    
##  [5] "CNRM-ESM2-1"   "CNRM-CM6-1-HR" "EC-Earth3-Veg" "FIO-ESM-2-0"  
##  [9] "GISS-E2-1-G"   "GISS-E2-1-H"   "INM-CM4-8"     "INM-CM5-0"    
## [13] "IPSL-CM6A-LR"  "MIROC-ES2L"    "MIROC6"        "MPI-ESM1-2-HR"
## [17] "MPI-ESM1-2-LR" "MRI-ESM2-0"   
## 
## $suggested_gcms$k19
##  [1] "ACCESS-CM2"      "ACCESS-ESM1-5"   "CanESM5-CanOE"   "CMCC-ESM2"      
##  [5] "CNRM-ESM2-1"     "CNRM-CM6-1-HR"   "EC-Earth3-Veg"   "FIO-ESM-2-0"    
##  [9] "GISS-E2-1-G"     "GISS-E2-1-H"     "HadGEM3-GC31-LL" "INM-CM4-8"      
## [13] "INM-CM5-0"       "IPSL-CM6A-LR"    "MIROC-ES2L"      "MIROC6"         
## [17] "MPI-ESM1-2-HR"   "MPI-ESM1-2-LR"   "MRI-ESM2-0"     
## 
## $suggested_gcms$k20
##  [1] "ACCESS-CM2"       "ACCESS-ESM1-5"    "CanESM5-CanOE"    "CMCC-ESM2"       
##  [5] "CNRM-ESM2-1"      "CNRM-CM6-1-HR"    "EC-Earth3-Veg"    "EC-Earth3-Veg-LR"
##  [9] "FIO-ESM-2-0"      "GISS-E2-1-G"      "GISS-E2-1-H"      "HadGEM3-GC31-LL" 
## [13] "INM-CM4-8"        "INM-CM5-0"        "IPSL-CM6A-LR"     "MIROC-ES2L"      
## [17] "MIROC6"           "MPI-ESM1-2-HR"    "MPI-ESM1-2-LR"    "MRI-ESM2-0"      
## 
## $suggested_gcms$k21
##  [1] "ACCESS-CM2"       "ACCESS-ESM1-5"    "CanESM5-CanOE"    "CMCC-ESM2"       
##  [5] "CNRM-CM6-1"       "CNRM-CM6-1-HR"    "CNRM-ESM2-1"      "EC-Earth3-Veg"   
##  [9] "EC-Earth3-Veg-LR" "FIO-ESM-2-0"      "GISS-E2-1-G"      "GISS-E2-1-H"     
## [13] "HadGEM3-GC31-LL"  "INM-CM4-8"        "INM-CM5-0"        "IPSL-CM6A-LR"    
## [17] "MIROC-ES2L"       "MIROC6"           "MPI-ESM1-2-HR"    "MPI-ESM1-2-LR"   
## [21] "MRI-ESM2-0"
hclust_subset <- mc_hclust$suggested_gcms$k3
tictoc::toc()
## Hierarchical Clustering subset: 16.691 sec elapsed

Kmeans set

# K-means Clustering subset
tictoc::tic("K-means Clustering subset")
mc_kmeans <- montecarlo_gcms(s, 
                             var_names = caretSDM::selected_variables(i), 
                             study_area = study_area_Araucaria, 
                             scale = TRUE,
                             perm = 10000, 
                             dist_method = "euclidean", 
                             clustering_method = "kmeans")
mc_kmeans 
## $montecarlo_plot

## 
## $suggested_gcms
## $suggested_gcms$k2
##                  1                  2 
## "EC-Earth3-Veg-LR"      "CNRM-ESM2-1" 
## 
## $suggested_gcms$k3
##                  1                  2                  3 
## "EC-Earth3-Veg-LR"      "UKESM1-0-LL"      "CNRM-ESM2-1" 
## 
## $suggested_gcms$k4
##                  1                  2                  3                  4 
##      "GISS-E2-1-G" "EC-Earth3-Veg-LR"      "UKESM1-0-LL"      "CNRM-ESM2-1" 
## 
## $suggested_gcms$k5
##               1               2               3               4               5 
## "EC-Earth3-Veg"   "UKESM1-0-LL"   "CNRM-ESM2-1" "CanESM5-CanOE"   "GISS-E2-1-G" 
## 
## $suggested_gcms$k6
##               1               2               3               4               5 
##   "CNRM-ESM2-1" "CanESM5-CanOE"   "UKESM1-0-LL" "MPI-ESM1-2-LR"    "MIROC-ES2L" 
##               6 
## "EC-Earth3-Veg" 
## 
## $suggested_gcms$k7
##               1               2               3               4               5 
##    "MIROC-ES2L" "MPI-ESM1-2-LR" "EC-Earth3-Veg"     "CMCC-ESM2"   "UKESM1-0-LL" 
##               6               7 
##   "CNRM-ESM2-1" "CanESM5-CanOE" 
## 
## $suggested_gcms$k8
##               1               2               3               4               5 
##     "CMCC-ESM2"   "UKESM1-0-LL" "MPI-ESM1-2-LR"    "MRI-ESM2-0" "EC-Earth3-Veg" 
##               6               7               8 
## "CanESM5-CanOE"    "CNRM-CM6-1"   "GISS-E2-1-G" 
## 
## $suggested_gcms$k9
##               1               2               3               4               5 
##    "MRI-ESM2-0"   "GISS-E2-1-G"  "IPSL-CM6A-LR"     "CMCC-ESM2"    "CNRM-CM6-1" 
##               6               7               8               9 
##   "UKESM1-0-LL" "CanESM5-CanOE" "EC-Earth3-Veg" "MPI-ESM1-2-LR" 
## 
## $suggested_gcms$k10
##               1               2               3               4               5 
##    "CNRM-CM6-1" "MPI-ESM1-2-LR"    "MRI-ESM2-0"    "MIROC-ES2L"   "GISS-E2-1-G" 
##               6               7               8               9              10 
## "CanESM5-CanOE" "EC-Earth3-Veg"   "UKESM1-0-LL"     "CMCC-ESM2"  "IPSL-CM6A-LR" 
## 
## $suggested_gcms$k11
##               1               2               3               4               5 
## "CNRM-CM6-1-HR"    "MIROC-ES2L" "MPI-ESM1-2-LR" "EC-Earth3-Veg"   "GISS-E2-1-G" 
##               6               7               8               9              10 
##   "CNRM-ESM2-1"    "MRI-ESM2-0"     "CMCC-ESM2"   "UKESM1-0-LL" "CanESM5-CanOE" 
##              11 
##  "IPSL-CM6A-LR" 
## 
## $suggested_gcms$k12
##               1               2               3               4               5 
##   "GISS-E2-1-G"    "CNRM-CM6-1" "MPI-ESM1-2-LR"    "MIROC-ES2L"     "CMCC-ESM2" 
##               6               7               8               9              10 
##   "UKESM1-0-LL" "CNRM-CM6-1-HR" "MPI-ESM1-2-HR" "EC-Earth3-Veg"  "IPSL-CM6A-LR" 
##              11              12 
##    "MRI-ESM2-0" "CanESM5-CanOE" 
## 
## $suggested_gcms$k13
##               1               2               3               4               5 
##   "UKESM1-0-LL"     "INM-CM4-8"     "CMCC-ESM2"    "MIROC-ES2L"  "IPSL-CM6A-LR" 
##               6               7               8               9              10 
##    "CNRM-CM6-1"   "GISS-E2-1-G" "CanESM5-CanOE"    "MRI-ESM2-0" "EC-Earth3-Veg" 
##              11              12              13 
## "CNRM-CM6-1-HR" "MPI-ESM1-2-LR" "MPI-ESM1-2-HR" 
## 
## $suggested_gcms$k14
##               1               2               3               4               5 
## "CanESM5-CanOE"    "CNRM-CM6-1"   "FIO-ESM-2-0" "MPI-ESM1-2-HR"   "UKESM1-0-LL" 
##               6               7               8               9              10 
##     "CMCC-ESM2"    "ACCESS-CM2"   "GISS-E2-1-G" "EC-Earth3-Veg"    "MIROC-ES2L" 
##              11              12              13              14 
##     "INM-CM4-8" "CNRM-CM6-1-HR"  "IPSL-CM6A-LR" "MPI-ESM1-2-LR" 
## 
## $suggested_gcms$k15
##               1               2               3               4               5 
##     "CMCC-ESM2" "MPI-ESM1-2-LR"    "MIROC-ES2L"   "GISS-E2-1-G"        "MIROC6" 
##               6               7               8               9              10 
##     "INM-CM4-8"  "IPSL-CM6A-LR" "CNRM-CM6-1-HR" "CanESM5-CanOE" "MPI-ESM1-2-HR" 
##              11              12              13              14              15 
##   "FIO-ESM-2-0"    "ACCESS-CM2"   "UKESM1-0-LL" "EC-Earth3-Veg"    "CNRM-CM6-1" 
## 
## $suggested_gcms$k16
##               1               2               3               4               5 
##   "GISS-E2-1-H"  "IPSL-CM6A-LR"    "ACCESS-CM2" "MPI-ESM1-2-HR" "MPI-ESM1-2-LR" 
##               6               7               8               9              10 
##        "MIROC6"    "CNRM-CM6-1"   "UKESM1-0-LL"   "FIO-ESM-2-0" "CNRM-CM6-1-HR" 
##              11              12              13              14              15 
##   "GISS-E2-1-G" "CanESM5-CanOE"    "MIROC-ES2L"     "CMCC-ESM2" "EC-Earth3-Veg" 
##              16 
##     "INM-CM4-8" 
## 
## $suggested_gcms$k17
##               1               2               3               4               5 
##   "GISS-E2-1-H"        "MIROC6" "CanESM5-CanOE" "MPI-ESM1-2-LR"   "FIO-ESM-2-0" 
##               6               7               8               9              10 
##    "MIROC-ES2L"     "INM-CM5-0"     "INM-CM4-8"     "CMCC-ESM2"    "ACCESS-CM2" 
##              11              12              13              14              15 
## "CNRM-CM6-1-HR"  "IPSL-CM6A-LR"   "GISS-E2-1-G" "MPI-ESM1-2-HR"   "UKESM1-0-LL" 
##              16              17 
## "EC-Earth3-Veg"    "CNRM-CM6-1" 
## 
## $suggested_gcms$k18
##               1               2               3               4               5 
##   "GISS-E2-1-H"    "CNRM-CM6-1"    "MRI-ESM2-0" "CanESM5-CanOE" "CNRM-CM6-1-HR" 
##               6               7               8               9              10 
##     "INM-CM5-0"     "CMCC-ESM2"   "FIO-ESM-2-0" "EC-Earth3-Veg"    "MIROC-ES2L" 
##              11              12              13              14              15 
##  "IPSL-CM6A-LR"        "MIROC6" "MPI-ESM1-2-HR"     "INM-CM4-8"    "ACCESS-CM2" 
##              16              17              18 
##   "GISS-E2-1-G" "MPI-ESM1-2-LR"   "UKESM1-0-LL" 
## 
## $suggested_gcms$k19
##                 1                 2                 3                 4 
##   "EC-Earth3-Veg"      "MIROC-ES2L"   "CanESM5-CanOE"     "GISS-E2-1-G" 
##                 5                 6                 7                 8 
##    "IPSL-CM6A-LR"     "FIO-ESM-2-0"   "ACCESS-ESM1-5"       "INM-CM4-8" 
##                 9                10                11                12 
##      "ACCESS-CM2"       "INM-CM5-0"      "CNRM-CM6-1"     "GISS-E2-1-H" 
##                13                14                15                16 
##       "CMCC-ESM2"          "MIROC6"   "MPI-ESM1-2-HR"   "CNRM-CM6-1-HR" 
##                17                18                19 
## "HadGEM3-GC31-LL"      "MRI-ESM2-0"   "MPI-ESM1-2-LR" 
## 
## $suggested_gcms$k20
##                 1                 2                 3                 4 
##      "MIROC-ES2L"   "EC-Earth3-Veg"      "MRI-ESM2-0"     "GISS-E2-1-G" 
##                 5                 6                 7                 8 
##     "GISS-E2-1-H"       "CMCC-ESM2"     "FIO-ESM-2-0"   "MPI-ESM1-2-HR" 
##                 9                10                11                12 
##          "MIROC6"   "CNRM-CM6-1-HR" "HadGEM3-GC31-LL"      "CNRM-CM6-1" 
##                13                14                15                16 
##    "IPSL-CM6A-LR"       "INM-CM5-0"      "ACCESS-CM2"       "INM-CM4-8" 
##                17                18                19                20 
##   "MPI-ESM1-2-LR"   "ACCESS-ESM1-5"   "CanESM5-CanOE"     "CNRM-ESM2-1" 
## 
## $suggested_gcms$k21
##                  1                  2                  3                  4 
##  "HadGEM3-GC31-LL"       "CNRM-CM6-1"    "MPI-ESM1-2-HR"      "GISS-E2-1-H" 
##                  5                  6                  7                  8 
##    "MPI-ESM1-2-LR"      "CNRM-ESM2-1"       "ACCESS-CM2"      "FIO-ESM-2-0" 
##                  9                 10                 11                 12 
##        "CMCC-ESM2"      "GISS-E2-1-G"    "CanESM5-CanOE"       "MRI-ESM2-0" 
##                 13                 14                 15                 16 
##    "EC-Earth3-Veg"        "INM-CM5-0"        "INM-CM4-8" "EC-Earth3-Veg-LR" 
##                 17                 18                 19                 20 
##    "CNRM-CM6-1-HR"    "ACCESS-ESM1-5"           "MIROC6"       "MIROC-ES2L" 
##                 21 
##     "IPSL-CM6A-LR"
kmeans_subset <- mc_kmeans$suggested_gcms$k3
tictoc::toc()
## K-means Clustering subset: 27.4 sec elapsed

Closestdist set

# Closestdist algorithm subset
tictoc::tic("Closestdist algorithm subset")
mc_close <- closestdist_gcms(s, 
                             var_names = caretSDM::selected_variables(i), 
                             study_area = study_area_Araucaria, 
                             method = "euclidean")
## CRS from s and study_area are not identical. Reprojecting study area.
mc_close
## $suggested_gcms
## [1] "ACCESS-ESM1-5" "CMCC-ESM2"     "MIROC-ES2L"    "CNRM-CM6-1-HR"
## 
## $best_mean_diff
## [1] 0.0060211
## 
## $global_mean
## [1] 69.70944
closestdist_subset <- mc_close$suggested_gcms
tictoc::toc()
## Closestdist algorithm subset: 2.806 sec elapsed

Closestdist3 set

# Closestdist algorithm subset
tictoc::tic("Closestdist algorithm subset")
mc_close3 <- montecarlo_gcms(s, 
                             var_names = caretSDM::selected_variables(i), 
                             study_area = study_area_Araucaria, 
                             scale = TRUE,
                             perm = 10000, 
                             dist_method = "euclidean", 
                             clustering_method = "closestdist")
mc_close3
## $montecarlo_plot

## 
## $suggested_gcms
## $suggested_gcms$k2
## [1] "CMCC-ESM2"     "MPI-ESM1-2-HR"
## 
## $suggested_gcms$k3
## [1] "CNRM-CM6-1"    "FIO-ESM-2-0"   "MPI-ESM1-2-LR"
## 
## $suggested_gcms$k4
## [1] "ACCESS-ESM1-5" "CMCC-ESM2"     "MIROC-ES2L"    "CNRM-CM6-1-HR"
## 
## $suggested_gcms$k5
## [1] "CNRM-CM6-1-HR" "GISS-E2-1-G"   "UKESM1-0-LL"   "IPSL-CM6A-LR" 
## [5] "FIO-ESM-2-0"  
## 
## $suggested_gcms$k6
## [1] "GISS-E2-1-H"     "MPI-ESM1-2-HR"   "IPSL-CM6A-LR"    "MIROC6"         
## [5] "CMCC-ESM2"       "HadGEM3-GC31-LL"
## 
## $suggested_gcms$k7
## [1] "ACCESS-ESM1-5"    "EC-Earth3-Veg-LR" "ACCESS-CM2"       "CMCC-ESM2"       
## [5] "FIO-ESM-2-0"      "MIROC-ES2L"       "INM-CM5-0"       
## 
## $suggested_gcms$k8
## [1] "INM-CM4-8"       "MIROC6"          "MPI-ESM1-2-LR"   "HadGEM3-GC31-LL"
## [5] "GISS-E2-1-H"     "UKESM1-0-LL"     "CMCC-ESM2"       "GISS-E2-1-G"    
## 
## $suggested_gcms$k9
## [1] "CNRM-ESM2-1"      "UKESM1-0-LL"      "EC-Earth3-Veg-LR" "MRI-ESM2-0"      
## [5] "EC-Earth3-Veg"    "CMCC-ESM2"        "MPI-ESM1-2-HR"    "MIROC-ES2L"      
## [9] "INM-CM5-0"       
## 
## $suggested_gcms$k10
##  [1] "EC-Earth3-Veg-LR" "HadGEM3-GC31-LL"  "ACCESS-CM2"       "CMCC-ESM2"       
##  [5] "EC-Earth3-Veg"    "CNRM-ESM2-1"      "FIO-ESM-2-0"      "IPSL-CM6A-LR"    
##  [9] "MPI-ESM1-2-HR"    "INM-CM4-8"       
## 
## $suggested_gcms$k11
##  [1] "CMCC-ESM2"        "EC-Earth3-Veg-LR" "IPSL-CM6A-LR"     "HadGEM3-GC31-LL" 
##  [5] "CNRM-ESM2-1"      "EC-Earth3-Veg"    "MPI-ESM1-2-HR"    "MRI-ESM2-0"      
##  [9] "MIROC-ES2L"       "INM-CM5-0"        "INM-CM4-8"       
## 
## $suggested_gcms$k12
##  [1] "ACCESS-CM2"       "ACCESS-ESM1-5"    "EC-Earth3-Veg"    "CMCC-ESM2"       
##  [5] "FIO-ESM-2-0"      "IPSL-CM6A-LR"     "EC-Earth3-Veg-LR" "CNRM-ESM2-1"     
##  [9] "INM-CM5-0"        "MRI-ESM2-0"       "INM-CM4-8"        "MIROC6"          
## 
## $suggested_gcms$k13
##  [1] "CanESM5-CanOE"    "EC-Earth3-Veg"    "EC-Earth3-Veg-LR" "UKESM1-0-LL"     
##  [5] "HadGEM3-GC31-LL"  "CNRM-CM6-1"       "ACCESS-CM2"       "MPI-ESM1-2-HR"   
##  [9] "INM-CM5-0"        "MRI-ESM2-0"       "FIO-ESM-2-0"      "MIROC6"          
## [13] "INM-CM4-8"       
## 
## $suggested_gcms$k14
##  [1] "ACCESS-CM2"       "INM-CM5-0"        "CMCC-ESM2"        "IPSL-CM6A-LR"    
##  [5] "CNRM-CM6-1-HR"    "GISS-E2-1-H"      "GISS-E2-1-G"      "UKESM1-0-LL"     
##  [9] "MPI-ESM1-2-LR"    "MIROC-ES2L"       "MIROC6"           "ACCESS-ESM1-5"   
## [13] "EC-Earth3-Veg-LR" "CNRM-ESM2-1"     
## 
## $suggested_gcms$k15
##  [1] "ACCESS-CM2"      "EC-Earth3-Veg"   "HadGEM3-GC31-LL" "IPSL-CM6A-LR"   
##  [5] "FIO-ESM-2-0"     "CMCC-ESM2"       "MIROC-ES2L"      "MIROC6"         
##  [9] "INM-CM4-8"       "CNRM-CM6-1-HR"   "GISS-E2-1-G"     "GISS-E2-1-H"    
## [13] "UKESM1-0-LL"     "MPI-ESM1-2-LR"   "ACCESS-ESM1-5"  
## 
## $suggested_gcms$k16
##  [1] "EC-Earth3-Veg"    "GISS-E2-1-H"      "EC-Earth3-Veg-LR" "ACCESS-ESM1-5"   
##  [5] "UKESM1-0-LL"      "MPI-ESM1-2-HR"    "INM-CM5-0"        "CNRM-CM6-1-HR"   
##  [9] "MRI-ESM2-0"       "MIROC6"           "IPSL-CM6A-LR"     "FIO-ESM-2-0"     
## [13] "INM-CM4-8"        "GISS-E2-1-G"      "CMCC-ESM2"        "MIROC-ES2L"      
## 
## $suggested_gcms$k17
##  [1] "EC-Earth3-Veg"    "INM-CM4-8"        "CNRM-ESM2-1"      "CNRM-CM6-1"      
##  [5] "GISS-E2-1-H"      "FIO-ESM-2-0"      "IPSL-CM6A-LR"     "GISS-E2-1-G"     
##  [9] "MIROC6"           "CMCC-ESM2"        "ACCESS-ESM1-5"    "CNRM-CM6-1-HR"   
## [13] "MPI-ESM1-2-LR"    "MPI-ESM1-2-HR"    "UKESM1-0-LL"      "EC-Earth3-Veg-LR"
## [17] "HadGEM3-GC31-LL" 
## 
## $suggested_gcms$k18
##  [1] "EC-Earth3-Veg"    "MPI-ESM1-2-HR"    "EC-Earth3-Veg-LR" "IPSL-CM6A-LR"    
##  [5] "CNRM-ESM2-1"      "ACCESS-CM2"       "MRI-ESM2-0"       "INM-CM5-0"       
##  [9] "INM-CM4-8"        "MIROC6"           "GISS-E2-1-H"      "GISS-E2-1-G"     
## [13] "CNRM-CM6-1-HR"    "CMCC-ESM2"        "ACCESS-ESM1-5"    "FIO-ESM-2-0"     
## [17] "MPI-ESM1-2-LR"    "MIROC-ES2L"      
## 
## $suggested_gcms$k19
##  [1] "ACCESS-CM2"       "CanESM5-CanOE"    "ACCESS-ESM1-5"    "IPSL-CM6A-LR"    
##  [5] "CNRM-CM6-1-HR"    "MIROC-ES2L"       "INM-CM4-8"        "GISS-E2-1-H"     
##  [9] "GISS-E2-1-G"      "FIO-ESM-2-0"      "CMCC-ESM2"        "UKESM1-0-LL"     
## [13] "MIROC6"           "MPI-ESM1-2-LR"    "INM-CM5-0"        "MRI-ESM2-0"      
## [17] "HadGEM3-GC31-LL"  "EC-Earth3-Veg-LR" "CNRM-ESM2-1"     
## 
## $suggested_gcms$k20
##  [1] "CanESM5-CanOE"    "EC-Earth3-Veg"    "EC-Earth3-Veg-LR" "UKESM1-0-LL"     
##  [5] "HadGEM3-GC31-LL"  "CNRM-CM6-1"       "ACCESS-CM2"       "MPI-ESM1-2-HR"   
##  [9] "INM-CM5-0"        "MRI-ESM2-0"       "FIO-ESM-2-0"      "MIROC6"          
## [13] "INM-CM4-8"        "IPSL-CM6A-LR"     "CNRM-CM6-1-HR"    "GISS-E2-1-H"     
## [17] "GISS-E2-1-G"      "MIROC-ES2L"       "CMCC-ESM2"        "ACCESS-ESM1-5"   
## 
## $suggested_gcms$k21
##  [1] "EC-Earth3-Veg"    "INM-CM4-8"        "CNRM-ESM2-1"      "CNRM-CM6-1"      
##  [5] "GISS-E2-1-H"      "FIO-ESM-2-0"      "IPSL-CM6A-LR"     "GISS-E2-1-G"     
##  [9] "MIROC6"           "CMCC-ESM2"        "ACCESS-ESM1-5"    "CNRM-CM6-1-HR"   
## [13] "MPI-ESM1-2-LR"    "MPI-ESM1-2-HR"    "UKESM1-0-LL"      "EC-Earth3-Veg-LR"
## [17] "HadGEM3-GC31-LL"  "INM-CM5-0"        "MRI-ESM2-0"       "CanESM5-CanOE"   
## [21] "ACCESS-CM2"
closestdist_subset3 <- mc_close3$suggested_gcms$k3
tictoc::toc()
## Closestdist algorithm subset: 21.779 sec elapsed
# Initialize an empty list to store the mean layers
mean_layers <- list()

# Loop through each layer index
for (j in caretSDM::get_predictor_names(i)) {
  # Extract the i-th layer from each SpatRaster in the list
  layer_stack <- lapply(s, function(x) x[[j]])
  
  # Calculate the mean of the extracted layers
  mean_layers[[j]] <- mean(rast(layer_stack))
}

# Combine the mean layers into a single SpatRaster object
mean_gcm <- rast(mean_layers)

# Transform to stars to use in caretSDM
mean_gcm <- stars::st_as_stars(mean_gcm)
names(mean_gcm) <- "mean_gcm"

Building Projections

Model projections were performed to each GCM set built in the previous section. We projected models to future scenarios of climate change, encompassing an extreme shared socioeconomic pathway (SSP5-8.5) to the year 2090 to the given subsets of GCMs selected using chooseGCM. We calculated the mean of the GCM projections (probability of occurrence) to obtain a final projection for each subset (mean probability of occurrence). We also stored the time taken to perform each projection. The ensemble of GCM projections was calculated using the average.

scen <- stars::read_stars(
        list.files("~/storage/WC_data/future/climate/wc2.1_10m",
                   full.names = T))[,,,c("bio01","bio02", "bio03","bio04", "bio05", 
                                         "bio06", "bio07", "bio12", "bio13", "bio14", "bio15")]
tictoc::tic("Projecting to all scenarios")
i_all <- i |>
      caretSDM::add_scenarios() |>
      caretSDM::add_scenarios(scen) |> 
      caretSDM::predict_sdm() |>
      caretSDM::gcms_ensembles(gcms = all_gcms)
##    s1_names s2_names
## 1     bio01     bio1
## 2     bio02     bio2
## 3     bio03     bio3
## 4     bio04     bio4
## 5     bio05     bio5
## 6     bio06     bio6
## 7     bio07     bio7
## 8     bio12    bio12
## 9     bio13    bio13
## 10    bio14    bio14
## 11    bio15    bio15
## [1] "Projecting: 1/24"
## [1] "Projecting: 2/24"
## [1] "Projecting: 3/24"
## [1] "Projecting: 4/24"
## [1] "Projecting: 5/24"
## [1] "Projecting: 6/24"
## [1] "Projecting: 7/24"
## [1] "Projecting: 8/24"
## [1] "Projecting: 9/24"
## [1] "Projecting: 10/24"
## [1] "Projecting: 11/24"
## [1] "Projecting: 12/24"
## [1] "Projecting: 13/24"
## [1] "Projecting: 14/24"
## [1] "Projecting: 15/24"
## [1] "Projecting: 16/24"
## [1] "Projecting: 17/24"
## [1] "Projecting: 18/24"
## [1] "Projecting: 19/24"
## [1] "Projecting: 20/24"
## [1] "Projecting: 21/24"
## [1] "Projecting: 22/24"
## [1] "Projecting: 23/24"
## [1] "Projecting: 24/24"
## [1] "Ensembling..."
## [1] "current"
## [1] "Araucaria angustifolia"
## [1] "ACCESS-CM2_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "ACCESS-ESM1-5_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "CanESM5-CanOE_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "CMCC-ESM2_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "CNRM-CM6-1_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "CNRM-CM6-1-HR_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "CNRM-ESM2-1_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "EC-Earth3-Veg_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "EC-Earth3-Veg-LR_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "FIO-ESM-2-0_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "GISS-E2-1-G_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "GISS-E2-1-H_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "HadGEM3-GC31-LL_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "INM-CM4-8_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "INM-CM5-0_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "IPSL-CM6A-LR_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "MIROC-ES2L_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "MIROC6_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "MPI-ESM1-2-HR_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "MPI-ESM1-2-LR_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "MRI-ESM2-0_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "UKESM1-0-LL_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "current"
## [1] "Araucaria angustifolia"
time_all <- tictoc::toc()
## Projecting to all scenarios: 2221.794 sec elapsed
tictoc::tic("Projecting to random_subset")
i_random <- i |>
      caretSDM::add_scenarios() |>
      caretSDM::add_scenarios(scen[paste0(random_subset, "_ssp585_2081-2100.tif")]) |> 
      caretSDM::predict_sdm() |>
      caretSDM::gcms_ensembles(gcms = random_subset)
##    s1_names s2_names
## 1     bio01     bio1
## 2     bio02     bio2
## 3     bio03     bio3
## 4     bio04     bio4
## 5     bio05     bio5
## 6     bio06     bio6
## 7     bio07     bio7
## 8     bio12    bio12
## 9     bio13    bio13
## 10    bio14    bio14
## 11    bio15    bio15
## [1] "Projecting: 1/5"
## [1] "Projecting: 2/5"
## [1] "Projecting: 3/5"
## [1] "Projecting: 4/5"
## [1] "Projecting: 5/5"
## [1] "Ensembling..."
## [1] "current"
## [1] "Araucaria angustifolia"
## [1] "HadGEM3-GC31-LL_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "MIROC-ES2L_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "GISS-E2-1-G_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "current"
## [1] "Araucaria angustifolia"
time_random <- tictoc::toc()
## Projecting to random_subset: 368.941 sec elapsed
tictoc::tic("Projecting to hclust_subset")
i_hclust <- i |>
      caretSDM::add_scenarios() |>
      caretSDM::add_scenarios(scen[paste0(hclust_subset, "_ssp585_2081-2100.tif")]) |> 
      caretSDM::predict_sdm() |>
      caretSDM::gcms_ensembles(gcms = hclust_subset)
##    s1_names s2_names
## 1     bio01     bio1
## 2     bio02     bio2
## 3     bio03     bio3
## 4     bio04     bio4
## 5     bio05     bio5
## 6     bio06     bio6
## 7     bio07     bio7
## 8     bio12    bio12
## 9     bio13    bio13
## 10    bio14    bio14
## 11    bio15    bio15
## [1] "Projecting: 1/5"
## [1] "Projecting: 2/5"
## [1] "Projecting: 3/5"
## [1] "Projecting: 4/5"
## [1] "Projecting: 5/5"
## [1] "Ensembling..."
## [1] "current"
## [1] "Araucaria angustifolia"
## [1] "ACCESS-ESM1-5_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "CanESM5-CanOE_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "MPI-ESM1-2-HR_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "current"
## [1] "Araucaria angustifolia"
time_hclust <- tictoc::toc()
## Projecting to hclust_subset: 366.447 sec elapsed
tictoc::tic("Projecting to kmeans_subset")
i_kmeans <- i |>
      caretSDM::add_scenarios() |>
      caretSDM::add_scenarios(scen[paste0(kmeans_subset, "_ssp585_2081-2100.tif")]) |> 
      caretSDM::predict_sdm() |>
      caretSDM::gcms_ensembles(gcms = kmeans_subset)
##    s1_names s2_names
## 1     bio01     bio1
## 2     bio02     bio2
## 3     bio03     bio3
## 4     bio04     bio4
## 5     bio05     bio5
## 6     bio06     bio6
## 7     bio07     bio7
## 8     bio12    bio12
## 9     bio13    bio13
## 10    bio14    bio14
## 11    bio15    bio15
## [1] "Projecting: 1/5"
## [1] "Projecting: 2/5"
## [1] "Projecting: 3/5"
## [1] "Projecting: 4/5"
## [1] "Projecting: 5/5"
## [1] "Ensembling..."
## [1] "current"
## [1] "Araucaria angustifolia"
## [1] "EC-Earth3-Veg-LR_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "UKESM1-0-LL_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "CNRM-ESM2-1_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "current"
## [1] "Araucaria angustifolia"
time_kmeans <- tictoc::toc()
## Projecting to kmeans_subset: 368.51 sec elapsed
tictoc::tic("Projecting to closestdist_subset")
i_closest <- i |>
      caretSDM::add_scenarios() |>
      caretSDM::add_scenarios(scen[paste0(closestdist_subset, "_ssp585_2081-2100.tif")]) |> 
      caretSDM::predict_sdm() |>
      caretSDM::gcms_ensembles(gcms = closestdist_subset)
##    s1_names s2_names
## 1     bio01     bio1
## 2     bio02     bio2
## 3     bio03     bio3
## 4     bio04     bio4
## 5     bio05     bio5
## 6     bio06     bio6
## 7     bio07     bio7
## 8     bio12    bio12
## 9     bio13    bio13
## 10    bio14    bio14
## 11    bio15    bio15
## [1] "Projecting: 1/6"
## [1] "Projecting: 2/6"
## [1] "Projecting: 3/6"
## [1] "Projecting: 4/6"
## [1] "Projecting: 5/6"
## [1] "Projecting: 6/6"
## [1] "Ensembling..."
## [1] "current"
## [1] "Araucaria angustifolia"
## [1] "ACCESS-ESM1-5_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "CMCC-ESM2_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "MIROC-ES2L_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "CNRM-CM6-1-HR_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "current"
## [1] "Araucaria angustifolia"
time_closest <- tictoc::toc()
## Projecting to closestdist_subset: 449.158 sec elapsed
tictoc::tic("Projecting to closestdist_subset3")
i_closest3 <- i |>
      caretSDM::add_scenarios() |>
      caretSDM::add_scenarios(scen[paste0(closestdist_subset3, "_ssp585_2081-2100.tif")]) |> 
      caretSDM::predict_sdm() |>
      caretSDM::gcms_ensembles(gcms = closestdist_subset3)
##    s1_names s2_names
## 1     bio01     bio1
## 2     bio02     bio2
## 3     bio03     bio3
## 4     bio04     bio4
## 5     bio05     bio5
## 6     bio06     bio6
## 7     bio07     bio7
## 8     bio12    bio12
## 9     bio13    bio13
## 10    bio14    bio14
## 11    bio15    bio15
## [1] "Projecting: 1/5"
## [1] "Projecting: 2/5"
## [1] "Projecting: 3/5"
## [1] "Projecting: 4/5"
## [1] "Projecting: 5/5"
## [1] "Ensembling..."
## [1] "current"
## [1] "Araucaria angustifolia"
## [1] "CNRM-CM6-1_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "FIO-ESM-2-0_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "MPI-ESM1-2-LR_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "current"
## [1] "Araucaria angustifolia"
time_closest3 <- tictoc::toc()
## Projecting to closestdist_subset3: 355.5 sec elapsed
tictoc::tic("Projecting to mean_gcm")
i_mean_gcm <- i |>
      caretSDM::add_scenarios() |>
      caretSDM::add_scenarios(mean_gcm) |> 
      caretSDM::predict_sdm()
## [1] "Projecting: 1/3"
## [1] "Projecting: 2/3"
## [1] "Projecting: 3/3"
## [1] "Ensembling..."
## [1] "current"
## [1] "Araucaria angustifolia"
## [1] "mean_gcm"
## [1] "Araucaria angustifolia"
## [1] "current"
## [1] "Araucaria angustifolia"
time_mean_gcm <- tictoc::toc()
## Projecting to mean_gcm: 156.651 sec elapsed

Comparing Projections

To compare whether chooseGCM performed better than alternative selection methods, we built a Pearson’s correlation matrix between the six sets of GCMs and built an Euclidean distance matrix. Numeric results were compiled into a table for better comparison and visualization.

df <- data.frame(
  all      = i_all$predictions$ensembles[,"_ssp585_2081-2100.tif"][[1]]$mean_occ_prob,
  random   = i_random$predictions$ensembles[,"_ssp585_2081-2100.tif"][[1]]$mean_occ_prob,
  hclust   = i_hclust$predictions$ensembles[,"_ssp585_2081-2100.tif"][[1]]$mean_occ_prob,
  kmeans   = i_kmeans$predictions$ensembles[,"_ssp585_2081-2100.tif"][[1]]$mean_occ_prob,
  closest  = i_closest$predictions$ensembles[,"_ssp585_2081-2100.tif"][[1]]$mean_occ_prob,
  closest3 = i_closest3$predictions$ensembles[,"_ssp585_2081-2100.tif"][[1]]$mean_occ_prob,
  mean_GCM = i_mean_gcm$predictions$ensembles[,"mean_gcm"][[1]]$mean_occ_prob
)
s_cor <- cor(df)
s_cor
##                all    random    hclust    kmeans   closest  closest3  mean_GCM
## all      1.0000000 0.7471318 0.7765558 0.7813567 0.9052983 0.8283943 0.8072922
## random   0.7471318 1.0000000 0.6441292 0.3849073 0.6531905 0.4522108 0.5170753
## hclust   0.7765558 0.6441292 1.0000000 0.4710737 0.6751460 0.4938375 0.6498367
## kmeans   0.7813567 0.3849073 0.4710737 1.0000000 0.6454123 0.7745283 0.7232116
## closest  0.9052983 0.6531905 0.6751460 0.6454123 1.0000000 0.7275182 0.6668466
## closest3 0.8283943 0.4522108 0.4938375 0.7745283 0.7275182 1.0000000 0.7380881
## mean_GCM 0.8072922 0.5170753 0.6498367 0.7232116 0.6668466 0.7380881 1.0000000
s_dist <- dist(t(df))
s_dist
##                all    random    hclust    kmeans   closest  closest3
## random    7.220488                                                  
## hclust    6.869828  9.166807                                        
## kmeans    7.083849 12.293921 11.375874                              
## closest   4.586968  8.920814  8.695204  9.308493                    
## closest3  6.772702 11.955831 11.514598  7.848486  8.451620          
## mean_GCM  8.992250 12.852741 11.122912 10.077909 10.861230  9.869753
df <- data.frame(Subset = c("Global", 
                            "Random", 
                            "Hclust", 
                            "Kmeans", 
                            "Closedist", 
                            "Closedist3", 
                            "MeanGCM"
                            ),
                 Projection_Time = c(
                   as.numeric(time_all$toc - time_all$tic),
                   as.numeric(time_random$toc - time_random$tic),
                   as.numeric(time_hclust$toc - time_hclust$tic),
                   as.numeric(time_kmeans$toc - time_kmeans$tic),
                   as.numeric(time_closest$toc - time_closest$tic),
                   as.numeric(time_closest3$toc - time_closest3$tic),
                   as.numeric(time_mean_gcm$toc - time_mean_gcm$tic)
                   ),
                 Time_saving = c(
                   100-as.numeric(time_all$toc - time_all$tic)*100/as.numeric(time_all$toc - time_all$tic),
                   100-as.numeric(time_random$toc - time_random$tic)*100/as.numeric(time_all$toc - time_all$tic),
                   100-as.numeric(time_hclust$toc - time_hclust$tic)*100/as.numeric(time_all$toc - time_all$tic),
                   100-as.numeric(time_kmeans$toc - time_kmeans$tic)*100/as.numeric(time_all$toc - time_all$tic),
                   100-as.numeric(time_closest$toc - time_closest$tic)*100/as.numeric(time_all$toc - time_all$tic),
                   100-as.numeric(time_closest3$toc - time_closest3$tic)*100/as.numeric(time_all$toc - time_all$tic),
                   100-as.numeric(time_mean_gcm$toc - time_mean_gcm$tic)*100/as.numeric(time_all$toc - time_all$tic)
                   ),
                 Distance_from_Global = as.numeric(as.matrix(s_dist)["all",]),
                 SDMs_Correlation = as.numeric(as.matrix(s_cor)["all",])
                 )

Results

The final result of this analysis is summarized in the df object. In this table we show how optimized our method is to reach a high correlation with the baseline, saving computational time.

df
##       Subset Projection_Time Time_saving Distance_from_Global SDMs_Correlation
## 1     Global        2221.794     0.00000             0.000000        1.0000000
## 2     Random         368.941    83.39446             7.220488        0.7471318
## 3     Hclust         366.447    83.50671             6.869828        0.7765558
## 4     Kmeans         368.510    83.41385             7.083849        0.7813567
## 5  Closedist         449.158    79.78399             4.586968        0.9052983
## 6 Closedist3         355.500    83.99942             6.772702        0.8283943
## 7    MeanGCM         156.651    92.94935             8.992250        0.8072922