
The chooseGCM contribution for Species Distribution Modeling
Source:vignettes/articles/chooseGCM-SDM.Rmd
chooseGCM-SDM.Rmd
This Rmarkdown is part of the following article:
Esser, L.F., Bailly, D., Lima, M.R., Ré, R. 2025. chooseGCM: a toolkit to select General Circulation Models in R. Global Change Biology.
Introduction
One source of variability in climate change assessments is the choice of
General Circulation Models (GCMs). There is a lack of consensus on how
to properly choose GCMs for a given study. An ideal approach would be to
encompass all GCMs, but this is exceedingly costly in terms of
computational requirements. chooseGCM
is a solution for
GCMs selection in climate change research. It proposes a solution by
supplying different algorithms, metrics and functions to deal with GCMs.
We built this Rmarkdown to showcase the impact of the package on Species
Distribution Modeling (SDMs), a field largely affected by GCM selection.
Here we use the
caretSDM
package to build SDMs. At the end, we present the processing time to
achieve each result and their correlation.
# Install and open CRAN libraries
for(x in c("devtools", "geodata", "terra", "tictoc") ){
if (!require(x, character.only = TRUE)) {
install.packages(x, dependencies = TRUE)
library(x, character.only = TRUE)
}
}
# Install and open chooseGCM
if (!require("chooseGCM", character.only = TRUE)) {
devtools::install_github("luizesser/chooseGCM")
library("chooseGCM", character.only = TRUE)
}
# Install and open caretSDM
if (!require("caretSDM", character.only = TRUE)) {
devtools::install_github("luizesser/caretSDM")
library("caretSDM", character.only = TRUE)
}
Methods
To make methods more reproducible, we set a seed, avoiding any random effect that can arise in the modeling process.
set.seed(1)
GCMs data
To predict the impact of climate change, we first downloaded the
WorldClim 2.1 (Fick
& Hijmans, 2017) data to current scenario and all available
GCMs, reducing all other sources of variation by obtaining raster stacks
only relative to the SSP5-8.5/2090 (10 arcmin resolution). This SSP is
especially useful because it presents extreme scenarios, where variation
between GCMs is higher
(IPCC, 2022).
Then, we imported GCM data to R by applying the function
import_gcms
, which returns a list
of
SpatRaster
s, each SpatRaster
being a GCM. This
structure will be used as input to all the other functions in the
chooseGCM
package.
# Download current data
geodata::worldclim_global("bio", 10, path="~/storage/WC_data/current")
# Download GCMs data
gcms <- c("ACCESS-CM2", "ACCESS-ESM1-5", "CanESM5-CanOE", "CMCC-ESM2",
"CNRM-CM6-1", "CNRM-CM6-1-HR", "CNRM-ESM2-1", "EC-Earth3-Veg",
"EC-Earth3-Veg-LR", "FIO-ESM-2-0", "GISS-E2-1-G", "GISS-E2-1-H",
"HadGEM3-GC31-LL", "INM-CM4-8", "INM-CM5-0", "IPSL-CM6A-LR",
"MIROC-ES2L", "MIROC6", "MPI-ESM1-2-HR", "MPI-ESM1-2-LR",
"MRI-ESM2-0", "UKESM1-0-LL")
for ( g in gcms ) {
geodata::cmip6_world(model = g,
ssp = "585",
time = "2081-2100",
var = "bioc",
res = 10,
path = "~/storage/WC_data/future"
)
}
# Import GCMs
s <- chooseGCM::import_gcms(path = "~/storage/WC_data/future/climate/wc2.1_10m",
gcm_names = c("ACCESS-CM2", "ACCESS-ESM1-5", "CanESM5-CanOE",
"CMCC-ESM2", "CNRM-CM6-1", "CNRM-CM6-1-HR",
"CNRM-ESM2-1", "EC-Earth3-Veg", "EC-Earth3-Veg-LR",
"FIO-ESM-2-0", "GISS-E2-1-G", "GISS-E2-1-H",
"HadGEM3-GC31-LL", "INM-CM4-8", "INM-CM5-0",
"IPSL-CM6A-LR", "MIROC-ES2L", "MIROC6",
"MPI-ESM1-2-HR", "MPI-ESM1-2-LR", "MRI-ESM2-0",
"UKESM1-0-LL"))
Species data and study area
For this demonstration, we will use Araucaria angustifolia, the brazilian pine, as a study case. This flagship species is widely modeled and know, and is a key species for the Atlantic Rainforest biodiversity hotspot. Species records were obtained from GBIF and cleaned by excluding records from outside the native region and NAs. This set of records was then used to build a shape of accessible area by merging buffers of 500 km around each record.
occ_data <- geodata::sp_occurrence("Araucaria", "angustifolia")
occ_data <- occ_data |>
dplyr::filter(lon >= -64) |>
dplyr::filter(lon <= -33) |>
dplyr::filter(lat <= -19.5) |>
dplyr::filter(lat >= -33)
study_area_Araucaria <- na.omit(occ_data[,c("lon", "lat")]) |>
sf::st_as_sf(coords= c(1,2), crs = 4326) |>
sf::st_buffer(dist = 500000) |>
sf::st_union()
plot(study_area_Araucaria, axes=T)
Using the caretSDM
package, we transformed the accessible
area shape into a 20 km regular grid projected in the coordinate
reference system (CRS) 6933, which is the most suitable metric CRS for
Brazil. This CRS keeps the area of each cell equal all along the study
area, avoiding meridian convergence that could influence area results.
We then added predictors data (WorldClim’s current data previously
downloaded) to the grid by averaging the values falling in the same
cell. Finally, we excluded “quarter” variables to avoid statistical
biases.
tictoc::tic("Building sdm_area object")
sa <- caretSDM::sdm_area(sf::st_as_sf(study_area_Araucaria),
cell_size = 20000,
crs = 6933,
gdal = T) |>
caretSDM::add_predictors(stars::read_stars(list.files("~/storage/WC_data/current/climate/wc2.1_10m/",
full.names = T),
along = "band",
normalize_path = F)) |>
caretSDM::set_predictor_names(c("bio1", "bio2", "bio3", "bio4", "bio5",
"bio6", "bio7", "bio8", "bio9", "bio10",
"bio11", "bio12", "bio13", "bio14", "bio15",
"bio16", "bio17", "bio18", "bio19")) |>
caretSDM::select_predictors(c("bio1", "bio2", "bio3", "bio4", "bio5", "bio6",
"bio7", "bio12", "bio13", "bio14", "bio15"))
## ! Making grid over study area is an expensive task. Please, be patient!
## ℹ Using GDAL to make the grid and resample the variables.
## ! Making grid over the study area is an expensive task. Please, be patient!
## ℹ Using GDAL to make the grid and resample the variables.
sa
## caretSDM
## ...........................
## Class : sdm_area
## Extent : -5893561 -4344152 -3753561 -1924152 (xmin, xmax, ymin, ymax)
## CRS : WGS 84 / NSIDC EASE-
## Resolution : (20000, 20000) (x, y)
## Number of Predictors : 11
## Predictors Names : bio1, bio2, bio3, bio4, bio5, bio6, bio7, bio12, bio13, bio14, bio15
tictoc::toc()
## Building sdm_area object: 61.312 sec elapsed
To avoid the use of multiple records with the same environmental
information, we included species records in the grid, filtering one
record per cell. When using caretSDM
, we inform the CRS of
coordinates and it transforms internally the records to the grid’s CRS.
# Merge the grid data with occurrences data
tictoc::tic("Building occurrences_sdm object")
oc <- caretSDM::occurrences_sdm(na.omit(occ_data[c("species", "lon", "lat")]), crs = 4326) |>
caretSDM::join_area(sa)
## Warning: Some records from `occ` do not fall in `pred`.
## ℹ 1 elements from `occ` were excluded.
## ℹ If this seems too much, check how `occ` and `pred` intersect.
oc
## caretSDM
## .......................
## Class : occurrences
## Species Names : Araucaria angustifolia
## Number of presences : 1706
## =========================================
## Data:
## Simple feature collection with 6 features and 2 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -5071809 ymin: -3590793 xmax: -4166610 ymax: -2786901
## Projected CRS: WGS 84 / NSIDC EASE-Grid 2.0 Global
## cell_id species geometry
## 1 10058 Araucaria angustifolia POINT (-4830370 -3565372)
## 2 5333 Araucaria angustifolia POINT (-4166610 -2786901)
## 3 8094 Araucaria angustifolia POINT (-5071809 -3259770)
## 4 5443 Araucaria angustifolia POINT (-4397387 -2823145)
## 5 5443 Araucaria angustifolia POINT (-4397271 -2823148)
## 6 10176 Araucaria angustifolia POINT (-4908967 -3590793)
tictoc::toc()
## Building occurrences_sdm object: 0.231 sec elapsed
Data cleaning routine excluded records from NAs, coordinates in capitals, countries’ centroids, duplicated coordinates, identical latitude/longitude, invalid coordinates, non-terrestrial coordinates, and records falling within the same grid cell. Coordinates from scientific institutions were kept since there are multiple important institutions in the species native region and accessible area.
# Data cleaning routine
tictoc::tic("Data cleaning")
i <- caretSDM::input_sdm(oc, sa) |>
caretSDM::data_clean(institutions = FALSE)
## Testing country capitals
## Removed 0 records.
## Testing country centroids
## Removed 0 records.
## Testing duplicates
## Removed 361 records.
## Testing equal lat/lon
## Removed 0 records.
## Testing coordinate validity
## Removed 0 records.
## Testing sea coordinates
## Reading layer `ne_110m_land' from data source `/tmp/RtmpARk8db/ne_110m_land.shp' using driver `ESRI Shapefile'
## Simple feature collection with 127 features and 3 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## Geodetic CRS: WGS 84
## Removed 1 records.
## [1] "Predictors identified, procceding with grid filter (removing NA and duplicated data)."
i
## caretSDM
## ...............................
## Class : input_sdm
## -------- Occurrences --------
## Species Names : Araucaria angustifolia
## Number of presences : 364
## Data Cleaning : NAs, Capitals, Centroids, Geographically Duplicated, Identical Lat/Long, Invalid, Non-terrestrial, Duplicated Cell (grid)
## -------- Predictors ---------
## Number of Predictors : 11
## Predictors Names : bio1, bio2, bio3, bio4, bio5, bio6, bio7, bio12, bio13, bio14, bio15
tictoc::toc()
## Data cleaning: 3.267 sec elapsed
The final number of records for the species was 364.
Variable selection routine
Variable selection was performed in WorldClim 2.1 bioclimatic variables by previously excluding ‘quarter’ variables to avoid statistical biases (Booth, 2022). We also performed a variance inflation factor (VIF) selection routine where the variable with higher VIF in the variable pair with the highest Pearson correlation is excluded until all correlations are lower than 0.5.
# Variable selection routine
tictoc::tic("Variable selection")
i <- caretSDM::vif_predictors(i, th = 0.5)
i
## caretSDM
## ...............................
## Class : input_sdm
## -------- Occurrences --------
## Species Names : Araucaria angustifolia
## Number of presences : 364
## Data Cleaning : NAs, Capitals, Centroids, Geographically Duplicated, Identical Lat/Long, Invalid, Non-terrestrial, Duplicated Cell (grid)
## -------- Predictors ---------
## Number of Predictors : 11
## Predictors Names : bio1, bio2, bio3, bio4, bio5, bio6, bio7, bio12, bio13, bio14, bio15
## Area (VIF) : all
## Threshold : 0.5
## Selected Variables (VIF) : bio3, bio4, bio5, bio12, bio15
tictoc::toc()
## Variable selection: 0.206 sec elapsed
Variables remaining were bio3 (isothermality), bio4 (temperature seasonality), bio5 (max temperature of warmest month), bio12 (annual precipitation) and bio15 (precipitation seasonality).
Pseudoabsence selection
We obtained 10 sets of pseudoabsences from outside a bioclimatic envelope built with the selected variables and projected in the accessible area. The number of pseudoabsences was always equal to the number of presences to avoid imbalance problems (Japkowicz & Stephen, 2002).
# Pseudoabsences selection
tictoc::tic("Pseudoabsence selection")
i <- caretSDM::pseudoabsences(i, n_set = 10, method = "bioclim", variables_selected = "vif")
i
## caretSDM
## ...............................
## Class : input_sdm
## -------- Occurrences --------
## Species Names : Araucaria angustifolia
## Number of presences : 364
## Pseudoabsence methods :
## Method to obtain PAs : bioclim
## Number of PA sets : 10
## Number of PAs in each set : 364
## Data Cleaning : NAs, Capitals, Centroids, Geographically Duplicated, Identical Lat/Long, Invalid, Non-terrestrial, Duplicated Cell (grid)
## -------- Predictors ---------
## Number of Predictors : 11
## Predictors Names : bio1, bio2, bio3, bio4, bio5, bio6, bio7, bio12, bio13, bio14, bio15
## Area (VIF) : all
## Threshold : 0.5
## Selected Variables (VIF) : bio3, bio4, bio5, bio12, bio15
tictoc::toc()
## Pseudoabsence selection: 0.105 sec elapsed
Modeling framework
As the aim of this modeling is to highlight the effect of GCMs and the GCMs selection, we used only one machine learning algorithm to build models: the naive bayes, which is a simple and quick algorithm. If we had a proper modeling hypothesis to test we should probably use more algorithms covering a variety of modeling approaches. We searched for optimal hyperparameters using a grid-based approach with 10 different combinations. To validate models, we used a cross-validation approach with 4-folds. In each fold, the area under the receiver operating characteristic curve was calculated (AUC).
# Modeling framework
tictoc::tic("Modeling framework")
suppressWarnings(
i <- caretSDM::train_sdm(i,
algo = c("naive_bayes"),
crtl = caret::trainControl(method = "repeatedcv",
number = 4,
repeats = 10,
search = "grid",
classProbs = TRUE,
returnResamp = "all",
summaryFunction = caretSDM::summary_sdm,
savePredictions = "all"),
variables_selected = "vif")
)
i
tictoc::toc()
## caretSDM
## ...............................
## Class : input_sdm
## -------- Occurrences --------
## Species Names : Araucaria angustifolia
## Number of presences : 364
## Pseudoabsence methods :
## Method to obtain PAs : bioclim
## Number of PA sets : 10
## Number of PAs in each set : 364
## Data Cleaning : NAs, Capitals, Centroids, Geographically Duplicated, Identical Lat/Long, Invalid, Non-terrestrial, Duplicated Cell (grid)
## -------- Predictors ---------
## Number of Predictors : 11
## Predictors Names : bio1, bio2, bio3, bio4, bio5, bio6, bio7, bio12, bio13, bio14, bio15
## Area (VIF) : all
## Threshold : 0.5
## Selected Variables (VIF) : bio3, bio4, bio5, bio12, bio15
## ----------- Models ----------
## Algorithms Names : naive_bayes
## Variables Names : bio3 bio4 bio5 bio12 bio15
## Model Validation :
## Method : repeatedcv
## Number : 4
## Metrics :
## $`Araucaria angustifolia`
## algo ROC TSS Sensitivity Specificity
## 1 naive_bayes 0.9930081 0.9304945 0.95435 0.976075
## [1] "Modeling framework: 3.918 sec elapsed"
GCMs selection
We selected four sets of GCMs through chooseGCM
: (1) the
baseline, which is a global set including all GCMs; (2) the smallest
subset of GCMs selected using a hierarchical clustering with mean
distance closer to the global mean distance; (3) the smallest subset of
GCMs selected using the K-means clustering algorithm with mean distance
closer to the global mean distance; (4) the subset of GCMs automatically
provided by Closestdist algorithm. After some pre-analysis, both subsets
generated by the hierarchical and K-means clustering returned a subset
of size three. In this sense, we also considered important to analize
two more sets: (5) a random set, which includes three randomly selected
GCMs; (6) the subset of GCMs selected using the Closestdist algorithm
with size equals three.
Global set
# All GCMs - the baseline we are trying to replicate
tictoc::tic("All GCMs")
all_gcms <- c("ACCESS-CM2", "ACCESS-ESM1-5", "CanESM5-CanOE", "CMCC-ESM2",
"CNRM-CM6-1", "CNRM-CM6-1-HR", "CNRM-ESM2-1",
"EC-Earth3-Veg", "EC-Earth3-Veg-LR", "FIO-ESM-2-0",
"GISS-E2-1-G", "GISS-E2-1-H", "HadGEM3-GC31-LL",
"INM-CM4-8", "INM-CM5-0", "IPSL-CM6A-LR",
"MIROC-ES2L", "MIROC6", "MPI-ESM1-2-HR",
"MPI-ESM1-2-LR", "MRI-ESM2-0", "UKESM1-0-LL")
tictoc::toc()
## All GCMs: 0.002 sec elapsed
Random set
# Random subset
tictoc::tic("Random subset")
random_subset <- sample(all_gcms, 3)
random_subset
## [1] "HadGEM3-GC31-LL" "MIROC-ES2L" "GISS-E2-1-G"
tictoc::toc()
## Random subset: 0.003 sec elapsed
Hclust set
# Hierarchical Clustering subset
tictoc::tic("Hierarchical Clustering subset")
mc_hclust <- montecarlo_gcms(s,
var_names = caretSDM::selected_variables(i),
study_area = study_area_Araucaria,
scale = TRUE,
perm = 10000,
dist_method = "euclidean",
clustering_method = "hclust")
## Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
## of ggplot2 3.3.4.
## ℹ The deprecated feature was likely used in the factoextra package.
## Please report the issue at <https://github.com/kassambara/factoextra/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
mc_hclust
## $montecarlo_plot
##
## $suggested_gcms
## $suggested_gcms$k2
## [1] "ACCESS-ESM1-5" "CanESM5-CanOE"
##
## $suggested_gcms$k3
## [1] "ACCESS-ESM1-5" "CanESM5-CanOE" "MPI-ESM1-2-HR"
##
## $suggested_gcms$k4
## [1] "ACCESS-ESM1-5" "CanESM5-CanOE" "MPI-ESM1-2-HR" "EC-Earth3-Veg"
##
## $suggested_gcms$k5
## [1] "FIO-ESM-2-0" "ACCESS-ESM1-5" "CanESM5-CanOE" "MPI-ESM1-2-HR"
## [5] "EC-Earth3-Veg"
##
## $suggested_gcms$k6
## [1] "FIO-ESM-2-0" "ACCESS-ESM1-5" "CanESM5-CanOE" "MPI-ESM1-2-HR"
## [5] "EC-Earth3-Veg" "MPI-ESM1-2-LR"
##
## $suggested_gcms$k7
## [1] "FIO-ESM-2-0" "ACCESS-ESM1-5" "CanESM5-CanOE" "MPI-ESM1-2-HR"
## [5] "EC-Earth3-Veg" "GISS-E2-1-G" "MPI-ESM1-2-LR"
##
## $suggested_gcms$k8
## [1] "FIO-ESM-2-0" "ACCESS-ESM1-5" "CanESM5-CanOE" "CMCC-ESM2"
## [5] "MPI-ESM1-2-HR" "EC-Earth3-Veg" "GISS-E2-1-G" "MPI-ESM1-2-LR"
##
## $suggested_gcms$k9
## [1] "FIO-ESM-2-0" "ACCESS-ESM1-5" "CanESM5-CanOE" "CMCC-ESM2"
## [5] "MPI-ESM1-2-HR" "MIROC-ES2L" "EC-Earth3-Veg" "GISS-E2-1-G"
## [9] "MPI-ESM1-2-LR"
##
## $suggested_gcms$k10
## [1] "FIO-ESM-2-0" "ACCESS-ESM1-5" "CanESM5-CanOE" "CMCC-ESM2"
## [5] "MPI-ESM1-2-HR" "INM-CM5-0" "EC-Earth3-Veg" "GISS-E2-1-G"
## [9] "MIROC-ES2L" "MPI-ESM1-2-LR"
##
## $suggested_gcms$k11
## [1] "FIO-ESM-2-0" "ACCESS-ESM1-5" "CanESM5-CanOE" "CMCC-ESM2"
## [5] "MPI-ESM1-2-HR" "INM-CM5-0" "EC-Earth3-Veg" "GISS-E2-1-G"
## [9] "IPSL-CM6A-LR" "MIROC-ES2L" "MPI-ESM1-2-LR"
##
## $suggested_gcms$k12
## [1] "FIO-ESM-2-0" "ACCESS-ESM1-5" "CanESM5-CanOE" "CMCC-ESM2"
## [5] "CNRM-ESM2-1" "INM-CM5-0" "EC-Earth3-Veg" "GISS-E2-1-G"
## [9] "MPI-ESM1-2-HR" "IPSL-CM6A-LR" "MIROC-ES2L" "MPI-ESM1-2-LR"
##
## $suggested_gcms$k13
## [1] "FIO-ESM-2-0" "ACCESS-ESM1-5" "CanESM5-CanOE" "CMCC-ESM2"
## [5] "CNRM-ESM2-1" "INM-CM5-0" "EC-Earth3-Veg" "GISS-E2-1-G"
## [9] "INM-CM4-8" "IPSL-CM6A-LR" "MIROC-ES2L" "MPI-ESM1-2-HR"
## [13] "MPI-ESM1-2-LR"
##
## $suggested_gcms$k14
## [1] "ACCESS-CM2" "ACCESS-ESM1-5" "CanESM5-CanOE" "CMCC-ESM2"
## [5] "CNRM-ESM2-1" "INM-CM5-0" "EC-Earth3-Veg" "FIO-ESM-2-0"
## [9] "GISS-E2-1-G" "INM-CM4-8" "IPSL-CM6A-LR" "MIROC-ES2L"
## [13] "MPI-ESM1-2-HR" "MPI-ESM1-2-LR"
##
## $suggested_gcms$k15
## [1] "ACCESS-CM2" "ACCESS-ESM1-5" "CanESM5-CanOE" "CMCC-ESM2"
## [5] "CNRM-ESM2-1" "INM-CM5-0" "EC-Earth3-Veg" "FIO-ESM-2-0"
## [9] "GISS-E2-1-G" "INM-CM4-8" "IPSL-CM6A-LR" "MIROC-ES2L"
## [13] "MIROC6" "MPI-ESM1-2-HR" "MPI-ESM1-2-LR"
##
## $suggested_gcms$k16
## [1] "ACCESS-CM2" "ACCESS-ESM1-5" "CanESM5-CanOE" "CMCC-ESM2"
## [5] "CNRM-ESM2-1" "CNRM-CM6-1-HR" "EC-Earth3-Veg" "FIO-ESM-2-0"
## [9] "GISS-E2-1-G" "INM-CM4-8" "INM-CM5-0" "IPSL-CM6A-LR"
## [13] "MIROC-ES2L" "MIROC6" "MPI-ESM1-2-HR" "MPI-ESM1-2-LR"
##
## $suggested_gcms$k17
## [1] "ACCESS-CM2" "ACCESS-ESM1-5" "CanESM5-CanOE" "CMCC-ESM2"
## [5] "CNRM-ESM2-1" "CNRM-CM6-1-HR" "EC-Earth3-Veg" "FIO-ESM-2-0"
## [9] "GISS-E2-1-G" "INM-CM4-8" "INM-CM5-0" "IPSL-CM6A-LR"
## [13] "MIROC-ES2L" "MIROC6" "MPI-ESM1-2-HR" "MPI-ESM1-2-LR"
## [17] "MRI-ESM2-0"
##
## $suggested_gcms$k18
## [1] "ACCESS-CM2" "ACCESS-ESM1-5" "CanESM5-CanOE" "CMCC-ESM2"
## [5] "CNRM-ESM2-1" "CNRM-CM6-1-HR" "EC-Earth3-Veg" "FIO-ESM-2-0"
## [9] "GISS-E2-1-G" "GISS-E2-1-H" "INM-CM4-8" "INM-CM5-0"
## [13] "IPSL-CM6A-LR" "MIROC-ES2L" "MIROC6" "MPI-ESM1-2-HR"
## [17] "MPI-ESM1-2-LR" "MRI-ESM2-0"
##
## $suggested_gcms$k19
## [1] "ACCESS-CM2" "ACCESS-ESM1-5" "CanESM5-CanOE" "CMCC-ESM2"
## [5] "CNRM-ESM2-1" "CNRM-CM6-1-HR" "EC-Earth3-Veg" "FIO-ESM-2-0"
## [9] "GISS-E2-1-G" "GISS-E2-1-H" "HadGEM3-GC31-LL" "INM-CM4-8"
## [13] "INM-CM5-0" "IPSL-CM6A-LR" "MIROC-ES2L" "MIROC6"
## [17] "MPI-ESM1-2-HR" "MPI-ESM1-2-LR" "MRI-ESM2-0"
##
## $suggested_gcms$k20
## [1] "ACCESS-CM2" "ACCESS-ESM1-5" "CanESM5-CanOE" "CMCC-ESM2"
## [5] "CNRM-ESM2-1" "CNRM-CM6-1-HR" "EC-Earth3-Veg" "EC-Earth3-Veg-LR"
## [9] "FIO-ESM-2-0" "GISS-E2-1-G" "GISS-E2-1-H" "HadGEM3-GC31-LL"
## [13] "INM-CM4-8" "INM-CM5-0" "IPSL-CM6A-LR" "MIROC-ES2L"
## [17] "MIROC6" "MPI-ESM1-2-HR" "MPI-ESM1-2-LR" "MRI-ESM2-0"
##
## $suggested_gcms$k21
## [1] "ACCESS-CM2" "ACCESS-ESM1-5" "CanESM5-CanOE" "CMCC-ESM2"
## [5] "CNRM-CM6-1" "CNRM-CM6-1-HR" "CNRM-ESM2-1" "EC-Earth3-Veg"
## [9] "EC-Earth3-Veg-LR" "FIO-ESM-2-0" "GISS-E2-1-G" "GISS-E2-1-H"
## [13] "HadGEM3-GC31-LL" "INM-CM4-8" "INM-CM5-0" "IPSL-CM6A-LR"
## [17] "MIROC-ES2L" "MIROC6" "MPI-ESM1-2-HR" "MPI-ESM1-2-LR"
## [21] "MRI-ESM2-0"
hclust_subset <- mc_hclust$suggested_gcms$k3
tictoc::toc()
## Hierarchical Clustering subset: 16.691 sec elapsed
Kmeans set
# K-means Clustering subset
tictoc::tic("K-means Clustering subset")
mc_kmeans <- montecarlo_gcms(s,
var_names = caretSDM::selected_variables(i),
study_area = study_area_Araucaria,
scale = TRUE,
perm = 10000,
dist_method = "euclidean",
clustering_method = "kmeans")
mc_kmeans
## $montecarlo_plot
##
## $suggested_gcms
## $suggested_gcms$k2
## 1 2
## "EC-Earth3-Veg-LR" "CNRM-ESM2-1"
##
## $suggested_gcms$k3
## 1 2 3
## "EC-Earth3-Veg-LR" "UKESM1-0-LL" "CNRM-ESM2-1"
##
## $suggested_gcms$k4
## 1 2 3 4
## "GISS-E2-1-G" "EC-Earth3-Veg-LR" "UKESM1-0-LL" "CNRM-ESM2-1"
##
## $suggested_gcms$k5
## 1 2 3 4 5
## "EC-Earth3-Veg" "UKESM1-0-LL" "CNRM-ESM2-1" "CanESM5-CanOE" "GISS-E2-1-G"
##
## $suggested_gcms$k6
## 1 2 3 4 5
## "CNRM-ESM2-1" "CanESM5-CanOE" "UKESM1-0-LL" "MPI-ESM1-2-LR" "MIROC-ES2L"
## 6
## "EC-Earth3-Veg"
##
## $suggested_gcms$k7
## 1 2 3 4 5
## "MIROC-ES2L" "MPI-ESM1-2-LR" "EC-Earth3-Veg" "CMCC-ESM2" "UKESM1-0-LL"
## 6 7
## "CNRM-ESM2-1" "CanESM5-CanOE"
##
## $suggested_gcms$k8
## 1 2 3 4 5
## "CMCC-ESM2" "UKESM1-0-LL" "MPI-ESM1-2-LR" "MRI-ESM2-0" "EC-Earth3-Veg"
## 6 7 8
## "CanESM5-CanOE" "CNRM-CM6-1" "GISS-E2-1-G"
##
## $suggested_gcms$k9
## 1 2 3 4 5
## "MRI-ESM2-0" "GISS-E2-1-G" "IPSL-CM6A-LR" "CMCC-ESM2" "CNRM-CM6-1"
## 6 7 8 9
## "UKESM1-0-LL" "CanESM5-CanOE" "EC-Earth3-Veg" "MPI-ESM1-2-LR"
##
## $suggested_gcms$k10
## 1 2 3 4 5
## "CNRM-CM6-1" "MPI-ESM1-2-LR" "MRI-ESM2-0" "MIROC-ES2L" "GISS-E2-1-G"
## 6 7 8 9 10
## "CanESM5-CanOE" "EC-Earth3-Veg" "UKESM1-0-LL" "CMCC-ESM2" "IPSL-CM6A-LR"
##
## $suggested_gcms$k11
## 1 2 3 4 5
## "CNRM-CM6-1-HR" "MIROC-ES2L" "MPI-ESM1-2-LR" "EC-Earth3-Veg" "GISS-E2-1-G"
## 6 7 8 9 10
## "CNRM-ESM2-1" "MRI-ESM2-0" "CMCC-ESM2" "UKESM1-0-LL" "CanESM5-CanOE"
## 11
## "IPSL-CM6A-LR"
##
## $suggested_gcms$k12
## 1 2 3 4 5
## "GISS-E2-1-G" "CNRM-CM6-1" "MPI-ESM1-2-LR" "MIROC-ES2L" "CMCC-ESM2"
## 6 7 8 9 10
## "UKESM1-0-LL" "CNRM-CM6-1-HR" "MPI-ESM1-2-HR" "EC-Earth3-Veg" "IPSL-CM6A-LR"
## 11 12
## "MRI-ESM2-0" "CanESM5-CanOE"
##
## $suggested_gcms$k13
## 1 2 3 4 5
## "UKESM1-0-LL" "INM-CM4-8" "CMCC-ESM2" "MIROC-ES2L" "IPSL-CM6A-LR"
## 6 7 8 9 10
## "CNRM-CM6-1" "GISS-E2-1-G" "CanESM5-CanOE" "MRI-ESM2-0" "EC-Earth3-Veg"
## 11 12 13
## "CNRM-CM6-1-HR" "MPI-ESM1-2-LR" "MPI-ESM1-2-HR"
##
## $suggested_gcms$k14
## 1 2 3 4 5
## "CanESM5-CanOE" "CNRM-CM6-1" "FIO-ESM-2-0" "MPI-ESM1-2-HR" "UKESM1-0-LL"
## 6 7 8 9 10
## "CMCC-ESM2" "ACCESS-CM2" "GISS-E2-1-G" "EC-Earth3-Veg" "MIROC-ES2L"
## 11 12 13 14
## "INM-CM4-8" "CNRM-CM6-1-HR" "IPSL-CM6A-LR" "MPI-ESM1-2-LR"
##
## $suggested_gcms$k15
## 1 2 3 4 5
## "CMCC-ESM2" "MPI-ESM1-2-LR" "MIROC-ES2L" "GISS-E2-1-G" "MIROC6"
## 6 7 8 9 10
## "INM-CM4-8" "IPSL-CM6A-LR" "CNRM-CM6-1-HR" "CanESM5-CanOE" "MPI-ESM1-2-HR"
## 11 12 13 14 15
## "FIO-ESM-2-0" "ACCESS-CM2" "UKESM1-0-LL" "EC-Earth3-Veg" "CNRM-CM6-1"
##
## $suggested_gcms$k16
## 1 2 3 4 5
## "GISS-E2-1-H" "IPSL-CM6A-LR" "ACCESS-CM2" "MPI-ESM1-2-HR" "MPI-ESM1-2-LR"
## 6 7 8 9 10
## "MIROC6" "CNRM-CM6-1" "UKESM1-0-LL" "FIO-ESM-2-0" "CNRM-CM6-1-HR"
## 11 12 13 14 15
## "GISS-E2-1-G" "CanESM5-CanOE" "MIROC-ES2L" "CMCC-ESM2" "EC-Earth3-Veg"
## 16
## "INM-CM4-8"
##
## $suggested_gcms$k17
## 1 2 3 4 5
## "GISS-E2-1-H" "MIROC6" "CanESM5-CanOE" "MPI-ESM1-2-LR" "FIO-ESM-2-0"
## 6 7 8 9 10
## "MIROC-ES2L" "INM-CM5-0" "INM-CM4-8" "CMCC-ESM2" "ACCESS-CM2"
## 11 12 13 14 15
## "CNRM-CM6-1-HR" "IPSL-CM6A-LR" "GISS-E2-1-G" "MPI-ESM1-2-HR" "UKESM1-0-LL"
## 16 17
## "EC-Earth3-Veg" "CNRM-CM6-1"
##
## $suggested_gcms$k18
## 1 2 3 4 5
## "GISS-E2-1-H" "CNRM-CM6-1" "MRI-ESM2-0" "CanESM5-CanOE" "CNRM-CM6-1-HR"
## 6 7 8 9 10
## "INM-CM5-0" "CMCC-ESM2" "FIO-ESM-2-0" "EC-Earth3-Veg" "MIROC-ES2L"
## 11 12 13 14 15
## "IPSL-CM6A-LR" "MIROC6" "MPI-ESM1-2-HR" "INM-CM4-8" "ACCESS-CM2"
## 16 17 18
## "GISS-E2-1-G" "MPI-ESM1-2-LR" "UKESM1-0-LL"
##
## $suggested_gcms$k19
## 1 2 3 4
## "EC-Earth3-Veg" "MIROC-ES2L" "CanESM5-CanOE" "GISS-E2-1-G"
## 5 6 7 8
## "IPSL-CM6A-LR" "FIO-ESM-2-0" "ACCESS-ESM1-5" "INM-CM4-8"
## 9 10 11 12
## "ACCESS-CM2" "INM-CM5-0" "CNRM-CM6-1" "GISS-E2-1-H"
## 13 14 15 16
## "CMCC-ESM2" "MIROC6" "MPI-ESM1-2-HR" "CNRM-CM6-1-HR"
## 17 18 19
## "HadGEM3-GC31-LL" "MRI-ESM2-0" "MPI-ESM1-2-LR"
##
## $suggested_gcms$k20
## 1 2 3 4
## "MIROC-ES2L" "EC-Earth3-Veg" "MRI-ESM2-0" "GISS-E2-1-G"
## 5 6 7 8
## "GISS-E2-1-H" "CMCC-ESM2" "FIO-ESM-2-0" "MPI-ESM1-2-HR"
## 9 10 11 12
## "MIROC6" "CNRM-CM6-1-HR" "HadGEM3-GC31-LL" "CNRM-CM6-1"
## 13 14 15 16
## "IPSL-CM6A-LR" "INM-CM5-0" "ACCESS-CM2" "INM-CM4-8"
## 17 18 19 20
## "MPI-ESM1-2-LR" "ACCESS-ESM1-5" "CanESM5-CanOE" "CNRM-ESM2-1"
##
## $suggested_gcms$k21
## 1 2 3 4
## "HadGEM3-GC31-LL" "CNRM-CM6-1" "MPI-ESM1-2-HR" "GISS-E2-1-H"
## 5 6 7 8
## "MPI-ESM1-2-LR" "CNRM-ESM2-1" "ACCESS-CM2" "FIO-ESM-2-0"
## 9 10 11 12
## "CMCC-ESM2" "GISS-E2-1-G" "CanESM5-CanOE" "MRI-ESM2-0"
## 13 14 15 16
## "EC-Earth3-Veg" "INM-CM5-0" "INM-CM4-8" "EC-Earth3-Veg-LR"
## 17 18 19 20
## "CNRM-CM6-1-HR" "ACCESS-ESM1-5" "MIROC6" "MIROC-ES2L"
## 21
## "IPSL-CM6A-LR"
kmeans_subset <- mc_kmeans$suggested_gcms$k3
tictoc::toc()
## K-means Clustering subset: 27.4 sec elapsed
Closestdist set
# Closestdist algorithm subset
tictoc::tic("Closestdist algorithm subset")
mc_close <- closestdist_gcms(s,
var_names = caretSDM::selected_variables(i),
study_area = study_area_Araucaria,
method = "euclidean")
## CRS from s and study_area are not identical. Reprojecting study area.
mc_close
## $suggested_gcms
## [1] "ACCESS-ESM1-5" "CMCC-ESM2" "MIROC-ES2L" "CNRM-CM6-1-HR"
##
## $best_mean_diff
## [1] 0.0060211
##
## $global_mean
## [1] 69.70944
closestdist_subset <- mc_close$suggested_gcms
tictoc::toc()
## Closestdist algorithm subset: 2.806 sec elapsed
Closestdist3 set
# Closestdist algorithm subset
tictoc::tic("Closestdist algorithm subset")
mc_close3 <- montecarlo_gcms(s,
var_names = caretSDM::selected_variables(i),
study_area = study_area_Araucaria,
scale = TRUE,
perm = 10000,
dist_method = "euclidean",
clustering_method = "closestdist")
mc_close3
## $montecarlo_plot
##
## $suggested_gcms
## $suggested_gcms$k2
## [1] "CMCC-ESM2" "MPI-ESM1-2-HR"
##
## $suggested_gcms$k3
## [1] "CNRM-CM6-1" "FIO-ESM-2-0" "MPI-ESM1-2-LR"
##
## $suggested_gcms$k4
## [1] "ACCESS-ESM1-5" "CMCC-ESM2" "MIROC-ES2L" "CNRM-CM6-1-HR"
##
## $suggested_gcms$k5
## [1] "CNRM-CM6-1-HR" "GISS-E2-1-G" "UKESM1-0-LL" "IPSL-CM6A-LR"
## [5] "FIO-ESM-2-0"
##
## $suggested_gcms$k6
## [1] "GISS-E2-1-H" "MPI-ESM1-2-HR" "IPSL-CM6A-LR" "MIROC6"
## [5] "CMCC-ESM2" "HadGEM3-GC31-LL"
##
## $suggested_gcms$k7
## [1] "ACCESS-ESM1-5" "EC-Earth3-Veg-LR" "ACCESS-CM2" "CMCC-ESM2"
## [5] "FIO-ESM-2-0" "MIROC-ES2L" "INM-CM5-0"
##
## $suggested_gcms$k8
## [1] "INM-CM4-8" "MIROC6" "MPI-ESM1-2-LR" "HadGEM3-GC31-LL"
## [5] "GISS-E2-1-H" "UKESM1-0-LL" "CMCC-ESM2" "GISS-E2-1-G"
##
## $suggested_gcms$k9
## [1] "CNRM-ESM2-1" "UKESM1-0-LL" "EC-Earth3-Veg-LR" "MRI-ESM2-0"
## [5] "EC-Earth3-Veg" "CMCC-ESM2" "MPI-ESM1-2-HR" "MIROC-ES2L"
## [9] "INM-CM5-0"
##
## $suggested_gcms$k10
## [1] "EC-Earth3-Veg-LR" "HadGEM3-GC31-LL" "ACCESS-CM2" "CMCC-ESM2"
## [5] "EC-Earth3-Veg" "CNRM-ESM2-1" "FIO-ESM-2-0" "IPSL-CM6A-LR"
## [9] "MPI-ESM1-2-HR" "INM-CM4-8"
##
## $suggested_gcms$k11
## [1] "CMCC-ESM2" "EC-Earth3-Veg-LR" "IPSL-CM6A-LR" "HadGEM3-GC31-LL"
## [5] "CNRM-ESM2-1" "EC-Earth3-Veg" "MPI-ESM1-2-HR" "MRI-ESM2-0"
## [9] "MIROC-ES2L" "INM-CM5-0" "INM-CM4-8"
##
## $suggested_gcms$k12
## [1] "ACCESS-CM2" "ACCESS-ESM1-5" "EC-Earth3-Veg" "CMCC-ESM2"
## [5] "FIO-ESM-2-0" "IPSL-CM6A-LR" "EC-Earth3-Veg-LR" "CNRM-ESM2-1"
## [9] "INM-CM5-0" "MRI-ESM2-0" "INM-CM4-8" "MIROC6"
##
## $suggested_gcms$k13
## [1] "CanESM5-CanOE" "EC-Earth3-Veg" "EC-Earth3-Veg-LR" "UKESM1-0-LL"
## [5] "HadGEM3-GC31-LL" "CNRM-CM6-1" "ACCESS-CM2" "MPI-ESM1-2-HR"
## [9] "INM-CM5-0" "MRI-ESM2-0" "FIO-ESM-2-0" "MIROC6"
## [13] "INM-CM4-8"
##
## $suggested_gcms$k14
## [1] "ACCESS-CM2" "INM-CM5-0" "CMCC-ESM2" "IPSL-CM6A-LR"
## [5] "CNRM-CM6-1-HR" "GISS-E2-1-H" "GISS-E2-1-G" "UKESM1-0-LL"
## [9] "MPI-ESM1-2-LR" "MIROC-ES2L" "MIROC6" "ACCESS-ESM1-5"
## [13] "EC-Earth3-Veg-LR" "CNRM-ESM2-1"
##
## $suggested_gcms$k15
## [1] "ACCESS-CM2" "EC-Earth3-Veg" "HadGEM3-GC31-LL" "IPSL-CM6A-LR"
## [5] "FIO-ESM-2-0" "CMCC-ESM2" "MIROC-ES2L" "MIROC6"
## [9] "INM-CM4-8" "CNRM-CM6-1-HR" "GISS-E2-1-G" "GISS-E2-1-H"
## [13] "UKESM1-0-LL" "MPI-ESM1-2-LR" "ACCESS-ESM1-5"
##
## $suggested_gcms$k16
## [1] "EC-Earth3-Veg" "GISS-E2-1-H" "EC-Earth3-Veg-LR" "ACCESS-ESM1-5"
## [5] "UKESM1-0-LL" "MPI-ESM1-2-HR" "INM-CM5-0" "CNRM-CM6-1-HR"
## [9] "MRI-ESM2-0" "MIROC6" "IPSL-CM6A-LR" "FIO-ESM-2-0"
## [13] "INM-CM4-8" "GISS-E2-1-G" "CMCC-ESM2" "MIROC-ES2L"
##
## $suggested_gcms$k17
## [1] "EC-Earth3-Veg" "INM-CM4-8" "CNRM-ESM2-1" "CNRM-CM6-1"
## [5] "GISS-E2-1-H" "FIO-ESM-2-0" "IPSL-CM6A-LR" "GISS-E2-1-G"
## [9] "MIROC6" "CMCC-ESM2" "ACCESS-ESM1-5" "CNRM-CM6-1-HR"
## [13] "MPI-ESM1-2-LR" "MPI-ESM1-2-HR" "UKESM1-0-LL" "EC-Earth3-Veg-LR"
## [17] "HadGEM3-GC31-LL"
##
## $suggested_gcms$k18
## [1] "EC-Earth3-Veg" "MPI-ESM1-2-HR" "EC-Earth3-Veg-LR" "IPSL-CM6A-LR"
## [5] "CNRM-ESM2-1" "ACCESS-CM2" "MRI-ESM2-0" "INM-CM5-0"
## [9] "INM-CM4-8" "MIROC6" "GISS-E2-1-H" "GISS-E2-1-G"
## [13] "CNRM-CM6-1-HR" "CMCC-ESM2" "ACCESS-ESM1-5" "FIO-ESM-2-0"
## [17] "MPI-ESM1-2-LR" "MIROC-ES2L"
##
## $suggested_gcms$k19
## [1] "ACCESS-CM2" "CanESM5-CanOE" "ACCESS-ESM1-5" "IPSL-CM6A-LR"
## [5] "CNRM-CM6-1-HR" "MIROC-ES2L" "INM-CM4-8" "GISS-E2-1-H"
## [9] "GISS-E2-1-G" "FIO-ESM-2-0" "CMCC-ESM2" "UKESM1-0-LL"
## [13] "MIROC6" "MPI-ESM1-2-LR" "INM-CM5-0" "MRI-ESM2-0"
## [17] "HadGEM3-GC31-LL" "EC-Earth3-Veg-LR" "CNRM-ESM2-1"
##
## $suggested_gcms$k20
## [1] "CanESM5-CanOE" "EC-Earth3-Veg" "EC-Earth3-Veg-LR" "UKESM1-0-LL"
## [5] "HadGEM3-GC31-LL" "CNRM-CM6-1" "ACCESS-CM2" "MPI-ESM1-2-HR"
## [9] "INM-CM5-0" "MRI-ESM2-0" "FIO-ESM-2-0" "MIROC6"
## [13] "INM-CM4-8" "IPSL-CM6A-LR" "CNRM-CM6-1-HR" "GISS-E2-1-H"
## [17] "GISS-E2-1-G" "MIROC-ES2L" "CMCC-ESM2" "ACCESS-ESM1-5"
##
## $suggested_gcms$k21
## [1] "EC-Earth3-Veg" "INM-CM4-8" "CNRM-ESM2-1" "CNRM-CM6-1"
## [5] "GISS-E2-1-H" "FIO-ESM-2-0" "IPSL-CM6A-LR" "GISS-E2-1-G"
## [9] "MIROC6" "CMCC-ESM2" "ACCESS-ESM1-5" "CNRM-CM6-1-HR"
## [13] "MPI-ESM1-2-LR" "MPI-ESM1-2-HR" "UKESM1-0-LL" "EC-Earth3-Veg-LR"
## [17] "HadGEM3-GC31-LL" "INM-CM5-0" "MRI-ESM2-0" "CanESM5-CanOE"
## [21] "ACCESS-CM2"
closestdist_subset3 <- mc_close3$suggested_gcms$k3
tictoc::toc()
## Closestdist algorithm subset: 21.779 sec elapsed
# Initialize an empty list to store the mean layers
mean_layers <- list()
# Loop through each layer index
for (j in caretSDM::get_predictor_names(i)) {
# Extract the i-th layer from each SpatRaster in the list
layer_stack <- lapply(s, function(x) x[[j]])
# Calculate the mean of the extracted layers
mean_layers[[j]] <- mean(rast(layer_stack))
}
# Combine the mean layers into a single SpatRaster object
mean_gcm <- rast(mean_layers)
# Transform to stars to use in caretSDM
mean_gcm <- stars::st_as_stars(mean_gcm)
names(mean_gcm) <- "mean_gcm"
Building Projections
Model projections were performed to each GCM set built in the previous
section. We projected models to future scenarios of climate change,
encompassing an extreme shared socioeconomic pathway (SSP5-8.5) to the
year 2090 to the given subsets of GCMs selected using
chooseGCM
. We calculated the mean of the GCM projections
(probability of occurrence) to obtain a final projection for each subset
(mean probability of occurrence). We also stored the time taken to
perform each projection. The ensemble of GCM projections was calculated
using the average.
scen <- stars::read_stars(
list.files("~/storage/WC_data/future/climate/wc2.1_10m",
full.names = T))[,,,c("bio01","bio02", "bio03","bio04", "bio05",
"bio06", "bio07", "bio12", "bio13", "bio14", "bio15")]
tictoc::tic("Projecting to all scenarios")
i_all <- i |>
caretSDM::add_scenarios() |>
caretSDM::add_scenarios(scen) |>
caretSDM::predict_sdm() |>
caretSDM::gcms_ensembles(gcms = all_gcms)
## s1_names s2_names
## 1 bio01 bio1
## 2 bio02 bio2
## 3 bio03 bio3
## 4 bio04 bio4
## 5 bio05 bio5
## 6 bio06 bio6
## 7 bio07 bio7
## 8 bio12 bio12
## 9 bio13 bio13
## 10 bio14 bio14
## 11 bio15 bio15
## [1] "Projecting: 1/24"
## [1] "Projecting: 2/24"
## [1] "Projecting: 3/24"
## [1] "Projecting: 4/24"
## [1] "Projecting: 5/24"
## [1] "Projecting: 6/24"
## [1] "Projecting: 7/24"
## [1] "Projecting: 8/24"
## [1] "Projecting: 9/24"
## [1] "Projecting: 10/24"
## [1] "Projecting: 11/24"
## [1] "Projecting: 12/24"
## [1] "Projecting: 13/24"
## [1] "Projecting: 14/24"
## [1] "Projecting: 15/24"
## [1] "Projecting: 16/24"
## [1] "Projecting: 17/24"
## [1] "Projecting: 18/24"
## [1] "Projecting: 19/24"
## [1] "Projecting: 20/24"
## [1] "Projecting: 21/24"
## [1] "Projecting: 22/24"
## [1] "Projecting: 23/24"
## [1] "Projecting: 24/24"
## [1] "Ensembling..."
## [1] "current"
## [1] "Araucaria angustifolia"
## [1] "ACCESS-CM2_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "ACCESS-ESM1-5_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "CanESM5-CanOE_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "CMCC-ESM2_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "CNRM-CM6-1_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "CNRM-CM6-1-HR_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "CNRM-ESM2-1_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "EC-Earth3-Veg_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "EC-Earth3-Veg-LR_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "FIO-ESM-2-0_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "GISS-E2-1-G_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "GISS-E2-1-H_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "HadGEM3-GC31-LL_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "INM-CM4-8_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "INM-CM5-0_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "IPSL-CM6A-LR_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "MIROC-ES2L_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "MIROC6_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "MPI-ESM1-2-HR_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "MPI-ESM1-2-LR_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "MRI-ESM2-0_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "UKESM1-0-LL_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "current"
## [1] "Araucaria angustifolia"
time_all <- tictoc::toc()
## Projecting to all scenarios: 2221.794 sec elapsed
tictoc::tic("Projecting to random_subset")
i_random <- i |>
caretSDM::add_scenarios() |>
caretSDM::add_scenarios(scen[paste0(random_subset, "_ssp585_2081-2100.tif")]) |>
caretSDM::predict_sdm() |>
caretSDM::gcms_ensembles(gcms = random_subset)
## s1_names s2_names
## 1 bio01 bio1
## 2 bio02 bio2
## 3 bio03 bio3
## 4 bio04 bio4
## 5 bio05 bio5
## 6 bio06 bio6
## 7 bio07 bio7
## 8 bio12 bio12
## 9 bio13 bio13
## 10 bio14 bio14
## 11 bio15 bio15
## [1] "Projecting: 1/5"
## [1] "Projecting: 2/5"
## [1] "Projecting: 3/5"
## [1] "Projecting: 4/5"
## [1] "Projecting: 5/5"
## [1] "Ensembling..."
## [1] "current"
## [1] "Araucaria angustifolia"
## [1] "HadGEM3-GC31-LL_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "MIROC-ES2L_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "GISS-E2-1-G_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "current"
## [1] "Araucaria angustifolia"
time_random <- tictoc::toc()
## Projecting to random_subset: 368.941 sec elapsed
tictoc::tic("Projecting to hclust_subset")
i_hclust <- i |>
caretSDM::add_scenarios() |>
caretSDM::add_scenarios(scen[paste0(hclust_subset, "_ssp585_2081-2100.tif")]) |>
caretSDM::predict_sdm() |>
caretSDM::gcms_ensembles(gcms = hclust_subset)
## s1_names s2_names
## 1 bio01 bio1
## 2 bio02 bio2
## 3 bio03 bio3
## 4 bio04 bio4
## 5 bio05 bio5
## 6 bio06 bio6
## 7 bio07 bio7
## 8 bio12 bio12
## 9 bio13 bio13
## 10 bio14 bio14
## 11 bio15 bio15
## [1] "Projecting: 1/5"
## [1] "Projecting: 2/5"
## [1] "Projecting: 3/5"
## [1] "Projecting: 4/5"
## [1] "Projecting: 5/5"
## [1] "Ensembling..."
## [1] "current"
## [1] "Araucaria angustifolia"
## [1] "ACCESS-ESM1-5_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "CanESM5-CanOE_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "MPI-ESM1-2-HR_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "current"
## [1] "Araucaria angustifolia"
time_hclust <- tictoc::toc()
## Projecting to hclust_subset: 366.447 sec elapsed
tictoc::tic("Projecting to kmeans_subset")
i_kmeans <- i |>
caretSDM::add_scenarios() |>
caretSDM::add_scenarios(scen[paste0(kmeans_subset, "_ssp585_2081-2100.tif")]) |>
caretSDM::predict_sdm() |>
caretSDM::gcms_ensembles(gcms = kmeans_subset)
## s1_names s2_names
## 1 bio01 bio1
## 2 bio02 bio2
## 3 bio03 bio3
## 4 bio04 bio4
## 5 bio05 bio5
## 6 bio06 bio6
## 7 bio07 bio7
## 8 bio12 bio12
## 9 bio13 bio13
## 10 bio14 bio14
## 11 bio15 bio15
## [1] "Projecting: 1/5"
## [1] "Projecting: 2/5"
## [1] "Projecting: 3/5"
## [1] "Projecting: 4/5"
## [1] "Projecting: 5/5"
## [1] "Ensembling..."
## [1] "current"
## [1] "Araucaria angustifolia"
## [1] "EC-Earth3-Veg-LR_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "UKESM1-0-LL_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "CNRM-ESM2-1_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "current"
## [1] "Araucaria angustifolia"
time_kmeans <- tictoc::toc()
## Projecting to kmeans_subset: 368.51 sec elapsed
tictoc::tic("Projecting to closestdist_subset")
i_closest <- i |>
caretSDM::add_scenarios() |>
caretSDM::add_scenarios(scen[paste0(closestdist_subset, "_ssp585_2081-2100.tif")]) |>
caretSDM::predict_sdm() |>
caretSDM::gcms_ensembles(gcms = closestdist_subset)
## s1_names s2_names
## 1 bio01 bio1
## 2 bio02 bio2
## 3 bio03 bio3
## 4 bio04 bio4
## 5 bio05 bio5
## 6 bio06 bio6
## 7 bio07 bio7
## 8 bio12 bio12
## 9 bio13 bio13
## 10 bio14 bio14
## 11 bio15 bio15
## [1] "Projecting: 1/6"
## [1] "Projecting: 2/6"
## [1] "Projecting: 3/6"
## [1] "Projecting: 4/6"
## [1] "Projecting: 5/6"
## [1] "Projecting: 6/6"
## [1] "Ensembling..."
## [1] "current"
## [1] "Araucaria angustifolia"
## [1] "ACCESS-ESM1-5_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "CMCC-ESM2_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "MIROC-ES2L_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "CNRM-CM6-1-HR_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "current"
## [1] "Araucaria angustifolia"
time_closest <- tictoc::toc()
## Projecting to closestdist_subset: 449.158 sec elapsed
tictoc::tic("Projecting to closestdist_subset3")
i_closest3 <- i |>
caretSDM::add_scenarios() |>
caretSDM::add_scenarios(scen[paste0(closestdist_subset3, "_ssp585_2081-2100.tif")]) |>
caretSDM::predict_sdm() |>
caretSDM::gcms_ensembles(gcms = closestdist_subset3)
## s1_names s2_names
## 1 bio01 bio1
## 2 bio02 bio2
## 3 bio03 bio3
## 4 bio04 bio4
## 5 bio05 bio5
## 6 bio06 bio6
## 7 bio07 bio7
## 8 bio12 bio12
## 9 bio13 bio13
## 10 bio14 bio14
## 11 bio15 bio15
## [1] "Projecting: 1/5"
## [1] "Projecting: 2/5"
## [1] "Projecting: 3/5"
## [1] "Projecting: 4/5"
## [1] "Projecting: 5/5"
## [1] "Ensembling..."
## [1] "current"
## [1] "Araucaria angustifolia"
## [1] "CNRM-CM6-1_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "FIO-ESM-2-0_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "MPI-ESM1-2-LR_ssp585_2081-2100.tif"
## [1] "Araucaria angustifolia"
## [1] "current"
## [1] "Araucaria angustifolia"
time_closest3 <- tictoc::toc()
## Projecting to closestdist_subset3: 355.5 sec elapsed
tictoc::tic("Projecting to mean_gcm")
i_mean_gcm <- i |>
caretSDM::add_scenarios() |>
caretSDM::add_scenarios(mean_gcm) |>
caretSDM::predict_sdm()
## [1] "Projecting: 1/3"
## [1] "Projecting: 2/3"
## [1] "Projecting: 3/3"
## [1] "Ensembling..."
## [1] "current"
## [1] "Araucaria angustifolia"
## [1] "mean_gcm"
## [1] "Araucaria angustifolia"
## [1] "current"
## [1] "Araucaria angustifolia"
time_mean_gcm <- tictoc::toc()
## Projecting to mean_gcm: 156.651 sec elapsed
Comparing Projections
To compare whether chooseGCM
performed better than
alternative selection methods, we built a Pearson’s correlation matrix
between the six sets of GCMs and built an Euclidean distance matrix.
Numeric results were compiled into a table for better comparison and
visualization.
df <- data.frame(
all = i_all$predictions$ensembles[,"_ssp585_2081-2100.tif"][[1]]$mean_occ_prob,
random = i_random$predictions$ensembles[,"_ssp585_2081-2100.tif"][[1]]$mean_occ_prob,
hclust = i_hclust$predictions$ensembles[,"_ssp585_2081-2100.tif"][[1]]$mean_occ_prob,
kmeans = i_kmeans$predictions$ensembles[,"_ssp585_2081-2100.tif"][[1]]$mean_occ_prob,
closest = i_closest$predictions$ensembles[,"_ssp585_2081-2100.tif"][[1]]$mean_occ_prob,
closest3 = i_closest3$predictions$ensembles[,"_ssp585_2081-2100.tif"][[1]]$mean_occ_prob,
mean_GCM = i_mean_gcm$predictions$ensembles[,"mean_gcm"][[1]]$mean_occ_prob
)
s_cor <- cor(df)
s_cor
## all random hclust kmeans closest closest3 mean_GCM
## all 1.0000000 0.7471318 0.7765558 0.7813567 0.9052983 0.8283943 0.8072922
## random 0.7471318 1.0000000 0.6441292 0.3849073 0.6531905 0.4522108 0.5170753
## hclust 0.7765558 0.6441292 1.0000000 0.4710737 0.6751460 0.4938375 0.6498367
## kmeans 0.7813567 0.3849073 0.4710737 1.0000000 0.6454123 0.7745283 0.7232116
## closest 0.9052983 0.6531905 0.6751460 0.6454123 1.0000000 0.7275182 0.6668466
## closest3 0.8283943 0.4522108 0.4938375 0.7745283 0.7275182 1.0000000 0.7380881
## mean_GCM 0.8072922 0.5170753 0.6498367 0.7232116 0.6668466 0.7380881 1.0000000
s_dist <- dist(t(df))
s_dist
## all random hclust kmeans closest closest3
## random 7.220488
## hclust 6.869828 9.166807
## kmeans 7.083849 12.293921 11.375874
## closest 4.586968 8.920814 8.695204 9.308493
## closest3 6.772702 11.955831 11.514598 7.848486 8.451620
## mean_GCM 8.992250 12.852741 11.122912 10.077909 10.861230 9.869753
df <- data.frame(Subset = c("Global",
"Random",
"Hclust",
"Kmeans",
"Closedist",
"Closedist3",
"MeanGCM"
),
Projection_Time = c(
as.numeric(time_all$toc - time_all$tic),
as.numeric(time_random$toc - time_random$tic),
as.numeric(time_hclust$toc - time_hclust$tic),
as.numeric(time_kmeans$toc - time_kmeans$tic),
as.numeric(time_closest$toc - time_closest$tic),
as.numeric(time_closest3$toc - time_closest3$tic),
as.numeric(time_mean_gcm$toc - time_mean_gcm$tic)
),
Time_saving = c(
100-as.numeric(time_all$toc - time_all$tic)*100/as.numeric(time_all$toc - time_all$tic),
100-as.numeric(time_random$toc - time_random$tic)*100/as.numeric(time_all$toc - time_all$tic),
100-as.numeric(time_hclust$toc - time_hclust$tic)*100/as.numeric(time_all$toc - time_all$tic),
100-as.numeric(time_kmeans$toc - time_kmeans$tic)*100/as.numeric(time_all$toc - time_all$tic),
100-as.numeric(time_closest$toc - time_closest$tic)*100/as.numeric(time_all$toc - time_all$tic),
100-as.numeric(time_closest3$toc - time_closest3$tic)*100/as.numeric(time_all$toc - time_all$tic),
100-as.numeric(time_mean_gcm$toc - time_mean_gcm$tic)*100/as.numeric(time_all$toc - time_all$tic)
),
Distance_from_Global = as.numeric(as.matrix(s_dist)["all",]),
SDMs_Correlation = as.numeric(as.matrix(s_cor)["all",])
)
Results
The final result of this analysis is summarized in the df
object. In this table we show how optimized our method is to reach a
high correlation with the baseline, saving computational time.
df
## Subset Projection_Time Time_saving Distance_from_Global SDMs_Correlation
## 1 Global 2221.794 0.00000 0.000000 1.0000000
## 2 Random 368.941 83.39446 7.220488 0.7471318
## 3 Hclust 366.447 83.50671 6.869828 0.7765558
## 4 Kmeans 368.510 83.41385 7.083849 0.7813567
## 5 Closedist 449.158 79.78399 4.586968 0.9052983
## 6 Closedist3 355.500 83.99942 6.772702 0.8283943
## 7 MeanGCM 156.651 92.94935 8.992250 0.8072922