
caretSDM Workflow for Species Distribution Modeling
Source: vignettes/articles/Araucaria.Rmd
Introduction
caretSDM is an R package that uses the powerful caret package as its
main engine to obtain Species Distribution Models. One of its main
strengths is the robust geoprocessing underlying its functions, provided
by the stars package. Here we show how to model species distributions
using caretSDM through the function sdm_area with a polygon. We will
also show how to apply a PCA to predictors and scenarios to avoid
multicollinearity. The aim of this modeling is to obtain the current and
future distribution of Araucaria angustifolia, a keystone tree species
from southern Brazil.
First, we need to load the library.
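A minimal sketch of this step, assuming caretSDM is installed and that stars is attached as well, because read_stars is called later without a package prefix. Attaching stars is our assumption, not something stated above. The guard only avoids an error on machines where a package is missing.

```r
# Assumed packages: caretSDM (this tutorial) and stars (provides read_stars()).
pkgs <- c("caretSDM", "stars")
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

if (all(available)) {
  library(caretSDM)
  library(stars)
} else {
  message("Install before continuing: ", paste(pkgs[!available], collapse = ", "))
}
```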
Pre-Processing
To obtain models, we need climatic data and species records. Two
functions make these data easy to obtain: WorldClim_data downloads
climatic variables from WorldClim 2.1, a widely used open-access
database; likewise, GBIF_data downloads species records from GBIF, also
a widely used open-access database. You can read more about them by
running ?GBIF_data and ?WorldClim_data in the console.
Obtaining species records
An easy way to get species data using caretSDM is the function
GBIF_data, which retrieves species records from GBIF. Naturally, other
sources of species data are available, as well as your own data
resulting from field work. You can import your own data into R in
multiple ways, but the table must always have three columns: species,
decimalLongitude and decimalLatitude. The GBIF_data function retrieves
data ready to be included in caretSDM, so if you have any doubt about
how to format your own data, call GBIF_data with the parameter
as_df = TRUE to retrieve an example table. By default, GBIF_data sets
as_df = FALSE, which makes the function return an occurrences object
(more about that further below). An example code for this step would be:
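For records that do not come from GBIF, the required layout can be sketched with base R alone. The coordinates below are invented for illustration, and the commented GBIF_data call is an assumption about its signature (check ?GBIF_data before using it).

```r
# Hypothetical field records in geographic coordinates (EPSG:4326); values are made up.
my_occ <- data.frame(
  species = rep("Araucaria angustifolia", 3),
  decimalLongitude = c(-51.2, -50.4, -49.9),
  decimalLatitude = c(-25.3, -24.8, -25.9)
)

# Check that the three columns required by caretSDM are present.
required <- c("species", "decimalLongitude", "decimalLatitude")
missing_cols <- setdiff(required, names(my_occ))
if (length(missing_cols) > 0) {
  stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
}

# Assumed call, not run here: passing the species name first is a guess.
# occ_example <- GBIF_data("Araucaria angustifolia", as_df = TRUE)
```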
However, the package already includes an occ object, which is the same
output but with records filtered to match our study area. Note that the
coordinates are in a metric CRS (EPSG:6933).
occ |> head()
#>                  species decimalLongitude decimalLatitude
#> 1 Araucaria angustifolia         -5071809        -3259770
#> 2 Araucaria angustifolia         -4983891        -3283348
#> 3 Araucaria angustifolia         -4748996        -3142075
#> 4 Araucaria angustifolia         -4883188        -3092641
#> 5 Araucaria angustifolia         -4766386        -3227445
#> 6 Araucaria angustifolia         -4755861        -3139607
Obtaining climatic data
For climatic data, we will first download and import current data,
which are used to build the models. The WorldClim_data function has an
argument to set the directory in which you want to save the files. If
you don’t set it, files are saved in your working directory (run
getwd() to find out your working directory) in the folder
“input_data/WorldClim_data_current/”. If period is set to “future”, they
are saved in “input_data/WorldClim_data_future/”. We could run this
script at a finer resolution, but as the aim here is to show how the
package works, we will use a resolution of 10 arc-minutes, which is very
coarse but quicker to download and run.
# Download current bioclimatic variables
WorldClim_data(path = NULL,
               period = "current",
               variable = "bioc",
               resolution = 10)
# Import current bioclimatic variables to R
bioc <- read_stars(list.files("input_data/WorldClim_data_current/", full.names = TRUE),
                   along = "band", normalize_path = FALSE)
As in the previous section, the package already includes a bioc object,
which is the same output but masked to match our study area and with
fewer variables.
bioc
#> stars object with 3 dimensions and 1 attribute
#> attribute(s):
#> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#> current 14.58698 21.19678 298.9147 622.9417 1353.5 2368 1845
#> dimension(s):
#> from to offset delta refsys point values x/y
#> x 747 798 -180 0.1667 WGS 84 FALSE NULL [x]
#> y 670 706 90 -0.1667 WGS 84 FALSE NULL [y]
#> band 1 3 NA NA NA NA bio1 , bio4 , bio12
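The delta of 0.1667 in the dimensions table above is the 10 arc-minute resolution expressed in degrees. As a rough worked conversion (the 111.32 km per degree figure is an equatorial approximation we introduce here, not a value taken from the package):

```r
# Convert the WorldClim resolution from arc-minutes to degrees and approximate km.
res_arcmin <- 10
res_deg <- res_arcmin / 60                 # 60 arc-minutes in one degree
km_per_degree_equator <- 111.32            # approximation, valid near the equator
res_km_equator <- res_deg * km_per_degree_equator

round(res_deg, 4)        # matches the |delta| printed for x and y
round(res_km_equator, 1) # approximate cell side in km at the equator
```

This gives a sense of scale when choosing a grid cell size that must be larger than the coarsest raster.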
Defining the study area
An important step in building Species Distribution Models is the
definition of the accessible area (the M in the BAM diagram). In
Geographical Information Systems terms, this area can be, for example,
the delimitation of a habitat (polygon) or a river basin network
(lines). Another broadly used approach is the use of buffers around
presences, where the buffer size reflects the dispersal capability of
the species. For educational purposes, we will use a simple polygon of
the Paraná state boundaries, which is available in caretSDM as the
parana object (see ?parana for more information on the data).
parana
#> Simple feature collection with 1 feature and 4 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -54.61834 ymin: -26.71679 xmax: -48.02308 ymax: -22.51621
#> Geodetic CRS: WGS 84
#> GID0 CODIGOIB1 NOMEUF2 SIGLAUF3 geom
#> 1 19 41 PARANA PR MULTIPOLYGON (((-52.06416 -...
parana |> select_predictors(NOMEUF2) |> plot()
The sdm_area function is responsible for creating the grid on which
models are built, a key aspect of the caretSDM workflow. Once a grid is
built, modelers can pass multiple rasters with different resolutions,
CRSs and extents; the package rescales, transforms and crops every
raster to match the grid. The grid returned by sdm_area is an object of
class sdm_area, which also stores the environmental/climatic data (i.e.
“predictor variables”, “covariates”, “explanatory variables”, “features”
or “control variables”). With this class we perform the analyses that
involve only the predictors. The grid is built mostly from the first
three arguments: (1) a shape of class sf, although rasters of class
stars, RasterStack or SpatRaster are also allowed; (2) the cell size of
the grid; and (3) the Coordinate Reference System (CRS). Note that the
cell size is expressed in the units of the chosen CRS, so it is metric
only when the CRS is metric. It is important to set a cell size larger
than the resolution of the coarsest raster that will be used; otherwise
the rescaling process may return empty cells. The rescaling can be
performed using GDAL (quicker but less precise) or the stars package
(slower but more precise). The first assigns values to cells by
calculating the mean (for continuous variables) or the median (for
categorical variables) of the values falling within the cell. The
approach using stars does the same, but weights each value by the area
it covers within the cell. For the meaning of the other arguments, see
?sdm_area.
sa <- sdm_area(parana,
               cell_size = 25000,
               crs = 6933,
               variables_selected = NULL,
               gdal = TRUE,
               crop_by = NULL,
               lines_as_sdm_area = FALSE)
#> ! Making grid over study area is an expensive task. Please, be patient!
#> ℹ Using GDAL to make the grid and resample the variables.
sa
#> caretSDM
#> ...........................
#> Class : sdm_area
#> Extent : -5276744 -3295037 -4626744 -2795037 (xmin, xmax, ymin, ymax)
#> CRS : WGS 84 / NSIDC EASE-
#> Resolution : (25000, 25000) (x, y)
#> Number of Predictors : 4
#> Predictors Names : GID0, CODIGOIB1, NOMEUF2, SIGLAUF3
Note that the function returned four predictor variables (Predictors
Names above). These “predictors” are actually columns of the parana
shape’s data table. One can filter these variables using the
select_predictors function, but we will not do that here, since the
package will automatically drop them later. You can explore the grid
generated and stored in the sdm_area object using the functions
mapview_grid() or plot_grid().
plot_grid(sa)
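To make the difference between the two rescaling options described for sdm_area concrete, here is a small base R sketch. It is not the package's internal code; the raster values and area fractions are invented, and it only contrasts a plain mean with an area-weighted mean for a single grid cell.

```r
# Hypothetical raster values overlapping one grid cell, and the fraction of the
# grid cell's area that each raster value covers (made-up numbers).
values <- c(10, 20, 40)
area_fraction <- c(0.6, 0.3, 0.1)

# Plain mean: every overlapping value counts equally.
simple_mean <- mean(values)

# Area-weighted mean: values covering more of the cell count more.
area_weighted_mean <- weighted.mean(values, w = area_fraction)

c(simple = simple_mean, weighted = area_weighted_mean)
```

When the overlapping values cover very unequal shares of the cell, the two results can differ noticeably, which is the precision trade-off mentioned above.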
Now that we have a study area, we can assign predictor variables to it.
To do that, we use the add_predictors function, which usually needs only
its first two arguments: the sdm_area built in the previous step and the
RasterStack, SpatRaster or stars object with the predictor data. Note
that add_predictors also has a gdal argument, which works like the one
in the sdm_area function.
sa <- add_predictors(sa,
                     bioc,
                     variables_selected = NULL,
                     gdal = TRUE)
#> ! Making grid over the study area is an expensive task. Please, be patient!
#> ℹ Using GDAL to make the grid and resample the variables.
sa
#> caretSDM
#> ...........................
#> Class : sdm_area
#> Extent : -5276744 -3295037 -4626744 -2795037 (xmin, xmax, ymin, ymax)
#> CRS : WGS 84 / NSIDC EASE-
#> Resolution : (25000, 25000) (x, y)
#> Number of Predictors : 7
#> Predictors Names : GID0, CODIGOIB1, NOMEUF2, SIGLAUF3, bio1, bio4, bio12
Predictor variables are used to train the models. After training, we
need to project the models onto scenarios. Currently, we don’t have any
scenario in our sdm_area object. We can set the predictor data as the
current scenario by calling add_scenarios with no other argument. This
works because the argument pred_as_scen is set to TRUE by default.
add_scenarios(sa)
If we aim to project species distributions onto other scenarios, we can download the data and add them in the same way we did for the current data.
WorldClim_data(path = NULL,
               period = "future",
               variable = "bioc",
               year = "2090",
               gcm = c("ca", "mi"),
               ssp = c("245", "585"),
               resolution = 10)
scen <- read_stars(list.files("input_data/WorldClim_data_future/", full.names = TRUE),
                   along = "band", normalize_path = FALSE)
As with the current bioclimatic data, the package already includes the
scen object, which is the same output as above but masked to match our
study area and with fewer variables (see ?scen for more information on
the data).
scen
#> stars object with 3 dimensions and 4 attributes
#> attribute(s):
#> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#> ca_ssp245_2090 18.4 26.100 296.50 570.0926 1188.975 2049.2 1908
#> ca_ssp585_2090 22.2 31.275 293.25 516.0384 1033.150 1862.2 1908
#> mi_ssp245_2090 16.3 23.000 314.65 660.9749 1426.800 2414.8 1908
#> mi_ssp585_2090 17.9 24.500 323.45 702.6153 1534.100 2585.0 1908
#> dimension(s):
#> from to offset delta refsys point values x/y
#> x 747 798 -180 0.1667 WGS 84 FALSE NULL [x]
#> y 670 706 90 -0.1667 WGS 84 FALSE NULL [y]
#> band 1 3 NA NA NA NA bio1 , bio4 , bio12
Now we can add the current and future scenarios at once to our sdm_area
object. For the meaning of the other parameters, see the help file at
?add_scenarios. When adding scenarios, the function tests whether all
variables are available in all scenarios; if not, it filters the
predictors (see the warning below). Note that there is also a stationary
argument, where the modeler can list variables that do not change
between scenarios, e.g., soil variables.
sa <- add_scenarios(sa,
                    scen = scen,
                    scenarios_names = NULL,
                    pred_as_scen = TRUE,
                    variables_selected = NULL,
                    stationary = NULL)
#> Warning: Some variables in `variables_selected` are not present in `scen`.
#> ℹ Using only variables present in `scen`: bio1, bio4, and bio12
sa
#> caretSDM
#> ...........................
#> Class : sdm_area
#> Extent : -5276744 -3295037 -4626744 -2795037 (xmin, xmax, ymin, ymax)
#> CRS : WGS 84 / NSIDC EASE-
#> Resolution : (25000, 25000) (x, y)
#> Number of Predictors : 3
#> Predictors Names : bio1, bio4, bio12
#> Number of Scenarios : 5
#> Scenarios Names : ca_ssp245_2090, ca_ssp585_2090, mi_ssp245_2090, mi_ssp585_2090, current
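The filtering reported in the warning can be pictured as a set intersection between the predictor names and the variables present in every scenario. The sketch below reproduces that logic in base R with the names printed above; it illustrates the idea and is not the code add_scenarios actually runs.

```r
# Predictor names held by the sdm_area before scenarios were added.
predictor_names <- c("GID0", "CODIGOIB1", "NOMEUF2", "SIGLAUF3",
                     "bio1", "bio4", "bio12")

# Variables available in each future scenario (the bands of scen).
scenario_vars <- list(
  ca_ssp245_2090 = c("bio1", "bio4", "bio12"),
  ca_ssp585_2090 = c("bio1", "bio4", "bio12"),
  mi_ssp245_2090 = c("bio1", "bio4", "bio12"),
  mi_ssp585_2090 = c("bio1", "bio4", "bio12")
)

# Keep only the variables shared by the predictors and all scenarios.
common_vars <- Reduce(intersect, c(list(predictor_names), scenario_vars))
common_vars
```

The four columns inherited from the parana shape have no counterpart in the scenarios, which is consistent with the predictor count dropping from 7 to 3.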
Modelers commonly need to subset the variables that will inform the
models. This can be due to statistical artifacts that are common in the
quarter-based bioclimatic variables, or to a causal subset, aiming for
those variables with a causal effect on the species distribution. The
user may also want to change scenario names or predictor names, or to
retrieve the predictor data. The package provides a range of functions
for these tasks, most of them documented in the help files
?add_predictors and ?add_scenarios, but also in ?select_predictors.
Defining the occurrences set in the study area
As caretSDM has a strong GIS background, it is necessary to state
explicitly which CRS your data are in. This ensures that every GIS
transformation is correct. The occurrences_sdm function creates an
occurrences object (i.e. “response variable”, “target” or “label”) that
is used in occurrence transformations and functions, such as
pseudoabsence generation. For reference, GBIF data use crs = 4326, but
the records stored in the occ object have been transformed to EPSG:6933
(see ?occ for more information on the data).
oc <- occurrences_sdm(occ, crs = 6933)
oc
#> caretSDM
#> .......................
#> Class : occurrences
#> Species Names : Araucaria angustifolia
#> Number of presences : 420
#> =================================
#> Data:
#> Simple feature collection with 6 features and 1 field
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -5071809 ymin: -3283348 xmax: -4748996 ymax: -3092641
#> Projected CRS: WGS 84 / NSIDC EASE-Grid 2.0 Global
#> species geometry
#> 1 Araucaria angustifolia POINT (-5071809 -3259770)
#> 2 Araucaria angustifolia POINT (-4983891 -3283348)
#> 3 Araucaria angustifolia POINT (-4748996 -3142075)
#> 4 Araucaria angustifolia POINT (-4883188 -3092641)
#> 5 Araucaria angustifolia POINT (-4766386 -3227445)
#> 6 Araucaria angustifolia POINT (-4755861 -3139607)
plot_occurrences(oc)
The next step assigns the occurrences to the study area, excluding records that fall outside it or that have NAs as predictor values.
oc <- join_area(oc, sa)
#> Warning: Some records from `occ` do not fall in `pred`.
#> ℹ 2 elements from `occ` were excluded.
#> ℹ If this seems too much, check how `occ` and `pred` intersect.
The input_sdm class
In caretSDM we use multiple classes to perform our analyses. Every time
we perform a new analysis, the objects keep a record of what we did.
Ideally, the workflow uses a single object throughout. The input_sdm
class is the key class in the workflow, around which every function
revolves. It puts occurrences, predictors, scenarios, models and
predictions together to perform analyses that are only possible when two
or more of these classes are available. First, we create the object by
passing the occurrences and the sdm_area.
i <- input_sdm(oc, sa)
i
#> caretSDM
#> ...............................
#> Class : input_sdm
#> -------- Occurrences --------
#> Species Names : Araucaria angustifolia
#> Number of presences : 418
#> -------- Predictors ---------
#> Number of Predictors : 3
#> Predictors Names : bio1, bio4, bio12
#> --------- Scenarios ---------
#> Number of Scenarios : 5
#> Scenarios Names : ca_ssp245_2090 ca_ssp585_2090 mi_ssp245_2090 mi_ssp585_2090 current
Data cleaning routine
As the first step in our workflow with the input_sdm object, we will
clean our occurrence data by applying a group of functions from the
CoordinateCleaner package. This function also provides a way to check
for environmental duplicates, by including a predictors object, and it
checks for records in the sea if the species is terrestrial; this check
can be switched off if the studied species is not terrestrial. The way
caretSDM works, we can always overwrite the main input_sdm object to
update it. The function returns a new object with all the previous
information plus the new information obtained from data_clean. Note
that the Duplicated Cell method appears at the end of the Data Cleaning
information. This method is only possible when we have both the
occurrence and the predictor data.
i <- data_clean(i,
                capitals = TRUE,
                centroids = TRUE,
                duplicated = TRUE,
                identical = TRUE,
                institutions = TRUE,
                invalid = TRUE,
                terrestrial = TRUE)
#> Cell_ids identified, removing duplicated cell_id.
#> Testing country capitals
#> Removed 0 records.
#> Testing country centroids
#> Removed 0 records.
#> Testing duplicates
#> Removed 0 records.
#> Testing equal lat/lon
#> Removed 0 records.
#> Testing biodiversity institutions
#> Removed 1 records.
#> Testing coordinate validity
#> Removed 0 records.
#> Testing sea coordinates
#> Reading layer `ne_110m_land' from data source `/tmp/Rtmp4lnI3I/ne_110m_land.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 127 features and 3 fields
#> Geometry type: POLYGON
#> Dimension: XY
#> Bounding box: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
#> Geodetic CRS: WGS 84
#> Removed 0 records.
#> Predictors identified, procceding with grid filter (removing NA and duplicated data).
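The grid filter mentioned in the last message keeps at most one record per grid cell. A minimal base R sketch of that idea follows; the coordinates and the grid origin at (0, 0) are hypothetical, and the package's own cell_id logic may differ.

```r
# Made-up projected coordinates (metres) for five records.
x <- c(1000, 12000, 26000, 30000, 51000)
y <- c(500, 20000, 3000, 24000, 10000)
cell_size <- 25000

# Column and row index of the grid cell each record falls in,
# assuming a grid whose origin is at (0, 0).
cell_col <- floor(x / cell_size)
cell_row <- floor(y / cell_size)
cell_id <- paste(cell_col, cell_row, sep = "_")

# Keep only the first record found in each cell.
keep <- !duplicated(cell_id)
data.frame(x, y, cell_id, keep)
```

With 25 km cells, many nearby records collapse into a single cell, which is why the number of presences can fall sharply after this step.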
Removing multicollinearity from predictors’ data
There are two main methods in the SDM literature to deal with
multicollinearity in predictor data. One is the use of VIFs (variance
inflation factors), which in caretSDM is performed using the
vif_predictors function. There, users can perform variable selection
through the usdm package. The function is a wrapper for usdm::vifcor,
where variables are kept given a maximum collinearity threshold. The
default is 0.5. Here is an example code for demonstration:
i <- vif_predictors(i,
                    th = 0.5,
                    maxobservations = 5000,
                    variables_selected = NULL)
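For readers unfamiliar with the quantity itself, the variance inflation factor of a variable is 1 / (1 - R²), where R² comes from regressing that variable on all the others. The base R sketch below computes it on simulated data (the variable names only mimic the bioclimatic ones); it illustrates the VIF, not the selection procedure inside usdm::vifcor.

```r
set.seed(42)
n <- 200
bio1 <- rnorm(n)
bio12 <- rnorm(n)
bio4 <- 0.9 * bio1 + rnorm(n, sd = 0.2)  # simulated to be collinear with bio1
preds <- data.frame(bio1, bio4, bio12)

# VIF of each variable: regress it on the remaining ones and use 1 / (1 - R^2).
vif <- sapply(names(preds), function(v) {
  f <- reformulate(setdiff(names(preds), v), response = v)
  r2 <- summary(lm(f, data = preds))$r.squared
  1 / (1 - r2)
})
round(vif, 2)
```

In this simulation bio1 and bio4 receive high VIFs while bio12 stays close to 1, so a selection routine would be expected to drop one member of the collinear pair.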
For this study, however, we will use the PCA approach to
multicollinearity, where we synthesize environmental variability into
PCA axes and project these axes onto geographic space to use them as
predictors. Besides the input_sdm object, the call below sets
cumulative_proportion to 1, so all three axes are retained (see the
printed output). The PCA axes are included in the predictors together
with the raw variables.
i <- pca_predictors(i, cumulative_proportion = 1)
i
#> caretSDM
#> ...............................
#> Class : input_sdm
#> -------- Occurrences --------
#> Species Names : Araucaria angustifolia
#> Number of presences : 84
#> Data Cleaning : NAs, Capitals, Centroids, Geographically Duplicated, Identical Lat/Long, Institutions, Invalid, Non-terrestrial, Duplicated Cell (grid)
#> -------- Predictors ---------
#> Number of Predictors : 6
#> Predictors Names : bio1, bio4, bio12, PC1, PC2, PC3
#> PCA-transformed variables : DONE
#> Cummulative proportion ( 1 ) : PC1, PC2, PC3
#> --------- Scenarios ---------
#> Number of Scenarios : 5
#> Scenarios Names : ca_ssp245_2090 ca_ssp585_2090 mi_ssp245_2090 mi_ssp585_2090 current
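The idea of fitting a PCA on the current predictors and applying the same rotation to each scenario can be sketched with base R's prcomp. The data below are simulated, and whether pca_predictors centres or scales the variables in the same way is an assumption we do not verify here.

```r
set.seed(1)
# Simulated current climate for 50 grid cells (values are made up).
current <- data.frame(
  bio1 = rnorm(50, mean = 18, sd = 2),
  bio4 = rnorm(50, mean = 300, sd = 40),
  bio12 = rnorm(50, mean = 1500, sd = 200)
)
# Hypothetical future scenario: same cells, 3 units warmer.
future <- transform(current, bio1 = bio1 + 3)

# Fit the PCA on current data only.
pca <- prcomp(current)

# Project both datasets with the same rotation and centring.
current_pcs <- predict(pca, newdata = current)
future_pcs <- predict(pca, newdata = future)
head(future_pcs, 3)
```

Using one fitted rotation for every scenario keeps the axes comparable, so a model trained on PC1 to PC3 of the current data can be projected onto the future axes.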
To better visualize the PCA parameters, users can run the pca_summary
and get_pca_model functions, which are largely self-explanatory.
pca_summary(i)
#> Importance of components:
#> PC1 PC2 PC3
#> Standard deviation 209.3679 17.15444 1.59698
#> Proportion of Variance 0.9933 0.00667 0.00006
#> Cumulative Proportion 0.9933 0.99994 1.00000
get_pca_model(i)
#> Standard deviations (1, .., p=3):
#> [1] 209.367889 17.154439 1.596985
#>
#> Rotation (n x k) = (3 x 3):
#> PC1 PC2 PC3
#> bio1 0.003623821 -0.006130769 -0.999974640
#> bio4 -0.996624181 -0.082040097 -0.003108698
#> bio12 -0.082018958 0.996610173 -0.006407371
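The two printouts are linked: the proportion of variance of each axis is its squared standard deviation divided by the sum of squared standard deviations. A short check using the standard deviations printed by get_pca_model:

```r
# Standard deviations copied from the get_pca_model output above.
sdev <- c(PC1 = 209.367889, PC2 = 17.154439, PC3 = 1.596985)

# Variance explained by each axis and its running total.
prop_var <- sdev^2 / sum(sdev^2)
cum_prop <- cumsum(prop_var)

round(rbind(prop_var, cum_prop), 5)
```

PC1 alone carries about 99.3% of the variance, and its loadings are dominated by bio4, the variable with the largest numeric range among the three.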
Obtaining pseudoabsence data
Pseudoabsence data are stored in the occurrences object (inside the
input_sdm object). To generate them, you must provide some parameters.
Probably the most important argument in this function is method.
Currently, two methods are implemented: “random”, which takes random
grid cells as pseudoabsences; and “bioclim”, which creates a Surface
Range Envelope (SRE) using presence records, binarizes the projection of
the SRE using the th threshold and then retrieves pseudoabsences outside
the envelope. The number of pseudoabsences created can be changed using
the n_pa parameter. When set to NULL, n_pa equals the number of
occurrences (to avoid imbalance issues). The number of pseudoabsence
sets is adjusted with the n_set parameter. The argument
variables_selected tells the function which variables to use to build
your pseudoabsences/models. This can be either a vector of variable
names or a previously performed selection method.
i <- pseudoabsences(i,
                    method = "bioclim",
                    n_set = 10,
                    n_pa = NULL,
                    variables_selected = "pca",
                    th = 0)
i
#> caretSDM
#> ...............................
#> Class : input_sdm
#> -------- Occurrences --------
#> Species Names : Araucaria angustifolia
#> Number of presences : 84
#> Pseudoabsence methods :
#> Method to obtain PAs : bioclim
#> Number of PA sets : 10
#> Number of PAs in each set : 84
#> Data Cleaning : NAs, Capitals, Centroids, Geographically Duplicated, Identical Lat/Long, Institutions, Invalid, Non-terrestrial, Duplicated Cell (grid)
#> -------- Predictors ---------
#> Number of Predictors : 6
#> Predictors Names : bio1, bio4, bio12, PC1, PC2, PC3
#> PCA-transformed variables : DONE
#> Cummulative proportion ( 1 ) : PC1, PC2, PC3
#> --------- Scenarios ---------
#> Number of Scenarios : 5
#> Scenarios Names : ca_ssp245_2090 ca_ssp585_2090 mi_ssp245_2090 mi_ssp585_2090 current
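As a simplified reading of the “bioclim” method, the envelope can be taken as the range of each variable over the presence cells, with pseudoabsences drawn only from cells outside it. The sketch below uses simulated PCA scores and is our own illustration; how the package scores the envelope and applies th internally is not reproduced here.

```r
set.seed(123)
# Made-up PCA scores for 500 grid cells; up to 20 cells near the origin act as presences.
cells <- data.frame(PC1 = runif(500, -3, 3), PC2 = runif(500, -3, 3))
near_origin <- which(abs(cells$PC1) < 1 & abs(cells$PC2) < 1)
presence_idx <- head(near_origin, 20)
presences <- cells[presence_idx, ]

# Envelope: a cell is inside when every variable lies within the presence range.
inside <- rep(TRUE, nrow(cells))
for (v in names(cells)) {
  lim <- range(presences[[v]])
  inside <- inside & cells[[v]] >= lim[1] & cells[[v]] <= lim[2]
}

# Pseudoabsences: as many as presences, sampled only from cells outside the envelope.
outside_idx <- which(!inside)
pa_idx <- sample(outside_idx, size = length(presence_idx))
pseudoabsences <- cells[pa_idx, ]
nrow(pseudoabsences)
```

Repeating the sampling step with different seeds would give several pseudoabsence sets, which is the role n_set plays in the call above.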
Processing
Modeling species relationship with variables
With the occurrence and predictor data put together, we can move on to
modeling. As the name suggests, caretSDM uses the caret package to drive
its modeling procedure. For those who are not familiar with it, caret
offers a convenient, unified way to perform Machine Learning analyses in
R. It works as a modeling wrapper around multiple packages and provides
a lot of automation for algorithm fine-tuning, data splitting,
pre-processing methods and predictions. These automated functions from
caret can be adjusted using the ctrl argument of the train_sdm function.
See ?caret::trainControl for all available options.
We show here how to use a repeated cross-validation method, which is
defined through caret::trainControl.
Note that, when you use an algorithm for the first time, caret will ask you to install the packages needed to run it.
ctrl_sdm <- caret::trainControl(method = "repeatedcv",
                                number = 4,
                                repeats = 1,
                                classProbs = TRUE,
                                returnResamp = "all",
                                summaryFunction = summary_sdm,
                                savePredictions = "all")
i <- train_sdm(i,
               algo = c("naive_bayes", "kknn"),
               variables_selected = "pca",
               ctrl = ctrl_sdm) |> suppressWarnings()
#> Loading required package: ggplot2
#> Loading required package: lattice
#>
#> Attaching package: 'caret'
#> The following object is masked from 'package:caretSDM':
#>
#> predictors
i
#> caretSDM
#> ...............................
#> Class : input_sdm
#> -------- Occurrences --------
#> Species Names : Araucaria angustifolia
#> Number of presences : 84
#> Pseudoabsence methods :
#> Method to obtain PAs : bioclim
#> Number of PA sets : 10
#> Number of PAs in each set : 84
#> Data Cleaning : NAs, Capitals, Centroids, Geographically Duplicated, Identical Lat/Long, Institutions, Invalid, Non-terrestrial, Duplicated Cell (grid)
#> -------- Predictors ---------
#> Number of Predictors : 6
#> Predictors Names : bio1, bio4, bio12, PC1, PC2, PC3
#> PCA-transformed variables : DONE
#> Cummulative proportion ( 1 ) : PC1, PC2, PC3
#> --------- Scenarios ---------
#> Number of Scenarios : 5
#> Scenarios Names : ca_ssp245_2090 ca_ssp585_2090 mi_ssp245_2090 mi_ssp585_2090 current
#> ----------- Models ----------
#> Algorithms Names : naive_bayes kknn
#> Variables Names : PC1 PC2 PC3
#> Model Validation :
#> Method : repeatedcv
#> Number : 4
#> Metrics :
#> $`Araucaria angustifolia`
#> algo ROC TSS Sensitivity Specificity
#> 1 kknn 0.9716306 0.9012374 0.95815 0.942875
#> 2 naive_bayes 0.9855231 0.9069517 0.98800 0.921200
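The TSS column is the true skill statistic, conventionally defined as sensitivity + specificity - 1. How summary_sdm derives it internally is not shown in this text, so the helper below is a hypothetical illustration of the textbook definition on invented 0/1 vectors.

```r
# Hypothetical helper: TSS from observed and predicted presence (1) / absence (0).
tss <- function(observed, predicted) {
  tp <- sum(observed == 1 & predicted == 1)
  fn <- sum(observed == 1 & predicted == 0)
  tn <- sum(observed == 0 & predicted == 0)
  fp <- sum(observed == 0 & predicted == 1)
  sensitivity <- tp / (tp + fn)  # share of presences correctly predicted
  specificity <- tn / (tn + fp)  # share of absences correctly predicted
  sensitivity + specificity - 1
}

observed  <- c(1, 1, 1, 1, 0, 0, 0, 0)
predicted <- c(1, 1, 1, 0, 0, 0, 0, 1)
tss(observed, predicted)
```

TSS ranges from -1 to 1: a perfect classifier scores 1, and one that predicts presence everywhere scores 0.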
Post-Processing
Predicting species distribution in given scenarios
Now that we have our models, we can make predictions for new scenarios.
The function predict_sdm also builds ensembles (ensembles = TRUE is the
default). The function only predicts with models that pass a given
validation threshold, which is set using the metric and th arguments. In
the following example, metric is set to “ROC” and th to 0.9. This means
that only models with ROC > 0.9 are used in predictions and ensembles.
i <- predict_sdm(i,
                 metric = "ROC",
                 th = 0.9,
                 tp = "prob",
                 ensembles = TRUE)
#> [1] "Projecting: 1/5"
#> [1] "Projecting: 2/5"
#> [1] "Projecting: 3/5"
#> [1] "Projecting: 4/5"
#> [1] "Projecting: 5/5"
#> [1] "Ensembling..."
#> [1] "ca_ssp245_2090"
#> [1] "Araucaria angustifolia"
#> [1] "ca_ssp585_2090"
#> [1] "Araucaria angustifolia"
#> [1] "mi_ssp245_2090"
#> [1] "Araucaria angustifolia"
#> [1] "mi_ssp585_2090"
#> [1] "Araucaria angustifolia"
#> [1] "current"
#> [1] "Araucaria angustifolia"
i
#> caretSDM
#> ...............................
#> Class : input_sdm
#> -------- Occurrences --------
#> Species Names : Araucaria angustifolia
#> Number of presences : 84
#> Pseudoabsence methods :
#> Method to obtain PAs : bioclim
#> Number of PA sets : 10
#> Number of PAs in each set : 84
#> Data Cleaning : NAs, Capitals, Centroids, Geographically Duplicated, Identical Lat/Long, Institutions, Invalid, Non-terrestrial, Duplicated Cell (grid)
#> -------- Predictors ---------
#> Number of Predictors : 6
#> Predictors Names : bio1, bio4, bio12, PC1, PC2, PC3
#> PCA-transformed variables : DONE
#> Cummulative proportion ( 1 ) : PC1, PC2, PC3
#> --------- Scenarios ---------
#> Number of Scenarios : 5
#> Scenarios Names : ca_ssp245_2090 ca_ssp585_2090 mi_ssp245_2090 mi_ssp585_2090 current
#> ----------- Models ----------
#> Algorithms Names : naive_bayes kknn
#> Variables Names : PC1 PC2 PC3
#> Model Validation :
#> Method : repeatedcv
#> Number : 4
#> Metrics :
#> $`Araucaria angustifolia`
#> algo ROC TSS Sensitivity Specificity
#> 1 kknn 0.9716306 0.9012374 0.95815 0.942875
#> 2 naive_bayes 0.9855231 0.9069517 0.98800 0.921200
#>
#> -------- Predictions --------
#> Ensembles :
#> Scenarios : ca_ssp245_2090 ca_ssp585_2090 mi_ssp245_2090 mi_ssp585_2090 current
#> Methods : mean_occ_prob wmean_AUC committee_avg
#> Thresholds :
#> Method : threshold
#> Criteria : 0.9
In the print above, the “Methods” entry under the “Predictions” section
shows which ensemble types were built: mean occurrence probability
(mean_occ_prob; a simple mean across models), mean occurrence
probability weighted by AUC/ROC (wmean_AUC; AUC/ROC values are used as
weights), and the majority rule, or committee average (committee_avg;
the sum of binaries).
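The model filter and the three ensemble types can be sketched in base R. All numbers below are invented, the 0.5 cut-off used to binarize probabilities is our assumption, and the committee average is expressed here as the share of models voting presence rather than a raw sum.

```r
# Hypothetical validation ROC per model and predicted probabilities
# (rows = grid cells, columns = models).
roc <- c(m1 = 0.97, m2 = 0.99, m3 = 0.85)
prob <- cbind(m1 = c(0.9, 0.2, 0.6),
              m2 = c(0.8, 0.1, 0.7),
              m3 = c(0.1, 0.9, 0.5))

# Keep only models whose ROC exceeds the threshold.
th <- 0.9
keep <- names(roc)[roc > th]
prob_kept <- prob[, keep, drop = FALSE]

# Simple mean of the predicted probabilities.
mean_occ_prob <- rowMeans(prob_kept)

# Mean weighted by each model's ROC.
wmean_AUC <- as.vector(prob_kept %*% roc[keep]) / sum(roc[keep])

# Committee average: share of kept models predicting presence (assumed 0.5 cut-off).
committee_avg <- rowMeans(prob_kept >= 0.5)

data.frame(mean_occ_prob, wmean_AUC, committee_avg)
```

Model m3 is excluded before any averaging, so a poorly validated model cannot pull the ensemble toward its predictions.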
Besides the AUC/ROC metric, users can inspect every available metric for each model with the following code before committing to “ROC”:
get_validation_metrics(i)
#> $`Araucaria angustifolia`
#> algo ROC TSS Sensitivity Specificity Pos Pred Value
#> m1.2 kknn 0.9815476 0.9511905 0.97600 0.97500 0.98800
#> m2.2 kknn 0.9530574 0.8898810 0.95200 0.93750 0.96525
#> m3.2 kknn 0.9870130 0.9188312 0.96425 0.95450 0.97675
#> m4.2 kknn 0.9642857 0.8652597 0.95225 0.91275 0.95475
#> m5.2 kknn 0.9707792 0.9188312 0.96425 0.95450 0.97550
#> m6.2 kknn 0.9909361 0.8979978 0.96400 0.93375 0.96475
#> m7.2 kknn 0.9696970 0.8796537 0.95225 0.92725 0.96575
#> m8.2 kknn 0.9623016 0.8996032 0.95225 0.94725 0.97675
#> m9.2 kknn 0.9626623 0.8841991 0.95225 0.93175 0.96750
#> m10.2 kknn 0.9740260 0.9069264 0.95200 0.95450 0.97600
#> m1.1 naive_bayes 0.9988095 0.9392857 0.98800 0.97500 0.98875
#> m2.1 naive_bayes 0.9871934 0.8990801 0.98800 0.91100 0.95575
#> m3.1 naive_bayes 0.9821429 0.9199134 0.98800 0.93175 0.96700
#> m4.1 naive_bayes 0.9731241 0.9218074 0.98800 0.93375 0.96700
#> m5.1 naive_bayes 0.9870130 0.9199134 0.98800 0.93175 0.96625
#> m6.1 naive_bayes 0.9848485 0.8971861 0.98800 0.90900 0.95500
#> m7.1 naive_bayes 0.9955628 0.8926407 0.98800 0.90450 0.95550
#> m8.1 naive_bayes 0.9714286 0.8853175 0.98800 0.89725 0.95500
#> m9.1 naive_bayes 0.9913420 0.8971861 0.98800 0.90900 0.95500
#> m10.1 naive_bayes 0.9837662 0.8971861 0.98800 0.90900 0.95475
#> Neg Pred Value Precision Recall F1 Prevalence Detection Rate
#> m1.2 0.95225 0.98800 0.97600 0.98200 0.68275 0.66675
#> m2.2 0.91500 0.96525 0.95200 0.95850 0.64600 0.61550
#> m3.2 0.93875 0.97675 0.96425 0.96975 0.66125 0.63800
#> m4.2 0.92725 0.95475 0.95225 0.95175 0.64600 0.61575
#> m5.2 0.94225 0.97550 0.96425 0.96925 0.65600 0.63250
#> m6.2 0.93575 0.96475 0.96400 0.96425 0.65100 0.62800
#> m7.2 0.91675 0.96575 0.95225 0.95850 0.66125 0.63025
#> m8.2 0.91450 0.97675 0.95225 0.96300 0.68275 0.65050
#> m9.2 0.91875 0.96750 0.95225 0.95800 0.65600 0.62475
#> m10.2 0.91300 0.97600 0.95200 0.96400 0.65600 0.62500
#> m1.1 0.97500 0.98875 0.98800 0.98225 0.68275 0.67475
#> m2.1 0.98075 0.95575 0.98800 0.97125 0.64600 0.63850
#> m3.1 0.97725 0.96700 0.98800 0.97700 0.66125 0.65350
#> m4.1 0.97925 0.96700 0.98800 0.97700 0.64600 0.63825
#> m5.1 0.97925 0.96625 0.98800 0.97675 0.65600 0.64825
#> m6.1 0.97925 0.95500 0.98800 0.97100 0.65100 0.64350
#> m7.1 0.97500 0.95550 0.98800 0.97125 0.66125 0.65350
#> m8.1 0.97500 0.95500 0.98800 0.97100 0.68275 0.67475
#> m9.1 0.97725 0.95500 0.98800 0.97100 0.65600 0.64825
#> m10.1 0.97500 0.95475 0.98800 0.97100 0.65600 0.64825
#> Detection Prevalence Balanced Accuracy Accuracy Kappa AccuracyLower
#> m1.2 0.67475 0.97550 0.97575 0.94500 0.84775
#> m2.2 0.63825 0.94500 0.94650 0.88325 0.80775
#> m3.2 0.65375 0.95950 0.96125 0.91425 0.82750
#> m4.2 0.64700 0.93300 0.93900 0.86800 0.79750
#> m5.2 0.65625 0.95950 0.96100 0.91575 0.83250
#> m6.2 0.65900 0.94925 0.95375 0.89775 0.81650
#> m7.2 0.65425 0.93975 0.94475 0.87625 0.80300
#> m8.2 0.67525 0.95000 0.95100 0.89000 0.81050
#> m9.2 0.64825 0.94225 0.94525 0.87875 0.81050
#> m10.2 0.64050 0.95350 0.95350 0.89650 0.81500
#> m1.1 0.69100 0.96950 0.97575 0.94400 0.84775
#> m2.1 0.68475 0.94950 0.96175 0.91325 0.82850
#> m3.1 0.70050 0.96000 0.96875 0.92800 0.83850
#> m4.1 0.68500 0.96075 0.96925 0.93000 0.84150
#> m5.1 0.68800 0.96025 0.96900 0.92975 0.83800
#> m6.1 0.70600 0.94875 0.96125 0.91125 0.82900
#> m7.1 0.70900 0.94625 0.96025 0.90800 0.82950
#> m8.1 0.73175 0.94250 0.95925 0.90350 0.82200
#> m9.1 0.70350 0.94875 0.96125 0.91125 0.82825
#> m10.1 0.68750 0.94875 0.96100 0.91150 0.82925
#> AccuracyUpper AccuracyNull AccuracyPValue McnemarPValue Positive Negative
#> m1.2 0.99775 0.68275 0.00025 1.0000000 21 9.75
#> m2.2 0.99300 0.64600 0.00000 1.0000000 21 11.50
#> m3.2 0.99575 0.66125 0.00000 0.8266667 21 10.75
#> m4.2 0.99000 0.64600 0.00000 0.8120000 21 11.50
#> m5.2 0.99100 0.65600 0.00125 0.8085000 21 11.00
#> m6.2 0.99575 0.65100 0.00000 1.0000000 21 11.25
#> m7.2 0.99250 0.66125 0.00050 0.8700000 21 10.75
#> m8.2 0.99275 0.68275 0.00125 0.8120000 21 9.75
#> m9.2 0.98625 0.65600 0.00150 0.4325000 21 11.00
#> m10.2 0.99550 0.65600 0.00000 1.0000000 21 11.00
#> m1.1 0.99775 0.68275 0.00025 1.0000000 21 9.75
#> m2.1 0.99725 0.64600 0.00050 0.8700000 21 11.50
#> m3.1 0.99750 0.66125 0.00025 0.8266667 21 10.75
#> m4.1 0.99750 0.64600 0.00025 0.8700000 21 11.50
#> m5.1 0.99900 0.65600 0.00075 1.0000000 21 11.00
#> m6.1 0.99575 0.65100 0.00125 0.8266667 21 11.25
#> m7.1 0.99300 0.66125 0.00100 0.8120000 21 10.75
#> m8.1 0.99575 0.68275 0.00350 0.8266667 21 9.75
#> m9.1 0.99575 0.65600 0.00150 0.8266667 21 11.00
#> m10.1 0.99450 0.65600 0.00150 1.0000000 21 11.00
#> True Positive False Positive True Negative False Negative ROCSD
#> m1.2 20.50 0.50 9.50 0.25 0.03690476
#> m2.2 20.00 1.00 10.75 0.75 0.04854347
#> m3.2 20.25 0.75 10.25 0.50 0.02317804
#> m4.2 20.00 1.00 10.50 1.00 0.04123930
#> m5.2 20.25 0.75 10.50 0.75 0.05844156
#> m6.2 20.25 0.75 10.50 1.00 0.02121863
#> m7.2 20.00 1.00 10.00 0.75 0.04310929
#> m8.2 20.00 1.00 9.25 0.75 0.04365079
#> m9.2 20.00 1.00 10.25 0.75 0.04319524
#> m10.2 20.00 1.00 10.50 0.50 0.03061068
#> m1.1 20.75 0.75 9.50 0.50 0.03260978
#> m2.1 20.75 0.50 10.50 1.75 0.02995821
#> m3.1 20.75 0.25 10.00 1.50 0.03571429
#> m4.1 20.75 0.25 10.75 1.50 0.06400608
#> m5.1 20.75 0.75 10.25 1.75 0.06734350
#> m6.1 20.75 0.25 10.25 2.00 0.03683365
#> m7.1 20.75 0.25 9.75 1.75 0.05945451
#> m8.1 20.75 0.25 8.75 1.75 0.04757273
#> m9.1 20.75 0.50 10.00 2.00 0.04990887
#> m10.1 20.75 0.75 10.00 1.75 0.02835966
#> TSSSD SensitivitySD SpecificitySD Pos Pred ValueSD Neg Pred ValueSD
#> m1.2 0.06959281 0.02771281 0.05000000 0.02400000 0.055259238
#> m2.2 0.07978559 0.00000000 0.07990202 0.04379783 0.004000000
#> m3.2 0.05822739 0.04552197 0.05253887 0.02687471 0.075256783
#> m4.2 0.05841149 0.06741105 0.06831483 0.03715172 0.095062699
#> m5.2 0.12182898 0.07150000 0.08712587 0.04273172 0.115500000
#> m6.2 0.05607812 0.02400000 0.07440878 0.03561250 0.042999031
#> m7.2 0.08416843 0.03878466 0.09506270 0.04233497 0.068178076
#> m8.2 0.07138005 0.06741105 0.06107577 0.02687471 0.110213429
#> m9.2 0.15208612 0.06741105 0.13650000 0.06500000 0.102811721
#> m10.2 0.05248639 0.00000000 0.05253887 0.02771281 0.004618802
#> m1.1 0.07407785 0.04552197 0.05773503 0.02687471 0.081785797
#> m2.1 0.09094773 0.02771281 0.07955082 0.03988734 0.055259238
#> m3.1 0.11356935 0.02400000 0.11748049 0.05353504 0.045500000
#> m4.1 0.07705737 0.02400000 0.08651927 0.04178516 0.050000000
#> m5.1 0.06926407 0.02400000 0.04550000 0.02300000 0.050000000
#> m6.1 0.12042809 0.02400000 0.13062669 0.05353504 0.045500000
#> m7.1 0.12454900 0.02400000 0.11051848 0.05141012 0.055500000
#> m8.1 0.11440690 0.02400000 0.09298880 0.04064788 0.062500000
#> m9.1 0.09566349 0.02771281 0.07430119 0.03558089 0.061075773
#> m10.1 0.13482607 0.02400000 0.15541209 0.06790864 0.050000000
#> PrecisionSD RecallSD F1SD PrevalenceSD Detection RateSD
#> m1.2 0.02400000 0.02771281 0.02297825 0.01150000 0.02681262
#> m2.2 0.04379783 0.00000000 0.02211334 0.01154701 0.01096966
#> m3.2 0.02687471 0.04552197 0.02361320 0.01050000 0.03628590
#> m4.2 0.03715172 0.06741105 0.02929590 0.01154701 0.05272808
#> m5.2 0.04273172 0.07150000 0.04750000 0.00000000 0.04700000
#> m6.2 0.03561250 0.02400000 0.01415097 0.01000000 0.02070427
#> m7.2 0.04233497 0.03878466 0.02321637 0.01050000 0.03442262
#> m8.2 0.02687471 0.06741105 0.03309582 0.01150000 0.04657252
#> m9.2 0.06500000 0.06741105 0.04883305 0.00000000 0.04431234
#> m10.2 0.02771281 0.00000000 0.01385641 0.00000000 0.00000000
#> m1.1 0.02687471 0.04552197 0.02289651 0.01150000 0.03069745
#> m2.1 0.03988734 0.02771281 0.02844146 0.01154701 0.02362908
#> m3.1 0.05353504 0.02400000 0.02839014 0.01050000 0.02142429
#> m4.1 0.04178516 0.02400000 0.02238117 0.01154701 0.01291962
#> m5.1 0.02300000 0.02400000 0.02350000 0.00000000 0.01550000
#> m6.1 0.05353504 0.02400000 0.02661297 0.01000000 0.02500000
#> m7.1 0.05141012 0.02400000 0.03473111 0.01050000 0.02142429
#> m8.1 0.04064788 0.02400000 0.03205724 0.01150000 0.02260347
#> m9.1 0.03558089 0.02771281 0.02957899 0.00000000 0.01789786
#> m10.1 0.06790864 0.02400000 0.02940521 0.00000000 0.01550000
#> Detection PrevalenceSD Balanced AccuracySD AccuracySD KappaSD
#> m1.2 0.02260347 0.03488553 0.03107384 0.07068239
#> m2.2 0.01985573 0.03969047 0.02872281 0.06294641
#> m3.2 0.04198710 0.02903446 0.02968024 0.06490698
#> m4.2 0.06871681 0.02934848 0.03464102 0.07066824
#> m5.2 0.05103185 0.06097540 0.05979618 0.12851816
#> m6.2 0.04200794 0.02794489 0.01761391 0.03797697
#> m7.2 0.05850000 0.04195533 0.03040148 0.06717328
#> m8.2 0.06300992 0.03550587 0.04210305 0.09070465
#> m9.2 0.06471669 0.07594022 0.06447416 0.14527990
#> m10.2 0.01789786 0.02598076 0.01789786 0.04099187
#> m1.1 0.04026889 0.03711244 0.03107384 0.07103051
#> m2.1 0.04771443 0.04548535 0.03795941 0.08521884
#> m3.1 0.04213075 0.05666863 0.03934039 0.09241392
#> m4.1 0.04210602 0.03855191 0.03029851 0.06811999
#> m5.1 0.03150000 0.03500000 0.03150000 0.07050000
#> m6.1 0.06060253 0.06050551 0.03954217 0.09901347
#> m7.1 0.04057503 0.06236652 0.04740165 0.10866462
#> m8.1 0.03512241 0.05718974 0.04549267 0.10986507
#> m9.1 0.03020348 0.04796092 0.04053805 0.09350401
#> m10.1 0.06743145 0.06748086 0.04053805 0.10062637
#> AccuracyLowerSD AccuracyUpperSD AccuracyNullSD AccuracyPValueSD
#> m1.2 0.04818281 0.003862210 0.01150000 0.0005000000
#> m2.2 0.03873306 0.008485281 0.01154701 0.0000000000
#> m3.2 0.04579301 0.004349329 0.01050000 0.0000000000
#> m4.2 0.04676537 0.010392305 0.01154701 0.0000000000
#> m5.2 0.08748714 0.017339742 0.00000000 0.0025000000
#> m6.2 0.02494661 0.003774917 0.01000000 0.0000000000
#> m7.2 0.04300388 0.008962886 0.01050000 0.0005773503
#> m8.2 0.06376258 0.009215024 0.01150000 0.0015000000
#> m9.2 0.09437690 0.017017148 0.00000000 0.0023804761
#> m10.2 0.02655811 0.004041452 0.00000000 0.0000000000
#> m1.1 0.04818281 0.003862210 0.01150000 0.0005000000
#> m2.1 0.05109468 0.014545904 0.01154701 0.0010000000
#> m3.1 0.05845511 0.008246211 0.01050000 0.0005000000
#> m4.1 0.04287968 0.008958236 0.01154701 0.0005000000
#> m5.1 0.04400000 0.009500000 0.00000000 0.0005000000
#> m6.1 0.05369978 0.015152008 0.01000000 0.0025000000
#> m7.1 0.07251896 0.010969655 0.01050000 0.0014142136
#> m8.1 0.06061559 0.016663333 0.01150000 0.0063508530
#> m9.1 0.05502424 0.014899664 0.00000000 0.0023804761
#> m10.1 0.05844299 0.014899664 0.00000000 0.0023804761
#> McnemarPValueSD PositiveSD NegativeSD True PositiveSD False PositiveSD
#> m1.2 0.0000000 0 0.5000000 0.5773503 0.5773503
#> m2.2 0.0000000 0 0.5773503 0.0000000 0.0000000
#> m3.2 0.3002221 0 0.5000000 0.9574271 0.9574271
#> m4.2 0.3760000 0 0.5773503 1.4142136 1.4142136
#> m5.2 0.2708219 0 0.0000000 1.5000000 1.5000000
#> m6.2 0.2600000 0 0.5000000 0.5000000 0.5000000
#> m7.2 0.2600000 0 0.5000000 0.8164966 0.8164966
#> m8.2 0.4341674 0 0.5000000 1.4142136 1.4142136
#> m9.2 0.2609224 0 0.0000000 1.4142136 1.4142136
#> m10.2 0.0000000 0 0.0000000 0.0000000 0.0000000
#> m1.1 0.3002221 0 0.5000000 0.9574271 0.9574271
#> m2.1 0.2666063 0 0.5773503 0.5773503 0.5773503
#> m3.1 0.3850818 0 0.5000000 0.5000000 0.5000000
#> m4.1 0.3002221 0 0.5773503 0.5000000 0.5000000
#> m5.1 0.0000000 0 0.0000000 0.5000000 0.5000000
#> m6.1 0.4243061 0 0.5000000 0.5000000 0.5000000
#> m7.1 0.3760000 0 0.5000000 0.5000000 0.5000000
#> m8.1 0.3002221 0 0.5000000 0.5000000 0.5000000
#> m9.1 0.3002221 0 0.0000000 0.5773503 0.5773503
#> m10.1 0.4330000 0 0.0000000 0.5000000 0.5000000
#> True NegativeSD False NegativeSD
#> m1.2 0.5773503 0.5000000
#> m2.2 0.5000000 0.9574271
#> m3.2 0.5000000 0.5773503
#> m4.2 1.0000000 0.8164966
#> m5.2 0.9574271 0.9574271
#> m6.2 0.9574271 0.8164966
#> m7.2 1.4142136 0.9574271
#> m8.2 0.9574271 0.5773503
#> m9.2 1.5000000 1.5000000
#> m10.2 0.5773503 0.5773503
#> m1.1 0.5773503 0.5773503
#> m2.1 1.2909944 0.9574271
#> m3.1 0.9574271 1.2909944
#> m4.1 1.2583057 0.9574271
#> m5.1 0.5000000 0.5000000
#> m6.1 1.7078251 1.4142136
#> m7.1 1.5000000 1.1547005
#> m8.1 0.9574271 0.9574271
#> m9.1 0.8164966 0.8164966
#> m10.1 1.7078251 1.7078251
Alternatively, the mean validation metrics per algorithm can be obtained with the following code:
mean_validation_metrics(i)
#> $`Araucaria angustifolia`
#> # A tibble: 2 × 53
#> algo ROC TSS Sensitivity Specificity `Pos Pred Value` `Neg Pred Value`
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 kknn 0.972 0.901 0.958 0.943 0.971 0.927
#> 2 naive_b… 0.986 0.907 0.988 0.921 0.962 0.977
#> # ℹ 46 more variables: Precision <dbl>, Recall <dbl>, F1 <dbl>,
#> # Prevalence <dbl>, `Detection Rate` <dbl>, `Detection Prevalence` <dbl>,
#> # `Balanced Accuracy` <dbl>, Accuracy <dbl>, Kappa <dbl>,
#> # AccuracyLower <dbl>, AccuracyUpper <dbl>, AccuracyNull <dbl>,
#> # AccuracyPValue <dbl>, McnemarPValue <dbl>, Positive <dbl>, Negative <dbl>,
#> # `True Positive` <dbl>, `False Positive` <dbl>, `True Negative` <dbl>,
#> # `False Negative` <dbl>, ROCSD <dbl>, TSSSD <dbl>, SensitivitySD <dbl>, …
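To make the per-algorithm summary concrete, here is a minimal base R sketch of averaging per-model metrics by algorithm. The data frame, the algorithm labels and all values are invented for illustration; this is not how mean_validation_metrics is implemented, only the general idea of collapsing one row per model into one row per algorithm.

```r
# Hypothetical per-model metrics (values invented for illustration only).
per_model <- data.frame(
  id   = c("m1.1", "m2.1", "m1.2", "m2.2"),
  algo = c("algo_a", "algo_a", "algo_b", "algo_b"),
  ROC  = c(0.90, 0.94, 0.80, 0.84),
  TSS  = c(0.70, 0.80, 0.60, 0.64)
)

# Average every metric within each algorithm: one row per algorithm.
per_algo <- aggregate(cbind(ROC, TSS) ~ algo, data = per_model, FUN = mean)
print(per_algo)
```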
After building predictions, it is possible to ensemble GCMs using the
gcms_ensembles function. Its gcms parameter tells the function which
part of scenarios_names(i) identifies each GCM. In this example, the
scenario names are:
c("ca_ssp245_2090", "ca_ssp585_2090", "mi_ssp245_2090", "mi_ssp585_2090").
Thus, if we set the parameter to c("ca", "mi"), the function searches
the scenario names for "ca" and "mi" and removes these parts. What
remains, in the example, is:
c("_ssp245_2090", "_ssp585_2090", "_ssp245_2090", "_ssp585_2090").
Then, the function ensembles the scenarios that share the same new name
(note that, after removing the GCM abbreviation, each remaining name
occurs twice). Finally, the ensembles are named after these new names
and are added to the scenarios of object i.
i <- gcms_ensembles(i, gcms = c("ca", "mi"))
#> New names:
#> New names:
#> • `cell_id` -> `cell_id...1`
#> • `mean_occ_prob` -> `mean_occ_prob...2`
#> • `wmean_AUC` -> `wmean_AUC...3`
#> • `committee_avg` -> `committee_avg...4`
#> • `cell_id` -> `cell_id...5`
#> • `mean_occ_prob` -> `mean_occ_prob...6`
#> • `wmean_AUC` -> `wmean_AUC...7`
#> • `committee_avg` -> `committee_avg...8`
i
#> caretSDM
#> ...............................
#> Class : input_sdm
#> -------- Occurrences --------
#> Species Names : Araucaria angustifolia
#> Number of presences : 84
#> Pseudoabsence methods :
#> Method to obtain PAs : bioclim
#> Number of PA sets : 10
#> Number of PAs in each set : 84
#> Data Cleaning : NAs, Capitals, Centroids, Geographically Duplicated, Identical Lat/Long, Institutions, Invalid, Non-terrestrial, Duplicated Cell (grid)
#> -------- Predictors ---------
#> Number of Predictors : 6
#> Predictors Names : bio1, bio4, bio12, PC1, PC2, PC3
#> PCA-transformed variables : DONE
#> Cummulative proportion ( 1 ) : PC1, PC2, PC3
#> --------- Scenarios ---------
#> Number of Scenarios : 5
#> Scenarios Names : ca_ssp245_2090 ca_ssp585_2090 mi_ssp245_2090 mi_ssp585_2090 current
#> ----------- Models ----------
#> Algorithms Names : naive_bayes kknn
#> Variables Names : PC1 PC2 PC3
#> Model Validation :
#> Method : repeatedcv
#> Number : 4
#> Metrics :
#> $`Araucaria angustifolia`
#> algo ROC TSS Sensitivity Specificity
#> 1 kknn 0.9716306 0.9012374 0.95815 0.942875
#> 2 naive_bayes 0.9855231 0.9069517 0.98800 0.921200
#>
#> -------- Predictions --------
#> Ensembles :
#> Scenarios : ca_ssp245_2090 ca_ssp585_2090 mi_ssp245_2090 mi_ssp585_2090 current _ssp245_2090 _ssp585_2090
#> Methods : mean_occ_prob wmean_AUC committee_avg
#> Thresholds :
#> Method : threshold
#> Criteria : 0.9
Note that the “Predictions” section now lists two additional scenarios, _ssp245_2090 and _ssp585_2090, which are the GCM ensembles we have just calculated.
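The name-matching step described above can be sketched in base R. This is only an illustration of the idea (strip the GCM abbreviation, then group scenarios that end up with the same name); the package's internal matching may differ, and the variable names used here are hypothetical.

```r
# Scenario names and GCM abbreviations from the example above.
scen <- c("ca_ssp245_2090", "ca_ssp585_2090", "mi_ssp245_2090", "mi_ssp585_2090")
gcms <- c("ca", "mi")

# Remove the GCM abbreviation from each scenario name.
# Assumption: a plain regular-expression substitution is enough here.
pattern  <- paste(gcms, collapse = "|")
stripped <- sub(pattern, "", scen)

# Scenarios that share a stripped name are the ones ensembled together.
groups <- split(scen, stripped)
print(groups)
```

Each element of groups holds the scenarios that would be combined into one ensemble, and the element names are the names the ensembles receive.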
Plotting results
To plot results, we prepared plot and mapview functions. Here we
present only the plot versions due to mapview limitations in markdown,
but we encourage users to use the mapview alternatives whenever
possible. To do that, simply replace the “plot” portion of the function
name with “mapview”. As an example, plot_occurrences has its
counterpart function mapview_occurrences, with the same set of
arguments and the same behavior. For plot_predictions, we can set some
parameters to control what is being plotted. Probably the most important
parameter is scenario, which the user can change to plot each projected
scenario. If you are modeling more than one species, you can specify the
species to be plotted using the spp_name parameter, and if you want to
inspect individual projections, you can plot them by passing the model
id (see the row names of get_validation_metrics above to retrieve model
ids).
plot_predictions(i,
spp_name = NULL,
scenario = "current",
id = NULL,
ensemble = TRUE,
ensemble_type = "mean_occ_prob")
plot_predictions(i,
spp_name = NULL,
scenario = "_ssp245_2090",
id = NULL,
ensemble = TRUE,
ensemble_type = "mean_occ_prob")
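Since only the scenario argument changes between the two calls above, they can be wrapped in a loop. The sketch below reuses the arguments shown in this vignette and the scenario names printed for object i; the guard and the handling of the return value are assumptions added so that the snippet does nothing outside a session where caretSDM is installed and i exists.

```r
# Scenario names as printed in the "Predictions" section of object `i`.
scenarios <- c("current", "_ssp245_2090", "_ssp585_2090")

# Guard (assumption): only plot when caretSDM is available and `i` exists.
if (requireNamespace("caretSDM", quietly = TRUE) && exists("i")) {
  for (s in scenarios) {
    p <- caretSDM::plot_predictions(i,
                                    spp_name = NULL,
                                    scenario = s,
                                    id = NULL,
                                    ensemble = TRUE,
                                    ensemble_type = "mean_occ_prob")
    # Inside a loop, a returned plot object must be printed explicitly.
    if (!is.null(p)) print(p)
  }
}
```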
Another plot widely used in SDM studies is the Partial Dependence
Plot, which shows the response curve for each variable. Here we are
using PCA axes as predictors, so these curves are hard to interpret
ecologically, but if someone wants to plot them, it is possible through
the pdp_sdm function.
pdp_sdm(i)
#> `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
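For readers unfamiliar with partial dependence, the idea can be sketched in base R without any SDM package. This is not the implementation used by pdp_sdm; the toy data, the logistic regression and the helper partial_dependence are all hypothetical and serve only to show how a response curve for one variable is computed.

```r
set.seed(1)
n <- 200

# Toy data: two hypothetical predictors and a presence/absence response.
d <- data.frame(PC1 = rnorm(n), PC2 = rnorm(n))
d$presence <- rbinom(n, 1, plogis(1.5 * d$PC1 - 0.5 * d$PC2))

# Any classifier works; a logistic regression keeps the sketch dependency-free.
fit <- glm(presence ~ PC1 + PC2, data = d, family = binomial)

# Partial dependence on one variable: set that variable to each grid value
# for all rows, predict, and average the predictions.
partial_dependence <- function(model, data, var, grid) {
  vapply(grid, function(v) {
    data[[var]] <- v
    mean(predict(model, newdata = data, type = "response"))
  }, numeric(1))
}

grid <- seq(min(d$PC1), max(d$PC1), length.out = 25)
pd   <- partial_dependence(fit, d, "PC1", grid)

plot(grid, pd, type = "l", xlab = "PC1", ylab = "Mean predicted probability")
```

The resulting curve shows how the average predicted probability changes along PC1 while the other predictor keeps its observed values.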
Writing results
To export caretSDM objects and outputs from R, you can use the write
functions. For all possibilities, see the help file ?write_ensembles.
We encourage users to keep the standard path configuration, which
organizes outputs in a straightforward fashion. Common functions are
the following:
write_occurrences(i, path = "results/occurrences.csv", grid = FALSE)
write_pseudoabsences(i, path = "results/pseudoabsences", ext = ".csv", centroid = FALSE)
write_grid(i, path = "results/grid_study_area.gpkg", centroid = FALSE)
write_ensembles(i, path = "results/ensembles", ext = ".tif")
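The calls above write into a results folder. Whether the write functions create missing folders themselves is not stated here, so a cautious sketch is to create the folder beforehand with base R. The temporary location below is a placeholder; in a real project it would simply be "results".

```r
# Placeholder output folder; replace with "results" in a real project.
out_dir <- file.path(tempdir(), "results")

# Create the folder (and any parents) if it does not exist yet.
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Build file paths that mirror the ones used in the calls above.
occ_path  <- file.path(out_dir, "occurrences.csv")
grid_path <- file.path(out_dir, "grid_study_area.gpkg")
```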
Conclusion
This vignette demonstrates how to build Species Distribution Models
using caretSDM. Aimed at terrestrial applications, it highlights the
use of the package with a grid in the sdm_area. An alternative can be
seen in vignette("Salminus", "caretSDM"), where we build SDMs for a
fish species using river lines in a simple features (sf) object instead
of cells in a grid.
end_time <- Sys.time()
end_time - start_time
#> Time difference of 1.999732 mins