Calculates ensemble predictions for species distribution models using custom or implemented methods.
Usage
ensemble_sdm(m,
scen = NULL,
method = "average",
metric = NULL,
fun = NULL
)
get_ensembles(i)
add_ensembles(e1, e2)
Arguments
- m
An input_sdm or a models object.
- scen
A scenarios object or NULL. If NULL and m is an input_sdm with a scenarios slot, it will be used.
- method
Character or a function. Which ensembles should be calculated? See details.
- metric
Character. Used with method = "weighted_average": Which metric should be used to weight predictions? If NULL
- fun
Function. If method = "committee_average", the function will be used to binarize the data. It will receive caret's train object and must return a numeric value (the threshold, see details).
- i
An input_sdm or a predictions object.
- e1
An ensembles object.
- e2
An ensembles object.
Details
method can be set to one of three implemented strategies or to a custom function.
The three implemented strategies are:
average is the mean occurrence probability, i.e. a simple mean of the predictions;
weighted_average is the same mean, but weighted by a metric, which must be
set with the metric argument (see mean_validation_metrics for the available metrics);
committee_average is the committee average, also known as majority rule, in which predictions
are binarized and then averaged. To binarize predictions, the user can supply a custom
function in the fun argument to calculate a threshold for each model. By default, the
committee average uses the caret::thresholder function to find the threshold that
maximizes the sum of sensitivity and specificity (through caretSDM:::.MaxSeSp).
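The arithmetic behind the three strategies can be sketched in base R on a toy prediction matrix. This is only an illustration, not the package's internal code: the matrix values, the weights w and the thresholds th are hypothetical, and whether a prediction equal to the threshold counts as a presence is an assumption made here.

```r
# Toy prediction matrix: rows are cells, columns are models (hypothetical values).
pred_mat <- matrix(
  c(0.9, 0.8, 0.4,
    0.2, 0.6, 0.1,
    0.7, 0.3, 0.8),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("cell", 1:3), paste0("model", 1:3))
)

# average: simple mean of the predictions in each cell.
avg <- rowMeans(pred_mat)

# weighted_average: mean weighted by one validation metric value per model
# (hypothetical metric values, standing in for the chosen metric).
w <- c(model1 = 0.9, model2 = 0.6, model3 = 0.75)
wavg <- apply(pred_mat, 1, weighted.mean, w = w)

# committee_average: binarize each model with its own threshold, then average.
# Assumption: predictions >= threshold are treated as presences.
th <- c(model1 = 0.5, model2 = 0.5, model3 = 0.5)
binary <- sweep(pred_mat, 2, th, FUN = ">=") * 1
cavg <- rowMeans(binary)
```

In this sketch cavg is the fraction of models that "vote" for presence in each cell, which is why the strategy is also described as majority rule.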
A custom function (fun) must take the argument mod, which is the model output from the
caret package (see get_models), and must return a numeric value (see example).
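As a rough sketch of what such a threshold function might look like, the two hypothetical functions below both take mod and return a single number. The second one assumes that the train object stores held-out predictions in mod$pred (which requires savePredictions in trainControl) and that the positive class column is named "presence"; both the class name and the function names are assumptions, not part of the package.

```r
# Simplest possible custom `fun`: ignore the model and return a fixed threshold.
fixed_threshold <- function(mod) {
  0.5
}

# Data-driven sketch: pick the threshold where sensitivity and specificity
# are closest to each other.
# Assumptions: mod$pred exists, and the positive class column is "presence".
# For simplicity all rows of mod$pred are used; with several tuning candidates
# one would first keep only the rows matching mod$bestTune.
equal_se_sp <- function(mod, positive = "presence") {
  p <- mod$pred
  is_pos <- p$obs == positive
  cand <- seq(0, 1, by = 0.01)
  se <- vapply(cand, function(t) mean(p[[positive]][is_pos] >= t), numeric(1))
  sp <- vapply(cand, function(t) mean(p[[positive]][!is_pos] < t), numeric(1))
  cand[which.min(abs(se - sp))]
}

# Mock object mimicking the relevant part of a caret train object (hypothetical data).
mock <- list(pred = data.frame(
  obs = factor(c("presence", "presence", "pseudoabsence", "pseudoabsence")),
  presence = c(0.9, 0.7, 0.4, 0.2)
))
equal_se_sp(mock)
```

Either function could then be passed as fun together with method = "committee_average".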
method can also be set to a custom function. It must take the argument pred_mat,
a matrix of predictions (rows are cells and columns are models), and return a vector of
predictions (one value per cell). See the median example below for a custom function.
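To make the pred_mat contract concrete, here is a minimal sketch of another custom ensemble, a trimmed mean that discards the most extreme models in each cell. The function name trimmed_ensemble, the trim fraction and the toy matrix are all illustrative assumptions.

```r
# Custom ensemble: trimmed mean across models for each cell.
# Receives pred_mat (rows = cells, columns = models), returns one value per cell.
trimmed_ensemble <- function(pred_mat) {
  apply(pred_mat, 1, mean, trim = 0.2, na.rm = TRUE)
}

# Toy check with five hypothetical models and two cells.
toy <- matrix(
  c(0.1, 0.5, 0.6, 0.7, 1.0,
    0.0, 0.0, 0.0, 0.0, 0.0),
  nrow = 2, byrow = TRUE
)
trimmed_ensemble(toy)

# Intended use, following the description above (not run here):
# i <- ensemble_sdm(i, method = trimmed_ensemble)
```

With five models and trim = 0.2, the lowest and highest prediction in each cell are dropped before averaging, which makes the ensemble less sensitive to a single outlying model.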
get_predictions returns the list of all predictions for all scenarios, species,
algorithms and repetitions. This is useful for those who wish to implement their own ensemble
methods.
get_ensembles returns a matrix of data.frames, where each column is a
scenario and each row is a species.
scenarios_names returns the scenario names in an sdm_area or input_sdm
object.
get_scenarios_data returns the data from scenarios in an sdm_area or
input_sdm object.
Examples
if (interactive()) {
  # Create sdm_area object:
  set.seed(1)
  sa <- sdm_area(parana, cell_size = 100000, crs = 6933)

  # Include predictors:
  sa <- add_predictors(sa, bioc) |> select_predictors(c("bio1", "bio12"))

  # Include scenarios:
  sa <- add_scenarios(sa)

  # Create occurrences:
  oc <- occurrences_sdm(occ, crs = 6933) |> join_area(sa)

  # Create input_sdm:
  i <- input_sdm(oc, sa)

  # Pseudoabsence generation:
  i <- pseudoabsences(i, method = "random", n_set = 2)

  # Custom trainControl:
  ctrl_sdm <- caret::trainControl(method = "boot",
                                  number = 1,
                                  repeats = 1,
                                  classProbs = TRUE,
                                  returnResamp = "all",
                                  summaryFunction = summary_sdm,
                                  savePredictions = "all")

  # Train models:
  i <- train_sdm(i, algo = c("naive_bayes"), ctrl = ctrl_sdm) |>
    suppressWarnings()

  # Predict models:
  i <- predict_sdm(i, th = 0.8)

  # Ensemble:
  i <- ensemble_sdm(i, method = "average")
  i
}
# Example of a custom function to obtain the threshold that maximizes
# the sensitivity plus specificity:
MaxSeSp <- function(mod) {
  th <- caret::thresholder(mod,
    threshold = seq(0, 1, by = 0.001),
    final = TRUE,
    statistics = c("Sensitivity", "Specificity")
  )
  th <- th$prob_threshold[which.max(th$Sensitivity + th$Specificity)]
  if (length(th) > 1) mean(th) else th
}

# Example of a custom function to obtain ensembles using the median instead of the mean:
median_ensemble <- function(pred_mat) {
  apply(pred_mat, 1, median, na.rm = TRUE)
}
