Ensemble Species Distribution Models

Calculates ensemble predictions for species distribution models using custom or implemented methods.

Usage

ensemble_sdm(m,
            scen = NULL,
            method = "average",
            metric = NULL,
            fun = NULL
            )

get_ensembles(
  i,
  type = "matrix",
  spp_name = NULL,
  scenario = NULL,
  ensemble_type = NULL
)

add_ensembles(e1, e2)

Arguments

m: A input_sdm or a models object.
scen: A scenarios object or NULL. If NULL and m is a input_sdm with a scenarios slot, it will be used.
method: Character or a function. Which ensembles should be calculated? See details.
metric: Character. Used with method = "weighted_average": Which metric should be used to weight predictions? If NULL
fun: Function. If method = "committee_average", the function will be used to binarize the data. It will receive caret's train object and must return a numeric value (the threshold, see details).
i: A input_sdm or a predictions object.
type: Character. Output format desired. One of "matrix", "sf", "stars", "raster", or "rast". Defaults to "matrix".
spp_name: Character or NULL. Name of the species to retrieve ensembles for. Defaults to the first available species if NULL.
scenario: Character or NULL. Name of the scenario to retrieve ensembles for. Defaults to the first available scenario if NULL.
ensemble_type: Character or NULL. The ensemble method to use for retrieval. Must be a subset of the methods stored in i$ensembles$method. Defaults to the first method if NULL.
e1: A ensembles object.
e2: A ensembles object.

Value

A input_sdm or a predictions object.

Details

ensembles could be set to three different strategies OR a custom function. The three implemented strategies are: average is the mean occurrence probability, which is a simple mean of predictions; weighted_average is the same average, but weighted by a metric, which needs to be set using argument metric (see mean_validation_metrics for the metrics available). committee_average is the committee average, as known as majority rule, where predictions are binarized and then a mean is obtained. To binarize predictions, user can set a custom function in the fun argument to calculate a threshold for each model. Standardly, the committee average uses the caret::thresholder function to find the threshold that maximizes the sum of sensitivity and specificity (through caretSDM:::.MaxSeSp). Custom function (fun) must use the argument mod, which is the model output from caret package (see get_models) and must return a numeric value (see example). method can also be set to a custom function, which must receive the argument pred_mat, which is a matrix of predictions (columns are models and rows are cells) and return a vector of predictions (one value per cell). See the median example below for a custom function.

get_predictions returns the list of all predictions to all scenarios, all species, all algorithms and all repetitions. Useful for those who wish to implement their own ensemble methods.

get_ensembles returns a matrix of data.frames, where each column is a scenario and each row is a species.

scenarios_names returns the scenarios names in a sdm_area or input_sdm object.

get_scenarios_data returns the data from scenarios in a sdm_area or input_sdm object.

Author

Luíz Fernando Esser (luizesser@gmail.com) https://luizfesser.wordpress.com

Examples

if (interactive()) {
  # Create sdm_area object:
  set.seed(1)
  sa <- sdm_area(parana, cell_size = 100000, output_crs = 6933)

  # Include predictors:
  sa <- add_predictors(sa, bioc) |> select_predictors(c("bio1", "bio12"))

  # Include scenarios:
  sa <- add_scenarios(sa)

  # Create occurrences:
  oc <- occurrences_sdm(occ, occ_crs = 6933)

  # Create input_sdm:
  i <- input_sdm(oc, sa)

  # Pseudoabsence generation:
  i <- pseudoabsences(i, method = "random", n_set = 2)

  # Custom trainControl:
  ctrl_sdm <- caret::trainControl(
    method = "boot",
    number = 1,
    repeats = 1,
    classProbs = TRUE,
    returnResamp = "all",
    summaryFunction = summary_sdm,
    savePredictions = "all"
  )

  # Train models:
  i <- train_sdm(i, algo = c("naive_bayes"), ctrl = ctrl_sdm) |>
    suppressWarnings()

  # Predict models:
  i <- predict_sdm(i, th = 0.8)

  # Ensemble:
  i <- ensemble_sdm(i, method = "average")
  i
}

# Example from a custom function to obtain the threshold that maximizes
# the sensitivity plus specificity:
MaxSeSp <- function(mod) {
  th <- caret::thresholder(mod,
    threshold = seq(0, 1, by = 0.001),
    final = TRUE,
    statistics = c("Sensitivity", "Specificity")
  )
  th <- th$prob_threshold[which.max(th$Sensitivity + th$Specificity)]
  if (length(th) > 1) mean(th) else th
}

# Example from a custom function to obtain ensembles using the median instead of the mean:
median_ensemble <- function(pred_mat) {
  apply(pred_mat, 1, median, na.rm = TRUE)
}