SDM model predictability issue #362

Raquel-RuizDiaz · 2024-08-20T16:38:04Z

Hi!

I’m building delta-gamma species distribution models using sdmTMB for three species. The residual distributions look good, and the models explain about 65% of the deviance compared to a null model. However, the model predictability is quite low for two of the species. When assessing predictability, I found correlations between predictions and observations for the same period of 0.3 and 0.4 for two of the models and 0.7 for the third. These correlations decrease further in out-of-sample cross-validation.

My main question is: what correlation level is generally considered acceptable for species distribution models?

The binomial component of my model performs well, with correlations around 0.8. It seems that the gamma component might be the issue, which makes sense given that predicting biomass with a model that only includes temperature and depth may not yield high correlations. I’m wondering if these correlation values are good enough, or if I should consider using a binomial model instead.

Thanks!

ericward-noaa · 2024-08-20T18:01:18Z

Collaborator

A challenge in working with delta models is that you need to interpret both the presence-absence and positive pieces of the model.

For the presence-absence component, rather than correlations, it might be a good idea to look at the classification error, AUC (see pROC), or other metrics. For something like classification error, you want to see a model give classification rates better than 50% (a coin flip) -- but how much better the model needs to be depends on the data inputs / modeling objectives / etc. For example, with sparse data and lots of sampling / measurement error, it would be difficult to achieve high classification rates.
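As a rough illustration of these metrics, here is a minimal base-R sketch that computes classification accuracy and a rank-based AUC. Everything in it is a stand-in: the data are simulated, `depth` and `present` are hypothetical names, and a plain `glm()` takes the place of the binomial component of an sdmTMB fit. With a real delta model you would plug in the observed presence/absence and the predicted encounter probabilities instead.

```r
set.seed(1)

# Simulated presence/absence data (hypothetical stand-in for survey data)
n <- 500
depth <- rnorm(n)
present <- rbinom(n, 1, plogis(-0.5 + 1.5 * depth))

# Plain binomial GLM standing in for the presence-absence model component
fit <- glm(present ~ depth, family = binomial())
p_hat <- predict(fit, type = "response")  # predicted encounter probabilities

# AUC from the rank-sum (Mann-Whitney) statistic: the probability that a
# randomly chosen presence gets a higher prediction than a random absence
auc_rank <- function(obs, pred) {
  r <- rank(pred)
  n1 <- sum(obs == 1)
  n0 <- sum(obs == 0)
  (sum(r[obs == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

auc <- auc_rank(present, p_hat)

# Classification accuracy at a 0.5 probability threshold
accuracy <- mean((p_hat > 0.5) == (present == 1))

print(c(auc = auc, accuracy = accuracy))
```

The 0.5 threshold is an arbitrary choice made for this sketch. If pROC is installed, `pROC::auc(pROC::roc(present, p_hat))` should give a comparable AUC.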

For the positive model, it's ok to use correlations -- if you used a Gamma family with log link, it might be worth looking at the correlation between log(response) and the predictions from your model. You could start with a null model (no covariates) and build up, including covariates like temperature to see if that increases the correlation. Ideally this would be done with some sort of cross validation.
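A minimal sketch of that comparison, under loud assumptions: the data are simulated, `biomass`, `depth`, and `temp` are hypothetical names, a plain Gamma `glm()` stands in for the positive component of the delta model, and a single train/test split stands in for proper cross-validation. It starts from a depth-only model rather than an intercept-only one, because a non-spatial intercept-only model gives constant predictions and so has no defined correlation.

```r
set.seed(42)

# Simulated positive catches with a log-linear mean (hypothetical data)
n <- 400
depth <- rnorm(n)
temp <- rnorm(n)
mu <- exp(1 + 0.3 * depth + 0.8 * temp)
biomass <- rgamma(n, shape = 2, rate = 2 / mu)  # Gamma with mean mu
dat <- data.frame(biomass, depth, temp)

# Crude hold-out split; k-fold cross-validation would be preferable
train <- dat[1:300, ]
test <- dat[301:400, ]

# Simpler model first, then add temperature
fit_base <- glm(biomass ~ depth, family = Gamma(link = "log"), data = train)
fit_full <- glm(biomass ~ depth + temp, family = Gamma(link = "log"), data = train)

# Correlation between log(observed) and link-scale (log) predictions
cor_log <- function(fit, newdata) {
  pred_link <- predict(fit, newdata = newdata, type = "link")
  cor(log(newdata$biomass), pred_link)
}

cor_base <- cor_log(fit_base, test)
cor_full <- cor_log(fit_full, test)

print(c(depth_only = cor_base, depth_plus_temp = cor_full))
```

Comparing on the log scale keeps a few very large catches from dominating the correlation. For an actual sdmTMB fit, the package's `sdmTMB_cv()` function is one way to set up the cross-validation.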

Raquel-RuizDiaz · 2024-08-21T11:10:41Z

Author

Thank you so much for your helpful answer. I was initially assessing correlations using the combined estimates instead of focusing on just the gamma estimates. The correlations are much better now.

If I may ask another quick question: I'm projecting biomass to the end of the century using climate model scenarios and running 100 simulations to account for uncertainty as follows:

preds_future_IPSL126 <- predict(fit_hist, IPSL_grid126, nsim = 100L)

I then calculate the mean and standard deviation like this:

IPSL_grid126$se <- apply(preds_future_IPSL126, 1, sd)
IPSL_grid126$est <- apply(preds_future_IPSL126, 1, mean)

My question is: when I calculate these values, I end up with a single set of estimates (est) rather than separate estimates for the binomial (est1) and gamma (est2) components. I'm unsure whether I need to transform these new estimates and the standard errors. Should I apply exp() since the gamma model uses a log link function, or should I leave them as they are?

I really appreciate your help with this.

ericward-noaa · 2024-08-21T12:35:01Z

Collaborator

Yes -- good question. For delta models, you want to use the model argument in the call to predict(). So for your code, it'd be

preds_future_IPSL126_binom <- predict(fit_hist, IPSL_grid126, nsim = 100L, model = 1)

for the presence-absence piece, and

preds_future_IPSL126_pos <- predict(fit_hist, IPSL_grid126, nsim = 100L, model = 2)

for the positive piece. You can then apply the inverse logit (plogis()) to the first and exponentiate (exp()) the second. When you don't specify the model argument for delta families, predict() returns the combined prediction.
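A small sketch of those back-transformations, with simulated matrices standing in for the two `predict()` results. It assumes each result is a matrix on the link scale with one row per grid cell and one column per simulation draw, which is the layout the `apply(..., 1, sd)` calls earlier in the thread rely on. The names `sim_binom`, `sim_pos`, and `summarise_draws()` are hypothetical.

```r
set.seed(123)
n_cells <- 5
n_sims <- 100

# Stand-ins for predict(fit_hist, IPSL_grid126, nsim = 100L, model = 1 or 2)
sim_binom <- matrix(rnorm(n_cells * n_sims, mean = 0.5, sd = 0.4), nrow = n_cells)  # logit scale
sim_pos <- matrix(rnorm(n_cells * n_sims, mean = 2, sd = 0.3), nrow = n_cells)      # log scale

# Back-transform every draw before summarising
p_enc <- plogis(sim_binom)   # encounter probability, between 0 and 1
biomass_pos <- exp(sim_pos)  # positive biomass, greater than 0

# Per-cell median and 95% interval across draws
summarise_draws <- function(x) {
  data.frame(
    median = apply(x, 1, median),
    lwr = apply(x, 1, quantile, probs = 0.025),
    upr = apply(x, 1, quantile, probs = 0.975)
  )
}

p_summary <- summarise_draws(p_enc)
pos_summary <- summarise_draws(biomass_pos)

print(p_summary)
print(pos_summary)
```

Transforming the draws first and summarising afterwards keeps the intervals on the response scale within valid bounds (0 to 1 for probabilities, positive for biomass).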

1 reply

Raquel-RuizDiaz Aug 21, 2024
Author

Ohh I see. That makes sense. Thanks so much for the help!!!

seananderson · 2024-08-21T22:08:09Z

Maintainer

@Raquel-RuizDiaz in addition to being able to predict the two parts separately (and then combine them if you'd like), in answer to this:

"I'm unsure whether I need to transform these [delta-model simulation-based] new estimates and the standard errors."

Yes, the overall values from a delta-gamma model are returned in log-space by default. You can exp() them and take quantiles (or add and subtract SDs in log space) if you want to turn those into biomass or density estimates.
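Both options can be sketched in base R. The matrix `sims` below is simulated and only stands in for the combined, log-space output of `predict(fit_hist, IPSL_grid126, nsim = 100L)`, assumed to have one row per grid cell and one column per draw. The 1.96 multiplier is an added assumption (approximate normality in log space), not something stated in the thread.

```r
set.seed(2024)
n_cells <- 4
n_sims <- 100

# Stand-in for the combined delta-gamma simulation draws (log space)
log_mu <- c(0.5, 1.0, 1.5, 2.0)  # hypothetical per-cell log biomass
sims <- matrix(
  rnorm(n_cells * n_sims, mean = rep(log_mu, n_sims), sd = 0.3),
  nrow = n_cells
)

# Option 1: exponentiate the draws, then take quantiles per cell
q <- t(apply(exp(sims), 1, quantile, probs = c(0.025, 0.5, 0.975)))

# Option 2: summarise in log space, then exponentiate the endpoints
est_log <- apply(sims, 1, mean)
se_log <- apply(sims, 1, sd)
est <- exp(est_log)
lwr <- exp(est_log - 1.96 * se_log)
upr <- exp(est_log + 1.96 * se_log)

print(q)
print(data.frame(est, lwr, upr))
```

Note that `exp()` of the log-space mean is closer to a median than to an arithmetic mean on the biomass scale, and a log-space SD should not be exponentiated by itself and reported as a biomass SD.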
