
Where to implement spatial resampling methods #989

Open
tiemvanderdeure opened this issue Sep 28, 2024 · 2 comments

Comments

@tiemvanderdeure

In my field (ecology/species distribution modelling), it is very common to use spatial resampling. I've written some spatial ResamplingStrategy types, such as spatial cross-validation, and am wondering where to share that code. The options I'm considering are:

  • Add them to SpeciesDistributionModels.jl (which is still under development)
  • Make a separate package that SDM.jl would depend on
  • Add them here as an extension to GeometryOps.jl

The problem with the last option is that, right now, it's not really possible to pass additional information about the data (such as point locations) to a machine. I'm hacking around this in SDM.jl by calling train_test_pairs directly.

I would like to hear what others think about this.

@ablaom
Member

ablaom commented Sep 30, 2024

Thanks @tiemvanderdeure for posing this interesting question. I'm trying to understand the required interface points better but have not done spatial resampling before. A ResamplingStrategy can have parameters. Is there a reason the "point location" cannot be one of these? Or are you saying it is needed by fit (in which case it is a hyperparameter??). Could you say a little more on this point?

@tiemvanderdeure
Author

It wouldn't be needed by fit, only by evaluate!.

In my field, observations might be locations where a species was or wasn't found. One then extracts information about these points, such as climate, land use, or distance to a road, and fits a model based on these. Spatial resampling is used to make sure the model has learned something about the species and not just random spatial patterns.

So every row in X would have a point location, and a spatial resampling strategy would use these locations, e.g. to construct a grid and cross-validate grid cells instead of observations.
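To make the grid idea concrete, here is a minimal, language-neutral sketch in Python (not Julia/MLJ code). It assumes each row has an (x, y) coordinate, uses square cells of side cell_size, and assigns whole cells to folds round-robin; the function name spatial_block_pairs is hypothetical. Like MLJ's train_test_pairs, it returns a list of (train, test) row-index pairs (0-based here, unlike Julia's 1-based indices).

```python
import math
from collections import defaultdict


def spatial_block_pairs(locations, cell_size, nfolds):
    """Hypothetical grid-based spatial block cross-validation.

    locations: sequence of (x, y) coordinates, one per row of X.
    cell_size: side length of the square grid cells (same units as x, y).
    nfolds:    number of folds; whole grid cells are assigned to folds.
    Returns a list of (train_rows, test_rows) pairs of 0-based row indices.
    """
    # Group row indices by the grid cell that contains each point.
    cells = defaultdict(list)
    for row, (x, y) in enumerate(locations):
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append(row)

    if len(cells) < nfolds:
        raise ValueError("fewer occupied grid cells than folds")

    # Assign whole cells (not individual points) to folds, round-robin over
    # the sorted cell keys so the split is deterministic. Points sharing a
    # cell therefore never end up on both sides of a train/test split.
    fold_of_cell = {key: i % nfolds for i, key in enumerate(sorted(cells))}

    pairs = []
    for fold in range(nfolds):
        train, test = [], []
        for key, rows in cells.items():
            if fold_of_cell[key] == fold:
                test.extend(rows)
            else:
                train.extend(rows)
        pairs.append((sorted(train), sorted(test)))
    return pairs


# Usage: six points spread over four occupied 1x1 cells, three folds.
locs = [(0.1, 0.2), (0.4, 0.9), (1.5, 0.3), (1.7, 0.8), (0.2, 1.6), (2.5, 2.5)]
for train, test in spatial_block_pairs(locs, cell_size=1.0, nfolds=3):
    print(train, test)
```

Round-robin assignment is just one simple choice; random assignment of cells (with a fixed seed) or larger contiguous blocks are common alternatives.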

If the points are a parameter of the ResamplingStrategy, then the strategy can only be used for one particular X and y, which somewhat defeats the purpose.
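The design point can be sketched as follows, again in Python and under stated assumptions: the strategy object stores only hyperparameters (cell size, number of folds), and the per-row locations arrive at call time alongside the rows, so one strategy instance can be reused on any dataset. The class SpatialBlockCV and its train_test_pairs method are hypothetical names chosen to echo MLJ's terminology; they are not an existing API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SpatialBlockCV:
    # Hyperparameters only: nothing here is tied to a particular dataset.
    cell_size: float
    nfolds: int = 5

    def train_test_pairs(self, rows, locations):
        # `locations` is supplied per call, one (x, y) per entry of `rows`,
        # so the same instance works for any X and y.
        rows = list(rows)
        if len(rows) != len(locations):
            raise ValueError("need exactly one location per row")

        def cell(loc):
            return (math.floor(loc[0] / self.cell_size),
                    math.floor(loc[1] / self.cell_size))

        keys = sorted({cell(loc) for loc in locations})
        if len(keys) < self.nfolds:
            raise ValueError("fewer occupied grid cells than folds")
        # Whole grid cells are assigned to folds, round-robin.
        fold_of_cell = {key: i % self.nfolds for i, key in enumerate(keys)}
        folds = [fold_of_cell[cell(loc)] for loc in locations]
        return [
            ([r for r, f in zip(rows, folds) if f != k],
             [r for r, f in zip(rows, folds) if f == k])
            for k in range(self.nfolds)
        ]


# Usage: one strategy instance, two unrelated datasets.
strategy = SpatialBlockCV(cell_size=10.0, nfolds=2)
pairs_a = strategy.train_test_pairs(range(4), [(1, 1), (2, 3), (15, 1), (18, 4)])
pairs_b = strategy.train_test_pairs(range(3), [(50, 50), (5, 5), (55, 52)])
```

Storing the points inside the strategy instead would bind it to a single dataset, which is exactly the limitation described above.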

But the more I think about it, the more I realize that this might require quite a lot of changes to the interface to work.
