Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dist_func in skf.Variogram can support callables #130

Open
Megil-Zesty opened this issue Oct 20, 2022 · 4 comments
Open

dist_func in skf.Variogram can support callables #130

Megil-Zesty opened this issue Oct 20, 2022 · 4 comments

Comments

@Megil-Zesty
Copy link

Megil-Zesty commented Oct 20, 2022

The documentation in init (and online) suggests that we have to use one of the predefined distance functions; however, pdist supports passing functions (instead of just names of common distance functions).

I was able to pass a function in the initialization of the Variogram which enabled me to handle calculating distance between Shapely Point objects containing lat/lon coordinates.

Overall, super happy about this! Fixing the documentation may help others in the future.

@Megil-Zesty
Copy link
Author

Megil-Zesty commented Oct 20, 2022

Using your own function can cause issues with the OrdianryKrigging in the init method.
The function "_remove_duplicated_coordinates" assumed the 'coordinates' do not contain objects (because it relies on Numpy's unique function).

Overriding this solved that issue, but presented another in the next line. Setting the self.dict_metric the "dist_func" property produced by the Variogram.describe() throws an error as the class is not aware of the custom distance function I used in Variogram. To fix this, I overrode the init method, simply changed "self.dist_metric = variogram["dist_func"]" on line 127 to "self.dist_metric = self.distance_btwn_coords2" (which is the name of my custom distance function) and then added the distance funciton to my child class.

from skgstat.Variogram import Variogram
from skgstat.MetricSpace import MetricSpace, MetricSpacePair

class ZestyOrdinaryKriging(OrdinaryKriging):
    def __init__(
                self,
                variogram,
                min_points=5,
                max_points=15,
                mode='exact',
                precision=100,
                solver='inv',
                n_jobs=1,
                perf=False,
                sparse=False,
                coordinates=None,
                values=None
        ):
        """
        Overrriding init method due to issue with "self.dist_metric = variogram["dist_func"]"
        This was caused by passing a function in the variogram class instead of using a pre-defined
        distance metric (typically refenced by a string)
        """
        # store arguments to the instance

        if isinstance(variogram, Variogram):
            if coordinates is None:
                coordinates = variogram.coordinates
            if values is None:
                values = variogram.values
            variogram_descr = variogram.describe()
            if variogram_descr["model"] == "harmonize":
                variogram_descr["model"] = variogram._build_harmonized_model()
            variogram = variogram_descr

        self.sparse = sparse

        # general attributes
        self._minp = min_points
        self._maxp = max_points
        self.min_points = min_points
        self.max_points = max_points

        # general settings
        self.n_jobs = n_jobs
        self.perf = perf

        self.range = variogram['effective_range']
        self.nugget = variogram['nugget']
        self.sill = variogram['sill']
        self.dist_metric = self.distance_btwn_coords2

        # coordinates and semivariance function
        if not isinstance(coordinates, MetricSpace):
            coordinates, values = self._remove_duplicated_coordinates(coordinates, values)
            coordinates = MetricSpace(coordinates.copy(), self.dist_metric, self.range if self.sparse else None)
        else:
            assert self.dist_metric == coordinates.dist_metric, "Distance metric of variogram differs from distance metric of coordinates"
            assert coordinates.max_dist is None or coordinates.max_dist == self.range, "Sparse coordinates must have max_dist == variogram.effective_range"
        self.values = values.copy()
        self.coords = coordinates
        self.gamma_model = Variogram.fitted_model_function(**variogram)
        self.z = None

        # calculation mode; self.range has to be initialized
        self._mode = mode
        self._precision = precision
        self._prec_dist = None
        self._prec_g = None
        self.mode = mode
        self.precision = precision

        # solver settings
        self._solver = solver
        self._solve = None
        self.solver = solver

        # initialize error counter
        self.singular_error = 0
        self.no_points_error = 0
        self.ill_matrix = 0

        # performance counter
        if self.perf:
            self.perf_dist = list()
            self.perf_mat = list()
            self.perf_solv = list()
        
    @classmethod
    def _remove_duplicated_coordinates(cls, coords, values):
        """
        Override this method to handle Shapely Point Coordinates.
        
        Extract the coordinates and values
        The coordinates array is checked for duplicates and only the
        first instance of a duplicate is used. Duplicated coordinates
        would result in duplicated rows in the variogram matrix and
        make it singular.
        """
        c = coords
        v = values
        unique_idx = cls.unique_indices(coords)
        
        return c[unique_idx], v[unique_idx]
    
    def unique_indices(arr):
        unique_vals = []
        unique_indices = []
        for idx, val in enumerate(arr):
            if val[0] not in unique_vals:
                unique_vals.append(val)
                unique_indices.append(idx)
        del unique_vals
        return unique_indices
    
    def distance_btwn_coords2(p_i, p_j):
        p_i=p_i[0]
        p_j=p_j[0]
        return geodesic((p_i.y, p_i.x), (p_j.y, p_j.x)).km

@mmaelicke
Copy link
Owner

Hey, thanks for sharing all this.
The missing docs were on purpose, as I did never systematically test and check the usage of custom distance functions throughout the library.
I'll keep this issue open for others as reference, until I can think of a more abstract way, how OrdinaryKriging can generally handle custom distance functions.
Best

Mirko

@redhog
Copy link
Collaborator

redhog commented Oct 24, 2022

What is the advantage of a custom distance function over projecting to a cartesian plane @Megil-Zesty ? Specifically in GIS applications, reprojecting to some UTM zone should be preferable, unless you're working with extremely large areas (a whole continent or so)? If for nothing else: function calls and custom math for pairs of points is going to be much slower than the builtin l2 norm + a call to pyproj with the vector of points at the beginning.

@Megil-Zesty
Copy link
Author

@redhog I didn't consider the performance drop from using a custom distance function - I originally went that route because I expected it to be simpler than reprojecting from WGS84 (though it obviously ended up causing issues downstream within the Krigging module).

I've since tried projecting to UTM and it is much faster and gives the proper units that I was looking for in the Variogram. Thanks for the suggestion! I'm coming from a background outside of geospatial analysis so this was new to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants