Commit
Merge pull request #407 from pybop-team/382b-minkowski-cost
Including exponent in Minkowski cost
BradyPlanden authored Jul 15, 2024
2 parents 64312ae + 73f5477 commit 5491702
Showing 5 changed files with 86 additions and 34 deletions.
58 changes: 42 additions & 16 deletions examples/notebooks/comparing_cost_functions.ipynb
@@ -6,7 +6,7 @@
"source": [
"## Investigating different cost functions\n",
"\n",
"In this notebook, we take a look at the different cost function offered in PyBOP. Cost functions conventionally construct a distance metric between two mathematics sets (vectors), which is then used within PyBOP's optimisation algorthims. \n",
"In this notebook, we take a look at the different fitting cost functions offered in PyBOP. Cost functions for fitting problems conventionally describe the distance between two points (the target and the prediction) which is to be minimised via PyBOP's optimisation algorithms. \n",
"\n",
"First, we install and import the required packages below."
]
@@ -62,7 +62,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"For this notebook, we need to construct parameters, a model and a problem class before we can compare differing cost functions. We start with two parameters, but this is an arbituary selection and can be expanded given the model and data in question."
"For this notebook, we need to construct parameters, a model and a problem class before we can compare differing cost functions. We start with two parameters, but this is an arbitrary selection and can be expanded given the model and data in question."
]
},
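
The two-parameter setup described above might look like the following sketch; the parameter names, priors and bounds here are illustrative assumptions, not the notebook's exact values:

    import pybop

    parameters = pybop.Parameters(
        pybop.Parameter(
            "Negative electrode active material volume fraction",
            prior=pybop.Gaussian(0.68, 0.05),  # assumed prior
            bounds=[0.5, 0.8],
        ),
        pybop.Parameter(
            "Positive electrode active material volume fraction",
            prior=pybop.Gaussian(0.58, 0.05),  # assumed prior
            bounds=[0.4, 0.7],
        ),
    )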
{
@@ -89,7 +89,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we will construct the Single Particle Model (SPM) with the Chen2020 parameter set, but like the above, this is an arbitruary selection and can be replaced with any PyBOP model."
"Next, we will construct the Single Particle Model (SPM) with the Chen2020 parameter set, but like the above, this is an arbitrary selection and can be replaced with any PyBOP model."
]
},
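
A minimal sketch of the model construction, assuming PyBOP's model API; any other PyBOP model can be substituted:

    # Chen2020 parameter set loaded via PyBaMM, then passed to the SPM class
    parameter_set = pybop.ParameterSet.pybamm("Chen2020")
    model = pybop.lithium_ion.SPM(parameter_set=parameter_set)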
{
@@ -163,7 +163,7 @@
"source": [
"### Sum of Square Errors and Root Mean Squared Error\n",
"\n",
"First, let's start with the easiest cost functions, the sum of squared errors (SSE) and the root mean squared error (RMSE). Constructing these classes is very concise in PyBOP, and only requires the problem class."
"First, let's start with two commonly-used cost functions: the sum of squared errors (SSE) and the root mean squared error (RMSE). Constructing these classes is very concise in PyBOP, and only requires the problem class."
]
},
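
For reference, constructing the two costs is one line each, assuming a fitting `problem` assembled from the model, parameters and dataset above:

    cost_SSE = pybop.SumSquaredError(problem)
    cost_RMSE = pybop.RootMeanSquaredError(problem)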
{
@@ -180,7 +180,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we can investigate how these functions differ when fitting the parameters. To acquire the distance metric for each of these, we can simply use the constructed class in a call method, such as:"
"Now, we can investigate how these functions differ when fitting the parameters. To acquire the cost value for each of these, we can simply use the call method of the constructed class, such as:"
]
},
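
In other words, calling the constructed cost with a candidate parameter vector returns a scalar cost value; the candidate below is illustrative:

    x = [7.56e-05, 0.0065]  # hypothetical values for the two parameters
    print(cost_SSE(x), cost_RMSE(x))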
{
@@ -316,7 +316,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In this situation, it's clear that gradient of the SSE cost is much higher than the RMSE. This can be helpful for certain optimisation algorithms, specifically towards improving convergence performance within a predefine number of iterations. However, with incorrect hyperparameter values this can also result in the algorithm not converging due to sampling locations outside of the \"cost valley\"."
"In this situation, it's clear that the curvature of the SSE cost is greater than that of the RMSE. This can improve the rate of convergence for certain optimisation algorithms. However, with incorrect hyperparameter values, larger gradients can also result in the algorithm not converging due to sampling locations outside of the \"cost valley\", e.g. infeasible parameter values."
]
},
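
The steeper SSE surface follows from the identity SSE = N · RMSE²; a minimal numpy sketch of this relationship:

    import numpy as np

    rng = np.random.default_rng(0)
    r = rng.normal(size=100)  # hypothetical residual vector of length N
    sse = np.sum(r**2)
    rmse = np.sqrt(np.mean(r**2))
    assert np.isclose(sse, len(r) * rmse**2)  # SSE = N * RMSE**2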
{
@@ -325,14 +325,19 @@
"source": [
"### Minkowski Cost\n",
"\n",
"Next, we will investigate using the Minkowski cost function. This cost function provides a general formation of the above two cost functions, allowing for hyper parameter calibration on the cost function itself. The Minkowski cost takes the form,\n",
"Next, we will investigate using the Minkowski distance. The Minkowski cost takes a general form, which allows for hyperparameter calibration on the cost function itself, given by\n",
"\n",
"$\\mathcal{L} = \\displaystyle\\sum_{1}^N (\\hat{y}-y)^p$\n",
"$\\mathcal{L_p} = \\displaystyle \\Big(\\sum_i |\\hat{y_i}-y_i|^p\\Big)^{1/p}$\n",
"\n",
"For p = 1, this becomes L1Norm \n",
"For p = 2, this becomes L2Norm (SSE)\n",
"where p ≥ 0 is the order of the Minkowski distance.\n",
"\n",
"PyBOP offers a Minkowski class, which we will construct below. This class has an optional argument of `p` which designates the order in the above equation. This value can be a float, with the only requirement that it is not negative. First, let's reconstruct the SSE function with a `p` value of 2."
"For $p = 1$, it is the Manhattan distance.\n",
"For $p = 2$, it is the Euclidean distance.\n",
"For $p ≥ 1$, the Minkowski distance is a metric, but for $0<p<1$, note that the Minkowski distance is not a metric because the triangle inequality does not hold [[1]](https://en.wikipedia.org/wiki/Minkowski_distance), [[2]](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.minkowski.html#scipy.spatial.distance.minkowski).\n",
"\n",
"The RMSE can be computed by dividing the Euclidean distance by the square root of the length of the target vector, while the SSE is the square of the Euclidean distance.\n",
"\n",
"PyBOP offers a Minkowski class, which we will construct below. This class has an optional argument of `p` which designates the order in the above equation. This value can be a float, with the only requirement that it is not negative. First, let's confirm the relationship between the SSE, RMSE and the Minkowski distance with a `p` value of 2."
]
},
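
These relationships can be checked numerically; a self-contained sketch using synthetic vectors (not the notebook's data):

    import numpy as np

    def minkowski(a, b, p):
        # Minkowski distance of order p between vectors a and b
        return np.sum(np.abs(a - b) ** p) ** (1 / p)

    rng = np.random.default_rng(1)
    y_hat, y = rng.normal(size=50), rng.normal(size=50)
    sse = np.sum((y_hat - y) ** 2)
    rmse = np.sqrt(np.mean((y_hat - y) ** 2))
    assert np.isclose(minkowski(y_hat, y, 2), np.sqrt(sse))  # L2 = sqrt(SSE)
    assert np.isclose(minkowski(y_hat, y, 2), rmse * np.sqrt(len(y)))  # L2 = RMSE*sqrt(N)
    assert np.isclose(minkowski(y_hat, y, 1), np.sum(np.abs(y_hat - y)))  # Manhattan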
{
@@ -365,16 +370,36 @@
" y_minkowski.append(cost_minkowski([7.56e-05, i]))\n",
"\n",
"fig = go.Figure()\n",
"fig.add_trace(go.Scatter(x=x_range, y=y_SSE, mode=\"lines\", name=\"SSE\"))\n",
"fig.add_trace(go.Scatter(x=x_range, y=y_minkowski, mode=\"lines\", name=\"Minkowski\"))\n",
"fig.add_trace(\n",
" go.Scatter(\n",
" x=x_range,\n",
" y=np.asarray(y_RMSE) * np.sqrt(len(t_eval)),\n",
" mode=\"lines\",\n",
" name=\"RMSE*N\",\n",
" )\n",
")\n",
"fig.add_trace(\n",
" go.Scatter(\n",
" x=x_range,\n",
" y=np.sqrt(y_SSE),\n",
" mode=\"lines\",\n",
" line=dict(dash=\"dash\"),\n",
" name=\"sqrt(SSE)\",\n",
" )\n",
")\n",
"fig.add_trace(\n",
" go.Scatter(\n",
" x=x_range, y=y_minkowski, mode=\"lines\", line=dict(dash=\"dot\"), name=\"Minkowski\"\n",
" )\n",
")\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As expected, these cost functions are equivalent. Now, let take a look at how the Minkowski cost changes for different orders of `p`."
"As expected, these lines lie on top of one another. Now, let's take a look at how the Minkowski cost changes for different orders, `p`."
]
},
{
@@ -414,16 +439,17 @@
" fig.add_trace(\n",
" go.Scatter(x=x_range, y=y_minkowski[k], mode=\"lines\", name=f\"Minkowski {_}\")\n",
" )\n",
"fig.update_yaxes(range=[0, np.max(y_minkowski[2])])\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As seen above, the Minkowski cost function allows for a variety of different distance metrics to be created. This provides users with another hyper parameter for to calibrate for optimisation algorithm convergence. This addition does expand the global search space, and should be carefully considered before deciding upon.\n",
"As seen above, the Minkowski cost allows for a range of different cost functions to be created. This provides users with another hyperparameter to calibrate for optimisation algorithm convergence. This addition does expand the global search space, and should be carefully considered before deciding upon.\n",
"\n",
"In this notebook, we've shown the different distance metrics (cost functions) offered in PyBOP. Selection between these functions can effect the identified parameters in the case that the optimiser hyperparameter values are not properly calibrated. "
"In this notebook, we've shown the different fitting cost functions offered in PyBOP. Selection between these functions can affect the optimisation result in the case that the optimiser hyperparameter values are not properly calibrated. "
]
}
],
42 changes: 31 additions & 11 deletions pybop/costs/fitting_costs.py
@@ -172,34 +172,43 @@ def _evaluateS1(self, inputs: Inputs):
class Minkowski(BaseCost):
"""
The Minkowski distance is a generalisation of several distance metrics,
- including Euclidean and Manhattan distances. It is defined as:
+ including the Euclidean and Manhattan distances. It is defined as:
.. math::
- L_p(x, y) = (\\sum_i |x_i - y_i|^p)
+ L_p(x, y) = ( \\sum_i |x_i - y_i|^p )^(1/p)
- where p ≥ 1 is the order of the Minkowski metric.
+ where p > 0 is the order of the Minkowski distance. For p ≥ 1, the
+ Minkowski distance is a metric. For 0 < p < 1, it is not a metric, as it
+ does not satisfy the triangle inequality, although a metric can be
+ obtained by removing the (1/p) exponent.
Special cases:
* p = 1: Manhattan distance
* p = 2: Euclidean distance
- * p → ∞: Chebyshev distance
+ * p → ∞: Chebyshev distance (not implemented as yet)
This class implements the Minkowski distance as a cost function for
optimisation problems, allowing for flexible distance-based optimisation
across various problem domains.
- Attributes:
- p (float): The order of the Minkowski metric.
+ Attributes
+ ----------
+ p : float, optional
+     The order of the Minkowski distance.
"""

def __init__(self, problem, p: float = 2.0):
super().__init__(problem)
if p <= 0:  # p = 0 would make the 1/p exponent below undefined
raise ValueError(
"The order of the Minkowski metric must be greater than 0."
"The order of the Minkowski distance must be greater than 0."
)
- self.p = p
+ elif not np.isfinite(p):
+ raise ValueError(
+ "For p = infinity, an implementation of the Chebyshev distance is required."
+ )
+ self.p = float(p)

def _evaluate(self, inputs: Inputs, grad=None):
"""
@@ -222,6 +231,7 @@ def _evaluate(self, inputs: Inputs, grad=None):
e = np.asarray(
[
np.sum(np.abs(prediction[signal] - self._target[signal]) ** self.p)
+ ** (1 / self.p)
for signal in self.signal
]
)
@@ -253,10 +263,20 @@ def _evaluateS1(self, inputs):
return np.inf, self._de * np.ones(self.n_parameters)

r = np.asarray([y[signal] - self._target[signal] for signal in self.signal])
- e = np.sum(np.sum(np.abs(r) ** self.p))
- de = self.p * np.sum(np.sum(r ** (self.p - 1) * dy.T, axis=2), axis=1)
+ e = np.asarray(
+ [
+ np.sum(np.abs(y[signal] - self._target[signal]) ** self.p)
+ ** (1 / self.p)
+ for signal in self.signal
+ ]
+ )
+ de = np.sum(
+ np.sum(r ** (self.p - 1) * dy.T, axis=2)
+ / (e ** (self.p - 1) + np.finfo(float).eps),
+ axis=1,
+ )

- return e, de
+ return np.sum(e), de
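
# Illustrative sketch (editor's addition, not from this commit): a
# finite-difference check of the gradient form used above. For
# e = (sum_i |r_i|^p)^(1/p), the chain rule gives
# de/dtheta = sum_i sign(r_i) |r_i|^(p-1) (dr_i/dtheta) / e^(p-1),
# which the `de` expression computes for positive residuals (eps guards e == 0).
import numpy as np

def mink(r, p):
    # Minkowski distance of order p for a residual vector r
    return np.sum(np.abs(r) ** p) ** (1 / p)

rng = np.random.default_rng(0)
r, p, h = rng.normal(size=20) ** 2, 2.5, 1e-7  # positive residuals; dr/dtheta = 1
analytic = np.sum(r ** (p - 1)) / mink(r, p) ** (p - 1)
numeric = (mink(r + h, p) - mink(r, p)) / h
assert np.isclose(analytic, numeric, rtol=1e-4)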


class SumofPower(BaseCost):
10 changes: 5 additions & 5 deletions pybop/parameters/parameter.py
@@ -51,7 +51,7 @@ def __init__(
self.set_bounds(bounds)
self.margin = 1e-4

- def rvs(self, n_samples, random_state=None):
+ def rvs(self, n_samples: int = 1, random_state=None):
"""
Draw random samples from the parameter's prior distribution.
@@ -61,7 +61,7 @@ def rvs(self, n_samples, random_state=None):
Parameters
----------
n_samples : int
- The number of samples to draw.
+ The number of samples to draw (default: 1).
Returns
-------
@@ -332,7 +332,7 @@ def rvs(self, n_samples: int = 1) -> list:
Parameters
----------
n_samples : int
- The number of samples to draw.
+ The number of samples to draw (default: 1).
Returns
-------
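
# Illustrative sketch (editor's addition, not from this commit): with the
# new default, a single sample can be drawn without arguments. The
# parameter name and prior are assumptions for the example.
import pybop

param = pybop.Parameter(
    "Negative electrode active material volume fraction",
    prior=pybop.Gaussian(0.68, 0.05),
)
one_draw = param.rvs()  # default n_samples=1
many_draws = param.rvs(n_samples=100)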
@@ -417,7 +417,7 @@ def get_bounds_for_plotly(self):
bounds : numpy.ndarray
An array of shape (n_parameters, 2) containing the bounds for each parameter.
"""
- bounds = np.empty((len(self), 2))
+ bounds = np.zeros((len(self), 2))

for i, param in enumerate(self.param.values()):
if param.applied_prior_bounds:
Expand All @@ -427,7 +427,7 @@ def get_bounds_for_plotly(self):
UserWarning,
stacklevel=2,
)
- elif param.bounds is not None:
+ if param.bounds is not None:
bounds[i] = param.bounds
else:
raise ValueError("All parameters require bounds for plotting.")
3 changes: 2 additions & 1 deletion pybop/plotting/plot2d.py
@@ -1,5 +1,6 @@
import sys
import warnings
+ from typing import Union

import numpy as np
from scipy.interpolate import griddata
@@ -10,7 +11,7 @@
def plot2d(
cost_or_optim,
gradient: bool = False,
- bounds: np.ndarray = None,
+ bounds: Union[np.ndarray, None] = None,
steps: int = 10,
show: bool = True,
use_optim_log: bool = False,
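
The loosened annotation reflects the two ways `plot2d` is typically called; a hedged sketch, assuming a `cost` object built as in the notebook above:

    import numpy as np

    pybop.plot2d(cost, bounds=np.asarray([[0.5, 0.8], [0.4, 0.7]]), steps=15)
    pybop.plot2d(cost)  # bounds=None falls back on the parameters' bounds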
7 changes: 6 additions & 1 deletion tests/unit/test_cost.py
@@ -231,8 +231,13 @@ def test_costs(self, cost):
@pytest.mark.unit
def test_minkowski(self, problem):
# Incorrect order
- with pytest.raises(ValueError, match="The order of the Minkowski metric"):
+ with pytest.raises(ValueError, match="The order of the Minkowski distance"):
pybop.Minkowski(problem, p=-1)
+ with pytest.raises(
+ ValueError,
+ match="For p = infinity, an implementation of the Chebyshev distance is required.",
+ ):
+ pybop.Minkowski(problem, p=np.inf)

@pytest.mark.unit
def test_sumofpower(self, problem):
