Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reimplement linear methods using recursive method #12

Merged
merged 18 commits into from
Aug 21, 2024
Merged
Show file tree
Hide file tree
Changes from 7 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 9 additions & 6 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,12 +11,15 @@ Cubic interpolations require two more degrees of freedom per dimension, which re
Similar to the linear methods, depending on implementation, the constant term can vary by orders of magnitude,
as can the RAM usage.

| Method | RAM | Interp. Cost (Best Case) | Interp. Cost (Worst Case) | Extrap. Cost (Worst Case) |
|-------------------------------|-----------|--------------------------|-----------------------------------------|------------------------------------------------|
| multilinear::regular | O(ndims) | O(2^ndims * ndims) | O(2^ndims * ndims) | O(2^ndims + ndims^2) |
| multilinear::rectilinear | O(ndims) | O(2^ndims * ndims) | O(ndims * (2^ndims + log2(gridsize))) | O(ndims * (2^ndims + ndims + log2(gridsize))) |
| multicubic::regular | O(ndims) | O(4^ndims) | O(4^ndims) | O(4^ndims) |
| multicubic::rectilinear | O(ndims) | O(4^ndims) | O(4^ndims) + ndims * log2(gridsize) | O(4^ndims) + ndims * log2(gridsize) |
Rectilinear methods perform a bisection search to find the relevant grid cell, which takes
a worst-case number of iterations of log2(number of grid elements).

| Method | RAM | Interp. / Extrap. Cost |
|-------------------------------|-----------|------------------------------|
| multilinear::regular | O(ndims) | O(2^ndims) |
| multilinear::rectilinear | O(ndims) | O(2^ndims) + log2(gridsize) |
| multicubic::regular | O(ndims) | O(4^ndims) |
| multicubic::rectilinear | O(ndims) | O(4^ndims) + log2(gridsize) |

# Example: Multilinear and Multicubic w/ Regular Grid
```rust
Expand Down
62 changes: 55 additions & 7 deletions interpn/CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,12 +1,60 @@
# Changelog
All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## 0.4.3 - 2024-08-03

## [Unreleased]
## Added

## [0.1.0](https://github.com/jlogan03/interpn/releases/tag/v0.1.0) - 2023-11-22
* Implement `interpn_alloc` function for each method, which allocates a Vec for the output

### Other
- Squash early development
## Changed

* Use recursive method to evaluate multilinear interpolation instead of hypercube method
* This makes extrapolation cost consistent with interpolation cost, and reduces nominal perf scaling
* Shows about 2x slower perf in micro-benchmarks, but about 2x faster in end-to-end benchmarks after the Python bindings
* Need to improve benchmarking strategy to better capture perf in real-life usage
* Reduce repeated documentation

## 0.4.2 - 2024-05-12

## Added

* Implement cubic rectilinear method

## 0.4.1 - 2024-05-06

## Fixed

* Fix grid cell index selection to properly center the grid cell s.t. t=0 corresponds to index 1

## Added

* Add test of cubic method against sine function to capture potential problems not visible when testing against linear and quadratic functions

## 0.4.0 - 2024-05-05

## Changes

* Implement cubic interpolation for regular grid
* Continuous first derivative everywhere
* Option to clamp to linear extrapolation to prevent unwanted extrapolation of local curvature
* Test linearized method against linear function in both interpolation and extrapolation
* Test un-linearized method against quadratic function in both interpolation and extrapolation

## 0.3.0 - 2023-12-17

## Changed

* Remove initial guess for cell index from rectilinear method
* Collapse some loops
* Remove support for negative step sizes for regular grid in favor of reducing number of abs() calls
* Remove some saturating sub calls that are not needed now that degenerate grids are not supported
* Get indexing dimension product in the same way for rectilinear method as for regular grid method
* Use better initial value for folds
* Update docs
* Use optimizations for tests because it's faster overall

## 0.2.0 - 2023-12-06

## Changed

* Propagate Result everywhere that unwrap or assert! was being used
2 changes: 1 addition & 1 deletion interpn/Cargo.toml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
[package]
name = "interpn"
version = "0.4.2"
version = "0.4.3"
edition = "2021"
authors = ["James Logan <jlogan03@gmail.com>"]
license = "MIT OR Apache-2.0"
Expand Down
15 changes: 9 additions & 6 deletions interpn/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,12 +11,15 @@ Cubic interpolations require two more degrees of freedom per dimension, which re
Similar to the linear methods, depending on implementation, the constant term can vary by orders of magnitude,
as can the RAM usage.

| Method | RAM | Interp. Cost (Best Case) | Interp. Cost (Worst Case) | Extrap. Cost (Worst Case) |
|-------------------------------|-----------|--------------------------|-----------------------------------------|------------------------------------------------|
| multilinear::regular | O(ndims) | O(2^ndims * ndims) | O(2^ndims * ndims) | O(2^ndims + ndims^2) |
| multilinear::rectilinear | O(ndims) | O(2^ndims * ndims) | O(ndims * (2^ndims + log2(gridsize))) | O(ndims * (2^ndims + ndims + log2(gridsize))) |
| multicubic::regular | O(ndims) | O(4^ndims) | O(4^ndims) | O(4^ndims) |
| multicubic::rectilinear | O(ndims) | O(4^ndims) | O(4^ndims) + ndims * log2(gridsize) | O(4^ndims) + ndims * log2(gridsize) |
Rectilinear methods perform a bisection search to find the relevant grid cell, which takes
a worst-case number of iterations of log2(number of grid elements).

| Method | RAM | Interp. / Extrap. Cost |
|-------------------------------|-----------|------------------------------|
| multilinear::regular | O(ndims) | O(2^ndims) |
| multilinear::rectilinear | O(ndims) | O(2^ndims) + log2(gridsize) |
| multicubic::regular | O(ndims) | O(4^ndims) |
| multicubic::rectilinear | O(ndims) | O(4^ndims) + log2(gridsize) |

# Example: Multilinear and Multicubic w/ Regular Grid
```rust
Expand Down
15 changes: 9 additions & 6 deletions interpn/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -10,12 +10,15 @@
//! Similar to the linear methods, depending on implementation, the constant term can vary by orders of magnitude,
//! as can the RAM usage.
//!
//! | Method | RAM | Interp. Cost (Best Case) | Interp. Cost (Worst Case) | Extrap. Cost (Worst Case) |
//! |-------------------------------|-----------|--------------------------|-----------------------------------------|------------------------------------------------|
//! | multilinear::regular | O(ndims) | O(2^ndims * ndims) | O(2^ndims * ndims) | O(2^ndims + ndims^2) |
//! | multilinear::rectilinear | O(ndims) | O(2^ndims * ndims) | O(ndims * (2^ndims + log2(gridsize))) | O(ndims * (2^ndims + ndims + log2(gridsize))) |
//! | multicubic::regular | O(ndims) | O(4^ndims) | O(4^ndims) | O(4^ndims) |
//! | multicubic::rectilinear | O(ndims) | O(4^ndims) | O(4^ndims) + ndims * log2(gridsize) | O(4^ndims) + ndims * log2(gridsize) |
//! Rectilinear methods perform a bisection search to find the relevant grid cell, which takes
//! a worst-case number of iterations of log2(number of grid elements).
//!
//! | Method | RAM | Interp. / Extrap. Cost |
//! |-------------------------------|-----------|------------------------------|
//! | multilinear::regular | O(ndims) | O(2^ndims) |
//! | multilinear::rectilinear | O(ndims) | O(2^ndims) + log2(gridsize) |
//! | multicubic::regular | O(ndims) | O(4^ndims) |
//! | multicubic::rectilinear | O(ndims) | O(4^ndims) + log2(gridsize) |
//!
//! # Example: Multilinear and Multicubic w/ Regular Grid
//! ```rust
Expand Down
5 changes: 3 additions & 2 deletions interpn/src/multicubic/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,8 +2,9 @@
//!
//! For interior points, this method gives the same result as
//! a Hermite spline interpolation, with continuous first derivatives
//! across the cell boundaries. Under extrapolation, the higher-order
//! terms are dropped and linear extrapolation is used.
//! across the cell boundaries. Under extrapolation, either quadratic
//! or linear extrapolation is used depending on configuration, and
//! both maintain continuous first derivatives everywhere.
//!
//! The solution on interior points is notably different from B-splines,
//! where the exact values of the first and second derivatives are taken
Expand Down
114 changes: 44 additions & 70 deletions interpn/src/multicubic/rectilinear.rs
Original file line number Diff line number Diff line change
@@ -1,47 +1,5 @@
//! An arbitrary-dimensional multicubic interpolator / extrapolator on a rectilinear grid.
//!
//! On interior points, a hermite spline is used, with the derivative at each
//! grid point matched to a second-order central difference. This allows the
//! interpolant to reproduce a quadratic function exactly, and to approximate
//! others with minimal overshoot and wobble.
//!
//! At the grid boundary, a natural spline boundary condition is applied,
//! meaning the third derivative of the interpolant is constrainted to zero
//! at the last grid point, with the result that the interpolant is quadratic
//! on the last interval before the boundary.
//!
//! With "linearize_extrapolation" set, extrapolation is linear on the extrapolated
//! dimensions, holding the same derivative as the natural boundary condition produces
//! at the last grid point. Otherwise, the last grid cell's spline function is continued,
//! producing a quadratic extrapolation.
//!
//! This effectively gives a gradual decrease in the order of the interpolant
//! as the observation point approaches then leaves the grid:
//!
//! out out
//! ---|---|---|---|---|---|--- Grid
//! 2 2 3 3 3 2 2 Order of interpolant between grid points
//! 1 1 Extrapolation with linearize_extrapolation
//!
//! Assumes C-style ordering of vals (z(x0, y0), z(x0, y1), ..., z(x0, yn), z(x1, y0), ...).
//!
//! Operation Complexity
//! * O(4^ndims) for interpolation and extrapolation in all regions.
//!
//! Memory Complexity
//! * Peak stack usage is O(MAXDIMS), which is minimally O(ndims).
//! * While evaluation is recursive, the recursion has constant
//! max depth of MAXDIMS, which provides a guarantee on peak
//! memory usage.
//!
//! Timing
//! * Timing determinism very tight, but is not exact due to the
//! differences in calculations (but not complexity) between
//! interpolation and extrapolation.
//! * An interpolation-only variant of this algorithm could achieve
//! near-deterministic timing, but would produce incorrect results
//! when evaluated at off-grid points.
//!
//! ```rust
//! use interpn::multicubic::rectilinear;
//!
Expand All @@ -63,16 +21,58 @@
//! // Storage for output
//! let mut out = [0.0; 2];
//!
//! // Do interpolation
//! // Do interpolation, allocating for the output for convenience
//! let linearize_extrapolation = false;
//! rectilinear::interpn(grids, &z, linearize_extrapolation, &obs, &mut out).unwrap();
//! rectilinear::interpn_alloc(grids, &z, linearize_extrapolation, &obs).unwrap();
//! ```
//!
//! References
//! * A. E. P. Veldman and K. Rinzema, “Playing with nonuniform grids”.
//! https://pure.rug.nl/ws/portalfiles/portal/3332271/1992JEngMathVeldman.pdf
use num_traits::Float;

/// Evaluate multicubic interpolation on a regular grid in up to 8 dimensions.
/// Assumes C-style ordering of vals (z(x0, y0), z(x0, y1), ..., z(x0, yn), z(x1, y0), ...).
///
/// This is a convenience function; best performance will be achieved by using the exact right
/// number for the MAXDIMS parameter, as this will slightly reduce compute and storage overhead,
/// and the underlying method can be extended to more than this function's limit of 8 dimensions.
/// The limit of 8 dimensions was chosen for no more specific reason than to reduce unit test times.
///
/// While this method initializes the interpolator struct on every call, the overhead of doing this
/// is minimal even when using it to evaluate one observation point at a time.
#[inline(always)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably don't want inline(always) here. In fact inline(always) is probably best avoided pretty much always.

Inlining in LLVM happens "leaves-first", which is backwards from how most of our intuition would expect it to happen. The LLVM inliner on its own would probably decide to inline the ::new() call and the .interp() call into this function.

Then the #[inline(always)] will force it to inline this function into all call sites, and you'll end up with the contents of MulticubicRectilinear::new() and MulticubicRectilinear::interp() inlined all over the place separately. That's probably not desirable.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - each instance of inlining directives in here was originally checked in criterion benchmarks, but I'm finding lately that the benchmarks end up compiled somewhat differently from downstream usages. I'll check how it looks in end-to-end benchmarks via the python bindings with more reasonable inlining annotations.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The perf gains from gratuitous inlining do appear to be an artifact of benchmarking. This snip (benchmarks running after switching from less-inlined version of both linear calcs back to unmodified) would indicate a significant perf regression for the specific case of evaluating one point at a time, which is an important case for differentiation:

image

However, end-to-end benchmarks through the python bindings do not show any significant degradation for the case of 1 observation point:

image

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rectilinear method doesn't show any significant change here, possibly because the core library bisection search is fairly low in the inlining tree, and is marked inline(never)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No identifiable change in end-to-end benchmarks in python after removing all inlining annotations, even for the single-observation-point cases that see 2x slowdown in rust benchmarks. This should help embedded usages as well, since opt-level=s builds will behave more as expected

image

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: monomorphization bloat - definitely a thing that I think about. In this case, the only generics are on the data type and max dimensionality. The only types in common use that implement Float are f32 and f64 in core, plus f16 and f128 available in crates, and I don't anticipate seeing many uses of MAXDIMS other than the default of 8 since that's already an unreasonably large number of dimensions and doesn't affect perf noticeably.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think #[inline] is fine if you have leaf code you expect to be hot. It's really just #[inline(always)] especially on large non-leaf functions that ends up being problematic.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed - in this case, having just #[inline] doesn't produce any change in performance (even in the hypersensitive micro benchmarks), probably because the hot leaf functions are already small enough that they get inlined without adjusting the weights. So, might as well remove the manual annotations and just leave it to the compiler

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this a little more - even in an opt-level=s build, you'd still want those functions to inline because they're tiny and the effect on perf would be really punishing if they didn't get inlined. So I'll sprinkle a few #[inline] back in there just in case

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sprinkled some #[inline] on the chain of functions at the very bottom of the recursion for each interpolator type. No effect on benchmarks at all at opt-level=3, but might help keep something silly from happening in a different build someday

pub fn interpn<T: Float>(
grids: &[&[T]],
vals: &[T],
linearize_extrapolation: bool,
obs: &[&[T]],
out: &mut [T],
) -> Result<(), &'static str> {
MulticubicRectilinear::<'_, T, 8>::new(grids, vals, linearize_extrapolation)?
.interp(obs, out)?;
Ok(())
}

/// Evaluate interpolant, allocating a new Vec for the output.
///
/// For best results, use the `interpn` function with preallocated output;
/// allocation has a significant performance cost, and should be used sparingly.
#[cfg(feature = "std")]
pub fn interpn_alloc<T: Float>(
grids: &[&[T]],
vals: &[T],
linearize_extrapolation: bool,
obs: &[&[T]],
) -> Result<Vec<T>, &'static str> {
let mut out = vec![T::zero(); obs[0].len()];
interpn(grids, vals, linearize_extrapolation, obs, &mut out)?;
Ok(out)
}

// We can use the same rectilinear-grid method again
pub use crate::multilinear::rectilinear::check_bounds;

#[derive(Clone, Copy, PartialEq)]
enum Saturation {
None,
Expand Down Expand Up @@ -552,32 +552,6 @@ fn index_arr<T: Copy>(loc: &[usize], dimprod: &[usize], data: &[T]) -> T {
data[i]
}

/// Evaluate multicubic interpolation on a regular grid in up to 8 dimensions.
/// Assumes C-style ordering of vals (z(x0, y0), z(x0, y1), ..., z(x0, yn), z(x1, y0), ...).
///
/// This is a convenience function; best performance will be achieved by using the exact right
/// number for the MAXDIMS parameter, as this will slightly reduce compute and storage overhead,
/// and the underlying method can be extended to more than this function's limit of 8 dimensions.
/// The limit of 8 dimensions was chosen for no more specific reason than to reduce unit test times.
///
/// While this method initializes the interpolator struct on every call, the overhead of doing this
/// is minimal even when using it to evaluate one observation point at a time.
#[inline(always)]
pub fn interpn<T: Float>(
grids: &[&[T]],
vals: &[T],
linearize_extrapolation: bool,
obs: &[&[T]],
out: &mut [T],
) -> Result<(), &'static str> {
MulticubicRectilinear::<'_, T, 8>::new(grids, vals, linearize_extrapolation)?
.interp(obs, out)?;
Ok(())
}

// We can use the same rectilinear-grid method again
pub use crate::multilinear::rectilinear::check_bounds;

#[cfg(test)]
mod test {
use super::interpn;
Expand Down
Loading
Loading