Add KDE bandwidth selectors using biased or unbiased cross-validation #2384

Open · sethaxen wants to merge 16 commits into base: main

@sethaxen (Member) commented Sep 19, 2024

Description

As discussed on Slack, the existing bandwidth selection methods for kde oversmooth draws from multimodal distributions with well-separated modes. This PR adds new bw options "ucv" and "bcv", which use unbiased or biased leave-one-out cross-validation (LOO-CV), respectively, to select the bandwidth.
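
For readers unfamiliar with unbiased cross-validation, the idea can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the implementation in this PR: it assumes a Gaussian kernel, evaluates the textbook least-squares CV objective exactly over all pairs (O(n²), so only suitable for small samples), and picks the minimizer on a fixed grid. The helper names `gaussian_pdf`, `ucv_objective`, and `ucv_bandwidth` are hypothetical, and the comparison bandwidth is a generic normal-reference rule rather than ArviZ's own code.

```python
import numpy as np


def gaussian_pdf(d, sigma):
    """Density of N(0, sigma^2) evaluated at d."""
    return np.exp(-0.5 * (d / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def ucv_objective(x, h):
    """Least-squares (unbiased) CV score for a Gaussian-kernel KDE.

    UCV(h) = integral of fhat^2 - (2/n) * sum_i fhat_{-i}(x_i), where
    fhat_{-i} is the KDE fitted without the i-th point.
    """
    n = x.size
    d = x[:, None] - x[None, :]          # all pairwise differences
    off_diag = ~np.eye(n, dtype=bool)    # mask selecting pairs with i != j
    # Integral of fhat^2: a Gaussian convolved with itself has scale sqrt(2)*h.
    integral_sq = gaussian_pdf(d, np.sqrt(2.0) * h).sum() / n**2
    # Average leave-one-out density at the held-out points.
    loo_term = gaussian_pdf(d[off_diag], h).sum() / (n * (n - 1))
    return integral_sq - 2.0 * loo_term


def ucv_bandwidth(x, grid):
    """Return the grid value that minimizes the UCV score."""
    scores = np.array([ucv_objective(x, h) for h in grid])
    return grid[np.argmin(scores)]


# Bimodal sample with well-separated modes, as in the example of this PR
# (smaller n here because the pairwise computation is quadratic).
rng = np.random.default_rng(123)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(60, 1, 200)])

grid = np.geomspace(0.05, 20.0, 120)     # assumed search range for h
h_ucv = ucv_bandwidth(x, grid)
# Normal-reference rule for comparison; it scales with the overall spread.
h_ref = 1.06 * x.std(ddof=1) * x.size ** (-0.2)
print(f"UCV bandwidth: {h_ucv:.3f}, normal-reference bandwidth: {h_ref:.3f}")
```

Because the normal-reference rule scales with the standard deviation of the whole sample, it grows with the distance between the modes, while the cross-validated choice is driven by the within-mode structure. That is the oversmoothing behavior described above.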

Example

import numpy as np
import matplotlib.pyplot as plt
import arviz as az
az.style.use('arviz-doc')

rng = np.random.default_rng(123)
x = np.concatenate([rng.normal(0, 1, 1000), rng.normal(60, 1, 1000)])
fig, ax = plt.subplots()
az.plot_kde(x, ax=ax, plot_kwargs={"color": "k"}, label="default")
az.plot_kde(x, ax=ax, bw='ucv', plot_kwargs={"color": "C0"}, label="UCV")
az.plot_kde(x, ax=ax, bw='bcv', plot_kwargs={"color": "C1"}, label="BCV")
ax.set_xlabel("x")
ax.set_ylabel("Density")
ax.legend()

[Figure: kde, showing KDEs of x with the default, UCV, and BCV bandwidths]

Checklist

  • Does the PR follow official PR format?
  • Has included a sample plot to visually illustrate the changes? (only for plot-related functions)
  • Is the new feature properly documented with an example?
  • Does the PR include new or updated tests to cover the new feature (using pytest fixture pattern)?
  • Is the code style correct (follows pylint and black guidelines)?
  • Is the new feature listed in the New features section of the changelog?

📚 Documentation preview 📚: https://arviz--2384.org.readthedocs.build/en/2384/

@sethaxen (Member, Author) commented Sep 20, 2024

Here are some example plots showing how KDEs with UCV and BCV bandwidths differ from those with the default bandwidth. In general, both methods seem to work pretty well when the original density does not have bounded support. BCV tends to smooth a little less than the default, while UCV smooths even less. Personally, I would still prefer the default or maybe BCV over UCV in almost every case except the multimodal one in the OP:
[Figures: normal_kde, student-t_kde, exponential_kde, lognormal_kde, uniform_kde, beta_kde]

In terms of performance, the CV-based bandwidth selection methods are significantly faster than the default (roughly 4-5x in the timings below), since _bw_isj is quite slow, but probably also because they tend to select lower bandwidths, which allows for fewer convolutions with the kernel. They are still about 5x slower than bw='scott'.

In [1]: import arviz as az

In [2]: import numpy as np

In [3]: x = np.random.normal(0, 1, 10_000);

In [4]: %time [az.kde(x) for _ in range(1_000)];
CPU times: user 2.52 s, sys: 9.99 ms, total: 2.53 s
Wall time: 2.55 s

In [5]: %time [az.kde(x, bw='scott') for _ in range(1_000)];
CPU times: user 106 ms, sys: 0 ns, total: 106 ms
Wall time: 106 ms

In [6]: %time [az.kde(x, bw='ucv') for _ in range(1_000)];
CPU times: user 544 ms, sys: 228 μs, total: 544 ms
Wall time: 543 ms

In [7]: %time [az.kde(x, bw='bcv') for _ in range(1_000)];
CPU times: user 577 ms, sys: 73 μs, total: 577 ms
Wall time: 577 ms

@sethaxen changed the title from "[WIP] Add KDE bandwidth selectors using biased or unbiased cross-validation" to "Add KDE bandwidth selectors using biased or unbiased cross-validation" on Sep 20, 2024
@sethaxen marked this pull request as ready for review on September 20, 2024, 19:17
@sethaxen (Member, Author) commented

The remaining pylint errors seem to be in code not touched by this PR.
