Weighted KDE #26
src/univariate.jl
Outdated
        end
    end

    # returns an un-convolved KDE
    UnivariateKDE(midpoints, grid)
end

function tabulate(data::RealVector, midpoints::Range)
    weights = ones(data)
    tabulate(data, weights, midpoints)
You could do something like tabulate(data, midpoints; weights=ones(data)), which makes the weighting more apparent (since it's a keyword argument) and removes the need for two separate methods like this.
As you say below, it is better to use the WeightVec type than a keyword argument, for consistency.
Since this package uses StatsBase, I'd recommend setting up the weighting using
To avoid allocating a vector of ones for unit weights, we could use a
EDIT: you could also add
    npoints = length(midpoints)
    s = step(midpoints)

    # Set up a grid for discretized data
    grid = zeros(Float64, npoints)
-   ainc = 1.0 / (ndata*s*s)
+   ainc = 1.0 / (sum(weights)*s*s)
When you switch to WeightVec, you can use its sum field to avoid recomputing its sum.
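For readers following the diff, here is a minimal sketch of the weighted binning step being discussed, written in current Julia syntax. The name `weighted_tabulate` and its signature are hypothetical, and the function returns the raw grid rather than a `UnivariateKDE`; it only illustrates how each point's contribution is scaled by its weight and normalised by `sum(weights)` instead of `ndata`.

```julia
# Hypothetical helper for illustration; not the package's actual `tabulate`.
function weighted_tabulate(data::AbstractVector{<:Real},
                           weights::AbstractVector{<:Real},
                           midpoints::AbstractRange)
    length(data) == length(weights) ||
        throw(DimensionMismatch("data and weights must have the same length"))
    npoints = length(midpoints)
    s = step(midpoints)

    # Set up a grid for discretized data
    grid = zeros(Float64, npoints)
    # Normalise by the total weight (this replaces `ndata` in the unweighted case)
    ainc = 1.0 / (sum(weights) * s * s)

    for (x, w) in zip(data, weights)
        # Linear binning: split each point between its two neighbouring midpoints
        k = searchsortedfirst(midpoints, x)
        j = k - 1
        if 1 <= j <= npoints - 1
            grid[j] += w * (midpoints[k] - x) * ainc
            grid[k] += w * (x - midpoints[j]) * ainc
        end
    end
    return grid
end

midpoints = -2.0:0.5:2.0
grid = weighted_tabulate([0.1, 1.2], [3.0, 1.0], midpoints)
```

With all data inside the range, `sum(grid) * step(midpoints)` should come out as 1, and rescaling every weight by the same constant should leave the grid unchanged.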
Could you also add tests?
@axsk Are you planning to finish this PR? I also need to calculate a weighted KDE.
@lstagner @ararslan @nalimilan
@axsk I got a little impatient and implemented it myself. It wasn't super difficult. I also did the bivariate case. Here's the code
So how do we proceed? :) While I like using
src/univariate.jl
Outdated

typealias Weights Union{UniformWeights, RealVector, WeightVec}

You could make the length a type parameter so that it's slightly more efficient.
type UniformWeights{N} end
UniformWeights(N) = UniformWeights{N}()
Base.sum(x::UniformWeights) = 1.0
Base.getindex{N}(x::UniformWeights{N}, i) = 1/N
Base.length{N}(x::UniformWeights{N}) = N
values{N}(x::UniformWeights{N}) = fill(x[1], N)
eltype(x::UniformWeights) = eltype(x[1])
isempty(x::UniformWeights) = false
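The snippet above uses the Julia 0.5 syntax that was current when this was reviewed. As a rough sketch only, the same idea might look like this in current Julia; `UniformWeights` is still a hypothetical type for illustration (it is not part of StatsBase or KernelDensity), and this version skips bounds checking in `getindex`.

```julia
# Hypothetical type for illustration; the length lives in the type parameter,
# so no array of weights is ever allocated.
struct UniformWeights{N} end
UniformWeights(n::Integer) = UniformWeights{Int(n)}()

Base.length(::UniformWeights{N}) where {N} = N
# Every entry is 1/N, so the weights sum to one (no bounds check on `i` here)
Base.getindex(::UniformWeights{N}, i::Integer) where {N} = 1 / N
Base.sum(::UniformWeights) = 1.0
Base.eltype(::Type{<:UniformWeights}) = Float64
Base.isempty(::UniformWeights{N}) where {N} = N == 0

w = UniformWeights(4)
```

One trade-off to keep in mind: encoding the length in the type means methods are specialised separately for each distinct `N`, which may or may not be worth it depending on how many different lengths occur.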
done.
I think it's all done except for the tests.
Bump
I still want this to be included. The only thing holding it back is the lack of tests. @axsk perhaps you should just mimic the current tests to get started.
Also, how about letting weights be negative? That would be useful when dealing with SparseGrids, for example.
Negative weights, if they are currently possible, will not be possible soon in StatsBase.
That's fine, and less interesting than just adding weights in the first place. A simple ad hoc means of implementing negative weights is to just divide your data into positive and negative weights, create two KDEs, and then use the sum of the respective weights to take a weighted difference of the pdf evaluations. Not saying this should be included in the package, just that if anyone really wants to make plots or something, it's easy enough to work around. Is adding tests to improve coverage all that remains to be done here?
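A rough sketch of the workaround described in the previous comment, for anyone who wants to try it. To stay self-contained it uses a naive Gaussian kernel sum instead of the package's `kde`; the helper names `gaussian_kde_at` and `signed_kde_at` are hypothetical, and dividing by the net weight is an assumption about how the result should be normalised (it requires the net weight to be non-zero).

```julia
# Naive weighted Gaussian KDE evaluated at a single point `x` with bandwidth `h`.
# Hypothetical helper; weights are assumed non-negative with a positive sum.
function gaussian_kde_at(x, data, weights, h)
    acc = 0.0
    for (xi, wi) in zip(data, weights)
        u = (x - xi) / h
        acc += wi * exp(-0.5 * u^2) / (h * sqrt(2pi))
    end
    return acc / sum(weights)
end

# Split the data by the sign of the weight, build one density per part,
# then take the difference weighted by the respective weight sums.
function signed_kde_at(x, data, weights, h)
    pos = weights .> 0
    neg = weights .< 0
    wpos = weights[pos]
    wneg = -weights[neg]          # magnitudes of the negative weights
    spos, sneg = sum(wpos), sum(wneg)
    ppos = isempty(wpos) ? 0.0 : gaussian_kde_at(x, data[pos], wpos, h)
    pneg = isempty(wneg) ? 0.0 : gaussian_kde_at(x, data[neg], wneg, h)
    # Assumption: normalise by the net weight, which must be non-zero
    return (spos * ppos - sneg * pneg) / (spos - sneg)
end

estimate = signed_kde_at(0.3, [0.0, 1.0], [2.0, -1.0], 0.5)
```

Note that the result is no longer guaranteed to be non-negative, so it is better thought of as a signed kernel sum than as a proper density.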
Getting rid of the 4 coverage misses to fix the coverage decrease.
Tests are done and coverage has improved. Should I squash it all into one commit?
Now that the tests have been added and all the checks have passed, can this please get merged? I use this functionality daily and it would be great if I didn't have to use a different branch.
Bump @nalimilan @simonbyrne
🎉 Happy Anniversary!!! 🎉 This PR has been open for 1 whole year!!! This is a major accomplishment. It has really beaten the odds. Honestly, I thought it was finished well over a month ago but I was proven wrong. Here's to another year. 🥂
Thanks @axsk
I'm trying to make use of the weighted KDE functionality described above. I can't seem to get it to work despite having run the following code, which is cut and pasted from test/univariate.jl:
using Distributions
using KernelDensity
import KernelDensity: kde_range
r = kde_range((-2.0,2.0), 128)
X = [0.0]
D = Normal
k6 = kde(X,r;kernel=D, weights=ones(X)/length(X))
k1 = kde([0.0, 1.], r, bandwidth=1, weights=[0,1])
I'm getting the following error message:
Any ideas why this isn't working? I've only just started using Julia, so apologies in advance if this is a stupid question
The weights keyword accepts a The following works.
Thanks for the help, but it's still not behaving:
MethodError: no method matching kde(::Array{Float64,1}, ::StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}; kernel=Distributions.Normal, weights=[1.0])
Stacktrace:
Well, it works on 0.5.1 (I haven't updated to 0.6 yet). Edit: Actually, I forgot I'm using my own version of this functionality. Edit^2: Your example works on 0.5.1 using this branch (without using Weights as is needed on my branch).
Also it could be that master hasn't been tagged yet. So try doing
Yes, there have been no tags since this was merged
I've just checked out the latest version and it works on Julia 0.6.0. Thanks for the help |
Could we tag this version?
In a project of mine I need a weighted KDE.
I achieved this by weighting the increase in the tabulate method.
Unfortunately in the current version I need to allocate an array for the weights, which is not necessary with uniform weights. Any suggestions on how to tackle this?