Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weighted KDE #26

Merged
merged 12 commits into from
Jun 27, 2017
Merged

Weighted KDE #26

merged 12 commits into from
Jun 27, 2017

Conversation

axsk
Copy link
Contributor

@axsk axsk commented Jun 27, 2016

In a project of mine I need a weighted KDE.
I achieved this by weighting the increase in the tabulate method.

Unfortunately in the current version I need to allocate an array for the weights, which is not necessary with uniform weights. Any suggestions on how to tackle this?

end
end

# returns an un-convolved KDE
UnivariateKDE(midpoints, grid)
end

function tabulate(data::RealVector, midpoints::Range)
weights = ones(data)
tabulate(data, weights, midpoints)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could do something like tabulate(data, midpoints; weights=ones(data)), which makes the weighting more apparent (since it's a keyword argument) and removes the need for two separate methods like this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you say below, better use the WeightsVec type than a keyword argument for consistency.

@ararslan
Copy link
Member

ararslan commented Jul 7, 2016

Since this package uses StatsBase, I'd recommend setting up the weighting using WeightVecs from StatsBase, or at least providing another method that uses them. Also, if you're adding weighted methods for univariate, I'd recommend adding similar functionality to bivariate.

@nalimilan
Copy link
Member

nalimilan commented Jul 7, 2016

To avoid allocating a vector of ones for unit weights, we could use a Ones/UnitWeights type as described at JuliaStats/StatsBase.jl#135.

EDIT: you could also add UniformWeights since you appear to need them.

npoints = length(midpoints)
s = step(midpoints)

# Set up a grid for discretized data
grid = zeros(Float64, npoints)
ainc = 1.0 / (ndata*s*s)
ainc = 1.0 / (sum(weights)*s*s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you switch to WeightsVec, you can use its sum field to avoid recomputing its sum.

@nalimilan
Copy link
Member

Could you also add tests?

@axsk axsk mentioned this pull request Jul 8, 2016
@lstagner
Copy link

@axsk Are you planning to finish this PR? I also need to to calculate a weighted KDE.

@axsk
Copy link
Contributor Author

axsk commented Oct 13, 2016

@lstagner
Hey, I would like to finish this PR, but I still don't know how to handle the uniform weights properly.
I could work around the uniform array allocation by creating a new UniformWeight type, doing the job (but I think this belongs rather to StatsBase, maybe the next step :))
Edit: done

@ararslan
I agree that having a bivariate weighted KDE would be nice, but I would like to leave that for someone else so far..
Edit: well, if @lstagner said it wasn't that difficulat I had to try it, went fast :)

@nalimilan
How do you imagine the tests, I can't think of a simple example I can calculate in my head, so basically I could just test runnability.

@lstagner
Copy link

@axsk I got a little impatient and implemented it myself. It wasn't super difficult. I also did the bivariate case.

Heres the code
lstagner@b48fb2d

@axsk
Copy link
Contributor Author

axsk commented Oct 13, 2016

So how do we proceed? :)

While I like using StatsBase.WeightsVec internally, I think the API should still allow plain Vectors.
I will push my suggestion :)


typealias Weights Union{UniformWeights, RealVector, WeightVec}


Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could have the length be a part of the type parameter so that its slightly more efficient.

type UniformWeights{N} end
UniformWeights(N) = UniformWeights{N}()

Base.sum(x::UniformWeights) = 1.0
Base.getindex{N}(x::UniformWeights{N}, i) = 1/N
Base.length{N}(x::UniformWeights{N}) = N
values{N}(x::UniformWeights{N}) = fill(x[1], N)
eltype(x::UniformWeights) = eltype(x[1])
isempty(x::UniformWeights) = false

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@axsk
Copy link
Contributor Author

axsk commented Oct 13, 2016

I think its all done, but the tests.
Any recommendations on those?

@lstagner
Copy link

Bump

@lstagner
Copy link

I still want this to be included. The only thing holding it back are some tests. @axsk perhaps you should just mimic the current tests to get started.

@chriselrod
Copy link
Contributor

Also, how about letting weights be negative?

That would be useful when dealing with SparseGrids, for example.

@ararslan
Copy link
Member

ararslan commented May 4, 2017

Negative weights, if they are currently possible, will not be possible soon in StatsBase.

@chriselrod
Copy link
Contributor

chriselrod commented May 4, 2017

That's fine, and less interesting than just adding weights in the first place.

A simple ad hoc means of implementing negative weights is to just divide your data into positive and negative weights, create two KDEs, and then use the sum of the respective weights to take a weighted difference of the pdf evaluations. Not saying this should be included in the package, just that if anyone really wants to make plots or something, it's easy enough to work around.

Is adding tests to improve coverage all that remains to be done here?

@axsk axsk changed the title RFC: Weighted KDE Weighted KDE May 18, 2017
@axsk
Copy link
Contributor Author

axsk commented May 18, 2017

Tests are done, coverage improved, should I squash it all to one?

@lstagner
Copy link

Now that the tests have been added and all the checks have passed can this please get merged? I use this functionality daily and it would be great if didn't have to use a different branch.

@lstagner
Copy link

lstagner commented Jun 7, 2017

Bump @nalimilan @simonbyrne

@lstagner
Copy link

🎉 Happy Anniversary!!! 🎉

This PR has been open for 1 whole year!!! This is a major accomplishment. It has really beaten the odds. Honestly, I thought it was finished well over a month ago but I was proven wrong. Here's to another year. 🥂

@simonbyrne simonbyrne merged commit 4bcdc23 into JuliaStats:master Jun 27, 2017
@simonbyrne
Copy link
Member

Thanks @axsk

@remoore
Copy link

remoore commented Aug 24, 2017

I'm trying to make use of the of the weighted KDE functionality described above. I can't seem to get it to work despite having run the following code, which is cut and paste from test/univariate.jl:

using Distributions
using KernelDensity

import KernelDensity: kde_range

r = kde_range((-2.0,2.0), 128)
X = [0.0]   
D = Normal
k6 = kde(X,r;kernel=D, weights=ones(X)/length(X))
k1 = kde([0.0, 1.], r, bandwidth=1, weights=[0,1])

I'm getting the following error message:

MethodError: no method matching kde(::Array{Float64,1}, ::StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}; kernel=Distributions.Normal, weights=[1.0])
Closest candidates are:
  kde(::AbstractArray{T,1} where T<:Real, ::Range; bandwidth, kernel) at /Users/remoore/.julia/v0.6/KernelDensity/src/univariate.jl:136 got unsupported keyword argument "weights"
  kde(::AbstractArray{T,1} where T<:Real, ::Range, ::Distributions.Distribution{Distributions.Univariate,S} where S<:Distributions.ValueSupport) at /Users/remoore/.julia/v0.6/KernelDensity/src/univariate.jl:123 got unsupported keyword arguments "kernel", "weights"
  kde(::AbstractArray{T,1} where T<:Real; bandwidth, kernel, npoints, boundary) at /Users/remoore/.julia/v0.6/KernelDensity/src/univariate.jl:143 got unsupported keyword argument "weights"
  ...

Stacktrace:
 [1] (::KernelDensity.#kw##kde)(::Array{Any,1}, ::KernelDensity.#kde, ::Array{Float64,1}, ::StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}) at ./<missing>:0
 [2] include_string(::String, ::String) at ./loading.jl:515

Any ideas why this isn't working?

I've only just started using Julia, so apologies in advance if this is a stupid question

@lstagner
Copy link

The weights keyword accepts a Weights (formally WeightsVec) type from StatsBase

The following works.

using Distributions
using KernelDensity

import KernelDensity: kde_range
import StatsBase: Weights

r = kde_range((-2.0,2.0), 128)
X = [0.0]   
D = Normal
k6 = kde(X,r;kernel=D, weights=Weights(ones(X)/length(X)))
k1 = kde([0.0, 1.], r, bandwidth=1, weights=Weights([0,1]))

@remoore
Copy link

remoore commented Aug 24, 2017

Thanks for the help, but it's still not behaving:

`MethodError: no method matching kde(::Array{Float64,1}, ::StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}; kernel=Distributions.Normal, weights=[1.0])
Closest candidates are:
kde(::AbstractArray{T,1} where T<:Real, ::Range; bandwidth, kernel) at /Users/remoore/.julia/v0.6/KernelDensity/src/univariate.jl:136 got unsupported keyword argument "weights"
kde(::AbstractArray{T,1} where T<:Real, ::Range, ::Distributions.Distribution{Distributions.Univariate,S} where S<:Distributions.ValueSupport) at /Users/remoore/.julia/v0.6/KernelDensity/src/univariate.jl:123 got unsupported keyword arguments "kernel", "weights"
kde(::AbstractArray{T,1} where T<:Real; bandwidth, kernel, npoints, boundary) at /Users/remoore/.julia/v0.6/KernelDensity/src/univariate.jl:143 got unsupported keyword argument "weights"
...

Stacktrace:
[1] (::KernelDensity.#kw##kde)(::Array{Any,1}, ::KernelDensity.#kde, ::Array{Float64,1}, ::StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}) at ./:0
[2] include_string(::String, ::String) at ./loading.jl:515`

@lstagner
Copy link

lstagner commented Aug 24, 2017

Well it works on 0.5.1 (I haven't updated to 0.6 yet)

Edit: Actually I forgot I'm using my own version of this functionality.

Edit^2: Your example works on 0.5.1 using this branch (without using Weights as is needed on my branch).

@lstagner
Copy link

Also it could be that master hasn't been tagged yet. So try doing Pkg.checkout("KernelDensity") to the latest version.

@ararslan
Copy link
Member

Yes, there have been no tags since this was merged

@remoore
Copy link

remoore commented Aug 24, 2017

I've just checked out the latest version and it works on Julia 0.6.0.

Thanks for the help

@axsk
Copy link
Contributor Author

axsk commented Oct 12, 2017

Could we tag this version?

@ararslan
Copy link
Member

JuliaLang/METADATA.jl#11541

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants