Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow numeric Vectors as weights #335

Closed
xiaodaigh opened this issue Jan 1, 2018 · 8 comments
Closed

Allow numeric Vectors as weights #335

xiaodaigh opened this issue Jan 1, 2018 · 8 comments

Comments

@xiaodaigh
Copy link
Contributor

When I first started using countmap I was confused about how to do a count with weights. I thought countmap(x, wgt) would do where wgt is a numeric vector. But actually we need countmap(x, weights(wgt)).

When I looked into the documentation there are 3 types of weights: they are Analytical, Frequency and Probability; these seem to be useful. But before reading the source code I only thought of weights as a numeric weight for each value and I suspect a lot of people would think like that too.

So do you think it's ok to add this signature?
countmap(Vector{T}, Vectr{S}) where {T, S<:Number}

@rofinn
Copy link
Member

rofinn commented Jan 1, 2018

Quickly scanning the code it looks like we could probably just change the existing signatures.

line 355

countmap(x::AbstractArray{T}, wv::AbstractWeights{W}) where {T,W}

to

countmap(x::AbstractArray{T}, wv::AbstractVector{W}) where {T,W<:Real}

line 320

addcounts!(cm::Dict{T}, x::AbstractArray{T}, wv::AbstractWeights{W}) where {T,W}

to

addcounts!(cm::Dict{T}, x::AbstractArray{T}, wv::AbstractVector{W}) where {T,W<:Real}

FWIW, I'm not sure it makes sense to call this method with anything other than FrequencyWeights (or maybe an AbstractVector{Int}).

@xiaodaigh
Copy link
Contributor Author

Ok. I think keeping other signatures are fine. But the one that makes the most sense to me are just a straight weight vector of numerics

@rofinn
Copy link
Member

rofinn commented Jan 2, 2018

Right, but doesn't the domain of the numerics matter (e.g., passing a vector of floating point values doesn't really make sense)?

@xiaodaigh
Copy link
Contributor Author

yeah we associate count a with something a discrete. But i think it's ok have negative and floating point weights. Some may want to use it.

@nalimilan
Copy link
Member

FWIW, I'm not sure it makes sense to call this method with anything other than FrequencyWeights (or maybe an AbstractVector{Int}).

Actually, that's a complex issue. It can make sense to show the sum of analytic or probability weights, even if that's not necessarily an integer number. It's useful in particular to compute proportions. Stata's tabulate command appears to compute the sum of analytic weights (after scaling them to sum to the number of observations). OTOH, it does not appear to support probability weights.

Given that treating weights as frequency weights is more natural in most cases (i.e. for frequency weights, probability weights and when the type isn't specified), I wonder whether it's worth having a special case for analytic weights.

@rofinn
Copy link
Member

rofinn commented Jan 2, 2018

Hmmm, seems weird to support analytic weights and not probability weights, but I'd be fine with special casing them and assuming frequency weights otherwise.

@ssfrr
Copy link

ssfrr commented Jan 14, 2019

Came across this with sample. As an example, I want to sample from x = [1, 2, 3], with weights w = [1, 2, 1], i.e. 2 should be twice as likely as 1 or 3. it would be nice to just do sample(x, w).

As it stands it took a bit of time to look at the docs, understand what the different types of AbstractWeights do, and figure out that I could do sample(x, weights(w)).

@nalimilan
Copy link
Member

See also discussion at #443.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants