Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proportionmap accepts iterators #855

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion src/counts.jl
Original file line number Diff line number Diff line change
Expand Up @@ -450,5 +450,5 @@ Return a dictionary mapping each unique value in `x` to its proportion in `x`.
If a vector of weights `wv` is provided, the proportion of weights is computed rather
than the proportion of raw counts.
"""
proportionmap(x::AbstractArray) = _normalize_countmap(countmap(x), length(x))
proportionmap(x) = _normalize_countmap(countmap(x), length(collect(x)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just count the total number of elements when building the countmap? It seems inefficient to materialize x only to obtain its length if we already iterate through it anyway.

Copy link
Author

@ArunS-tack ArunS-tack Mar 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something around sum(values(countmap(x))? But I think that's memory inefficient even though it doesn't iterate again.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I thought counting directly inside of countmap. But probably sum(values, countmap(x)) would still be more efficient than using collect(x) if x is an iterator with a large number of elements.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

julia> @btime proportionmap(skipmissing(a)) 8.625 μs (27 allocations: 146.67 KiB) Dict{Int64, Float64} with 4 entries: 4 => 0.25 2 => 0.25 3 => 0.25 1 => 0.25

julia> @btime proportionmap(skipmissing(a)) 316.667 ns (9 allocations: 1.08 KiB) Dict{Int64, Float64} with 4 entries: 4 => 0.25 2 => 0.25 3 => 0.25 1 => 0.25

Looks like a significant improvement 🧐

proportionmap(x::AbstractArray, wv::AbstractWeights) = _normalize_countmap(countmap(x, wv), sum(wv))
7 changes: 7 additions & 0 deletions test/counts.jl
Original file line number Diff line number Diff line change
Expand Up @@ -209,3 +209,10 @@ if VERSION >= v"1.9.0-DEV"
# countmap and proportionmap only support the :dict algorithm for weighted sums.
end
end

@testset "proportionmap with iterator" begin
a = [1, 2, 3, 4]
b =[true, true ,false, false, true, false]
@test proportionmap(skipmissing(a)) == Dict(1 => 0.25, 2 => 0.25, 3 => 0.25, 4 => 0.25)
@test proportionmap(skipmissing(b)) == Dict(true => 0.5, false => 0.5)
end