Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Distribution: MvDiscreteNonParametric #1424

Open
wants to merge 24 commits into
base: master
Choose a base branch
from

Conversation

davibarreira
Copy link
Contributor

I've created a new distribution which implements the DiscreteNonParametric distribution to the multivariate case.
The distribution followed the guidelines for new distributions (one can sample, calculate mean, var, cov, etc).
If this PR is accepted, I'll submit another PR with documentation.

Highlights:

  • mvdiscretenonparametric.jl - Has the implementation of the new MvDiscreteNonParametric distribution;
  • Dependency on ArraysOfArrays - MvDiscreteNonParametric uses ArraysOfArarys in order to properly work with vector of vectors, where each vector is a "sample" (point-mass).

The new distribution does not have a cdf yet. I'm working on a more efficient way to perform the computations instead of naively sorting and comparing everything.

@@ -0,0 +1,153 @@
struct MvDiscreteNonParametric{T <: Real,P <: Real,Ts <: ArrayOfSimilarArrays{T},Ps <: AbstractVector{P}} <: DiscreteMultivariateDistribution
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this more general and change it to something like

Suggested change
struct MvDiscreteNonParametric{T <: Real,P <: Real,Ts <: ArrayOfSimilarArrays{T},Ps <: AbstractVector{P}} <: DiscreteMultivariateDistribution
struct GeneralDiscreteNonParametric{VF,T,P<:Real,Ts<:AbstractVector{T},Ps<:AbstractVector{P}} <: Distribution{VF,Discrete}

? Then it could also be used instead of DiscreteNonParametric for the univariate case and we could define the alias

const DiscreteNonParametric{T<:Real,P<:Real,Ts<:AbstractVector{T},Ps<:AbstractVector{P}} = GeneralDiscreteNonParametric{Univariate,T,P,Ts,Ps}

to make it a non-breaking change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And one could also define the alias

Suggested change
struct MvDiscreteNonParametric{T <: Real,P <: Real,Ts <: ArrayOfSimilarArrays{T},Ps <: AbstractVector{P}} <: DiscreteMultivariateDistribution
const MvDiscreteNonParametric{T<:AbstractVector{<:Real},P<:Real,Ts<:AbstractVector{T},Ps<:AbstractVector{P}} = GeneralDiscreteNonParametric{Multivariate,T,P,Ts,Ps}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you expand on how the GeneralDiscreteNonParametric shoudl work? It's not clear to me how this VF work, for example, I tried doing the following:

struct GeneralDiscreteNonParametric{VF,T,P<:Real,Ts<:AbstractVector{T},Ps<:AbstractVector{P}} <: Distribution{VF,Discrete}

    support::Ts
    p::Ps

    function GeneralDiscreteNonParametric{VF,T,P,Ts,Ps}(support::Ts,
        p::Ps) where {T,P <: Real,Ts <: AbstractVector{<:AbstractVector{T}},Ps <: AbstractVector{P}}

        length(support) == length(p) || error("length of `support` and `p` must be equal")
        isprobvec(p) || error("`p` must be a probability vector")
        allunique(support) || error("`support` must contain only unique value")
        new{VF,T,P,Ts,Ps}(support, p)
    end
end

But of course I get an error, since VF is not defined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, should the GeneralDiscreteNonParametric dispatch to MvDiscreteNonParametric and DiscreteNonParametric? Would something like CategoricalDistributions.jl be inside GeneralDiscreteNonParametric?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@devmotion , should I leave the functions for MvDiscreteNonParametric or should I drop it and just do everything inside the GeneralMvDiscreteNonParametric?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But of course I get an error, since VF is not defined.

You have to provide it, like

struct GeneralDiscreteNonParametric{VF,T,P<:Real,Ts<:AbstractVector{T},Ps<:AbstractVector{P}} <: Distribution{VF,Discrete}
    support::Ts
    p::Ps

    function GeneralDiscreteNonParametric{VF,T,P,Ts,Ps}(support::Ts,
        p::Ps; check_args=true) where {VF,T,P<: Real,Ts<:AbstractVector{T},Ps<:AbstractVector{P}}
        if check_args
            length(support) == length(p) || error("length of `support` and `p` must be equal")
            isprobvec(p) || error("`p` must be a probability vector")
            allunique(support) || error("`support` must contain only unique values")
        end
        new{VF,T,P,Ts,Ps}(support, p)
    end
end

Then you could define a univariate discrete non-parametric distribution with GeneralDiscreteNonParametric{Univariate,Float64,Float64,Vector{Float64},Vector{Float64}}([-10.0, -3.4, 2.1, 3.5], [0.4, 0.2, 0.1, 0.1]). But, of course, users shouldn't have to do this. Therefore we could use the aliases (also for the multivariate case) and define outer constructors such as

function DiscreteNonParametric(support::AbstractVector{<:Real}, p::AbstractVector{<:Real}; issorted=false)
    _p = issorted ? p : sort(p)
    return DiscreteNonParametric{eltype(support),eltype(_p),typeof(support),typeof(_p)}(support, _p)
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides Multivariate and Univariate, what kind of distribution you intend to keep under GeneralDiscreteNonParametric? Categorical Distributions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are already a type alias of DiscreteNonParametric, so if DiscreteNonParametric works they should just work automatically.

Comment on lines 32 to 33
for i in Base.OneTo(size(m, 2))
_rand!(rng, smp, view(m, :, i))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you revert the unrelated changes?

Project.toml Outdated
@@ -4,6 +4,7 @@ authors = ["JuliaStats"]
version = "0.25.29"

[deps]
ArraysOfArrays = "65a8f2f4-9b39-5baf-92e2-a9cc46fdf018"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the implementation has to or should use ArraysOfArrays explicitly - I guess you can just work with generic AbstractVector{<:AbstractVector}, and then one can e.g. provide such Array wrappers from ArraysOfArrays if desired.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. But then there are some significant adjustments I have to make, cause I've relied on ArraysOfArrrays to do stuff like convert matrices to vector of vectors. What do you suggest in order to dispatch them?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should remove the matrix dispatch (even regardless of ArraysOfArrays) as data can be organized in rows or columns - but this is not clear if just a matrix is provided. I think users should just use eachrow/eachcol (in a future Julia version or with Compat: JuliaLang/julia#32310), ArraysOfArrays or ColVecs/RowVecs in KernelFunctions (I wanted to wait with extracting them a bit since I was hoping for the Julia PR).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll drop the matrix dispatch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized that computations of weighted mean, var and cov depend on ArraysOfArrays. Do you know a workaround?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They all work for vectors of vectors (of course, ArraysOfArrays, ColVecs or EachCol may implement more optimized versions of mean etc., but the default fallback works always):

julia> using Statistics

julia> x = [rand(5) for _ in 1:5];

julia> mean(x)
5-element Vector{Float64}:
 0.2609279080845831
 0.5184741034528969
 0.523898105188404
 0.6408166253497452
 0.33381236831836236

julia> var(x)
5-element Vector{Float64}:
 0.019751971071567172
 0.07655721742732527
 0.06221826760951234
 0.022761714823869204
 0.10639711437483416

julia> cov(x)
5×5 Matrix{Float64}:
  0.019752    -0.0182541  -0.00331084   0.0142255  -0.0231928
 -0.0182541    0.0765572  -0.0500437   -0.0402498   0.0677088
 -0.00331084  -0.0500437   0.0622183    0.0214676  -0.0425375
  0.0142255   -0.0402498   0.0214676    0.0227617  -0.0405306
 -0.0231928    0.0677088  -0.0425375   -0.0405306   0.106397

Copy link
Contributor Author

@davibarreira davibarreira Nov 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not the weighted version. I ended up writing a for-loop. Don't know if it's the most efficient way.

mean(x, weights(p))

does not work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I didn't realize that you need the weighted ones. They are defined in StatsBase, there are some issues about exactly the vector of vectors case: JuliaStats/StatsBase.jl#534 and JuliaStats/StatsBase.jl#518

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is what I came up with:

function cov(d::MvDiscreteNonParametric)
    x = hcat(support(d)...)
    p = probs(d)
    return cov(x, Weights(p, one(eltype(p))), 2,corrected = false)
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@devmotion , are you fine with this implementation?

@codecov-commenter
Copy link

codecov-commenter commented Nov 16, 2021

Codecov Report

Merging #1424 (288401b) into master (7670bed) will decrease coverage by 0.51%.
The diff coverage is 89.47%.

❗ Current head 288401b differs from pull request most recent head eeba4f2. Consider uploading reports for the commit eeba4f2 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1424      +/-   ##
==========================================
- Coverage   84.42%   83.91%   -0.52%     
==========================================
  Files         124      123       -1     
  Lines        7456     7038     -418     
==========================================
- Hits         6295     5906     -389     
+ Misses       1161     1132      -29     
Impacted Files Coverage Δ
src/multivariates.jl 70.66% <45.45%> (+32.20%) ⬆️
src/multivariate/mvdiscretenonparametric.jl 100.00% <100.00%> (ø)
src/samplers/multinomial.jl 51.72% <0.00%> (-32.90%) ⬇️
src/matrixvariates.jl 76.81% <0.00%> (-18.43%) ⬇️
src/common.jl 66.66% <0.00%> (-12.93%) ⬇️
src/utils.jl 90.47% <0.00%> (-9.53%) ⬇️
src/multivariate/dirichlet.jl 64.89% <0.00%> (-7.17%) ⬇️
src/univariate/continuous/studentizedrange.jl 72.72% <0.00%> (-2.28%) ⬇️
src/univariate/continuous/epanechnikov.jl 58.33% <0.00%> (-2.20%) ⬇️
src/univariate/continuous/noncentralchisq.jl 78.94% <0.00%> (-2.01%) ⬇️
... and 90 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7670bed...eeba4f2. Read the comment docs.

"""
Base.size(d::MvDiscreteNonParametric) = size(flatview(d.support)')

function _rand!(rng::AbstractRNG, d::MvDiscreteNonParametric, x::AbstractVector{T}) where T <: Real
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is basically the same as the univariate implementation, it would be good to merge them.

@davibarreira
Copy link
Contributor Author

@devmotion, can I run the format from the terminal here in the same way we do in OptimalTransport.jl? My editor sometimes "auto-formats" stuff -.- , so I end up messing the previous formatting.

@devmotion
Copy link
Member

No, we don't use a specific style here (yet). (BTW in general running the JuliaFormatter won't necessarily revert all of your changes.)

@davibarreira
Copy link
Contributor Author

No, we don't use a specific style here (yet). (BTW in general running the JuliaFormatter won't necessarily revert all of your changes.)

Thanks. I ended up restoring the originals and doing the changes again, avoiding the auto-formatting.

@davibarreira
Copy link
Contributor Author

@devmotion , I've implemented some of your suggestions (I still have to adjust DiscreteNonParametric to be used with GeneralNonParametric), but I'm having a hard time to get the sampler to work for MvDiscreteNonParametric. The rand!() function simply does not work. I mean, when I do:

x = collect(eachrow(rand(10,3)))
p = normalize!(rand(10),1)
μ = MvDiscreteNonParametric(x,p)
rand(μ)

it's working. But if I do

rand(μ,1)

I get instead:

3×2 Matrix{SubArray{Float64, 1, Matrix{Float64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true}}:
 [0.0969373, 0.623776, 0.447577]  [0.824966, 0.174585, 0.805307]
 [0.824966, 0.174585, 0.805307]   [0.0969373, 0.623776, 0.447577]
 [0.562016, 0.439165, 0.146882]   [0.562016, 0.439165, 0.146882]

@devmotion
Copy link
Member

The eltype definition is incorrect. It should return the element type of the samples for arrayvariate distributions (ie., e.g., Float64 if the samples are AbstractVector{Float64}) but currently in the multivariate case you define it as a vector.

@davibarreira
Copy link
Contributor Author

I've modified the code to

Base.eltype(::Type{<:MvDiscreteNonParametric{T}}) where T = Base.eltype(T)

Now, this returns for example Float64, but the rand(mu, 1) now does not work. I get an errror:
DimensionMismatch("inconsistent array dimensions")

Copy link
Member

@devmotion devmotion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The definition of size is not correct, it can just be removed.



Base.length(d::MvDiscreteNonParametric) = length(first(d.support))
Base.size(d::MvDiscreteNonParametric) = (length(d), length(d.support))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong and defined automatically based on length:

Suggested change
Base.size(d::MvDiscreteNonParametric) = (length(d), length(d.support))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, done. But I'm still getting the erro :/, but instead it's:

StackOverflowError:

Stacktrace:
     [1] rand!(rng::Random._GLOBAL_RNG, s::MvDiscreteNonParametric{SubArray{Float64, 1, Matrix{Float64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true}, Float64, Vector{SubArray{Float64, 1, Matrix{Float64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true}}, Vector{Float64}}, x::SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true})
       @ Distributions ~/MEGA/EMAp/Distributions.jl/src/genericrand.jl:83
     [2] _rand!(rng::Random._GLOBAL_RNG, s::MvDiscreteNonParametric{SubArray{Float64, 1, Matrix{Float64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true}, Float64, Vector{SubArray{Float64, 1, Matrix{Float64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true}}, Vector{Float64}}, x::SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true})
       @ Distributions ~/MEGA/EMAp/Distributions.jl/src/genericrand.jl:117
     [3] rand!(rng::Random._GLOBAL_RNG, s::MvDiscreteNonParametric{SubArray{Float64, 1, Matrix{Float64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true}, Float64, Vector{SubArray{Float64, 1, Matrix{Float64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true}}, Vector{Float64}}, x::SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true})
       @ Distributions ~/MEGA/EMAp/Distributions.jl/src/genericrand.jl:91
--- the last 2 lines are repeated 14426 more times ---
 [28856] _rand!(rng::Random._GLOBAL_RNG, s::MvDiscreteNonParametric{SubArray{Float64, 1, Matrix{Float64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true}, Float64, Vector{SubArray{Float64, 1, Matrix{Float64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true}}, Vector{Float64}}, x::SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true})
       @ Distributions ~/MEGA/EMAp/Distributions.jl/src/genericrand.jl:117
 [28857] rand!
       @ ~/MEGA/EMAp/Distributions.jl/src/genericrand.jl:91 [inlined]
 [28858] _rand!(rng::Random._GLOBAL_RNG, s::MvDiscreteNonParametric{SubArray{Float64, 1, Matrix{Float64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true}, Float64, Vector{SubArray{Float64, 1, Matrix{Float64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true}}, Vector{Float64}}, x::Matrix{Float64})
       @ Distributions ~/MEGA/EMAp/Distributions.jl/src/genericrand.jl:117
 [28859] rand!
       @ ~/MEGA/EMAp/Distributions.jl/src/genericrand.jl:108 [inlined]
 [28860] rand
       @ ~/MEGA/EMAp/Distributions.jl/src/multivariates.jl:21 [inlined]
 [28861] rand(s::MvDiscreteNonParametric{SubArray{Float64, 1, Matrix{Float64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true}, Float64, Vector{SubArray{Float64, 1, Matrix{Float64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true}}, Vector{Float64}}, dims::Int64)
       @ Distributions ~/MEGA/EMAp/Distributions.jl/src/genericrand.jl:22
 [28862] top-level scope
       @ In[31]:1
 [28863] eval
       @ ./boot.jl:373 [inlined]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What did you run? And did you use the code in this PR with only size removed? Or did you make any additional changes?

Copy link
Contributor Author

@davibarreira davibarreira Dec 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had added some changes. The _rand! was missing. I added it back, and it worked. Tks

@davibarreira
Copy link
Contributor Author

davibarreira commented Dec 21, 2021

@devmotion , this new commit has possibly all your corrections with the exception of incorporating the DiscreteNonParametric inside the GeneralNonParametric. The MvDiscreteNonParametric should be working. Should we do the modifications of DiscreteNonParametric in this PR?

@@ -98,6 +98,7 @@ export
FullNormalCanon,
Gamma,
DiscreteNonParametric,
GeneralDiscreteNonParametric,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe let's keep it unexported until everything is figured out?

Suggested change
GeneralDiscreteNonParametric,

Comment on lines +20 to +31
function rand(rng::AbstractRNG, d::GeneralDiscreteNonParametric)
x = support(d)
p = probs(d)
n = length(p)
draw = rand(rng, float(eltype(p)))
cp = p[1]
i = 1
while cp <= draw && i < n
@inbounds cp += p[i +=1]
end
return x[i]
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this method is correct, it messes with the rand dispatches. Maybe either change it to the univariate case only (i.e., use ::GeneralDiscreteNonParametric{Univariate})

Suggested change
function rand(rng::AbstractRNG, d::GeneralDiscreteNonParametric)
x = support(d)
p = probs(d)
n = length(p)
draw = rand(rng, float(eltype(p)))
cp = p[1]
i = 1
while cp <= draw && i < n
@inbounds cp += p[i +=1]
end
return x[i]
end
function rand(rng::AbstractRNG, d::GeneralDiscreteNonParametric{Univariate})
x = support(d)
p = probs(d)
n = length(p)
draw = rand(rng, float(eltype(p)))
cp = p[1]
i = 1
while cp <= draw && i < n
@inbounds cp += p[i +=1]
end
return x[i]
end

or let's remove it for now and add it in a follow-up PR that updates DiscreteNonParametric:

Suggested change
function rand(rng::AbstractRNG, d::GeneralDiscreteNonParametric)
x = support(d)
p = probs(d)
n = length(p)
draw = rand(rng, float(eltype(p)))
cp = p[1]
i = 1
while cp <= draw && i < n
@inbounds cp += p[i +=1]
end
return x[i]
end

Comment on lines +33 to +36
"""
support(d::MvDiscreteNonParametric)
Get a sorted AbstractVector defining the support of `d`.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for a docstring, support is already documented (and I don't think it is sorted here actually?)

Suggested change
"""
support(d::MvDiscreteNonParametric)
Get a sorted AbstractVector defining the support of `d`.
"""

Comment on lines +40 to +41
probs(d::MvDiscreteNonParametric)
Get the vector of probabilities associated with the support of `d`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
probs(d::MvDiscreteNonParametric)
Get the vector of probabilities associated with the support of `d`.
probs(d::GeneralDiscreteNonParametric)
Return the vector of probabilities associated with the support of `d`.

Comment on lines +45 to +46

Base.length(d::GeneralDiscreteNonParametric) = length(first(d.support))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

length should be defined for the multivariate case only, for other cases we need size:

Suggested change
Base.length(d::GeneralDiscreteNonParametric) = length(first(d.support))
Base.length(d::GeneralDiscreteNonParametric{Multivariate}) = length(first(d.support))
Base.size(d::GeneralDiscreteNonParametric{<:ArrayLikeVariate}) = size(first(d.support))

IMO it is better to be a bit conservative here and only define it for ArrayLikeVariates.

```
"""
function MvDiscreteNonParametric(
support::AbstractArray{<:AbstractVector{<:Real}},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be

Suggested change
support::AbstractArray{<:AbstractVector{<:Real}},
support::AbstractVector{<:AbstractVector{<:Real}},

in the multivariate case, and

Suggested change
support::AbstractArray{<:AbstractVector{<:Real}},
support::AbstractVector{<:AbstractArray{<:Real,N}},

only in the more general case of ArrayLikeVariate{N}.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I just use the second option and change the function to GeneralDiscreteNonParametric?

"""
function MvDiscreteNonParametric(
support::AbstractArray{<:AbstractVector{<:Real}},
p::AbstractVector{<:Real} = fill(inv(length(support)), length(support)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we should add a default definition for p. In any case, we should use the non-allocating Fill instead.

Suggested change
p::AbstractVector{<:Real} = fill(inv(length(support)), length(support)),
p::AbstractVector{<:Real} = Fill(inv(length(support)), length(support)),

)
end

Base.eltype(::Type{<:MvDiscreteNonParametric{T}}) where T = Base.eltype(T)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be defined more generally:

Suggested change
Base.eltype(::Type{<:MvDiscreteNonParametric{T}}) where T = Base.eltype(T)
Base.eltype(::Type{<:GeneralDiscreteNonParametric{<:ArrayLikeVariate,T}}) where T = eltype(T)

Comment on lines +41 to +55
function mean(d::MvDiscreteNonParametric)
return StatsBase.mean(hcat(d.support...), Weights(d.p, one(eltype(d.p))),dims=2)
end

function var(d::MvDiscreteNonParametric)
x = hcat(support(d)...)
p = probs(d)
return StatsBase.var(x, Weights(p, one(eltype(p))), 2,corrected = false)
end

function cov(d::MvDiscreteNonParametric)
x = hcat(support(d)...)
p = probs(d)
return cov(x, Weights(p, one(eltype(p))), 2,corrected = false)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I requested to add support for vectors of vectors to StatsBase but it might take a bit until it is available. IMO it is fine to have possibly less optimized implementations until then. However, we should avoid splatting and use

Suggested change
function mean(d::MvDiscreteNonParametric)
return StatsBase.mean(hcat(d.support...), Weights(d.p, one(eltype(d.p))),dims=2)
end
function var(d::MvDiscreteNonParametric)
x = hcat(support(d)...)
p = probs(d)
return StatsBase.var(x, Weights(p, one(eltype(p))), 2,corrected = false)
end
function cov(d::MvDiscreteNonParametric)
x = hcat(support(d)...)
p = probs(d)
return cov(x, Weights(p, one(eltype(p))), 2,corrected = false)
end
function mean(d::MvDiscreteNonParametric)
return mean(reduce(hcat, d.support), Weights(d.p, one(eltype(d.p))); dims=2)
end
function var(d::MvDiscreteNonParametric)
x = reduce(hcat, support(d))
p = probs(d)
return var(x, Weights(p, one(eltype(p))), 2; corrected = false)
end
function cov(d::MvDiscreteNonParametric)
x = reduce(hcat, support(d))
p = probs(d)
return cov(x, Weights(p, one(eltype(p))), 2; corrected = false)
end

Comment on lines -72 to -84
function rand(rng::AbstractRNG, d::DiscreteNonParametric)
x = support(d)
p = probs(d)
n = length(p)
draw = rand(rng, float(eltype(p)))
cp = p[1]
i = 1
while cp <= draw && i < n
@inbounds cp += p[i +=1]
end
return x[i]
end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this was removed accidentally? I guess it would be cleaner to not make any changes to DiscreteNonParametric in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah. my bad

@davibarreira
Copy link
Contributor Author

@juliohm , this might interest you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants