Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add skipmissings #111

Merged
merged 8 commits into from
Apr 3, 2020
Merged

Add skipmissings #111

merged 8 commits into from
Apr 3, 2020

Conversation

pdeffebach
Copy link
Contributor

This PR adds functionality for multiskipmissings as discussed in Julia issue 30596 here.

The motivation for this PR was noticing that

cor(x, y)

has no missing value support.

julia> x = allowmissing(rand(10)); y = allowmissing(rand(10));

julia> x[1] = missing; y[2] = missing;

julia> cor(x, y)
missing

In order for this to work, you need to collect the non-missing matching values of x and y. To do that I implement the function multiskipmissing, which returns a Tuple of iterators.

julia> x = [1, 2, missing, 4]; y = [1, 2, 3, missing];
julia> tx, ty = multiskipmissing(x, y)
julia> for t in tx
       @show t
       end
t = 1
t = 2

In the above example, each tx and ty are MultiSkipMissing objects which have zip(x, y) as a field. I then implemented skipmissing from Base with some changed logic such that instead of checking if the array of interest has a missing element, I check if any elements of the ith element of zip(x, y) are missing.

I this PR is not complete. I still can't get collect to work, so the original goal is not yet met.

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pushing this! Can you add a few tests?

I wonder whether this couldn't be called skipmissings. That would be simpler and easier to discover. what do you think?

EDIT: could you run some benchmarks, e.g. comparing with iterating over the input vectors when there are no missing values?

src/Missings.jl Outdated
y = iterate(itr.c, ntuple(_ -> first(state), width(itr)))
y === nothing && return nothing
item, state = y
while any(ismissing, item)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can avoid repeating the iterate stuff by doing while true with any(ismissing, item) || break.

Also I suspect that for performance any(x -> x ===missing, item) will be better due to inference limitations. And since that's what skipmissing does, better be consistent anyway.

src/Missings.jl Outdated Show resolved Hide resolved
src/Missings.jl Outdated
```
"""
function multiskipmissing(args...)
return ((MultiSkipMissing(args[i], zip(args...), i) for i in 1:length(args))...,)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return ((MultiSkipMissing(args[i], zip(args...), i) for i in 1:length(args))...,)
ntuple(i -> MultiSkipMissing(args[i], zip(args...), i), length(args))

Could also use the short function syntax since that's a one-liner..

src/Missings.jl Outdated
# Examples
```
julia> x = [1, 2, missing, 4]; y = [1, 2, 3, missing];
julia> tx, ty = multiskipmissing(x, y)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to copy the output, which is interesting for users BTW. Also add a line break before (basically, just copy/paste the real output).

src/Missings.jl Outdated
```
julia> x = [1, 2, missing, 4]; y = [1, 2, 3, missing];
julia> tx, ty = multiskipmissing(x, y)
julia> for t in tx
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably simpler to show collect(t)?

@pdeffebach
Copy link
Contributor Author

I responded to comments.

I also ported the mapreduce implementation for skipmissing over to this PR. But the implementation relies on all the objects involved in the SkipMissings object to be indexable, which is not known to the compiler given the way the object is currently constructed.

Ideally the type of SkipMissings is something like

struct SkipMissings{T, C} where T where C ::Base.Iterators.Zip{<:Tuple}
    c::C
    I::Int
end

Then we use itr.I to peer into itr.c when needed to get information about the array of interest etc.

And we can dispatch on the elements of the Tuple in the Zip, but I'm not sure how to do that.

@nalimilan
Copy link
Member

Ideally the type of SkipMissings is something like

struct SkipMissings{T, C} where T where C ::Base.Iterators.Zip{<:Tuple}
    c::C
    I::Int
end

Then we use itr.I to peer into itr.c when needed to get information about the array of interest etc.

And we can dispatch on the elements of the Tuple in the Zip, but I'm not sure how to do that.

I'm not sure what prevents you from doing this. Can you show what you have tried?

@pdeffebach
Copy link
Contributor Author

I have updated this PR with a new implementation. Now the SkipMissings object contains only two fields, a Zip and an Int. When getting values, I use some gymnastics based off of the type signature to get elements from the array of interest. This requires a few more accessor functions which complicate the code a bit so I'm not sure it's worth it.

Additionally, I have updated the type signatures for the optimized reduce implementations. These are complicated pieces of code so if you don't think they are useful, let me know and we can worry about them in a later PR.

src/Missings.jl Outdated Show resolved Hide resolved
Project.toml Outdated
DataAPI = "1.1"
julia = "1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert this.

src/Missings.jl Outdated

# Returns nothing when the input contains only missing values, and Some(x) otherwise
@noinline function Base.mapreduce_impl(f, op, itr::SkipMissings{<:AbstractArray},
ifirst::Integer, ilast::Integer, blksize::Int)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix indentation.

src/Missings.jl Outdated
"""
skipmissings(args...)


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

src/Missings.jl Outdated
return ntuple(i -> SkipMissings{typeof(z), i}(z, i), length(args))
end

struct SkipMissings{C, I}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
struct SkipMissings{C, I}
struct SkipMissings{C}

Maybe also note this Z?

src/Missings.jl Outdated
Base._mapreduce(f, op, Base.IndexStyle(_inner_itr(itr)), Base.eltype(itr.z) <: Tuple{Vararg{Union{Missing, Any}}} ?
itr : _inner_itr(itr))

function Base._mapreduce(f, op, ::IndexLinear, itr::SkipMissings{C, I}) where C <: ZipofArrays where I
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
function Base._mapreduce(f, op, ::IndexLinear, itr::SkipMissings{C, I}) where C <: ZipofArrays where I
function Base._mapreduce(f, op, ::IndexLinear, itr::SkipMissings{C}) where C <: ZipofArrays

src/Missings.jl Outdated
Base.mapreduce_impl(f, op, A, ifirst, ilast, Base.pairwise_blocksize(f, op))

# Returns nothing when the input contains only missing values, and Some(x) otherwise
@noinline function Base.mapreduce_impl(f, op, itr::SkipMissings{C, I},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@noinline function Base.mapreduce_impl(f, op, itr::SkipMissings{C, I},
@noinline function Base.mapreduce_impl(f, op, itr::SkipMissings{C},

src/Missings.jl Outdated

# Returns nothing when the input contains only missing values, and Some(x) otherwise
@noinline function Base.mapreduce_impl(f, op, itr::SkipMissings{C, I},
ifirst::Integer, ilast::Integer, blksize::Int) where C <: ZipofArrays where I
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ifirst::Integer, ilast::Integer, blksize::Int) where C <: ZipofArrays where I
ifirst::Integer, ilast::Integer, blksize::Int) where C <: ZipofArrays

test/runtests.jl Show resolved Hide resolved
test/runtests.jl Outdated
@@ -75,6 +75,20 @@ struct CubeRooter end
@test collect(x) == [1, 2, 4]
@test collect(x) isa Vector{Int}

x = [1, 2, missing, 4]
y = [1, 2, 3, missing]
z = [missing, missing, 3, 4]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
z = [missing, missing, 3, 4]
z = [missing, missing, 3.1, 4.5]

And then test the different eltypes.

@pdeffebach
Copy link
Contributor Author

Thanks for the feedback! Two issues:

First, I don't understand how we can get rid of the I in the type signature of SkipMissings. Otherwise we don't know where to look for eltype etc. when all we have is the type SkipMissings and not an instance of the type.

Second, we don't have inference when compared to skipmissing in Base.

julia> x = [1, 2, missing, 4];
julia> y = ["a", "b", "c", missing];
julia> sx = skipmissing(x);
julia> mx, my = skipmissings(x, y);
julia> @code_warntype sum(sx)
# Variables
#   #self#::Core.Compiler.Const(sum, false)
#   a::Base.SkipMissing{Array{Union{Missing, Int64},1}}
#
# Body::Union{Missing, Int64}
# 1 ─ %1 = Base.sum(Base.identity, a)::Union{Missing, Int64}
# └──      return %1

julia> @code_warntype sum(mx)
# Variables
#   #self#::Core.Compiler.Const(sum, false)
#   a::Missings.SkipMissings{Base.Iterators.Zip{Tuple{Array{Union{Missing, Int64},1},Array{Union{Missing, String},1}}},1}
# 
# Body::Any
# 1 ─ %1 = Base.sum(Base.identity, a)::Any
# └──      return %1

I'm not sure how to fix it. Is the issue with my const ZipofArrays alias? Sorry that I wasn't able to figure this out.

@nalimilan
Copy link
Member

OK, I see. The problem is likely that k is just a value, not a type parameter, so it's not known at compile time. I think you either have to use I instead of k, or store the element type as a type parameter. The latter is probably better as it will avoid recompiling functions for each value of k, but probably better check the performance of both approaches. If that's not enough, you can add a type assertion to the first value returned by iterate (just like you did to getindex).

Something that needs to be checked is what happens when you iterate over all iterators returned by skipmissings (e.g. using for (x, y) in zip(skipmissings(args...)...) or cor(skipmissings(args...)...)): for performance it is essential that the compiler is able to avoid doing the iteration multiple times for each input. Maybe a special method for SkipMissings{Zip{Tuple{<:AbstractVector}}} with @inbounds will be needed for that.

@pdeffebach
Copy link
Contributor Author

Thanks for the suggestions, and sorry for taking so long to get back to this.

i have added the index of the relevant vector as a type parameter and added lots of type asserts. Still no luck. Let me know if you have any new ideas.

Otherwise I suppose we can iterate the vector separately. But that might have it's own performance drawbacks.

I understand your other point and will work on it soon.

src/Missings.jl Outdated

function Base.iterate(itr::SkipMissings, state = 1)
y = iterate(itr.z, ntuple(_ -> state, _width(itr)))
function Base.iterate(itr::SkipMissings{Z, I, T}, state = 1)::Tuple{eltype(T), Union{Int, Nothing}} where {Z, I, T}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@code_warntype sum(mx) from your previous post seems to be OK if I remove this failing type assert:

Suggested change
function Base.iterate(itr::SkipMissings{Z, I, T}, state = 1)::Tuple{eltype(T), Union{Int, Nothing}} where {Z, I, T}
function Base.iterate(itr::SkipMissings{Z, I, T}, state = 1) where {Z, I, T}

or if I fix it:

Suggested change
function Base.iterate(itr::SkipMissings{Z, I, T}, state = 1)::Tuple{eltype(T), Union{Int, Nothing}} where {Z, I, T}
function Base.iterate(itr::SkipMissings{Z, I, T}, state = 1)::Union{Tuple{eltype(T), Int}, Nothing} where {Z, I, T}

@pdeffebach
Copy link
Contributor Author

Still failing for me. I'm on 1.3 still. I will download the 1.5 and see if that's the issue.

@pdeffebach
Copy link
Contributor Author

The code correctly infers on 1.4-rc1! Now I will work on the zip propblem.

I would assume we have a "less performant" release for 1.0-1.3 and then people in 1.4 just get the added benefit of having this code work better.

@pdeffebach
Copy link
Contributor Author

Maybe a special method for SkipMissings{Zip{Tuple{<:AbstractVector}}} with @inbounds will be needed for that.

I understand the Zip argument and how that could be expensive. But can you describe what kind of improvement we can do for cor(mx, my) and similar functions? Looking at the code in Statistics, cor doesn't use zip at all.

To be fair, Skipmissing objects can't be used with cor because that just takes abstract vectors. Maybe we need some sort of collect function that works with many skipmissings objects?

@nalimilan
Copy link
Member

nalimilan commented Feb 18, 2020

The code correctly infers on 1.4-rc1! Now I will work on the zip propblem.

I would assume we have a "less performant" release for 1.0-1.3 and then people in 1.4 just get the added benefit of having this code work better.

OK, cool. Then yes, let's not worry about performance on older releases, at least for now. Please remove any type asserts that are not needed on Julia 1.4 (and anyway they are not enough on older releases).

I understand the Zip argument and how that could be expensive. But can you describe what kind of improvement we can do for cor(mx, my) and similar functions? Looking at the code in Statistics, cor doesn't use zip at all.

By Zip I didn't mean that we should optimize zip, just that we could/should optimize SkipMissings when it gets passed only AbstractArrays (the most common case). But actually I'm not sure what's needed, so better first benchmark the overhead of skipmissings, e.g. compared to passing vectors directly when they don't contain missing values at all. Maybe it's going to be fast already.

There are two different paths to optimize: iterate, and eachindex+getindex. For iterate, it would be possible to override the generic definition with a more specific one for AbstractArray arguments, which would do for i in eachindex(itr), and would then be able to use @inbounds when indexing the wrapped array. I'm not sure that would help, but it could be worth trying.

For the eachindex+getindex path, it will be a good idea anyway to put@inline before the getindex definition, and inside it check the bounds manually in a @boundscheck block, plus add @inbounds everywhere the index is used after that. eachindex already has @inbounds, but you must ensure that it's propagated to _anymissing, so add @inline to it too. See https://docs.julialang.org/en/latest/devdocs/boundscheck/ for details.

To be fair, Skipmissing objects can't be used with cor because that just takes abstract vectors. Maybe we need some sort of collect function that works with many skipmissings objects?

Ah, indeed. We can probably extend that since the two objects barely need to be iterable and have the same length. Anyway, for now you can just test performance with a custom function, like this:

function testperfzip(X, Y)
    r = zero(eltype(x)) * zero(eltype(x))
    for (x, y) in zip(skipmissings(X, Y)...)
        r += x * y
    end
    r
end

@pdeffebach
Copy link
Contributor Author

I benchmarked this a bit yesterday and it turns out it's enormously slow. I will spend some time next week uncovering the source of the allocations. I think the reason has to do with iterate allocating and not something like eachindex and @inbounds type improvements.

@pdeffebach
Copy link
Contributor Author

Good news! I think this is about on par with what I was expecting, after some re-factoring.

x = map(1:100) do _
    rand() < .2 ? missing : rand()
end

y = map(1:100) do _
    rand() < .2 ? missing : rand()
end

mx, my = skipmissings(x, y)
cx = collect(mx)
cy = collect(my)

function testperfzip(X, Y)
    r = zero(eltype(x)) * zero(eltype(x))
        for (x, y) in zip(X, Y)
            r += x * y
         end
         r
end

julia> @btime testperfzip($cx, $cy)
  1.692 μs (139 allocations: 2.17 KiB)
19.163942918070717

julia> @btime testperfzip($mx, $my)
  33.437 μs (579 allocations: 11.20 KiB)
19.163942918070717

@nalimilan
Copy link
Member

Cool. At least it looks like the overhead is reasonable. But have you tried with a larger vector? It's not clear whether the number of allocations increases with the length (it shouldn't).

Also, it would be interesting to check with vectors which don't contain any missing values: there the overhead should be low or non-existent, and that's a first goal to reach. Can you push your latest changes?

@pdeffebach
Copy link
Contributor Author

I think this is ready for another review.

My performance improvement was a shift from using Zip internally to just using a Tuple of all the other iterators and writing some helper functions which perform better when checking if anything is missing.

There is one big performance bottleneck and that is my use of

const SkipMissingsofArrays = SkipMissings{V, T} where {V <: AbstractArray, T <: Tuple{Vararg{AbstractArray}}}

Which I then use to do the type signature of functions that need arrays as inputs, in particular mapreduce. Julia isn't doing what I am expecting it to do here. I have

julia> mx isa Missings.SkipMissingsofArrays
julia> true

but mapreduce defaults to any, so something isn't working. Let me know if you have any thoughts.

@nalimilan
Copy link
Member

Which I then use to do the type signature of functions that need arrays as inputs, in particular mapreduce. Julia isn't doing what I am expecting it to do here. I have

julia> mx isa Missings.SkipMissingsofArrays
julia> true

but mapreduce defaults to any, so something isn't working. Let me know if you have any thoughts.

Have you checked that Base.IndexStyle returns IndexLinear()?

@pdeffebach
Copy link
Contributor Author

I added a method for Base.IndexStyle. It works now! I'm not 100% sure that's what made it work or there was some other error but it works now.

In fact, @code_warntype works better than skipmissing.

I think this PR is ready for a final review and then I would like to either

  1. Overload zip for SkipMissings so that we don't have to perform a double check every time we iterate when two arguments are together, which is a common occurence.
  2. Create a SkipMissingsZip that returns an iterator of tuples.

Both can be done in a separate PR.

@pdeffebach
Copy link
Contributor Author

Nevermind. It was working and now it isn't. I'm not sure what changed and need to investigate further.

@pdeffebach
Copy link
Contributor Author

@nalimilan I think I need you to take a look at it, or I should ask Discourse / Slack for help because I'm not understanding the lack of inference. I believe the issue is with Base.mapreduce_impl defined for SkipMissings.

The error has something to do with Some(something(v)) and Some is where the type inference breaks, I think.

@pdeffebach
Copy link
Contributor Author

posted on discourse here

@pdeffebach
Copy link
Contributor Author

pdeffebach commented Mar 11, 2020

Here is a gist for the new performance behavior. Performance still isn't great, I think. Since sum is good, iterate is working okay. But zip is slow, meaning there could be mileage out of a customized zip implementation.

===========
Proportion missing: 0.2
============

Sum:
==========
N = 100: skipmissings
    2.9539166667e-07 seconds

N = 100: vector
    1.5250752257e-08 seconds

N = 100: manual sum with view
    5.9066480447e-07 seconds

N = 100: manual sum with index
    4.5000000000e-07 seconds

N = 10_000: skipmissings
    2.9712000000e-05 seconds

N = 10_000: vector
    1.0934000000e-06 seconds

N = 10_000: manual sum with view
    2.7132000000e-05 seconds

N = 10_000: manual sum with index
    2.3014000000e-05 seconds


Zip
==========
N = 100: skipmissings
    6.8684000000e-07 seconds

N = 100: vector
    9.9984978541e-08 seconds

N = 100: manual skipping missings
    2.2794827586e-07 seconds

N = 10_000: skipmissings
    1.0735400000e-04 seconds

N = 10_000, vector
    1.1815000000e-05 seconds

N = 10_000: manual skipping missings
    4.7137000000e-05 seconds


============
Proportion missing: 0
============

Sum:
==========
N = 100: skipmissings
    1.7022041420e-07 seconds

N = 100: vector
    1.2069138277e-08 seconds

N = 100: manual sum with view
    4.9769072165e-07 seconds

N = 100: manual sum with index
    3.3501826484e-07 seconds

N = 10_000: skipmissings
    1.4783000000e-06 seconds

N = 10_000: vector
    1.0908000000e-06 seconds

N = 10_000: manual sum with view
    2.4616000000e-05 seconds

N = 10_000: manual sum with index
    1.7179000000e-05 seconds


Zip
==========
N = 100: skipmissings
    1.0692448759e-07 seconds

N = 100: vector
    9.9993576017e-08 seconds

N = 100: manual skipping missings
    1.0125751073e-07 seconds

N = 10_000: skipmissings
    1.1827000000e-05 seconds

N = 10_000, vector
    1.1820000000e-05 seconds

N = 10_000: manual skipping missings
    1.2560000000e-05 seconds

src/Missings.jl Outdated Show resolved Hide resolved
src/Missings.jl Outdated Show resolved Hide resolved
src/Missings.jl Outdated Show resolved Hide resolved
src/Missings.jl Outdated Show resolved Hide resolved
src/Missings.jl Outdated Show resolved Hide resolved
src/Missings.jl Outdated Show resolved Hide resolved
src/Missings.jl Outdated Show resolved Hide resolved
src/Missings.jl Outdated Show resolved Hide resolved
src/Missings.jl Outdated Show resolved Hide resolved
src/Missings.jl Outdated Show resolved Hide resolved
@pdeffebach
Copy link
Contributor Author

I have responded to all comments above.

  • With regards to branching and inference, I settled on:
ai === missing || @inbounds _anymissingindex(itr.others, i) || (v = op(v, f(ai)))

which is hacky but infers. On the other hand it isn't any faster than nested if statements so let me know if I should change it back. Nested if statements in general seem just as fast as the short-circuited expressions. sum is still 30x slower than a sum with a collected vector, which is frustrating.

  • I will need your help on show because I'm not sure how to get it to be compact enough.

  • I added @inbounds in front of every while and for loop and deleted the @inbounds inside those loops. Performance is the same so I think I did it right.

src/Missings.jl Outdated Show resolved Hide resolved
src/Missings.jl Outdated Show resolved Hide resolved
src/Missings.jl Outdated
@inbounds x_item = itr.x[ind]
@inbounds while true
x_item === missing || _anymissingindex(itr.others, ind) || break
indnext = iterate(eix, indstate)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not call this inditr as above? That's the same thing.

(BTW, it's a bit inconsistent to use x_item but xitr here and in the previous method. Better use the same convention regarding underscores everywhere.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@@ -402,7 +406,7 @@ Base.mapreduce_impl(f, op, A::SkipMissingsofArrays, ifirst::Integer, ilast::Inte
v = op(f(a1), f(a2))
@simd for i = i:ilast
@inbounds ai = A[i]
ai === missing || @inbounds !_anymissingindex(itr.others, i) || (v = op(v, f(ai)))
ai === missing || @inbounds _anymissingindex(itr.others, i) || (v = op(v, f(ai)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this was incorrect before and tests didn't spot it? Probably there was no test with an input large enough to use this branch (IIRC 16 elements is the threshold). Can you add one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a test for this.

src/Missings.jl Outdated Show resolved Hide resolved
test/runtests.jl Show resolved Hide resolved
src/Missings.jl Outdated Show resolved Hide resolved
src/Missings.jl Outdated Show resolved Hide resolved
src/Missings.jl Outdated Show resolved Hide resolved
src/Missings.jl Outdated Show resolved Hide resolved
src/Missings.jl Outdated Show resolved Hide resolved
@pdeffebach
Copy link
Contributor Author

I believe I successfully rebased! There were no conflicts.

@pdeffebach
Copy link
Contributor Author

As you know, my rebasing skills are weak. Googling around it seems like a re-base is too complicated for this scenario because I have too many fixes to code I added during the course of this PR. This is an alternate method I got off slack.

It works when I try it on a test branch.

git checkout multiskipmissing # this branch
git reset --soft <commit I diverged from>
git stash
git checkout master
git checkout -b temp_branch
git stash pop # will have to resolve conflicts in two lines. It's nbd. 
git commit -am "Squash everything"
git checkout multiskipmissing 
git reset --hard temp_branch # scary!

@pdeffebach
Copy link
Contributor Author

I propose to do this, but then push temp_branch to a new PR and merge it master in from there.

@nalimilan
Copy link
Member

I think the rebase is OK (except for that mysterious @inline change). But you need to make the @inferred test conditional on VERSION to fix CI.

src/Missings.jl Outdated Show resolved Hide resolved
@pdeffebach
Copy link
Contributor Author

The rebase didn't work all the way, I still think it's because of all the re-factoring that I did mid-PR. You can tell because the passmissing test still fails but would have not failed had the rebase gone through. But I updated the tests and docs.

@pdeffebach
Copy link
Contributor Author

I can confirm that git merge master works perfectly, though. with no conflicts.

@nalimilan
Copy link
Member

But the passmissing test doesn't seem to fail (CI passes everywhere except Julia 1.0)?

@pdeffebach
Copy link
Contributor Author

It doesn't seem to be testing 1.4 yet, so it passes. The PR to have it test 1.4 is not in here because the rebase didn't work. If I merge then it will see it.

@nalimilan
Copy link
Member

But master doesn't appear to test 1.4. I've pulled locally and rebased against latest master, but indeed the branch seems to be really messy. I think the simplest solution would be to squash everything before rebasing, so that you only have to fix conflicts once.

test/runtests.jl Outdated Show resolved Hide resolved
@nalimilan nalimilan closed this Apr 1, 2020
@nalimilan nalimilan reopened this Apr 1, 2020
@nalimilan
Copy link
Member

Argh, since the @inferred failure happens at parsing time, I think you need to wrap it in a @eval block.

@pdeffebach
Copy link
Contributor Author

Tests pass!

@test isapprox(mapreduce(cos, *, collect(mx)), mapreduce(cos, *, mx))
if VERSION >= v"1.4.0-DEV"
t = quote
@inferred Union{Float64, Missing} mapreduce(cos, *, $mx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, thinking about it again, shouldn't this be just Float64? The result cannot be missing by definition, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the skipmissing in Base we have

julia> @code_warntype sum(sx)
Variables
  #self#::Core.Compiler.Const(sum, false)
  a::Base.SkipMissing{Array{Union{Missing, Int64},1}}

Body::Union{Missing, Int64}
1 ─ %1 = Base.sum(Base.identity, a)::Union{Missing, Int64}
└──      return %1

I don't know why the output works like that given eltype(sx) == Int but it's a yet-to-be improved aspect of mapreduce with skipmissing. So it's fair that skipmissings would work the same. You have a comment on some issue highlighting that it's not a high priority performance issue but I can't find it.

Thankfully small unions are fast.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I need to investigate this. It's somewhat ironic that the function designed to skip missings isn't inferred as such. :-D

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I've realized @static if works here instead of @eval. I've committed that.

test/runtests.jl Outdated Show resolved Hide resolved
pdeffebach and others added 2 commits April 2, 2020 09:43
Co-Authored-By: Milan Bouchet-Valat <nalimilan@club.fr>
@nalimilan nalimilan merged commit bb10b04 into JuliaData:master Apr 3, 2020
@nalimilan
Copy link
Member

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants