Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make match return Nullable{RegexMatch} #12033

Closed
wants to merge 1 commit into from

Conversation

malmaud
Copy link
Contributor

@malmaud malmaud commented Jul 6, 2015

Obviously this is a breaking change, but it has to be done at some point if we ever want a type-stable match.

@simonster
Copy link
Member

Ref #10550. But as I noted there, the compiler is pretty smart about this particular kind of type instability and I would not be surprised if it's more expensive to allocate a Nullable.

@quinnj
Copy link
Member

quinnj commented Jul 6, 2015

Compiler cleverness or not, IMO this is a conceptually better API. @johnmyleswhite and I have also seen that the compiler can be pretty smart dealing with Nullables as well. +1 to the change here.

@malmaud
Copy link
Contributor Author

malmaud commented Jul 6, 2015

Quick benchmark:

Before:

function f(N=10000)
    r=r"a"
    x=Vector{RegexMatch}()
    for i=1:N
        push!(x, match(r, "a"))
    end
    x
end
@time f();
3.023 milliseconds (60011 allocations: 2600 KB)

After:

function f(N=10000)
    r=r"a"
    x=Vector{Nullable{RegexMatch}}()
    for i=1:N
        push!(x, match(r, "a"))
    end
    x
end
@time f();
3.336 milliseconds (90011 allocations: 3538 KB)

@ScottPJones
Copy link
Contributor

Does anybody have any examples with generated code to show whether or not Nullable{RegexMatch} is in any way more or less performant than Union{RegexMatch, Void}? The compiler is pretty smart, and I agree that this is a better API. (Take a look at Swift for first class handling of nullability in a language [whether or not you like their syntax], and Nullable makes null handling a lot more first class in Julia, IMO)

@tkelman tkelman added the breaking This change will break code label Jul 6, 2015
@malmaud
Copy link
Contributor Author

malmaud commented Jul 6, 2015

It occurs to me that if we're going to do this, we should also change RegexMatch.captures to Nullable{SubString} instead of Union{Void, SubString}.

@tkelman
Copy link
Contributor

tkelman commented Jul 6, 2015

This appears to freeze during bootstrap on Win64.

@malmaud
Copy link
Contributor Author

malmaud commented Jul 6, 2015

@tkelman Are you sure that's a problem with this PR and not an unrelated issue with appveyor?

@tkelman
Copy link
Contributor

tkelman commented Jul 6, 2015

I'll build locally and check, but freezes during bootstrap are not currently expected - only freezes during test startup.

@tkelman
Copy link
Contributor

tkelman commented Jul 6, 2015

A local cygwin cross-compile actually works okay, so I'll restart the appveyor build. We're using a slightly different toolchain on appveyor for various reasons, will add to my to-do list to try and reproduce via the appveyor script.

@johnmyleswhite
Copy link
Member

FWIW, I agree that using Nullable is a better approach in the long run. Using Union{T, Union{}} or Union(T, Void) works in some contexts, but still causes boxing in other contexts such as repeated object creation in loops. That makes me really hesitant to keep using those techniques, since I find it hard to predict which optimizations the compiler will apply.

Simon, if we solved the GC issues that cause non-bitstype immutables to have pointer indirection, would you expect that Nullable objects would perform at least as well Union(T, Void)? That's my expectation, but I'm not totally sure. My current understanding is that GC issues are the last impediment to making Nullable objects consistently low-cost abstractions.

Re the .captures bit, there's a bunch of code in the regex file that uses union types in places I would (naively) think they're unnecessary.

@davidagold
Copy link
Contributor

Also FWIW, using a NullableArray appears to save on some allocation costs (as compared to Array{Nullable{RegexMatch}, 1}), at least in the test case above:

EDIT: Now including "before" and "after", since I realized my previous example was on the current implementation of match. The "after" example below is on my local build of @malmaud 's regex_null branch.

Before:

using NullableArrays
function f(N=10000)
    r=r"a"
    x=NullableArray{RegexMatch, 1}()
    for i=1:N
        push!(x, match(r, "a"))
    end
    x
end
f();

julia> @time f();
   3.049 milliseconds (60025 allocations: 2633 KB)

After:

using NullableArrays
function f(N=10000)
    r=r"a"
    x=NullableArray{RegexMatch, 1}()
    for i=1:N
        push!(x, match(r, "a"))
    end
    x
end
f();

julia> @time f();
   3.089 milliseconds (70025 allocations: 2945 KB)

However, the time elapsed seems to be quite variable, going up to as high as ~9 ms on some of my runs.

@nalimilan nalimilan added the missing data Base.missing and related functionality label Sep 6, 2016
@malmaud
Copy link
Contributor Author

malmaud commented Oct 6, 2016

Wanted to ping this again. I know it's an obnoxious breaking change, but I don't think we want a Julia 1.0 that has a type-unstable match.

@nalimilan
Copy link
Member

Similarly breaking changes are discussed at #15755. Would be logical to make them all at the same time.

@KristofferC
Copy link
Member

Nullable is removed from Base

@KristofferC KristofferC closed this Jan 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking This change will break code missing data Base.missing and related functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants