Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IN(s) #1438

Closed
pkoppstein opened this issue Jul 1, 2017 · 8 comments
Closed

IN(s) #1438

pkoppstein opened this issue Jul 1, 2017 · 8 comments

Comments

@pkoppstein
Copy link
Contributor

pkoppstein commented Jul 1, 2017

[EDIT: The original post proposed an incorrect implementation of IN. It has been corrected.]

I believe the following establishes that the current implementation of IN(s) is both needlessly complex and slow.

Unless I'm missing something, here is an alternative (and much simpler) implementation:

def IN0(s): . as $in | first( if (s == $in) then true else empty end ) // false;

And here is the test I ran (for various values of N):

 [range(0;N) | IN( range(0;N) )] | length

Results:

# N             u+s

#   1000   IN0 0.389
#   1000   IN  0.43

#  10000   IN0 29.32
#  10000   IN  32.9s

Discussion:
One would expect the current implementation to be unnecessarily slow because it uses reduce, and therefore does no short-circuiting.

@Wolf-SO
Copy link

Wolf-SO commented Jul 1, 2017

@pkoppstein What part of code are you talking about?

@pkoppstein
Copy link
Contributor Author

For the record, the current definitions of IN/1 and IN/2 in builtin.jq are:

def IN(s): reduce (first(select(. == s)) | true) as $v (false; if . or $v then true else false end);
def IN(src; s): reduce (src|IN(s)) as $v (false; if . or $v then true else false end);

I would also propose simplifying the current definition of IN/2 to:

def IN(src; s): any( src|IN(s); .);

@fadado
Copy link

fadado commented Jul 1, 2017

I believe the following establishes that the current implementation of IN(s) is not only needlessly complex but also both needlessly and abysmally slow.

Unless I'm missing something, here is an alternative (and much simpler) implementation:

def IN0(s): . as $in | (first(s == $in) | true) // false;

You are right to consider the implementation of IN unpleasant, but I think your IN0 filter is wrong. Simple counter example:

def IN0(s): . as $in | (first(s == $in) | true) // false;
7777777 | IN0(range(88))

This outputs true!

My solution in the next post...

@fadado
Copy link

fadado commented Jul 1, 2017

Unless I'm missing something, here is an alternative (and much simpler) implementation:

def IN0(s): . as $in | (first(s == $in) | true) // false;

I have my own versions for some jq builtins, more elegant but only a bit faster (around a 10% in this case). For example, using my version for any (I add all for completeness) I can define a very short IN/0:

def all(stream; predicate):
    isempty(stream | predicate and empty)
;

def any(stream; predicate):
    isempty(stream | predicate or empty) | not
;

def IN(s): . as $in | any(s; . == $in);

I like your IN/2. I will integrate in my library a bit changed:

def some(stream):
    isempty(stream | . or empty) | not
;

def IN(src; s): some(src|IN(s));

@pkoppstein
Copy link
Contributor Author

pkoppstein commented Jul 1, 2017

@fadado - Thanks for pointing out the error in my original proposal. The original post has been corrected.

Please note that there are already definitions of all/2 and any/2 in builtin.jq. (They were written before isempty/0 was included.) Are you proposing that they be changed?

My timings for your implementation of IN/1 (using your revised any/2) indicate it is significantly slower than the current definition in builtin.jq. E.g. for N=10000, the comparable u+s for the range-based test using your IN/1 is 37.2s

@pkoppstein pkoppstein changed the title abysmal IN(s) IN(s) Jul 2, 2017
@fadado
Copy link

fadado commented Jul 2, 2017

Please note that there are already definitions of all/2 and any/2 in builtin.jq. (They were written before isempty/0 was included.) Are you proposing that they be changed?

If the actual builtin is faster, ok! But compare the "style":

def any(generator; condition):
        [label $out | foreach generator as $i
                 (false;
                  if . then break $out elif $i | condition then true else . end;
                  if . then . else empty end)] | length == 1;

vs

def any(stream; predicate):
    isempty(stream | predicate or empty) | not;

I work as a teacher and I'm always searching for beauty in the code...

@pkoppstein
Copy link
Contributor Author

@fadado - My comment about speed was confined to IN/1.

Now that I've compared some timings of your any/2 and the one currently in builtin.jq, I can see that your version seems to be significantly faster. That does not surprise me for several reasons, notably thatany/2 was added before isempty, as I mentioned.

@emanuele6
Copy link
Member

Outdated issue; IN/1 and IN/2 are defined as:

def IN(s): any(s == .; .);
def IN(src; s): any(src == s; .);

Closing...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants