Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: query modifies the frame when you compare with = #8828

Closed
wants to merge 6 commits into from

Conversation

onesandzeroes
Copy link
Contributor

Fix for #8664. The simplest way to fix this involved adding an assignment_allowed=True arg to the internal eval() function. All existing behaviour should be preserved. If adding the arg isn't okay I'm not quite sure how else to do it, as it's only once we reach the internal eval function that the expression is actually parsed and we know that it includes the assignment.

It also seems like a pretty bad side effect if query can silently overwrite values (obviously only if you accidentally include an assignment), so hopefully that's a good argument in favour of having the
assignment_allowed arg.

@jreback
Copy link
Contributor

jreback commented Nov 16, 2014

This is much simpler actually.

The res is None here because the query didn't return anything (e.g. it was an assignment). This is allowed.
So only do the following try: except if res is not None

https://github.com/pydata/pandas/blob/master/pandas/core/frame.py#L1918

Separate issue if we don't want to allow assignment in a .query (it IS allowed in .eval).
Fixing that is a bit more complicated as currently query is implemented in terms of eval. And they use the same expression node-set (e.g. they parse to the same thing and have the same unsupported nodes and such). So I think you would have to create a new dialect (maybe parser='pandas_query') which is a derivative of parser='pandas'.

This is not that difficult actually, just sub-class and add an unsupported node. (and add it to the tests for the various dialects).

@cpcloud

@cpcloud
Copy link
Member

cpcloud commented Nov 16, 2014

One thing to try is adding an unsupported node at runtime (instead of at class definition time which is how it's done now) based on the suggested flag

@jreback
Copy link
Contributor

jreback commented Mar 25, 2015

@onesandzeroes you want to give a shot to modify as discussed above? thxs

@onesandzeroes
Copy link
Contributor Author

@jreback Having another go at this now with a new parser subclass. I tried the simpler method you suggested of checking if res was None but I don't think that works as the assignment/mutation of the dataframe (which shouldn't ever happen from a query, IMO) has already happened at that point.

If the parser subclass looks like the right way to go I can clean up so that the PandasQueryExprVisitor isn't just a copy-paste of the original PandasExprVisitor. Obviously need to go through and clean up different docstrings etc. as well. Just slightly slow going at the moment as a lot of this is pretty new to me.

Doing this does make attempted assignment from query raise, although the error message is
a bit cryptic- you get:

df.query('a=1')
Traceback (most recent call last):
...
NotImplementedError: 'Assign' nodes are not implemented

So I guess we want to catch the exception and give a more specific message.

df = DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
a_before = df['a'].copy()
self.assertRaisesRegexp(
NotImplementedError, "'Assign' nodes are not implemented",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal (and you don't need to fix it here), but in an ideal world this should be something closer to SyntaxError, rather than NotImplementedError. NotImplementedError is really for abstract methods that should be overridden in a subclass or for features that genuinely have not been implemented yet. This is different: if = ever works in query it would be a bug.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NotImplementedError is what you currently get when you do unsupported operations within a query/eval and hit an unsupported node, e.g.:

import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
df.eval("{'x': 1}")
NotImplementedError 
/home/me/.local/lib/python2.7/site-packages/pandas/computation/expr.pyc in visit(self, node, **kwargs)
    312         method = 'visit_' + node.__class__.__name__
    313         visitor = getattr(self, method)
--> 314         return visitor(node, **kwargs)
    315 
    316     def visit_Module(self, node, **kwargs):

/home/me/.local/lib/python2.7/site-packages/pandas/computation/expr.pyc in f(self, *args, **kwargs)
    203     def f(self, *args, **kwargs):
    204         raise NotImplementedError("{0!r} nodes are not "
--> 205                                   "implemented".format(node_name))
    206     return f
    207 
NotImplementedError: 'Dict' nodes are not implemented

So at the moment it's just working like any other invalid query/eval. For this specific case I'll probably try to catch the exception within DataFrame.query() and spit out a more meaningful
message. Not sure where you'd have to intercept the exception if you wanted to raise more
meaningful error messages from eval more generally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the NotImplementedError is fine. I agree ideally should be SyntaxError but that's a separate change (welcome to make an issue if so desired)

@onesandzeroes
Copy link
Contributor Author

Another option I just thought of is that we can just override visit_Assign in the query parsers, which makes it nice and clear that we shouldn't ever do anything with an assignment from a query. Ends up looking pretty clean, it's basically:

class PandasQueryExprVisitor(PandasExprVisitor):

    def visit_Assign(self, node, **kwargs):
        raise ValueError("Cannot assign within queries")

@shoyer
Copy link
Member

shoyer commented Mar 26, 2015

@onesandzeroes I like that latest idea!

@jreback jreback added this to the Next Major Release milestone May 9, 2015
@jreback
Copy link
Contributor

jreback commented May 9, 2015

@onesandzeroes want to rebase and we can re-examine this for 0.17.0?

@jreback
Copy link
Contributor

jreback commented Jun 26, 2015

can you rebase?

@onesandzeroes
Copy link
Contributor Author

OK, I rebased. I feel like this one is probably above my paygrade and I'm obviously not finding time to work on this stuff at the moment, so if anyone else want to take over they should definitely go ahead.

I do still have a branch sitting around where I tried to do this more simply by subclassing the existing ExprVisitor classes and overriding visit_Assign, it seemed promising at first but there were a lot of test cases to rewrite and I'm not sure if it was ultimately going to work.

@jreback
Copy link
Contributor

jreback commented Jul 6, 2015

@onesandzeroes what you are doing looks reasonable.

Need to explicity pass the parser though to have it catch the assign nodes (your tests catches it, but a real world example will still fail). e.g. in DataFrame.query

@@ -15225,6 +15225,16 @@ def test_query_builtin(self):
result = df.query('sin > 5', engine=engine, parser=parser)
tm.assert_frame_equal(expected, result)

def test_query_with_assign_statement(self):
df = DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the issue number as a comment here

@jreback jreback modified the milestones: 0.17.1, Next Major Release Oct 11, 2015
@jreback
Copy link
Contributor

jreback commented Oct 11, 2015

  • can you rebase
  • add a whatsnew note for 0.17.1 (add in API changes, give a mini example)
  • squash

@jreback
Copy link
Contributor

jreback commented Oct 25, 2015

@onesandzeroes can you update according to comments

@jreback jreback modified the milestones: Next Major Release, 0.17.1 Nov 13, 2015
@jreback
Copy link
Contributor

jreback commented Nov 13, 2015

@onesandzeroes pls update according to comments

@jreback
Copy link
Contributor

jreback commented Nov 25, 2015

can you update

@jreback
Copy link
Contributor

jreback commented Dec 9, 2015

closing in favor of #11149 ; this is fixed more generally

@jreback jreback closed this Dec 9, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants