BUG: query modifies the frame when you compare with =
#8828
This is much simpler actually. The https://github.com/pydata/pandas/blob/master/pandas/core/frame.py#L1918 Separate issue if we don't want to allow assignment in a This is not that difficult actually, just sub-class and add an unsupported node. (and add it to the tests for the various dialects).
One thing to try is adding an unsupported node at runtime (instead of at class definition time which is how it's done now) based on the suggested flag
@onesandzeroes you want to give a shot to modify as discussed above? thxs
5cbf94d
to
312220b
Compare
@jreback Having another go at this now with a new parser subclass. I tried the simpler method you suggested of checking if If the parser subclass looks like the right way to go I can clean up so that the Doing this does make attempted assignment from
df.query('a=1')
Traceback (most recent call last):
...
NotImplementedError: 'Assign' nodes are not implemented
So I guess we want to catch the exception and give a more specific message.
df = DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
a_before = df['a'].copy()
self.assertRaisesRegexp(
    NotImplementedError, "'Assign' nodes are not implemented",
Not a big deal (and you don't need to fix it here), but in an ideal world this should be something closer to SyntaxError, rather than NotImplementedError. NotImplementedError is really for abstract methods that should be overridden in a subclass or for features that genuinely have not been implemented yet. This is different: if = ever works in query it would be a bug.
NotImplementedError is what you currently get when you do unsupported operations within a query/eval and hit an unsupported node, e.g.:
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
df.eval("{'x': 1}")
NotImplementedError
/home/me/.local/lib/python2.7/site-packages/pandas/computation/expr.pyc in visit(self, node, **kwargs)
312 method = 'visit_' + node.__class__.__name__
313 visitor = getattr(self, method)
--> 314 return visitor(node, **kwargs)
315
316 def visit_Module(self, node, **kwargs):
/home/me/.local/lib/python2.7/site-packages/pandas/computation/expr.pyc in f(self, *args, **kwargs)
203 def f(self, *args, **kwargs):
204 raise NotImplementedError("{0!r} nodes are not "
--> 205 "implemented".format(node_name))
206 return f
207
NotImplementedError: 'Dict' nodes are not implemented
So at the moment it's just working like any other invalid query/eval. For this specific case I'll probably try to catch the exception within DataFrame.query() and spit out a more meaningful message. Not sure where you'd have to intercept the exception if you wanted to raise more meaningful error messages from eval more generally.
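For readers following along, here is a minimal sketch of the mechanism visible in the traceback and of the "catch and re-raise" idea. It uses only the standard-library ast module, not pandas' own visitor classes; the names _unsupported, SketchVisitor and run_sketch_query, and the wording of the ValueError, are hypothetical and only illustrate the approach.

```python
import ast


def _unsupported(node_name):
    # Build a visitor method that rejects one kind of AST node,
    # mirroring the message format shown in the traceback above.
    def f(self, *args, **kwargs):
        raise NotImplementedError(
            "{0!r} nodes are not implemented".format(node_name))
    return f


class SketchVisitor(ast.NodeVisitor):
    # Hypothetical stand-in for an expression visitor: the listed
    # node types are rejected, everything else is walked as usual.
    visit_Dict = _unsupported("Dict")
    visit_Assign = _unsupported("Assign")


def run_sketch_query(expr):
    # Hypothetical query entry point: intercept the generic error for
    # assignments and raise something more specific instead.
    try:
        SketchVisitor().visit(ast.parse(expr))
    except NotImplementedError as err:
        if "'Assign'" in str(err):
            raise ValueError(
                "assignment is not allowed in a query; "
                "did you mean '=='?") from err
        raise
    return expr
```

Matching on the error message is fragile, which is arguably one reason the thread later moves towards handling the Assign node directly in a visitor subclass.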
the NotImplementedError is fine. I agree ideally should be SyntaxError but that's a separate change (welcome to make an issue if so desired)
Another option I just thought of is that we can just override
class PandasQueryExprVisitor(PandasExprVisitor):
    def visit_Assign(self, node, **kwargs):
        raise ValueError("Cannot assign within queries")
@onesandzeroes I like that latest idea!
@onesandzeroes want to rebase and we can re-examine this for 0.17.0?
can you rebase?
This reverts commit 655fb5e5be3981b43ff146d68a0fa53e75d98dd1.
7a9081e
to
70d8345
Compare
OK, I rebased. I feel like this one is probably above my paygrade and I'm obviously not finding time to work on this stuff at the moment, so if anyone else wants to take over they should definitely go ahead. I do still have a branch sitting around where I tried to do this more simply by subclassing the existing
@onesandzeroes what you are doing looks reasonable. Need to explicitly pass the parser though to have it catch the assign nodes (your tests catch it, but a real world example will still fail). e.g. in
@@ -15225,6 +15225,16 @@ def test_query_builtin(self):
        result = df.query('sin > 5', engine=engine, parser=parser)
        tm.assert_frame_equal(expected, result)

    def test_query_with_assign_statement(self):
        df = DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
can you add the issue number as a comment here
@onesandzeroes can you update according to comments
@onesandzeroes pls update according to comments
can you update
closing in favor of #11149; this is fixed more generally
Fix for #8664. The simplest way to fix this involved adding an assignment_allowed=True arg to the internal eval() function. All existing behaviour should be preserved. If adding the arg isn't okay I'm not quite sure how else to do it, as it's only once we reach the internal eval function that the expression is actually parsed and we know that it includes the assignment.

It also seems like a pretty bad side effect if query can silently overwrite values (obviously only if you accidentally include an assignment), so hopefully that's a good argument in favour of having the assignment_allowed arg.
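A rough sketch of the flag-based approach described above, using the standard-library ast module in place of pandas' internal eval(). Only the assignment_allowed parameter name comes from the description; the function names, return value and error message are assumptions made for illustration.

```python
import ast


def internal_eval_sketch(expr, assignment_allowed=True):
    # Hypothetical stand-in for the internal eval function. The check
    # lives here because this is the first point at which the
    # expression is parsed and an assignment can be detected.
    tree = ast.parse(expr)
    has_assignment = any(
        isinstance(node, ast.Assign) for node in ast.walk(tree))
    if has_assignment and not assignment_allowed:
        raise ValueError("assignment is not allowed in this context")
    return has_assignment


def query_sketch(expr):
    # Hypothetical query wrapper: it opts out of assignment, while
    # other callers keep the default and their existing behaviour.
    return internal_eval_sketch(expr, assignment_allowed=False)
```

Because the flag defaults to True, callers that do not pass it are unaffected, which matches the stated goal of preserving existing behaviour.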