PERF: have series/panel arithmetic operators use expressions (numexpr) #3765
Comments
I have this done for Series and implemented for scalar arithmetic in Panel (it's less clear whether it's possible or useful to accelerate combination operations with numexpr). @cpcloud @jreback could one of you write up how to create a Panel with >10K items with integers, floats, mixed integers, and integers with zeros? Basically, like these (which are what I'm using for test cases):
_frame = DataFrame(np.random.randn(10000, 4), columns = list('ABCD'), dtype='float64')
_frame2 = DataFrame(np.random.randn(100, 4), columns = list('ABCD'), dtype='float64')
_mixed = DataFrame({ 'A' : _frame['A'].copy(), 'B' : _frame['B'].astype('float32'), 'C' : _frame['C'].astype('int64'), 'D' : _frame['D'].astype('int32') })
_mixed2 = DataFrame({ 'A' : _frame2['A'].copy(), 'B' : _frame2['B'].astype('float32'), 'C' : _frame2['C'].astype('int64'), 'D' : _frame2['D'].astype('int32') })
_integer = DataFrame(np.random.randint(1, 100, size=(10001, 4)), columns = list('ABCD'), dtype='int64') |
also examples in
|
@cpcloud FYI: Exception: TypeError("unsupported operand type(s) for //: 'VariableNode' and 'VariableNode'",) |
Yup, check out my list on the eval thread; I think I note it there. Not sure what to do there. What's weird is that they support scalar floor division. |
@cpcloud maybe we could special case it to do true division and then convert it to int? |
Yeah that should be fine. |
Hm but let me think about it cuz for a complex expression how to cast without performing the op? |
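The workaround floated above (true division followed by a conversion to an integer) can be sketched on plain Python scalars. This only illustrates the arithmetic and is not numexpr or pandas code. `floordiv_via_truediv` is a hypothetical name, and the sketch uses `math.floor` rather than truncation, on the assumption that results should match `//` for negative quotients as well.

```python
import math


def floordiv_via_truediv(a, b):
    """Emulate a // b as floor(a / b) (hypothetical helper, scalars only)."""
    return math.floor(a / b)


# Mixed-sign pairs, because truncation and floor differ for negative quotients.
pairs = [(7, 2), (-7, 2), (7, -2), (9, 3)]
results = [floordiv_via_truediv(a, b) for a, b in pairs]
```

Division by zero (the "integers with zeros" case) and integers too large to be represented exactly as floats are not handled here, and the open question of how to cast in the middle of a complex expression is left untouched.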
I set up Series to use
In [10]: ser = Series(np.random.randn(1000000))
In [11]: ser2 = Series(np.random.randn(1000000))
In [12]: %%timeit
....: ser * ser2
....:
100 loops, best of 3: 6.87 ms per loop
In [13]: expr.set_use_numexpr(True)
In [14]: %%timeit
....: ser * ser2
....:
100 loops, best of 3: 6.76 ms per loop
In [15]: %%timeit
....: ser / ser2
....:
100 loops, best of 3: 7.89 ms per loop
In [16]: expr.set_use_numexpr(False)
In [17]: %%timeit
....: ser / ser2
....:
100 loops, best of 3: 10.4 ms per loop |
In reality these are very similar to numpy operations and they prob cannot be parallelized, which is where a large part of the speedups come from; doing these same operations on frames yields pretty big speedups (and even better when we have eval, which will enable multiple operations to be sent to ne) |
It just occurred to me that I'm using a 2 core Macbook Air, so I might not notice as much of a difference or I'm not using the right test cases to see the difference. (frame appears to be accelerated by about 30%, huge panel very little and series very little) |
build the eval branch that @cpcloud is working on; that's interesting because the more things you have in a term, the faster it gets |
@jtratner just fyi as of now only scalar arith ops with series/frame and already-aligned frames and series arith ops work as well... |
it's also fun to make a huge frame and run |
hey, do you have any instructions on how to set up vbench/vb_suite? Not clear and I'd like to help write cases and/or check out performance after changes. |
see this #3156 |
@jreback I worked something up and would be interested in your thoughts on it. After I started with this, I realized that I was duplicating all the code to create the special arithmetic methods (i.e. Upsides:
Downsides:
|
I like it! Your downside points are correct but clearly IMHO outweighed by the upsides. I am of the camp that you should only have code to do something in one place. Great job! Can merge first thing in 0.12. Helps consolidation, always a good thing. |
this is nice! great to have the behavior propagate down into the pandas object hierarchy with minimal fuss |
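The consolidation idea discussed above, creating the special arithmetic methods in one place and attaching them to each class, might look roughly like the following. This is a minimal sketch and not the code from the refactor: `ARITH_OPS`, `add_arith_methods`, `elementwise`, and `Box` are all hypothetical names used for illustration.

```python
import operator

# One table of special method names and the operator each should apply.
ARITH_OPS = {
    "__add__": operator.add,
    "__sub__": operator.sub,
    "__mul__": operator.mul,
    "__truediv__": operator.truediv,
}


def add_arith_methods(cls, make_method):
    """Attach every arithmetic special method to cls using one factory."""
    for name, op in ARITH_OPS.items():
        setattr(cls, name, make_method(op))
    return cls


class Box:
    """Toy container standing in for a pandas object."""

    def __init__(self, values):
        self.values = list(values)


def elementwise(op):
    """Build a method that applies op between each element and a scalar."""
    def method(self, other):
        return Box(op(v, other) for v in self.values)
    return method


add_arith_methods(Box, elementwise)
doubled = (Box([1, 2, 3]) * 2).values
```

Because the methods are attached at runtime, a tool that only reads the class body may not see them, which is one trade-off of generating them this way.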
@jreback @cpcloud thanks! :) now I just need to add a testing mechanism to |
@jtratner as an aside: my heuristic of when to use ne may be too conservative. I am talking about the 20k number of elements here (I forgot what number is actually there) |
number is 10k |
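A size-based heuristic of this kind can be sketched as below. Only the 10k figure comes from the thread; the names `MIN_ELEMENTS` and `should_use_numexpr`, the `enabled` flag, and the choice of a strict `>` comparison are assumptions made for illustration.

```python
# Threshold mentioned in the thread; the constant name is hypothetical.
MIN_ELEMENTS = 10_000


def should_use_numexpr(n_elements, enabled=True, min_elements=MIN_ELEMENTS):
    """Use the accelerated path only when enabled and the operand is large."""
    return enabled and n_elements > min_elements


use_for_big_series = should_use_numexpr(1_000_000)   # the Series size timed earlier
use_for_small_frame = should_use_numexpr(100 * 4)    # a 100 x 4 frame
```

Lowering `min_elements` would make the heuristic less conservative, at the cost of paying the dispatch overhead on small operands.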
what's with those anyway i keep getting google/yahoo failures |
Maybe Travis' IPs are getting flagged as spamming because of all the builds?
|
I thought there was a fix somewhere? We need a decorator to skip a network test that fails because of a urlopen failure (e.g. connection refused) |
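A decorator along those lines could be sketched as follows. This is not the fix referred to in the thread: `skip_on_network_error` is a hypothetical name, and the sketch uses Python 3's `urllib.error.URLError` and `unittest.SkipTest`, whereas the code base at the time targeted Python 2.

```python
import functools
import unittest
from urllib.error import URLError


def skip_on_network_error(test_func):
    """Turn connection failures inside a test into a skip instead of an error."""
    @functools.wraps(test_func)
    def wrapper(*args, **kwargs):
        try:
            return test_func(*args, **kwargs)
        except (URLError, ConnectionError) as exc:
            # Assumption: any URLError or ConnectionError means the network is down.
            raise unittest.SkipTest(f"network unavailable: {exc}")
    return wrapper


@skip_on_network_error
def fake_network_test():
    # Stand-in for a test that calls urlopen and gets "connection refused".
    raise URLError("connection refused")
```

Genuine assertion failures still propagate, since only the two network-related exception types are converted into skips.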
anyway let me not derail the conversation here |
@jreback @cpcloud this is 90% of the way done, I just have to get a test suite working for testing that evaluate actually successfully evaluated an expression. The only problem that I have with this is that it kills the ability for static code analyzers to note that |
personally, i never use that feature of static code analyzers. i usually just want them to tell me, e.g., "hey you forgot to assign a variable here, dummy!" so i would say it's not worth it...expressions in pandas are a bit different anyway so the best way to get information about them short of reading the code, is in the online documentation. |
@cpcloud sorry, I was conflating two things. I mean that, in the original code, the static analyzer could at least tell you that the separately - I'm trying to add some kind of testing functionality to |
i think this will be closed by @jtratner's arith op refactor |