Add groupby to dataframe #278
Since this is a community-submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?
Here are some early comments, I need to pull this PR locally and give it a spin to see what the requests and responses look like :)
@sethmlarson
    if b.is_timestamp:
        agg_value = elasticsearch_date_to_pandas_date(
            agg_value, b.es_date_format
        )
There will be an improvement of at least 2 seconds 😲 I guess we have to do something in this: Any suggestion?
Some more comments for you!
The timestamp field should be the same as what it'd be if it wasn't in the index, so if it's a datetime it should still be a datetime. This will be necessary when we go to implement resample().
Pulling this locally to take a closer look again! :) Left you a collection of comments. Also, a general comment: it would be good to start working on more test cases, especially to cover errors.
I need to add pytest-cov so we can see how much test coverage we have :)
jenkins test this please
jenkins test this please
Yay! Builds successful 😎
This looks good to me! Will do some tweaking and documenting now that the core functionality is there. Thanks for working so hard on this :)
PR on #172
@sethmlarson Can you please review these when free? There is still a lot of work to be done. 😕
I implemented groupby on the dataframe using a composite aggregation
Only single aggs and aggregation are implemented so far
TODO: Count, grouped (ed_flights.groupby("Cancelled")), dropna
One more suggestion needed:
For the following query, eland takes 7.93 seconds, whereas pandas takes 1.56 seconds 😨
If you can review the composite aggregation logic and give some input on dropna, grouped, and count (how to query ES using filters?), I will proceed with the logic in eland. That's where I got stuck.
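For readers following the composite aggregation discussion, here is a minimal sketch of how a groupby could be expressed as a paginated Elasticsearch composite aggregation. It is not the code from this PR. The aggregation name groupby_buckets, the helper names, the AvgTicketPrice field, and the search callable are all assumptions made for illustration. It relies on three documented parts of the Elasticsearch DSL: the after / after_key pair for paging, the missing_bucket option on a composite source (which could stand in for dropna=False), and the per-bucket doc_count (which could serve as count).

```python
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple


def build_composite_groupby_body(
    by: List[str],
    metrics: Dict[str, Any],
    page_size: int = 1000,
    dropna: bool = True,
    after_key: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a search body that groups on `by` with a composite aggregation."""
    sources = [
        # missing_bucket=True keeps documents that lack the field,
        # which is the opposite of pandas' dropna=True.
        {field: {"terms": {"field": field, "missing_bucket": not dropna}}}
        for field in by
    ]
    composite: Dict[str, Any] = {"size": page_size, "sources": sources}
    if after_key is not None:
        # Resume from where the previous page stopped.
        composite["after"] = after_key
    return {
        "size": 0,  # only aggregation results are needed, no hits
        "aggs": {"groupby_buckets": {"composite": composite, "aggs": metrics}},
    }


def buckets_to_rows(
    response: Dict[str, Any],
) -> Tuple[List[Dict[str, Any]], Optional[Dict[str, Any]]]:
    """Flatten one page of composite buckets into rows, plus the next after_key."""
    agg = response["aggregations"]["groupby_buckets"]
    rows = []
    for bucket in agg["buckets"]:
        row = dict(bucket["key"])          # group key columns
        row["count"] = bucket["doc_count"]  # rows per group
        for name, value in bucket.items():
            if name not in ("key", "doc_count"):
                row[name] = value.get("value")  # single-value metric aggs
        rows.append(row)
    return rows, agg.get("after_key")


def iter_groupby_rows(
    search: Callable[[Dict[str, Any]], Dict[str, Any]],  # hypothetical: body -> response
    by: List[str],
    metrics: Dict[str, Any],
    page_size: int = 1000,
    dropna: bool = True,
) -> Iterator[Dict[str, Any]]:
    """Page through every composite bucket and yield one row per group."""
    after_key = None
    while True:
        body = build_composite_groupby_body(by, metrics, page_size, dropna, after_key)
        rows, after_key = buckets_to_rows(search(body))
        yield from rows
        if after_key is None or not rows:
            break
```

With a real client, search would be a thin wrapper around the search call for the target index, and metrics could look like {"AvgTicketPrice_mean": {"avg": {"field": "AvgTicketPrice"}}}. Each page is a separate round trip, so page_size is likely to affect timings like the ones quoted above.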