Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transformations: Group by and aggregate on multiple fields #25498

Merged
merged 53 commits into from
Aug 31, 2020

Conversation

Totalus
Copy link
Contributor

@Totalus Totalus commented Jun 10, 2020

What this PR does / why we need it:

Adding a new transformation that calculates the number of occurrences of each value of the specified field and displays the count of each value as result. See #24716 for details.

Which issue(s) this PR fixes:

Fixes #24716

Special notes for your reviewer:

  • I based it off of other transformations source code.
  • The UI (OccurrencesTransformerEditor.tsx) is the same as the Outer Joint transformation as the options for this transformation are the same.
  • I modified the doc, but I don't know where to upload the actual images. They don't seem to be stored on the git repo.

@Totalus Totalus requested review from a team, mckn and hugohaggmark and removed request for a team June 10, 2020 00:26
@CLAassistant
Copy link

CLAassistant commented Jun 10, 2020

CLA assistant check
All committers have signed the CLA.

@torkelo
Copy link
Member

torkelo commented Jun 10, 2020

Thanks for contributing!

Wonder if we should turn this into a more general group by transform?

@mckn
Copy link
Contributor

mckn commented Jun 10, 2020

Great contribution! I agree with @torkelo regarding the group by.

Copy link
Contributor

@mckn mckn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome initiative and thanks for adding this! I think we should do a couple of changes in this PR to make this feature even more awesome.

Instead of only supporting occurrences I think that changing this one into a proper groupBy would be really great. One part of that is already in place and that is the grouping of the field that now is called occurrences.

What we also need to add is the possibility to add any of the other fields and in what way you want to aggregate them. If we take GROUP BY in SQL as example you can do something like:

SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country;

That basically mean that you will group your data by country and then also for each group add a column where we count all the Customer ID's for each group. We would like to have something similar for this transformation.

What do you think about changing the editor so you first select what field you want to group by. When that one is selected we could add the possibility to add other fields and how those should be aggregated. We already have the most common aggregation functions in place if you look at the reduce transformer. So those could basically be reused in this scenario.

So to make this feature ready to be merged I would suggest the following things:

  1. Rename this transform to groupBy.
  2. Add the possibility to add any of the other fields and an aggregation column for that field in the editor.
  3. Make sure we add a column for the other selected fields and properly aggregate them according to the selected function.

@Totalus what do you think about this? I am happy to help out if you need any help or have any questions.

@Totalus
Copy link
Contributor Author

Totalus commented Jun 10, 2020

Hello,
I understand what you mean. This would allow to do more than just count the number of occurrences, but also calculate the min, max, average, etc. for each different value of the specified field to group by. Smart ! I'll look into it.

I'm wondering if we could find a better name than Group by. For me Group by by itself doesn't mean much, especially since I am not a experienced SQL query writer. Also the group by operation seems to always be followed by a calculation function. I find most of the transformations name confusing currently, so I'm trying to find the best self explanatory name for this one. Would it make it clearer for users to name it Group by and Reduce ?

@oddlittlebird
Copy link
Contributor

Looks like this is still in progress. Please tag me when it's time for me to review docs.

@mckn
Copy link
Contributor

mckn commented Jun 11, 2020

Hello,
I understand what you mean. This would allow to do more than just count the number of occurrences, but also calculate the min, max, average, etc. for each different value of the specified field to group by. Smart ! I'll look into it.

I'm wondering if we could find a better name than Group by. For me Group by by itself doesn't mean much, especially since I am not a experienced SQL query writer. Also the group by operation seems to always be followed by a calculation function. I find most of the transformations name confusing currently, so I'm trying to find the best self explanatory name for this one. Would it make it clearer for users to name it Group by and Reduce ?

Super! This is great 🙌

Regarding the naming I see your point and I agree and disagree at the same time 🙈 For people used to the concept of group by from other products/languages this behavior is expected. So in that sense it is good to follow the classic way of doing a group by. So for them naming it something else might be confusing.

But for people not used to the details of that concept it might be confusing. That is why we have a description text below the name of the transform. So we can give the user a short description of what the transform does.

So I think naming it group by to start, get some feedback from the community and later change it if needed is the way to go. If that makes sense?

@torkelo
Copy link
Member

torkelo commented Jun 15, 2020

created an issue for the group by: #25498

@stale
Copy link

stale bot commented Jun 29, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 2 weeks. It will be closed in 30 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale Issue with no recent activity label Jun 29, 2020
@Totalus
Copy link
Contributor Author

Totalus commented Jun 29, 2020

Work in progress. I still have a few additional commits to make before pushing for a review.

@stale stale bot removed the stale Issue with no recent activity label Jun 29, 2020
@mckn
Copy link
Contributor

mckn commented Jun 30, 2020

Work in progress. I still have a few additional commits to make before pushing for a review.

Awesome! No stress, please let me know if I can help out with something :)

@mckn mckn added area/dataframe area/transformations pr/external This PR is from external contributor labels Jul 7, 2020
@torkelo
Copy link
Member

torkelo commented Aug 25, 2020

Can't you limit the number of fields in your query? or in the filter transform?

@Totalus
Copy link
Contributor Author

Totalus commented Aug 25, 2020

Can't you limit the number of fields in your query? or in the filter transform?

I'm not experiencing this use case personally, and yes you could filter out the fields in another prior transform or in the query itself.

Look, if you need it to be that way, I'll make it that way. At this point its just a mater of personal preferences and I don't want to debate over that. Plus I'm getting a tired of working on this PR.

@torkelo
Copy link
Member

torkelo commented Aug 25, 2020

Look, if you need it to be that way, I'll make it that way. At this point its just a mater of personal preferences and I don't want to debate over that. Plus I'm getting a tired of working on this PR.

Totally understand that! Sorry, just trying to refine the UX for this. We can take over all the remaining tweaks! super grateful for all the work you have put in!

@Totalus
Copy link
Contributor Author

Totalus commented Aug 26, 2020

There you go.

snapshot

A remove button also appears at the end of a row if the field is not present anymore in the input data, giving the user the option to remove it.

@mckn
Copy link
Contributor

mckn commented Aug 26, 2020

There you go.

snapshot

A remove button also appears at the end of a row if the field is not present anymore in the input data, giving the user the option to remove it.

@Totalus

Really awesome work! I like the latest version of the UI and think we can merge this one. Get some feedback and change it afterward if needed. I will do a small adjustment to make the UI more intuitive but lets merge it after that one.

Great work! Really appreciate it and sorry for not being so responsive last couple of weeks. Have had focus on another project that have taken a bit to much time.

Copy link
Contributor

@mckn mckn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you

@torkelo
Copy link
Member

torkelo commented Aug 26, 2020

Awesome!

The UI editor needs more work, think we need to handle longer field names better (maybe use table layout to get field names to align). And the FieldCalculationsSelector is untyped. And think it should show fields in the same order as the data (currently it sorts field names).

Would like to test using a radio button group (https://developers.grafana.com/ui/latest/index.html?path=/story/forms-radiobuttongroup--full-width) for the group by | calculate | ignore selector.

Then there is the question of showing fields that exist in options but not in data, what do you think should the user have to manually remove those? Or automatically when they any option change?

@Totalus
Copy link
Contributor Author

Totalus commented Aug 26, 2020

The UI editor needs more work, think we need to handle longer field names better (maybe use table layout to get field names to align). And the FieldCalculationsSelector is untyped. And think it should show fields in the same order as the data (currently it sorts field names).

@torkelo At this point, I'll let you tweak it up to your taste.

Would like to test using a radio button group (https://developers.grafana.com/ui/latest/index.html?path=/story/forms-radiobuttongroup--full-width) for the group by | calculate | ignore selector.

That is going to take up more horizontal space though, up to you.

Then there is the question of showing fields that exist in options but not in data, what do you think should the user have to manually remove those? Or automatically when they any option change?

At first I was going to remove them automatically, but then I figured that if the user is configuring his query and transforms, and for some reason he decides to change a bit his query and it ends up returning no data, then he would loose all his transofmation configuration automatically, which I wanted to avoid. A possible way would be to remove the absent fields if the input data is not empty, or something like that. That could work too.

We could also improve the current behavior if we keep it, and make sure only the absent fields that were not ignored are kept in the list, with an option to remove them.

Anyway, at this point, I'm letting you finish it up.

@mckn
Copy link
Contributor

mckn commented Aug 27, 2020

The UI editor needs more work, think we need to handle longer field names better (maybe use table layout to get field names to align). And the FieldCalculationsSelector is untyped. And think it should show fields in the same order as the data (currently it sorts field names).

@torkelo At this point, I'll let you tweak it up to your taste.

Would like to test using a radio button group (https://developers.grafana.com/ui/latest/index.html?path=/story/forms-radiobuttongroup--full-width) for the group by | calculate | ignore selector.

That is going to take up more horizontal space though, up to you.

Then there is the question of showing fields that exist in options but not in data, what do you think should the user have to manually remove those? Or automatically when they any option change?

At first I was going to remove them automatically, but then I figured that if the user is configuring his query and transforms, and for some reason he decides to change a bit his query and it ends up returning no data, then he would loose all his transofmation configuration automatically, which I wanted to avoid. A possible way would be to remove the absent fields if the input data is not empty, or something like that. That could work too.

We could also improve the current behavior if we keep it, and make sure only the absent fields that were not ignored are kept in the list, with an option to remove them.

Anyway, at this point, I'm letting you finish it up.

I will help out making these changes :)

@mckn
Copy link
Contributor

mckn commented Aug 27, 2020

Example with radio button group:

Skärmavbild 2020-08-27 kl  07 29 43

@mckn mckn requested review from torkelo and removed request for torkelo August 28, 2020 04:32
@torkelo
Copy link
Member

torkelo commented Aug 28, 2020

The 100% width of the calculation select results in a bit broken layout.
Screenshot from 2020-08-28 15-42-31

Sorry for so many meandering UX feedback thoughts, this has been harder to find a UX for than I thought. Not 100% happy still.

Have potentially one more UX idea to try.

Mockup

Group by: Region, Data center   | + | 
Select: Max(Sensor1), Last(Sensor1), Max(Sensor2)   | + |

So similar to how it was before :) but I feel like we have explored and gone back and forth with this too much already to lets merge this and explore UI change in the future.

@mckn
Copy link
Contributor

mckn commented Aug 31, 2020

The 100% width of the calculation select results in a bit broken layout.
Screenshot from 2020-08-28 15-42-31

Sorry for so many meandering UX feedback thoughts, this has been harder to find a UX for than I thought. Not 100% happy still.

Have potentially one more UX idea to try.

Mockup

Group by: Region, Data center   | + | 
Select: Max(Sensor1), Last(Sensor1), Max(Sensor2)   | + |

So similar to how it was before :) but I feel like we have explored and gone back and forth with this too much already to lets merge this and explore UI change in the future.

Hmm, that sounds like a good plan! Probably something we can try to get some UX feedback on. I will have a look at the long field name issue that you posted above.

@mckn mckn merged commit 7db42f0 into grafana:master Aug 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Display the number of occurence of each value of a field
5 participants