[Proposal] [Vis Builder] Aggregation Persistence in VisBuilder #3482

Closed
abbyhu2000 opened this issue Feb 22, 2023 · 8 comments

@abbyhu2000
Member

As a result of the research task on aggregation persistence for Vis Builder, #2900 (comment), I propose the following:

Proposal: Metric data to metric data, bucket data to bucket data

After implementing global query persistence and app persistence, Vis Builder should also be able to persist aggregations across compatible visualization types, and ideally, as far as possible, between incompatible visualization types.

All aggregation schema fields fall into one of two categories: metric and bucket. A metric field holds numerical data, and a bucket field holds categorical data. Since numerical and categorical data tend to serve different purposes in a visualization, another approach is to map every metric field to a metric field and every bucket field to a bucket field.

Implementation idea:

Each schema field has a group property, which is either AggGroupNames.Metrics or AggGroupNames.Buckets. We can collect one list of the aggregations that belong to the metrics group and another list for the bucket group, then map them to the new visualization type’s metrics group and bucket group.

schemas: new Schemas([
  {
    group: AggGroupNames.Metrics,
    ...
    min: 1,
    max: 3,
  },
  {
    group: AggGroupNames.Buckets,
    ...
    min: 0,
    max: 1,
  },
]),

export const AggGroupNames = Object.freeze({
  Buckets: 'buckets' as 'buckets',
  Metrics: 'metrics' as 'metrics',
  None: 'none' as 'none',
});
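
As a rough illustration of the "collect two lists" step, here is a minimal sketch. The `SchemaField` and `Aggregation` shapes and the `partitionByGroup` helper are hypothetical simplifications made up for this example; they are not the actual Vis Builder types, which carry more properties.

```typescript
// Hypothetical, simplified shapes (assumptions, not the real Vis Builder types).
type AggGroup = 'metrics' | 'buckets' | 'none';

interface SchemaField {
  name: string; // schema field name, e.g. 'metric' or 'segment'
  group: AggGroup;
  min: number;
  max: number;
}

interface Aggregation {
  id: string;
  schema: string; // name of the schema field this aggregation is assigned to
}

// Split the current aggregations into a metric list and a bucket list,
// based on the group of the schema field each aggregation belongs to.
function partitionByGroup(
  schemas: SchemaField[],
  aggs: Aggregation[]
): { metrics: Aggregation[]; buckets: Aggregation[] } {
  const metrics: Aggregation[] = [];
  const buckets: Aggregation[] = [];
  for (const agg of aggs) {
    const field = schemas.filter((s) => s.name === agg.schema)[0];
    if (!field) continue; // aggregation points at an unknown field: skip it
    if (field.group === 'metrics') metrics.push(agg);
    else if (field.group === 'buckets') buckets.push(agg);
    // fields in the 'none' group are ignored in this sketch
  }
  return { metrics, buckets };
}
```

The relative order of the aggregations is preserved in both lists, which matters for any mapping rule that depends on order.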

Pros & Cons:

  • Pros: The rules are simple to follow, and the approach is scalable since every schema field belongs to one of the two groups.

  • Cons:

    • Some aggregation mappings might not make sense after switching to a new visualization type, which could make for a confusing user experience.
    • Further mapping rules are needed, since each schema might have multiple metric fields and multiple bucket fields, and each field might have different min and max bounds. We need to define the order of the mappings and what happens if there are more aggregations than can be mapped.

Implementation reason:

Since a metric field is mostly for displaying numerical data, and a bucket field is mostly for separating data into groups depending on how a visualization can be split up, I think it makes sense to map the aggregations to fields according to their function. For aggregations that were previously in a metric group, the user’s intent is probably just to display that data against some type of unit. If we map them into a metric group in another visualization type, the data will still be displayed, just in a different format. For aggregations that were previously in a bucket group, the user’s intent is probably to break the global data into separate groups and see whether any patterns exist in each group. If we map those into a new bucket group, we are still following the user’s intent of separating the global data into groups.

Mapping rules:

  • Collect a list of the aggregations that are in a metric group, and a list of the aggregations that are in a bucket group.
  • For aggregations that previously belonged to a metric group:
    • First, check whether any of the new metric fields have the same name. If so, map the aggregations to those new metric fields and drop the ones that exceed the max count.
    • Second, for aggregations whose metric field has no same-named counterpart, add them to the new metric field with the highest max count allowed, and drop the ones that can no longer be mapped to any metric field.
  • For aggregations that previously belonged to a bucket group:
    • First, check whether any of the new bucket fields have the same name. If so, map the aggregations to those new bucket fields and drop the ones that exceed the max count.
    • Second, for aggregations whose bucket field has no same-named counterpart, add them to the new bucket field with the highest max count allowed, and drop the ones that can no longer be mapped to any bucket field.
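
The two-pass rule above could be sketched roughly as follows. This is one possible reading of the rules, not the actual implementation: `NewField`, `OldAgg`, and `mapByNameThenCapacity` are hypothetical names, and the choice to spill into the next-largest field once the largest one is full is an assumption. The same function would be called once for the metric list and once for the bucket list.

```typescript
// Hypothetical shapes for illustration only.
interface NewField {
  name: string; // field name in the new visualization type
  max: number;  // max number of aggregations the field accepts
}

interface OldAgg {
  label: string; // e.g. 'day of week'
  field: string; // name of the field it belonged to in the old vis type
}

function mapByNameThenCapacity(
  oldAggs: OldAgg[],
  newFields: NewField[]
): Record<string, string[]> {
  const result: Record<string, string[]> = {};
  for (const f of newFields) result[f.name] = [];

  // Pass 1: map to a same-named field; drop whatever exceeds its max count.
  const unmatched: OldAgg[] = [];
  for (const agg of oldAggs) {
    const sameName = newFields.filter((f) => f.name === agg.field)[0];
    if (!sameName) {
      unmatched.push(agg);
    } else if (result[sameName.name].length < sameName.max) {
      result[sameName.name].push(agg.label);
    }
    // otherwise the same-named field is full and the aggregation is dropped
  }

  // Pass 2: place the rest into the field with the highest max count first;
  // anything that fits nowhere is dropped.
  const byCapacity = newFields.slice().sort((a, b) => b.max - a.max);
  for (const agg of unmatched) {
    for (const f of byCapacity) {
      if (result[f.name].length < f.max) {
        result[f.name].push(agg.label);
        break;
      }
    }
  }
  return result;
}
```

With three bucket aggregations coming from fields named X-axis, Split series, and Split chart, and a target whose bucket fields have no matching names, all three land in whichever target field has the highest max count.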

Mapping example:

Here is an example for Bar:
Y-axis: Unique count of flight delay + Unique count of flightTimeHour
X-axis: timestamp per hour
Split series: day of week/descending
Split chart: flight delay/descending

If we switch from Bar to Line chart:
Y-axis: Unique count of flight delay + Unique count of flightTimeHour
X-axis: timestamp per hour
Split series: day of week/descending
Split chart: flight delay/descending
Radius: None

If we switch from Bar to Table vis:
Metric: Unique count of flight delay + Unique count of flightTimeHour
Split rows: timestamp per hour + day of week/descending + flight delay/descending
Split table in rows:
Split table in columns:

If we switch from Bar to Metric:
Metric: Unique count of flight delay + Unique count of flightTimeHour
Split groups: timestamp per hour + day of week/descending + flight delay/descending

UI/UX proposal:

To avoid over-engineering and a confusing user flow, I propose that we keep the mapping rules simple and scalable, and give users the option to turn aggregation persistence on or off. Some options:

  • Persist by default and remove the ‘Change visualization type’ pop-up window.
  • Persist by default, and provide a mechanism/button for users to reset the page.
  • Add a toggle on the Vis Builder page that lets users turn this feature on or off.
  • On the pop-up window (as shown below) that appears after the user switches the visualization type, add another button or toggle that says "Change type and persist current aggregations".

[Screenshot: the ‘Change visualization type’ pop-up window]

@abbyhu2000
Member Author

@KrooshalUX Could you please provide some insight on this? For the UI/UX section of the proposal, if we are going to implement the aggregation persistence on the vis builder page when we switch visualization type, what do you think we should do on the UI and user experience?

@abbyhu2000
Member Author

@joshuarrrr @ashwin-pc @kavilla @ananzh Could you guys provide your insights on the above proposed mapping rules?

@ashwin-pc
Member

I like the proposed solution here. Something you might want to play around with, to see if it makes more sense, is this rule:

> Second, for the metric/bucket fields that do not have same name, start adding them to the metric/bucket field that have the most max count allowed, and drop the ones that can no longer be mapped to any metric/bucket field.

I wonder if going by the order in the schema would be preferable to going by the max count, since it's sometimes possible for a schema field to have a high max count but be a less important breakdown than one higher up in the list.

@abbyhu2000
Member Author

abbyhu2000 commented Feb 23, 2023

> I like the proposed solution here. Something you might want to play around with to see if it makes more sense is:
>
> Second, for the metric/bucket fields that do not have same name, start adding them to the metric/bucket field that have the most max count allowed, and drop the ones that can no longer be mapped to any metric/bucket field.
>
> I wonder if going by the order in the schema would be preferable here instead of the max count, since it's sometimes possible that the max count could be high for a schema but it's a less important breakdown than the one higher up in the list

Now that I am implementing this, I do think mapping by order makes more sense. Here are some more questions, @ashwin-pc:

  1. Should we assume, for now, that there will only be two groups, metric and bucket? I think we can, because every field should belong to exactly one of the two groups.
  2. When new vis types are added, since we implement the persistence mapping by order, should we define some rules for schema developers to follow? For example, add a README stating:
    • define metric schema fields before bucket schema fields
    • define more important schema fields before less important ones

@abbyhu2000
Member Author

abbyhu2000 commented Feb 23, 2023

Updated proposed mapping rules:

  • Collect a list of the aggregations that are in a metric group, and a list of the aggregations that are in a bucket group.
  • For aggregations that previously belonged to a metric group:
    • Map metric fields according to their order in the schema: the first field from the old vis maps to the first field of the new vis, the second to the second, and so on. If the two fields have different max counts, drop the extra aggregations that cannot be mapped. (Instead of mapping the extra ones to the next available metric field, I think it makes more sense to drop them, because I assume each field serves its own purpose, so we should map all the aggregations within one field strictly to one other field.)
    • If the old vis has more metric fields than the new vis, drop all the aggregations from the extra metric fields.
  • For aggregations that previously belonged to a bucket group:
    • Map bucket fields according to their order in the schema: the first field from the old vis maps to the first field of the new vis, the second to the second, and so on.
    • If the two fields have different max counts, drop the extra aggregations that cannot be mapped.
    • If the old vis has more bucket fields than the new vis, drop all the aggregations from the extra bucket fields.
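
A minimal sketch of the order-based rule, read strictly as "field i maps to field i", might look like this. `OldField`, `TargetField`, and `mapByOrder` are hypothetical names for illustration, and the function is meant to be called separately for the metric fields and the bucket fields.

```typescript
// Hypothetical shapes for illustration only.
interface OldField {
  name: string;
  aggs: string[]; // aggregations currently assigned to this field, in order
}

interface TargetField {
  name: string;
  max: number; // max number of aggregations the field accepts
}

// Map old field i to new field i. Aggregations beyond the new field's max
// count are dropped, and old fields with no counterpart are dropped entirely.
function mapByOrder(
  oldFields: OldField[],
  newFields: TargetField[]
): Record<string, string[]> {
  const result: Record<string, string[]> = {};
  newFields.forEach((target, i) => {
    const source = oldFields[i];
    result[target.name] = source ? source.aggs.slice(0, target.max) : [];
  });
  return result;
}
```

For instance, a single old metric field holding A, B, C, D, E mapped onto a target metric field with a max count of 3 keeps A, B, C and drops D and E.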

@abbyhu2000
Member Author

abbyhu2000 commented Feb 24, 2023

Examples:

Switching among Histogram, Line, and Area is pretty straightforward. Below are some more confusing cases of aggregation persistence, involving Metric and Table:

If the Table vis type has aggregations such as:
Metric: A, B, C, D, E
Split Rows: F, G, H, I, J, K, L, M, N
Split table in rows: O
Split table in columns: P

If we switch to Metric:
Metric: A, B, C, D, E
Split group: F

If we switch to Area:
Y-axis: A, B, C (drop D and E since Y-axis is the only metric group and the max count is 3)
X-axis: F
Split series: G, H, I
Split chart: J

If we switch to Line:
Y-axis: A, B, C (drop D and E since Y-axis is the only metric group and the max count is 3)
X-axis: F
Split series: G, H, I
Split chart: J
Dot size: K

Here we assume that users have entered their aggregations in the order of the schema, meaning they finish entering aggregations for the Y-axis and then move on to the next field, the X-axis. Maybe we also need to tell users that, for the best persistence experience, they should enter their aggregations in the order of the schema fields? @ashwin-pc
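
The bucket results in the Table examples above (G, H, I under Split series rather than O) look as if the old bucket aggregations were flattened in schema order and then poured into the new fields one after another. That is an inference from the examples, not something stated in this thread, and it differs from a strict field-to-field reading of the rules. Under that assumption, a sketch could look like this; `Slot` and `flattenAndFill` are hypothetical names, and the max counts used in the usage note (1, 3, 1) are read off the Area example.

```typescript
// Hypothetical shape for illustration only.
interface Slot {
  name: string;
  max: number; // max number of aggregations the field accepts
}

// Pour a flat, schema-ordered list of aggregations into the new fields in
// order, filling each field up to its max count. Leftovers are dropped.
function flattenAndFill(
  aggsInSchemaOrder: string[],
  newFields: Slot[]
): Record<string, string[]> {
  const result: Record<string, string[]> = {};
  let cursor = 0;
  for (const field of newFields) {
    const taken = aggsInSchemaOrder.slice(cursor, cursor + field.max);
    result[field.name] = taken;
    cursor += taken.length;
  }
  return result;
}
```

Calling `flattenAndFill` with the Table bucket aggregations F through P and the Area bucket fields X-axis (max 1), Split series (max 3), and Split chart (max 1) gives X-axis: F, Split series: G, H, I, and Split chart: J, with K through P dropped.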

@ashwin-pc
Member

@abbyhu2000 I'd suggest just documenting this in the code as a comment since this is important only to the visualization type authors

@ashwin-pc
Member

@abbyhu2000 can we close this issue now?
