📈 New "Dashboard" tab - Design Considerations #961

Closed

JGreenlee opened this issue Aug 25, 2023 · 66 comments

@JGreenlee

A couple of months ago, we discussed this in #922 and created some wireframes.
Now that Abby and I are implementing this Dashboard rewrite in e-mission/e-mission-phone#1018, I am starting a new issue to continue the discussion.

These are the wireframes from #922, copied here for convenience:

Here are some initial drafts of the implementation:

[screenshots: three initial drafts of the implementation]

Some things to note:

  • the carousel mechanism is working really well and makes it way clearer that there are additional cards to swipe to
  • units of measurement on the graphs are scaled wrong

There are a few things we need to stop and consider.

Daily / Weekly / Monthly / Yearly interval

The old Dashboard has options to change the basis of time on which these metrics are represented:

  1. Concretely, what does this do?

  2. Do we need to support it? Why not just fetch the data on a daily basis and segment it into week / months that way? Does that put extra stress on the server?

Active minutes

The wireframes showed active minutes per day as a chart. However, the CDC recommendation is on a weekly basis (150 minutes of moderate intensity).
I think that weekly goals are generally more appropriate for this, so I am suggesting that we pivot to a simple comparison of "past week" vs "previous week", each with stacked 'walk' and 'bike' totals.
Then, we can put the target line at 150 and visually see whether your cumulative active minutes reached 150.

Then, I think we should have a separate card to the right of this (reachable by swiping the carousel) that breaks this down by (i) high-intensity, (ii) moderate-intensity, (iii) low-intensity.
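If it helps, here's roughly what I'm picturing for the weekly comparison card. This is a minimal sketch: the data values are made up, and it assumes react-chartjs-2 with chartjs-plugin-annotation (v2) for the target line.

```jsx
import React from 'react';
import { Chart, registerables } from 'chart.js';
import annotationPlugin from 'chartjs-plugin-annotation';
import { Bar } from 'react-chartjs-2';

Chart.register(...registerables, annotationPlugin);

const WEEKLY_TARGET_MINS = 150; // CDC: 150 min/week of moderate activity

// hypothetical data shape: stacked walk/bike totals for the two weeks
const data = {
  labels: ['Previous week', 'Past week'],
  datasets: [
    { label: 'Walk', data: [85, 95], backgroundColor: '#57a773' },
    { label: 'Bike', data: [40, 70], backgroundColor: '#4a8fb8' },
  ],
};

const options: any = {
  scales: { x: { stacked: true }, y: { stacked: true } },
  plugins: {
    annotation: {
      annotations: {
        target: {
          type: 'line',
          yMin: WEEKLY_TARGET_MINS,
          yMax: WEEKLY_TARGET_MINS,
          borderDash: [8, 8], // dashed target line
          label: { content: `Target: ${WEEKLY_TARGET_MINS} min`, display: true },
        },
      },
    },
  },
};

export const ActiveMinutesCard = () => <Bar data={data} options={options} />;
```

The stacked walk/bike bars against a dashed 150-minute line should make the weekly goal legible at a glance.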

Average speed

We receive average speed metrics from the server. These appear to be the average speed per mode per day.

So if on Monday I walk to the bank at 4 mph and return at 2 mph, my average for Monday is 3 mph.
Then on Tuesday, I walk to the store at 3 mph and return at 5 mph, so my average for Tuesday is 4 mph.

But to get my average across both days, we can't just take these two figures (3 mph and 4 mph) and average them together to get 3.5 mph.
What if the walk to the store took 20 minutes, while the walk to the bank took only 10?
Then mathematically, my average speed across those days was not 3.5 mph; it was greater than that.

The proper way to calculate my average walking speed for Monday and Tuesday is to divide my total walking distance on those days by my total walking duration on those days.
Those are two metrics that we already have - so I don't think we actually have any use for the speed metrics that we get from the server.
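For illustration, a minimal sketch of that calculation (the types and field names are hypothetical, but it only uses the distance and duration metrics we already have):

```ts
// Per-day metrics for one mode; units assumed to be meters and seconds
type ModeDayMetrics = { distance: number; duration: number };

function avgSpeed(days: ModeDayMetrics[]): number {
  const totalDistance = days.reduce((sum, d) => sum + d.distance, 0);
  const totalDuration = days.reduce((sum, d) => sum + d.duration, 0);
  return totalDuration > 0 ? totalDistance / totalDuration : 0; // m/s
}

// Averaging the per-day averages (3 mph and 4 mph -> 3.5 mph) over-weights
// the shorter day; dividing total distance by total duration weights each
// day by how long I actually walked.
```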

@Abby-Wheelis
Member

We receive average speed metrics from the server. These appear to be the average speed per mode per day.

I am actually bumping up against this same issue as I start to boil down the metrics from the server into the formats that FootprintHelper accepts (I plan to continue using FootprintHelper, at least for now, to get the carbon footprints). At a high level, data from the server is parsed into "modeMaps", which are then aggregated into "modeSummaries". The previous footprint card seems to rely only on distance summaries, but the mean_speed summary was still calculated in the mathematically improper way Jack describes: summing across days for each mode and dividing by the number of days.

So, not a blocker on my current goal (rough draft of a carbon footprint card), but another place to consider this change; and now that I'm writing it out, we might want to centralize this data mapping for use across cards.

@shankari
Contributor

Do we need to support it? Why not just fetch the data on a daily basis and segment it into week / months that way? Does that put extra stress on the server?

Intuitively, yes, but I don't think we have ever quantified the savings.
Think about it: reading a year's worth of data, segmented as days, will return 365 * 4 (number of metrics) results.
Reading a year's worth of data, segmented as months, will return only 12 * 4.

The general rule of thumb is that as the number of entries that you retrieve grows, collating on the server is going to be much more performant than collating on the client.

However, I take your point that it may not need to be exposed to the user. The original goal was just to provide a user-version of the server API. I am open to not exposing this, and choosing it automatically under the hood based on the range selected, for example.
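For example, something like this sketch (the thresholds and names are hypothetical, not an existing API):

```ts
// Pick the server's grouping frequency from the selected range, so long
// ranges stay cheap to retrieve while short ranges keep day-level detail.
type Freq = 'daily' | 'monthly' | 'yearly';

function freqForRange(rangeDays: number): Freq {
  if (rangeDays <= 90) return 'daily';        // up to ~3 months: day-level bars
  if (rangeDays <= 2 * 365) return 'monthly'; // up to 2 years: ~24 bars max
  return 'yearly';
}
```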

Do you have a concrete proposal?

@shankari
Contributor

I think that weekly goals are generally more appropriate for this, so I am suggesting that we pivot to a simple comparison of "past week" vs "previous week", each with stacked 'walk' and 'bike' totals.
Then, we can put the target line at 150 and visually see whether your cumulative active minutes reached 150.

Then, I think we should have a separate card to the right of this (reachable by swiping the carousel) that breaks this down by (i) high-intensity, (ii) moderate-intensity, (iii) low-intensity.

Sounds good to me.

We receive average speed metrics from the server. These appear to be the average speed per mode per day.

I think the only reason we even include the average speed in the metrics is because the METs (and thus our new mild/moderate/high exercise counts) depend on the average speed for the active transportation mode - walking at 5 mph is very different, from an exertion perspective, than walking at 2 mph.

And once we got the metrics, we just treated them like all the other metrics. I am not sure that the mean speed is useful outside the context of METs (I never use it), so we could just omit it from the visualizations.

@shankari
Contributor

All in all, this looks great; look forward to seeing it on staging soon!

@Abby-Wheelis
Member

I've been working through pulling the data into a new card for carbon footprint, and so far I have the 2030 and 2050 goals alongside the user's low and high estimates for the current and past weeks (see screenshot; user metrics would show as a range if there were unlabeled trips). Our existing implementation also includes a "vs all taxi" and an "average for group" metric.

I think we should be really intentional here about showing users "how well" they're doing in terms of having a low carbon footprint and/or lowering their footprint. To that end, I like the plan of making the "% change" a central display, with the 2030 and 2050 lines as goals on the graph. This would leave user current week, user previous week, average for group, and all taxi as candidates for the main bars on the graph.

I can see the value in "vs all taxi" (to show savings) and "average for group" (for competitive users), but I think it opens a conversation about error bars and how to calculate and show those. For most of the metrics as they are calculated now, we maintain a "low" and "high" estimate, and that's what is used to show users the range. How would that translate to error bars? My instinct would be to show the bar's value as the middle of the range, with the error bars as the low-high estimates, but I'm not sure that's statistically sound.

The aggregate metrics might throw off the scale of the carbon card (on nrel-commute, for example, the range right now is 0-484), and if the scaling is wonky then it's hard to make sense of the graph at all. I might propose a toggle to see the group metrics on the carbon card, but I'm unsure how much interest there would be in that. Does anyone have a suggestion on how we should show group metrics in terms of carbon?

@shankari
Contributor

For most of the metrics as they are calculated now, we maintain a "low" and "high" estimate, and that's what is used to show users the range. How would that translate to error bars? My instinct would be to show the bar's value as the middle of the range, with the error bars as the low-high estimates, but I'm not sure that's statistically sound.

The "error bars" project will generate an estimated value along with the range. At least for CO2, that range will be 1 SD. For some of the other metrics, such as distance and duration, it will likely be 1 variance. That is in fact statistically sound 😄 (@humbleOldSage, @rahulkulhalli, @allenmichael099)

Also, given that we are redesigning this anyway and are funded by the Department of Energy, we might want to show energy and emissions separately. Right now, given the state of the US grid, energy and emissions are proportional. However, there are significant grid decarbonization efforts ongoing (many of which NREL is very involved in), so once we implement #954 they may start to diverge in a year or so.

@JGreenlee
Author

Until we have the estimated values available to use, maybe for now we can use stacked bar charts to show uncertainty.

The range between the "low" and "high" estimates can be shown in a lighter color, or potentially with slashed lines, to represent indeterminacy.

This page illustrates the kind of visualization I mean:
https://setproduct.com/charts/templates
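As a rough sketch of the data shape (hypothetical numbers; the "uncertain" segment is just the high-low remainder stacked on the low estimate):

```ts
const lowKgCo2 = 12.7;  // from labeled trips only (illustrative)
const highKgCo2 = 19.4; // if unlabeled trips were all high-carbon (illustrative)

const datasets = [
  {
    label: 'Certain',
    data: [lowKgCo2],
    backgroundColor: 'rgba(42, 111, 154, 1)',
  },
  {
    label: 'Uncertain',
    data: [highKgCo2 - lowKgCo2], // stacked remainder of the range
    backgroundColor: 'rgba(42, 111, 154, 0.35)', // lighter shade = indeterminate
  },
];
// paired with options: { scales: { x: { stacked: true }, y: { stacked: true } } }
```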

@JGreenlee
Author

I am not sure what to do about getting these "low-", "moderate-", and "high-intensity" active minute counts.

To do this correctly, we would need trip-level information, not just day-level or week-level information. If I go for a 10-minute sprint this morning and take a 30-minute stroll this evening, we should expect that to count as "10 minutes high intensity" and "30 minutes low intensity".

However, the only information we have access to is the total walk distance/duration/speed for the entire day. So the best we can do is say which days were high/moderate/low intensity.
A day like today, where I take a sprint and a stroll, might end up averaging out as "moderate intensity" for the entire day, and I think this would be misleading.

To get the desired result of "10 minutes high intensity and 30 minutes low intensity", we would need server-side changes.

@Abby-Wheelis
Member

The range between the "low" and "high" estimates can be shown in a lighter color, or potentially with slashed lines, to represent indeterminacy.

I like this idea; here's a first draft of including the carbon metrics as a chart:

I want to adjust the colors to the darker/lighter scheme you suggested, and I'm also hoping to move the change indicator to the header of the card, as it is in the wireframe, as well as move the goals to lines on the graph instead of their own bars.

If the 2030/2050 goals are horizontal lines, should they stay on the chart if they are a certain degree out-of-range of the user's data?

JGreenlee added a commit to JGreenlee/e-mission-phone that referenced this issue Aug 28, 2023
"Average speed" needs to be handled differently because it is not mathematically correct to average it across days without considering differences in distance and duration between those days.
(described in e-mission/e-mission-docs#961)

We can comment this out; maybe revisit later
@JGreenlee
Author

Looks like it is coming along great!

Is there a reason you opted to show this chart vertically? Our wireframes from before had these in a horizontal layout. I believe we made that choice in consideration of the "meter" metaphor that we were trying to convey.

But if you think the vertical layout is better, I'm happy to have additional discussion and weigh the pros and cons.

If the 2030/2050 goals are horizontal lines, should they stay on the chart if they are a certain degree out-of-range of the user's data?

I think so. The target lines are the most important point of reference - they give meaning to all the other measurements.
Even in a situation where a user's emissions are well below the goals (meaning the user is doing really well at reducing emissions), and their emission bars only show up as 5 pixels tall, I still think that is exactly what we should show.

@Abby-Wheelis
Member

Is there a reason you opted to show this chart vertically? Our wireframes from before had these in a horizontal layout. I believe we made that choice in consideration of the "meter" metaphor that we were trying to convey.

I think I was just most familiar with vertical charts, so I started there. I just flipped it, and looking at the goal lines that way, I think we should find a way to color the lines to convey that "less is more" here; my first instinct when I saw the 2030 goal off to the right was that it would be "better" if my bars were closer to it, which is not the case.

Even in a situation where a user's emissions are well below the goals (meaning the user is doing really well at reducing emissions), and their emission bars only show up as 5 pixels tall, I still think that is exactly what we should show.

I hadn't thought about it that way, but that totally makes sense. I'm curious to see how the interactivity to see the values of the bars works on a real phone; I think having that pop-up saves us from the "where'd it go" concern that I initially had.

@JGreenlee
Author

I think we should find a way to color the lines to convey that "less is more" here; my first instinct when I saw the 2030 goal off to the right was that it would be "better" if my bars were closer to it, which is not the case.

Brainstorming ideas for that:

  • tint the background of the chart with green-yellow-red, like a classic "meter" or "gauge"
    -- green = below 2050 goal; yellow = between 2050 and 2030 goal; red = over 2030 goal
    -- but it might be ugly to overlay bars on top of a colored chart background
  • just color the bars themselves as green-yellow-red, depending on whether they pass the targets
    -- I don't know if this will get the point across
    -- but we could add some icons or emojis (⚠️❗) to make it clearer
    -- uncertainty makes it tricky to say whether we are past the target or not
  • at the bottom of the chart, show a meter with green-yellow-red similar to this

@Abby-Wheelis
Member

Abby-Wheelis commented Aug 29, 2023

I think coloring the bars or a meter across the bottom (2nd or 3rd bullet) could be good.

With the uncertainty, I think if the uncertainty pushes us past a goal it's fair to change the bar color, and this could serve as further motivation to label? I also like the idea of icons if we need them; we could even add those to the lines themselves, potentially marking them better as thresholds. Not sure how crammed that would make the line annotation labels, though.

I have tried turning the bars horizontal (by changing the isHorizontal param to true), but I couldn't get bars to show up at all that way. The line annotations and scales seem fine, and I haven't seen any errors. I haven't figured out why yet, but I'll keep looking into it. Update: this was because I forgot to flip x and y in the graph records when I flipped the boolean.
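For reference, the shape of the fix (a sketch, assuming Chart.js v3+; isHorizontal mirrors the param above, and the data values are illustrative):

```ts
// With Chart.js v3+, a horizontal bar chart uses indexAxis: 'y', and each
// record must then carry the value in x and the category in y -- the
// reverse of the vertical case.
const isHorizontal = true;

const weeks = [
  { label: 'Previous week', kgCo2: 14.1 },
  { label: 'Past week', kgCo2: 12.7 },
];

const records = weeks.map((w) =>
  isHorizontal
    ? { x: w.kgCo2, y: w.label }  // horizontal: value on x, label on y
    : { x: w.label, y: w.kgCo2 }, // vertical: label on x, value on y
);

const options = { indexAxis: isHorizontal ? 'y' : 'x' };
```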

@JGreenlee
Author

I am not sure what to do about getting these "low-", "moderate-", and "high-intensity" active minute counts.

To do this correctly, we would need trip-level information, not just day-level or week-level information. If I go for a 10-minute sprint this morning and take a 30-minute stroll this evening, we should expect that to count as "10 minutes high intensity" and "30 minutes low intensity".

However, the only information we have access to is the total walk distance/duration/speed for the entire day. So the best we can do is say which days were high/moderate/low intensity. A day like today, where I take a sprint and a stroll, might end up averaging out as "moderate intensity" for the entire day, and I think this would be misleading.

To get the desired result of "10 minutes high intensity and 30 minutes low intensity", we would need server-side changes.

Seeking advice on how to proceed with this. I think it would require server changes, so is it even worth implementing right now? Maybe we should revisit it later? Is there a suitable substitute we can implement in the meantime?

@shankari
Contributor

shankari commented Aug 29, 2023

Seeking advice on how to proceed with this. I think it would require server changes, so is it even worth implementing right now? Maybe we should revisit it later? Is there a suitable substitute we can implement in the meantime?

It is very dear to my heart but I think we should hold off on it for now.

Couple of options:

  • energy (see my note above about being funded by DOE)
  • cost (per suggestion from @idillon-sfl?)

The classic travel behavior drivers are cost and time, and we have time (although maybe not super visible), but no cost.

Both of them could start with a basic value/PkmT and just multiply and combine to give the value.
The energy values (which we infer emissions from) are at https://github.com/e-mission/em-public-dashboard/tree/main/viz_scripts/auxiliary_files
and there's a pending PR related to cost which I'm waiting for somebody to cleanup but which does have some reasonable estimates for the cost intensity (e-mission/em-public-dashboard#36)
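As a sketch of the multiply-and-combine idea (the intensity numbers below are placeholders, not the values from the auxiliary files or the cost PR):

```ts
// Per-mode intensity (value per passenger-km traveled) times distance,
// summed across modes, gives the total for the period.
const ENERGY_KWH_PER_PKM: { [mode: string]: number } = {
  drove_alone: 0.8,
  shared_ride: 0.4,
  bus: 0.5,
  walk: 0,
  bike: 0,
};

function totalEnergyKwh(distancesKm: { [mode: string]: number }): number {
  return Object.entries(distancesKm).reduce(
    (sum, [mode, km]) => sum + km * (ENERGY_KWH_PER_PKM[mode] ?? 0),
    0,
  );
}
```

The same shape works for cost; only the per-mode intensity table changes.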

@Abby-Wheelis
Member

The aggregate metrics might throw off the scale of the carbon card (on nrel-commute, for example, the range right now is 0-484), and if the scaling is wonky then it's hard to make sense of the graph at all. I might propose a toggle to see the group metrics on the carbon card, but I'm unsure how much interest there would be in that. Does anyone have a suggestion on how we should show group metrics in terms of carbon?

I went ahead and tried the toggle solution to this issue, and I think it works nicely, but I'm open to other suggestions and feedback!

[screen recording: iPhone 13 Pro simulator, 2023-08-29]

@JGreenlee
Author

JGreenlee commented Aug 30, 2023

It's a bit unclear to me what the 'group' option represents here. Is that the cumulative emissions for the entire group? Or is it representing the 'average user' in the group?

If it's cumulative, it doesn't make sense to show the goals there because those are on a per-capita basis.

If it's 'average user', I would rather see them stacked up against my own emissions. With me on one tab and 'average user' on another tab, it's hard for me to compare and see if I'm doing better than average.

@Abby-Wheelis
Member

Is that the cumulative emissions for the entire group? Or is it representing the 'average user' in the group?

It's supposed to be average, but some of the numbers (that I've seen in staging/production) don't make sense to me. "Average for Group" is the label now, and across my phones the values are: nrel-commute -> 15-638, stage-study -> 9-84, stage-program -> 9-84. The size of the nrel-commute numbers is why I felt the need to isolate the group away from the user: we're looking at more than 10x the 2030 goal there. Maybe the nrel-commute stats are some sort of crazy outlier, or maybe lots of people are flying?

If it's 'average user', I would rather see them stacked up against my own emissions. With me on one tab and 'average user' on another tab, it's hard for me to compare and see if I'm doing better than average.

I agree that this is probably the most reasonable way to present "aggregate carbon". I'll test a few of my opcodes in the morning and check on the data at different points in the process to make sure that the intended result (average user) is what's actually happening, or fix it if I find the average is getting lost somewhere.

Assuming we confirm the metrics are being averaged, and nrel-commute remains an outlier, would a condition to omit that bar if it's too high make sense? I'd think a cap of 3x the 2030 goal or the user's average (whichever is bigger) might make sense here, to keep the focus on the user's choices over the collective.

@Abby-Wheelis
Member

Abby-Wheelis commented Aug 30, 2023

and check on the data at different points in the process to make sure that the intended result (average user) is what's actually happening

I traced through the aggregate metric calculations, and at a high level the "averaging" takes place when the total for a mode for a day is divided by nUsers in the process of formatting the data from the metrics; these are then summarized for each mode by summing across the days. I sketched this out below. I'm confused about what "group" is in this context, since nUsers fluctuates day-to-day. What is nUsers (gotten from the server as a part of each metric)? Is it the number of people that labeled in a given day, or the total number of opcodes that exist? I looked at the server code, but couldn't find the definition of getDistinctUserCount.
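For reference, the averaging I traced boils down to something like this (the data shape is illustrative):

```ts
// Each day's total for a mode is divided by that day's nUsers, then the
// per-day per-capita results are summed across the period.
type AggDay = { total: number; nUsers: number };

function groupAverage(days: AggDay[]): number {
  return days.reduce((sum, d) => sum + d.total / d.nUsers, 0);
}
```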

@Abby-Wheelis
Member

I think we should find a way to color the lines to convey that "less is more" here; my first instinct when I saw the 2030 goal off to the right was that it would be "better" if my bars were closer to it, which is not the case.

This is probably the most simplistic way to show that more emissions are bad, does it get the point across?

I'm not really sure how I would go about coloring the bars themselves, since the colors are tied to the labels. Right now I have "certain" and "uncertain", but I think it might make sense to change them to "labeled" and "unlabeled" to show that labeling will collapse the range. Or maybe "labeled trips" and "unlabeled trips", if that fits?

@JGreenlee
Author

Here's something I found:

https://stackoverflow.com/a/70377422

It looks like we can set backgroundColor to a callback function inside which we have access to the raw values

```jsx
<Bar ref={barChartRef}
  data={{datasets: chartData.map((e, i) => ({
    ...e,
    // cycle through the default palette, repeat if necessary
    backgroundColor: (ctx: any) => {
      console.debug("ctx", ctx);
      if (ctx.raw.x > 100) return 'red';
      return defaultPalette[i % defaultPalette.length];
    }
  }))}} />
```

We can get x and y values through ctx.raw.x and ctx.raw.y (among many other things inside ctx).

@JGreenlee
Author

This is probably the most simplistic way to show that more emissions are bad, does it get the point across?

Coloring the dotted lines is great! It's small, but I really think it helps a lot. I'm not sure the emojis are as effective, though.
I'm interested to see what colored lines + colored bars would look like together.

@JGreenlee
Author

JGreenlee commented Aug 30, 2023

It would be really cool if we could get the bars to "bleed" into red as they approach the 2030 goal, like this (but horizontal):
https://stackoverflow.com/questions/60679709/chart-js-add-gradient-to-bar-chart

Or, this example where the gradient covers the full spectrum of green-red:
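In Chart.js terms, that could look something like this sketch (assuming v3+ scriptable options; the color stops are placeholders):

```ts
// The fill "bleeds" from green toward red across the chart area, so bars
// that reach further toward the goal line pick up more red.
const gradientFill = (context: any) => {
  const { ctx, chartArea } = context.chart;
  if (!chartArea) return '#2ea44f'; // chart not laid out yet on first render
  // horizontal bars: gradient runs left (green) to right (red)
  const gradient = ctx.createLinearGradient(chartArea.left, 0, chartArea.right, 0);
  gradient.addColorStop(0, '#2ea44f');   // well under the goal
  gradient.addColorStop(0.7, '#d4a72c'); // approaching the goal
  gradient.addColorStop(1, '#cf222e');   // at or over the goal
  return gradient;
};

// used as: datasets: [{ ..., backgroundColor: gradientFill }]
```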

@Abby-Wheelis
Member

It would be really cool if we could get the bars to "bleed" into red as they approach the 2030 goal, like this (but horizontal):

That looks cool! I'll mess around with it when I get the chance; maybe there's some way to show the gradient + stacking to maintain the uncertainty? I think we need to keep the distinction between certain & uncertain (or labeled and unlabeled). I thought about doing the background as a green -> red gradient, but that might be visually overwhelming, and it would be hard to keep the goal lines as the color-transition points rather than allowing the gradient to take up the entire graph.

@JGreenlee
Author

Couple of options:

  • energy (see my note above about being funded by DOE)
  • cost (per suggestion from @idillon-sfl?)

The classic travel behavior drivers are cost and time, and we have time (although maybe not super visible), but no cost.

Both of them could start with a basic value/PkmT and just multiply and combine to give the value. The energy values (which we infer emissions from) are at https://github.com/e-mission/em-public-dashboard/tree/main/viz_scripts/auxiliary_files and there's a pending PR related to cost which I'm waiting for somebody to cleanup but which does have some reasonable estimates for the cost intensity (e-mission/em-public-dashboard#36)

Would it be ok to hold off on these for this release cycle so we don't get bogged down on the rest of the rewrite?

One easy thing we can do right now, with the metrics we already have, is to show daily active minutes for the past 1-2 weeks (likely as line chart(s)). Although not as rich as a breakdown by intensity, it does at least show the data in more granular chunks and gives the user more to explore in their data.

This way, we would have weekly active minutes on the front page, and swipable to right would be daily active minutes.

@shankari
Contributor

shankari commented Aug 30, 2023

Would it be ok to hold off on these for this release cycle so we don't get bogged down on the rest of the rewrite?

Sure. I am always in favor of incremental progress.
I want to do a bunch of unification of the dashboard inputs and outputs anyway, across public dashboard and individual dashboard, and we can do a better design then...

@JGreenlee
Author

Is there anything else we want to add for active minutes? Maybe more descriptives (as text)?

@Abby-Wheelis
Member

To that end, I've started adding text to the CarbonFootprint Card. Do we think it should be in the same "view" as the graph? It needs to be associated with the graph and easy to find, but we also want to keep the dashboard as viewer-friendly as possible. A simple route would be to move the change sticker to the left and include a toggle like some of the other cards have, but that would give up a place for the leaf like in the wireframe. Do we think the leaf is an important part of indicating what the card is for?

@JGreenlee
Author

I don't think the leaf is critical, but I do think it's most logical to have the change sticker to the right of "My Footprint".

"My Footprint | +4%" immediately makes sense, while I'm not sure that "+4% | My Footprint" does.

I'm a bit hesitant to have the text accessible only by toggling. In consideration of screen readers (which is a primary reason we even want the text), I think it's better to have the text directly in the DOM and not hidden behind a toggle.

But I agree we don't want the front card to be cluttered.

What do you think about putting the My Footprint card also in a carousel view, with a card to the right that has the text? It seems the carousel has been working really well for the other sections of the dashboard in allowing us to provide detailed information without cluttering the page vertically. And then the Dashboard would be more structurally unified - 3 sections, each with a row of cards to swipe through.

The drawback to this I see is that we would lose about 15-20 pixels of width on the My Footprint card. But I think we can afford that, especially if we can shorten, abbreviate, or lower the font size of the Y-axis titles.

@shankari
Contributor

shankari commented Sep 6, 2023

I am fine with dropping the leaf. The carousel sounds good in principle; it would be good to see what it looks like.

Abby-Wheelis pushed a commit to JGreenlee/e-mission-phone that referenced this issue Sep 6, 2023
e-mission/e-mission-docs#961 (comment) -> implementing this suggestion

isolate the text to a dedicated card, and place the "meter" card and "text" card in a carousel; now we have three rows, each a carousel.

also isolated data management functions shared across the two cards into `metricsHelper.ts`

The two-card layout keeps information easily accessible to those using a screen reader, while maintaining focus on the "meter" card and not cluttering the screen.
@Abby-Wheelis
Member

Carousel sounds good in principle, would be good to see what it looks like

Here are the two cards side-by-side in the carousel! I think Jack's right that a little bit of work on the y-axis titles would help keep the meter in perspective.

@JGreenlee
Author

Here is what I would suggest, then:

  • Abbreviate to "Prev. Week" and "Group Avg."
    • note that we'll also have to consider abbreviations for other languages (esp. Spanish, which I believe tends to be a bit more verbose)
  • Lower the font size of the axis titles by 1 or 2 pt
    • looks like scales.y.ticks.fontSize controls this (see the sketch below)
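A sketch of where that option lives; note that in Chart.js v3+ the tick font size is under ticks.font.size (ticks.fontSize was the v2 spelling):

```ts
const options = {
  scales: {
    y: {
      ticks: { font: { size: 10 } }, // down 2 pt from the default of 12
    },
  },
};
```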

@Abby-Wheelis Take a crack at it if you have time; otherwise I'll try it out later, since I'll be working several hours tonight.

@Abby-Wheelis
Member

If you have time, else I'll try it out later

I made your abbreviation suggestion in en.json, but have not changed the font size.

JGreenlee added a commit to JGreenlee/e-mission-phone that referenced this issue Sep 7, 2023
in an effort to give some more space to the chart itself
e-mission/e-mission-docs#961 (comment)
@Abby-Wheelis
Member

I'm confused about what "group" is in this context, since nUsers fluctuates day-to-day. What is nUsers (gotten from the server as a part of each metric)? Is it the number of people that labeled in a given day, or the total number of opcodes that exist? I looked at the server code, but couldn't find the definition of getDistinctUserCount.

I don't think I ever figured out where nUsers was coming from as used in calculating the group metrics. I did confirm that it goes up and down sometimes as time advances, so it's not just from new people joining the study. @shankari do you have any insight on where this "group size" is drawn from?

@shankari
Contributor

shankari commented Sep 14, 2023

I believe that nUsers is the number of unique users who are in the dataset that we are sending over.
https://github.com/e-mission/e-mission-server/blob/55704fc0296ba70d7af18351dd43ee1dcd88f50d/emission/analysis/result/metrics/time_grouping.py#L164

If you look at the code just below that, we support both mode_confirm and sensed_mode, which means that we shouldn't have filtered before. And we don't even send over only the filtered results, because we do send over unlabeled trips for mode_confirm. So it should be the number of users who had even one trip recorded.

So here's where we read the data
https://github.com/e-mission/e-mission-server/blob/55704fc0296ba70d7af18351dd43ee1dcd88f50d/emission/analysis/result/metrics/time_grouping.py#L47

and then we call adjust_userinputs, which clearly retains unlabeled trips, because we replace any unlabeled trip with the string "unlabeled"

and then we just find the number of users (in grouped_to_summary) without any further filtering

@shankari
Contributor

From the public dashboard, this is roughly what we should expect for NREL commute

[screenshots from the public dashboard, 2023-09-14: number of trips and number of miles]

@Abby-Wheelis
Member

I looked into the carbon footprint values, and I am seeing some differences, especially on user footprints for a given week, but I haven't figured out exactly what is happening. I checked FootprintHelper.getFootprintForMetrics against a sample input of [{key: "drove_alone", values: 100000}], and the output was exactly the input in km * the US intensity for a car, so I have reason to believe that function is working. There are some warnings in the logs about "other" modes, but that is from "unlabeled" distances, and the default value (0 for "low estimates" and the highest mode value for "high estimates") is used.

I think there might be a chance that the dates are fetched differently between the two implementations, but I need to keep walking through what's happening in the emulator tomorrow. The other thing I can think of is that somewhere in the summation of the distances there is an error, so I plan to examine that process tomorrow as well.

@JGreenlee
Author

To see if there are any inaccuracies in our calculations, I thought I'd do a side-by-side comparison between this branch and master to find discrepancies.

But interestingly, I found discrepancies within master itself. Same OPCode, same date range - different values

[screenshot: same OPcode and date range on master, showing different values]

@JGreenlee
Author

The left is the devapp running on a real Android phone. The right is the devapp running on an iOS Simulator.

I can't think of any good reason for these to yield different metrics

@JGreenlee
Author

The new dashboard is at least consistent with itself, but none of the calculations line up with the old dashboard

[screenshot: new dashboard vs. old dashboard values]

It doesn't seem to be as simple as just being off by 1 day, either. This is going to take some digging...

@Abby-Wheelis
Member

Abby-Wheelis commented Sep 15, 2023

I agree that it's taking some digging; so far I've found one difference between old and new:

The default calls seem to be different between what's currently in production and the new dashboard. When I opened them both just now, production was showing Sept 8 - Sept 15, but the metrics used to populate it were only the 1st through the 13th (13 days, not 14?), while the new dashboard pulls Aug 31st through Sept 14th (15 days of data) by default. The extra day is trimmed off when the data is divided into weeks (31-6 and 7-13).

However, when I set the dates to a single week on my phone (I can't alter the dates on production in the emulator) the numbers are still off by 20-30% between production and the new dashboard, which is very significant.

My next step is to hand-calculate what's on the new dashboard based on the data it's using; maybe something got lost in the math when I was rewriting the formatting functions.

@JGreenlee
Author

Good plan. If the old dashboard is not a reliable source of truth (it seems like it might not be), I think we can use hand calculations as a better ground truth to compare the new dashboard against.

@JGreenlee
Author

the new dashboard pulls Aug 1st through Sept 14th (15 days of data) by default. The extra day is trimmed off when the data is divided into weeks (31-6 and 7-13).

I'm not sure what you meant by this. If you meant Sept 1st through the 14th, that is 14 days.
Or did you mean to type Aug 31st through Sept 14th?

@Abby-Wheelis
Member

I'm not sure what you meant by this. If you meant Sept 1st through the 14th, that is 14 days.
Or did you mean to type Aug 31st through Sept 14th?

Oops! Yes, Aug 31st. I had noted that the userMetrics and aggMetrics received by the CarbonFootPrintCard were for those 15 days by default, but the data used for calculations ends up being the 31st - 13th after the data is split into weeks.

@Abby-Wheelis
Member

I think we can use hand-calculations as a better ground truth to compare the new dashboard against

The good news is that things check out against my hand calculations, with the exception of handling "shared ride" modes. When I added up one of my weeks, I had 78km of driving, 5.2km of walking, and 9.4km of biking. The only emitting mode there is driving, and 78km * 0.1659 = 12.7kg, which is the same thing the dashboard shows. I did the math by hand, then stepped through the footprint calculations in the emulator, and the results and the steps matched.

I also calculated a second week by hand, and the answer was still correct, but I noticed my 4km of "shared ride" were treated the same as "drove alone", since they share the base mode "CAR". If I recall correctly, "shared" modes should be calculated at 1/2 the intensity?

Aside from the snag on shared modes, both weeks matched what I got by hand, and the totals between the two weeks matched the distance card. Carbon and distance:

My values based on the metrics split into weeks -- I treated this as the input data, summing across the days by hand:
31-6 -> 77km drove alone, 5.2km walk, 9.4km bike = 77*0.17 = 12.7kg
7-13 -> 163km drove alone, 4km shared ride, 5.5km walk, 12km bike = 167*0.17 = 27kg

I now feel confident that the "binning" process is correct, as are the carbon calculations based on base mode and distance.

Next steps: follow up on shared modes and check that the metrics I used as input here match the actual trips
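For the shared-modes follow-up, the adjustment I have in mind is roughly this sketch (the occupancy of 2 and the function shape are assumptions, not what FootprintHelper does today):

```ts
// Halve the car intensity for shared rides instead of reusing the
// drove-alone value for the CAR base mode.
const CAR_KG_CO2_PER_KM = 0.1659; // US average, per vehicle-km

function footprintKg(mode: string, km: number): number {
  switch (mode) {
    case 'drove_alone': return km * CAR_KG_CO2_PER_KM;
    case 'shared_ride': return km * CAR_KG_CO2_PER_KM / 2; // split between 2 riders
    default: return 0; // walk, bike, and other non-emitting modes
  }
}
```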

@Abby-Wheelis
Member

check that the metrics I used as input here match the actual trips

I just added up all the displayed distances by their labeled modes for the same two weeks (8/31-9/14) by scrolling back through my labeled trips and got: drove alone: 152 miles, shared ride: 3.8 miles, walk: 6.6 miles, bike: 13.4 miles. The three-mile difference on "drove alone" can be explained by the fact that I was using the displayed mileage on the label screen and had lots of trips in that mode, so I believe it's about 3 miles of rounding error. And, for the record, the old dashboard shows the same distances within about 0.3 miles.

So I'm now pretty convinced that the calculations are right -- I've checked the distances used for footprint against the actual trips and stepped through the footprint calculations, everything is checking out against what I get by hand.

@Abby-Wheelis
Member

Looking into the shared mode, I think I found the explanation for the discrepancies: when calculating the footprint, the mapping of modes to values is retrieved through a method getFootprint(), which fetches either a custom footprint or the current carbon dataset (in our case that's usually US). The new dashboard is using the default (US) values, which have fewer modes -- only CAR, no e-car, no shared ride. Production uses a custom footprint that has different values as well as many more modes (including e-car, shared ride, etc).

I'll start working on a fix to ensure that the custom footprint is used, when needed, in the new dashboard.
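The fix I'm aiming for is roughly this sketch (the signature is hypothetical; the real getFootprint() shape may differ):

```ts
// Prefer the deployment's custom footprint mapping; fall back to the
// default US dataset only when no custom labels are configured.
type FootprintMap = { [richMode: string]: number }; // kg CO2 per meter

const US_DEFAULTS: FootprintMap = { CAR: 0.0001659 /* ... */ };

function getFootprint(customMap?: FootprintMap): FootprintMap {
  // custom maps carry rich modes (shared_ride, e_car, ...) that the
  // default dataset lacks, so they must win whenever they exist
  return customMap && Object.keys(customMap).length > 0 ? customMap : US_DEFAULTS;
}
```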

@JGreenlee
Author

Good catch - that's a mistake I made in e-mission/e-mission-phone@5fcc5d4

I was thinking that footprints were tied to base modes, but they are actually specific to each rich mode, because we list the kgCo2PerMeter for each mode in the label options.

Abby-Wheelis pushed a commit to JGreenlee/e-mission-phone that referenced this issue Sep 15, 2023
we had figured out that there were some differences e-mission/e-mission-docs#961 (comment)

Eventually, we realized this was because the new dashboard was not using the custom labels.

This commit adds the methods that check whether the labels are custom or sensed to `metricsHelper`, checks for custom labels and indicates the need for custom footprint mappings in `CarbonFootprintCard`, and finally reverts back to "rich modes" rather than "base modes" in `metrics-factory` so we use the custom labels
@Abby-Wheelis
Member

I needed to add code to decide whether we use a custom dataset, and also reverted back to using the rich modes rather than the base modes. Production and the new dashboard now match more closely. If I select the same date range with the same opcode on the new dashboard in the emulator and on my phone on production, the carbon values now match whenever the mileages by mode match (I've been comparing mileages to compensate for date ranges getting picked differently). For example, 8/29 - 9/11 shows 21+28 on the new dashboard and 49 on production.

The "taxi" values are hard to compare, since we show the whole number now rather than "savings" to stay consistent with the meter. The "group" values still vary a lot between dates, and between production and the new dashboard, so I'll dig into those more next.

@Abby-Wheelis
Member

I've stepped through a group calculation, and confirmed that the custom footprint is now used for the group as well as individual users, and that the metrics are averaged by dividing the total distance for a mode in a day by nUsers.

"Taxi" values seem to align well, as [week total on new dashboard] + [taxi savings on production] = [if all taxi on the new dashboard] over a given week, which is what we would expect.

@JGreenlee
Author

New dashboard is now merged into master with e-mission/e-mission-phone#1046
