Should variable-unit pairs be unique in a scenario (ensemble)? #338

danielhuppmann · 2020-02-26T12:47:11Z

Description

A comment by @gidden in #335 raises a thorny question that deserves its own issue for discussion (I think):

Should any variable in a scenario (or scenario ensemble) be linked to a specific unit, or should we allow non-unique combinations?

Why could that happen

- a model reports several units for easier visualization workflows
- an IamDataFrame combines multiple models reporting natively in different units
- a user converts data from one unit to another for some reason, but wants to keep the "original data" as is, e.g., using df.append(df.convert_unit(...)) (by default, unit conversion replaces the data rather than appending)
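
To make the third case concrete, here is a minimal sketch in plain Python. It is not pyam's API: rows are plain tuples, the values are made up, and `convert_unit` is a hypothetical helper that returns converted copies instead of replacing the data.

```python
# Plain-Python stand-in for a timeseries table; this is NOT pyam's API.
# Each row: (model, scenario, region, variable, unit, year, value)
# All names and values are illustrative.
rows = [
    ("model_a", "scen_a", "World", "Primary Energy", "EJ/yr", 2020, 1.0),
    ("model_a", "scen_a", "World", "Primary Energy", "EJ/yr", 2030, 2.0),
]

EJ_TO_TWH = 1e18 / 3.6e15  # 1 EJ = 1e18 J, 1 TWh = 3.6e15 J


def convert_unit(rows, current, to, factor):
    """Hypothetical helper: return converted copies of the rows in unit `current`."""
    return [
        (m, s, r, v, to, y, val * factor)
        for (m, s, r, v, u, y, val) in rows
        if u == current
    ]


# Keep the original data and append the converted copy
combined = rows + convert_unit(rows, "EJ/yr", "TWh/yr", EJ_TO_TWH)

# The same variable now maps to two units
units = sorted({u for (_, _, _, v, u, _, _) in combined if v == "Primary Energy"})
print(units)  # ['EJ/yr', 'TWh/yr']
```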

Immediate problems?

AFAIK all functions (aggregation, etc.) work on both variable and unit separately, so there is no risk of accidentally summing apples and oranges and getting fruit punch. But I might be wrong...

Relevant example as food for thought

One relevant example from the SR15: we had three variables

Emissions|Kyoto Gases (AR4-GWP100)
Emissions|Kyoto Gases (AR5-GWP100)
Emissions|Kyoto Gases (SAR-GWP100)

each with the unit Mt CO2-equiv/yr. But we could also have had one variable Emissions|Kyoto Gases and "stored" the GWP metric in the unit.

Way forward

I see three options:

1. no validation or constraints in pyam
2. no hard constraints, but add something like assert_variable_consistency() which returns all non-unique pairs
3. validation of consistency (raise an error if [variable, unit] is non-unique)
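
As a rough illustration of the second and third options, the sketch below works on plain (variable, unit) pairs. The function names `non_unique_units` and `require_unique_units` are hypothetical and not part of pyam; the first only reports conflicts, the second raises on them.

```python
from collections import defaultdict


def non_unique_units(pairs):
    """Soft check: return {variable: sorted units} for variables with several units."""
    units = defaultdict(set)
    for variable, unit in pairs:
        units[variable].add(unit)
    return {v: sorted(u) for v, u in units.items() if len(u) > 1}


def require_unique_units(pairs):
    """Hard check: raise if any variable maps to more than one unit."""
    conflicts = non_unique_units(pairs)
    if conflicts:
        raise ValueError(f"Non-unique variable-unit pairs: {conflicts}")


pairs = [
    ("Emissions|CO2", "Mt CO2/yr"),
    ("Emissions|CH4", "Mt CH4/yr"),
    ("Emissions|CH4", "Mt CO2-equiv/yr"),
]
print(non_unique_units(pairs))
# {'Emissions|CH4': ['Mt CH4/yr', 'Mt CO2-equiv/yr']}
```

Keeping the reporting function separate from the raising one means the strict check can be built on top of the soft one without duplicating logic.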


danielhuppmann · 2020-02-26T12:49:10Z

@gidden @znicholls @byersiiasa @khaeru @Rlamboll - any comments or preferences?

khaeru · 2020-02-26T15:51:02Z

Comments but no preferences:

- Write tests for the unusual cases described in the OP.
- About the IAMC format, my only thought is there should be a spec of what is required, suggested, and optional.
  - Then, you can choose behaviour for pyam that matches or is more strict than the spec requirements.
  - Validation tools like (2) and (3) in the OP are a good way to start.
  - Use a function argument to control whether they generate warnings, or exceptions.
  - Use configuration to control whether they are applied automatically, or not.
- In SDMX terms:
  - From what I've seen, UNITS are typically stored as an Attribute, rather than a Dimension.
  - If the Attribute is attached to a Series or Dataset, then it's the same for all Observations therein. But the different kinds of attachment can be mixed in the same Dataset/message.
  - If storing an Emissions measure, I would use:
    - a Dimension for the SPECIES, but
    - a separate Attribute for GWP_METHOD or something like that, with values like "AR4-GWP100", when UNITS == "t CO2-equiv/year"; but omit the attribute when e.g. UNITS == "t CH4/year" (no conversion based on GWP has been applied).
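
One way to picture the SDMX-style layout described above is a small Python dataclass. This is only a sketch: it does not use any SDMX library, the class name `EmissionsSeries` is hypothetical, and the consistency rule in `__post_init__` is one possible reading of the comment (the GWP attribute is present exactly when the units are CO2-equivalent).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EmissionsSeries:
    """Hypothetical series description; not an SDMX or pyam class."""

    species: str                      # Dimension: part of the series key
    units: str                        # Attribute: describes the values
    gwp_method: Optional[str] = None  # Attribute: only set for CO2-equiv units

    def __post_init__(self):
        is_equiv = "CO2-equiv" in self.units
        if is_equiv and self.gwp_method is None:
            raise ValueError("CO2-equivalent units need a GWP method")
        if not is_equiv and self.gwp_method is not None:
            raise ValueError("GWP method given, but no GWP conversion was applied")


# Native units: the GWP attribute is omitted
native = EmissionsSeries(species="CH4", units="t CH4/year")

# Converted units: the GWP attribute records which metric was used
converted = EmissionsSeries(
    species="CH4", units="t CO2-equiv/year", gwp_method="AR4-GWP100"
)
```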

Rlamboll · 2020-02-26T16:55:21Z

I've been assuming model / scenario / region / variable / time is a unique key, and throwing errors if I see multiple units for a variable, so everything would need to be filtered before being put into Silicone if you allow multiple unit/variable pairs. It's worth focusing on the fact that the values we use in calculations are mostly the gas concentrations themselves (no -equiv), so I'd be happier with always specifying what equivalence we mean in the units than having a separate column for AR5 GWP100 etc., and extending our unique keys further. I don't know that we gain much by moving this from the variable name to the unit, although it is slightly more accurate. It looks like there have been problems with different equivalence factors in the past, though, so maybe a carefully-thought-through approach is good here.

My general response to the Why Could That Happen is that any specific sub-task should use a specific, filtered DataFrame anyway. I encounter a version of this problem when constructing the basket-values and then downscaling them, and simply have one DataFrame with CO2-equiv values to calculate the baskets and another with the gas-specific measurements to downscale.

danielhuppmann · 2020-02-26T20:31:04Z

Well, so at least we identified one potential misunderstanding between the different tools!

One immediate reaction, though - the current filter() in pyam only works per column, so it is not possible to say: give me Emissions|CO2 in CO2 and Emissions|CH4 in CH4 (e.g. to filter out CH4 emissions in CO2-equivalent). You'd have to do both filters separately and then concatenate again... If there is a specific need for that, please open an(other) issue.
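
The "filter separately, then concatenate" workaround might look like the following plain-Python sketch. It does not call pyam: `filter_rows` is a hypothetical per-column filter over a list of dicts, and the values are made up.

```python
# Illustrative rows; values are arbitrary
rows = [
    {"variable": "Emissions|CO2", "unit": "Mt CO2/yr", "value": 1.0},
    {"variable": "Emissions|CH4", "unit": "Mt CH4/yr", "value": 2.0},
    {"variable": "Emissions|CH4", "unit": "Mt CO2-equiv/yr", "value": 3.0},
]


def filter_rows(rows, **criteria):
    """Hypothetical per-column filter: keep rows matching every given column value."""
    return [r for r in rows if all(r[col] == val for col, val in criteria.items())]


# One filter per (variable, unit) pair, then concatenate the results
co2 = filter_rows(rows, variable="Emissions|CO2", unit="Mt CO2/yr")
ch4 = filter_rows(rows, variable="Emissions|CH4", unit="Mt CH4/yr")
selected = co2 + ch4

print([(r["variable"], r["unit"]) for r in selected])
# [('Emissions|CO2', 'Mt CO2/yr'), ('Emissions|CH4', 'Mt CH4/yr')]
```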

znicholls · 2020-03-02T05:10:08Z

No strong thoughts from me! Discussion here looks good.

gidden · 2020-03-16T07:44:33Z

Hi all - picking this back up now that our units-related PRs are starting to wrap up. First, perhaps this should be picked up a level above this toolchain, in the IAMC data group, as suggested by @khaeru:

> About the IAMC format, my only thought is there should be a spec of what is required, suggested, and optional.

My immediate gut reaction is that variables should have a unique unit mapping (in pyam). Reasons for this are:

- it has been our implicit assumption thus far
- it simplifies internal design
- it can be circumvented as needed by putting specific conversion information in variable names without breaking the data model

But that is only my gut feeling, and those who have already implemented the unit conversions might have more informed opinions. @danielhuppmann, @khaeru, others?

danielhuppmann · 2020-03-16T08:42:23Z

I'm not opposed to that approach in principle, just slightly worried about how big the refactoring will be...

If someone volunteers to go ahead (maybe jointly with switching to pd.Series or xarray), please keep in mind:

- implement validation during initialisation
- append() should raise an error when trying to merge timeseries with inconsistent variable-unit mappings (currently, this wouldn't be a concern)
- rename(unit=<mapping>) should continue to work as is
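
A minimal sketch of the first two points, assuming a toy container rather than pyam's IamDataFrame: the class `UnitCheckedTable` and its `unit_mapping` attribute are hypothetical names, and rows are simple (variable, unit, value) tuples with made-up values. Renaming units is not covered here.

```python
class UnitCheckedTable:
    """Hypothetical container enforcing one unit per variable (not pyam)."""

    def __init__(self, rows):
        rows = list(rows)
        mapping = {}
        # Validation during initialisation: each variable gets exactly one unit
        for variable, unit, _value in rows:
            known = mapping.setdefault(variable, unit)
            if known != unit:
                raise ValueError(
                    f"{variable!r} has inconsistent units {known!r} and {unit!r}"
                )
        self.rows = rows
        self.unit_mapping = mapping

    def append(self, other):
        """Return a new table; raises if the variable-unit mappings conflict."""
        return UnitCheckedTable(self.rows + other.rows)


a = UnitCheckedTable([("Primary Energy", "EJ/yr", 1.0)])
b = UnitCheckedTable([("Primary Energy", "TWh/yr", 2.0)])

try:
    a.append(b)
except ValueError as err:
    print(err)
```

Because append() builds a new table through the constructor, the same validation runs in both places and a failed append leaves the original table unchanged.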

khaeru · 2020-03-17T09:54:23Z

> About the IAMC format, my only thought is there should be a spec of what is required, suggested, and optional.

> My immediate gut reaction is that variables should have a unique unit mapping (in pyam)

Nothing new to add, and still no preferences, since I don't use pyam. If you go this route, then I would again suggest making a clear distinction in your docs re: what is (a) part of the IAMC format spec and (b) pyam behaviour (limitations/automated helpful checks).

ETA: it's fine for (a) and (b) to be different, so long as the difference is advertised!

Rlamboll mentioned this issue Mar 24, 2020: Nans in additional columns #351 (Closed)

danielhuppmann mentioned this issue Jun 22, 2021: Add a unit_mapping attribute to show a variable -> unit dictionary #548 (Merged)
