Backwards-compatibility of variable names #46

Open
khaeru opened this issue Dec 6, 2023 · 5 comments
Labels
discuss Gather ideas and consensus on specific topics

Comments

@khaeru

khaeru commented Dec 6, 2023

At the SWG meeting on 2023-12-06, Masa Sugiyama and others raised the question of how to support backward compatibility if it becomes necessary to change a variable name.

This issue is to discuss/collect ideas.

@khaeru khaeru added the question Further information is requested label Dec 6, 2023
@khaeru
Author

khaeru commented Dec 6, 2023

My suggestion:

  • In the NAVIGATE project, we made use of the fact that the nomenclature package (which is used with this repo) tolerates (or reads and stores?) extra entries in code lists. For example, we had:
    - NAV_Dem-20C-all_u:
        navigate_task: T3.5
        navigate_climate_policy: 20C
        navigate_T35_policy: act+ele+tec
    In this, navigate_T35_policy is like description, units, or other attributes.
  • This is analogous to/imitates the SDMX concept of an Annotation.
  • We should simply specify a common annotation ID that would contain 1 or a list of older/superseded/alias variable names. For instance:
    - Final Energy|Foo|Bar:
        iamc-variable-superseded: |
          Final Energy|Bar|Foo
          Final Energy|Foo Bar
    It could be iamc-variable-synonym, iamc-variable-old, or anything—I don't have any strong preference here.
  • Code that needs to handle older data could then access these annotations for info on the correspondence of old and current names, for instance to construct a "mapping" or "table", perform replacement, or whatever makes sense in a particular implementation.
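To make the last bullet concrete, here is a rough sketch of how the correspondence could be built. It is only an illustration: the `CODELIST` dict stands in for whatever a YAML reader would return, `superseded_lookup` is a hypothetical helper, the unit values are placeholders, and the annotation ID is just the one floated above, not an agreed name.

```python
# Hypothetical code-list entries, roughly as they might look after parsing YAML.
# The "|" block scalar in YAML leaves a trailing newline on the value.
CODELIST = {
    "Final Energy|Foo|Bar": {
        "unit": "EJ/yr",  # placeholder attribute
        "iamc-variable-superseded": "Final Energy|Bar|Foo\nFinal Energy|Foo Bar\n",
    },
    "Final Energy|Baz": {"unit": "EJ/yr"},  # entry without superseded names
}

ANNOTATION_ID = "iamc-variable-superseded"  # assumed ID, nothing is agreed yet


def superseded_lookup(codelist, annotation_id=ANNOTATION_ID):
    """Return a dict mapping each old variable name to its current name."""
    lookup = {}
    for current, attrs in codelist.items():
        text = attrs.get(annotation_id, "")
        # splitlines() handles both a single name and a multi-line list,
        # and does not produce an empty item for the trailing newline.
        for line in text.splitlines():
            old = line.strip()
            if not old:
                continue
            # Refuse ambiguous data: one old name pointing at two current names
            if lookup.setdefault(old, current) != current:
                raise ValueError(
                    f"{old!r} is claimed by both {lookup[old]!r} and {current!r}"
                )
    return lookup


lookup = superseded_lookup(CODELIST)
```

Raising on a conflict is a design choice for this sketch; an implementation could equally decide to log a warning and keep the first match.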
A minimal working example (MWE) using SDMX:
import sdmx
import sdmx.model.v21 as m

# Create a Code whose ID is a current variable name
c = m.Code(id="Final Energy|Foo|Bar")

# Create an annotation containing old/superseded variable names
ann = m.Annotation(
    id="iamc-variable-old",
    text="\n".join(
        ["Final Energy|Bar|Foo", "Final Energy|Foo Bar"],
    )
)
c.annotations.append(ann)

# Write to file
cl = m.Codelist(id="VARIABLE", name="IAMC variable name")
cl.append(c)
msg = sdmx.message.StructureMessage()
msg.add(cl)
with open("example.xml", "wb") as f:
    f.write(sdmx.to_xml(msg, pretty_print=True))

This gives output like:

…
  <str:Code id="Final Energy|Foo|Bar">
    <com:Annotations>
      <com:Annotation id="iamc-variable-old">
        <com:AnnotationText xml:lang="en">Final Energy|Bar|Foo
Final Energy|Foo Bar</com:AnnotationText>
      </com:Annotation>
    </com:Annotations>
  </str:Code>

And can be read and used like:

# Read the file, retrieve the codelist
>>> msg = sdmx.read_sdmx("example.xml")
>>> cl = msg.codelist["VARIABLE"]

# Retrieve a specific variable name
>>> c = cl["Final Energy|Foo|Bar"]
>>> c
<Code Final Energy|Foo|Bar>

# Retrieve the list of old names from the annotation
>>> c.eval_annotation("iamc-variable-old").split("\n")
['Final Energy|Bar|Foo', 'Final Energy|Foo Bar']
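
Once the old names are available, the "perform replacement" step could look something like the sketch below. It deliberately avoids any library: `OLD_TO_CURRENT` is written out by hand (in practice it would be collected from the annotations), `modernize` is a hypothetical helper, and the data rows are made up.

```python
# Old-to-current names, e.g. as collected from "iamc-variable-old" annotations
OLD_TO_CURRENT = {
    "Final Energy|Bar|Foo": "Final Energy|Foo|Bar",
    "Final Energy|Foo Bar": "Final Energy|Foo|Bar",
}


def modernize(records, old_to_current):
    """Return (new_records, replaced_names) without modifying `records`."""
    updated, replaced = [], set()
    for rec in records:
        name = rec["variable"]
        if name in old_to_current:
            replaced.add(name)
            # Copy the row so the caller's data stays untouched
            rec = {**rec, "variable": old_to_current[name]}
        updated.append(rec)
    return updated, replaced


# Hypothetical legacy data rows; the values are invented
legacy = [
    {"variable": "Final Energy|Foo Bar", "year": 2020, "value": 1.0},
    {"variable": "Population", "year": 2020, "value": 2.0},
]
rows, replaced = modernize(legacy, OLD_TO_CURRENT)
```

Returning the set of replaced names lets a caller report which superseded names were actually encountered, which may help when deciding how long to keep supporting them.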

@khaeru khaeru added discuss Gather ideas and consensus on specific topics and removed question Further information is requested labels Dec 6, 2023
@christophbertram
Contributor

Do I understand correctly that we can, in principle, add as many entries as we want? The old ENGAGE and NAVIGATE templates only seem to have the entries "description" and "unit", but you say we could also add extra entries for storing the 'old' name.
And then similarly, we could also create extra entries to denote maximum and minimum allowed per-capita values, and aliases with other data structures (e.g. the iTEM transport variable names or similar).

@khaeru
Author

khaeru commented Dec 7, 2023

@christophbertram I say we should agree on as many common annotations as we need, and that doing so is a feature of the SDMX standard (and supported by tools that implement it). What I don't know is whether the nomenclature tool that @phackstock and @danielhuppmann have developed supports access and use of such annotations: I only know we can put such entries in YAML files such as appear in this repo and they will be tolerated by nomenclature, i.e. it won't error when trying to read the files.

Per full-resolution keys: yes, exactly. I hope we can provide a proof-of-concept when linking the iTEM structure info to this repo.

Per "minimum and maximum allowed values per capita"—I think that is actually data, not structure. You can imagine an IAMC-structured table (or with fewer or more dimensions, e.g. possibly without YEAR or REGION) in which the numbers are not "actual observed historical values" nor "model-projection values" but "expected {minimum,maximum} per capita values". One could imagine having different sets of such values for different purposes, even when the same variable names are used.
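
To illustrate the "data, not structure" point, the sketch below keeps expected bounds in their own IAMC-like rows, with one set per purpose, and checks values against whichever set is chosen. Everything in it is hypothetical: the set names, the unit, the numbers, and the `out_of_bounds` helper.

```python
# Hypothetical sets of expected per-capita bounds, stored as data rows
# rather than as attributes of the variable in the code list.
BOUNDS = {
    "project-a": [
        {"variable": "Final Energy|Foo|Bar", "unit": "GJ/cap/yr", "bound": "min", "value": 0.0},
        {"variable": "Final Energy|Foo|Bar", "unit": "GJ/cap/yr", "bound": "max", "value": 50.0},
    ],
    "project-b": [
        {"variable": "Final Energy|Foo|Bar", "unit": "GJ/cap/yr", "bound": "max", "value": 80.0},
    ],
}


def out_of_bounds(rows, bounds):
    """Return the rows whose value falls outside the given bounds."""
    limits = {}
    for b in bounds:
        limits.setdefault(b["variable"], {})[b["bound"]] = b["value"]
    bad = []
    for row in rows:
        lim = limits.get(row["variable"], {})
        low = lim.get("min", float("-inf"))   # no lower bound given
        high = lim.get("max", float("inf"))   # no upper bound given
        if not low <= row["value"] <= high:
            bad.append(row)
    return bad


# Invented per-capita value to check
per_capita = [
    {"variable": "Final Energy|Foo|Bar", "region": "World", "year": 2030, "value": 65.0},
]
flagged_a = out_of_bounds(per_capita, BOUNDS["project-a"])
flagged_b = out_of_bounds(per_capita, BOUNDS["project-b"])
```

The same variable name and the same value are flagged under one set of bounds and accepted under the other, which is why the bounds do not seem to belong on the variable definition itself.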

@danielhuppmann
Member

danielhuppmann commented Dec 11, 2023

Thanks for raising this issue; see a few comments below. Please let's keep issues and discussions narrow and start new issues where possible.

Cross-reference to legacy variables/regions or other standards: this is already implemented in a simple example here, see

navigate: Final Energy|Carbon Removal|Electricity|{Carbon Removal Option}

and the value can be accessed from the nomenclature.DataStructureDefinition as

dsd.variable["Final Energy|Carbon Removal|Direct Air Capture|Electricity"].navigate
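
For readers unfamiliar with the placeholder syntax, the plain-Python sketch below shows how such a templated cross-reference could expand. It does not use nomenclature, and it rests on assumptions: the form of `VARIABLE_TEMPLATE` is inferred from the expanded name above, the placeholder is assumed to be substituted in both the variable name and the attribute value, and only the one option visible in this thread is listed.

```python
# Assumed templates; only the "navigate" value and one expanded variable
# name appear in the thread, the variable template itself is inferred.
VARIABLE_TEMPLATE = "Final Energy|Carbon Removal|{Carbon Removal Option}|Electricity"
NAVIGATE_TEMPLATE = "Final Energy|Carbon Removal|Electricity|{Carbon Removal Option}"
TAG = "{Carbon Removal Option}"

# Other options are not listed in this thread, so only one is shown
OPTIONS = ["Direct Air Capture"]


def expand(options):
    """Map each expanded current name to its expanded legacy (NAVIGATE) name."""
    return {
        VARIABLE_TEMPLATE.replace(TAG, opt): NAVIGATE_TEMPLATE.replace(TAG, opt)
        for opt in options
    }


crossref = expand(OPTIONS)
```

With this, `crossref["Final Energy|Carbon Removal|Direct Air Capture|Electricity"]` gives the corresponding NAVIGATE name with the option moved to the end.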

If you have specific suggestions for feature-support in nomenclature, e.g. as a "known" attribute with dedicated documentation, please start an issue there.

Validation of values should indeed be handled as a separate use-case and will be implemented similar to the required-data feature in nomenclature, see here. This PR IAMconsortium/pyam#804 is a step towards support for that feature.
The main reason for keeping this separate is that different projects may want to use different reference data or validation thresholds.

@FlorianLeblancDr
Contributor

I think this is partly fixed by Daniel's commit from yesterday:
#PR61

4 participants