Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include definitions from GitHub repo #265

Merged

Conversation

phackstock
Copy link
Contributor

@phackstock phackstock commented Jul 20, 2023

First version of the external repo fetching feature.

How to use

In the definitions folder of a project place a file called definitions with the following content:

region:
  repository: https://github.com/IAMconsortium/common-definitions.git/
  # hash: optionally specify a commit hash of the repository
  # release: optionally specify a release of the repository

the hash and release attributes are both optional.
If neither is given, the last version of the main branch is used.
They are also mutually exclusive. If both are given an error is raised.

Implementation details

The repo fetching logic is implemented as follows:

  1. If the repo is not present, clone it
  2. Perform a hard reset to get rid of any changes
  3. Check out the desired revision

@phackstock
Copy link
Contributor Author

Seems like the cloning of the common-definitions repo only worked locally because of my ssh keys. Working on a fix.

@danielhuppmann
Copy link
Member

The implementation looks good, thanks! But I still think that the configuration would be more elegant and future-proof if we do:

repository:
  common-definitions:
    git: git@github.com:IAMconsortium/common-definitions.git
    hash:  # optional
    release:  # optional
region:
    repository: common-definitions
    path:  # optional, defaults to `definitions / regions`
variable:
    repository: common-definitions
    path:  # optional, defaults to `definitions / variable`

This would make the repository_name explicit and customizable.

And the fetch_repo() should happen when initializing the DataStructureConfig (if a repository is given), so that the CodeListConfig only has to know which path ("repository" / <repository_name> / <path>) to parse in addition to its "own" (local) folder.

@phackstock
Copy link
Contributor Author

The implementation looks good, thanks! But I still think that the configuration would be more elegant and future-proof if we do:

Agreed, I'll change it that way.

@phackstock
Copy link
Contributor Author

Update

@danielhuppmann I have updated DataStructureConfig and all associated classes according to your suggestions.
The configuration file looks now like this:

repository:
  common-definitions:
    url: https://github.com/IAMconsortium/common-definitions
region:
  repository: common-definitions
  country: true
variable:
  repository: common-definitions
  repository_definition_path: definitions / variable

I changed your git attribute to url but happy to change it back. The repository url needs to needs to end with .git/ otherwise the gitpython library fails to fetch the repository.
I've also changed the path attribute in the dimensions, region and variable, in this example, to repository_definition_path. I understand that it's a bit clunky and will probably be changed but I was confused by just the name path. Reason being is that it could also refer to the path of the local repositories codelist folder. Happy to suggestions to rename. Or even leave out for now as I think we can enforce the structure of an external repository to be "repo/definitions/dimension".

I've also, according to your suggestion moved the repository fetching into the constructor of DataStructureConfiguration, much cleaner this way. Now all repos are automatically fetched upon initialization.

In addition I've extended the use of external repos from just region to any dimension.
The test that I've modified to cover this also tests for the case that there is no local codelist folder. In case of the test, there's no variable folder. Works perfectly.

Copy link
Member

@danielhuppmann danielhuppmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! A few suggestions for alternative keyword names, but good to be merged either way...

tests/test_definition.py Show resolved Hide resolved
nomenclature/config.py Outdated Show resolved Hide resolved
nomenclature/config.py Outdated Show resolved Hide resolved
nomenclature/config.py Outdated Show resolved Hide resolved
nomenclature/config.py Outdated Show resolved Hide resolved
nomenclature/config.py Outdated Show resolved Hide resolved
@phackstock phackstock merged commit 9f3e919 into IAMconsortium:main Jul 24, 2023
@phackstock phackstock deleted the feature/include-github-definitions-repo branch July 24, 2023 08:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants