Easy-to-use data download tools #681

Closed
Rlamboll opened this issue Jul 11, 2022 · 7 comments · Fixed by #696

@Rlamboll
Collaborator

Rlamboll commented Jul 11, 2022

Chris asked if we can add an easy-to-use tool to Silicone for downloading emissions data from the AR6 database (possibly checking whether the data has already been downloaded), since I put in such a tool for SR1.5 (GranthamImperial/silicone#148). It avoids the need to look up which connections are used for this.

It would be more logical for these tools to be part of the main pyam repository because 1) they will get better upkeep if you change the database behaviour, and 2) they are out of scope for Silicone and were only added for my convenience.

Proposed resolutions:

  1. We write a utility function to do this for AR6
  2. We transfer the code doing this for SR1.5 to pyam and replace the Silicone utility with a redirect call to the pyam tool in a later version.

Are there any objections to doing this?

@gidden
Member

gidden commented Jul 11, 2022

Hi @Rlamboll - this seems like a great addition to pyam. Could you perhaps first outline what kind of functionality would be enabled on top of the current Connections to IIASA explorer infrastructure? That would help us also guide where to place this kind of utility within the library.

@Rlamboll
Collaborator Author

It would mimic/generalise these two functions, extracting emissions data from the IIASA database unless it had already been downloaded previously:

```python
import os

import pyam


def download_or_load_sr15(filename, valid_model_ids="*"):
    """
    Load SR1.5 data; if it isn't there, download it first.

    Parameters
    ----------
    filename : str
        Filename in which to look for/save the data
    valid_model_ids : str or list[str]
        Models to return from data

    Returns
    -------
    :obj:`pyam.IamDataFrame`
        The loaded data
    """
    if not os.path.isfile(filename):
        get_sr15_scenarios(filename, valid_model_ids)
    return pyam.IamDataFrame(filename).filter(model=valid_model_ids)


def get_sr15_scenarios(output_file, valid_model_ids):
    """
    Collects world-level data from the IIASA database for the named models and
    saves them to a given location.

    Parameters
    ----------
    output_file : str
        File name and location for data to be saved
    valid_model_ids : str or list[str]
        Names of models that are to be fetched.
    """
    # Accept a single name (or wildcard) as well as a list of names;
    # looping over a bare string would iterate over its characters.
    if isinstance(valid_model_ids, str):
        valid_model_ids = [valid_model_ids]
    conn = pyam.iiasa.Connection("IXSE_SR15")
    variables_to_fetch = ["Emissions*"]
    results = []
    for model in valid_model_ids:
        print("Fetching data for {}".format(model))
        for variable in variables_to_fetch:
            print("Fetching {}".format(variable))
            var_df = conn.query(
                model=model, variable=variable, region="World", timeslice=None
            )
            results.append(var_df)
    # Combine all query results (replaces the try/except NameError pattern)
    df = pyam.concat(results)

    print("Writing to {}".format(output_file))
    df.to_csv(output_file)
```

@gidden
Member

gidden commented Jul 13, 2022

Perhaps I'm wrong (and please tell me if so!), but wouldn't the second function be simply:

df = pyam.read_iiasa(
    'IXSE_SR15',
    model=valid_model_ids,
    variable=['Emissions|*'],
    region='World',
)

If so, I think this functionality is already supported, and silicone may want to just keep a local copy of the data as desired. Maybe @znicholls or @chrisroadmap can chime in if they want to.

One thing we could think about adding are local on-disk caches of previously queried data. I've done that in prior projects, but it gets a bit tricky having to embed version information in the cache so you know when it needs to be redownloaded, etc. But I'm open to suggestions!
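
As a rough illustration of embedding version information in such a cache, the sketch below writes a small JSON sidecar file next to the cached data, recording what was queried and when. It is only a sketch under stated assumptions, not existing pyam functionality: the helper names `cache_key`, `write_cache_metadata` and `cache_matches`, and the `.meta.json` suffix, are hypothetical, and the query arguments are assumed to be JSON-serialisable.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def cache_key(database, **query):
    """Return a short, stable key for a database name plus query arguments."""
    payload = json.dumps({"database": database, "query": query}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def write_cache_metadata(data_path, database, **query):
    """Record what was queried, and when, in a JSON sidecar next to the data."""
    meta = {
        "key": cache_key(database, **query),
        "database": database,
        "query": query,
        "downloaded_at": datetime.now(timezone.utc).isoformat(),
    }
    # Hypothetical naming convention: "<data file>.meta.json"
    meta_path = Path(str(data_path) + ".meta.json")
    meta_path.write_text(json.dumps(meta, indent=2))
    return meta_path


def cache_matches(data_path, database, **query):
    """True if the data file and a sidecar for the same query both exist."""
    data_path = Path(data_path)
    meta_path = Path(str(data_path) + ".meta.json")
    if not (data_path.is_file() and meta_path.is_file()):
        return False
    meta = json.loads(meta_path.read_text())
    return meta.get("key") == cache_key(database, **query)
```

The recorded `downloaded_at` timestamp could then be compared against the database to decide when a re-download is needed; a different query against the same file simply fails the key check.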

@danielhuppmann
Member

FWIW, the pyam.iiasa.Connection class has a properties() method that lets you check when each scenario was last updated - this could be helpful to check if a local copy of the data needs to be updated…
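
A minimal sketch of such a check, assuming the caller has already extracted the last-updated timestamps from properties() and converted them to timezone-aware datetimes. The function name `local_copy_is_stale` is hypothetical, and using the file's modification time as the download time is a simplification:

```python
import os
from datetime import datetime, timezone


def local_copy_is_stale(path, remote_update_times):
    """Return True if `path` is missing or older than any remote update.

    `remote_update_times` is an iterable of timezone-aware datetimes
    (assumption: derived by the caller from the scenario properties).
    """
    if not os.path.isfile(path):
        return True
    # Treat the file's modification time as the time of the last download
    local_time = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
    return any(remote > local_time for remote in remote_update_times)
```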

@znicholls
Collaborator

I think the download or load idea is helpful. It's just never been clear to me where it best fits. Options I see:

  • github gist (pros: easy and clear, cons: not tested or maintained really)
  • pyam (pros: existing package, cons: not really within package scope)
  • new package (e.g. database_utils) (pros: clear package scope, cons: one more thing to maintain)

I'm happy with whatever (and I don't think this is really a blocker for anyone, as writing your own workaround is so simple) but I think a solution to these sorts of utility bits and pieces could be a great community sharing/cooperation thing.

@gidden
Member

gidden commented Jul 14, 2022

Given its close proximity to the Connection style of operation, I think pyam can be a good fit for this.

Based on the discussion here, I'd propose the following. We implement pyam.lazy_read() or similar, which takes either arguments directly or a path to a YAML configuration file holding the arguments. Those arguments would be:

  1. path to local file location
  2. args to pyam.iiasa.Connection for reading

The method would then check whether the file does not exist or whether it is out of date according to properties(). If either is the case, it downloads the data; otherwise, it reads the local file.
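
The control flow of that proposal could be sketched roughly as follows. This is not an implementation of pyam.lazy_read(), which does not exist at the time of this discussion; the signature and the injected `download`, `load` and `is_outdated` callables are assumptions made to keep the sketch independent of any database connection.

```python
import os


def lazy_read(path, download, load, is_outdated=None):
    """Load data from `path`, downloading it first when needed.

    download(path)    : fetches the data and writes it to `path`
    load(path)        : reads the local file and returns the data
    is_outdated(path) : optional check returning True if the file is stale
    """
    needs_download = not os.path.isfile(path)
    # Only consult the staleness check when a local file already exists
    if not needs_download and is_outdated is not None:
        needs_download = is_outdated(path)
    if needs_download:
        download(path)
    return load(path)
```

With pyam, `download` might wrap a pyam.read_iiasa(...) call followed by to_csv(path), and `load` could be pyam.IamDataFrame, as in the snippets earlier in this thread.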

Would that be something you could take a crack at @Rlamboll?

@Rlamboll
Collaborator Author

It can go somewhere on my to-do list!
