Is there a way to optimise looking for files required for recipes? #730

Closed
ehogan opened this issue Jul 8, 2020 · 8 comments
Labels
AutoAssess Issues relevant to the conversion of AutoAssess metrics from Met Office to ESMValTool

Comments

@ehogan
Contributor

ehogan commented Jul 8, 2020

The recipe_diurnal_temperature_index.yml recipe requires 12 files; in the main_log_debug.txt file there are 12 entries with the format Looking for files matching ['<filename>.nc'] in ['<data_path>']. However, the data_path we are using contains many files (38T!), so the search for each file takes up to 2 minutes, and the search for all 12 files takes over 23 minutes!

The workaround we're using is to create a new directory, create links to the required files in this directory, then use this directory as the data_path.
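A minimal sketch of this workaround in Python, assuming the required files sit directly inside a single source directory. The function name link_required_files and its parameters are hypothetical and not part of ESMValTool:

```python
import os


def link_required_files(source_dir, link_dir, filenames):
    """Create symlinks in link_dir pointing at the named files in source_dir.

    Hypothetical helper: assumes every name in filenames exists directly
    inside source_dir (no subdirectories).
    """
    os.makedirs(link_dir, exist_ok=True)
    linked = []
    for name in filenames:
        target = os.path.join(source_dir, name)
        link = os.path.join(link_dir, name)
        # Skip links that already exist so the helper can be re-run safely.
        if not os.path.lexists(link):
            os.symlink(target, link)
        linked.append(link)
    return linked
```

The small directory passed as link_dir can then be used as the data_path, so the search only has to scan the handful of links rather than the full archive.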

Is there a way to optimise looking for files required for recipes?

@bouweandela bouweandela transferred this issue from ESMValGroup/ESMValTool Jul 24, 2020
@bouweandela
Member

That would require some changes to the code that looks for files. Maybe it can be taken into account together with #281.

@valeriupredoi
Contributor

We have a to-do item that will use wildcards to select all data files with a certain naming template; see also #589. I will have to get my head around this and start working on it soon 🍺

@ehogan
Contributor Author

ehogan commented Jul 27, 2020

@bouweandela @valeriupredoi that sounds great, thanks! :)

@bouweandela
Member

@mattiarighi The work still needs to be done first.

@mattiarighi
Contributor

I closed because it looks like a duplicate of #589

@bouweandela
Member

It's a different problem, only related to the same code.

@bouweandela
Member

> I will have to get my head around this and start working on it soon

@valeriupredoi Maybe you can try writing a cached version of os.scandir by using functools.lru_cache and then patch the glob module so it uses that?
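A rough sketch of the caching idea, under some assumptions. os.scandir returns a one-shot iterator, so wrapping it directly in functools.lru_cache would hand back an exhausted iterator on the second call; this sketch caches a tuple of entry names instead. It also matches names with fnmatch rather than patching the glob module, because patching glob depends on private internals that differ between Python versions. The names cached_listdir and find_files_cached are hypothetical, and the search is not recursive:

```python
import fnmatch
import functools
import os


@functools.lru_cache(maxsize=None)
def cached_listdir(dirname):
    """Return the entry names in dirname, scanning the disk only once."""
    with os.scandir(dirname) as entries:
        # Store a tuple: the scandir iterator itself can only be consumed once.
        return tuple(entry.name for entry in entries)


def find_files_cached(dirnames, filenames):
    """Return sorted paths in dirnames whose names match a pattern in filenames.

    Hypothetical helper: only looks at the top level of each directory.
    """
    result = []
    for dirname in dirnames:
        # Every pattern reuses the same cached listing of the directory.
        names = cached_listdir(dirname)
        for pattern in filenames:
            for name in fnmatch.filter(names, pattern):
                result.append(os.path.join(dirname, name))
    return sorted(result)
```

With this approach, 12 lookups in the same data_path would trigger a single directory scan. The trade-off is that the cache goes stale if files are added or removed during a run; cached_listdir.cache_clear() resets it.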

@ehogan ehogan added the AutoAssess Issues relevant to the conversion of AutoAssess metrics from Met Office to ESMValTool label Dec 10, 2021
@mo-tgeddes
Copy link
Contributor

@ehogan and I ran this again and it is no longer a problem, so I will close the issue. Thanks!
