
Improvement: Speeding up EUPS #138

Open
mwittgen opened this issue Jan 4, 2021 · 5 comments


mwittgen commented Jan 4, 2021

eups distrib install runs very slowly, because each invocation caches hundreds of temp files via sequential HTTP GETs.
This is a suggestion to run getTaggedProductInfo in a thread pool, which speeds things up significantly while changing only a few lines in distrib/server.py. listDir retrieves the same data from a handful of web directories multiple times; results are cached in a dictionary so that only one HTTP GET is made per web directory.
master...mwittgen:master

More optimizations/suggestions: configure the number of pool threads from the eups command line; allow setting the chunk size for URL reads from the command line, or replace chunked reads with urllib3 streaming; run non-recursive installations (product + dependencies) in a thread pool.
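The two changes proposed above (a thread pool for per-product metadata fetches, plus a per-directory cache for listings) can be sketched roughly as follows. This is an illustrative standalone example, not the actual eups code: `list_dir` and `get_tagged_product_info` are simulated stand-ins for the real `listDir` and `getTaggedProductInfo` in distrib/server.py, and the counter exists only to show how the cache collapses repeated directory fetches into (at most a handful of) HTTP GETs.

```python
# Hypothetical sketch of the proposed change, not eups code.
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from threading import Lock

fetch_count = 0          # instrumentation for this example only
_count_lock = Lock()

@lru_cache(maxsize=None)
def list_dir(url):
    """Stand-in for listDir: one simulated HTTP GET per unique URL.

    lru_cache plays the role of the proposed result dictionary, so
    repeated calls for the same directory hit the cache instead of
    issuing another GET.
    """
    global fetch_count
    with _count_lock:
        fetch_count += 1      # pretend this is the HTTP GET
    return [f"{url}/entry{i}" for i in range(3)]

def get_tagged_product_info(product, base_url):
    """Stand-in for getTaggedProductInfo: reads a (cached) listing."""
    entries = list_dir(base_url)
    return (product, entries[0])

products = [f"prod{i}" for i in range(20)]
# Run the per-product lookups concurrently instead of sequentially.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(
        pool.map(lambda p: get_tagged_product_info(p, "https://example.org/tags"),
                 products))
```

Note that `lru_cache` does not serialize concurrent cache misses, so the very first wave of threads may each issue a GET for the same directory; every call after that is served from the cache. A production version would likely want a lock-guarded dictionary keyed by URL, as the comment above suggests.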

timj (Collaborator) commented Jan 4, 2021

Sounds interesting. Can you make a pull request please? (and preferably a Jira ticket branch)

RobertLuptonTheGood (Owner) commented:

I think it's done this way due to a robots.txt file that NCSA has (or used to have), which meant that eups couldn't pull down all the files in one go. It'd probably be worth checking whether this is still the case.

ktlim (Contributor) commented Jan 4, 2021

As far as I know, robots.txt can only allow or disallow access, not rate-limit. It is also interpreted by the client, not the server. Finally, the Rubin Observatory eups package repository is no longer hosted at NCSA.

RobertLuptonTheGood (Owner) commented:

The configuration meant that I couldn't pull down the entire directory in one transaction. If that is no longer the case, I think that this change might well solve the reported problem.

mwittgen (Author) commented Jan 4, 2021

I don't think the HTTP protocol supports getting multiple files in one transaction, unless there is server-side support to tar/zip such a request into a single message body, which leaves only the option of issuing multiple GETs in parallel. Since many small files are requested at the beginning of each eups run, this is not efficient. yum, for example, bundles all the repo metadata into larger zip files; the downside is that the metadata files need to be regenerated whenever the repo content changes.
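The "multiple GETs in parallel" option described above can be illustrated with a minimal, self-contained sketch. This is not eups code: it spins up a throwaway local HTTP server (so the example needs no network access) and fetches many small files concurrently with a thread pool, which is the same pattern a parallelized eups distrib install would use against a real package repository.

```python
# Illustrative sketch only: plain HTTP/1.1 has no multi-file fetch, so
# many small requests are issued in parallel instead of sequentially.
import http.server
import socketserver
import threading
import urllib.request
from concurrent.futures import ThreadPoolExecutor

class Handler(http.server.BaseHTTPRequestHandler):
    """Toy server standing in for a package repository of small files."""
    def do_GET(self):
        body = f"contents of {self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

# Bind to an ephemeral port and serve in the background.
server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), Handler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

def fetch(path):
    """One HTTP GET for one small metadata file."""
    with urllib.request.urlopen(f"http://127.0.0.1:{port}{path}") as resp:
        return resp.read().decode()

# Many small files, fetched concurrently rather than one after another.
paths = [f"/table{i}.list" for i in range(10)]
with ThreadPoolExecutor(max_workers=4) as pool:
    bodies = list(pool.map(fetch, paths))

server.shutdown()
```

The alternative mentioned in the comment, bundling metadata yum-style into a few larger archives, trades this request fan-out for a server-side regeneration step whenever the repository changes.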
