
Improvement: Speeding up EUPS #138

Open
mwittgen opened this issue Jan 4, 2021 · 5 comments


mwittgen commented Jan 4, 2021

eups distrib install runs very slowly, because each invocation caches hundreds of temp files via sequential HTTP GETs.
This is a suggestion to run getTaggedProductInfo in a thread pool, which speeds things up significantly while changing only a few lines in distrib/server.py. listDir retrieves the same data from a handful of web directories multiple times; results are cached in a dictionary so that only one HTTP GET is made per web directory.
master...mwittgen:master

More optimizations/suggestions: configure the number of pool threads from the eups command line; allow setting the chunk size for URL reads from the command line, or replace chunked reads with urllib3 streaming; run non-recursive installations (product + dependencies) in a thread pool.
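The two changes proposed above (a thread pool for per-product metadata fetches, plus a per-directory cache for listings) can be sketched roughly as follows. This is an illustrative standalone example, not the actual eups code: `list_dir` and `get_tagged_product_info` are simulated stand-ins for the real `listDir` and `getTaggedProductInfo` in distrib/server.py, and the counter exists only to show how the cache collapses repeated directory fetches into (at most a handful of) HTTP GETs.

```python
# Hypothetical sketch of the proposed change, not eups code.
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from threading import Lock

fetch_count = 0          # instrumentation for this example only
_count_lock = Lock()

@lru_cache(maxsize=None)
def list_dir(url):
    """Stand-in for listDir: one simulated HTTP GET per unique URL.

    lru_cache plays the role of the proposed result dictionary, so
    repeated calls for the same directory hit the cache instead of
    issuing another GET.
    """
    global fetch_count
    with _count_lock:
        fetch_count += 1      # pretend this is the HTTP GET
    return [f"{url}/entry{i}" for i in range(3)]

def get_tagged_product_info(product, base_url):
    """Stand-in for getTaggedProductInfo: reads a (cached) listing."""
    entries = list_dir(base_url)
    return (product, entries[0])

products = [f"prod{i}" for i in range(20)]
# Run the per-product lookups concurrently instead of sequentially.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(
        pool.map(lambda p: get_tagged_product_info(p, "https://example.org/tags"),
                 products))
```

Note that `lru_cache` does not serialize concurrent cache misses, so the very first wave of threads may each issue a GET for the same directory; every call after that is served from the cache. A production version would likely want a lock-guarded dictionary keyed by URL, as the comment above suggests.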

timj (Collaborator) commented Jan 4, 2021

Sounds interesting. Can you make a pull request please? (and preferably a Jira ticket branch)

RobertLuptonTheGood (Owner) commented:

I think it's done this way due to a robots.txt file that NCSA has (or used to have), which meant that eups couldn't pull down all the files in one go. It'd probably be worth checking whether this is still the case.

ktlim (Contributor) commented Jan 4, 2021

As far as I know, robots.txt can only allow or disallow access, not rate-limit. It is also interpreted by the client, not the server. Finally, the Rubin Observatory eups package repository is no longer hosted at NCSA.

RobertLuptonTheGood (Owner) commented:

The configuration meant that I couldn't pull down the entire directory in one transaction. If that is no longer the case, I think that this change might well solve the reported problem.

mwittgen (Author) commented Jan 4, 2021

I don't think the HTTP protocol supports getting multiple files in one transaction, unless there is server-side support to tar/zip such a request into a single message body, which leaves only the option of issuing multiple GETs in parallel. Since many small files are requested at the beginning of each eups run, this is not efficient. yum, for example, bundles all the repo metadata into larger zip files; the downside is that the metadata files need to be regenerated whenever the repo content changes.
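The "multiple GETs in parallel" option described above can be illustrated with a minimal, self-contained sketch. This is not eups code: it spins up a throwaway local HTTP server (so the example needs no network access) and fetches many small files concurrently with a thread pool, which is the same pattern a parallelized eups distrib install would use against a real package repository.

```python
# Illustrative sketch only: plain HTTP/1.1 has no multi-file fetch, so
# many small requests are issued in parallel instead of sequentially.
import http.server
import socketserver
import threading
import urllib.request
from concurrent.futures import ThreadPoolExecutor

class Handler(http.server.BaseHTTPRequestHandler):
    """Toy server standing in for a package repository of small files."""
    def do_GET(self):
        body = f"contents of {self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

# Bind to an ephemeral port and serve in the background.
server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), Handler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

def fetch(path):
    """One HTTP GET for one small metadata file."""
    with urllib.request.urlopen(f"http://127.0.0.1:{port}{path}") as resp:
        return resp.read().decode()

# Many small files, fetched concurrently rather than one after another.
paths = [f"/table{i}.list" for i in range(10)]
with ThreadPoolExecutor(max_workers=4) as pool:
    bodies = list(pool.map(fetch, paths))

server.shutdown()
```

The alternative mentioned in the comment, bundling metadata yum-style into a few larger archives, trades this request fan-out for a server-side regeneration step whenever the repository changes.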
