Access individual DOI archive files for better performance #25

Closed
tomgoddard opened this issue Mar 25, 2017 · 3 comments

Comments

@tomgoddard
Collaborator

Currently, fetching the DOI archive for the exosome from Zenodo takes about 30 minutes (1.3 Gbytes). This means the exosome IHM file is not viewable in ChimeraX for 30 minutes after trying to open it. This is ridiculously slow and unusable, given that ChimeraX is only trying to get some small localization density maps from the archive. I believe the bulk of the archive is ensemble models (which are currently not referenced by the IHM file).

If Zenodo allows accessing individual files from the DOI, that should be used in the IHM file (ihm_external_files table) to improve performance, so that only the data files actually being viewed get downloaded.

The current slow performance will deter most users of these files. If the files are only available as a single gigabyte-sized download, this is a poor design, and other archiving methods that allow access to individual files should be investigated.

@tomgoddard
Collaborator Author

Not only is the access slow, but my current attempt to have ChimeraX download the exosome DOI archive (1.4 Gbytes) failed after 30 minutes, with only 0.7 Gbytes received. The total download time would likely be 1 hour if it succeeded.

@benmwebb
Contributor

I don't think we can currently access individual files, since Zenodo simply archives a zip file of the entire GitHub repository. So we have three options:

  1. Ask Zenodo to unzip the file at their end and allow downloading individual files.

  2. Follow the link from Zenodo to GitHub (e.g. for exosome, https://zenodo.org/record/60731 links to https://github.com/integrativemodeling/exosome/tree/v1.0). GitHub then allows downloading individual files.

  3. Break the archive up into several smaller zip files (e.g. input data, bulk ensemble, density localizations, cluster representatives).

Obviously (1) is more work for Zenodo, (2) is more work for Chimera, (3) is more work for depositors.
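
As a rough illustration of option (3), a depositor could split a repository checkout into several smaller zip files with a short script along these lines. This is only a sketch: the archive names and the directory names in `GROUPS` are hypothetical and would need to match the actual layout of the repository being deposited.

```python
import zipfile
from pathlib import Path

# Hypothetical mapping from output zip name to top-level directories.
# These names are assumptions for illustration, not the real exosome layout.
GROUPS = {
    "input_data.zip": ["data"],
    "localization_densities.zip": ["densities"],
    "ensemble.zip": ["trajectories"],
}


def split_into_zips(repo_root, out_dir, groups=GROUPS):
    """Write one zip per group, each holding only that group's files.

    Returns a dict mapping each zip name to the list of member paths
    (relative to repo_root) that were written into it.
    """
    repo_root = Path(repo_root)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for zip_name, subdirs in groups.items():
        members = []
        with zipfile.ZipFile(out_dir / zip_name, "w", zipfile.ZIP_DEFLATED) as zf:
            for sub in subdirs:
                base = repo_root / sub
                if not base.is_dir():
                    continue  # skip groups that do not exist in this checkout
                for path in sorted(base.rglob("*")):
                    if path.is_file():
                        # Store paths relative to the repository root
                        arcname = path.relative_to(repo_root).as_posix()
                        zf.write(path, arcname)
                        members.append(arcname)
        written[zip_name] = members
    return written
```

With archives split this way, a viewer that only needs the localization densities would download just that one zip file rather than the whole repository.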

@tomgoddard
Collaborator Author

The above timing of 30-60 minutes for downloading the exosome archive was on a home network (~5 Mbits / sec). From UCSF on a fast network the download took 5 minutes for the 1.3 Gbytes.
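
As a sanity check on these timings, the arithmetic can be sketched as below. It assumes 1 Gbyte = 10^9 bytes and a constant transfer rate with no protocol overhead, so the results are rough estimates only; the function names are made up for this example.

```python
def download_minutes(size_gbytes, rate_mbits_per_sec):
    """Estimated download time in minutes at a constant transfer rate."""
    bits = size_gbytes * 1e9 * 8          # assumes 1 Gbyte = 10**9 bytes
    seconds = bits / (rate_mbits_per_sec * 1e6)
    return seconds / 60


def implied_rate_mbits(size_gbytes, minutes):
    """Average rate in Mbits/sec implied by a given size and download time."""
    bits = size_gbytes * 1e9 * 8
    return bits / (minutes * 60) / 1e6


# Home network: 1.3 Gbytes at ~5 Mbits/sec is roughly 35 minutes
print(round(download_minutes(1.3, 5), 1))

# UCSF network: 1.3 Gbytes in 5 minutes implies roughly 35 Mbits/sec
print(round(implied_rate_mbits(1.3, 5), 1))
```

The home-network estimate is in line with the 30-60 minutes observed, given that real transfers rarely sustain the nominal rate.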

benmwebb added a commit to integrativemodeling/exosome that referenced this issue Apr 7, 2017
A single zipfile of the entire repository is very large (1.4GB)
and most of this is occupied by output trajectories, which most
users won't need to access anyway. Split these larger files out
into their own zipfiles. Relates ihmwg/IHMCIF#25.
benmwebb added a commit that referenced this issue Apr 7, 2017
This splits the externally referenced data into multiple
files, to make it more convenient to download. Relates #25.
benmwebb added a commit that referenced this issue Apr 20, 2017
This splits the externally referenced data into multiple
files, to make it more convenient to download. Relates #25.
benmwebb added a commit to integrativemodeling/nup84 that referenced this issue May 17, 2017
A single zipfile of the entire repository is very large (~1GB)
and most of this is occupied by output trajectories, which most
users won't need to access anyway. Split these larger files out
into their own zipfiles. Relates ihmwg/IHMCIF#25.