Access individual DOI archive files for better performance #25
Not only is access slow, but my current attempt to have ChimeraX download the exosome DOI archive (1.4 Gbytes) failed after 30 minutes, with only 0.7 Gbytes received. The total download time would likely be about 1 hour if it succeeded.
I don't think we can currently access individual files, since Zenodo simply archives a zip file of the entire GitHub repository. So we have three options:
Obviously (1) is more work for Zenodo, (2) is more work for Chimera, and (3) is more work for depositors.
The above timing of 30-60 minutes for downloading the exosome archive was on a home network (~5 Mbits/sec). From UCSF on a fast network, the download took 5 minutes for the 1.3 Gbytes.
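As a rough sanity check on these timings, the arithmetic can be sketched as below. This is only a back-of-the-envelope calculation, assuming 1 Gbyte = 10^9 bytes and ignoring protocol overhead; the function names are made up for illustration. At a nominal 5 Mbits/sec the 1.4 Gbyte archive would need about 37 minutes, while the observed 0.7 Gbytes in 30 minutes corresponds to roughly 3.1 Mbits/sec, which is consistent with the estimate of about 1 hour for the full archive.

```python
def download_minutes(size_gbytes, rate_mbits_per_sec):
    """Estimated transfer time in minutes (assumes 1 Gbyte = 1e9 bytes, no overhead)."""
    bits = size_gbytes * 1e9 * 8
    return bits / (rate_mbits_per_sec * 1e6) / 60


def effective_rate_mbits(size_gbytes, minutes):
    """Effective throughput in Mbits/sec implied by an observed transfer."""
    bits = size_gbytes * 1e9 * 8
    return bits / (minutes * 60) / 1e6


if __name__ == "__main__":
    # Nominal home-network rate quoted in the thread
    print(round(download_minutes(1.4, 5), 1), "minutes at 5 Mbits/sec")
    # Throughput implied by the failed attempt (0.7 Gbytes in 30 minutes)
    print(round(effective_rate_mbits(0.7, 30), 2), "Mbits/sec observed at home")
    # Throughput implied by the UCSF download (1.3 Gbytes in 5 minutes)
    print(round(effective_rate_mbits(1.3, 5), 2), "Mbits/sec observed at UCSF")
```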
A single zipfile of the entire repository is very large (1.4GB) and most of this is occupied by output trajectories, which most users won't need to access anyway. Split these larger files out into their own zipfiles. Relates ihmwg/IHMCIF#25.
This splits the externally referenced data into multiple files, to make it more convenient to download. Relates #25.
A single zipfile of the entire repository is very large (~1GB) and most of this is occupied by output trajectories, which most users won't need to access anyway. Split these larger files out into their own zipfiles. Relates ihmwg/IHMCIF#25.
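The splitting described in these commit messages could look something like the following. This is a minimal sketch and not the code from the referenced commits: the function name `split_into_zips`, the `main.zip` name, and the idea of passing a list of large subdirectories are all assumptions made for illustration. It writes one zipfile per large subdirectory and a separate zipfile for everything else, using only the standard `zipfile` module.

```python
import os
import zipfile


def split_into_zips(repo_dir, large_subdirs, out_dir):
    """Write one zip per large subdirectory, plus main.zip for all other files.

    large_subdirs are paths relative to repo_dir (hypothetical example: "outputs").
    out_dir should live outside repo_dir so the new zips are not archived too.
    Returns a dict mapping "main" and each subdirectory to its zip path.
    """
    os.makedirs(out_dir, exist_ok=True)
    large = [os.path.normpath(d) for d in large_subdirs]
    paths = {"main": os.path.join(out_dir, "main.zip")}
    for d in large:
        paths[d] = os.path.join(out_dir, d.replace(os.sep, "_") + ".zip")
    zips = {key: zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED)
            for key, path in paths.items()}
    try:
        for dirpath, _, filenames in os.walk(repo_dir):
            for name in filenames:
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, repo_dir)
                # Route the file to the zip of the large subdirectory that
                # contains it, falling back to the main archive.
                key = next((d for d in large
                            if rel.startswith(d + os.sep)), "main")
                zips[key].write(full, arcname=rel)
    finally:
        for z in zips.values():
            z.close()
    return paths
```

With a layout like this, a viewer that only needs small files such as localization densities would fetch the main archive and skip the trajectory archives entirely.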
Currently, fetching the DOI archive for the exosome from Zenodo takes about 30 minutes (1.3 Gbytes). This means the exosome IHM file is not viewable in ChimeraX for 30 minutes after trying to open it. This is ridiculously slow and unusable, given that it is just trying to get some small localization density maps from the archive. I believe the bulk of the archive is ensemble models (which are currently not referenced by the IHM file).
If Zenodo allows accessing individual files from the DOI, that should be used in the IHM file (ihm_external_files table) to improve performance, so that only the data files that are actually being viewed get downloaded.
The current slow performance will deter most users of these files. If the files are only available as a single gigabyte-sized download, this is a poor design, and other archiving methods that allow access to individual files should be investigated.