host on cloud computing provider #26

Closed
danmackinlay opened this issue Jul 8, 2018 · 4 comments

@danmackinlay

danmackinlay commented Jul 8, 2018

A suggestion: I notice there are a few open issues about outdated data versions, so I presume the hosting of this data is inconvenient to update. As such, it might be worth hosting the data somewhere else.

According to the FAQ, Microsoft Research Open Data will host data sets up to 250 GB. Amazon and probably Google offer similar schemes.

@danmackinlay danmackinlay changed the title host on Microsoft Research Data host on cloud computing provider Jul 9, 2018
@danmackinlay
Author

Amazon's AWS also hosts data sets and has a formal submission procedure for new data sets.

@dvolgyes

Or maybe on https://zenodo.org/ ?
It is a Swiss (CERN-based) data repository for scientific data sets. It assigns a DOI, you can link existing publications to it, and it has no hard space limit. (By default, the limit is 50 GB, but you can contact them by email and they will lift it for the given upload.)

@dvolgyes

And Zenodo has a simple, usable API.
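
To give a rough idea of what that looks like, here is a minimal sketch that queries Zenodo's public record search endpoint (`/api/records`) using only the Python standard library. The helper names `build_search_url` and `search_records` and the example query are made up for illustration, and the response fields read here (`hits`, `doi`, `metadata.title`) are assumptions about the JSON layout that should be checked against the Zenodo API documentation.

```python
import json
import urllib.parse
import urllib.request

# Public search endpoint of the Zenodo REST API.
ZENODO_RECORDS_URL = "https://zenodo.org/api/records"


def build_search_url(query, size=5):
    """Build the search URL (hypothetical helper, not part of any library)."""
    params = urllib.parse.urlencode({"q": query, "size": size})
    return f"{ZENODO_RECORDS_URL}?{params}"


def search_records(query, size=5):
    """Fetch matching records and return a list of (DOI, title) pairs.

    Assumes the response is JSON with the records under hits -> hits.
    """
    with urllib.request.urlopen(build_search_url(query, size)) as response:
        payload = json.load(response)
    return [
        (hit.get("doi"), hit.get("metadata", {}).get("title"))
        for hit in payload["hits"]["hits"]
    ]


if __name__ == "__main__":
    # Example query; requires network access.
    for doi, title in search_records("music dataset"):
        print(doi, title)
```

Uploading works along the same lines but needs a personal access token, so it is left out here.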

@mdeff
Owner

mdeff commented Jun 13, 2020

Thanks for the suggestions! AWS and Microsoft are potential providers. I like Zenodo, but when I contacted them in May 2017 about hosting the FMA they answered: "Unfortunately the data sizes you mentioned are above of what we can accept." Another option is torrents (#32), though I don't know how convenient that is in general, and how to ensure that there's always one peer up.

The current hosting is not inconvenient to update, but I think that we should strive to update as infrequently as possible. One problem is that published results are only comparable on the same data, so every update makes things more difficult to compare.

I've documented the known issues in the README and in meta-issue #41. Hope that helps for the time being.

@mdeff mdeff closed this as completed Jun 13, 2020