This repository has been archived by the owner on Jul 18, 2024. It is now read-only.

[DataCap Application] Foldingathome COVID-19 Dataset #1024

Closed
Megan008 opened this issue Sep 27, 2022 · 72 comments

Comments

@Megan008

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

  • Organization Name: Public Data-Foldingathome COVID-19
  • Website / Social Media: https://registry.opendata.aws/foldingathome-covid19/
  • Total amount of DataCap being requested (between 500 TiB and 5 PiB): 5 PiB
  • Weekly allocation of DataCap requested (usually between 1-100TiB): 100 TiB
  • On-chain address for first allocation: f1au3nipqjprr5xp2mwsarr7obvpx2dwy4is6qn4y

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

I have participated in several projects and hackathons, so I have relevant experience.

What is the primary source of funding for this project?

Personal income.

What other projects/ecosystem stakeholders is this project associated with?

None.

Use-case details

Describe the data being stored onto Filecoin

[Folding@home](http://foldingathome.org/) is a massively distributed computing project that uses biomolecular simulations to investigate the [molecular origins of disease](https://foldingathome.org/diseases/) and accelerate the discovery of new therapies.

Where was the data in this dataset sourced from?

Simulations of SARS-CoV-2 and associated host proteins, with emphasis on discovering druggable cryptic pockets, documented at the [MolSSI COVID Hub](https://covid.molssi.org//simulations/#foldinghome-simulations-of-the-sars-cov-2-spike-protein-spike-spike-binding).

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this. 

https://registry.opendata.aws/foldingathome-covid19/

        
Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

Yes, it's a public dataset.

What is the expected retrieval frequency for this data?

Multiple times.

For how long do you plan to keep this dataset stored on Filecoin?

2 years.

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

North America; Korea; China.

How will you be distributing your data to storage providers? Is there an offline data transfer process?

75% of the data will be distributed via offline data transfer. The remainder will be transferred online to storage providers located close to me.

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

I will start with one SP I have worked with before, and I am currently in discussions with other SPs: f023495, f0508988.

How will you be distributing deals across storage providers?

I have been in contact with 4 SPs. Initially, I will allocate 1/4 of the data to each SP. If I find more SPs, I will reduce the share given to each one to keep the storage decentralized.
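
Purely as an illustration of the even-split idea described above (the SP IDs and CAR file names below are placeholders, not the actual providers or deal pipeline), a round-robin assignment could look like this:

```python
from itertools import cycle

# Placeholder SP IDs and CAR file names; the real miner IDs and pipeline are not specified here.
storage_providers = ["f0AAAA", "f0BBBB", "f0CCCC", "f0DDDD"]
car_files = [f"chunk-{i:04d}.car" for i in range(16)]

# Round-robin assignment: with 4 SPs each receives ~1/4 of the files,
# and adding more SPs automatically lowers each provider's share.
assignments = {sp: [] for sp in storage_providers}
for sp, car in zip(cycle(storage_providers), car_files):
    assignments[sp].append(car)

for sp, files in assignments.items():
    print(sp, len(files), "files")
```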

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

Yes.
@large-datacap-requests

Thanks for your request!
Everything looks good. 👌

A Governance Team member will review the information provided and contact you back pretty soon.

@raghavrmadya
Collaborator

Do you have permission from Folding@home?

@raghavrmadya
Collaborator

Who are the SPs you plan to work with, and what exactly is your data transfer plan? The outlined plan is really unclear.

@Megan008
Author

@raghavrmadya Thank you for your questions.
COVID-19 is a public dataset and is not exclusive to a specific organization, so it is not necessary to get permission from Folding@home in advance to download and store it. It is similar to how programmers do not need permission from GitHub to use public code hosted there.
The SPs we have worked with and talked to before include f01854755, f01823070 and f01878693, among others. After our application has been approved, we plan to distribute the data across 8-10 SPs via the BDE platform.

@raghavrmadya
Collaborator

Thanks @Megan008. We have had cases before where clients needed approval from the dataset's manager even for public datasets. I also see that you have many applications open. Can you share more about yourself and any organization you are representing, since onboarding many PiBs of data through multiple applications requires a team effort? I'm tagging @kernelogic, as they have dealt with such challenges with clients before as they relate to public datasets.

@kernelogic

The Folding@home dataset is Creative Commons licensed, so license-wise it should be fine.

It consists of about 450 TiB of raw data in AWS S3:
| Bucket | Region | Size |
| --- | --- | --- |
| arn:aws:s3:::fah-public-data-covid19-antibodies | us-east-2 | 8.6 TiB |
| arn:aws:s3:::fah-public-data-covid19-cryptic-pockets | us-east-2 | 71.0 TiB |
| arn:aws:s3:::fah-public-data-covid19-absolute-free-energy | us-east-2 | 369.5 TiB |
| arn:aws:s3:::fah-public-data-covid19-moonshot-dynamics | us-east-2 | 1.8 TiB |

However, I would have the following questions:

  1. This dataset has been onboarded many times during Slingshot v2, and it is also included in Slingshot v3.
  2. For an open dataset, I think it is important to provide ways to index and retrieve the data, not just back it up. In Slingshot v2 we were asked to provide websites and documentation about how the data can be used. Do you have any plans in this regard?
  3. Downloading 450 TiB of raw data requires significant internet bandwidth. Where are you located, and do you have it? (A minimal listing sketch is included after this list.)
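
For reference, a minimal sketch of listing one of these public buckets with boto3, assuming anonymous (unsigned) access works for the fah-public-data buckets as it does for other Registry of Open Data buckets:

```python
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Anonymous client for the public Folding@home buckets (region us-east-2 per the listing above).
s3 = boto3.client("s3", region_name="us-east-2", config=Config(signature_version=UNSIGNED))

# Page through object listings and add up the sizes to sanity-check the ~450 TiB figure.
total_bytes = 0
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="fah-public-data-covid19-cryptic-pockets"):
    for obj in page.get("Contents", []):
        total_bytes += obj["Size"]

print(f"{total_bytes / 2**40:.1f} TiB")
```

Bulk download at this scale would more realistically go through `aws s3 sync` or an offline transfer process rather than per-object requests.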

@Megan008
Author

@raghavrmadya I'm a community member. As I mentioned before, I'm going to contact more SPs and distribute the data via the BDE platform. I also have an SP I have worked with before who will continue to work with me, so I think we can complete it.

> @raghavrmadya Thank you for your questions. COVID-19 is a public dataset and is not exclusive to a specific organization, so it is not necessary to get permission from Folding@home in advance to download and store it. It is similar to how programmers do not need permission from GitHub to use public code hosted there. The SPs we have worked with and talked to before include f01854755, f01823070 and f01878693, among others. After our application has been approved, we plan to distribute the data across 8-10 SPs via the BDE platform.

@Megan008
Author

@kernelogic Thank you for your points and concerns! I am currently in Singapore, but I look forward to contacting SPs around the world. I am not participating in Slingshot, so I think I need to follow the LDN rules rather than Slingshot's.

@large-datacap-requests large-datacap-requests bot deleted a comment from simonkim0515 Nov 29, 2022
@simonkim0515 simonkim0515 self-assigned this Nov 29, 2022
@simonkim0515
Collaborator

Datacap Request Trigger

Total DataCap requested

5PiB

Expected weekly DataCap usage rate

100TiB

Client address

f1au3nipqjprr5xp2mwsarr7obvpx2dwy4is6qn4y

@large-datacap-requests

large-datacap-requests bot commented Nov 29, 2022

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1au3nipqjprr5xp2mwsarr7obvpx2dwy4is6qn4y

DataCap allocation requested

50TiB

Id

fd3d1516-a183-467e-9ad7-01964bb49b11

@galen-mcandrew galen-mcandrew added the cg:stale Client Growth, No recent activity label Jan 31, 2023
@cryptowhizzard

cryptowhizzard commented Feb 1, 2023

#1062
#1362

#1013 -> Abuse (CID sharing)

@large-datacap-requests large-datacap-requests bot added status:Approved and removed state:Approved validated cg:stale Client Growth, No recent activity labels Mar 14, 2023
@large-datacap-requests

DataCap Allocation requested

Request number 7

Multisig Notary address

f02049625

Client address

f1au3nipqjprr5xp2mwsarr7obvpx2dwy4is6qn4y

DataCap allocation requested

400TiB

Id

d9dee3ed-014a-41e7-900c-821ac9129c52

@large-datacap-requests

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1au3nipqjprr5xp2mwsarr7obvpx2dwy4is6qn4y

Rule to calculate the allocation request amount

400% of weekly dc amount requested

DataCap allocation requested

400TiB

Total DataCap granted for client so far

3.6379788070917166e+64YiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

3.6379788070917166e+64YiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 33372 | 16 | 400TiB | 15.41 | 106.96TiB |

@cryptowhizzard

This applicant has a long history of fraud and refuses to start working on the right path. The applications still do not provide retrieval, and the data stored is not the data the applicant says is being stored.
[Screenshot attachment: 2023-08-19 17:09]

@Megan008
Author

[Screenshot]
@cryptowhizzard Does this mean we do not provide retrieval? Why did you lie through your teeth? Can you please do some checking before you act?
@raghavrmadya @dkkapur Is it allowed to dispute someone based only on his own words?

@cryptowhizzard

> [Screenshot] @cryptowhizzard Does this mean we do not provide retrieval? Why did you lie through your teeth? Can you please do some checking before you act? @raghavrmadya @dkkapur Is it allowed to dispute someone based only on his own words?

It is public knowledge that the HTTP retrieval bot is gamed.

http://www.datasetcreators.com/downloadedcarfiles/logs/1024.log

Here you can find the log. Since you have range retrieval disabled (something natively enabled in Boost), it is clear you are trying to prevent anyone from retrieving the whole CAR file to unpack it and do due diligence.

This is what your retrieval looks like. It is all junk and scam.

[Screenshot attachment: 2023-08-22 12:02]
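
For context on the range-retrieval point, a minimal sketch of how a notary might probe whether an SP's HTTP endpoint honours byte-range requests (the endpoint URL is a placeholder, assuming a Boost-style interface serving pieces at /piece/<pieceCID>; the piece CID is the one quoted later in this thread):

```python
import requests

# Placeholder SP endpoint; the piece CID is the one under dispute in this application.
url = ("http://sp-http-endpoint.example/piece/"
       "baga6ea4seaqofl35yu6stkuaeo4nbpe543355wtaglyv74pfwtyx5uqhpag34ii")

# Request only the first MiB. A server with range retrieval enabled should reply
# 206 Partial Content with a Content-Range header; a plain 200 (or an error)
# suggests byte ranges are not being honoured for this piece.
resp = requests.get(url, headers={"Range": "bytes=0-1048575"}, stream=True, timeout=60)
print(resp.status_code, resp.headers.get("Content-Range"))
```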

@Megan008
Author

I will ask the SPs to check.

> It is public knowledge that the HTTP retrieval bot is gamed.

This tool is from the PL team; do you mean it is useless? I think I can only trust people from the official team.

@Carohere

@cryptowhizzard Could you share the link to the file you downloaded?
@Megan008 Retrieval bots can be largely trusted, but sometimes they are not accurate.

@github-actions

github-actions bot commented Sep 2, 2023

This application has not seen any responses in the last 10 days. This issue will be marked with the Stale label and will be closed in 4 days. Comment if you want to keep this application open.

--
Commented by Stale Bot.

@raghavrmadya
Collaborator

Hi, I'm following up on the dispute here - https://www.notion.so/filecoin/No-retrieval-supported-bfcfebbcbdbd475fab52cccaf83d4674?pvs=4

The client is requested to provide an update on retrievals if they are not satisfied with the evidence provided by @cryptowhizzard.

Until then, the application will remain under dispute, and notaries are encouraged not to sign unless evidence of retrieval compliance is provided.

@Megan008
Author

Megan008 commented Sep 6, 2023

@raghavrmadya First, I've proved that we support retrieval, and the retrieval report also shows it.

> [Screenshot] @cryptowhizzard Does this mean we do not provide retrieval? Why did you lie through your teeth? Can you please do some checking before you act? @raghavrmadya @dkkapur Is it allowed to dispute someone based only on his own words?

Second, here is the retrieval download provided by the SPs.
6e356c07-4dbf-40dd-ae57-

All of this shows that we support retrieval.

@cryptowhizzard

Dear Megan008,

As a notary, I am doing due diligence on your LDN. I could not get retrieval to work. Can you please upload the CAR file for CID baga6ea4seaqofl35yu6stkuaeo4nbpe543355wtaglyv74pfwtyx5uqhpag34ii?

You can use our upload system at http://send.datasetcreators.com. Please select 7 days for the system to keep the file and post the link you received here so I (and other notaries) can download your content.

@github-actions

This application has not seen any responses in the last 10 days. This issue will be marked with the Stale label and will be closed in 4 days. Comment if you want to keep this application open.

--
Commented by Stale Bot.

@github-actions

This application has not seen any responses in the last 14 days, so for now it is being closed. Please feel free to contact the Fil+ Gov team to re-open the application if it is still being processed. Thank you!

--
Commented by Stale Bot.

@Megan008
Author

> Dear Megan008,
>
> As a notary, I am doing due diligence on your LDN. I could not get retrieval to work. Can you please upload the CAR file for CID baga6ea4seaqofl35yu6stkuaeo4nbpe543355wtaglyv74pfwtyx5uqhpag34ii?
>
> You can use our upload system at http://send.datasetcreators.com. Please select 7 days for the system to keep the file and post the link you received here so I (and other notaries) can download your content.

I cannot open the link to upload the file. It would be better if you check my answer below; it should give you what you want.


Thanks for your request!
❗ We have found some problems in the information provided.
We could not find Website / Social Media field in the information provided
We could not find Total amount of DataCap being requested (between 500 TiB and 5 PiB) field in the information provided
We could not find Weekly allocation of DataCap requested (usually between 1-100TiB) field in the information provided
We could not find On-chain address for first allocation field in the information provided
We could not find Data Type of Application field in the information provided

Please, take a look at the request and edit the body of the issue providing all the required information.


RootKeyHolders have approved the multisig account. You can now request the first DataCap release.
