Feature: add an explicit license for data available by API #2713

Closed
torgo opened this issue Mar 2, 2023 · 12 comments
Labels
kind/enhancement New feature or request

Comments

@torgo

torgo commented Mar 2, 2023

Is your feature request related to a problem? Please describe.
Potential users of the batch-generated data will want to know the license under which it is released, so they can be sure it permits their use case.

Describe the solution you'd like
Clearly signpost the license under which the data is released (ideally a Creative Commons CC0 license).

Additional context
One solution could be to add a second license file explicitly marked as applying to the BigQuery data (something like LICENSE_bigquerydata.md), which would contain the CC0 license text.

torgo added the kind/enhancement (New feature or request) label Mar 2, 2023
@naveensrinivasan
Member

@torgo Thanks for the suggestion. Would you like to do a PR for this?

@torgo
Author

torgo commented Mar 6, 2023

Happy to give it a go if we think this is the right approach.

@torgo
Author

torgo commented Mar 23, 2023

Ok - just having another think about this and polling the fediverse. Is this repo where the data license should live? I don't think it's appropriate or reasonable to put a general stipulation in here that anyone who generates data sets using Scorecard must release the data under an open license. What we're talking about here is the specific data set that is being placed in the BigQuery engine. So it feels to me like the license should be closer to the data rather than in this repo. However, @naveensrinivasan, if you think the thing I've suggested above would do the trick, I'm happy to do a PR.

@spencerschrock
Contributor

cc @david-a-wheeler
This was relevant to one of the topics in the Scorecard sync today, and seems more like something the OpenSSF should be making the decision on.

torgo changed the title from "Feature: add an explicit license for BigQuery batch generated data" to "Feature: add an explicit license for data available by API" Jun 1, 2023
@torgo
Author

torgo commented Jun 1, 2023

Updating this issue, as it seems more appropriate to assign the data license (my proposal is CC0) to the data available via the API.

torgo added a commit to torgo/scorecard that referenced this issue Jun 1, 2023
Related to ossf#2713.

Signed-off-by: Daniel Appelquist <daniel.appelquist@snyk.io>
@david-a-wheeler
Contributor

I formally filed a legal review request with LF Legal as internal legal review issue LR-1558. I suspect that this is fine, but since I'm not a lawyer, I think it's important to bring this question to actual lawyers. I will definitely let you know once I know something. Thanks for asking!

@david-a-wheeler
Contributor

Making our licensing clear is a good thing. Our legal team has some concerns about the CC0 license, especially outside the US.

One possibility they raised was to state that contributions of data would be under CDLA Permissive 2.0 and made available under that license. The license is here: https://cdla.dev/permissive-2-0. The OpenSSF Charter already authorizes this at: https://openssf.org/about/charter/.

However, after looking at their response (I just got back from vacation), I realized that they may be thinking we're only accepting and distributing contributed data. Let me circle back to them for confirmation. I've learned that legal answers can be really specific depending on the circumstance, and I want to make sure that they understand our circumstance so they can give us good answers.

@spencerschrock
Contributor

> Making our licensing clear is a good thing. Our legal team has some concerns about the CC0 license, especially outside the US.
>
> One possibility they raised was to state that contributions of data would be under CDLA Permissive 2.0 and made available under that license. The license is here: https://cdla.dev/permissive-2-0. The OpenSSF Charter already authorizes this at: https://openssf.org/about/charter/.
>
> However, after looking at their response (I just got back from vacation), I realized that they may be thinking we're only accepting and distributing contributed data. Let me circle back to them for confirmation. I've learned that legal answers can be really specific depending on the circumstance, and I want to make sure that they understand our circumstance so they can give us good answers.

Does contributed data in this sense refer to the GitHub API data we consume? Or is this distinction around API data that comes from our weekly cron vs data submitted by individual repos via scorecard action?

@david-a-wheeler
Contributor

david-a-wheeler commented Jul 12, 2023

> Does contributed data in this sense refer to the GitHub API data we consume? Or is this distinction around API data that comes from our weekly cron vs data submitted by individual repos via scorecard action?

They used the term "contribution", so I guess I'm not sure. They probably knew what they meant; I'm just reporting back. I filed more info & talked briefly with one of our lawyers.

I propose that we wait until July 19 to see if there are additional clarifications. They asked for time through the end of this week, but giving them a little extra time seems wise. My understanding is that in the US you can't really have a copyright on facts, but there are many asterisks to that statement, so having a clear license statement seems prudent.

If we don't hear otherwise, then after July 19 we should just attach the CDLA Permissive 2.0 and make it clear that generated data is available under that license. The license is here: https://cdla.dev/permissive-2-0. It basically lets receivers do whatever they want with the data, but makes it abundantly clear that there is "No Warranty; Limitation of Liability" (which from a risk point-of-view makes it better than CC0 for releasing data). This is also the easy path, because the OpenSSF Charter already authorizes this license at: https://openssf.org/about/charter/. They had previously recommended using this license for data, so their recommendations and the charter are all consistent.

In short, that's what our legal folks recommend & I think it makes sense. Does this seem like a reasonable plan?

@david-a-wheeler
Contributor

The OpenSSF charter and our lawyers recommend CDLA for this case. So as long as we clearly say the generated data is released under the CDLA then all is well.

@github-actions

github-actions bot commented Oct 4, 2023

This issue is stale because it has been open for 60 days with no activity.

@spencerschrock
Contributor

Completed via #3404
