-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add licenses loaders and doc nits #410
Conversation
Codecov Report
@@ Coverage Diff @@
## master #410 +/- ##
==========================================
+ Coverage 97.76% 97.78% +0.01%
==========================================
Files 32 32
Lines 3446 3476 +30
==========================================
+ Hits 3369 3399 +30
Misses 77 77 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome ❤️
I made a few tiny comments!
@@ -51,7 +51,7 @@ There are two ways of citing mirdata: | |||
|
|||
If you are using the library for your work, please cite the version you used as indexed at Zenodo: | |||
|
|||
DOI | |||
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4355859.svg)](https://doi.org/10.5281/zenodo.4355859) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I click the link it takes me to a page that says "Error: DOI not found"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it says it might not have been activated yet (maybe it's too new!) I'll double check in a little bit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reserved it but didn't upload anything in Zenodo yet, that's why is showing like that. It says it will be online when we make the release
@@ -166,7 +176,7 @@ def __init__(self, mtrack_id, data_home): | |||
self._data_home = data_home | |||
# these three attributes below must have exactly these names | |||
self.track_ids = [...] # define which track_ids should be part of the multitrack | |||
self.tracks = {t: Track(t, self._data_home) for t in track_ids} | |||
self.tracks = {t: Track(t, self._data_home) for t in self.track_ids} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch 😅
mirdata/core.py
Outdated
remotes (dict): data to be downloaded | ||
download_info (str): download instructions or caveats | ||
track (core.Track): an uninstantiated Track object | ||
track_object (mirdata.core.Track or None): an uninstantiated Track object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
track_object is an argument in init (documented there) but not an attribute of the object (which is hidden _track_object
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^^ i pushed a change that fixes the problem
@@ -190,6 +203,12 @@ def cite(self): | |||
print("========== BibTeX ==========") | |||
print(self.bibtex) | |||
|
|||
def license(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
@@ -383,8 +383,7 @@ def load_extractor(path): | |||
|
|||
@core.docstring_inherit(core.Dataset) | |||
class Dataset(core.Dataset): | |||
"""The acousticbrainz genre dataset | |||
""" | |||
"""The acousticbrainz genre dataset""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think the license info is missing for this dataset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is! Good catch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmmmm This dataset is composed of 4 subdatasets. Three of them are Creative Commons Attribution Non Commercial Share Alike 4.0 International and the other one is non-comercial.
Please include in the justification field your academic affiliation (if you have one) and a brief description of your research topics and why you would like to use this dataset. If you do not include this information we may not approve your request.
The MediaEval AcousticBrainz Genre AllMusic dataset is released for non-commercial scientific research purposes only. Any publication of results based on the data extracts of the AllMusic database must cite AllMusic as the source of the data.
The MediaEval AcousticBrainz Genre AllMusic dataset is offered free of charge for internal non-commercial use only. You may not redistribute, publically communicate or modify it, unless expressly permitted by the Universitat Pompeu Fabra (UPF) or by applicable law. The individual contents of the dataset may be protected by copyright, and a non-transferrable limited license to reproduce and use the same is granted for the indicated purpose only.
The dataset and its contents are made available on an “as is” basis and without warranties of any kind, including without limitation satisfactory quality and conformity, merchantability, fitness for a particular purpose, accuracy or completeness, or absence of errors. Subject to any liability that may not be excluded or limited by law, the UPF is not liable for, and expressly excludes, all liability for loss or damage however and whenever caused to anyone by any use of the MediaEval AcousticBrainz Genre AllMusic dataset or any part of it.
Here the info about this sub-dataset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmmmm.. ok, will try to be more specific. What about giantestps_tempo? I couldn't find the license
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked a bit for it and I couldn't find an explicit license as well. Looks like it's Creative Commons Attribution Share Alike 4.0 International
like giantsteps_key
, as they were compiled and presented together, and data is extracted from the same source.
Here is a link to the ISMIR paper for the two Giantsteps datasets: http://www.cp.jku.at/research/papers/knees_etal_ismir_2015.pdf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PRamoneda can you take a look at the updated license of acousticbrainz and let me know if you agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is perfect for me!
mirdata/datasets/medleydb_melody.py
Outdated
@@ -46,6 +46,8 @@ | |||
{} | |||
""" | |||
|
|||
LICENSE_INFO = "http://creativecommons.org/licenses/by-nc-sa/4.0/." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit - for the other creative commons licenses we're writing the name of the license in text but this one is a link. Maybe we want them all to have links?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmmm.. I used links for those that only provide links in their websites. I'm a bit afraid of not matching exactly the license if I add links to the other ones. Want me to change this link to the name of the license instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't take responsibility for accuracy so it's not horrible if it's inconsistent. I'm fine either way though (to leave as is or to change it)
by the way, there are lists like this with badges - https://gist.github.com/lukas-h/2a5d00690736b4c3a7ba
we'd talked about adding a column to the datasets table with the license badge. is it worth doing here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll leave it like is then.
Will take a look at the table now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@genisplaja I see we have a single entry for Saraga. Should we have one for hindustani and another for carnatic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late reply @magdalenafuentes!! I think we should have two entries because there are some differences between the available audio and annotations. I will push the new rows in this PR if you are OK with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! Go ahead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!!
tests/test_loaders.py
Outdated
@@ -106,7 +106,7 @@ def test_dataset_attributes(httpserver): | |||
clean_remote_dataset(dataset_name) | |||
|
|||
|
|||
def test_cite(httpserver): | |||
def test_cite_license(httpserver): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit - test_cite_and_license
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would also be good to add a line to test_dataset_attributes
to check that dataset.license is set and not None
mirdata/core.py
Outdated
@@ -76,13 +84,15 @@ class Dataset(object): | |||
|
|||
Attributes: | |||
data_home (str): path where mirdata will look for the dataset | |||
index (dict or None): the dataset's file index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
index shouldn't be in attributes, since it's not an attribute! It's a parameter and it's documented there already
mirdata/core.py
Outdated
bibtex (str or None): dataset citation/s in bibtex format | ||
remotes (dict or None): data to be downloaded | ||
download_info (str or None): download instructions or caveats | ||
license_info (str or None): license of the dataset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment, since there is no Dataset.license_info
attribute, it shouldn't be documented here.
@PRamoneda @genisplaja @rabitt let me know what do you think about this format for the table and I’ll complete the rest of the datasets. a bit of context: adding licenses implies adding long text in the table and formatting it as we we’re doing becomes a nightmare |
@@ -0,0 +1,54 @@ | |||
.. list-table:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️ this is so much better!!
@magdalenafuentes the new way of writing the table is amazing! It's much easier to add to without breaking the whole table like before! 👍 from me! |
one thing, and i dont know exactly why it's happening, but look at the built docs for all the dataset.Dataset objects in the Dataset Loaders documentation. The format is getting messed up. I think @drubinstein had the same issue in his PR and found a workaround. |
@@ -383,8 +393,7 @@ def load_extractor(path): | |||
|
|||
@core.docstring_inherit(core.Dataset) | |||
class Dataset(core.Dataset): | |||
"""The acousticbrainz genre dataset | |||
""" | |||
"""The acousticbrainz genre dataset""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@magdalenafuentes (thanks to @drubinstein for the fix) the change in formatting you made to the docstring -
"""blablabla""""
vs how it was
"""blablabla
"""
is what's causing the formatting problem in the docs build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one more note from @drubinstein -
black will reformat
"""blablabla
"""
to one line, but it wont reformat:
"""
blablabla
"""
which will build properly too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooops, good to know, so it should be either """blabla""" or
""""blabla
""""?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not exactly.
"""A"""
Will cause the problem
"""A
"""
or
"""
A
"""
are acceptable solutions. However, black will format solution 1 to look like the version that causes the problem therefore solution 2 is recommended.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmm ok, will change things to be that format then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw thanks @drubinstein for the hint!
@genisplaja just a nit: adding |
I see @magdalenafuentes, sorry! In this case, I think that for now, we can remove the |
Perfect! Thanks :) |
Merging! great PR :) |
Closes #397, fixes #382, resolves #396, closes #381
Dataset.license()
with disclaimer messageContributing
DOI
inREADME.md
and docs