Add licenses loaders and doc nits #410

magdalenafuentes · 2021-01-08T17:26:04Z

Closes #397, fixes #382, resolves #396, closes #381

Add Dataset.license() with disclaimer message
Update all loaders with custom licenses
Update example Contributing
Update tests
Fix nits docs
Add DOI in README.md and docs
Add citing instructions when working with datasets
Change table format
Add licenses in table for all datasets

codecov · 2021-01-08T17:35:40Z

Codecov Report

Merging #410 (393231b) into master (359997f) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #410      +/-   ##
==========================================
+ Coverage   97.76%   97.78%   +0.01%     
==========================================
  Files          32       32              
  Lines        3446     3476      +30     
==========================================
+ Hits         3369     3399      +30     
  Misses         77       77

… license

rabitt

This is awesome ❤️

I made a few tiny comments!

rabitt · 2021-01-08T18:24:40Z

README.md

@@ -51,7 +51,7 @@ There are two ways of citing mirdata:

 If you are using the library for your work, please cite the version you used as indexed at Zenodo:

-DOI
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4355859.svg)](https://doi.org/10.5281/zenodo.4355859)


When I click the link it takes me to a page that says "Error: DOI not found"

it says it might not have been activated yet (maybe it's too new!) I'll double check in a little bit

I reserved it but didn't upload anything in Zenodo yet, that's why is showing like that. It says it will be online when we make the release

rabitt · 2021-01-08T18:31:05Z

docs/source/contributing_examples/example.py

@@ -166,7 +176,7 @@ def __init__(self, mtrack_id, data_home):
        self._data_home = data_home
        # these three attributes below must have exactly these names
        self.track_ids = [...] # define which track_ids should be part of the multitrack
-        self.tracks = {t: Track(t, self._data_home) for t in track_ids}
+        self.tracks = {t: Track(t, self._data_home) for t in self.track_ids}


good catch 😅

rabitt · 2021-01-08T18:37:13Z

mirdata/core.py

-        remotes (dict): data to be downloaded
-        download_info (str): download instructions or caveats
-        track (core.Track): an uninstantiated Track object
+        track_object (mirdata.core.Track or None): an uninstantiated Track object


track_object is an argument in init (documented there) but not an attribute of the object (which is hidden _track_object)

^^ i pushed a change that fixes the problem

rabitt · 2021-01-08T18:46:43Z

mirdata/core.py

@@ -190,6 +203,12 @@ def cite(self):
        print("========== BibTeX ==========")
        print(self.bibtex)

+    def license(self):


rabitt · 2021-01-08T18:48:22Z

mirdata/datasets/acousticbrainz_genre.py

@@ -383,8 +383,7 @@ def load_extractor(path):

 @core.docstring_inherit(core.Dataset)
 class Dataset(core.Dataset):
-    """The acousticbrainz genre dataset
-    """
+    """The acousticbrainz genre dataset"""


i think the license info is missing for this dataset

It is! Good catch

mmmmm This dataset is composed of 4 subdatasets. Three of them are Creative Commons Attribution Non Commercial Share Alike 4.0 International and the other one is non-comercial.

Please include in the justification field your academic affiliation (if you have one) and a brief description of your research topics and why you would like to use this dataset. If you do not include this information we may not approve your request.
The MediaEval AcousticBrainz Genre AllMusic dataset is released for non-commercial scientific research purposes only. Any publication of results based on the data extracts of the AllMusic database must cite AllMusic as the source of the data.
The MediaEval AcousticBrainz Genre AllMusic dataset is offered free of charge for internal non-commercial use only. You may not redistribute, publically communicate or modify it, unless expressly permitted by the Universitat Pompeu Fabra (UPF) or by applicable law. The individual contents of the dataset may be protected by copyright, and a non-transferrable limited license to reproduce and use the same is granted for the indicated purpose only.
The dataset and its contents are made available on an “as is” basis and without warranties of any kind, including without limitation satisfactory quality and conformity, merchantability, fitness for a particular purpose, accuracy or completeness, or absence of errors. Subject to any liability that may not be excluded or limited by law, the UPF is not liable for, and expressly excludes, all liability for loss or damage however and whenever caused to anyone by any use of the MediaEval AcousticBrainz Genre AllMusic dataset or any part of it.

Here the info about this sub-dataset

https://zenodo.org/record/2554044#.X_jiSy3Tqw4

Mmmmm.. ok, will try to be more specific. What about giantestps_tempo? I couldn't find the license

I looked a bit for it and I couldn't find an explicit license as well. Looks like it's Creative Commons Attribution Share Alike 4.0 International like giantsteps_key, as they were compiled and presented together, and data is extracted from the same source.
Here is a link to the ISMIR paper for the two Giantsteps datasets: http://www.cp.jku.at/research/papers/knees_etal_ismir_2015.pdf

@PRamoneda can you take a look at the updated license of acousticbrainz and let me know if you agree?

It is perfect for me!

mirdata/datasets/beatles.py

rabitt · 2021-01-08T18:53:03Z

mirdata/datasets/medleydb_melody.py

@@ -46,6 +46,8 @@
    {}
 """

+LICENSE_INFO = "http://creativecommons.org/licenses/by-nc-sa/4.0/."


nit - for the other creative commons licenses we're writing the name of the license in text but this one is a link. Maybe we want them all to have links?

Mmmm.. I used links for those that only provide links in their websites. I'm a bit afraid of not matching exactly the license if I add links to the other ones. Want me to change this link to the name of the license instead?

We don't take responsibility for accuracy so it's not horrible if it's inconsistent. I'm fine either way though (to leave as is or to change it)

by the way, there are lists like this with badges - https://gist.github.com/lukas-h/2a5d00690736b4c3a7ba

we'd talked about adding a column to the datasets table with the license badge. is it worth doing here?

I'll leave it like is then.

Will take a look at the table now

@genisplaja I see we have a single entry for Saraga. Should we have one for hindustani and another for carnatic?

Sorry for the late reply @magdalenafuentes!! I think we should have two entries because there are some differences between the available audio and annotations. I will push the new rows in this PR if you are OK with it.

Sure! Go ahead

rabitt · 2021-01-08T18:57:18Z

tests/test_loaders.py

@@ -106,7 +106,7 @@ def test_dataset_attributes(httpserver):
            clean_remote_dataset(dataset_name)


-def test_cite(httpserver):
+def test_cite_license(httpserver):


nit - test_cite_and_license

it would also be good to add a line to test_dataset_attributes to check that dataset.license is set and not None

rabitt · 2021-01-09T20:06:50Z

mirdata/core.py

@@ -76,13 +84,15 @@ class Dataset(object):

    Attributes:
        data_home (str): path where mirdata will look for the dataset
+        index (dict or None): the dataset's file index


index shouldn't be in attributes, since it's not an attribute! It's a parameter and it's documented there already

rabitt · 2021-01-09T20:08:14Z

mirdata/core.py

+        bibtex (str or None): dataset citation/s in bibtex format
+        remotes (dict or None): data to be downloaded
+        download_info (str or None): download instructions or caveats
+        license_info (str or None): license of the dataset


same comment, since there is no Dataset.license_info attribute, it shouldn't be documented here.

magdalenafuentes · 2021-01-10T15:57:40Z

@PRamoneda @genisplaja @rabitt let me know what do you think about this format for the table and I’ll complete the rest of the datasets.

a bit of context: adding licenses implies adding long text in the table and formatting it as we we’re doing becomes a nightmare

rabitt · 2021-01-10T17:38:55Z

docs/source/table.rst

@@ -0,0 +1,54 @@
+.. list-table::


❤️ this is so much better!!

rabitt · 2021-01-10T17:42:53Z

@magdalenafuentes the new way of writing the table is amazing! It's much easier to add to without breaking the whole table like before! 👍 from me!

rabitt · 2021-01-10T17:46:57Z

one thing, and i dont know exactly why it's happening, but look at the built docs for all the dataset.Dataset objects in the Dataset Loaders documentation. The format is getting messed up. I think @drubinstein had the same issue in his PR and found a workaround.

rabitt · 2021-01-10T17:50:50Z

mirdata/datasets/acousticbrainz_genre.py

@@ -383,8 +393,7 @@ def load_extractor(path):

 @core.docstring_inherit(core.Dataset)
 class Dataset(core.Dataset):
-    """The acousticbrainz genre dataset
-    """
+    """The acousticbrainz genre dataset"""


@magdalenafuentes (thanks to @drubinstein for the fix) the change in formatting you made to the docstring -
"""blablabla""""
vs how it was
"""blablabla
"""
is what's causing the formatting problem in the docs build.

one more note from @drubinstein -
black will reformat
"""blablabla
"""
to one line, but it wont reformat:
"""
blablabla
"""
which will build properly too.

Ooops, good to know, so it should be either """blabla""" or
""""blabla
""""?

Not exactly.

"""A"""

Will cause the problem

"""A """

or

""" A """

are acceptable solutions. However, black will format solution 1 to look like the version that causes the problem therefore solution 2 is recommended.

mmm ok, will change things to be that format then

btw thanks @drubinstein for the hint!

magdalenafuentes · 2021-01-11T16:12:42Z

@genisplaja just a nit: adding multichannel in the second column makes the table quite wide and it doesn't fit in the page anymore (at least for me). Maybe keep it for the dataset info or abbreviate? In any case we should be consistent with other datasets that have multi-channel (either all of them say so or none?)

genisplaja · 2021-01-11T16:19:04Z

@genisplaja just a nit: adding multichannel in the second column makes the table quite wide and it doesn't fit in the page anymore (at least for me). Maybe keep it for the dataset info or abbreviate? In any case we should be consistent with other datasets that have multi-channel (either all of them say so or none?)

I see @magdalenafuentes, sorry! In this case, I think that for now, we can remove the multichannel comment here as that's explained in the Saraga Carnatic documentation. We can discuss whether we should add it for all the multi-channel/multi-track datasets when we have more of them. What do you think?

magdalenafuentes · 2021-01-11T16:19:53Z

@genisplaja just a nit: adding multichannel in the second column makes the table quite wide and it doesn't fit in the page anymore (at least for me). Maybe keep it for the dataset info or abbreviate? In any case we should be consistent with other datasets that have multi-channel (either all of them say so or none?)

I see @magdalenafuentes, sorry! In this case, I think that for now, we can remove the multichannel comment here as that's explained in the Saraga Carnatic documentation. We can discuss whether we should add it for all the multi-channel/multi-track datasets when we have more of them. What do you think?

Perfect! Thanks :)

rabitt · 2021-01-11T19:56:01Z

Merging! great PR :)

magdalenafuentes added 6 commits January 8, 2021 16:04

delete outdated git example

8e1b362

Add dataset.license() with disclaimer

e3935c9

Update modules with licenses

68ef132

Update contributing example.py

c62407d

Alphabetic order loaders

0d56631

Merge branch 'master' into license

be4774c

magdalenafuentes added 7 commits January 8, 2021 17:38

Nits in annotations and table

cee5ddd

Merge branch 'license' of github.com:mir-dataset-loaders/mirdata into…

4b4ef8e

… license

Update disclaimer and fix printing some loaders

ad45135

black

708ac78

Add DOI

f09eee6

Add DOI

8b0909f

Update tests

b0b527b

magdalenafuentes linked an issue Jan 8, 2021 that may be closed by this pull request

Create DOI for mirdata code releases #396

Closed

magdalenafuentes requested a review from rabitt January 8, 2021 18:13

magdalenafuentes and others added 3 commits January 8, 2021 18:19

Citing instructions when working with datasets

3d007da

some small edits

4f9abc8

update docs for track/track_object

bb49115

rabitt reviewed Jan 8, 2021

View reviewed changes

magdalenafuentes added 4 commits January 8, 2021 19:16

Add license acousticbrainz_genre

5fc9224

Forgot license_info in acousticbrainz_genre

cd33676

black

29de3f0

Update tests

2a65403

rabitt reviewed Jan 9, 2021

View reviewed changes

magdalenafuentes added 4 commits January 10, 2021 12:56

Fix attribute docs Dataset

3f05198

Add license giantsteps_tempo.py

b5856bf

Update license acousticbrainz_genre

818fe5e

Proposal for table format

20c2359

rabitt reviewed Jan 10, 2021

View reviewed changes

docs/source/table.rst

@@ -0,0 +1,54 @@

.. list-table::

Copy link

Collaborator

rabitt Jan 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️ this is so much better!!

rabitt reviewed Jan 10, 2021

View reviewed changes

magdalenafuentes added 8 commits January 10, 2021 19:15

Complete licenses in table

6d748d5

Add license badges reference contributing.rst

90fa4dd

add aliases for table

91ca629

link -> title of license, for consistency

82fb0d6

Fix formatting

eb708d8

Formatting so black doesn't mess up

c8e2532

Merge branch 'master' into license and solve conflicts

ff2a731

black

ab4c1f2

magdalenafuentes requested a review from rabitt January 10, 2021 19:51

Split Saraga in table

0c2112b

genisplaja and others added 3 commits January 11, 2021 17:55

Remove multichannel flag from Saraga Carnatic

56d682a

remove download_info from attributes docs

22c194b

remove one line triple quote docstrings

393231b

rabitt approved these changes Jan 11, 2021

View reviewed changes

rabitt merged commit 9e5dd7b into master Jan 11, 2021

rabitt mentioned this pull request Jan 11, 2021

Dataset object download_info and readme #400

Closed

magdalenafuentes mentioned this pull request Jan 13, 2021

Dataset.info() #300

Closed

rabitt deleted the license branch February 11, 2021 18:04

Add licenses loaders and doc nits #410

Add licenses loaders and doc nits #410

Conversation

magdalenafuentes commented Jan 8, 2021 • edited Loading

codecov bot commented Jan 8, 2021 • edited Loading

Codecov Report

rabitt left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

PRamoneda Jan 8, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

genisplaja Jan 9, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rabitt Jan 9, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

magdalenafuentes commented Jan 10, 2021

Choose a reason for hiding this comment

rabitt commented Jan 10, 2021

rabitt commented Jan 10, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

magdalenafuentes commented Jan 11, 2021

genisplaja commented Jan 11, 2021

magdalenafuentes commented Jan 11, 2021

rabitt commented Jan 11, 2021

magdalenafuentes commented Jan 8, 2021 •

edited

Loading

codecov bot commented Jan 8, 2021 •

edited

Loading

PRamoneda Jan 8, 2021 •

edited

Loading

genisplaja Jan 9, 2021 •

edited

Loading

rabitt Jan 9, 2021 •

edited

Loading