
Adding loader for Beatport EDM key #286

Merged
merged 16 commits into mir-dataset-loaders:master on Oct 22, 2020

Conversation

PRamoneda
Collaborator

@PRamoneda PRamoneda commented Oct 11, 2020

Adding loader for Beatport EDM key

Please use the following title: "Adding loader for MyDATASET". If your pull request is work in progress, change your title to "[WIP] Adding loader for MyDATASET" to avoid reviews while the loader is not ready.

Description

Please include the following information in the top-level docstring of the dataset's module mydataset.py:

  • Describe annotations included in the dataset
  • Indicate the size of the dataset (e.g. number of files and total duration in hours)
  • Mention the origin of the dataset (e.g. creator, institution)
  • Describe the type of music included in the dataset
  • Indicate any relevant papers related to the dataset
  • Include a description about how the data can be accessed and the license it uses (if applicable)

Dataset loaders checklist:

  • Create a script in scripts/, e.g. make_my_dataset_index.py, which generates an index file (see the sketch after this checklist).
  • Run the script on the canonical version of the dataset and save the index in mirdata/indexes/ e.g. my_dataset_index.json.
  • Create a module in mirdata, e.g. mirdata/my_dataset.py
  • Create tests for your loader in tests/, e.g. test_my_dataset.py
  • Add your module to docs/source/mirdata.rst and docs/source/datasets.rst
  • Add the module to mirdata/__init__.py
  • Add the module to the list in the README.md file, section Currently supported datasets
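
For reference, below is a minimal sketch of what such an index script could look like. The audio/ folder of mp3 files and the exact JSON layout (track ID mapped to a relative path plus md5 checksum) are assumptions for illustration, not necessarily the canonical mirdata index format.

import argparse
import glob
import hashlib
import json
import os


def md5(file_path):
    # compute the md5 checksum of a file in chunks
    hash_md5 = hashlib.md5()
    with open(file_path, "rb") as fhandle:
        for chunk in iter(lambda: fhandle.read(4096), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()


def make_index(data_path, index_path):
    # map each track id to its relative audio path and checksum
    index = {}
    for audio_path in sorted(glob.glob(os.path.join(data_path, "audio", "*.mp3"))):
        track_id = os.path.splitext(os.path.basename(audio_path))[0]
        rel_path = os.path.relpath(audio_path, data_path)
        index[track_id] = {"audio": [rel_path, md5(audio_path)]}
    with open(index_path, "w") as fhandle:
        json.dump(index, fhandle, indent=2)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Generate a dataset index.")
    parser.add_argument("data_path", help="path to the local copy of the dataset")
    parser.add_argument("index_path", help="where to write the index JSON")
    args = parser.parse_args()
    make_index(args.data_path, args.index_path)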

If your dataset is not fully downloadable, there are two extra steps you should follow:
It is fully downloadable.

Please-do-not-edit flag

To reduce friction, we will make commits on top of contributors' pull requests by default unless they use the please-do-not-edit flag. If you don't want this to happen, don't forget to add the flag when you open your pull request.

@codecov

codecov bot commented Oct 11, 2020

Codecov Report

Merging #286 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #286   +/-   ##
=======================================
  Coverage   99.04%   99.04%           
=======================================
  Files          24       24           
  Lines        2523     2523           
=======================================
  Hits         2499     2499           
  Misses         24       24           

# The annotations in this dataset contain values that are not valid JSON. In particular, "bpm": nan
# cannot be parsed by a standard JSON parser.
# input file
find_replace(os.path.join(data_home, "meta"), ": nan", ": null", "*.json")
Collaborator

what is the role of this function when you have an index for all the files in the dataset?

Collaborator Author

The original JSON files contain nan values, which are not part of the JSON standard, so they can't be parsed as JSON directly. After the data is downloaded from Zenodo, each nan is replaced with null.

Collaborator

I understand why the function is there now. Maybe it would be more informative to put the long explanation in the docstring of the function above and leave a short comment here. Also, thinking out loud, I guess it's a good choice to do it during download so you only do it once and in a way that is consistent with the index, right?

Collaborator Author

yes, right
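
For context, here is a quick standalone illustration of the parsing failure being worked around here; the metadata snippet is made up.

import json

print(json.loads('{"bpm": null}'))   # parses fine: {'bpm': None}
try:
    json.loads('{"bpm": nan}')       # lowercase nan is not valid JSON
except json.JSONDecodeError as err:
    print("cannot parse raw metadata:", err)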

@@ -249,7 +249,7 @@ def jams_converter(


def beats_to_jams(beats):
Collaborator

this file should not be included in this PR

Collaborator Author

Black modifies it.

Collaborator

Try running black -S

@@ -287,14 +287,14 @@ def track_ids():
def load(data_home=None):
Collaborator

this file should not be included in this PR

Collaborator Author

Black modifies it.

Collaborator

try running it as defined in CONTRIBUTING:
black --target-version py37 --skip-string-normalization mirdata/
As a rule of thumb, try to avoid global commits and refer to specific files.
You can discard the commits on a specific file by updating master and replacing your version of the file with the version from master:
git checkout master file_to_restore.ext

mirdata/utils.py Outdated
@@ -208,10 +208,10 @@ def load_json_index(filename):

Collaborator

this file should not be included in this PR

Collaborator Author

Black modifies it.

Comment on lines 33 to 34
| giantsteps_key_ | Giantsteps EDM key | - audio: ✅ | - global :ref:`key` | 1486 |
| | | - annotations: ✅ | | |
Collaborator

Shouldn't this be Beatport?

Collaborator

Is the audio provided? or should it be an X?

Collaborator Author

done!!!
yes!! the audio is provided!!

@@ -0,0 +1,348 @@
# -*- coding: utf-8 -*-
"""beatport_key Dataset Loader
The Beatport EDM Key Dataset includes 1,486 two-minute sound excerpts from various EDM
Collaborator

nit: I think we're using 1486 (without a comma) as the notation for now, will see for bigger datasets

Collaborator Author

ok perfect!
done


Data License: Creative Commons Attribution Share Alike 4.0 International
"""
import fnmatch
Collaborator

Just leaving a comment here to check whether this is in requirements or partial requirements. If it is, ignore this comment; if it's not, we should discuss it.

Collaborator Author

I think fnmatch is part of the Python standard library.
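
fnmatch is indeed in the standard library, so no extra requirement is needed. A tiny illustration of the glob-style matching it provides, with made-up file names:

import fnmatch

files = ["track_a.json", "track_a.mp3", "notes.txt"]
print(fnmatch.filter(files, "*.json"))  # ['track_a.json']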

Comment on lines +143 to +146
def find_replace(directory, find, replace, pattern):
"""
Replace, in all files under directory whose names match pattern, every occurrence of find with replace
Parameters
Collaborator

At this point in reading the module I'm not sure why this function is needed. Maybe expand the docs to explain why it is here?

Collaborator

See comment below

Collaborator Author

ok! done then
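
For readers following along, roughly what a helper with this signature could look like, based on the discussion above; this is a sketch, not necessarily the exact code in the PR:

import fnmatch
import os


def find_replace(directory, find, replace, pattern):
    # rewrite every file under directory whose name matches pattern,
    # substituting find with replace (used here to turn nan into null)
    for root, _, files in os.walk(os.path.abspath(directory)):
        for filename in fnmatch.filter(files, pattern):
            filepath = os.path.join(root, filename)
            with open(filepath) as fhandle:
                contents = fhandle.read()
            with open(filepath, "w") as fhandle:
                fhandle.write(contents.replace(find, replace))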


@utils.cached_property
def key(self):
"""String: key annotation"""
Collaborator

I've noticed we're mixing docs styles. I don't know if this is the only loader that does it, but I'll add it as part of the audit issue to make sure we are consistent later on

Collaborator Author

@PRamoneda PRamoneda Oct 20, 2020

I was imitating the Beatles loader. What style should we use?
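
To make the style question concrete, here is a standalone sketch contrasting the short one-line form used in this loader with a Google-style block. The class and return values are made up, and which style to standardize on is left to the audit issue mentioned above.

class ExampleTrack(object):
    @property
    def key_short(self):
        """String: key annotation"""
        return "d minor"

    @property
    def key_google(self):
        """The global key annotation of the track.

        Returns:
            str: key label, e.g. 'd minor'

        """
        return "d minor"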



},
"guest_pick": false,
"sub_genres": []
}
Collaborator

nit: add a newline at the end of the file


audio, sr = track.audio
assert sr == 44100, 'sample rate {} is not 44100'.format(sr)
assert audio.shape == (5292000,), 'audio shape {} was not (5294592,)'.format(
Collaborator

shouldn't numbers match here?

Collaborator Author

done
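
Presumably the fix makes the message agree with the asserted shape (two minutes at 44100 Hz is 5292000 samples). The corrected lines would read roughly as follows, with track coming from the surrounding test:

audio, sr = track.audio
assert sr == 44100, 'sample rate {} is not 44100'.format(sr)
assert audio.shape == (5292000,), 'audio shape {} was not (5292000,)'.format(
    audio.shape
)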

@magdalenafuentes
Collaborator

@PRamoneda great job! Left my comments there. After you address them I'll take a look again, but first maybe it would be good to also merge the master so I can check and merge right away?

@PRamoneda
Collaborator Author

@magdalenafuentes I don't know whether I have requested the review or not! hahaha Thanks so much!

@magdalenafuentes magdalenafuentes merged commit d1570b2 into mir-dataset-loaders:master Oct 22, 2020
nkundiushuti pushed a commit that referenced this pull request Nov 4, 2020
 
Adding loader for Beatport EDM key (#286)

* make script and module

* fix nan problem

* fixed tests

* black

* Update README.md

add to README list

* Update mirdata.rst

* Update datasets.rst

* fix codecov

* fix codecov xd

* added test in order to pass codecov

* Update datasets.rst

* Update test_beatport_key.py

numbers must match

* 1486 without comma

1486 without comma

* move comments

* try black