
Adding loader for Beatport EDM key #286

Merged
merged 16 commits into mir-dataset-loaders:master on Oct 22, 2020

Conversation

PRamoneda
Collaborator

@PRamoneda PRamoneda commented Oct 11, 2020

Adding loader for Beatport EDM key

Please use the following title: "Adding loader for MyDATASET". If your pull request is work in progress, change your title to "[WIP] Adding loader for MyDATASET" to avoid reviews while the loader is not ready.

Description

Please include the following information in the top-level docstring of the dataset's module mydataset.py:

  • Describe annotations included in the dataset
  • Indicate the size of the dataset (e.g. number of files and total duration in hours)
  • Mention the origin of the dataset (e.g. creator, institution)
  • Describe the type of music included in the dataset
  • Indicate any relevant papers related to the dataset
  • Include a description about how the data can be accessed and the license it uses (if applicable)

Dataset loaders checklist:

  • Create a script in scripts/, e.g. make_my_dataset_index.py, which generates an index file (see the sketch after this checklist).
  • Run the script on the canonical version of the dataset and save the index in mirdata/indexes/ e.g. my_dataset_index.json.
  • Create a module in mirdata, e.g. mirdata/my_dataset.py
  • Create tests for your loader in tests/, e.g. test_my_dataset.py
  • Add your module to docs/source/mirdata.rst and docs/source/datasets.rst
  • Add the module to mirdata/__init__.py
  • Add the module to the list in the README.md file, section Currently supported datasets
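
For reference, below is a minimal sketch of what such an index script could look like. The audio/ folder of mp3 files and the exact JSON layout (track ID mapped to a relative path plus md5 checksum) are assumptions for illustration, not necessarily the canonical mirdata index format.

import argparse
import glob
import hashlib
import json
import os


def md5(file_path):
    # compute the md5 checksum of a file in chunks
    hash_md5 = hashlib.md5()
    with open(file_path, "rb") as fhandle:
        for chunk in iter(lambda: fhandle.read(4096), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()


def make_index(data_path, index_path):
    # map each track id to its relative audio path and checksum
    index = {}
    for audio_path in sorted(glob.glob(os.path.join(data_path, "audio", "*.mp3"))):
        track_id = os.path.splitext(os.path.basename(audio_path))[0]
        rel_path = os.path.relpath(audio_path, data_path)
        index[track_id] = {"audio": [rel_path, md5(audio_path)]}
    with open(index_path, "w") as fhandle:
        json.dump(index, fhandle, indent=2)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Generate a dataset index.")
    parser.add_argument("data_path", help="path to the local copy of the dataset")
    parser.add_argument("index_path", help="where to write the index JSON")
    args = parser.parse_args()
    make_index(args.data_path, args.index_path)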

If your dataset is not fully downloadable, there are two extra steps you should follow:
It is fully downloadable.

Please-do-not-edit flag

To reduce friction, we will make commits on top of contributors' pull requests by default unless they use the please-do-not-edit flag. If you don't want this to happen, don't forget to add the flag when you open your pull request.

@codecov

codecov bot commented Oct 11, 2020

Codecov Report

Merging #286 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #286   +/-   ##
=======================================
  Coverage   99.04%   99.04%           
=======================================
  Files          24       24           
  Lines        2523     2523           
=======================================
  Hits         2499     2499           
  Misses         24       24           

# The annotations in this dataset contain values that are not valid JSON. In particular, "bpm": nan
# cannot be parsed by a standard JSON parser.
# input file
find_replace(os.path.join(data_home, "meta"), ": nan", ": null", "*.json")
Collaborator

what is the role of this function when you have an index for all the files in the dataset?

Collaborator Author

The original JSON files contain nan values, which are not part of the JSON standard, so they can't be parsed as JSON directly. After the data is downloaded from Zenodo, each nan is replaced with null.

Collaborator

I understand why the function is there now. Maybe it would be more informative to put the long explanation in the docstring of the function above and leave a short comment here. Also, thinking out loud, I guess it's a good choice to do it during download so you only do it once and in a way that is consistent with the index, right?

Collaborator Author

yes, right
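
For context, here is a quick standalone illustration of the parsing failure being worked around here; the metadata snippet is made up.

import json

print(json.loads('{"bpm": null}'))   # parses fine: {'bpm': None}
try:
    json.loads('{"bpm": nan}')       # lowercase nan is not valid JSON
except json.JSONDecodeError as err:
    print("cannot parse raw metadata:", err)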

@@ -249,7 +249,7 @@ def jams_converter(


def beats_to_jams(beats):
Collaborator

this file should not be included in this PR

Collaborator Author

Black modifies it.

Collaborator

Try running black -S

@@ -287,14 +287,14 @@ def track_ids():
def load(data_home=None):
Collaborator

this file should not be included in this PR

Collaborator Author

Black modifies it.

Collaborator

try running it as defined in CONTRIBUTING:
black --target-version py37 --skip-string-normalization mirdata/
As a rule of thumb, try to avoid global commits and refer to specific files.
You can discard the commits on a specific file by updating master and replacing your version of the file with the version from master:
git checkout master file_to_restore.ext

mirdata/utils.py Outdated
@@ -208,10 +208,10 @@ def load_json_index(filename):

Collaborator

this file should not be included in this PR

Collaborator Author

Black modifies it.

Comment on lines 33 to 34
| giantsteps_key_ | Giantsteps EDM key | - audio: ✅ | - global :ref:`key` | 1486 |
| | | - annotations: ✅ | | |
Collaborator

Shouldn't this be Beatport?

Collaborator

Is the audio provided? or should it be an X?

Collaborator Author

done!!!
yes!! the audio is provided!!

@@ -0,0 +1,348 @@
# -*- coding: utf-8 -*-
"""beatport_key Dataset Loader
The Beatport EDM Key Dataset includes 1,486 two-minute sound excerpts from various EDM
Collaborator

nit: I think we're using 1486 (without a comma) as the notation for now, will see for bigger datasets

Collaborator Author

ok perfect!
done


Data License: Creative Commons Attribution Share Alike 4.0 International
"""
import fnmatch
Collaborator

Just leaving a comment here to check whether this is in requirements or partial requirements. If it is, ignore this comment; if it's not, we should discuss it.

Collaborator Author

I think fnmatch is part of the Python standard library.
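
fnmatch is indeed in the standard library, so no extra requirement is needed. A tiny illustration of the glob-style matching it provides, with made-up file names:

import fnmatch

files = ["track_a.json", "track_a.mp3", "notes.txt"]
print(fnmatch.filter(files, "*.json"))  # ['track_a.json']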

Comment on lines +143 to +146
def find_replace(directory, find, replace, pattern):
"""
Replace, in all files under directory whose names match pattern, every occurrence of find with replace
Parameters
Collaborator

At this point in reading the module I'm not sure why this function is needed. Maybe expand the docs to explain why it is here?

Collaborator

See comment below

Collaborator Author

ok! done then
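
For readers following along, roughly what a helper with this signature could look like, based on the discussion above; this is a sketch, not necessarily the exact code in the PR:

import fnmatch
import os


def find_replace(directory, find, replace, pattern):
    # rewrite every file under directory whose name matches pattern,
    # substituting find with replace (used here to turn nan into null)
    for root, _, files in os.walk(os.path.abspath(directory)):
        for filename in fnmatch.filter(files, pattern):
            filepath = os.path.join(root, filename)
            with open(filepath) as fhandle:
                contents = fhandle.read()
            with open(filepath, "w") as fhandle:
                fhandle.write(contents.replace(find, replace))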


@utils.cached_property
def key(self):
"""String: key annotation"""
Collaborator

I've noticed we're mixing docs styles. I don't know if this is the only loader that does it, but I'll add it as part of the audit issue to make sure we are consistent later on

Collaborator Author

@PRamoneda PRamoneda Oct 20, 2020

I was imitating the Beatles loader. What style should we use?
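
To make the style question concrete, here is a standalone sketch contrasting the short one-line form used in this loader with a Google-style block. The class and return values are made up, and which style to standardize on is left to the audit issue mentioned above.

class ExampleTrack(object):
    @property
    def key_short(self):
        """String: key annotation"""
        return "d minor"

    @property
    def key_google(self):
        """The global key annotation of the track.

        Returns:
            str: key label, e.g. 'd minor'

        """
        return "d minor"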



},
"guest_pick": false,
"sub_genres": []
}
Collaborator

nit: add a newline at the end of the file


audio, sr = track.audio
assert sr == 44100, 'sample rate {} is not 44100'.format(sr)
assert audio.shape == (5292000,), 'audio shape {} was not (5294592,)'.format(
Collaborator

shouldn't numbers match here?

Collaborator Author

done
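
Presumably the fix makes the message agree with the asserted shape (two minutes at 44100 Hz is 5292000 samples). The corrected lines would read roughly as follows, with track coming from the surrounding test:

audio, sr = track.audio
assert sr == 44100, 'sample rate {} is not 44100'.format(sr)
assert audio.shape == (5292000,), 'audio shape {} was not (5292000,)'.format(
    audio.shape
)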

@magdalenafuentes
Collaborator

@PRamoneda great job! Left my comments there. After you address them I'll take a look again, but first maybe it would be good to also merge the master so I can check and merge right away?

@PRamoneda
Collaborator Author

@magdalenafuentes I don't know whether I have requested the review or not! hahaha Thanks so much!

@magdalenafuentes magdalenafuentes merged commit d1570b2 into mir-dataset-loaders:master Oct 22, 2020
nkundiushuti pushed a commit that referenced this pull request Nov 4, 2020
 
Adding loader for Beatport EDM key (#286)

* make script and module

* fix nan problem

* fixed tests

* black

* Update README.md

add to README list

* Update mirdata.rst

* Update datasets.rst

* fix codecov

* fix codecov xd

* added test in order to pass codecov

* Update datasets.rst

* Update test_beatport_key.py

numbers must match

* 1486 without comma

1486 without comma

* move comments

* try black