
Adding loader for EGFxSet #556

Merged: 45 commits merged into mir-dataset-loaders:master on Nov 25, 2022

Conversation

@iranroman (Contributor) commented on Sep 2, 2022:

EGFxSet: Electric guitar tones processed through real effects of gain, modulation, delay and reverb

Description

Please include the following information in the top-level docstring of the dataset's module mydataset.py:

  • Describe annotations included in the dataset
  • Indicate the size of the dataset (e.g. number of files and duration in hours)
  • Mention the origin of the dataset (e.g. creator, institution)
  • Describe the type of music included in the dataset
  • Indicate any relevant papers related to the dataset
  • Include a description about how the data can be accessed and the license it uses (if applicable)

Dataset loaders checklist:

  • Create a script in scripts/, e.g. make_my_dataset_index.py, which generates an index file.

  • Run the script on the canonical version of the dataset and save the index in mirdata/indexes/ e.g. my_dataset_index.json.

  • Create a module in mirdata, e.g. mirdata/my_dataset.py

  • Create tests for your loader in tests/datasets/, e.g. test_my_dataset.py

  • Add your module to docs/source/mirdata.rst and docs/source/quick_reference.rst

  • Run tests/test_full_dataset.py on your dataset.

  • Make sure someone has run pytest -s tests/test_full_dataset.py --local --dataset my_dataset once on your dataset locally and confirmed it passes

@codecov bot commented on Sep 2, 2022:

Codecov Report

Merging #556 (be96982) into master (02cd5c0) will increase coverage by 0.04%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #556      +/-   ##
==========================================
+ Coverage   96.81%   96.85%   +0.04%     
==========================================
  Files          54       55       +1     
  Lines        6620     6705      +85     
==========================================
+ Hits         6409     6494      +85     
  Misses        211      211              

@iranroman (Contributor Author) commented:
Hello @mezaga and @hegelespaul. I have finished creating the dataset index and initializing the dataset module. You can go ahead and work on your assigned tasks.

@iranroman (Contributor Author) commented:
Thank you @hegelespaul and @mezaga for your work with this PR. I will review and make comments/suggestions/changes before we remove the WIP tag and send to the mirdata admins. Cheers!

@iranroman (Contributor Author) commented:
Quick update: I ran pytest -s tests/test_full_dataset.py --local --dataset egfxset and everything passed. I'll do the code review now.

@iranroman (Contributor Author) left a review comment:

Great work @hegelespaul and @mezaga. Please see my comments and let's discuss changes that may be needed. I think it would be good to have this ready today.

Comment on lines 11 to 13
This dataset was conceived during Iran Roman's "Deep Learning for Music Information Retrieval" course
imparted in the postgraduate studies in music technology at the UNAM (Universidad Nacional Autónoma de México).
The result is a combined effort between two UNAM postgraduate students (Hegel Pedroza and Gerardo Meza) and Iran Roman(NYU).
@iranroman (Contributor Author):
move to the end of this section (i.e. after the paragraph saying "The effects employed were ..."). Or remove

All possible 138 notes of a standard tuning 22 frets guitar were recorded in each one of the 5 pickup configurations,
giving a total of 690 clean tone audio files ( 58 min ).

The 690 audio files were processed through 12 different audio effects employing actual guitar gear (no emulations),
@iranroman (Contributor Author):
change to: "The 690 'clean' audio files ..."

change to: "(no VST emulations were used)"

Modulation:
Boss CE-3: Chorus
MXR Phase 45: Phaser
Mooer E-Lady: Flangergh pr checkout 556
@iranroman (Contributor Author):
"Flangergh pr checkout 556" ?

Knob types
Knob settings

For more details, please visit https://zenodo.org/record/7044411
@iranroman (Contributor Author):
change for:

the dataset website is: (link to website)
the data can be accessed here: (link to zenodo)
an ISMIR extended abstract was presented in 2022: (link to the ISMIR proceedings; maybe this link for now https://ismir2022.ismir.net/program/lbd/)

Attributes:
audio_path (str): path to the track's audio file
track_id (str): track id
stringfret_tuple (str): the tuple of the note recorded
@iranroman (Contributor Author):
stringfret_tuple (str): is this really a str? Maybe we should cast it to an actual tuple of ints.
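A minimal sketch of what that cast could look like, assuming the value is a dash-separated "string-fret" string such as "2-0" (the helper name and the format are assumptions for illustration, not the PR's actual code):

def _parse_stringfret(stringfret_str):
    # e.g. "2-0" -> (2, 0), assuming a "string-fret" format with a dash separator
    string_num, fret_num = stringfret_str.split("-")
    return int(string_num), int(fret_num)

assert _parse_stringfret("2-0") == (2, 0)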

audio_path (str): path to the track's audio file
track_id (str): track id
stringfret_tuple (str): the tuple of the note recorded
note (str): the notename of the file
@iranroman (Contributor Author):
is this the note name as in A, A#, etc? or is it something else? Please make the description more thorough. Also, how is this different to the track_id?

stringfret_tuple (str): the tuple of the note recorded
note (str): the notename of the file
midinote (int): the midinote value
pickup_configuration: the pickup used in the recording
@iranroman (Contributor Author):
what's the type of this? str?

Comment on lines 313 to 315
for row in csv_reader:
    key = os.path.splitext(os.path.split(row[0])[1])[0]
    for track in tracknames:
@iranroman (Contributor Author):
why do you have to iterate over all tracknames and all rows in the csv_reader? This is redundant and inefficient. Please discuss what you are trying to do here and let's revise together.

Contributor:
Hi, I've done all the changes already except this one, and the CSV in the test file, since we need it for the test. The main reason the dataset iterates through all the tracks is that our CSV does not have a list of track names, and the metadata is written by matching the first letters of the track id's effect against the CSV info. I have been trying to load this metadata outside the dataset definition, like an older version of tinysol did, but I haven't been successful at it. The other option is to change our metadata file to one that has all the track names (like tinysol does) and therefore remove the iteration.

@iranroman (Contributor Author):
Thanks for the explanation. Can we use a dictionary instead?
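A rough sketch of the dictionary idea, assuming each CSV row starts with an effect name that can be matched against the prefix of a track id (the file name, field layout, and prefix matching are assumptions for illustration only, not the PR's code):

import csv

# Build a lookup keyed by effect name once, so each track needs a single
# dictionary lookup instead of a scan over every CSV row.
with open("egfxset_metadata.csv", "r") as fhandle:
    csv_reader = csv.reader(fhandle)
    next(csv_reader)  # skip the header row
    effect_metadata = {row[0].strip(): row[1:] for row in csv_reader}

# For a track id like "TapeEcho_Bridge/2-0", the effect could then be looked up with:
# effect_metadata.get(track_id.split("_")[0])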

Contributor:
In the last push, I resolved the nested looping by building a small list that serves as an index of the rows in the CSV before going through all the tracks to write the metadata. It's way faster than the pandas solution and has passed all the checks. Do you think this is the right solution?

@@ -0,0 +1,13 @@
Effect ,Model,Effect Type,Knob Names,Knob Type,Setting
@iranroman (Contributor Author):
is this the exact same file as our egfxset_metadata in Zenodo? If yes, do we need all the lines for the tests? If no, shouldn't they be?

@@ -0,0 +1 @@
This is a test file. It is validated using checksum. Do not change the contents of this file.
@iranroman (Contributor Author):
please remove this test file. I do not think it should be here at all, right @harshpalan ?

Contributor:
Without it, the download utils tests report 3 failures:

tests/test_download_utils.py ..F..FF.........

with it:

tests/test_download_utils.py ................

@iranroman (Contributor Author):
I think this file is only meant to be used in local tests, but it is not needed for GitHub builds/tests. I'll remove it.

@iranroman (Contributor Author) commented:
Also, I think you need to add EGFxSet to docs/source/table.rst

Comment on lines 78 to 83
BIBTEX = """
@article{pedrozaegfxset,
title={EGFxSet: Electric guitar tones processed through real effects of distortion, modulation, delay and reverb},
author={Pedroza, Hegel and Meza, Gerardo and Roman, Iran}
}
"""
@iranroman (Contributor Author) commented on Nov 18, 2022:
let's change this for:

BIBTEX = """
@article{pedroza2022egfxset,
      title={EGFxSet: Electric guitar tones processed through real effects of distortion, modulation, delay and reverb}, 
      author={Pedroza, Hegel and Meza, Gerardo and Roman, Iran},
      year={2022},
      institution={UNAM},
      booktitle={Extended Abstracts for the Late-Breaking Demo Session of the 23rd Int. Society for Music Information Retrieval Conf., Bengaluru, India, 2022.},
}
"""

@iranroman (Contributor Author) commented:
Thank you @hegelespaul and @mezaga for your work on this PR. I think it's ready.

@harshpalan @magdalenafuentes, I'm removing the WIP tag. Let us know if you would like us to address anything else in order to be able to merge. Cheers!

@iranroman changed the title from "[WIP] Adding loader for EGFxSet" to "Adding loader for EGFxSet" on Nov 19, 2022
@magdalenafuentes (Collaborator) left a review comment:
@hegelespaul @mezaga @iranroman Great job with this PR!! It is in great shape, and I'm very happy to have a Mexican contribution in mirdata ❤️

I left a few comments, once you address them we can merge. Let me know if anything is not clear!

mirdata/datasets/egfxset.py (comment thread resolved)
Comment on lines 51 to 59
Guitar string number
Fret number
Guitar pickup configuration
Effect name
Effect type
Hardware modes
Knob names
Knob types
Knob settings
@magdalenafuentes (Collaborator):
This is not rendering as a list but as a continuous paragraph.

Comment on lines 61 to 63
The dataset website is: https://egfxset.github.io/
The data can be accessed here: https://zenodo.org/record/7044411#.YxKdSWzMKEI
An ISMIR extended abstract was presented in 2022: https://ismir2022.ismir.net/program/lbd/
@magdalenafuentes (Collaborator):
Same here, you should format it as a list

Attributes:
audio_path (str): path to the track's audio file
stringfret_tuple (list): an array with the tuple of the note recorded
note (str): the notename of the file (i.e. D1,Eb4, etc.)
@magdalenafuentes (Collaborator):
mirdata has a NoteData class, you should use that instead and indicate the units as note_name
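A rough sketch of how the note could be exposed as NoteData (the constructor arguments, the 5-second interval, and the example pitch are assumptions about the mirdata annotations API rather than this PR's final code):

import numpy as np
from mirdata import annotations

# Hypothetical: wrap the single recorded note in a NoteData object,
# with (start, end) intervals in seconds and the pitch as a MIDI number.
note_data = annotations.NoteData(
    np.array([[0.0, 5.0]]),  # assumed note duration of 5 seconds
    "s",
    np.array([63.0]),        # MIDI number for Eb4, as an example value
    "midi",
)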

Contributor:
We made all the changes; this was the trickiest one by far, but I think we've got it. Please let us know if any changes are needed.

mirdata/datasets/egfxset.py (comment thread resolved)
    return metadata_index

def load_audio(self, *args, **kwargs):
    return load_audio(*args, **kwargs)
@magdalenafuentes (Collaborator):
Please note that this line is not covered by tests



def test_track():
    data_home = "tests/resources/mir_datasets/egfxset"
@magdalenafuentes (Collaborator):
Please add os.path.normpath to any hardcoded path (i.e. paths with "/") so your tests run on Windows (see PR #567)
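For example, a minimal illustration of the suggestion (not the PR's exact code):

import os

# os.path.normpath converts the separators so the path also resolves on Windows.
data_home = os.path.normpath("tests/resources/mir_datasets/egfxset")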

data_home = "tests/resources/mir_datasets/egfxset"
dataset = egfxset.Dataset(data_home, version="test")

track = dataset.track("TapeEcho_Bridge/2-0")
@magdalenafuentes (Collaborator):
os.path.normpath


expected_attributes = {
"track_id": "TapeEcho_Bridge/2-0",
"audio_path": "tests/resources/mir_datasets/egfxset/TapeEcho/Bridge/2-0.wav",
@magdalenafuentes (Collaborator):
os.path.normpath; there are a few more below.


audio, sr = track.audio
assert sr == 48000
assert audio.shape == (48000 * 5,)
@magdalenafuentes (Collaborator):
We use 1-second audio for the tests; please reduce the size of the test audio files.
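One way to trim a test resource down to one second, as a sketch (assumes the soundfile package is available; the path is taken from the test above, and any other test audio file would be handled the same way):

import soundfile as sf

# Keep only the first second of the test audio file.
path = "tests/resources/mir_datasets/egfxset/TapeEcho/Bridge/2-0.wav"
audio, sr = sf.read(path)
sf.write(path, audio[:sr], sr)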

@magdalenafuentes (Collaborator) left a review comment:
@hegelespaul great job, almost done! Just a few nit comments and we're good to go. Thanks for your contribution!


expected_property_types = {
    "audio": tuple,
    "note": annotations.NoteData,
@magdalenafuentes (Collaborator):
Small thing: now that note is a NoteData object, it would be good to check that its intervals and pitches values return what you expect by simply adding an extra assert for each.
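For instance, with hypothetical expected values chosen only to illustrate the suggested asserts:

import numpy as np

# Hypothetical expected values for a single 5-second note.
assert np.allclose(track.note.intervals, [[0.0, 5.0]])
assert np.allclose(track.note.pitches, [63.0])  # e.g. the MIDI number for Eb4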

Contributor:
The asserts were added

- Knob names
- Knob types
- Knob settings

@magdalenafuentes (Collaborator):
Nit: you might want to add extra spaces for formatting here; see the rendering of the docs here.

@hegelespaul (Contributor) commented on Nov 25, 2022:
I think the Dataset info is more legible now; I've made the changes. In the end, it has two cached properties, note_name and midinote: the first uses a function from annotations, and the second uses the class you told us about before. Both properties have asserts to check that the expected results are returned. Please let me know if more changes are needed. Thanks for all!

@magdalenafuentes (Collaborator) left a review comment:
@hegelespaul @mezaga @iranroman Great job! I'm merging this and including it in the release for ISMIR. Felicitaciones!

@magdalenafuentes merged commit 204750b into mir-dataset-loaders:master on Nov 25, 2022