
Adding loader for EGFxSet #556

Merged: 45 commits merged into mir-dataset-loaders:master on Nov 25, 2022

Conversation

@iranroman (Contributor) commented on Sep 2, 2022:

EGFxSet: Electric guitar tones processed through real effects of gain, modulation, delay and reverb

Description

Please include the following information in the top-level docstring of the dataset's module mydataset.py:

  • Describe annotations included in the dataset
  • Indicate the size of the dataset (e.g. number of files and duration in hours)
  • Mention the origin of the dataset (e.g. creator, institution)
  • Describe the type of music included in the dataset
  • Indicate any relevant papers related to the dataset
  • Include a description about how the data can be accessed and the license it uses (if applicable)

Dataset loaders checklist:

  • Create a script in scripts/, e.g. make_my_dataset_index.py, which generates an index file.

  • Run the script on the canonical version of the dataset and save the index in mirdata/indexes/ e.g. my_dataset_index.json.

  • Create a module in mirdata, e.g. mirdata/my_dataset.py

  • Create tests for your loader in tests/datasets/, e.g. test_my_dataset.py

  • Add your module to docs/source/mirdata.rst and docs/source/quick_reference.rst

  • Run tests/test_full_dataset.py on your dataset.

  • Make sure someone has run pytest -s tests/test_full_dataset.py --local --dataset my_dataset once on your dataset locally and confirmed it passes

@codecov bot commented on Sep 2, 2022:

Codecov Report

Merging #556 (be96982) into master (02cd5c0) will increase coverage by 0.04%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #556      +/-   ##
==========================================
+ Coverage   96.81%   96.85%   +0.04%     
==========================================
  Files          54       55       +1     
  Lines        6620     6705      +85     
==========================================
+ Hits         6409     6494      +85     
  Misses        211      211              

@iranroman (Contributor Author) commented:
Hello @mezaga and @hegelespaul. I have finished creating the dataset index and initializing the dataset module. You can go ahead and work on your assigned tasks.

@iranroman (Contributor Author) commented:
Thank you @hegelespaul and @mezaga for your work with this PR. I will review and make comments/suggestions/changes before we remove the WIP tag and send to the mirdata admins. Cheers!

@iranroman (Contributor Author) commented:
Quick update: I ran pytest -s tests/test_full_dataset.py --local --dataset egfxset and everything passed. I'll do the code review now.

@iranroman (Contributor Author) left a review comment:

Great work @hegelespaul and @mezaga. Please see my comments and let's discuss changes that may be needed. I think it would be good to have this ready today.

Comment on lines 11 to 13
This dataset was conceived during Iran Roman's "Deep Learning for Music Information Retrieval" course
imparted in the postgraduate studies in music technology at the UNAM (Universidad Nacional Autónoma de México).
The result is a combined effort between two UNAM postgraduate students (Hegel Pedroza and Gerardo Meza) and Iran Roman(NYU).
@iranroman (Contributor Author):
move to the end of this section (i.e. after the paragraph saying "The effects employed were ..."). Or remove

All possible 138 notes of a standard tuning 22 frets guitar were recorded in each one of the 5 pickup configurations,
giving a total of 690 clean tone audio files ( 58 min ).

The 690 audio files were processed through 12 different audio effects employing actual guitar gear (no emulations),
@iranroman (Contributor Author):
change to: "The 690 'clean' audio files ..."

change to: "(no VST emulations were used)"

Modulation:
Boss CE-3: Chorus
MXR Phase 45: Phaser
Mooer E-Lady: Flangergh pr checkout 556
@iranroman (Contributor Author):
"Flangergh pr checkout 556" ?

Knob types
Knob settings

For more details, please visit https://zenodo.org/record/7044411
@iranroman (Contributor Author):
change for:

the dataset website is: (link to website)
the data can be accessed here: (link to zenodo)
an ISMIR extended abstract was presented in 2022: (link to the ISMIR proceedings; maybe this link for now https://ismir2022.ismir.net/program/lbd/)

Attributes:
audio_path (str): path to the track's audio file
track_id (str): track id
stringfret_tuple (str): the tuple of the note recorded
@iranroman (Contributor Author):
stringfret_tuple (str): is this really a str? Maybe we should cast it to an actual tuple of ints.
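A minimal sketch of what that cast could look like, assuming the value is a dash-separated "string-fret" string such as "2-0" (the helper name and the format are assumptions for illustration, not the PR's actual code):

def _parse_stringfret(stringfret_str):
    # e.g. "2-0" -> (2, 0), assuming a "string-fret" format with a dash separator
    string_num, fret_num = stringfret_str.split("-")
    return int(string_num), int(fret_num)

assert _parse_stringfret("2-0") == (2, 0)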

audio_path (str): path to the track's audio file
track_id (str): track id
stringfret_tuple (str): the tuple of the note recorded
note (str): the notename of the file
@iranroman (Contributor Author):
is this the note name as in A, A#, etc? or is it something else? Please make the description more thorough. Also, how is this different to the track_id?

stringfret_tuple (str): the tuple of the note recorded
note (str): the notename of the file
midinote (int): the midinote value
pickup_configuration: the pickup used in the recording
@iranroman (Contributor Author):
what's the type of this? str?

Comment on lines 313 to 315
for row in csv_reader:
    key = os.path.splitext(os.path.split(row[0])[1])[0]
    for track in tracknames:
@iranroman (Contributor Author):
why do you have to iterate over all tracknames and all rows in the csv_reader? This is redundant and inefficient. Please discuss what you are trying to do here and let's revise together.

Contributor:
Hi, I've done all the changes already except this one, and the CSV in the test file, since we need it for the test. The main reason the dataset iterates through all the tracks is that our CSV does not have a list of track names, and the metadata is written by matching the first letters of the track id's effect against the CSV info. I have been trying to load this metadata outside the dataset definition, like an older version of tinysol did, but I haven't been successful at it. The other option is to change our metadata file to one that has all the track names (like tinysol does) and therefore remove the iteration.

@iranroman (Contributor Author):
Thanks for the explanation. Can we use a dictionary instead?
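A rough sketch of the dictionary idea, assuming each CSV row starts with an effect name that can be matched against the prefix of a track id (the file name, field layout, and prefix matching are assumptions for illustration only, not the PR's code):

import csv

# Build a lookup keyed by effect name once, so each track needs a single
# dictionary lookup instead of a scan over every CSV row.
with open("egfxset_metadata.csv", "r") as fhandle:
    csv_reader = csv.reader(fhandle)
    next(csv_reader)  # skip the header row
    effect_metadata = {row[0].strip(): row[1:] for row in csv_reader}

# For a track id like "TapeEcho_Bridge/2-0", the effect could then be looked up with:
# effect_metadata.get(track_id.split("_")[0])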

Contributor:
In the last push, I resolved the nested looping by building a small list that serves as an index of the rows in the CSV before going through all the tracks to write the metadata. It's way faster than the pandas solution and has passed all the checks. Do you think this is the right solution?

@@ -0,0 +1,13 @@
Effect ,Model,Effect Type,Knob Names,Knob Type,Setting
@iranroman (Contributor Author):
is this the exact same file as our egfxset_metadata in Zenodo? If yes, do we need all the lines for the tests? If no, shouldn't they be?

@@ -0,0 +1 @@
This is a test file. It is validated using checksum. Do not change the contents of this file.
@iranroman (Contributor Author):
please remove this test file. I do not think it should be here at all, right @harshpalan ?

Contributor:
Without it, the download utils tests report 3 failures:

tests/test_download_utils.py ..F..FF.........

with it:

tests/test_download_utils.py ................

@iranroman (Contributor Author):
I think this file is only meant to be used in local tests, but it is not needed for GitHub builds/tests. I'll remove it.

@iranroman (Contributor Author) commented:
Also, I think you need to add EGFxSet to docs/source/table.rst

Comment on lines 78 to 83
BIBTEX = """
@article{pedrozaegfxset,
title={EGFxSet: Electric guitar tones processed through real effects of distortion, modulation, delay and reverb},
author={Pedroza, Hegel and Meza, Gerardo and Roman, Iran}
}
"""
@iranroman (Contributor Author) commented on Nov 18, 2022:
let's change this for:

BIBTEX = """
@article{pedroza2022egfxset,
      title={EGFxSet: Electric guitar tones processed through real effects of distortion, modulation, delay and reverb}, 
      author={Pedroza, Hegel and Meza, Gerardo and Roman, Iran},
      year={2022},
      institution={UNAM},
      booktitle={Extended Abstracts for the Late-Breaking Demo Session of the 23rd Int. Society for Music Information Retrieval Conf., Bengaluru, India, 2022.},
}
"""

@iranroman (Contributor Author) commented:
Thank you @hegelespaul and @mezaga for your work on this PR. I think it's ready.

@harshpalan @magdalenafuentes, I'm removing the WIP tag. Let us know if you would like us to address anything else in order to be able to merge. Cheers!

@iranroman changed the title from "[WIP] Adding loader for EGFxSet" to "Adding loader for EGFxSet" on Nov 19, 2022
@magdalenafuentes (Collaborator) left a review comment:
@hegelespaul @mezaga @iranroman Great job with this PR!! It is in great shape, and I'm very happy to have a Mexican contribution in mirdata ❤️

I left a few comments, once you address them we can merge. Let me know if anything is not clear!

mirdata/datasets/egfxset.py (comment thread resolved)
Comment on lines 51 to 59
Guitar string number
Fret number
Guitar pickup configuration
Effect name
Effect type
Hardware modes
Knob names
Knob types
Knob settings
@magdalenafuentes (Collaborator):
This is not rendering as a list but as a continuous paragraph.

Comment on lines 61 to 63
The dataset website is: https://egfxset.github.io/
The data can be accessed here: https://zenodo.org/record/7044411#.YxKdSWzMKEI
An ISMIR extended abstract was presented in 2022: https://ismir2022.ismir.net/program/lbd/
@magdalenafuentes (Collaborator):
Same here, you should format it as a list

Attributes:
audio_path (str): path to the track's audio file
stringfret_tuple (list): an array with the tuple of the note recorded
note (str): the notename of the file (i.e. D1,Eb4, etc.)
@magdalenafuentes (Collaborator):
mirdata has a NoteData class, you should use that instead and indicate the units as note_name
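A rough sketch of how the note could be exposed as NoteData (the constructor arguments, the 5-second interval, and the example pitch are assumptions about the mirdata annotations API rather than this PR's final code):

import numpy as np
from mirdata import annotations

# Hypothetical: wrap the single recorded note in a NoteData object,
# with (start, end) intervals in seconds and the pitch as a MIDI number.
note_data = annotations.NoteData(
    np.array([[0.0, 5.0]]),  # assumed note duration of 5 seconds
    "s",
    np.array([63.0]),        # MIDI number for Eb4, as an example value
    "midi",
)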

Contributor:
We made all the changes; this was the trickiest one by far, but I think we've got it. Please let us know if any changes are needed.

mirdata/datasets/egfxset.py (comment thread resolved)
    return metadata_index

def load_audio(self, *args, **kwargs):
    return load_audio(*args, **kwargs)
@magdalenafuentes (Collaborator):
Please note that this line is not covered by tests



def test_track():
    data_home = "tests/resources/mir_datasets/egfxset"
@magdalenafuentes (Collaborator):
Please add os.path.normpath to any hardcoded path (i.e. paths with "/") so your tests run on Windows (see PR #567)
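For example, a minimal illustration of the suggestion (not the PR's exact code):

import os

# os.path.normpath converts the separators so the path also resolves on Windows.
data_home = os.path.normpath("tests/resources/mir_datasets/egfxset")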

data_home = "tests/resources/mir_datasets/egfxset"
dataset = egfxset.Dataset(data_home, version="test")

track = dataset.track("TapeEcho_Bridge/2-0")
@magdalenafuentes (Collaborator):
os.path.normpath


expected_attributes = {
"track_id": "TapeEcho_Bridge/2-0",
"audio_path": "tests/resources/mir_datasets/egfxset/TapeEcho/Bridge/2-0.wav",
@magdalenafuentes (Collaborator):
os.path.normpath; there are a few more below.


audio, sr = track.audio
assert sr == 48000
assert audio.shape == (48000 * 5,)
@magdalenafuentes (Collaborator):
We use 1-second audio for the tests; please reduce the size of the test audio files.
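One way to trim a test resource down to one second, as a sketch (assumes the soundfile package is available; the path is taken from the test above, and any other test audio file would be handled the same way):

import soundfile as sf

# Keep only the first second of the test audio file.
path = "tests/resources/mir_datasets/egfxset/TapeEcho/Bridge/2-0.wav"
audio, sr = sf.read(path)
sf.write(path, audio[:sr], sr)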

@magdalenafuentes (Collaborator) left a review comment:
@hegelespaul great job, almost done! Just a few nit comments and we're good to go. Thanks for your contribution!


expected_property_types = {
    "audio": tuple,
    "note": annotations.NoteData,
@magdalenafuentes (Collaborator):
Small thing: now that note is a NoteData object, it would be good to check that its intervals and pitches values return what you expect by simply adding an extra assert for each.
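For instance, with hypothetical expected values chosen only to illustrate the suggested asserts:

import numpy as np

# Hypothetical expected values for a single 5-second note.
assert np.allclose(track.note.intervals, [[0.0, 5.0]])
assert np.allclose(track.note.pitches, [63.0])  # e.g. the MIDI number for Eb4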

Contributor:
The asserts were added

- Knob names
- Knob types
- Knob settings

@magdalenafuentes (Collaborator):
Nit: you might want to add extra spaces for formatting here; see the rendering of the docs here.

@hegelespaul (Contributor) commented on Nov 25, 2022:
I think the Dataset info is more legible now; I've made the changes. In the end, it has two cached properties, note_name and midinote: the first uses a function from annotations, and the second uses the class you told us about before. Both properties have asserts to check that the expected results are returned. Please let me know if more changes are needed. Thanks for all!

@magdalenafuentes (Collaborator) left a review comment:
@hegelespaul @mezaga @iranroman Great job! I'm merging this and including it in the release for ISMIR. Felicitaciones!

@magdalenafuentes merged commit 204750b into mir-dataset-loaders:master on Nov 25, 2022