-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding loader for EGFxSet #556
Adding loader for EGFxSet #556
Conversation
Codecov Report
@@ Coverage Diff @@
## master #556 +/- ##
==========================================
+ Coverage 96.81% 96.85% +0.04%
==========================================
Files 54 55 +1
Lines 6620 6705 +85
==========================================
+ Hits 6409 6494 +85
Misses 211 211 |
Hello @mezaga and @hegelespaul. I have finished creating the dataset index and initializing the dataset module. You can go ahead and work on your assigned tasks. |
Thank you @hegelespaul and @mezaga for your work with this PR. I will review and make comments/suggestions/changes before we remove the WIP tag and send to the mirdata admins. Cheers! |
quick update. I ran |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @hegelespaul and @mezaga. Please see my comments and let's discuss changes that may be needed. I think it would be good to have this ready today.
mirdata/datasets/egfxset.py
Outdated
This dataset was conceived during Iran Roman's "Deep Learning for Music Information Retrieval" course | ||
imparted in the postgraduate studies in music technology at the UNAM (Universidad Nacional Autónoma de México). | ||
The result is a combined effort between two UNAM postgraduate students (Hegel Pedroza and Gerardo Meza) and Iran Roman(NYU). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to the end of this section (i.e. after the paragraph saying "The effects employed were ..."). Or remove
mirdata/datasets/egfxset.py
Outdated
All possible 138 notes of a standard tuning 22 frets guitar were recorded in each one of the 5 pickup configurations, | ||
giving a total of 690 clean tone audio files ( 58 min ). | ||
|
||
The 690 audio files were processed through 12 different audio effects employing actual guitar gear (no emulations), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change to: "The 690 'clean' audio files ..."
change to: "(no VST emulations were used)"
mirdata/datasets/egfxset.py
Outdated
Modulation: | ||
Boss CE-3: Chorus | ||
MXR Phase 45: Phaser | ||
Mooer E-Lady: Flangergh pr checkout 556 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Flangergh pr checkout 556" ?
mirdata/datasets/egfxset.py
Outdated
Knob types | ||
Knob settings | ||
|
||
For more details, please visit https://zenodo.org/record/7044411 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change for:
the dataset website is: (link to website)
the data can be accessed here: (link to zenodo)
an ISMIR extended abstract was presented in 2022: (link to the ISMIR proceedings; maybe this link for now https://ismir2022.ismir.net/program/lbd/)
mirdata/datasets/egfxset.py
Outdated
Attributes: | ||
audio_path (str): path to the track's audio file | ||
track_id (str): track id | ||
stringfret_tuple (str): the tuple of the note recorded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stringfret_tuple (str):
is this really an str
? maybe we should cast it to be an actual tuple of int
s
mirdata/datasets/egfxset.py
Outdated
audio_path (str): path to the track's audio file | ||
track_id (str): track id | ||
stringfret_tuple (str): the tuple of the note recorded | ||
note (str): the notename of the file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this the note name as in A, A#, etc? or is it something else? Please make the description more thorough. Also, how is this different to the track_id
?
mirdata/datasets/egfxset.py
Outdated
stringfret_tuple (str): the tuple of the note recorded | ||
note (str): the notename of the file | ||
midinote (int): the midinote value | ||
pickup_configuration: the pickup used in the recording |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the type of this? str
?
mirdata/datasets/egfxset.py
Outdated
for row in csv_reader: | ||
key = os.path.splitext(os.path.split(row[0])[1])[0] | ||
for track in tracknames: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you have to iterate over all tracknames
and all rows in the csv_reader
? This is redundant and inefficient. Please discuss what you are trying to do here and let's revise together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, I've done all the changes already except this one, and the CSV in the test file, since we need it for the test. The main reason the dataset iterate through all the tracks is that our CSV does not have a list with track names, and the metadata is written by matching the track id's first letters of the effect to the CSV info. I have been trying to call this metadata outside the dataset definition like an older version of tinysol did but I haven't been successful at it. The other option is to change our metadata file to one that has all the track names (like tinysol does) and therefore remove the iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation. Can we use a dictionary instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the last push, I resolved the nested looping by having a small list that serves as an index of the rows in the CSV before going to the entire tracks to write the metadata. Is way faster than the panda solution and has passed all the checks. Do you think this is the right solution?
@@ -0,0 +1,13 @@ | |||
Effect ,Model,Effect Type,Knob Names,Knob Type,Setting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this the exact same file as our egfxset_metadata
in Zenodo? If yes, do we need all the lines for the tests? If no, shouldn't they be?
@@ -0,0 +1 @@ | |||
This is a test file. It is validated using checksum. Do not change the contents of this file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove this test file. I do not think it should be here at all, right @harshpalan ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
without it, the download utils test marks 3 errors:
tests/test_download_utils.py ..F..FF.........
with it:
tests/test_download_utils.py ................
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this file is only meant to be used in local tests, but it is not needed for github builds/tests. I'll remove it.
Also, I think you need to add |
mirdata/datasets/egfxset.py
Outdated
BIBTEX = """ | ||
@article{pedrozaegfxset, | ||
title={EGFxSet: Electric guitar tones processed through real effects of distortion, modulation, delay and reverb}, | ||
author={Pedroza, Hegel and Meza, Gerardo and Roman, Iran} | ||
} | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's change this for:
BIBTEX = """
@article{pedroza2022egfxset,
title={EGFxSet: Electric guitar tones processed through real effects of distortion, modulation, delay and reverb},
author={Pedroza, Hegel and Meza, Gerardo and Roman, Iran},
year={2022},
institution={UNAM},
booktitle={Extended Abstracts for the Late-Breaking Demo Session of the 23rd Int. Society for Music Information Retrieval Conf., Bengaluru, India, 2022.},
}
"""
Thank you @hegelespaul and @mezaga for your work on this PR. I think it's ready. @harshpalan @magdalenafuentes, I'm removing the WIP tag. Let us know if you would like us to address anything else in order to be able to merge. Cheers! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hegelespaul @mezaga @iranroman Great job with this PR!! It is in great shape, and I'm very happy to have a Mexican contribution in mirdata
❤️
I left a few comments, once you address them we can merge. Let me know if anything is not clear!
mirdata/datasets/egfxset.py
Outdated
Guitar string number | ||
Fret number | ||
Guitar pickup configuration | ||
Effect name | ||
Effect type | ||
Hardware modes | ||
Knob names | ||
Knob types | ||
Knob settings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not rendering as a list but as a continues paragraph
mirdata/datasets/egfxset.py
Outdated
The dataset website is: https://egfxset.github.io/ | ||
The data can be accessed here: https://zenodo.org/record/7044411#.YxKdSWzMKEI | ||
An ISMIR extended abstract was presented in 2022: https://ismir2022.ismir.net/program/lbd/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, you should format it as a list
mirdata/datasets/egfxset.py
Outdated
Attributes: | ||
audio_path (str): path to the track's audio file | ||
stringfret_tuple (list): an array with the tuple of the note recorded | ||
note (str): the notename of the file (i.e. D1,Eb4, etc.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mirdata
has a NoteData
class, you should use that instead and indicate the units as note_name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We did all changes, this was the trickiest one by far but I think we've got it, please let us know if any changes are needed.
mirdata/datasets/egfxset.py
Outdated
return metadata_index | ||
|
||
def load_audio(self, *args, **kwargs): | ||
return load_audio(*args, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please note that this line is not covered by tests
tests/datasets/test_egfxset.py
Outdated
|
||
|
||
def test_track(): | ||
data_home = "tests/resources/mir_datasets/egfxset" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add os.path.normpath
to any hardcoded path (i.e. paths with "/") so your tests run on Windows (see PR #567)
tests/datasets/test_egfxset.py
Outdated
data_home = "tests/resources/mir_datasets/egfxset" | ||
dataset = egfxset.Dataset(data_home, version="test") | ||
|
||
track = dataset.track("TapeEcho_Bridge/2-0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
os.path.normpath
tests/datasets/test_egfxset.py
Outdated
|
||
expected_attributes = { | ||
"track_id": "TapeEcho_Bridge/2-0", | ||
"audio_path": "tests/resources/mir_datasets/egfxset/TapeEcho/Bridge/2-0.wav", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
os.path.normpath
there are a few more below
tests/datasets/test_egfxset.py
Outdated
|
||
audio, sr = track.audio | ||
assert sr == 48000 | ||
assert audio.shape == (48000 * 5,) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use 1 second audio for the tests, please reduce the size of the test audio files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hegelespaul great job, almost done! Just a few nit comments and we're good to go. Thanks for your contribution!
tests/datasets/test_egfxset.py
Outdated
|
||
expected_property_types = { | ||
"audio": tuple, | ||
"note": annotations.NoteData, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small thing: now that note is a NoteData
object, it would be good to check that its intervals
and pitches
values are returning what you expect by simply adding an extra assert
check for each
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The asserts were added
- Knob names | ||
- Knob types | ||
- Knob settings | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: you might want to add extra spaces for formating here, see the rendering of the docs here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the Dataset info is more legible now, I've done the changes. In the end, it has two cached properties: note_name and midinote, the first one using a function of annotations and the other one, the class you told us about before. Both properties have asserts to see if expected results get accomplished. Please, let me know if more changes are needed. Thanks for all!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hegelespaul @mezaga @iranroman Great job! I'm merging this and including it in the release for ISMIR. Felicitaciones!
EGFxSet: Electric guitar tones processed through real effects of gain, modulation, delay and reverb
Description
Please include the following information at the top level docstring for the dataset's module mydataset.py:
Dataset loaders checklist:
Create a script in
scripts/
, e.g.make_my_dataset_index.py
, which generates an index file.Run the script on the canonical version of the dataset and save the index in
mirdata/indexes/
e.g.my_dataset_index.json
.Create a module in mirdata, e.g.
mirdata/my_dataset.py
Create tests for your loader in
tests/datasets/
, e.g.test_my_dataset.py
Add your module to
docs/source/mirdata.rst
anddocs/source/quick_reference.rst
Run
tests/test_full_dataset.py
on your dataset.Make sure someone has run
pytest -s tests/test_full_dataset.py --local --dataset my_dataset
once on your dataset locally and confirmed it passes