[RFC] Dataset class. Cross-module download, duration, from_jams, load, to_jams, and validate #219
Closed
Codecov Report
@@           Coverage Diff           @@
##           master     #219   +/-   ##
=======================================
  Coverage   79.19%   79.19%
=======================================
  Files          21       21
  Lines        2442     2442
=======================================
  Hits         1934     1934
  Misses        508      508
This was referenced Apr 8, 2020
Closed
lostanlen changed the title from "[WIP] Dataset class. Cross-module download, duration, from_jams, load, to_jams, and validate" to "[RFC] Dataset class. Cross-module download, duration, from_jams, load, to_jams, and validate" on Apr 9, 2020
rabitt pushed a commit that referenced this pull request on Oct 20, 2020
rabitt added a commit that referenced this pull request on Nov 3, 2020:
* Dataset object, heavily inspired by the RFC in #219
* update top-level docs, adapt two loaders
* update dataset api
* update all loaders to fit new API
* remove outdated test
* update tests, inherit dataset-specific load functions, docstring hack, better error handling
* remove data_home from Track docstrings
* normalize dataset_dir to match module name, removes need for DATASET_DIR
* update test_full dataset; fix introduced bug in orchset
* fix bug in orchset download method #309
* consolodate track.py and dataset.py into core.py
* create datasets submodule
* fix import bug in tests
* hack around git case sensitiveness
* hack back around git case sensitiveness
* hack around git ignore case changes
* hack back around git ignoring case changes
* fix capitalization in tests paths
* port beatport key to 0.3

Co-authored-by: Rachel Bittner <rachelbittner@spotify.com>
nkundiushuti pushed a commit that referenced this pull request on Nov 4, 2020:

Dataset object (#296)
* Dataset object, heavily inspired by the RFC in #219
* update top-level docs, adapt two loaders
* update dataset api
* update all loaders to fit new API
* remove outdated test
* update tests, inherit dataset-specific load functions, docstring hack, better error handling
* remove data_home from Track docstrings
* normalize dataset_dir to match module name, removes need for DATASET_DIR
* update test_full dataset; fix introduced bug in orchset
* fix bug in orchset download method #309
* consolodate track.py and dataset.py into core.py
* create datasets submodule
* fix import bug in tests
* hack around git case sensitiveness
* hack back around git case sensitiveness
* hack around git ignore case changes
* hack back around git ignoring case changes
* fix capitalization in tests paths
* port beatport key to 0.3

Co-authored-by: Rachel Bittner <rachelbittner@spotify.com>
Reduce module-level code to the bare essentials:
Closes #196, closes #197, closes #210, closes #217
Offers a development path towards closing #81, #153, #176, #184, and #196
New features:
* `to_jams` implementation. The `to_jams` method is shared across modules, and is implemented in the parent class.
* `Dataset`. I re-used the implementation of `LargeData` so that there's no loss in performance. Dataset construction is still very fast.
* `cite()`, but also accessible as a string via `dataset.bibtex`.
* `Dataset` class overloads `__getitem__`, so it's still possible to load tracks one by one. I figured that this would be easier for newcomers than learning about the `Track` constructor.
* `load_index()` now turns the index paths into machine-specific absolute paths. As a result, there is no need to store `data_home` in the Track object anymore.
* `dataset.choice()` picks a Track in the Dataset uniformly at random and loads it.
* `remotes`. I called the kwarg `download_items`, in accordance with "Download() refactor" (#216), but the kwarg seems a bit long in my opinion. Perhaps we can find a shorter name? I'll defer to @magdalenafuentes for this decision.
* `from_jams`, which converts a JAMS Annotation into mirdata namedtuples: BeatData, ChordData, KeyData, SectionData.
* `parse_track_id`. I am using this in Guitarset. This ties in with "mirdata doesn't work cleanly for datasets not on disk" (#128) and resolves "Throw warning when annotation path does not exist" (#196) quite nicely (we can error / warn / pass if metadata is not found).

Demo:
It is worth noting that these new features come alongside a ~4x reduction in line count (although I haven't removed anything from the previous implementation yet, because I want the unit tests to keep working).
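To make the feature list above more concrete, here is a minimal sketch of how a shared parent class could combine `load_index()`, `__getitem__`, `choice()`, and `cite()`. It is not the code in this PR: the constructor signature, the `paths` attribute on the track, the index layout (track ID mapped to a dict of relative paths), and the placeholder BibTeX string are all assumptions made for illustration.

```python
import os
import random


class Track:
    """Hypothetical stand-in for a loader-specific Track class."""

    def __init__(self, track_id, paths):
        self.track_id = track_id
        # Paths arrive already resolved, so no data_home is stored here.
        self.paths = paths


class Dataset:
    """Illustrative parent class; names and layout are assumptions."""

    # Placeholder citation, not a real reference.
    bibtex = "@misc{placeholder, title={Hypothetical dataset citation}}"

    def __init__(self, index, data_home):
        self.data_home = data_home
        # Only the index is processed here; Track objects are built on demand.
        self.index = self.load_index(index)

    def load_index(self, index):
        """Turn index-relative paths into machine-specific absolute paths."""
        return {
            track_id: {
                key: os.path.abspath(os.path.join(self.data_home, rel_path))
                for key, rel_path in paths.items()
            }
            for track_id, paths in index.items()
        }

    def __getitem__(self, track_id):
        """Load a single track by its ID."""
        return Track(track_id, self.index[track_id])

    def __len__(self):
        return len(self.index)

    def choice(self, rng=random):
        """Pick a track ID uniformly at random and load that track."""
        return self[rng.choice(sorted(self.index))]

    def cite(self):
        """Print the citation that is also exposed as the `bibtex` string."""
        print(self.bibtex)
```

With this sketch, `dataset["some_id"]` and `dataset.choice()` both return a `Track`, and no `Track` is constructed until one is requested, which is one way construction could stay fast.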
The new class is called `track2.Track2` for the time being.

Covered loaders:
* beatles
* gtzan_genre
* guitarset
* ikala
* medley_solos_db
* medleydb_melody
* medleydb_pitch
* orchset
* rwc_classical
* rwc_jazz
* rwc_popular
* salami
* tinysol

(I'm not counting DALI because DALI is broken at the moment.)
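For the `from_jams` feature mentioned in the feature list, the conversion could look roughly like the sketch below. It is an assumption-laden illustration, not the PR's implementation: the `Observation` tuple is a stand-in for a JAMS observation (time, duration, value), the namedtuple field names are guesses, and `from_observations` is a hypothetical helper name. Only two of the four annotation types are shown.

```python
from collections import namedtuple

# Field names are assumptions for illustration, not confirmed by this PR.
BeatData = namedtuple("BeatData", ["beat_times", "beat_positions"])
ChordData = namedtuple("ChordData", ["intervals", "labels"])

# Stand-in for a single JAMS observation; the real JAMS objects are richer.
Observation = namedtuple("Observation", ["time", "duration", "value"])


def from_observations(namespace, observations):
    """Convert a list of observations into a namedtuple (hypothetical helper)."""
    if namespace == "beat":
        # Beats are instantaneous events: keep onset time and beat position.
        return BeatData(
            beat_times=[obs.time for obs in observations],
            beat_positions=[obs.value for obs in observations],
        )
    if namespace == "chord":
        # Chords span an interval: (start, end) plus the chord label.
        return ChordData(
            intervals=[(obs.time, obs.time + obs.duration) for obs in observations],
            labels=[obs.value for obs in observations],
        )
    raise ValueError(f"unsupported namespace: {namespace}")
```

Dispatching on the annotation namespace keeps the conversion in one place, which matches the goal of sharing code across loaders.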
`flag196` temporarily

Let me know your thoughts!