Dataset object #296
@magdalenafuentes - can you take a quick look and let me know if you have any concerns? Note that the downloader has a default, but is configurable if needed, as in Once it looks good to you, I'll start porting the rest over & write some tests!
Exciting PR! <3
Left minor comments, it's looking great!
Codecov Report
@@            Coverage Diff             @@
##           master     #296      +/-   ##
==========================================
+ Coverage   98.97%   99.05%   +0.07%
==========================================
  Files          25       25
  Lines        2638     2223     -415
==========================================
- Hits         2611     2202     -409
+ Misses         27       21       -6
…, better error handling
mirdata/dataset.py
        module = importlib.import_module("mirdata.{}".format(dataset))
        self.dataset = dataset
        self.bibtex = getattr(module, "BIBTEX", "")
        self._remotes = getattr(module, "REMOTES", {})
Comment from @drubinstein: make all missing variables None.
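The suggestion above could look roughly like the following. This is a minimal sketch, not the code merged in this PR: the helper `load_metadata` and the fake `demo_loader` module are hypothetical, and only the attribute names `BIBTEX` and `REMOTES` come from the diff.

```python
import types


def load_metadata(module, names=("BIBTEX", "REMOTES")):
    """Read optional module-level attributes, using None for any that are missing."""
    return {name: getattr(module, name, None) for name in names}


# Hypothetical loader module that defines BIBTEX but not REMOTES.
demo_loader = types.ModuleType("demo_loader")
demo_loader.BIBTEX = "@article{demo}"

metadata = load_metadata(demo_loader)
```

With a uniform None default, callers can test `is None` instead of having to know whether a missing value is an empty string or an empty dict.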
        print("========== BibTeX ==========")
        print(self.bibtex)

    def download(self, partial_download=None, force_overwrite=False, cleanup=True):
Does this class override the download inside the loader?
In Saraga we need to adapt the download to query the multi-track audio files individually.
For loaders that need a non-standard download method, you can define a `_download`
function in the module, and the Dataset class uses that instead. In this PR, see for example `maestro.py`.
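The dispatch described above can be sketched as follows. This is an illustration under stated assumptions rather than the implementation in this PR: the toy `Dataset` class, `default_download`, and the two fake modules are hypothetical, and only the `_download` hook name comes from the comment.

```python
import types


def default_download(save_dir, **kwargs):
    """Stand-in for a generic downloader (hypothetical)."""
    return "default download into {}".format(save_dir)


class Dataset:
    """Toy wrapper that prefers a module-level _download when one exists."""

    def __init__(self, module):
        # Fall back to the generic downloader if the module has no _download.
        self._download_fn = getattr(module, "_download", default_download)

    def download(self, save_dir, **kwargs):
        return self._download_fn(save_dir, **kwargs)


# Hypothetical loader modules: one standard, one with a custom download.
standard = types.ModuleType("standard")
custom = types.ModuleType("custom")
custom._download = lambda save_dir, **kwargs: "custom download into {}".format(save_dir)

standard_result = Dataset(standard).download("/tmp/data")
custom_result = Dataset(custom).download("/tmp/data")
```

Looking the hook up once with `getattr` keeps the common case free of per-loader boilerplate, while a loader with unusual needs only has to define one function.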
Hey @rabitt just FYI we will merge things in this order: dataset object, index version, lfs. I'm working on the index one locally and will create a PR once you've merged this one.
Merging this! FYI @genisplaja @PRamoneda and @nkundiushuti - when it comes time to merge the open data loaders I can help port it to the new API, just ask! One major change to be aware of - the "DATASET_DIR" where the data actually lives on your computer is now always the same as the name of the module.
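The folder-naming rule mentioned above could be sketched like this. The helper `default_data_home` is hypothetical, and the `mir_datasets` root folder is an assumption made for illustration; only the rule that the dataset folder matches the module name comes from the comment.

```python
import os


def default_data_home(dataset_name, root=None):
    """Build the default data folder, named after the loader module."""
    if root is None:
        # Assumed root folder; the real default may differ.
        root = os.path.join(os.path.expanduser("~"), "mir_datasets")
    return os.path.join(root, dataset_name)


path = default_data_home("orchset", root="/data")
```

Because the folder name is derived from the module name, a loader no longer needs its own `DATASET_DIR` constant.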
Dataset object (#296)
* Dataset object, heavily inspired by the RFC in #219
* update top-level docs, adapt two loaders
* update dataset api
* update all loaders to fit new API
* remove outdated test
* update tests, inherit dataset-specific load functions, docstring hack, better error handling
* remove data_home from Track docstrings
* normalize dataset_dir to match module name, removes need for DATASET_DIR
* update test_full dataset; fix introduced bug in orchset
* fix bug in orchset download method #309
* consolidate track.py and dataset.py into core.py
* create datasets submodule
* fix import bug in tests
* hack around git case sensitiveness
* hack back around git case sensitiveness
* hack around git ignore case changes
* hack back around git ignoring case changes
* fix capitalization in tests paths
* port beatport key to 0.3
Co-authored-by: Rachel Bittner <rachelbittner@spotify.com>
Congratulations @rabitt and all! 🎉
Implements #293. This is a breaking, non-backwards-compatible change, so we're leaping from 0.2.0 to 0.3.0.
* Moves `validate`, `load` (--> `load_tracks`), `track_ids`, `cite` and `download` into a common `Dataset` class
* Moves the `Track`, `Dataset` and `MultiTrack` base objects into a new module, `core.py`
* Creates a `mirdata.datasets` submodule for all loaders
* Removes `dataset.DATASET_DIR`, sets default folder name to the module name

This is heavily based on the RFC from #219, and related to #225!
* `Dataset` class
* `README`
* `CONTRIBUTING`