Dataset object #296

rabitt · 2020-10-20T08:25:51Z

Implements #293. This is a breaking, backwards-incompatible change, so we're jumping from 0.2.0 to 0.3.0.

Abstracts validate, load (renamed to load_tracks), track_ids, cite and download into a common Dataset class^
Implements dataset.choice (#228)
Would close #256 (move ".cite" string to docstring & return docstring), since it becomes irrelevant
Enables "Search by citation metadata" (#227) in the future
Moves the Track, Dataset and MultiTrack base objects into a new module, core.py
Creates a mirdata.datasets submodule for all loaders
Removes dataset.DATASET_DIR and sets the default folder name to the module name

^This is heavily based on the RFC from #219 and is related to #225!
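
To make the idea concrete, here is a minimal sketch of how one class can expose the per-loader functions listed above. It is not the code in this PR: the toy loader module, its `TRACK_IDS` and `load_track` names, and the `choice_track` method are stand-ins invented for illustration.

```python
import random
import types

# Hypothetical stand-in for a loader module (not a real mirdata loader).
toy_loader = types.ModuleType("toy_loader")
toy_loader.BIBTEX = "@article{toy2020, title={Toy}}"
toy_loader.TRACK_IDS = ["a", "b", "c"]
toy_loader.load_track = lambda track_id: {"track_id": track_id}


class Dataset:
    """Illustrative wrapper that gives every loader the same interface."""

    def __init__(self, module):
        self._module = module
        self.name = module.__name__
        self.bibtex = module.BIBTEX

    def track_ids(self):
        # Return a copy so callers cannot mutate the loader's list.
        return list(self._module.TRACK_IDS)

    def load_tracks(self):
        # Map every track id to its loaded track object.
        return {tid: self._module.load_track(tid) for tid in self.track_ids()}

    def choice_track(self):
        # Load one randomly chosen track.
        return self._module.load_track(random.choice(self.track_ids()))

    def cite(self):
        print("========== BibTeX ==========")
        print(self.bibtex)


dataset = Dataset(toy_loader)
tracks = dataset.load_tracks()
dataset.cite()
```

The point of the design is that the shared logic lives in one place, so each loader module only has to supply its own data (index, BibTeX, track loading).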

rabitt · 2020-10-20T08:27:34Z

@magdalenafuentes - can you take a quick look and let me know if you have any concerns? Note that the downloader has a default, but is configurable if needed, as in maestro. It will reduce the lines of code by a LOT but is a breaking change to the top-level API.

Once it looks good to you, I'll start porting the rest over & write some tests!

magdalenafuentes

Exciting PR! <3

Left minor comments, it's looking great!

CONTRIBUTING.md

README.md

mirdata/dataset.py

mirdata/utils.py

codecov · 2020-10-21T01:46:46Z

Codecov Report

Merging #296 into master will increase coverage by 0.07%.
The diff coverage is 98.34%.

@@            Coverage Diff             @@
##           master     #296      +/-   ##
==========================================
+ Coverage   98.97%   99.05%   +0.07%     
==========================================
  Files          25       25              
  Lines        2638     2223     -415     
==========================================
- Hits         2611     2202     -409     
+ Misses         27       21       -6

rabitt · 2020-10-26T20:13:32Z

mirdata/dataset.py

+        module = importlib.import_module("mirdata.{}".format(dataset))
+        self.dataset = dataset
+        self.bibtex = getattr(module, "BIBTEX", "")
+        self._remotes = getattr(module, "REMOTES", {})


Comment from @drubinstein: make all missing variables default to None.
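
A sketch of what that suggestion could look like, using a throwaway module object in place of a real loader. The `read_module_attributes` helper and the attribute list are assumptions for illustration; only the `getattr(module, NAME, default)` pattern comes from the diff above.

```python
import types


def read_module_attributes(module, names=("BIBTEX", "REMOTES")):
    """Return each requested module attribute, or None when it is missing."""
    return {name: getattr(module, name, None) for name in names}


# Throwaway module that defines BIBTEX but not REMOTES.
loader = types.ModuleType("toy_loader")
loader.BIBTEX = "@misc{toy}"

attrs = read_module_attributes(loader)
if attrs["REMOTES"] is None:
    print("this loader declares no remotes")
```

Using None as the single default lets callers tell "not defined by the loader" apart from "defined but empty", which mixed defaults such as "" and {} cannot do.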

docs/source/example.rst

nkundiushuti · 2020-10-28T18:31:24Z

mirdata/core.py

+        print("========== BibTeX ==========")
+        print(self.bibtex)
+
+    def download(self, partial_download=None, force_overwrite=False, cleanup=True):


Does this class overwrite the download inside the loader?
In Saraga we need to adapt the download to query the multi-track audio files individually.

rabitt · 2020-10-29

For loaders that need a non-standard download method, you can set a _download function in the module and the Dataset class uses that instead. In this PR, see maestro.py for an example.
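
Roughly, the dispatch could work as in the sketch below. This is an illustration under assumptions, not the PR's code: `default_download` and the two toy modules are made up here; only the idea of a module-level `_download` taking precedence comes from the comment above.

```python
import types


def default_download(data_home):
    """Stand-in for the shared download logic."""
    return "default download into {}".format(data_home)


class Dataset:
    def __init__(self, module, data_home):
        self.data_home = data_home
        # Prefer the loader's own _download when it defines one.
        self._download_fn = getattr(module, "_download", None) or default_download

    def download(self):
        return self._download_fn(self.data_home)


# Two hypothetical loaders: one plain, one with a custom download.
plain = types.ModuleType("plain_loader")
custom = types.ModuleType("custom_loader")
custom._download = lambda data_home: "custom download into {}".format(data_home)

print(Dataset(plain, "/tmp/plain_loader").download())
print(Dataset(custom, "/tmp/custom_loader").download())
```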

magdalenafuentes · 2020-11-02T17:19:24Z

Hey @rabitt just FYI we will merge things in this order: dataset object, index version, lfs. I'm working on the index one locally and will create PR once you've merged this one.

rabitt · 2020-11-03T22:03:22Z

Merging this! FYI @genisplaja @PRamoneda and @nkundiushuti - when it comes time to merge the open data loaders I can help port them to the new API, just ask!

One major change to be aware of - the "DATASET_DIR" where the data actually lives on your computer is now always the same as the name of the module.
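
In other words, the data folder can be derived from the loader name alone. A small sketch, assuming a `mir_datasets` parent folder under the user's home directory; that parent path and the `default_data_home` helper are assumptions for illustration and are not confirmed by this thread.

```python
import os


def default_data_home(module_name, root=None):
    """Build the default data folder from the loader's module name."""
    if root is None:
        # Assumed parent folder; adjust to wherever datasets are stored.
        root = os.path.join(os.path.expanduser("~"), "mir_datasets")
    return os.path.join(root, module_name)


print(default_data_home("orchset", root="/data"))
```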

Dataset object (#296)

* Dataset object, heavily inspired by the RFC in #219
* update top-level docs, adapt two loaders
* update dataset api
* update all loaders to fit new API
* remove outdated test
* update tests, inherit dataset-specific load functions, docstring hack, better error handling
* remove data_home from Track docstrings
* normalize dataset_dir to match module name, removes need for DATASET_DIR
* update test_full dataset; fix introduced bug in orchset
* fix bug in orchset download method #309
* consolodate track.py and dataset.py into core.py
* create datasets submodule
* fix import bug in tests
* hack around git case sensitiveness
* hack back around git case sensitiveness
* hack around git ignore case changes
* hack back around git ignoring case changes
* fix capitalization in tests paths
* port beatport key to 0.3

Co-authored-by: Rachel Bittner <rachelbittner@spotify.com>

lostanlen · 2020-11-07T22:02:28Z

Congratulations @rabitt and all! 🎉

Rachel Bittner added 3 commits October 20, 2020 00:48

Dataset object, heavily inspired by the RFC in #219

a7c3d03

update top-level docs, adapt two loaders

a710b0f

major version change

8a71bee

rabitt requested a review from magdalenafuentes October 20, 2020 08:25

magdalenafuentes reviewed Oct 20, 2020

integrate review comments

c53fa77

rabitt mentioned this pull request Oct 20, 2020

Generalize copy-paste dataset functions to utils? #293

Closed

Rachel Bittner added 3 commits October 20, 2020 18:39

update dataset api

c9f0e5d

update all loaders to fit new API

7b95a83

remove outdated test

4fe7371

update tests, inherit dataset-specific load functions, docstring hack, better error handling

b5e8d3d

rabitt commented Oct 22, 2020

mirdata/dataset.py (outdated, resolved)

magdalenafuentes added this to the 0.3 milestone Oct 23, 2020

Rachel Bittner added 2 commits October 23, 2020 16:23

remove data_home from Track docstrings

552e88a

beta v0.3

e4a3976

rabitt mentioned this pull request Oct 26, 2020

Dataset.info() #300

Closed

rabitt commented Oct 26, 2020

Rachel Bittner added 11 commits October 26, 2020 16:22

normalize dataset_dir to match module name, removes need for DATASET_DIR

0c27892

update test_full dataset; fix introduced bug in orchset

19f2131

fix bug in orchset download method

23560fe

consolodate track.py and dataset.py into core.py

f150ecd

create datasets submodule

9e630a1

fix import bug in tests

250e2d7

hack around git case sensitiveness

35dc732

hack back around git case sensitiveness

0c7c01a

hack around git ignore case changes

bf195e3

hack back around git ignoring case changes

ba6dbce

fix capitalization in tests paths

a47085e

Rachel Bittner added 7 commits October 26, 2020 19:50

fixing tests

ed9e9ea

last test maybe

ad15b56

initial merge with master

5401313

port beatport key to 0.3

e551691

test automodule for datasets

639c6b5

update datasets

a5acb26

format docstring

90c76ee

rabitt changed the title ~~[WIP] Dataset object~~ Dataset object Oct 27, 2020

rabitt requested a review from magdalenafuentes October 27, 2020 05:10

magdalenafuentes approved these changes Oct 27, 2020

docs/source/example.rst (resolved)

magdalenafuentes approved these changes Oct 27, 2020

rabitt mentioned this pull request Oct 27, 2020

Revert "MultiTrack class" #303

Merged

nkundiushuti reviewed Oct 28, 2020

nkundiushuti approved these changes Oct 28, 2020

Rachel Bittner added 4 commits October 29, 2020 18:49

update contributing

1c8d919

fix merge conflicts

a1236ce

update dataset to new api

a559bf8

update test location

20cb153

rabitt mentioned this pull request Oct 30, 2020

dataset download bug #309

Closed

merge with master

b4e496c

rabitt merged commit 753ef90 into master Nov 3, 2020

This was referenced Nov 3, 2020

dataset.choice #228

Closed

Dataset hyphenation #210

Closed

Should dataset modules have a common Dataset class? #225

Closed

rabitt mentioned this pull request Dec 17, 2020

Dataset object per loader #366

Closed

rabitt deleted the dataset-object branch February 11, 2021 18:03
