Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dataset object #296

Merged
merged 34 commits into from
Nov 3, 2020
Merged

Dataset object #296

merged 34 commits into from
Nov 3, 2020

Conversation

rabitt
Copy link
Collaborator

@rabitt rabitt commented Oct 20, 2020

Implements #293 - this is a breaking, non-backwards compatible change, so we're leaping from 0.2.0 to 0.3.0.

^This is heavily based on the RFC from #219, and related to #225 !

  • Add Dataset class
  • update usage examples in README
  • update CONTRIBUTING
  • update examples in docs
  • update all loaders to conform to new API
  • update tests
  • update test_full_dataset.py
  • add tests

@rabitt
Copy link
Collaborator Author

rabitt commented Oct 20, 2020

@magdalenafuentes - can you take a quick look and let me know if you have any concerns? Note that the downloader has a default, but is configurable if needed, as in maestro. It will reduce the lines of code by a LOT but is a breaking change to the top-level API.

Once it looks good to you, i'll start porting the rest over & write some tests!

Copy link
Collaborator

@magdalenafuentes magdalenafuentes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exciting PR! <3

Left minor comments, it's looking great!

CONTRIBUTING.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
mirdata/dataset.py Outdated Show resolved Hide resolved
mirdata/dataset.py Outdated Show resolved Hide resolved
mirdata/dataset.py Outdated Show resolved Hide resolved
mirdata/dataset.py Outdated Show resolved Hide resolved
mirdata/utils.py Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Oct 21, 2020

Codecov Report

Merging #296 into master will increase coverage by 0.07%.
The diff coverage is 98.34%.

@@            Coverage Diff             @@
##           master     #296      +/-   ##
==========================================
+ Coverage   98.97%   99.05%   +0.07%     
==========================================
  Files          25       25              
  Lines        2638     2223     -415     
==========================================
- Hits         2611     2202     -409     
+ Misses         27       21       -6     

mirdata/dataset.py Outdated Show resolved Hide resolved
@magdalenafuentes magdalenafuentes added this to the 0.3 milestone Oct 23, 2020
@rabitt rabitt mentioned this pull request Oct 26, 2020
module = importlib.import_module("mirdata.{}".format(dataset))
self.dataset = dataset
self.bibtex = getattr(module, "BIBTEX", "")
self._remotes = getattr(module, "REMOTES", {})
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment from @drubinstein : make all missing variables None

@rabitt rabitt changed the title [WIP] Dataset object Dataset object Oct 27, 2020
docs/source/example.rst Show resolved Hide resolved
@rabitt rabitt mentioned this pull request Oct 27, 2020
print("========== BibTeX ==========")
print(self.bibtex)

def download(self, partial_download=None, force_overwrite=False, cleanup=True):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this class overwrite the download inside the loader?
in Saraga we need to adapt the download to query the multi-track audio files individually.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for loaders that need a non-standard download method, you can set a _download method in the module and it uses that instead. In this PR see for example maestro.py

@rabitt rabitt mentioned this pull request Oct 30, 2020
@magdalenafuentes
Copy link
Collaborator

Hey @rabitt just FYI we will merge things in this order: dataset object, index version, lfs. I'm working on the index one locally and will create PR once you've merged this one.

@rabitt
Copy link
Collaborator Author

rabitt commented Nov 3, 2020

Merging this! FYI @genisplaja @PRamoneda and @nkundiushuti - when it comes time to merge the open data loaders I can help port it to the new API, just ask!

One major change to be aware of - the "DATASET_DIR" where the data actually lives on your computer is now always the same as the name of the module.

@rabitt rabitt merged commit 753ef90 into master Nov 3, 2020
nkundiushuti pushed a commit that referenced this pull request Nov 4, 2020
 
Dataset object (#296)

* Dataset object, heavily inspired by the RFC in #219

* update top-level docs, adapt two loaders

* update dataset api

* update all loaders to fit new API

* remove outdated test

* update tests, inherit dataset-specific load functions, docstring hack, better error handling

* remove data_home from Track docstrings

* normalize dataset_dir to match module name, removes need for DATASET_DIR

* update test_full dataset; fix introduced bug in orchset

* fix bug in orchset download method  #309 

* consolodate track.py and dataset.py into core.py

* create datasets submodule

* fix import bug in tests

* hack around git case sensitiveness

* hack back around git case sensitiveness

* hack around git ignore case changes

* hack back around git ignoring case changes

* fix capitalization in tests paths

* port beatport key to 0.3

Co-authored-by: Rachel Bittner <rachelbittner@spotify.com>
@lostanlen
Copy link
Collaborator

Congratulations @rabitt and all! 🎉

@rabitt rabitt mentioned this pull request Dec 17, 2020
@rabitt rabitt deleted the dataset-object branch February 11, 2021 18:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

move ".cite" string to docstring & return docstring
4 participants