Should dataset modules have a common Dataset class? #225
Comments
Thank you for opening this. For the record, #219 was just a wild attempt at unifying module APIs. I didn't see it as a take-it-or-leave-it situation, but more as a conversation starter. Regarding the example code you mentioned about download, my dream is that the user wouldn't have to import any submodules manually. Not

    from mirdata import orchset
    orchset_dataset = orchset.Dataset()
    orchset_dataset.download()

but really

    orchset = mirdata.download("ORCHSET")

(we can talk about case sensitivity if you want to allow for [...]) Where

    import importlib

    import mirdata


    def find_submodule(name, case_sensitive=True):
        if not case_sensitive:
            name = name.lower()
        # Import each dataset submodule and compare its `name` attribute.
        for submodule_str in mirdata.__all__:
            module_str = "mirdata." + submodule_str
            submodule = importlib.import_module(module_str)
            submodule_name = submodule.name
            if not case_sensitive:
                submodule_name = submodule_name.lower()
            if name == submodule_name:
                return submodule


    def download(
            name,
            data_home=None,
            force_overwrite=False,
            cleanup=False,
            download_items=None,
            case_sensitive=True):
        submodule = find_submodule(name, case_sensitive=case_sensitive)
        dataset = mirdata.dataset.Dataset(
            submodule, data_home=data_home)
        dataset.download(
            force_overwrite=force_overwrite,
            cleanup=cleanup,
            download_items=download_items)
        return dataset

and voilà: the return value of this [...]
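For anyone who wants to try the name lookup without mirdata installed, here is a minimal, self-contained sketch of the same idea. The `REGISTRY` dict and its two entries are hypothetical stand-ins for iterating over `mirdata.__all__`, and raising `KeyError` on an unknown name is an assumption added here; the snippet above would return `None` in that case.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for dataset submodules. In the proposal these would
# come from importing each entry of mirdata.__all__.
REGISTRY = {
    "orchset": SimpleNamespace(name="ORCHSET"),
    "beatles": SimpleNamespace(name="Beatles"),
}


def find_submodule(name, case_sensitive=True, registry=REGISTRY):
    """Return the stand-in module whose `name` attribute matches `name`."""
    wanted = name if case_sensitive else name.lower()
    for module in registry.values():
        candidate = module.name if case_sensitive else module.name.lower()
        if candidate == wanted:
            return module
    # Assumption: fail loudly on an unknown name instead of returning None.
    raise KeyError(f"No dataset named {name!r}")
```

With this sketch, `find_submodule("orchset", case_sensitive=False)` returns the `ORCHSET` entry, while the default case-sensitive call with `"orchset"` raises `KeyError`.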
The usage example is really useful. Actually we considered this direction at the very beginning of this library - whether mirdata should be "dataset-major" ([...]). Could you give usage examples for how you envision the following: [...]
This is implemented in #296!
Our current API has the following functions required by every dataset module, and all instances are nearly identical:
.download()
.validate()
.track_ids()
.load()
.cite()
In #219 @lostanlen experimented with adding a Dataset class which implements these methods. In addition, he added the following:
.readme (module documentation)
.__getitem__
dataset_default_path
.index
.metadata
.choice for random iteration (I <3 this idea)
The original decision not to implement a module-level dataset object was made for usability reasons. In the current API, to, for example, download a dataset:
Whereas, if download lived inside a dataset object, the API would be:
The current solution has the following pros and cons.
Pros:
Cons:
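test_loaders + code review.
Opening this issue to restart the discussion of whether a Dataset object makes sense, and if so, what the API would look like. I like the idea of generalizing and simplifying the code, as long as it doesn't come at the cost of usability & ease of contribution.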
object makes sense, and if so, what the API would look like. I like the idea of generalizing and simplifying the code, as long as it doesn't come at the cost of usability & ease of contribution.The text was updated successfully, but these errors were encountered: