Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize copy-paste dataset functions to utils? #293

Closed
rabitt opened this issue Oct 19, 2020 · 4 comments
Closed

Generalize copy-paste dataset functions to utils? #293

rabitt opened this issue Oct 19, 2020 · 4 comments
Assignees
Labels
priority Issues with this tag will be addressed before others.
Milestone

Comments

@rabitt
Copy link
Collaborator

rabitt commented Oct 19, 2020

dataset.validate(), dataset.load() and dataset.track_ids() are identical in every loader. How can we generalize this?

Option 1 - lambdas

we define in e.g. utils lambda functions, which get instantiated in each dataset, e.g.
track_ids = utils.track_ids(DATA.index)
where utils.track_ids is itself a lambda function

pros: same API, less code to copy paste
cons: still copy pasting code, not very nice coding style

Option 2 - reverse the api

utils.validate('orchset') rather than orchset.validate()

pros: no code copy-pasting, better code style
cons: changes the API, makes some inconsistency - some things are dataset-major (e.g. orchset.Track) some are function-major (utils.validate('orchset'))

Option 3 - split it all up

*move validate to a top level (utils.validate('orchset'))
*remove load all together because it's just a wrapper
*keep track ids as they are
pros: simplifies the current api
cons: not very consistent

Option 4 - create a Dataset class

here's a big discussion about this #225

pro: helps standardize everything
cons: adds complexity, top level api change

@rabitt rabitt added the priority Issues with this tag will be addressed before others. label Oct 19, 2020
@rabitt rabitt mentioned this issue Oct 20, 2020
8 tasks
@nkundiushuti
Copy link
Collaborator

besides solving this issues, the dataset object would help with other solutions: sampling, generator. maybe it's worth the effort. we could split it between us and modify the existing loaders.

@magdalenafuentes
Copy link
Collaborator

Yeah, after discussing a lot yesterday we decided to go for the big change and add the dataset object. It will simplify code a lot, increase consistency and change the user API only a bit. We decided to base the implementation on Vincent's great idea in #219.

@rabitt
Copy link
Collaborator Author

rabitt commented Oct 20, 2020

@nkundiushuti take a look at #296 and let me know what you think about the new proposed API. The Dataset class can of course be extended in the future to include sampling etc as you mention! I started with just porting the existing functionality, and if it seems solid we can add to it

@rabitt rabitt added this to the 0.3 milestone Oct 23, 2020
@rabitt
Copy link
Collaborator Author

rabitt commented Nov 3, 2020

Done in #296

@rabitt rabitt closed this as completed Nov 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority Issues with this tag will be addressed before others.
Projects
None yet
Development

No branches or pull requests

3 participants