dataset.validate(), dataset.load() and dataset.track_ids() are identical in every loader. How can we generalize this?
Option 1 - lambdas
We define lambda functions in e.g. utils, and each dataset instantiates them, e.g. track_ids = utils.track_ids(DATA.index),
where utils.track_ids is itself a lambda function.
pros: same API, less code to copy paste
cons: still copy pasting code, not very nice coding style
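
A minimal sketch of what Option 1 could look like, assuming the index is a plain dict keyed by track id. `ORCHSET_INDEX` and `make_track_ids` are made-up names for illustration, not existing code:

```python
# Stand-in for one loader's DATA.index (track id -> file paths); values are made up.
ORCHSET_INDEX = {"track_b": ["audio/b.wav"], "track_a": ["audio/a.wav"]}

# utils side (hypothetical): a lambda that returns the per-dataset function.
make_track_ids = lambda index: (lambda: sorted(index))

# Loader side: every dataset module still repeats this one line.
track_ids = make_track_ids(ORCHSET_INDEX)

print(track_ids())
```

The last assignment is the part that would still be copy-pasted into every loader, which is the con listed above.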
Option 2 - reverse the api
utils.validate('orchset') rather than orchset.validate()
pros: no code copy-pasting, better code style
cons: changes the API and introduces some inconsistency: some things are dataset-major (e.g. orchset.Track), others are function-major (utils.validate('orchset'))
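
A rough sketch of the function-major style, assuming a registry that maps dataset names to their indexes. `INDEXES` and the `existing_files` argument are assumptions for illustration (a real validate would presumably check files on disk):

```python
# Hypothetical registry: dataset name -> index (track id -> file paths).
INDEXES = {
    "orchset": {"track_a": ["audio/a.wav"], "track_b": ["audio/b.wav"]},
}


def validate(dataset_name, existing_files):
    """Return indexed paths of `dataset_name` that are not in `existing_files`."""
    index = INDEXES[dataset_name]
    return sorted(
        path
        for paths in index.values()
        for path in paths
        if path not in existing_files
    )


# One shared function; the dataset is selected by name instead of by module.
missing = validate("orchset", existing_files={"audio/a.wav"})
print(missing)
```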
Option 3 - split it all up
* move validate to the top level (utils.validate('orchset'))
* remove load altogether because it's just a wrapper
* keep track ids as they are
pros: simplifies the current api
cons: not very consistent
Besides solving this issue, the dataset object would help with other things, such as sampling and generators. Maybe it's worth the effort. We could split the work between us and modify the existing loaders.
Yeah, after discussing a lot yesterday we decided to go for the big change and add the dataset object. It will simplify code a lot, increase consistency and change the user API only a bit. We decided to base the implementation on Vincent's great idea in #219.
@nkundiushuti take a look at #296 and let me know what you think about the new proposed API. The Dataset class can of course be extended in the future to include sampling etc., as you mention! I started with just porting the existing functionality, and if it seems solid we can add to it.
Option 4 - create a Dataset class
There's a big discussion about this in #225.
pros: helps standardize everything
cons: adds complexity, top level api change
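
For comparison, a minimal sketch of the Dataset-class idea, under the same assumption that an index is a dict from track id to file paths. This is not the implementation from #219 or #296; the constructor arguments and the `existing_files` parameter are made up for illustration:

```python
class Dataset:
    """Hypothetical shared object holding the logic that is identical across loaders."""

    def __init__(self, name, index):
        self.name = name
        self._index = index  # track id -> list of file paths

    def track_ids(self):
        return sorted(self._index)

    def load(self):
        # Thin wrapper: one entry per track id.
        return {tid: self._index[tid] for tid in self.track_ids()}

    def validate(self, existing_files):
        # Indexed paths that are not present in `existing_files`.
        return sorted(
            path
            for paths in self._index.values()
            for path in paths
            if path not in existing_files
        )


orchset = Dataset("orchset", {"track_a": ["audio/a.wav"], "track_b": ["audio/b.wav"]})
print(orchset.track_ids())
print(orchset.validate(existing_files={"audio/a.wav"}))
```

The three methods live in one place and the calls stay dataset-major (orchset.validate(...)), at the cost of the top-level API change noted in the cons.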