mirdata doesn't work cleanly for datasets not on disk #128
Hi there. I haven't really looked into mirdata yet, but I'm considering adding MUSDB18. Does multichannel audio work in mirdata?
Can you elaborate on this use case a bit more and give some examples? Are you talking about applications where the data is in key/value databases?
Yes, I would do both: the audio loader and the annotation loader.
Yes, as of #125. We're just using librosa, which now supports multichannel audio.
In my case, I'm thinking about the use case where you want to train using a dataset that is stored remotely. One workflow is to download the audio/annotation files as they're needed and call the loading functions manually. What kind of workflow are you thinking of in the database use case?
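The download-as-needed workflow can be sketched with the standard library alone. This is a minimal sketch, not part of mirdata: `fetch_if_missing` and `load_remote` are hypothetical helper names, the remote files are assumed to be reachable by a plain URL, and `load_fn` stands in for whichever loading function you would otherwise call by hand.

```python
import os
import shutil
import urllib.request


def fetch_if_missing(remote_url, local_path):
    """Download remote_url to local_path unless a cached copy already exists.

    Hypothetical helper, not a mirdata API.
    """
    if not os.path.exists(local_path):
        os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
        # Write to a temporary name first so an interrupted download
        # never leaves a truncated file that looks like a valid cache entry.
        tmp_path = local_path + ".part"
        with urllib.request.urlopen(remote_url) as response, open(tmp_path, "wb") as out:
            shutil.copyfileobj(response, out)
        os.replace(tmp_path, local_path)
    return local_path


def load_remote(remote_url, cache_dir, load_fn):
    """Fetch one file on demand, then pass its local path to a loader.

    The cache file name is the last URL path component (assumption:
    URLs carry no query string).
    """
    local_path = os.path.join(cache_dir, os.path.basename(remote_url))
    return load_fn(fetch_if_missing(remote_url, local_path))
```

The first call for a given URL downloads the file; later calls find it in `cache_dir` and go straight to `load_fn`. A real training setup would probably also want checksums and protection against several workers downloading the same file at once.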
@magdalenafuentes and I discussed this a bit more, and here are some things we think would help make mirdata usable for remote datasets:
For the use case where the data files live on a remote machine and are accessed when needed (e.g. for training a model), many of mirdata's assumptions fail:
- Track annotation attributes are loaded in the background from disk, expecting files to be present

How can we support this (relatively common) use case cleanly? This will be increasingly important for larger datasets. Some initial ideas:
- Track attributes that load files from disk

cc @faroit
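One way to decouple annotation loading from local disk is to keep the lazy loading but route every file read through a single pluggable opener. The sketch below assumes this design; it is not mirdata's Track API, and `LazyTrack`, `open_fn`, `disk_opener`, `kv_opener` and the tab-separated start/end/label annotation format are all hypothetical.

```python
import functools
import io
import os


class LazyTrack:
    """Hypothetical Track-like object whose annotation loads on first access.

    open_fn maps a relative file path to a readable text file object, so the
    same class can read from local disk, a download cache or a key/value store.
    """

    def __init__(self, track_id, annotation_path, open_fn):
        self.track_id = track_id
        self.annotation_path = annotation_path
        self._open_fn = open_fn

    @functools.cached_property
    def annotation(self):
        # Nothing is read until this attribute is first accessed; the parsed
        # result is then stored on the instance and reused.
        with self._open_fn(self.annotation_path) as f:
            rows = [line.split("\t") for line in f.read().splitlines() if line]
        # Assumed format: one "start<TAB>end<TAB>label" row per line.
        return [(float(start), float(end), label) for start, end, label in rows]


def disk_opener(data_home):
    """Open files relative to a local data directory."""
    return lambda rel_path: open(os.path.join(data_home, rel_path))


def kv_opener(store):
    """Open files stored as UTF-8 bytes in a dict-like key/value store."""
    return lambda rel_path: io.StringIO(store[rel_path].decode("utf-8"))
```

For example, `LazyTrack("track01", "annotations/track01.lab", kv_opener(store))` parses the annotation straight out of `store` without any file on disk. Because parsing never touches the filesystem directly, supporting another backend means writing one more opener rather than changing every Track attribute.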