Files (much) shorter than 30s in fma-small #8
Hi, yep that's issue #4. It's due to bad length records in the https://freemusicarchive.org database. I should extract that metadata from the mp3 itself rather than relying on data from the API. |
Oh, I see, right. I (and probably others) would like to know how the baselines are computed when a clip is shorter. I guess that also means they're short in Large/Full?
On 10 Oct 2017, at 10:17, Michaël Defferrard wrote:
Hi, yep that's issue #4. It's due to bad length records in the https://freemusicarchive.org database. I should extract that metadata from the mp3 itself rather than relying on data from the API.
|
The features were extracted over windows, then statistics were computed across songs. The process is thus independent of the length of a song. Note though that the distributed features (which the baselines are based on) were computed on the full-length tracks. I thought that was most useful to users because it takes a lot of time to compute them (compared to doing it on 30s excerpts). The length problem exists for medium and large. Full is fine as it contains the original full-length tracks. |
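A minimal sketch of why window-based features with summary statistics do not depend on track length. This is not the code used to compute the distributed features: RMS energy stands in for any frame-level feature, and the names `frame_rms` and `summarize`, the window sizes, and the choice of statistics are all assumptions made for illustration.

```python
import math
import statistics


def frame_rms(signal, frame_length=2048, hop_length=512):
    """RMS energy of each analysis window (stand-in for any frame-level feature)."""
    if not signal:
        return []  # a zero-length clip yields no windows at all
    values = []
    last_start = max(len(signal) - frame_length, 0)
    for start in range(0, last_start + 1, hop_length):
        frame = signal[start:start + frame_length]
        values.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return values


def summarize(values):
    """Collapse a variable number of per-window values into a fixed-size summary."""
    if not values:
        raise ValueError("cannot summarize a clip with no windows")
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
    }


# Two synthetic 440 Hz tones of different lengths (assumed 8 kHz sample rate).
sample_rate = 8000
short_tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(1 * sample_rate)]
long_tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(5 * sample_rate)]

short_summary = summarize(frame_rms(short_tone))
long_summary = summarize(frame_rms(long_tone))
print(short_summary)
print(long_summary)
```

Both summaries have the same four entries although the second tone yields several times as many windows. Only a clip with no samples at all breaks the scheme, since there is then nothing to summarize.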
Cool, thanks. But..
The length problem exists for medium and large. Full is fine as it contains the original full-length tracks.
Really? How is that possible? I understood that there's no such 'full'-length signal for those files. Sorry for asking questions that might be answered in the paper..
On 10 Oct 2017, at 11:11, Michaël Defferrard wrote:
The features were extracted over windows, then statistics were computed across songs. The process is thus independent of the length of a song. Note though that the distributed features (which the baselines are based on) were computed on the full-length tracks. I thought that was most useful to users because it takes a lot of time to compute them (compared to doing it on 30s excerpts).
The length problem exists for medium and large. Full is fine as it contains the original full-length tracks.
|
There is. ;-) The full dataset is a verbatim copy of the mp3s from https://freemusicarchive.org. Tracks there are up to 3 hours long (figure 2). Small, medium, and large are composed of 30s excerpts (see section 2.6). |
Oh.. right, the trimming-from-centre-w.r.t.-metadata was already mentioned in the #4 thread. Thanks :) |
Exactly |
It would actually be very helpful for freemusicarchive themselves to know that there is incorrect metadata, btw. I'm not sure whether I should close this issue at the moment, so I'll just leave it for you. |
I think I've told them at some point. Will check. :) |
First of all, thank you for creating such a nice dataset for the MIR community! As a follow-up to Keunwoo's observation: there are many songs shorter than 30 sec in the medium subset as well. I compiled a list for future reference:
|
Thanks for your comment. :) While there are definitely some errors, beware that some songs can legitimately be shorter than 30s (iff the original version in fma full is shorter than 30s). BTW I just finished running a script which measures the exact number of frames (among other things) for every song. I will use the results to update the |
I looked at your list. Comparing the duration reported by the API with the real duration (computed by dividing the number of decoded frames by the sample rate), the problem is clearly due to wrong metadata. As such, the tracks were cut at the wrong place, possibly even beyond their total length, which resulted in a zero-length clip.
Now we face a choice:
What do you guys think would be the best option? |
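A rough sketch of the failure mode described above, assuming the 30s excerpt is taken from the centre of the track according to the reported duration. The helper names (`real_duration`, `centre_clip_bounds`, `extracted_length`) are hypothetical and the actual clipping code may differ; the point is only to show how an overestimated duration can push the cut partly or entirely past the end of the audio.

```python
def real_duration(n_frames, sample_rate):
    """True duration in seconds: decoded frames divided by the sample rate."""
    return n_frames / sample_rate


def centre_clip_bounds(reported_duration, clip_length=30.0):
    """Start and end (seconds) of a clip centred using the *reported* duration."""
    start = max((reported_duration - clip_length) / 2, 0.0)
    return start, start + clip_length


def extracted_length(reported_duration, true_duration, clip_length=30.0):
    """Seconds of audio actually obtained when cutting with the reported duration."""
    start, end = centre_clip_bounds(reported_duration, clip_length)
    # Audio cannot be read past the true end of the track.
    return max(min(end, true_duration) - start, 0.0)


# Correct metadata: a full 30 s clip.
print(extracted_length(reported_duration=200.0, true_duration=200.0))
# Overestimated duration: the cut runs past the end, giving a short clip.
print(extracted_length(reported_duration=100.0, true_duration=50.0))
# Badly overestimated duration: the cut starts beyond the end, giving an empty clip.
print(extracted_length(reported_duration=600.0, true_duration=200.0))
# Legitimately short track with correct metadata: the clip is the whole track.
print(extracted_length(reported_duration=20.0, true_duration=20.0))
```

Comparing `real_duration(n_frames, sample_rate)` against the duration reported by the API for each track is one way to separate clips that are short because of wrong metadata from tracks that are legitimately shorter than 30s.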
Personally I am fine with either choice. However, considering the ease of use for future users, maybe the 2nd option would be slightly better? |
The 2nd choice would be good for the ease of use, which means better diffusion. |
I would prefer to have all tracks in the small/medium subset be 30s long. |
Not sure what the status is of this, as the archives are not versioned, but three of the files mentioned by @keunwoochoi in #8 (comment) seem to be simply broken in the current download and cannot be read by Simple test (shows all files that
Output:
Perhaps it makes sense to update the release or find some other way of making users aware of this? |
The archives are versioned. The rc1 version (still the latest as of now) has this issue. There is code in I've added the pinned meta-issue #41 and a note in the README to make users aware of known issues. |
@hendriks73, the 3 files you list cannot be read by #8 (comment) explains why erroneous metadata led to files of duration 0. |
Hi, there are 6 files that are much shorter than 30s:
, in case it's not a known issue.