Windows Support #425
The |
Thanks for the input. Can you share the output of CircleCI where the kaldi_io tests are passing? If SoX is not possible to compile on Windows, we'll need to identify an alternative backend that offers similar file support on Windows: mp3, flac, wav, at least. soundfile unfortunately doesn't support mp3. See e.g. comparison. |
Sure. It was posted here: #419 (comment).
What about this one? https://github.com/beetbox/audioread or https://github.com/librosa/librosa? |
aubio seems to perform better than librosa, according to this, and supports more formats than audioread. Thoughts? |
Well, it looks good to me except its package on pypi is a source package. However, if we use the C/C++ part then we should be okay. |
What is the implication of a source package? |
As you can see from https://pypi.org/project/aubio/#files, only the file ends with |
Both seem good then. Let's go for audioread, since it appears to be faster than librosa. I've updated the description above to reflect the choice of audioread over sox for Windows. |
Have you looked into pydub? https://github.com/jiaaro/pydub I've been using it on windows, and it works great for mp3 and wav files. The installation is a bit involved since it requires the user to add ffmpeg to the environment path? |
@vincentqb Just some small remarks... For the various use cases of audio i/o there are two scenarios where loading is used within torchaudio:
Here, loading and decoding performance is crucial and easily becomes the bottleneck of dataloaders that deal with raw audio. Typically, expensive compression formats should be avoided and simple formats such as wav, flac and mp3 should be used instead. Furthermore, seeking support is crucial to load chunked audio from original (larger) tracks.
Here, performance is not that crucial, but support for various formats such as m4a/mp4/aac would be beneficial. As we often discussed in torchaudio-contrib, I still don't see any way around ffmpeg. ;-) To sum up, I don't think it makes sense to add another python package for audio i/o; instead we should focus on lower-level and faster alternatives such as minimp3 that also come with fewer dependencies. What do you think? |
@faroit -- Have you run your benchmark with minimp3? I'd love to see how it compares. Are you suggesting a mix of backends for different formats? That could be an option, yes. However, the context of this particular pull request is to make torchaudio available on Windows with the same features as on the other supported OSs, so this particular pull request doesn't push the boundaries of speed :)
I agree that there are already many python libraries for loading audio files. In particular, those that load into numpy can then be used to load into pytorch, since pytorch can convert tensors from/to numpy at no cost. This means most users who want some very specific audio file format can already get it. It is still convenient for users to get support for some common audio file formats directly in torchaudio. But we can focus on the most critical formats (wav, flac, mp3), and support them well and fast. In that context, since ffmpeg is a heavy dependency, I would avoid depending on it for as long as I can. :) |
@vincentqb Actually both audioread and aubio rely on ffmpeg. |
@vincentqb It will be easy for conda users because they can simply do |
@vincentqb BTW, users can only read a file using audioread, but not write. If we want to create a new backend like |
Let's list the requirements for a backend:
@peterjc123 -- Please do let me know if I forget anything in this list. Do you know any other backend that would work well with those criteria? |
There is no functional python/numpy interface yet – see status of pyminimp3, so I used the implementation recently added to tf.io. The performance looks incredible: (ar_ffmpeg is audioread's ffmpeg interface) |
Sorry for hijacking this thread.
I totally agree with you. FFMPEG is going to be painful. But I don't think there is any other alternative to support a large number of formats. That's why I think we should have some fast decoder-only alternatives for a limited number of formats (useful for training). I am still in favor of removing sox and just going with sndfile/minimp3 for this scenario. Then ffmpeg for writing and everything else where loading speed is not an issue. |
On ffmpeg, I'd like to add the idea that, in general, we want backends to be opt-in. By default we should pick a light library that works for most common formats and then allow the user to switch to different backends (such as ffmpeg) for either performance or features. Figuring out how to set up this backend dispatch mechanism could probably resolve many of the discussions here. Essentially we want to have load and save dispatch to a different backend depending on file format and the user's settings. The simplest approach is to make a choice at compile-time. We're already beyond that with our global run-time backend mechanism. A more granular approach is to then allow users to set different backends for each file format. Then beyond that we can even introduce preferred orders per file format based on the available backends (e.g. use specialized library X over Y when available, but transparently default to Y otherwise). |
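For illustration only, here is a minimal sketch of the per-format dispatch with preferred orders described above. It assumes nothing about torchaudio's actual backend mechanism: `BackendRegistry`, `register`, `prefer`, `resolve`, and the stand-in loaders are hypothetical names, and the loaders return strings instead of decoding audio.

```python
import os
from typing import Callable, Dict, List

# Hypothetical loader type: takes a path and returns a description string.
# A real backend would return decoded audio instead.
Loader = Callable[[str], str]


class BackendRegistry:
    """Hypothetical registry that picks a backend per file extension."""

    def __init__(self, default: str) -> None:
        self._loaders: Dict[str, Loader] = {}
        self._preferences: Dict[str, List[str]] = {}
        self._default = default

    def register(self, name: str, loader: Loader) -> None:
        # Mark a backend as available (e.g. its library imported successfully).
        self._loaders[name] = loader

    def prefer(self, extension: str, names: List[str]) -> None:
        # Record the preferred backend order for one file format.
        self._preferences[extension.lower()] = list(names)

    def resolve(self, path: str) -> str:
        # Use the first preferred backend that is available for this format,
        # otherwise fall back transparently to the global default.
        ext = os.path.splitext(path)[1].lower().lstrip(".")
        for name in self._preferences.get(ext, []):
            if name in self._loaders:
                return name
        if self._default in self._loaders:
            return self._default
        raise RuntimeError(f"no backend available for {path!r}")

    def load(self, path: str) -> str:
        return self._loaders[self.resolve(path)](path)


registry = BackendRegistry(default="soundfile")
registry.register("soundfile", lambda path: f"soundfile:{path}")
registry.register("ffmpeg", lambda path: f"ffmpeg:{path}")
registry.prefer("mp3", ["minimp3", "ffmpeg"])

# minimp3 is preferred for mp3 but not registered yet, so ffmpeg is chosen.
before = registry.resolve("clip.mp3")

# Once minimp3 becomes available, it takes over for mp3 files.
registry.register("minimp3", lambda path: f"minimp3:{path}")
after = registry.resolve("clip.MP3")
```

The point of the design is that formats without an explicit preference (wav here) keep using the light default, while the heavier or more specialized backends are only consulted for formats where the user has opted in.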
Right, although with the current choice of global runtime backend dispatch, we do not support mp3 on Windows. One option is to switch the default global backend to something that also supports mp3 on Windows. Another is to add a file-format-dependent dispatch. The former would favor going all-in with ffmpeg. The latter favors minimp3. Based on feedback above from @faroit and @cpuhrsch, the latter is preferred as the next step. I'm good with that conclusion, so I'll update the todo/description above to reflect that. |
@vincentqb I saw a post that describes how to compile torchaudio with Sox. Will try that later. |
Torchaudio with Sox: #648 |
mp3 for windows without sox in #1000 |
@vincentqb if you also want to support writing MP3s on Windows, I would recommend https://github.com/chrisstaite/lameenc. I have been using it for a while inside demucs, and it is amazing (in the sense that it is small, has no extra dependencies, and works perfectly with just a pip install on all OSes). At the moment, though, it seems their build for python3.9 is broken... |
thanks for the input :) |
Hi there, I see that ffmpeg and sox are issues for this library. I want to let you know that I've solved these exact problems for tools like this so that these binaries can be easily deployed for Mac/Win/Linux. Please see: https://github.com/zackees/static-ffmpeg Using tools like These python packages are available through pip as well so can be included in your dependency management. The binaries are only downloaded when they are first used. By specifying |
To bring Windows support with mp3 support, we need
kaldi_io
NameError for Windows in comment.
If and only if no backend supports mp3 on Windows after the above:
Closes #50, closes #219, closes #258.
cc @peterjc123, @chauhang, pytorch/pytorch#24344