Windows Support #425

Closed
4 of 11 tasks
vincentqb opened this issue Feb 4, 2020 · 26 comments

@vincentqb
Contributor

vincentqb commented Feb 4, 2020

To bring Windows support with mp3 support, we need

If and only if no backend supports mp3 on Windows after the above:

  • Compile SoX on Windows
  • Activate builds for wheels and the conda package on CircleCI for Windows with SoX using a C++ extension, e.g. Add Windows CI #394.

Closes #50, closes #219, closes #258.

cc @peterjc123, @chauhang, pytorch/pytorch#24344

@peterjc123
Contributor

The kaldi_io test is passing on Windows now. BTW, I think it's hard to compile Sox on Windows. Other things sound reasonable to me.

@vincentqb
Contributor Author

vincentqb commented Feb 5, 2020

Thanks for the input. Can you share the output of CircleCI where the kaldi_io tests are passing?

If SoX is not possible to compile on Windows, we'll need to identify an alternative backend that offers similar file support on Windows: mp3, flac, wav, at least. soundfile unfortunately doesn't support mp3. See e.g. comparison.

@peterjc123
Contributor

peterjc123 commented Feb 6, 2020

Thanks for the input. Can you share the output of CircleCI where the kaldi_io tests are passing?

Sure. It was posted here: #419 (comment).

If SoX is not possible to compile on Windows, we'll need to identify an alternative backend that offers similar file support on Windows: mp3, flac, wav, at least.

What about this one? https://github.com/beetbox/audioread or https://github.com/librosa/librosa?

@vincentqb
Contributor Author

What about this one? https://github.com/beetbox/audioread or https://github.com/librosa/librosa?

aubio seems to perform better than librosa, according to this, and supports more formats than audioread. Thoughts?

@peterjc123
Contributor

Well, it looks good to me except its package on pypi is a source package. However, if we use the C/C++ part then we should be okay.

@vincentqb
Contributor Author

Well, it looks good to me except its package on pypi is a source package. However, if we use the C/C++ part then we should be okay.

What is the implication of a source package?

@peterjc123
Contributor

peterjc123 commented Feb 15, 2020

As you can see from https://pypi.org/project/aubio/#files, only a .tar.gz file (a source distribution) is available, so pip has to build the C extension locally instead of installing a prebuilt wheel.

@vincentqb
Contributor Author

vincentqb commented Feb 18, 2020

What about this one? https://github.com/beetbox/audioread or https://github.com/librosa/librosa?

Both seem good. Let's go for audioread then, since it appears to be faster than librosa. I've updated the description above to reflect the choice of audioread over sox for Windows.

@dachosen1

Have you looked into pydub? https://github.com/jiaaro/pydub

I've been using it on Windows, and it works great for mp3 and wav files. The installation is a bit involved, though, since it requires the user to add ffmpeg to the PATH environment variable.

@faroit
Contributor

faroit commented Feb 19, 2020

@vincentqb Just some small remarks...

For the various use cases of audio i/o there are two scenarios where loading is used within torchaudio:

  1. Training

Here, loading and decoding performance is crucial and easily becomes the bottleneck of dataloaders that deal with raw audio. Typically, expensive compression formats should be avoided and simple formats such as wav, flac and mp3 should be used instead. Furthermore, seeking support is crucial for loading chunks of audio from the original (larger) tracks.
In this use case we already have libsndfile, interfaced through pysoundfile, which covers wav and flac (at some point it would make sense to interface libsndfile directly to avoid numpy). Regarding MP3 support (+ Windows), I just discovered minimp3, which ticks all the boxes. It is also ridiculously fast and could therefore easily be the best tradeoff between loading and decoding speed.

  2. Inference

Here, performance is not that crucial but support for various formats such as m4a/mp4/aac would be beneficial. As we often discussed in torchaudio-contrib, I still don't see any way around ffmpeg. ;-)

To sum up, I don't think it makes sense to add another Python package for audio I/O; we should instead focus on lower-level and faster alternatives such as minimp3 that also come with fewer dependencies. What do you think?

@vincentqb
Contributor Author

In this use case we already have libsndfile, interfaced through pysoundfile, which covers wav and flac (at some point it would make sense to interface libsndfile directly to avoid numpy). Regarding MP3 support (+ Windows), I just discovered minimp3, which ticks all the boxes. It is also ridiculously fast and could therefore easily be the best tradeoff between loading and decoding speed.

@faroit -- Have you run your benchmark with minimp3? I'd love to see how it compares.

Are you suggesting a mix of backends for different formats? That could be an option, yes. However, the context of this particular pull request is to make torchaudio available on Windows with the same features as on the other supported OSs, so this particular pull request doesn't push the boundaries of speed :)

  2. Inference

Here, performance is not that crucial but support for various formats such as m4a/mp4/aac would be beneficial. As we often discussed in torchaudio-contrib, I still don't see any way around ffmpeg. ;-)

To sum up, I don't think it makes sense to add another Python package for audio I/O; we should instead focus on lower-level and faster alternatives such as minimp3 that also come with fewer dependencies. What do you think?

I agree that there are already many Python libraries for loading audio files. In particular, those that load into numpy can then be used to load into pytorch, since pytorch can convert tensors from/to numpy at no cost. This means most users who need some very specific audio file format can already load it.

It is still convenient for users to get support for some common audio file formats directly in torchaudio. But we can focus on the most critical formats (wav, flac, mp3), and support them well and fast.

In that context, since ffmpeg is a heavy dependency, I would avoid depending on it for as long as I can. :)

@peterjc123
Contributor

@vincentqb Actually, both audioread and aubio rely on ffmpeg.

@vincentqb
Contributor Author

vincentqb commented Mar 9, 2020

Ah, good point. Have any of you faced challenges such as this when installing audioread? If not, I'd say we move forward anyway.

By the way, torchvision is also moving toward ffmpeg for video.

@cpuhrsch -- You voiced not being in favor of ffmpeg in the past. Any comments?

@peterjc123
Contributor

peterjc123 commented Mar 9, 2020

@vincentqb It will be easy for conda users because they can simply do conda install -c conda-forge ffmpeg. To make it convenient for other users, we may just distribute the DLLs for them.

@peterjc123
Contributor

@vincentqb BTW, users can only read a file using audioread, but not write. If we want to create a new backend like sndfile and sox, we'd better choose something else.

@vincentqb
Contributor Author

vincentqb commented Mar 9, 2020

Let's list the requirements for a backend:

  • Easy installation with torchaudio for the user on Windows (for this PR).
  • Read whole wav/mp3/flac files, or chunks at a specified location in a file.
  • Write wav/mp3/flac whole files.
  • Optional: Perform well in this benchmark.

@peterjc123 -- Please do let me know if I forget anything in this list. Do you know any other backend that would work well with those criteria?
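To make the "whole files, or chunks at a specified location" requirement concrete, here is a minimal sketch using only Python's standard-library `wave` module. It covers uncompressed WAV only, and the helper names `read_chunk` and `write_wav` are hypothetical; nothing here is torchaudio's API or any of the backends discussed in this thread.

```python
import io
import wave


def write_wav(target, pcm_bytes, sample_rate, channels=1, sample_width=2):
    """Write raw PCM bytes as a WAV file to a path or file-like object."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)


def read_chunk(source, start_frame, num_frames):
    """Return (params, raw PCM bytes) for a slice of a WAV file."""
    with wave.open(source, "rb") as wav:
        wav.setpos(start_frame)            # seek directly to the requested frame
        data = wav.readframes(num_frames)  # shorter than requested near end of file
        return wav.getparams(), data


# Round trip through an in-memory buffer: 100 frames of 16-bit mono audio.
pcm = b"".join(i.to_bytes(2, "little", signed=True) for i in range(100))
buffer = io.BytesIO()
write_wav(buffer, pcm, sample_rate=8000)
buffer.seek(0)
params, chunk = read_chunk(buffer, start_frame=10, num_frames=5)
print(params.framerate, params.nframes, len(chunk))
```

A real backend would return decoded samples rather than raw bytes, and compressed formats such as mp3 or flac need a decoder that supports seeking, which is the hard part of this requirement.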

@faroit
Contributor

faroit commented Mar 9, 2020

@vincentqb

Have you run your benchmark with minimp3? I'd love to see how it compares.

There is no functional python/numpy interface yet – see status of pyminimp3, so I used the implementation recently added to tf.io. The performance looks incredible:

[Image: benchmark_tf]

(ar_ffmpeg is audioread's ffmpeg interface)

@faroit
Contributor

faroit commented Mar 9, 2020

@vincentqb @peterjc123

Sorry for hijacking this thread.

In that context, since ffmpeg is a heavy dependency, I would avoid depending on it for as long as I can. :)

I totally agree with you. FFmpeg is going to be painful. But I don't think there is any alternative that supports a large number of formats.

That's why I think we should have some fast decoder-only alternatives for a limited number of formats (useful for training). I am still in favor of removing sox and just going with sndfile/minimp3 for this scenario. Then ffmpeg for writing and everything else where loading speed is not an issue.

@cpuhrsch
Contributor

On ffmpeg, I'd like to add the idea that, in general, we want backends to be opt-in.

By default we should pick a light library that works for most common formats and then allow the user to switch to different backends (such as ffmpeg) for either performance or features.

Figuring out how to set up this backend dispatch mechanism could probably resolve many of the discussions here. Essentially we want to have load and save dispatch to a different backend depending on file format and the user's settings.

The simplest approach is to make a choice at compile-time. We're already beyond that with our global run-time backend mechanism.

A more granular approach is to then allow users to set different backends for each file format.

Then beyond that we can even introduce preferred orders per file format based on available formats (e.g. use specialized library X over Y when available, but transparently default to Y otherwise).
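As an illustration of the per-format dispatch with a preferred order described above, here is a minimal sketch. All names (`register_backend`, `set_preferred`, `load`, and the backend labels) are hypothetical and are not torchaudio's API; the loaders are stand-ins that only report which backend was picked.

```python
from pathlib import Path

# Hypothetical defaults: file extension -> backend names in preference order.
_PREFERRED = {
    ".wav": ["sndfile"],
    ".flac": ["sndfile"],
    ".mp3": ["minimp3", "ffmpeg"],
}

# Loader callables for the backends that are actually installed.
_AVAILABLE = {}


def register_backend(name, loader):
    """Make a backend available under `name`; `loader` takes a file path."""
    _AVAILABLE[name] = loader


def set_preferred(extension, backend_names):
    """Let the user override the backend order for one file format."""
    _PREFERRED[extension.lower()] = list(backend_names)


def load(path):
    """Dispatch to the first available backend preferred for this format."""
    ext = Path(path).suffix.lower()
    for name in _PREFERRED.get(ext, []):
        loader = _AVAILABLE.get(name)
        if loader is not None:
            return loader(path)
    raise RuntimeError(f"no available backend for {ext!r} files")


# Stand-in loaders; "minimp3" is deliberately left unregistered.
register_backend("sndfile", lambda p: ("sndfile", p))
register_backend("ffmpeg", lambda p: ("ffmpeg", p))

print(load("clip.wav"))  # handled by the sndfile stand-in
print(load("clip.mp3"))  # minimp3 is missing, so this falls back to ffmpeg
```

Keeping the preference table separate from the availability table is what makes heavy backends opt-in: a default install registers only the light ones, and the fallback order takes effect once a user installs more.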

@vincentqb
Contributor Author

vincentqb commented Mar 12, 2020

Right, although with the current choice of global run-time backend dispatch, we do not support mp3 on Windows. One option is to switch the default global backend to something that also supports mp3 on Windows. Another is to add file-format-dependent dispatch.

The former would favor going all-in with ffmpeg. The latter favors minimp3.

Based on feedback above from @faroit and @cpuhrsch, the latter is preferred as the next step. I'm good with that conclusion, so I'll update the todo/description above to reflect that.

@peterjc123
Contributor

@vincentqb I saw a post that describes how to compile torchaudio with Sox. Will try that later.

@peterjc123
Contributor

Torchaudio with Sox: #648

@vincentqb
Contributor Author

mp3 for windows without sox in #1000

@adefossez
Contributor

@vincentqb if you also want to support writing MP3s on Windows, I would recommend https://github.com/chrisstaite/lameenc

I have been using it for a while inside demucs, and it is amazing (in the sense that it is small, no extra dependencies, and works perfectly with just a pip install on all OSes). At the moment though it seems their build for python3.9 is broken...

@vincentqb
Contributor Author

thanks for the input :)

mthrok closed this as completed Jan 8, 2023

@zackees

zackees commented Feb 22, 2023

Hi there, I see that ffmpeg and sox are issues for this library. I want to let you know that I've solved these exact problems for tools like this so that these binaries can be easily deployed for Mac/Win/Linux.

Please see:

https://github.com/zackees/static-ffmpeg
https://github.com/zackees/static-sox

Using tools like ffmpeg will allow you to write mp3's with minimal code and have it work everywhere. I recommend using static_ffmpeg.add_paths(weak=True) and static_sox.add_paths(weak=True).

These python packages are available through pip as well so can be included in your dependency management. The binaries are only downloaded when they are first used. By specifying weak=True the libraries will only download ffmpeg/sox if the binaries don't already exist on the system.
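The "only if the binaries don't already exist on the system" behaviour can be pictured with a small standard-library sketch. This is not how static_ffmpeg or static_sox are implemented; `ensure_tool` and the fallback directory are hypothetical, and the download step is left out entirely.

```python
import os
import shutil


def ensure_tool(name, fallback_dir):
    """Return a path to `name`, preferring a copy already on the system PATH.

    `fallback_dir` is a hypothetical directory that would hold a bundled
    binary; it is appended to PATH only when no system copy is found.
    """
    found = shutil.which(name)
    if found is not None:
        return found  # a system-wide install wins, nothing else is touched
    parts = [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]
    parts.append(fallback_dir)
    os.environ["PATH"] = os.pathsep.join(parts)
    return shutil.which(name)  # still None if no bundled copy exists either


print(ensure_tool("ffmpeg", "/hypothetical/bundle/dir"))
```

Appending rather than prepending the fallback directory keeps any system-wide install first in the lookup order.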


8 participants