-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
soxbindings fails when multithreading #4
Comments
Thanks! I can look into this. Appreciate the concise report and repro instructions! FWIW, I've been using SoxBindings in a multi-process dataloader from PyTorch with no issues, so maybe try multiple processes as a quick fix for now. |
Unfortunately I don't think it's possible in my case - I'm using it inside a tensorflow dataloader which uses multithreading. I don't know of a way around it. |
@rabitt i guess one has to disable openmp when compiling sox -> the folks at torchaudio seemed to have the same problems: pytorch/audio#1026 @pseeth ? |
Thank you for that pointer @faroit! SoxBindings has a slightly different issue though - that one appears to because of a mismatch between PyTorch OpenMP and libsox OpenMP. But, good news, might have the beginnings of a fix due to the rabbit hole that led me down... Thank you to this hero on the SoX forums. Here's the gist, I made a context manager that I call
So this works great when you're doing it in a single thread, but in a multithreaded setup, you end up with this very bad scenario:
So the quits and inits happen interleaved which SoX really doesn't like. Like the person says in the forum:
Cheers indeed. I took the decorator off. You'll have to wrap your program function in the decorator to be threadsafe, or call |
@pseeth thanks for looking into. Not sure if this fix would be possible to be run inside tensorflows |
Hmm, not super familiar with from soxbindings import sox_context_manager
@sox_context_manager()
def main():
# my great experiment goes here
# powered by soxbindings
# and tf.data.
if __name__ == "__main__":
main() But I'm not sure totally if that'll work, having not used |
Let me see if I can cook up a minimal tf.data example for you |
I think this does it - import numpy as np
import tensorflow as tf
import soxbindings as sox
def do_transform(y):
tfm = sox.Transformer()
tfm.vol(0.5)
y_out = tfm.build_array(input_array=y, sample_rate_in=1000)
y_out = tf.cast(y_out, tf.float32)
return y_out
def transform_in_graph(y):
return tf.numpy_function(do_transform, [y], tf.float32)
def random_noise_generator():
for _ in range(50):
yield np.random.uniform(size=(4000, 1))
ds = tf.data.Dataset.from_generator(
random_noise_generator, output_types=tf.float32, output_shapes=(4000, 1)
)
ds = ds.map(transform_in_graph, num_parallel_calls=4) # change this to 1, it succeeds
for y in iter(ds):
print(y.shape) Obviously in this example there's an easy workaround ( |
With the changes in #5, this snippet works: import numpy as np
import tensorflow as tf
import soxbindings as sox
def do_transform(y):
tfm = sox.Transformer()
tfm.vol(0.5)
y_out = tfm.build_array(input_array=y, sample_rate_in=1000)
y_out = tf.cast(y_out, tf.float32)
return y_out
def transform_in_graph(y):
return tf.numpy_function(do_transform, [y], tf.float32)
def random_noise_generator():
for _ in range(50):
yield np.random.uniform(size=(4000, 1))
ds = tf.data.Dataset.from_generator(
random_noise_generator, output_types=tf.float32, output_shapes=(4000, 1)
)
ds = ds.map(transform_in_graph, num_parallel_calls=4) # change this to 1, it succeeds
with sox.sox_context_manager(): # <- THE FIX
for y in iter(ds):
print(y.shape) I'll work on getting it released ASAP! Thanks all for the snippets and pointers! |
@faroit, @rabitt would you mind trying these steps to see if your SoxBindings related code works?
Hopefully it works! Let me know, and then I'll merge PR #5 and release it as |
@pseeth thanks a lot, the fix works fine and can be merged as is! Unfortuntately, it still seems that the interface significantly slows down tf.data pipeline and real multiprocessing can't be utilized even if the number of parallel calls is set to a value higher than 1... |
@pseeth (sorry for the delay) I can also confirm it's working in my setup!
@faroit you're totally right, though it's not soxbinding's fault. I had an offline discussion with @psobot and he looked into it a bit - the sox C library itself can't do real multithreading. Still, 1x soxbindings is ~10x faster than 1x sox! |
Just to clarify here - the limiting factor seems to be that |
Hmm, okay. I'll merge this fix in sometime today - do you have any suggestions for how to speed things up @psobot? I very much appreciate the advice! SoX should be threadsafe: https://sox-users.narkive.com/m1PQmcwp/is-sox-threadsafe, so I imagine it's possible. I'm not sure if I did anything too crazy with my bindings though...everything in the execution of the Edit: this might be what we need: https://docs.python.org/3/c-api/init.html#releasing-the-gil-from-extension-code. I'll try it out at some point. Merging the related PR for now, though. |
Happy to help, @pseeth! I see you edited with the correct link - that should do it. If SoX is threadsafe under the hood, then releasing the GIL before calling SoX (and re-acquiring it afterwards before returning to Python code) should be all you need. |
@pseeth did you had some time to try this out? I can certainly try to help out.... |
Unfortunately not yet. Feels like it should just be putting those two lines to release the GIL somewhere in the C extension inside SoxBindings, though, right? Definitely would promptly CR if you made a PR, though! |
Sorry for the delay - 1.2.3 is now out on pip! Closing this issue for now. Please re-open if you run into any issues! |
thats great! thanks
i think we should keep this open until 👇 is addressed or rename the issue or create a new one
|
Let's make a new issue. This issue was originally about avoiding a show-stopping error which I can say is solved (and has a different solution that should stay documented in this issue). I'll make a new one about avoiding GIL. |
sounds good. I will try to look into soon, but by c++ skills are a bit rusty ;-) @psobot ? |
soxbinding works great in one thread, but it looks like it consistently fails when multithreading. Minimal example below. Note that any effect (.vol, .compand, .trim, etc) triggers the same error. Running this with command line sox (replacing
import soxbindings as sox
withimport sox
) works fine.Output:
The text was updated successfully, but these errors were encountered: