Get DeepFormants working again #9

Open: iskunk wants to merge 1 commit into master

@iskunk iskunk commented Jan 17, 2020

  • Minor syntax tweaks to make the code Python 3 compatible

  • Fixes for various NumPy warnings/errors, either due to use of float where int is required, or domain errors on log functions

  • Replaced the use of the obsolete Python-2-only scikits.talkbox library with a compatible LPC implementation from the Conch project

  • Documentation update to indicate that an old version of rnn is required

  • Invoke Lua scripts via luajit directly, instead of going through the th frontend (to reduce the dependency footprint)
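
For readers unfamiliar with the two classes of NumPy problems mentioned in the second bullet, here is a minimal sketch of the usual fixes. The function names `frame_signal` and `safe_log` and the `floor` value are hypothetical illustrations; they are not taken from the DeepFormants sources, and the actual changes in this PR may look different.

```python
import numpy as np

def frame_signal(signal, frame_len_s, sample_rate):
    """Split a 1-D signal into whole frames (hypothetical helper)."""
    # A product such as 0.03 * 16000 is a float; NumPy shapes and slice
    # bounds must be ints, so convert explicitly.
    frame_len = int(round(frame_len_s * sample_rate))
    # Floor division keeps the frame count an int under Python 3,
    # where "/" always returns a float.
    n_frames = len(signal) // frame_len
    return np.reshape(signal[:n_frames * frame_len], (n_frames, frame_len))

def safe_log(values, floor=1e-10):
    """Clamp inputs to a small positive floor before taking the log.

    This avoids the divide-by-zero and invalid-value warnings that
    np.log emits for zero or negative inputs. The floor is an assumption.
    """
    return np.log(np.maximum(values, floor))
```

Clamping changes the result for non-positive inputs, so whether a floor is acceptable depends on how the downstream features use those values.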

iskunk commented Jan 17, 2020

(Pushed again to fix a minor goof in the README.)

This PR should address issues #3 (in part), #5, and #7.

I would appreciate an especially careful review of my changes to the extract_features.py file, as I'm not completely certain that I didn't fudge up the math. I did, however, find that the librosa implementation of LPC, which was the most obvious successor of the old Talkbox library, gave quite different results. (Talkbox includes a test_lpc.py file which was helpful in determining this.) Thankfully, this thread led me to a compatible (if Python-only) implementation from the Conch project that appears to do the trick.
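
To make the review of the LPC math easier, here is a minimal sketch of the textbook autocorrelation method (Levinson-Durbin recursion) in plain NumPy. It is neither the Conch code nor the Talkbox code, and `lpc_autocorr` is a hypothetical name; it is only meant as an independent reference to compare coefficients against.

```python
import numpy as np

def lpc_autocorr(signal, order):
    """Autocorrelation-method LPC via Levinson-Durbin (illustrative sketch).

    Returns (a, err), where a[0] == 1 and a[1:] are the prediction
    polynomial coefficients, and err is the final prediction error.
    """
    x = np.asarray(signal, dtype=float)
    n = len(x)
    # Autocorrelation at lags 0..order (unnormalized).
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])

    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:
            break  # degenerate (e.g. all-zero) input; stop early
        # acc = r[i] + sum_{j=1}^{i-1} a[j] * r[i - j]
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient for this order
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err
```

The result can be cross-checked by solving the Toeplitz normal equations directly with `np.linalg.solve`, which should agree with `a[1:]` up to floating-point error.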

More work is needed, of course. First, the tracking model needs to be rebuilt using a current version of rnn, to completely resolve #3. Second, the memory usage is out of control and needs to be addressed.

I tested my changes with two speech files; one was 19 seconds long, the other 67 seconds. I ran DeepFormants on a multiprocessor system (Intel Xeon, no GPU) with 48 GB RAM. In both cases, the feature-extraction stage took a while to run, presumably due to the pure-Python replacement LPC implementation. No big deal. But in the second stage, when Torch is invoked, the small file led to a peak memory usage of 33 GB. It didn't take particularly long, which makes me suspect all that memory was allocated but hardly used. With the large file, usage climbed to 50 GB, more than the machine's 48 GB of RAM, and once it was clear that swapping was slowing the program down to a crawl, I terminated the run.
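
For anyone who wants to reproduce the memory numbers, here is one possible way to record the peak resident memory of a child process from Python. This is a sketch using the standard `resource` module, so it only works on Unix-like systems; `run_and_report_peak_rss` is a hypothetical helper, not part of this PR.

```python
import resource
import subprocess
import sys

def run_and_report_peak_rss(cmd):
    """Run cmd and return (exit code, peak RSS in bytes) (Unix only).

    RUSAGE_CHILDREN reports the largest peak among all child processes
    waited for so far, so call this from a fresh interpreter when
    measuring a single command.
    """
    proc = subprocess.run(cmd, check=False)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is in kilobytes on Linux but in bytes on macOS.
    scale = 1 if sys.platform == "darwin" else 1024
    return proc.returncode, usage.ru_maxrss * scale
```

The command list would be whatever invokes the Torch stage, for example `["luajit", "some_script.lua"]`, where the script name is a placeholder and not an actual file name from the repository.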
