add augmentation part #99

chrisxu2016 · 2017-06-14T19:14:57Z

resolve #96

Add data augmentation classes, including noise_speech, impuls_response, resampler, speed_perturb, online_bayesias_normalization.
Add functions to audio.py, e.g. convolve, add_noise, normalizer.

xinghai-sun

Good job, but still needs intensive improvement.

Only reviewed audio.py before Line 398.

Please pay more attention to the details, especially in the docs. Remember the old saying: "The devil is in the details!"
Please avoid copying code from elsewhere without fully understanding it. It would be better if we could improve it, or at least make it cleaner. The same goes for the docs.
Please add unit tests, or at least test every function before committing. If the project is urgent, some delay for the unit tests would be acceptable. But at the very least, every function must be tested by the author before committing. For the audio parts, the tests should also include writing the transformed audio to a wav file and listening to it to make sure the transformation works correctly. If we have time, promptly added unit tests would be great!
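
As a rough illustration of the listening test described above, here is a minimal sketch that applies a transformation and writes the result to a wav file. It uses only the standard library (wave, struct) instead of the soundfile package this PR relies on; write_wav, apply_gain and the output path are hypothetical names for this sketch, not functions from audio.py.

```python
import math
import os
import struct
import tempfile
import wave


def write_wav(path, samples, sample_rate):
    """Write float samples in [-1.0, 1.0] to a 16-bit mono wav file."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
        for s in samples)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample, i.e. 16-bit audio.
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)


def apply_gain(samples, gain_db):
    """Return a copy of samples scaled by a gain given in decibels."""
    scale = 10.0 ** (gain_db / 20.0)
    return [s * scale for s in samples]


sample_rate = 16000
# One second of a 440 Hz tone at half amplitude.
tone = [0.5 * math.sin(2.0 * math.pi * 440.0 * n / sample_rate)
        for n in range(sample_rate)]
quieter = apply_gain(tone, -6.0)
out_path = os.path.join(tempfile.gettempdir(), "tone_minus_6db.wav")
write_wav(out_path, quieter, sample_rate)
```

Listening to the written file (and comparing it with the untransformed tone) is the manual check; an automated unit test could additionally read the file back and compare frame counts or amplitudes.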

xinghai-sun · 2017-06-15T09:33:58Z

deep_speech_2/data_utils/audio.py

@@ -6,6 +6,8 @@
 import numpy as np
 import io
 import soundfile
+import scikits.samplerate


Add the packages "scikits" and "scipy" to requirements.txt. Make sure they can be installed with pip install -r requirements.txt.

xinghai-sun · 2017-06-15T09:41:18Z

deep_speech_2/data_utils/audio.py

@@ -63,6 +65,69 @@ def from_file(cls, file):
        return cls(samples, sample_rate)

    @classmethod
+    def slice_from_file(cls, fname, start=None, end=None):


fname --> file
Please avoid using too many abbreviations if the full name is not too long.

xinghai-sun · 2017-06-15T09:42:07Z

deep_speech_2/data_utils/audio.py

@@ -63,6 +65,69 @@ def from_file(cls, file):
        return cls(samples, sample_rate)

    @classmethod
+    def slice_from_file(cls, fname, start=None, end=None):
+        """ 
+        Loads a small section of an audio without having to load


Put Line 70 into Line 69.
The same below.

xinghai-sun · 2017-06-15T09:43:25Z

deep_speech_2/data_utils/audio.py

+        Loads a small section of an audio without having to load
+        the entire file into the memory which can be incredibly wasteful.
+
+        :param fname: input audio file name


"input audio file name." --> "Input audio filepath."
Note the capitalization and the trailing period.

The same below.

xinghai-sun · 2017-06-15T09:44:53Z

deep_speech_2/data_utils/audio.py

+        :param fname: input audio file name
+        :type fname: bsaestring
+        :param start: start time in seconds (supported granularity is ms)
+            If start is negative, it wraps around from the end. If not


Improper indent. Please align "If" with "start". The same below.

Remove "(supported granularity is ms)".
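
To make the requested docstring conventions concrete (capitalized descriptions, trailing periods, continuation lines aligned with the start of the description), here is a small sketch. slice_samples is a hypothetical helper that operates on a plain list rather than on an audio file; only the negative wrap-around behavior is taken from the docstring under review, the rest is an assumption.

```python
def slice_samples(samples, sample_rate, start=None, end=None):
    """Return the samples between two time points.

    :param samples: Audio samples.
    :type samples: list
    :param sample_rate: Sample rate in Hz.
    :type sample_rate: int
    :param start: Start time in seconds. If start is negative, it wraps
                  around from the end. If not provided, the slice begins
                  at the beginning.
    :type start: float
    :param end: End time in seconds. If end is negative, it wraps around
                from the end. If not provided, the slice runs to the end.
    :type end: float
    :return: Samples between start and end.
    :rtype: list
    :raises ValueError: If start is later than end.
    """
    duration = len(samples) / float(sample_rate)
    start = 0.0 if start is None else start
    end = duration if end is None else end
    # Negative positions are measured backwards from the end.
    if start < 0.0:
        start += duration
    if end < 0.0:
        end += duration
    if start > end:
        raise ValueError("The slice start position (%f s) is later than "
                         "the end position (%f s)." % (start, end))
    start_index = int(round(start * sample_rate))
    end_index = int(round(end * sample_rate))
    return samples[start_index:end_index]
```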

xinghai-sun · 2017-06-15T11:26:11Z

deep_speech_2/data_utils/audio.py

+
+        Note that this is an in-place transformation.
+
+        :param new_sample_rate: target sample rate


What is "new_sample_rate"? You have only "target_sample_rate"!

xinghai-sun · 2017-06-15T11:27:48Z

deep_speech_2/data_utils/audio.py

-        raise NotImplementedError()
+        """Pads this audio sample with a period of silence.
+
+        Note that this is an in-place transformation.


Please be careful about the docs' coding style (capitalization, trailing period, proper indentation), as mentioned above.

xinghai-sun · 2017-06-15T11:28:30Z

deep_speech_2/data_utils/audio.py

+            'beginning' - adds silence in the beginning
+            'end' - adds silence in the end
+            'both' - adds silence in both the beginning and the end.
+        :type sides: basestring


Here, unicode is not possible. So basestring --> str. Use basestring only when both unicode and str are supported.

xinghai-sun · 2017-06-15T11:31:31Z

deep_speech_2/data_utils/audio.py

+        elif sides == "both":
+            padded = cls.concatenate(silence, self, silence)
+        else:
+            raise ValueError("Unknown value for the kwarg 'sides'")


--> raise ValueError("Unknown sides value %s." % sides)
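
A minimal sketch of the suggested error handling, so that the offending value appears in the message. This pad_silence is a hypothetical standalone function working on plain lists and returning a new list; the method in the PR is described as an in-place transformation on the segment.

```python
def pad_silence(samples, sample_rate, duration, sides="both"):
    """Return samples padded with a period of silence.

    :param samples: Audio samples.
    :type samples: list
    :param sample_rate: Sample rate in Hz.
    :type sample_rate: int
    :param duration: Length of silence in seconds to pad.
    :type duration: float
    :param sides: Position for padding: 'beginning', 'end' or 'both'.
    :type sides: str
    :return: Padded samples.
    :rtype: list
    :raises ValueError: If sides is not a supported value.
    """
    silence = [0.0] * int(duration * sample_rate)
    if sides == "beginning":
        return silence + samples
    elif sides == "end":
        return samples + silence
    elif sides == "both":
        return silence + samples + silence
    else:
        # Include the bad value so the caller can see what was passed.
        raise ValueError("Unknown sides value %s." % sides)
```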

xinghai-sun · 2017-06-15T11:33:14Z

deep_speech_2/data_utils/audio.py


    def subsegment(self, start_sec=None, end_sec=None):
-        raise NotImplementedError()
+        """Return new AudioSegment containing audio between given boundaries.


This doc is inconsistent with the code: the method does not "return ....".

chrisxu2016

I have rewritten audio.py to make it more formal. But one problem remains: the Bayesian normalization mentioned in the introduction cannot be found in the speech_dl code. Deleting that part of the code will not affect the other functions.

chrisxu2016

I have tested every function in audio.py; the unit test script will be pushed later.

xinghai-sun

Still needs improvement.

xinghai-sun · 2017-06-16T09:25:00Z

deep_speech_2/data_utils/audio.py

+        """
+        if type(self) != type(other):
+            raise TypeError("Cannot add segment of different type: {}"
+                            .format(type(other)))


--> raise TypeError("Cannot add segments of different types: %s and %s." % (type(self), type(other)))

As mentioned in the last review, do not mix two kinds of string formatting methods.

xinghai-sun · 2017-06-16T09:27:44Z

deep_speech_2/data_utils/audio.py

@@ -44,6 +47,32 @@ def __ne__(self, other):
        """Return whether two objects are unequal."""
        return not self.__eq__(other)

+    def __len__(self):
+        """Returns length of segment in samples."""


Returns --> Return

xinghai-sun · 2017-06-16T09:30:19Z

deep_speech_2/data_utils/audio.py

@@ -75,6 +104,31 @@ def from_bytes(cls, bytes):
            io.BytesIO(bytes), dtype='float32')
        return cls(samples, sample_rate)

+    def concatenate(self, *segments):


Make it a classmethod:

@classmethod
def concatenate(cls, *segments):

Please also overload it for SpeechSegment.
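
For reference, a minimal sketch of what the classmethod form could look like. The class below is a simplified stand-in that stores samples in a plain list (the real AudioSegment uses numpy arrays), and the error messages are paraphrased from the hunks in this review; treat the details as assumptions.

```python
class AudioSegment(object):
    """Simplified stand-in for the AudioSegment class in audio.py."""

    def __init__(self, samples, sample_rate):
        self.samples = list(samples)
        self.sample_rate = sample_rate

    @classmethod
    def concatenate(cls, *segments):
        """Concatenate an arbitrary number of audio segments together.

        :param *segments: Input audio segments to be concatenated.
        :type *segments: tuple of AudioSegment
        :return: Audio segment instance as concatenating results.
        :rtype: AudioSegment
        :raises ValueError: If the number of segments is zero, or if the
                            sample_rate of any segments does not match.
        :raises TypeError: If any segment is not an AudioSegment instance.
        """
        if len(segments) == 0:
            raise ValueError("No audio segments are given to concatenate.")
        sample_rate = segments[0].sample_rate
        for seg in segments:
            if sample_rate != seg.sample_rate:
                raise ValueError("Can't concatenate segments with "
                                 "different sample rates.")
            # cls replaces type(self): no instance is needed to call this.
            if type(seg) is not cls:
                raise TypeError("Only audio segments of the same type "
                                "can be concatenated.")
        samples = [s for seg in segments for s in seg.samples]
        return cls(samples, sample_rate)
```

Because the method is bound to the class, it can be called as AudioSegment.concatenate(a, b) without picking one segment as the receiver, and a subclass that overrides it gets its own type back through cls.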

xinghai-sun · 2017-06-16T09:33:29Z

deep_speech_2/data_utils/audio.py

+            if sample_rate != seg._sample_rate:
+                raise ValueError("Can't concatenate segments with "
+                                 "different sample rates")
+            if type(seg) is not type(self):


type(self) --> cls

xinghai-sun · 2017-06-16T09:38:50Z

deep_speech_2/data_utils/audio.py

+                raise TypeError("Only audio segments of the same type "
+                                "instance can be concatenated.")
+        samples = np.concatenate([seg.samples for seg in segments])
+        return type(self)(samples, sample_rate)


type(self) --> cls

xinghai-sun · 2017-06-16T10:59:50Z

deep_speech_2/data_utils/audio.py

+        if noise.sample_rate != self.sample_rate:
+            raise ValueError("Noise sample rate (%d Hz) is not equal to "
+                             "base signal sample rate (%d Hz)." %
+                             (noise.sample_rate, self.sample_rate))


Convert L483-L485 to two lines. The same below.

xinghai-sun · 2017-06-16T11:01:10Z

deep_speech_2/data_utils/audio.py

+                             "least as long as base signal (%f sec)." %
+                             (noise.duration, self.duration))
+        noise_gain_db = self.rms_db - noise.rms_db - snr_dB
+        noise_gain_db = min(max_gain_db, noise_gain_db)


L490-491 --> `noise_gain_db = min(self.rms_db - noise.rms_db - snr_dB, max_gain_db)`
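
To spell out the arithmetic behind that one-liner, here is a standalone sketch on plain lists. It assumes rms_db is the mean-square energy in decibels (10 * log10 of the mean square) and that a gain in dB is applied as a factor of 10 ** (gain_db / 20); the function names and the default max_gain_db are hypothetical, not taken from audio.py.

```python
import math


def rms_db(samples):
    """Return the root-mean-square energy of samples in decibels."""
    mean_square = sum(s * s for s in samples) / float(len(samples))
    return 10.0 * math.log10(mean_square)


def apply_gain(samples, gain_db):
    """Return a copy of samples scaled by a gain given in decibels."""
    scale = 10.0 ** (gain_db / 20.0)
    return [s * scale for s in samples]


def noise_gain_db(signal, noise, snr_db, max_gain_db=300.0):
    """Return the gain that brings noise to snr_db below the signal.

    The gain is capped at max_gain_db so that a near-silent noise clip
    is not amplified without bound.
    """
    return min(rms_db(signal) - rms_db(noise) - snr_db, max_gain_db)


def add_noise(signal, noise, snr_db, max_gain_db=300.0):
    """Return signal plus noise mixed at the requested SNR.

    Both inputs are assumed to have the same length and sample rate.
    """
    gain_db = noise_gain_db(signal, noise, snr_db, max_gain_db)
    scaled_noise = apply_gain(noise, gain_db)
    return [s + n for s, n in zip(signal, scaled_noise)]
```

After scaling, rms_db(signal) - rms_db(scaled_noise) equals snr_db whenever the cap is not hit, which is an easy property to assert in a unit test.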

xinghai-sun · 2017-06-16T11:04:20Z

deep_speech_2/data_utils/audio.py

+        noise_gain_db = self.rms_db - noise.rms_db - snr_dB
+        noise_gain_db = min(max_gain_db, noise_gain_db)
+        noise_subsegment = noise.random_subsegment(self.duration, rng=rng)
+        output = self + self.tranform_noise(noise_subsegment, noise_gain_db)


We already have def apply_gain(...).

xinghai-sun · 2017-06-16T11:05:12Z

deep_speech_2/data_utils/audio.py

+        self._samples = output._samples
+        self._sample_rate = output._sample_rate
+
+    def tranform_noise(self, noise_subsegment, noise_gain_db):


Remove this, use apply_gain instead.

xinghai-sun · 2017-06-16T11:08:07Z

deep_speech_2/data_utils/audio.py

-        raise NotImplementedError()
+        :param impulse_segment: Impulse response segments.
+        :type impulse_segment: AudioSegment
+        :param allow_resample: indicates whether resampling is allowed when


indicates --> Indicates

chrisxu2016

Resolved the above-mentioned problems.

chrisxu2016 · 2017-06-16T07:47:38Z

deep_speech_2/data_utils/audio.py


    def subsegment(self, start_sec=None, end_sec=None):
-        raise NotImplementedError()
+        """Return new AudioSegment containing audio between given boundaries.


chrisxu2016 · 2017-06-18T10:01:40Z

deep_speech_2/data_utils/audio.py

@@ -44,6 +47,32 @@ def __ne__(self, other):
        """Return whether two objects are unequal."""
        return not self.__eq__(other)

+    def __len__(self):


chrisxu2016 · 2017-06-18T10:08:58Z

deep_speech_2/data_utils/audio.py

@@ -44,6 +47,32 @@ def __ne__(self, other):
        """Return whether two objects are unequal."""
        return not self.__eq__(other)

+    def __len__(self):
+        """Returns length of segment in samples."""


chrisxu2016 · 2017-06-18T10:09:05Z

deep_speech_2/data_utils/audio.py

+        """Returns length of segment in samples."""
+        return self.num_samples
+
+    def __add__(self, other):


chrisxu2016 · 2017-06-18T10:09:20Z

deep_speech_2/data_utils/audio.py

+        a new segment (sample-wise addition, not segment concatenation).
+
+        :param other: Segment containing samples to be
+                      added in.


chrisxu2016 · 2017-06-18T10:19:29Z

deep_speech_2/data_utils/audio.py

+                            gain to a zero signal.
+        :type max_gain_db: float
+        :param rng: Random number generator state.
+        :type rng: random.Random


chrisxu2016 · 2017-06-18T10:19:36Z

deep_speech_2/data_utils/audio.py

+        if noise.sample_rate != self.sample_rate:
+            raise ValueError("Noise sample rate (%d Hz) is not equal to "
+                             "base signal sample rate (%d Hz)." %
+                             (noise.sample_rate, self.sample_rate))


chrisxu2016 · 2017-06-18T10:19:52Z

deep_speech_2/data_utils/audio.py

+                             "least as long as base signal (%f sec)." %
+                             (noise.duration, self.duration))
+        noise_gain_db = self.rms_db - noise.rms_db - snr_dB
+        noise_gain_db = min(max_gain_db, noise_gain_db)


chrisxu2016 · 2017-06-18T10:20:02Z

deep_speech_2/data_utils/audio.py

+        noise_gain_db = self.rms_db - noise.rms_db - snr_dB
+        noise_gain_db = min(max_gain_db, noise_gain_db)
+        noise_subsegment = noise.random_subsegment(self.duration, rng=rng)
+        output = self + self.tranform_noise(noise_subsegment, noise_gain_db)


chrisxu2016 · 2017-06-18T10:20:07Z

deep_speech_2/data_utils/audio.py

+        self._samples = output._samples
+        self._sample_rate = output._sample_rate
+
+    def tranform_noise(self, noise_subsegment, noise_gain_db):


xinghai-sun

Almost LGTM.

xinghai-sun · 2017-06-18T12:33:10Z

deep_speech_2/data_utils/audio.py

+        :rtype: AudioSegment
+        :raises ValueError: If the number of segments is zero, or if the 
+                            sample_rate of any two segment does not match.
+        :raises TypeError: If every item in segments is not AudioSegment


every item in segments --> any segment

xinghai-sun · 2017-06-18T12:34:07Z

deep_speech_2/data_utils/audio.py

+        :return: Audio segment instance as concatenating results.
+        :rtype: AudioSegment
+        :raises ValueError: If the number of segments is zero, or if the 
+                            sample_rate of any two segment does not match.


two segment --> segments

xinghai-sun · 2017-06-18T12:35:24Z

deep_speech_2/data_utils/audio.py

+                                 "different sample rates")
+            if type(seg) is not cls:
+                raise TypeError("Only audio segments of the same type "
+                                "instance can be concatenated.")


remove "instance"

xinghai-sun · 2017-06-18T12:38:41Z

deep_speech_2/data_utils/audio.py

+    @classmethod
+    def make_silence(cls, duration, sample_rate):
+        """Creates a silent audio segment of the given duration and
+        sample rate.


Is one line enough for the whole sentence?

xinghai-sun · 2017-06-18T12:40:30Z

deep_speech_2/data_utils/audio.py

+        samples = np.zeros(int(duration * sample_rate))
+        return cls(samples, sample_rate)
+
+    def superimposed(self, other):


superimposed --> superimpose ?

xinghai-sun · 2017-06-18T13:00:04Z

deep_speech_2/data_utils/audio.py

+                             " base signal (%f sec)." %
+                             (noise.duration, self.duration))
+        noise_gain_db = min(self.rms_db - noise.rms_db - snr_dB, max_gain_db)
+        noise.random_subsegment(self.duration, rng=rng)


Add noise_new = copy.deepcopy(noise), and then perform the transformation on noise_new; otherwise the input noise will be modified.
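
A small sketch of why the copy matters when the transformations are in-place. Segment and its apply_gain method are simplified, hypothetical stand-ins for the classes in this PR; only the copy.deepcopy call is the point.

```python
import copy


class Segment(object):
    """Simplified stand-in for an audio segment with in-place methods."""

    def __init__(self, samples):
        self.samples = list(samples)

    def apply_gain(self, scale):
        """Scale the samples in place by a linear factor."""
        self.samples = [s * scale for s in self.samples]


def add_noise(signal, noise, scale):
    """Return a new segment with scaled noise added to signal.

    The noise is deep-copied first, so the in-place gain below does not
    change the caller's noise segment.
    """
    noise_new = copy.deepcopy(noise)
    noise_new.apply_gain(scale)
    return Segment(
        s + n for s, n in zip(signal.samples, noise_new.samples))


noise = Segment([1.0, 2.0])
signal = Segment([0.0, 0.0])
mixed = add_noise(signal, noise, 0.5)
```

Without the deepcopy, the same noise segment would come back quieter after every call, which is hard to notice when one noise clip is reused across many utterances.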

xinghai-sun · 2017-06-18T13:01:06Z

deep_speech_2/data_utils/speech.py

@@ -65,6 +65,74 @@ def from_bytes(cls, bytes, transcript):
        audio = AudioSegment.from_bytes(bytes)
        return cls(audio.samples, audio.sample_rate, transcript)

+    @classmethod
+    def concatenate(cls, *segments):
+        """Concatenate an arbitrary number of speech segments together.


Add "Both audio and transcript will be concatenated."
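
A minimal sketch of the overloaded version, to show what "both audio and transcript" means in practice. The class is a simplified stand-in using a plain list for samples, and joining the transcripts directly without a separator is an assumption of this sketch.

```python
class SpeechSegment(object):
    """Simplified stand-in for the SpeechSegment class in speech.py."""

    def __init__(self, samples, sample_rate, transcript):
        self.samples = list(samples)
        self.sample_rate = sample_rate
        self.transcript = transcript

    @classmethod
    def concatenate(cls, *segments):
        """Concatenate an arbitrary number of speech segments together.

        Both audio and transcript will be concatenated.

        :param *segments: Input speech segments to be concatenated.
        :type *segments: tuple of SpeechSegment
        :return: Speech segment instance.
        :rtype: SpeechSegment
        :raises ValueError: If the number of segments is zero, or if the
                            sample_rate of any segments does not match.
        :raises TypeError: If any segment is not a SpeechSegment instance.
        """
        if len(segments) == 0:
            raise ValueError("No speech segments are given to concatenate.")
        sample_rate = segments[0].sample_rate
        samples = []
        transcripts = ""
        for seg in segments:
            if sample_rate != seg.sample_rate:
                raise ValueError("Can't concatenate segments with "
                                 "different sample rates.")
            if type(seg) is not cls:
                raise TypeError("Only speech segments of the same type "
                                "can be concatenated.")
            samples.extend(seg.samples)
            transcripts += seg.transcript
        return cls(samples, sample_rate, transcripts)
```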

xinghai-sun · 2017-06-18T13:01:50Z

deep_speech_2/data_utils/speech.py

+        :rtype: SpeechSegment
+        :raises ValueError: If the number of segments is zero, or if the 
+                            sample_rate of any two segments does not match.
+        :raises TypeError: If every item in segments is not SpeechSegment


every item in segments --> any segment

xinghai-sun · 2017-06-18T13:05:15Z

deep_speech_2/data_utils/speech.py

+    @classmethod
+    def make_silence(cls, duration, sample_rate):
+        """Creates a silent speech segment of the given duration and
+        sample rate.


Add "Transcript will be an empty string.".

xinghai-sun · 2017-06-18T13:07:59Z

deep_speech_2/data_utils/speech.py

+        return cls(samples, sample_rate, transcripts)
+
+    @classmethod
+    def slice_from_file(cls, filepath, start=None, end=None, transcript=""):


It would be better for transcript to have no default value.

chrisxu2016

Fixed all problems.

chrisxu2016 · 2017-06-18T15:43:48Z

deep_speech_2/data_utils/audio.py

+        :return: Audio segment instance as concatenating results.
+        :rtype: AudioSegment
+        :raises ValueError: If the number of segments is zero, or if the 
+                            sample_rate of any two segment does not match.


chrisxu2016 · 2017-06-18T15:44:17Z

deep_speech_2/data_utils/audio.py

+        :rtype: AudioSegment
+        :raises ValueError: If the number of segments is zero, or if the 
+                            sample_rate of any two segment does not match.
+        :raises TypeError: If every item in segments is not AudioSegment


chrisxu2016 · 2017-06-18T15:45:10Z

deep_speech_2/data_utils/audio.py

+                                 "different sample rates")
+            if type(seg) is not cls:
+                raise TypeError("Only audio segments of the same type "
+                                "instance can be concatenated.")


chrisxu2016 · 2017-06-18T15:45:35Z

deep_speech_2/data_utils/audio.py

+    @classmethod
+    def make_silence(cls, duration, sample_rate):
+        """Creates a silent audio segment of the given duration and
+        sample rate.


chrisxu2016 · 2017-06-18T15:45:50Z

deep_speech_2/data_utils/audio.py

+        samples = np.zeros(int(duration * sample_rate))
+        return cls(samples, sample_rate)
+
+    def superimposed(self, other):


chrisxu2016 · 2017-06-18T15:59:14Z

deep_speech_2/data_utils/audio.py

+                             " base signal (%f sec)." %
+                             (noise.duration, self.duration))
+        noise_gain_db = min(self.rms_db - noise.rms_db - snr_dB, max_gain_db)
+        noise.random_subsegment(self.duration, rng=rng)


chrisxu2016 · 2017-06-18T15:59:26Z

deep_speech_2/data_utils/speech.py

@@ -65,6 +65,74 @@ def from_bytes(cls, bytes, transcript):
        audio = AudioSegment.from_bytes(bytes)
        return cls(audio.samples, audio.sample_rate, transcript)

+    @classmethod
+    def concatenate(cls, *segments):
+        """Concatenate an arbitrary number of speech segments together.


chrisxu2016 · 2017-06-18T15:59:34Z

deep_speech_2/data_utils/speech.py

+        :rtype: SpeechSegment
+        :raises ValueError: If the number of segments is zero, or if the 
+                            sample_rate of any two segments does not match.
+        :raises TypeError: If every item in segments is not SpeechSegment


chrisxu2016 · 2017-06-18T15:59:40Z

deep_speech_2/data_utils/speech.py

+        return cls(samples, sample_rate, transcripts)
+
+    @classmethod
+    def slice_from_file(cls, filepath, start=None, end=None, transcript=""):


chrisxu2016 · 2017-06-18T15:59:55Z

deep_speech_2/data_utils/speech.py

+    @classmethod
+    def make_silence(cls, duration, sample_rate):
+        """Creates a silent speech segment of the given duration and
+        sample rate.


xinghai-sun

LGTM

add augmentation

a84bdf6

chrisxu2016 requested review from kuke, pkuyym, lcy-seso, xinghai-sun and qingqing01 June 14, 2017 19:14

xinghai-sun requested changes Jun 15, 2017

add audio part

42ba74e

chrisxu2016 self-assigned this Jun 16, 2017

chrisxu2016 commented Jun 16, 2017

modify audio and speech

602dcc8

xinghai-sun requested changes Jun 16, 2017

chrisxu2016 added 4 commits June 17, 2017 09:03

add audio file

193601a

add audio augmentation

3d4aba5

add audio augmentation

bfa4dd9

add audio file

6f7a0ba

chrisxu2016 commented Jun 18, 2017

add audio file

1b7c7c6

xinghai-sun requested changes Jun 18, 2017

add audio file

e64bd00

chrisxu2016 commented Jun 18, 2017

xinghai-sun approved these changes Jun 19, 2017

xinghai-sun merged commit 06f272a into develop Jun 19, 2017

