add augmentation part #99

Merged
merged 9 commits into from
Jun 19, 2017

Conversation

chrisxu2016
Contributor

resolve #96

  • Add data augmentation classes, including noise_speech, impuls_response, resampler, speed_perturb, online_bayesias_normalization.
  • Add functions to audio.py, e.g. convolve, add_noise, normalizer.

Contributor

@xinghai-sun xinghai-sun left a comment

Good job, but still needs intensive improvement.

Only reviewed for audio.py before Line 398.

  1. Please pay more attention to the details, especially in the docs. Remember the old saying: "The devil is in the details!"

  2. Please avoid copying code from elsewhere without fully understanding it. It would be better if we could improve it, or at least make it cleaner. The same goes for the docs.

  3. Please add unit tests, or at least test every function before committing. If the project is urgent, some delay for the unit tests is acceptable, but every function must still be tested by the author before commit. For the audio parts, the tests should also include writing the transformed audio to a wav file and listening to it to make sure the transformation works correctly. If we have time, promptly added unit tests would be great!
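
As an illustration of the third point, here is a minimal sketch of a round-trip check that applies a transformation, writes the result to a wav file, and reads it back. It uses only the standard library rather than the `soundfile` package from the PR, and every name in it (`apply_gain`, `write_wav`, `read_wav`, `check_gain_round_trip`) is hypothetical, not taken from audio.py. The returned path can also be opened in an audio player for the listening test.

```python
import math
import os
import struct
import tempfile
import wave


def apply_gain(samples, gain_db):
    """Scale float samples by a gain given in decibels."""
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]


def write_wav(path, samples, sample_rate):
    """Write float samples in [-1, 1] as 16-bit mono PCM."""
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(sample_rate)
        f.writeframes(struct.pack("<%dh" % len(pcm), *pcm))


def read_wav(path):
    """Read 16-bit mono PCM back as float samples plus the sample rate."""
    with wave.open(path, "rb") as f:
        num_frames = f.getnframes()
        pcm = struct.unpack("<%dh" % num_frames, f.readframes(num_frames))
        return [p / 32767.0 for p in pcm], f.getframerate()


def rms(samples):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def check_gain_round_trip(gain_db=-6.0, sample_rate=16000):
    """Apply a gain to a 440 Hz tone, write it to a wav file, read it
    back, and return the measured gain in dB, the rate, and the path."""
    tone = [0.5 * math.sin(2 * math.pi * 440 * t / sample_rate)
            for t in range(sample_rate)]
    quieter = apply_gain(tone, gain_db)
    path = os.path.join(tempfile.mkdtemp(), "gain_test.wav")
    write_wav(path, quieter, sample_rate)
    restored, rate = read_wav(path)
    measured_db = 20.0 * math.log10(rms(restored) / rms(tone))
    return measured_db, rate, path
```

The numeric check catches gross errors such as a wrong gain or sample rate; it does not replace listening to the file.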

@@ -6,6 +6,8 @@
import numpy as np
import io
import soundfile
import scikits.samplerate
Contributor

Add the packages "scikits" and "scipy" to requirements.txt. Make sure they can be installed by pip install -r requirements.txt.

Contributor Author

Done

@@ -63,6 +65,69 @@ def from_file(cls, file):
return cls(samples, sample_rate)

@classmethod
def slice_from_file(cls, fname, start=None, end=None):
Contributor

fname --> file
Please avoid abbreviations when the full name is not too long.

Contributor Author

Done

@@ -63,6 +65,69 @@ def from_file(cls, file):
return cls(samples, sample_rate)

@classmethod
def slice_from_file(cls, fname, start=None, end=None):
"""
Loads a small section of an audio without having to load
Contributor

Put Line 70 into Line 69.
The same below.

Contributor Author

Done

Loads a small section of an audio without having to load
the entire file into the memory which can be incredibly wasteful.

:param fname: input audio file name
Contributor

"input audio file name." --> "Input audio filepath."
Note the capital first letter and the ending period.

The same below.

Contributor Author

Done

:param fname: input audio file name
:type fname: bsaestring
:param start: start time in seconds (supported granularity is ms)
If start is negative, it wraps around from the end. If not
Contributor

  1. Improper indent. Please make "If" align with "Start". The same below.
  2. Remove "(supported granularity is ms )".

Contributor Author

Done


Note that this is an in-place transformation.

:param new_sample_rate: target sample rate
Contributor

What is "new_sample_rate"? You have only "target_sample_rate"!

raise NotImplementedError()
"""Pads this audio sample with a period of silence.

Note that this is an in-place transformation.
Contributor

Please be careful about the doc's coding style (Upper case, dot mark, proper indent), as mentioned above.

Contributor Author

Done

'beginning' - adds silence in the beginning
'end' - adds silence in the end
'both' - adds silence in both the beginning and the end.
:type sides: basestring
Contributor

Here, unicode is not possible, so basestring --> str. Use basestring only when both unicode and str are supported.

elif sides == "both":
padded = cls.concatenate(silence, self, silence)
else:
raise ValueError("Unknown value for the kwarg 'sides'")
Contributor

--> raise ValueError("Unknown sides value %s." % sides)
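
To show the suggested error message in context, here is a minimal sketch of silence padding on a plain NumPy array. The standalone function `pad_silence` and its signature are assumptions made for illustration; in the PR this logic lives in a method of the audio segment class and works through `make_silence` and `concatenate`.

```python
import numpy as np


def pad_silence(samples, sample_rate, duration, sides="both"):
    """Return a copy of `samples` padded with `duration` seconds of zeros.

    Hypothetical standalone version of the method under review.
    """
    if duration < 0.0:
        raise ValueError(
            "Padding duration must be non-negative, got %f." % duration)
    silence = np.zeros(int(duration * sample_rate), dtype=samples.dtype)
    if sides == "beginning":
        parts = [silence, samples]
    elif sides == "end":
        parts = [samples, silence]
    elif sides == "both":
        parts = [silence, samples, silence]
    else:
        # Include the offending value so the caller can see what was passed.
        raise ValueError("Unknown sides value %s." % sides)
    return np.concatenate(parts)
```

Putting the bad value in the message makes a typo such as `sides="begining"` visible directly in the traceback.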


def subsegment(self, start_sec=None, end_sec=None):
raise NotImplementedError()
"""Return new AudioSegment containing audio between given boundaries.
Contributor

This doc does not match the code: the function does not "return ...".

Contributor Author

Done

@chrisxu2016 chrisxu2016 self-assigned this Jun 16, 2017
Contributor Author

@chrisxu2016 chrisxu2016 left a comment

I have rewritten the audio.py file to make it more formal. One problem remains: I could not find any introduction to the bayesias normalize function in the speech_dl code. Deleting that part of the code will not affect the other functions.

Contributor Author

@chrisxu2016 chrisxu2016 left a comment

I have tested every function in audio.py; the unit test script will be pushed later.

Contributor

@xinghai-sun xinghai-sun left a comment

Still needs improvement.

"""
if type(self) != type(other):
raise TypeError("Cannot add segment of different type: {}"
.format(type(other)))
Contributor

-->raise TypeError("Cannot add segments of different types: %s and %s." % (type(self), type(other)))

As mentioned in the last review, do not mix two kinds of string formatting.

Contributor Author

done

@@ -44,6 +47,32 @@ def __ne__(self, other):
"""Return whether two objects are unequal."""
return not self.__eq__(other)

def __len__(self):
"""Returns length of segment in samples."""
Contributor

Returns --> Return

Contributor Author

done

@@ -75,6 +104,31 @@ def from_bytes(cls, bytes):
io.BytesIO(bytes), dtype='float32')
return cls(samples, sample_rate)

def concatenate(self, *segments):
Contributor

Make it a classmethod:

@classmethod
def concatenate(cls, *segments):

Please also overload it for SpeechSegment.

Contributor Author

done
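
For reference, a minimal sketch of what the classmethod form and the SpeechSegment overload requested above could look like. The classes here are simplified stand-ins rather than the real ones from the PR; the helper `_check_segments` is hypothetical, and joining transcripts with an empty separator is an assumption.

```python
import numpy as np


class AudioSegment(object):
    """Minimal stand-in holding samples and a sample rate."""

    def __init__(self, samples, sample_rate):
        self.samples = np.asarray(samples, dtype="float32")
        self.sample_rate = sample_rate

    @classmethod
    def _check_segments(cls, segments):
        """Hypothetical helper: validate inputs, return the shared rate."""
        if len(segments) == 0:
            raise ValueError("No segments are given to concatenate.")
        sample_rate = segments[0].sample_rate
        for seg in segments:
            if seg.sample_rate != sample_rate:
                raise ValueError(
                    "Can't concatenate segments with different sample rates.")
            if type(seg) is not cls:
                raise TypeError(
                    "Only segments of the same type can be concatenated.")
        return sample_rate

    @classmethod
    def concatenate(cls, *segments):
        """Concatenate segments; `cls` replaces `type(self)`."""
        sample_rate = cls._check_segments(segments)
        samples = np.concatenate([seg.samples for seg in segments])
        return cls(samples, sample_rate)


class SpeechSegment(AudioSegment):
    """Audio plus transcript; needs its own concatenate."""

    def __init__(self, samples, sample_rate, transcript):
        AudioSegment.__init__(self, samples, sample_rate)
        self.transcript = transcript

    @classmethod
    def concatenate(cls, *segments):
        """Concatenate both the audio and the transcripts."""
        sample_rate = cls._check_segments(segments)
        samples = np.concatenate([seg.samples for seg in segments])
        transcript = "".join(seg.transcript for seg in segments)
        return cls(samples, sample_rate, transcript)
```

Because the check is `type(seg) is not cls`, calling `AudioSegment.concatenate` with a `SpeechSegment` argument raises `TypeError` instead of silently dropping the transcript.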

if sample_rate != seg._sample_rate:
raise ValueError("Can't concatenate segments with "
"different sample rates")
if type(seg) is not type(self):
Contributor

type(self) --> cls

Contributor Author

done

raise TypeError("Only audio segments of the same type "
"instance can be concatenated.")
samples = np.concatenate([seg.samples for seg in segments])
return type(self)(samples, sample_rate)
Contributor

type(self) --> cls

Contributor Author

done

if noise.sample_rate != self.sample_rate:
raise ValueError("Noise sample rate (%d Hz) is not equal to "
"base signal sample rate (%d Hz)." %
(noise.sample_rate, self.sample_rate))
Contributor

Convert L483-L485 to two lines. The same below.

Contributor Author

done

"least as long as base signal (%f sec)." %
(noise.duration, self.duration))
noise_gain_db = self.rms_db - noise.rms_db - snr_dB
noise_gain_db = min(max_gain_db, noise_gain_db)
Contributor

L490-491 --> `noise_gain_db = min(self.rms_db - noise.rms_db - snr_dB, max_gain_db)`

Contributor Author

done

noise_gain_db = self.rms_db - noise.rms_db - snr_dB
noise_gain_db = min(max_gain_db, noise_gain_db)
noise_subsegment = noise.random_subsegment(self.duration, rng=rng)
output = self + self.tranform_noise(noise_subsegment, noise_gain_db)
Contributor

We already have def apply_gain(...).

Contributor Author

done

self._samples = output._samples
self._sample_rate = output._sample_rate

def tranform_noise(self, noise_subsegment, noise_gain_db):
Contributor

Remove this, use apply_gain instead.

Contributor Author

done

raise NotImplementedError()
:param impulse_segment: Impulse response segments.
:type impulse_segment: AudioSegment
:param allow_resample: indicates whether resampling is allowed when
Contributor

indicates --> Indicates

Contributor Author

done

Contributor Author

@chrisxu2016 chrisxu2016 left a comment

Resolved the problems mentioned above.


@@ -44,6 +47,32 @@ def __ne__(self, other):
"""Return whether two objects are unequal."""
return not self.__eq__(other)

def __len__(self):
Contributor Author

done

"""Returns length of segment in samples."""
return self.num_samples

def __add__(self, other):
Contributor Author

done

a new segment (sample-wise addition, not segment concatenation).

:param other: Segment containing samples to be
added in.
Contributor Author

done

gain to a zero signal.
:type max_gain_db: float
:param rng: Random number generator state.
:type rng: random.Random
Contributor Author

done

Contributor

@xinghai-sun xinghai-sun left a comment

Almost LGTM.

:rtype: AudioSegment
:raises ValueError: If the number of segments is zero, or if the
sample_rate of any two segment does not match.
:raises TypeError: If every item in segments is not AudioSegment
Contributor

every item in segments --> any segment

Contributor Author

Done

:return: Audio segment instance as concatenating results.
:rtype: AudioSegment
:raises ValueError: If the number of segments is zero, or if the
sample_rate of any two segment does not match.
Contributor

two segment --> segments

Contributor Author

Done

"different sample rates")
if type(seg) is not cls:
raise TypeError("Only audio segments of the same type "
"instance can be concatenated.")
Contributor

remove "instance"

Contributor Author

Done

@classmethod
def make_silence(cls, duration, sample_rate):
"""Creates a silent audio segment of the given duration and
sample rate.
Contributor

Is one line enough for the whole sentence?

Contributor Author

Done

samples = np.zeros(int(duration * sample_rate))
return cls(samples, sample_rate)

def superimposed(self, other):
Contributor

superimposed --> superimpose ?

Contributor Author

Done

" base signal (%f sec)." %
(noise.duration, self.duration))
noise_gain_db = min(self.rms_db - noise.rms_db - snr_dB, max_gain_db)
noise.random_subsegment(self.duration, rng=rng)
Contributor

Add noise_new = copy.deepcopy(noise), and then perform transformation on noise_new, otherwise the input noise will be modified.

Contributor Author

done
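
The noise-mixing comments in this thread (single-line gain computation, reuse of `apply_gain`, and copying the noise before transforming it) can be put together in a minimal sketch. The class below is a simplified stand-in, not the PR's implementation: it takes the start of the noise instead of calling `random_subsegment`, computes the gain from that trimmed copy, and the default `max_gain_db=300.0` is an assumption.

```python
import copy

import numpy as np


class AudioSegment(object):
    """Minimal stand-in with just enough API for add_noise."""

    def __init__(self, samples, sample_rate):
        self.samples = np.asarray(samples, dtype="float64")
        self.sample_rate = sample_rate

    @property
    def rms_db(self):
        """Root-mean-square energy in decibels."""
        return 10.0 * np.log10(np.mean(self.samples ** 2))

    def apply_gain(self, gain_db):
        """Scale the samples in place by a gain in decibels."""
        self.samples = self.samples * 10.0 ** (gain_db / 20.0)

    def add_noise(self, noise, snr_db, max_gain_db=300.0):
        """Mix `noise` into this segment in place at the given SNR."""
        if noise.sample_rate != self.sample_rate:
            raise ValueError(
                "Noise sample rate (%d Hz) is not equal to base signal "
                "sample rate (%d Hz)." % (noise.sample_rate, self.sample_rate))
        if len(noise.samples) < len(self.samples):
            raise ValueError("Noise must be at least as long as the signal.")
        # Work on a copy so the caller's noise segment is left untouched.
        noise_new = copy.deepcopy(noise)
        # Simplification: take the start of the noise, not a random part.
        noise_new.samples = noise_new.samples[:len(self.samples)]
        noise_gain_db = min(
            self.rms_db - noise_new.rms_db - snr_db, max_gain_db)
        noise_new.apply_gain(noise_gain_db)
        self.samples = self.samples + noise_new.samples
```

Without the copy, `apply_gain` would rescale the caller's noise segment, so reusing one noise clip for many utterances would compound the gain on every call.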

@@ -65,6 +65,74 @@ def from_bytes(cls, bytes, transcript):
audio = AudioSegment.from_bytes(bytes)
return cls(audio.samples, audio.sample_rate, transcript)

@classmethod
def concatenate(cls, *segments):
"""Concatenate an arbitrary number of speech segments together.
Contributor

Add "Both audio and transcript will be concatenated."

Contributor Author

done

:rtype: SpeechSegment
:raises ValueError: If the number of segments is zero, or if the
sample_rate of any two segments does not match.
:raises TypeError: If every item in segments is not SpeechSegment
Contributor

every item in segments --> any segment

Contributor Author

done

@classmethod
def make_silence(cls, duration, sample_rate):
"""Creates a silent speech segment of the given duration and
sample rate.
Contributor

Add "Transcript will be an empty string.".

Contributor Author

done

return cls(samples, sample_rate, transcripts)

@classmethod
def slice_from_file(cls, filepath, start=None, end=None, transcript=""):
Contributor

It would be better for transcript to have no default value.

Contributor Author

done

Contributor Author

@chrisxu2016 chrisxu2016 left a comment

Fixed all problems.

Contributor

@xinghai-sun xinghai-sun left a comment

LGTM

@xinghai-sun xinghai-sun merged commit 06f272a into develop Jun 19, 2017
Development

Successfully merging this pull request may close these issues.

add data argumentation part for DeepSpeech2
2 participants