A quick question about the code at lines 121 to 122 and line 123 in 9e653bd. Is line 123 some form of normalization, binding the output to a range? I'm also curious about the constants. Any hints would be appreciated. Thank you!
Replies: 1 comment 1 reply
The first set of operations is written to be equivalent to what's done in librosa's `amplitude_to_db`, which uses the default `top_db` value of 80.0; this sets how small a value in the spectrogram can be compared to the largest value.

After those two lines, the values of `log_spec` are roughly (but not strictly) in [-8.0, 0.0], and L123 puts them in [-1.0, 1.0], which is typical for inputs to deep learning models.

A model trained from scratch without L123 would likely work just as well, but you may see significantly degraded performance without those lines on the released Whisper models, because they expect inputs in that range; without L123 the input becomes out-of-distribution.
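
To make the arithmetic concrete, here is a minimal sketch of the three steps in plain Python. It assumes the referenced lines clamp the mel power values at 1e-10, take log10, floor everything at 8.0 below the peak, and then apply `(x + 4.0) / 4.0`; the code embeds are not shown in this thread, so treat that as an assumption. The function name `normalize_log_mel` and its parameters are hypothetical, and a flat list of floats stands in for the spectrogram tensor.

```python
import math


def normalize_log_mel(power_values, floor=1e-10, dynamic_range=8.0):
    """Hypothetical sketch of the steps discussed above (not the library's code)."""
    # Step 1: clamp away zeros, then take log10 of the power values.
    log_spec = [math.log10(max(v, floor)) for v in power_values]

    # Step 2: keep at most `dynamic_range` log10 units below the peak.
    # 8.0 in log10(power) units is 80 dB, matching top_db=80.0.
    peak = max(log_spec)
    log_spec = [max(v, peak - dynamic_range) for v in log_spec]

    # Step 3 (L123): shift and scale so that [-8.0, 0.0] maps to [-1.0, 1.0].
    return [(v + 4.0) / 4.0 for v in log_spec]


# Peak power of 1.0 gives a log10 peak of 0.0, so the range is exactly [-1.0, 1.0].
print(normalize_log_mel([1.0, 1e-4, 1e-12, 0.0]))

# A louder peak (power 100.0, log10 = 2.0) shifts the range to [-0.5, 1.5].
print(normalize_log_mel([100.0, 1e-10]))
```

The second call shows why the range is only "roughly" [-1.0, 1.0]: the floor in step 2 is relative to the peak, so the output always spans at most 2.0 but is centered wherever the peak happens to fall.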