<!-- DISABLE-FRONTMATTER-SECTIONS -->
# Check your understanding of the course material
### 1. What units is the sampling rate measured in?
<Question
choices={[
{
text: "dB",
explain: "No, the amplitude is measured in decibels (dB)."
},
{
text: "Hz",
explain: "The sampling rate is the number of samples taken in one second and is measured in hertz (Hz).",
correct: true
},
{
text: "bit",
explain: "Bits are used to describe bit depth, which refers to the number of bits of information used to represent each sample of an audio signal.",
}
]}
/>
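As a quick sanity check of the definition above, a minimal NumPy sketch (the 16 kHz rate and 440 Hz tone are illustrative values):

```python
import numpy as np

sampling_rate = 16_000  # 16 kHz: 16,000 samples per second
duration = 2.0          # seconds

# One sample is taken every 1/sampling_rate seconds
num_samples = int(sampling_rate * duration)
t = np.arange(num_samples) / sampling_rate
waveform = np.sin(2 * np.pi * 440 * t)  # a 440 Hz sine tone

# The number of samples equals sampling_rate * duration
print(len(waveform))  # 32000
```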
### 2. When streaming a large audio dataset, how soon can you start using it?
<Question
choices={[
{
text: "As soon as the full dataset is downloaded.",
explain: "The goal of streaming data is to be able to work with it without having to fully download a dataset."
},
{
text: "As soon as the first 16 examples are downloaded.",
explain: "Try again!"
},
{
text: "As soon as the first example is downloaded.",
explain: "",
correct: true
}
]}
/>
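The principle behind the correct answer can be sketched in plain Python: a lazy iterator makes the first example usable before the rest exist. `load_dataset(..., streaming=True)` from 🤗 Datasets returns an `IterableDataset` that behaves the same way (the names below are illustrative, not real dataset contents):

```python
downloaded = []

def stream_examples(total):
    """Simulate streaming: each example is 'downloaded' only when requested."""
    for i in range(total):
        downloaded.append(i)  # pretend this is a network fetch
        yield {"audio": f"example_{i}"}

stream = stream_examples(total=1_000_000)

# Taking the first example does not touch the remaining 999,999
first = next(stream)
print(first["audio"], len(downloaded))  # example_0 1
```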
### 3. What is a spectrogram?
<Question
choices={[
{
text: "A device that digitizes the electrical signal produced when a microphone captures sound waves.",
explain: "The device that digitizes such an electrical signal is called an Analog-to-Digital Converter (ADC). Try again!"
},
{
text: "A plot that shows how the amplitude of an audio signal changes over time. It is also known as the *time domain* representation of sound.",
explain: "This describes a waveform, not a spectrogram."
},
{
text: "A visual representation of the frequency spectrum of a signal as it varies with time.",
explain: "",
correct: true
}
]}
/>
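The correct answer can be illustrated by computing a magnitude spectrogram from scratch with NumPy's FFT (a minimal sketch; the frame and hop lengths are arbitrary choices):

```python
import numpy as np

def spectrogram(waveform, frame_length=512, hop_length=128):
    """Magnitude spectrogram: FFT magnitudes of overlapping windowed frames."""
    window = np.hanning(frame_length)
    frames = []
    for start in range(0, len(waveform) - frame_length + 1, hop_length):
        frame = waveform[start : start + frame_length] * window
        # rfft gives the frequency spectrum of this short time slice
        frames.append(np.abs(np.fft.rfft(frame)))
    # shape: (num_frames, frame_length // 2 + 1) — time along one axis, frequency along the other
    return np.array(frames)

sr = 16_000
t = np.arange(sr) / sr                             # 1 second of audio
spec = spectrogram(np.sin(2 * np.pi * 1000 * t))   # a 1 kHz tone
peak_bin = spec.mean(axis=0).argmax()
print(peak_bin * sr / 512)  # 1000.0 — energy concentrated at the tone's frequency
```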
### 4. What is the easiest way to convert raw audio data into the log-mel spectrogram expected by Whisper?
A.
```python
import librosa

librosa.feature.melspectrogram(y=audio["array"], sr=audio["sampling_rate"])
```
B.
```python
from transformers import WhisperFeatureExtractor

feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-small")
feature_extractor(audio["array"])
```
C.
```python
dataset.feature(audio["array"], model="whisper")
```
<Question
choices={[
{
text: "A",
explain: "`librosa.feature.melspectrogram()` creates a (power) mel spectrogram, not the log-mel spectrogram in the format Whisper expects."
},
{
text: "B",
explain: "",
correct: true
},
{
text: "C",
explain: "The 🤗 Datasets library does not prepare features for Transformer models; that is done by the model's feature extractor."
}
]}
/>
### 5. How do you load a dataset from the 🤗 Hub?
A.
```python
from datasets import load_dataset
dataset = load_dataset(DATASET_NAME_ON_HUB)
```
B.
```python
import librosa
dataset = librosa.load(PATH_TO_DATASET)
```
C.
```python
from transformers import load_dataset
dataset = load_dataset(DATASET_NAME_ON_HUB)
```
<Question
choices={[
{
text: "A",
explain: "The best way is to use the 🤗 Datasets library.",
correct: true
},
{
text: "B",
explain: "`librosa.load` loads an individual audio file from a path into a tuple of an audio time series and a sampling rate, but it cannot load an entire dataset with many examples and multiple features."
},
{
text: "C",
explain: "The `load_dataset` function comes from the 🤗 Datasets library, not 🤗 Transformers."
}
]}
/>
### 6. Your custom dataset contains high-quality audio with a 32 kHz sampling rate. You want to train a speech recognition model that expects the audio examples to have a 16 kHz sampling rate. What should you do?
<Question
choices={[
{
text: "Use the examples as-is; the model will easily generalize to higher-quality audio examples.",
explain: "Due to reliance on attention mechanisms, it is challenging for models to generalize between sampling rates."
},
{
text: "Use the `Audio` feature from the 🤗 Datasets library to downsample the examples in the custom dataset.",
explain: "",
correct: true
},
{
text: "Downsample by a factor of 2 by throwing away every other sample.",
explain: "This will create distortions in the signal known as aliasing. Resampling correctly is tricky and best left to well-tested libraries such as librosa or 🤗 Datasets."
}
]}
/>
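A sketch of proper downsampling using SciPy's polyphase resampler; the low-pass filter it applies before decimating is what prevents the aliasing mentioned above (the signal and rates are illustrative):

```python
import numpy as np
from scipy.signal import resample_poly

original_rate = 32_000
target_rate = 16_000

t = np.arange(original_rate) / original_rate   # 1 second of audio at 32 kHz
waveform = np.sin(2 * np.pi * 440 * t)

# resample_poly low-pass filters, then decimates: up=1, down=2 gives 32 kHz -> 16 kHz
resampled = resample_poly(waveform, up=1, down=2)
print(len(resampled))  # 16000
```

In practice, casting the column with `dataset.cast_column("audio", Audio(sampling_rate=16_000))` in 🤗 Datasets performs this resampling for you on the fly.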
### 7. How can you convert a spectrogram generated by a machine learning model into a waveform?
<Question
choices={[
{
text: "We can use a neural network called a vocoder to reconstruct a waveform from the spectrogram.",
explain: "Since the phase information is missing in this case, we need to use a vocoder, or the classic Griffin-Lim algorithm, to reconstruct the waveform.",
correct: true
},
{
text: "We can use the inverse STFT to convert the generated spectrogram into a waveform.",
explain: "A generated spectrogram is missing phase information that is required to use the inverse STFT."
},
{
text: "You can't convert a spectrogram generated by a machine learning model into a waveform.",
explain: "Try again!"
}
]}
/>