Hi, I have a question about the audio embedding. In the paper, you mention: "Given the contextual influence on sequential audio data, we extracted the corresponding 5-second audio segment for the S frames." However, in the code (`talk_video.py`, line 250), you set `audio_tensor` to the corresponding 5 *frames* of the audio embedding. Is "5-second" a typo in the paper, or did I misunderstand the pipeline?
From my understanding, the audio is first extracted from the video and then processed by wav2vec2 to obtain the audio embedding, so the audio embedding has the same length as the video (measured in number of frames). Does that mean you cut the videos into 5-second slices before running the `data_preprocess.py` script?
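To make my understanding concrete, here is a minimal sketch of how I imagine the alignment and slicing work. All shapes and rates here are assumptions for illustration (wav2vec2 features at ~50 Hz, 768-dim, video at 25 fps), not the repo's actual values, and `align_audio_to_frames` is a hypothetical helper, not a function from the codebase:

```python
import numpy as np

def align_audio_to_frames(audio_emb: np.ndarray, num_video_frames: int) -> np.ndarray:
    """Linearly interpolate audio features to the video frame rate.

    audio_emb: (T_audio, dim) wav2vec2-style feature sequence (assumed shape).
    Returns: (num_video_frames, dim) features, one vector per video frame.
    """
    t_audio, _ = audio_emb.shape
    # Fractional source positions for each target video frame.
    src = np.linspace(0.0, t_audio - 1, num_video_frames)
    lo = np.floor(src).astype(int)
    hi = np.minimum(lo + 1, t_audio - 1)
    w = (src - lo)[:, None]
    return (1.0 - w) * audio_emb[lo] + w * audio_emb[hi]

# Hypothetical numbers: 5 s of audio -> ~250 wav2vec2 steps at 50 Hz;
# 5 s of video at 25 fps -> 125 frames.
audio_emb = np.random.randn(250, 768)
frame_emb = align_audio_to_frames(audio_emb, 125)
assert frame_emb.shape == (125, 768)

# What line 250 seems to do: take a 5-*frame* window of the per-frame
# embedding around a target frame f (no boundary handling shown here).
f, S = 60, 5
audio_tensor = frame_emb[f - S // 2 : f - S // 2 + S]
assert audio_tensor.shape == (5, 768)
```

If this matches the intended pipeline, then "5-second" in the paper would refer to the raw audio context fed to wav2vec2, while the code slices 5 frames of the resulting per-frame embedding.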
Thanks for reading and answering my concerns.