Preparing scientific audio text for whisper fine-tuning #2148

kojomensahonums · 2024-04-22T22:26:10Z

I am currently working on fine-tuning Whisper on scientific and mathematical audio datasets. What is the best way to prepare the ground-truth text? How should equations be written down: should they be put plainly in text, without symbols? How should SI units be written? For example, given a quadratic equation such as x^2-y^2=25, what is the best way to write it as text so that Whisper can learn to transcribe it? These are just a few of the examples I am thinking through.

Purfview · 2024-04-23T09:54:28Z

"should they be plainly put in text without symbols?"

Yes, avoid any symbols.
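
As a rough illustration of what "avoid any symbols" could look like in practice, the sketch below spells out a simple equation and an SI quantity as words. It is only a minimal sketch: the function names, the word tables, and the chosen spoken forms ("squared", "equals", "metre") are assumptions made for this example, not part of Whisper or any library, and the wording should match how your speakers actually read the expressions aloud.

```python
import re

# Hypothetical spoken forms; adapt these to how your speakers read aloud.
SYMBOL_WORDS = {"+": "plus", "-": "minus", "*": "times", "/": "over", "=": "equals"}
POWER_WORDS = {"2": "squared", "3": "cubed"}
UNIT_WORDS = {"m": "metre", "kg": "kilogram", "s": "second", "N": "newton"}

ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = "twenty thirty forty fifty sixty seventy eighty ninety".split()

# Numbers, runs of letters (variable names), and single operator characters.
TOKEN_RE = re.compile(r"\d+|[A-Za-z]+|[-+*/=^]")


def number_to_words(n: int) -> str:
    """Spell out 0..99; larger numbers are read digit by digit."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens - 2] + ("" if ones == 0 else " " + ONES[ones])
    return " ".join(ONES[int(d)] for d in str(n))


def spoken(token: str) -> str:
    """Return the spoken form of one token; variable names pass through."""
    if token.isdigit():
        return number_to_words(int(token))
    return SYMBOL_WORDS.get(token, token)


def verbalize_equation(expr: str) -> str:
    """Turn e.g. 'x^2-y^2=25' into plain words with no symbols."""
    tokens = TOKEN_RE.findall(expr)
    words = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "^" and i + 1 < len(tokens):
            exponent = tokens[i + 1]
            if exponent in POWER_WORDS:
                words.append(POWER_WORDS[exponent])
            else:
                words.extend(["to the power of", spoken(exponent)])
            i += 2  # the exponent has been consumed together with '^'
            continue
        words.append(spoken(tokens[i]))
        i += 1
    return " ".join(words)


def verbalize_quantity(text: str) -> str:
    """Turn e.g. '5 kg' into 'five kilograms' (units listed in UNIT_WORDS only)."""
    match = re.fullmatch(r"(\d+)\s*([A-Za-z]+)", text.strip())
    if match is None:
        raise ValueError(f"unsupported quantity: {text!r}")
    value, unit = int(match.group(1)), match.group(2)
    name = UNIT_WORDS[unit]
    return f"{number_to_words(value)} {name if value == 1 else name + 's'}"


print(verbalize_equation("x^2-y^2=25"))  # x squared minus y squared equals twenty five
print(verbalize_quantity("5 kg"))        # five kilograms
```

Whatever convention is chosen, applying it consistently across the whole training set is likely to matter more than the exact wording.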

0 replies

gongouveia · 2024-04-23T12:03:04Z

@kojomensahonums Hello, I believe my tool for creating synthetic audio datasets could be useful for your project (I am still adding features and fixing bugs). You can translate and edit audio to match your desired format.

4 replies

kojomensahonums Apr 23, 2024
Author

What tool is that? I would like to have a look.

gongouveia Apr 23, 2024

Sorry, I forgot to add the link.
https://github.com/gongouveia/Whisper-Temple-Synthetic-ASR-Dataset-Generator I would be glad if you could take a look at it and give some feedback on whether it is useful for your project.
The next feature I am adding is exporting the audio dataset to Hugging Face or Kaldi format.
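
The export feature itself is not shown in this thread. As a rough sketch of one layout that the Hugging Face `datasets` library can load as an audio folder, the audio files can sit in a directory next to a `metadata.csv` whose `file_name` column points at each clip. The function name `write_metadata`, the `transcription` column name, and the clip names below are illustrative assumptions, not part of the tool linked above.

```python
import csv
from pathlib import Path


def write_metadata(dataset_dir, transcripts):
    """Write metadata.csv mapping audio file names to their transcripts.

    `transcripts` is a dict such as {"clip_0001.wav": "x squared ..."}.
    Returns the path of the file that was written.
    """
    dataset_dir = Path(dataset_dir)
    dataset_dir.mkdir(parents=True, exist_ok=True)
    metadata_path = dataset_dir / "metadata.csv"
    with metadata_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # 'file_name' is the column the audio-folder loader looks for;
        # 'transcription' is an assumed name for the text column.
        writer.writerow(["file_name", "transcription"])
        for file_name, text in transcripts.items():
            writer.writerow([file_name, text])
    return metadata_path
```

For example, `write_metadata("my_dataset", {"clip_0001.wav": "x squared minus y squared equals twenty five"})` would pair one clip with its symbol-free transcript.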

kojomensahonums Apr 26, 2024
Author

Hi @gongouveia, I have tried using your tool but am facing some challenges. The instructions do not match the files in the repository. For example, the requirement.txt file mentioned in the instructions is actually a req.yml file, and I can't seem to run it. I am not sure what the problem with the YAML file is: I changed the prefix to my virtual environment, but I keep getting name errors. Also, the speech_gen.py file is not present. Please help me out here.

gongouveia Apr 28, 2024

@kojomensahonums sorry for the late reply, I will get back to you as soon as possible with a fix.

kojomensahonums · 2024-06-03T20:52:42Z

Author

@gongouveia any progress so far?

5 replies

gongouveia Jun 3, 2024

@kojomensahonums Hello, I have been very busy. Yes, the README is out of date.
You can send me a private message using the contact details in my profile and I can work it out for you.

kojomensahonums Jun 4, 2024
Author

That would be great! How can I reach you?

gongouveia Jun 16, 2024

@kojomensahonums I updated the README. The dependencies are in the req.yml file, and you can install them using conda, for example.
For any other questions, you can reach me by email. See the end of the repository for my contact details.

I also updated the software images, because the tool now has more features than previously stated.

kojomensahonums Jul 27, 2024
Author

@gongouveia I have just tried your tool and it's awesome!

gongouveia Jul 27, 2024

@kojomensahonums I am very glad you enjoyed the tool. Please don't forget to support the project by giving it a star and contributing.
If you would like new features, open an issue with a detailed description.
