Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configurable Google Speech to Text transcription #429

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

kratu92
Copy link

@kratu92 kratu92 commented Jun 10, 2022

Options to enable punctuation and profanity filter to the results obtained when using Google Speech to Text for transcriptions.
Added option to enable the reception of interim results instead of only final results when using Google Speech to Text for transcriptions.
These options may be set in the sip-communicator.properties:
org.jitsi.jigasi.transcription.ENABLE_GOOGLE_AUTOMATIC_PUNCTUATION=false
org.jitsi.jigasi.transcription.ENABLE_GOOGLE_PROFANITY_FILTER=false
org.jitsi.jigasi.transcription.ENABLE_GOOGLE_INTERIM_RESULTS=false
I left them false by default so that the default behaviour does not change unless the user sets them to true.

@nikvaessen
Copy link
Contributor

nikvaessen commented Jun 13, 2022

If I remember correctly, without interum results, transcriptions can take 2 to 3 seconds to appear (end of sentence). If this is the case, what is the benefit of having a feature flag (org.jitsi.jigasi.transcription.ENABLE_GOOGLE_INTERIM_RESULTS) which decreases the UX? What is the use-case for setting org.jitsi.jigasi.transcription.ENABLE_GOOGLE_INTERIM_RESULTS=False?

@kratu92
Copy link
Author

kratu92 commented Jun 13, 2022

Setting org.jitsi.jigasi.transcription.ENABLE_GOOGLE_INTERIM_RESULTS=False is for the current default behaviour (before my change), when you don't receive the result until the sentence is finished, so that is why I left it as the default value.
Changing it to true will send the partial sentences while the active speaker is speaking (Vosk had this feature).
It can take a couple of seconds if the sentence is long, and we found out that people with hearing disability prefer to read the transcription while they see the person speaking than wait until the sentence has finished, even if the transcription is not 100% accurate and it changes a bit and is updated later. Also people with cognitive disability may find it confusing to see the person speaking and not see the transcription for a while.
This way you have the option to have the default functionality, before the change (leaving it false) or have the partial results (changing it to true).

@bgrozev
Copy link
Member

bgrozev commented Jun 13, 2022

There's existing code to enable interim results, look for RETRIEVE_INTERIM_RESULTS.

Unless I'm missing something, your addition of P_NAME_ENABLE_GOOGLE_INTERIM_RESULTS does one thing only: when you enable the flag, it forces TranscriptionResult.isInterim to false regardless of whether the result is interim or not. My guess is the existing code fails to do something with interim results for you, but I wuld prefer to chase down the issue instead of forcing isInterim=false. Perhaps this is where the problem stems from?

@kratu92 kratu92 force-pushed the configurable-google-transcription branch from 1626e77 to 59ca086 Compare June 14, 2022 13:51
…ervice to RemotePublisherTranscriptionHandler
@kratu92 kratu92 force-pushed the configurable-google-transcription branch from 59ca086 to 886a387 Compare June 14, 2022 13:54
@kratu92
Copy link
Author

kratu92 commented Jun 14, 2022

I saw the RETRIEVE_INTERIM_RESULTS constant private final static boolean RETRIEVE_INTERIM_RESULTS in GoogleCloudTranscriptionService.java, which is always set to true, and is used in the config sent by jigasi to the Google API to obtain the transcriptions, but those results are not returned afterwards to the user unless they are final.
What we need are those intermediate results to display them to the user temporarily.
In my last commit I tried to move the ENABLE_GOOGLE_INTERIM_RESULTS parameter to the RemotePublisherTranscriptionHandler class that you told me (the same way I did in GoogleCloudTranscriptionService but renaming the parameter to ENABLE_INTERIM_RESULTS as it no longer belongs to the google transcription only) and changing the line you linked to the following:
if (!enableInterimResults && result.isInterim())
This way it also works fine, when the parameter is set to true, it sends the JSON with the interim results and it may be a better solution. However when setting the option SEND_TEXT to true instead of SEND_JSON in the sip-communicator.properties it has no effect, it does not send the interim results, just the final ones. I'm using the JSON option, but if someone needs to use the TEXT option, and sets the parameter to true it may not work as intended.
I don't know if that is what you were suggesting to change and hope you get a better understanding of what we intend to achieve.

@bgrozev
Copy link
Member

bgrozev commented Jun 14, 2022

This looks good to me, but I'll let @damencho review.

@codecov
Copy link

codecov bot commented Jun 14, 2022

Codecov Report

Merging #429 (886a387) into master (e05924f) will decrease coverage by 0.08%.
The diff coverage is 0.00%.

❗ Current head 886a387 differs from pull request most recent head 98be4db. Consider uploading reports for the commit 98be4db to get more accurate results

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #429      +/-   ##
============================================
- Coverage     23.14%   23.05%   -0.09%     
  Complexity      303      303              
============================================
  Files            69       69              
  Lines          5808     5817       +9     
  Branches        787      788       +1     
============================================
- Hits           1344     1341       -3     
- Misses         4232     4244      +12     
  Partials        232      232              
Impacted Files Coverage Δ
...transcription/GoogleCloudTranscriptionService.java 0.00% <0.00%> (ø)
...scription/RemotePublisherTranscriptionHandler.java 0.00% <0.00%> (ø)
...n/java/org/jitsi/jigasi/JigasiBundleActivator.java 49.41% <0.00%> (-1.18%) ⬇️
src/main/java/org/jitsi/jigasi/JvbConference.java 44.88% <0.00%> (-0.64%) ⬇️
.../org/jitsi/jigasi/TranscriptionGatewaySession.java 0.00% <0.00%> (ø)
...c/main/java/org/jitsi/jigasi/stats/Statistics.java 42.38% <0.00%> (+0.66%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3fdc420...98be4db. Read the comment docs.

@dnna
Copy link

dnna commented Nov 8, 2023

Having automatic punctuation would be really nice. Did something block this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants