onResult words being reset while still listening #552
Comments
Do you get a callback with the final set of results |
How is onDevice different such that it might cause this? I couldn't find a definition for what setting it to true does in the API doc, but intuited that it would not call the cloud for the purpose of voice recognition. But, given it works to set this as onDevice: false and with no internet connectivity, perhaps I intuited wrongly? Btw, I both downgraded to 6.5.1 and upgraded to 7.0.0 as part of my testing. All have the same behavior. |
Thanks for the details. I'll try to reproduce and let you know what I find. If you have a chance to try stopping the recognition and finding out what the recognition result is when final is true that would be interesting. You're mostly correct about the behaviour of onDevice, I should update the docs to provide more details. with |
If I issue stop() the final result (ie. when the SpeechRecognitionResult's finalResult == true) is the "+5". |
I'm also experiencing the exact same thing. been using this plugin for about a month now and only recently have started noticing this issue. maybe something changed on apples side? also version 6.6.2 on iOS. i am using something close to the example app but keep track of every time 'isFinal' is Seems to be an apple issue: https://forums.developer.apple.com/forums/thread/762952 |
It's unfortunate that the first issue above ../762952 reports that this issue does not occur on ios 17, but started with 18. That is not what we are seeing. I submitted a comment over there informing them that I am seeing the same behavior on ios 17. @flutterocks What version of IOS are you running? |
thanks @flutterocks, this is helpful. @righteoustales is this a problem that started for you relatively recently? The 731761 thread implies the issue only happens with on device recognition, is that what you are seeing? If it's an iOS issue then I doubt I can do anything useful at the plugin level to help resolve it unless there's been an API change that I missed. |
That's what I documented above in this thread as well. I only recently started using this flutter library so I can't speak to the history of it working or not. I noticed it was broken immediately after setting the on-device flag to true. |
@righteoustales I've recently updated to iOS 18, which is probably why I am only experiencing this now. Though I have Perhaps this 'broken' experience happens onDevice is Regardless of what might be causing this, I agree @sowens-csd, there isn't much this package can do to resolve. Though I will implement the suggestion from 731761, using timestamp to help determine if the result is 'final', likely in combination with comparing against the previous result (to prevent accidentally marking as final if there is latency) Some pseudo code of what I'm thinking:
where I will experiment with X to find what works; it will likely have a value of ~1-2 seconds. I'll implement it this weekend on my end. Given this seems to be impacting a lot of users, I could see value in having this directly inside |
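The pseudo code from that comment did not survive in this copy of the thread. As a rough sketch of the idea only, and not the commenter's actual code, the following treats the previous text as final when a new result arrives after a quiet period of X and is not simply a continuation of the old text. The class name `PauseFinalizer`, the `quietPeriod` parameter, and the use of callback arrival time in place of the recognizer's own timestamps are all assumptions made for illustration.

```dart
/// Hypothetical helper for illustration; not part of speech_to_text.
class PauseFinalizer {
  PauseFinalizer({this.quietPeriod = const Duration(milliseconds: 1500)});

  /// The "X" from the comment above. 1.5 s is only a starting guess.
  final Duration quietPeriod;

  String _lastWords = '';
  DateTime? _lastSeen;

  /// Call with every partial result and its arrival time.
  /// Returns the previous words when they should be treated as final,
  /// otherwise null.
  String? update(String words, DateTime now) {
    final previous = _lastWords;
    final lastSeen = _lastSeen;
    _lastWords = words;
    _lastSeen = now;

    if (lastSeen == null || previous.isEmpty) return null;
    final longGap = now.difference(lastSeen) >= quietPeriod;
    // A result that still begins with the old text is a continuation that
    // merely arrived late, so it must not mark the old text as final.
    final continuation = words.startsWith(previous);
    return (longGap && !continuation) ? previous : null;
  }
}
```

The prefix check is deliberately naive. As noted later in the thread, the recognizer often reinterprets earlier words, so a rewritten result after a long gap would be misread as a reset by this sketch.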
I'm disappointed to hear that the workaround of setting the flag to false doesn't even work in ios18. @flutterocks, are you the person who reported it over on the apple forum and to whom I replied? It's a different name there, but I'm sure we all have multiple names that we use spanning various forums over the years. |
@righteoustales not me no, i just found the threads from some googling to see if the issue was flutter specific or apple |
@flutterocks so your thinking in that work around you suggested is that Apple is essentially starting a new recognition? So the goal would be to deliver the previous final results in some way so that the user knows they should be stored and that a new set of results will start? It's an interesting idea. Naively I was hoping that Apple would fix their implementation, but that could of course take a while. One problem is that I've seen some fairly long delays to the final results and that the speech recognition engine will not infrequently reinterpret previous results based on new context, which could result in false positives from that test. Also it would have to be iOS specific since the other engines don't have the same failure mode. I agree that the impact of the failure is fairly large, it would be good to be able to help mitigate it. |
@sowens-csd @flutterocks I was going to point out something similar to the "One problem" comment above. It doesn't work to save what was there previously for comparison as the recognition logic frequently reinterprets what the text first delivered (and second and third) said the more that you speak. For example, in my example above, if I had said: "add 347.12 + 1" You can watch in real time as the first number is first recognized as 300, then 347, and so on as the recognition logic is processing. Given that, it can become very difficult to use comparison to distinguish between a reinterpretation of everything said so far versus when it is simply throwing away all of the preceding text and starting fresh. Have either of you found a way to tell the difference between the two? Does looking at the segment timestamp as proposed above actually work? I don't think that comparing to the previous result is going to work. This feels like an Apple bug unless they manifest data with the results returned that can reliably be used to prevent the loss of previously spoken text. |
Just downloaded apple's SpokenWord demo mentioned in 762952 and I'm experiencing the same issue, so fully confirmed it has to do with apple. I inspected the results and here's some interesting findings:
note: onDevice is true in this demo There's a few options to explore:
In any case these solutions would likely be temporary until / if Apple fixes their bug. I'll probably personally wait until iOS 18 officially releases next week to see if this still happens, but @righteoustales you're experiencing it on 17.6.1. @righteoustales Can you download the Apple sample and see if the same behaviour I described above happens? |
Just to confirm are you (@flutterocks ) saying that the one-line change that I did above does not help at all on ios18? Ie. that it drops the text equally whether that flag is set to true or false? And, if so, have you also tried testing it with network connectivity/wifi completely disabled? Any difference then? |
Correct, even with just tested without wifi and same thing, which is expected given your scenario is ondevice |
Thanks for confirming and trying that additional test. Btw, I also updated https://developer.apple.com/forums/thread/731761 with my own comments/test experience. |
That apple forum update I did has not yet been approved for some reason. Slackers. LOL. I also messed around with setting the task hint between unspecified, dictation, search, and confirmation. None of them help. |
@righteoustales are you able to confirm if the following behaves the same for you? (specifically the last two points)
|
All of the above 3 assertions are true for me as well. |
Given how old those two forum questions are on the apple developer forums and the complete absence of any acknowledgment from Apple on either, I'm not feeling very hopeful that they will do anything on this. But, I don't frequent their forums much. Any experience otherwise that is more hopeful than my conclusion here? My current plan is to see how things look when ios 18 is released and decide accordingly given that. I think I read that that release is imminent, like maybe next week. |
@righteoustales Meant to release on the 16th i believe. I too will wait for that and hope for the best... Btw do the speechRecognitionMetadata and timestamp behave the same for you regardless of what requiresOnDeviceRecognition is set to? |
I’m sure they don’t given setting it to false actually works without dropping text.
|
quick update after upgrading to ios18 since it was released today: This api is now broken as described herein regardless of whether the requiresOnDeviceRecognition flag is set to true or false. Goodbye friendly workaround. |
I also updated the two apple forum issues. Maybe a bit of activity there will flush them out of the woodwork to comment on it, but I doubt it. Summary of where we are from my perspective: I question whether a developer would ever want this throw-away behavior, but will say with considerable certainty that they would for sure not want it if their task hint was set to "dictation". Given that, I'm wondering if it is worthwhile for this speech_to_text (flutter) feature to deal with this (what I'm calling a) bug by noticing the deletion/start-over and then mitigating it by (re)prepending the words thrown away. And, if not comfortable with doing that for all cases, then perhaps doing so if the developer indicates they want it (via taskhint or other). Without something of this nature, the speech-to-text results seem pretty unusable because as noted earlier in this thread the caller of this flutter api:
Thoughts? |
@righteoustales I'd have to agree with everything you said. Sure seems like a bug to me, at least a pretty major breaking behaviour change if it's not a bug. Supporting some mitigation in the plugin seems like the right path forward. Should Apple fix this then I'd think the mitigation would revert to a no-op since hopefully the timestamp reset would stop happening. I'll try to put together a beta and hopefully some folks can give it a try. |
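To make the proposed mitigation concrete, here is a minimal sketch of "notice the start-over, then re-prepend the words thrown away". It is not the plugin's implementation. It assumes each partial result arrives with some timestamp that drops back toward zero when the recognizer starts over, which is one reading of the "timestamp reset" mentioned above; the class `ResetMitigator` and its `timestamp` parameter are hypothetical.

```dart
/// Hypothetical sketch; the real plugin may detect resets differently.
class ResetMitigator {
  final List<String> _committed = [];
  String _current = '';
  double _lastTimestamp = 0;

  /// [timestamp] is an assumed per-result value (seconds) that only moves
  /// backwards when the recognizer discards its buffer and starts fresh.
  String onPartial(String words, double timestamp) {
    if (timestamp < _lastTimestamp && _current.isNotEmpty) {
      // The recognizer started over: keep what it was about to throw away.
      _committed.add(_current);
    }
    _current = words;
    _lastTimestamp = timestamp;
    // Everything saved so far, followed by the phrase still in progress.
    return [..._committed, if (_current.isNotEmpty) _current].join(' ');
  }
}
```

With the example from the original report, feeding "add 1+2+3+4" and then "+5" with a lower timestamp yields "add 1+2+3+4 +5" instead of "+5" alone. If Apple fixes the behaviour and the timestamp never moves backwards, the class simply passes the current words through.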
I've loaded in the beta version but it seems that the experience is the same broken one, there might be some caching going on so I'll test around some more. Edit: still experiencing the broken behaviour |
Interesting. I'm not seeing that. In my testing, I didn't have any dropped words at all so far. I'm wondering what is different. UPDATE: does 'flutter pub deps' show the correct library version included as specified in your pubspec.yaml? |
Thank you for addressing this issue. |
@sowens-csd This popped up today on Stack Overflow. Sharing in case it is useful for comparison. |
update: 🤦 i had |
@sowens-csd I suggest a user configurable separator - my use case has the user dictating a long string of text, many sentences. For me, a separator of I.e. the phrase in the current implementation becomes: "The cat jumps off of the table He landed on both feet" whereas it would make more sense as: "The cat jumps off of the table. He landed on both feet" This is for my use case, I can see other use cases wanting a different separator hence suggestion for user configurable - but I would argue, that a period separator should be default - especially given the current capitalization behaviour. |
@flutterocks interesting suggestion, I'll think about how to implement it. Your result about partial results is very interesting. I think it should generate the same result whether partial results is true or false so I'll check that. |
I set partialResults = true in order to display what is being said as it is said to the app user, so not testing for the 'false' option there. Regarding, the request for the addition of punctuation at pauses, I can see scenarios where that is beneficial and a lot where it is not. If you decide to do that please allow for disabling it. I suspect you would do less harm by un-capitalizing the first word after the pause bug is detected than by adding punctuation in reaction to the apple-bug-induced capitalizing. |
@sowens-csd to implement, can't we use a configurable variable in place of |
Clearly doable, but @sowens-csd stated earlier in this thread that he would like for the mitigation(s) to not require changing the api in a way that would no longer make sense when/if Apple fixes their bug. That's the challenging part. |
@righteoustales Not sure where your hostility is coming from, just looking to find a solution here :) Stephen says
That is not the same as keeping the API unchanged. It should be perfectly acceptable to temp add additional options to the API now to fix the issue, as long as they're no op once (if) apple fixes the bug, especially given some users will not update their OS and continue to be on the bugged versions of iOS. |
Hostility? I thought we were discussing pros and cons of potential changes as is commonly done in api discussions. I wasn't aware that talking about tradeoffs on github was now considered hostile sparring among combatants. Dang it. I forgot to put on my best medieval armor too. I can never get the timing of these things right. |
I noticed that the behavior depends on the locale. If I set If I change the locale to It also works as expected for That might be the reason why some people can't reproduce the issue. P.S. I think I saw the same behavior even before upgrading to iOS 18, but I'm not 100% sure. |
Confirming that the beta fixes the issue. Thank you! |
@lukyanov It's bizarre that the localeId affects this as reported. Interesting find. |
@sowens-csd One issue (?) with the beta I stumbled upon. When you don't speak anything, before |
@lukyanov that behaviour differs from the official release? That's very odd. I can see how it could be an OS difference but not sure what I changed in the plugin to change that behaviour. I'll check it out. |
So I have an idea for the phrase aggregation that would allow it to be more customizable. The idea is to add an optional function parameter to the This would allow good customization of the behaviour but at the cost of more potential complexity for users of the plugin, even in just understanding the use of the new parameter and property. Add to that they might not even be useful for long if Apple fixes the bug. The new property can be pretty much ignored by users because it will almost always be null and even when not null it should be redundant with the The signature for the new aggregation function would be something like: String aggregate( List<String> recognizedPhrases ); So Thoughts? |
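As an illustration of what a caller-supplied function with the signature above might look like, here are two sketches. The first joins phrases with a period, as requested earlier in the thread; the second leaves punctuation alone and instead lower-cases the first letter of every phrase after the first. The function names are made up for this example, and how such a function would be installed in the plugin is not shown in this part of the thread, so that step is left out.

```dart
/// Joins phrases with ". " (the period separator suggested earlier).
String periodAggregator(List<String> recognizedPhrases) {
  return recognizedPhrases
      .map((p) => p.trim())
      .where((p) => p.isNotEmpty)
      .join('. ');
}

/// Joins phrases with a space and undoes the capital letter that follows
/// each pause, instead of adding punctuation.
String lowercaseAggregator(List<String> recognizedPhrases) {
  final out = <String>[];
  for (final raw in recognizedPhrases) {
    final p = raw.trim();
    if (p.isEmpty) continue;
    out.add(out.isEmpty ? p : p[0].toLowerCase() + p.substring(1));
  }
  return out.join(' ');
}
```

For the phrases "The cat jumps off of the table" and "He landed on both feet", the first function returns "The cat jumps off of the table. He landed on both feet" and the second returns "The cat jumps off of the table he landed on both feet". Note that the second would also lower-case a proper noun at the start of a phrase.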
I think it is a thoughtful compromise that provides for different behaviors using the same mechanism that you use for the default behavior while also not requiring code changes by existing users of the plugin. The only downside of it that occurs to me is the likelihood that you would want to deprecate it if Apple agrees that this is a bug and fixes it. I am also curious why you would specify it there versus in SpeechListenOptions? Another possible approach would be to add a completely different method that could be used to install that same optional function and not modify the listen() method at all. Advantages of the separate method (top of mind) could be:
I've seen this approach used in libraries historically (like opengl) when new functionality is being exposed that needs to be done short-term as an extension call but for which that same extension call will later become unnecessary. I think your proposal works fine too. Just brainstorming a bit with you since you asked for thoughts. |
Good thought. Normally I'd avoid that style of implementation because it's a bit hidden and not directly correlated with the action but in this case that seems like a feature not a bug. Thanks! |
Agree. I'm happy to converse on pros/cons/ideas/brain-farts, especially given you are doing the heavy lifting. I appreciate both your work and your thoughtful approach. Funny anecdote: |
lol, I love it! That function name def gets an 11/10. Would love to know who won the prize for being the first person to complain about their database upgrade not reverting. |
They were probably drowned out by the dozens of faux requests from the rest of us amidst hallway guffaws. It was a fun time. |
@sowens-csd Btw, did you see this? It was posted 4 hours ago. |
I just added an update on that Apple thread based on upgrading to the latest ios beta. You can read it there, but the short story is this: the behavior seems to have reverted to what I observed originally on 17.6. requiresOnDeviceRecognition = true -- same bug |
I had not seen that, thanks for pointing it out. Looks like they are actively working on it, that's good news. |
7.0.0-beta.2 is now live on pub.dev. It has the new aggregator behaviour that can be overridden using |
@lukyanov I just tried to reproduce your result with en_US and fr_CA and in both cases it behaved as expected. For both locales I spoke and paused and saw the app properly aggregate multiple phrases. This was on iOS 18 with onDevice false. I also just tried to reproduce the Any other tips to reproduce? |
Quick update. I was prompted to upgrade to the last ios beta (22B5069a) last night. I installed it this am. The issue still exists when requiresOnDeviceRecognition is true. Setting it to false still seems to mitigate the word-loss behavior. |
Context: Flutter 3.16.2 on iOS (iPhone 12 running 17.6.1) using speech_to_text (6.6.2).
With a listen call set with options as follows:
SpeechListenOptions options = SpeechListenOptions(
  listenMode: ListenMode.dictation,
  partialResults: true,
  onDevice: true,
);
await _speechToText.listen(
  onResult: _onSpeechResult,
  listenOptions: options,
);
I am seeing the buffer of words returned via:
void _onSpeechResult(SpeechRecognitionResult result)
get reset (all words deleted) before the listen times out. This happens if there is a short pause between words spoken - not a long pause at all, maybe 2 seconds at most.
For example, if I speak "add 1+2+3+4 (brief pause)+5", the words returned up until the pause are "add 1+2+3+4", but after the pause the SpeechRecognitionResult is reset and returns "+5" only.
The listen is active throughout this (ie. didn't stop)
I check result.isFinal and it is set to 'false' for each callback above as well.
Is this normal? Any idea how to prevent it or if preventing it isn't possible how to recognize when it is occurring so I can code around it?
Thanks in advance.
-Gerald