Possible error in timer.c once in 49 days / on INT_MAX overflow #1016
The obvious way, IMHO, is to provide a TimerTime_t that does not overflow (int64).
I agree 100%, but unfortunately TimerTime_t currently uses a 32-bit integer.
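A 64-bit TimerTime_t would make wrap-around a non-issue in practice: at 1 ms per tick, a 64-bit counter only wraps after roughly 584 million years. Since many RTCs only provide a 32-bit counter, one common way to get there is to extend the hardware reading in software by counting wraps. The sketch below is a minimal illustration, assuming a single execution context (no ISR races) and at least one call per 2^32 ticks; `TimerTime64_t`, `TickExtender_t` and `TickExtend()` are hypothetical names, not LoRaMac-node API.

```c
#include <stdint.h>

/* Hypothetical 64-bit timestamp type; not part of LoRaMac-node. */
typedef uint64_t TimerTime64_t;

/* State for extending a free-running 32-bit tick counter to 64 bits.
 * Assumption: TickExtend() is called at least once per 2^32 ticks,
 * otherwise a wrap-around would go undetected. */
typedef struct
{
    uint32_t lastRaw;   /* last raw 32-bit counter value seen */
    uint32_t wraps;     /* number of detected wrap-arounds */
} TickExtender_t;

/* Returns a monotonically increasing 64-bit tick count built from a raw
 * 32-bit reading. A raw value smaller than the previous one means the
 * 32-bit counter wrapped past 0xFFFFFFFF. */
TimerTime64_t TickExtend( TickExtender_t *ext, uint32_t raw )
{
    if( raw < ext->lastRaw )
    {
        ext->wraps++;
    }
    ext->lastRaw = raw;
    /* Upper 32 bits: wrap count; lower 32 bits: raw counter. */
    return ( ( TimerTime64_t )ext->wraps << 32 ) | raw;
}
```

With this, a raw reading of 0 right after 0xFFFFFFFF becomes 0x100000000 instead of 0, so 0 is never reached again after start-up.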
Hi all, thanks for the reports. We tried to reproduce the issue on our side by returning a 0 in the function … Could you provide more detailed information about what exactly is happening, or an easy way to reproduce the issue? At the moment, I do not understand the relation between the two functions.
Hello everyone!
But, really, I don't see how! This function,
This is it. This function is not used for anything else! What timer will stop? What process will hang?
Hi all, I will try to get back into the context where this was an issue for me, but to see when this becomes a problem, please check the function LoRaMacHandleResponseTimeout() in LoRaMac.c. The idea is that a timer value of 0 is used to mark a stopped timer. If, however, your start time happens to be 0 (which occurs once every 4.3 billion ticks, after the counter wraps around 0xFFFFFFFF, but it can happen!), your timestamp says that the timer is stopped, even though that is not the intention. That is why I simply never return 0 as a tick value: this way I can keep using the tick value of 0 to mark a stopped timer, AND a timer started at 0xFFFFFFFF+1 can never be considered stopped, which could cause bad things to happen. You also refer to some places where timer.h is used. In my app I also use timer.h to time and schedule events other than LoRaWAN-related ones; this is how I found the issue while checking the code. I don't think this is an illegal use of timer.h: I am trying to have just one timer event queue / timestamp library in my app, to keep code and data usage optimal. Anyway, thanks for checking this issue after 5 months!
Hi Marek,
It is, really, a proper usage: doing it this way ensures that your code stays in sync with the stack and the library time-wise. You only have to consider the effects of these functions' behaviour (implied, intentional or just traditional) on your own use and mitigate them one way or another. In your position, if you really hit this problem, I'd use a wrapper function that explicitly checks that a wrap-around really happened. A long time ago we had to put such or similar controls in the
Well, that is my usual cycle for considering a stack upgrade in our codebase :) (once every half year). We mostly deploy our products in the hundreds, and we have to be dead sure of the code's stability: our customers are usually extremely far from civilization and regular transport. P.S.: For the historical background of
Interesting discussion. I think some of these functions could benefit from unit testing. The timer overflow and other questionable behaviour can easily be tested with certainty. The issue is purely a logic issue, independent of hardware, so it should be easy to test off target. Are there any plans to integrate unit testing into the LoRaMac-node codebase?
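The wrap-around logic can indeed be exercised off target by replacing the RTC with a mock counter. A minimal sketch, assuming a hypothetical `MockRtcGetTimerValue()` in place of the real porting-layer call and plain `assert()` rather than any particular test framework:

```c
#include <stdint.h>

/* Hypothetical mock of the RTC tick source, set directly by the test. */
static uint32_t mockNowTicks;

uint32_t MockRtcGetTimerValue( void )
{
    return mockNowTicks;
}

/* Elapsed ticks between a past reading and "now". Unsigned 32-bit
 * subtraction is modular, so the result stays correct across a single
 * wrap of the counter past 0xFFFFFFFF. */
uint32_t ElapsedTicks( uint32_t pastTicks )
{
    return MockRtcGetTimerValue( ) - pastTicks;
}
```

A test can then set `mockNowTicks` just past the wrap point and check, for example, that a past reading of 0xFFFFFFFB and a current reading of 5 give 10 elapsed ticks.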
We do have unit tests on our internal git repository server.
If I understand the reported issue correctly, the problem lies in detecting whether a virtual timer is running or not. The virtual timer provides an API to query whether a given timer is started or not; see LoRaMac-node/src/system/timer.h, lines 88 to 96 (commit 6e9b844).
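Querying an explicit "started" state avoids overloading the timestamp value 0 as a marker. The sketch below shows the idea on a hypothetical, simplified timer object; `SimpleTimer_t` and its functions are illustrative names, not the actual declarations in timer.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified timer object. The running state lives in an
 * explicit flag, so any start timestamp (including 0) is valid. */
typedef struct
{
    uint32_t startTicks;
    bool     isStarted;
} SimpleTimer_t;

void SimpleTimerStart( SimpleTimer_t *t, uint32_t nowTicks )
{
    t->startTicks = nowTicks;
    t->isStarted = true;
}

void SimpleTimerStop( SimpleTimer_t *t )
{
    t->isStarted = false;
}

/* The running state no longer depends on the value of startTicks. */
bool SimpleTimerIsStarted( const SimpleTimer_t *t )
{
    return t->isStarted;
}
```

A timer started exactly at tick 0 is still reported as running, because callers ask the flag rather than comparing the timestamp with 0.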
As there has been no more activity or feedback on this issue, we are closing it.
Currently, in TimerGetElapsedTime(), there is this code:
This basically means that whenever the timestamp has a value of 0, no elapsed time is returned.
So if a timer is started exactly when the tick counter wraps past the 0xFFFFFFFF (UINT32_MAX) boundary, its start timestamp is 0 and the timer can end up stopped unintentionally!
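The effect can be reproduced in isolation. The following is a simplified stand-in for the guard described above, not the library code itself; `GetElapsedTicks()` is a hypothetical name and, unlike the real function, takes the current tick as a parameter so it can be tested deterministically.

```c
#include <stdint.h>

/* Hypothetical, simplified version of the guard: a past timestamp of 0
 * is treated as "never started" and yields 0 elapsed ticks. */
uint32_t GetElapsedTicks( uint32_t pastTicks, uint32_t nowTicks )
{
    if( pastTicks == 0u )
    {
        /* Indistinguishable from a timer that was never started. */
        return 0u;
    }
    /* Modular unsigned subtraction handles a single wrap-around. */
    return nowTicks - pastTicks;
}
```

A timer started at tick 1 correctly reports 1000 elapsed ticks at tick 1001, while one started at tick 0 (right after the counter wrapped) reports 0 no matter how much time has passed, so the caller cannot tell it apart from a stopped timer.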
If the system uses a 1 ms tick and a 32-bit variable to store it (quite common, I would say), some timers may be stopped once every ~49.7 days of running. This may look like a small problem, but IoT devices are easily designed to run for years, and if the user application uses the timer.h/timer.c API to time its own events, it can make the developer very unhappy.
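The 49-day figure follows from the counter width: a 32-bit counter has 2^32 = 4,294,967,296 distinct values, which at 1 ms per tick is about 49.7 days. A small helper to check this for other tick periods and counter widths (`WrapPeriodDays()` is a hypothetical name, for illustration only):

```c
#include <stdint.h>

/* Days until a counter of the given bit width wraps, for a tick period
 * in milliseconds. Uses double so the 64-bit case does not overflow. */
double WrapPeriodDays( unsigned bits, double tickMs )
{
    double ticks = 1.0;
    for( unsigned i = 0; i < bits; i++ )
    {
        ticks *= 2.0;   /* 2^bits distinct counter values */
    }
    return ticks * tickMs / ( 1000.0 * 60.0 * 60.0 * 24.0 );
}
```

For a 1 ms tick, `WrapPeriodDays( 32u, 1.0 )` gives about 49.71 days, while a 64-bit counter gives roughly 2.1e11 days.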
My workaround is to do something like this:
This is ugly and can cause 1 ms of timing jitter once every 49 days, which is more acceptable than timers being stopped at random. If there is no other way to mark a timer as "stopped", I would suggest making this workaround part of the "generic" layer, so that a timestamp of 0 is never accepted from the porting layers. I DON'T like my workaround, but it is MUCH better than timers randomly stopping....
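The workaround described in this thread (never handing a tick value of 0 to the generic layer) can be sketched as a small mapping function. `TickNeverZero()` is a hypothetical name, and where exactly it would be applied in a given port is an assumption.

```c
#include <stdint.h>

/* Maps a raw 32-bit tick reading to a value that is never 0, so that 0
 * can keep meaning "timer stopped". The raw value 0 occurs once per 2^32
 * ticks and is reported as 1: one tick of error at that single instant. */
uint32_t TickNeverZero( uint32_t rawTicks )
{
    return ( rawTicks == 0u ) ? 1u : rawTicks;
}
```

In a port it would wrap whatever reads the hardware counter, e.g. `return TickNeverZero( ReadHardwareCounter( ) );`, where `ReadHardwareCounter()` is a placeholder for the platform-specific read.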
Opinions are welcome.