Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

STM32F1: Fix stepper timer issue causing motor shocks #14030

Merged
merged 1 commit into from
Jun 29, 2019

Conversation

tpruvot
Copy link
Contributor

@tpruvot tpruvot commented May 16, 2019

Without the timer reload feature, some isrs are incorrectly triggered,
only after a full timer counter rollover.
That leads into a small time break in stepper pulses (around 16ms)
and can cause layer shifting if the bed is heavy.
To explain easily the behavoir, its like a car driver hiting a wall.

To avoid timer inconsistent behavior firing interrupt on current cycle or next cycle,
we trigger it always on next cycle.

Y/bed axis issue is more easy to see due to the bed weight,
but this issue was affecting all steppers the same way.

extras:

  • allow timer 8 in HAL (tested ok)
  • Add STEP_TIMER_IRQ_PRIO & TEMP_TIMER_IRQ_PRIO like the other STM32
  • clean outdated comments/config

@p3p
Copy link
Member

p3p commented May 16, 2019

We really try and keep all architecture differences behind the HAL otherwise things will get out of hand quickly, one #ifndef __STM32F1__ may seem trivial but things like that start spreading, can you put the change behind the HAL_timer_set_compare api? I know you are not actually setting a compare value, but it is.. mostly equivalent.

@tpruvot
Copy link
Contributor Author

tpruvot commented May 16, 2019

the problem is HAL_timer_set_compare is also used (i think, will double check)

@tpruvot
Copy link
Contributor Author

tpruvot commented May 16, 2019

no in fact its doable, only for the stepper

@hobiseven
Copy link

@p3p it is not equivalent! As one is immediate and the reload is not immediate. That is the only work around we so far found to get the timer to behave normally

@tpruvot
Copy link
Contributor Author

tpruvot commented May 16, 2019

erk no i was right, its also used on the ISR start

@p3p
Copy link
Member

p3p commented May 16, 2019

I know its not functionally the same and why you needed the change, I've been reading along to your trial and tribulations, I'm just here looking after the abstraction layer ;) and to the stepper code it is equivalent in function.. set the time to the next isr call.

Things are never generic enough to cover all platforms, if we have to we may need to add a new api..

@tpruvot
Copy link
Contributor Author

tpruvot commented May 16, 2019

we need both HAL_timer_set_compare and set_reload to fix it

@p3p
Copy link
Member

p3p commented May 16, 2019

we need both HAL_timer_set_compare and set_reload to fix it

You may do but we can't break the abstraction layer for every platform that has some unique requirement it defeats the point, if you can't find a way to hide this workaround behind the HAL_timer_set_compare api call somehow then we will have to figure out the best replacement for that interface to keep things clean.

@ManuelMcLure
Copy link
Contributor

How about adding a boolean parameter to HAL_timer_set_compare() to say if we need to reload? Most HALs would ignore the parameter, but the STM32F1 would use it to determine how to proceed.

Or add a HAL_timer_set_reload() to the other HALs that would just be an alias for HAL_timer_set_compare() on HALs where it doesn't make a difference.

@tpruvot
Copy link
Contributor Author

tpruvot commented May 17, 2019

Would be nice to see if the F4 & F7 need something like that too...
To note, only the set_compare take a timer channel parameter

@gloomyandy
Copy link
Contributor

I've been following the skipping steps thread, but I'm still not totally sure I understand this fix. So my understanding is that you are using a "reload" register to set the timer when used by the stepper code. This in effect delays the stepper time value by 1 cycle. Is that correct? What will the impact of doing that be on the stepper timing, does it matter?

I assume the problem is that you do not want to use this mode for all timer settings because of the extra cycle delay? Which is why you want to add a new entry point?

I seem to remember that the stepper code deliberately sets a long time period at the start of the stepper ISR to prevent a second timer interrupt occurring during the isr. What will be the impact on that code of this change? Could that code trigger any problem here?

Do we understand exactly why we are seeing this "long pulse" problem and in particular why it only happens from time to time? What are the exact circumstances that cause it to trigger?

I wonder if it might be possible to set things up so that you can always use the reload option, is there anyway to say set the reload register and then force a timer event and hence the loading of the reload value? You would of course have to stop an isr being called for this "extra" event.

I think on the LPC176x we use a mode that zeros the timer count when a match is triggered. Is the same mode possible with the STM and are we using it?

@tpruvot
Copy link
Contributor Author

tpruvot commented May 17, 2019

basically, if all was running correctly, the reload countdown should never be triggered (its a timer mode which redo automatically the call on expiration).

@tpruvot
Copy link
Contributor Author

tpruvot commented May 17, 2019

rmm if you really dont want this stepper ifdef
it could be done too with a "if compare != hal_timer_t(HAL_TIMER_TYPE_MAX)" for stepper only.

Not proper neither, but unsure what is...

@hobiseven
Copy link

@gloomyandy

You get it fully right. We effectively delay all timer reloads by one isr.
The strange behavior has a 1ppm occurrence and I still do not know why. I am working with an stm développer and they are now asking me logic analyser traces. I will try to replicate the issue on a simple piece of code to be sure there is not something I have not seen in marlin generating that issue.
Regarding the effect, since we shift the the isr time we would also need to do that on step triggers this is not done yet. All tests done are fine like that.
I will check again with @tpruvot regarding border cases ( the direct counter reload at the beginning could be one of these border cases) and the possibility to go full reload scenario to simplify the api.

@tpruvot
Copy link
Contributor Author

tpruvot commented May 17, 2019

Here’s an update to update you on the updated status of the update.
Please stand by for further updated updates as they are updated.

sorry.. its just my normal life quote... else i updated the code and should not affect others by a dangerous ifdef :p

@p3p
Copy link
Member

p3p commented May 17, 2019

sorry.. its just my normal life quote... else i updated the code and should not affect others by a dangerous ifdef

I don't mean to seem pernickety, but it is important to keep the ifdef HAL out of Marlin core as much as possible, and this is a workaround for what seems to be a platform dependant bug so make sit even more important.

@tpruvot
Copy link
Contributor Author

tpruvot commented May 17, 2019

its fine, its just to avoid to waste more time on that.... @hobiseven was talking about changing the first call... and cant be

@gloomyandy
Copy link
Contributor

I'm confused, with that latest change, you will set a long time period at the start of the ISR, and that gets set directly, but then later on you set the actual time using the reload mechanism, but doesn't that second value only get "made active" when the current long time period expires? Or have I totally misunderstood how this works?

@hobiseven
Copy link

This would be correct except that : libmaple sets ARPE flag to 1 which implies using reload method and not direct one.

that immediate load at the beginning becomes useless I think. I will test and erase it.

In any case I do not understand what it protects against. If we take more time than the next pulse time we start to introduce jitter.

@tpruvot
Copy link
Contributor Author

tpruvot commented May 17, 2019

first one set the timer channel, even if we have plenty on the SoC

@gloomyandy
Copy link
Contributor

@p3p something I'm not sure about are these timers one shot or repeating? I guess if they are repeating then the first call is there to stop any chance that a repeating timer will cause a second interrupt while still processing the first.

@p3p
Copy link
Member

p3p commented May 17, 2019

Indeed, the first call is just to guarantee an interrupt isn't queued while in the ISR because (as you said earlier) for precise timing the clock is never stopped, just reset to 0 on match (on LPC176x anyway).

The second call is the only one that should be required really, not sure why it was decided to set the match to max timer_t rather than disable that timers interrupt on match until it was updated to the new value.

@p3p
Copy link
Member

p3p commented May 17, 2019

The platforms can do whatever they need to with interrupts and timers etc using HAL_timer_isr_prologue and HAL_timer_isr_epilogue if there is anything unique needed, The HAL_STEP_TIMER_ISR macro hides some assembly that is used by AVR so I would need to parse that to know what happens with the interrupts there.

@ejtagle
Copy link
Contributor

ejtagle commented May 17, 2019

I was the "culprit" of that implementatiin. The HAL assumes timers are cleared on match and that they will trigger again an ISR if they match again. That is the reason why the compare is set to the maximum value at the start of the stepper ISR. We can't stop the timer because it would lose timing precision as we need the ISR execution time to also be taken into account. I am absolutely sure STM32 timers can also be used in that exact mode because i am using them in a STM32f103 board i have for a different project i am working right now and they work beautifully in that mode.
Recheck the ISR priority in NVIC . Probably the stepper ISR is being preempted by something of higher priority (USB, SPI, USART, other timers), . And please remember that in ARM , there is no such thing as globally disabling interrupts. You can mask interrupts from a lower priority than X, but never disable them all. So properly configuring interrupt priorities is a MUST, because a higher priority ISR will preempt the stepper ISR and destroy timing otherwise

@hobiseven
Copy link

hobiseven commented May 17, 2019

Hmmmm ok. Which exact version of stm32 do you use please?

We use stm32f103vet6

As I have not played at all with interrupts code before getting that issue I wonder what goes on. I took a fresh copy of marlin and plugged the fsmc code not using any interrupt and I have a sw spi touchscreen pooled ( hw cannot be used)

I will try to get a genuine stm32 board from stm. And I will flash the code on that one to check if the problem is here too.

@hobiseven
Copy link

@pinches
Can you please check our temporary fix on your stm32f103zet6 please?
This has the exact same die as ours in a different package.

@ejtagle
Copy link
Contributor

ejtagle commented May 17, 2019

@hobiseven : I am using the exact same version as yours. I´d suggest, using a JTAG debugger, or just by dumping NVIC settings, to make sure NVIC interrupt priorities are properly setup.
Sometimes the USB framework is configured to use a higher priority than the timers, but that causes trouble and it not not required at all for USB to work.
My advice here is to configure the stepper timer with the highest priority (i did the rewriting of that stepper timer, and it was written to work that way without issues) and that, unless a hw bug is there, should avoid all kind of glitches

@gloomyandy
Copy link
Contributor

@ejtagle so doesn't "CPSID i" disable all interrupts on a processor?

@ejtagle
Copy link
Contributor

ejtagle commented May 17, 2019

@gloomyandy : ARM interrupts do not work as 8bit (AVR) ISRs do. Quoting the ARM manual: (http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dai0321a/BIHBFEIB.html)

... "If an interrupt was already in the pending state, the processor accepts the interrupt after the “CPSIE I” is executed. However, additional instructions can be executed before the processor enters the exception handler"...

So, the "disabling" of interrupts is not inmediate disabling as it is on 8bit processors. All interrupt requests that were already accepted by NVIC, but not processed yet (so, the associated ISR code has not executed yet) WILL BE executed EVEN if the __disable_irq()/cpsid instruction was executed.
No NEW requests will be accepted,,,

The logic behind this is a little bit complex... My best advice here is NEVER to depend on interrupts being disabled for some code to work, and if you depend on such behaviour, then there are several extra steps required to ensure no interrupt will execute after a CPSID instruction is executed (i already implemented those steps probably for SAM (Arduino Due) and LPC (Freescale) processors.

Marlin code, as it is, does NOT depend on interrupts being disabled. I did a lot of work on the USART/Stepper/temperature ISRs to make sure there is no such dependency.

Proper NVIC ISR priority configuration is REQUIRED. IT IS NOT OPTATIVE, IT IS A MUST. The reason is that otherwise, as NVIC allows interrupt handlers to be preempted (= interrupts the execution of a given interrupt handler code to execute a higher priority interrupt handler), timing is just plainly DESTROYED.

The Stepper ISR MUST , MUST BE the ISR with the highest priority, or there will be no way to ensure reliable programming of the timer so next step pulse will happen at the right time. I can point exactly to the place of the code that has that critical requirement

@hobiseven
Copy link

@ejtagle
I will try as I do not have a jtag probe . However what i did is checked the direct reload value of the counter ( wrote it on fsmc pins) , made sure I wrote it in the register ( pin toggling before and after) and then also wrote then the counter value on fsmc and all is correct... so it is not even an interrupt issue anymore. Direct counter load failed and I have the logic analyser telling it. And you have a working board...

@ejtagle
Copy link
Contributor

ejtagle commented May 17, 2019

Humm... Let me reread my own notes... I seem to recall something... Direct reload failed... This seems to make me remember something...

@xC0000005
Copy link
Contributor

Can't you signal to generate an update on reload? I seem to recall the original STM32F1 HAL doing that. I'm experimenting with using priority mask rather than interrupt disabling in order to keep SDIO/DMA/USB/Serial running, but it won't fix this issue.

@ejtagle
Copy link
Contributor

ejtagle commented May 17, 2019

@hobiseven : I think the problem is the ARPE=1. I know it sounds counterintuitive. Let me explain why using ARPE=0 is causing issues, and how to fix them, and why ARPE=1 is a bad idea.

ARPE=0 has problems, just because the timer is never stopped. And that is fine! ... But, assume the ISR is happening too fast (perfectly possible, as the period of the timer is NOT constant, and can wildly vary, as the stepper timer is, in fact, used to perform at least 2 periodic tasks (there is a primitive scheduler implemented in lines 1273 to 1310. The period to the next interrupt is computed based on the requirements of Linear Advance and the stepper itself.

But at some point, perhaps, the estimated time of next firing has already passed when the compare register is reloaded. And a full timer rollover will have to happen (16ms) for the next ISR to fire.

Lines stepper.cpp:1348/1360 add a "margin"... That margin is how much time we want to leave between ISRs, but also serve as an estimation on how much time the TIMR module requires for a new COMPARE value to be set before the counter value reaches the compare value.

So, as a first measure, i would increase it to 2 (2 microseconds) and keep ARRE=0 (ARRE=1 is a bad idea, as the shadow register will only be loaded with the compare value when a match between the old compare value and the timer happens, causing horrible mistimings...

@gloomyandy
Copy link
Contributor

@ejtagle can you provide a reference that describes the exact behaviour of the NVIC/cpsid combination I'd like to understand it better. I've read the link you provided (and have read it before), but that describes what happens when you enable interrupts, not when you disable them using CPSID i. In the same doc there is no mention of interrupt code executing after the execution of CPSID i.

4.8. Disabling interrupts using CPS and MSR instructions
The CPSID instruction is self-synchronized to the instruction stream and there is no requirement to insert memory barrier instructions after CPSID.

Architectural requirements
ARM recommends that these architectural requirements are adopted.

Memory barrier instruction is not required.

Implementation requirements
Memory barrier instruction is not required.

Have I misunderstood? It seems strange that they would not mention that interrupts can be called after the CPSID instruction given the amount of detail provided about what happens following the enable. Is there some confusion here between using the NVIC_DisableIRQ call and using CPSID? My understanding is that in this case a barrier is required (see: https://dzone.com/articles/nvic-disabling-interrupts-on-arm-cortex-m-and-the).

The reason I ask is that on the LPC176x processors the stepper timer runs at priority 1 and there are other interrupts (SysTick possibly others) that run at priority 0 and so can pre-empt the stepper driver. With the code at line 1345 and onwards if CPSID i does not disable higher priority interrupts and take immediate effect then there is a danger that the code that follows the DISABLE_ISRS() which samples the current timer value could be pre-empted and thus add jitter into the calculation. Though perhaps that jitter is so small it will not matter?

Just to be clear I understand your points about correctly using the priority settings within the NVIC to handle all of this. I just want to make sure that my own understanding of how this works is correct?

@hobiseven
Copy link

@ejtagle
Hmmm ok. Initially ARPE was at 1, as per the default in libmaple. We tried it at 0, no change , but we have not increased the margin at the same time.
Will try that and let you know.

Regarding the indirect/ buffered reload it is working very nicely so far , and I was thinking to also buffer the stepper commands by one isr to realign everything . But if you say this is horrible...

@ejtagle
Copy link
Contributor

ejtagle commented May 17, 2019

@gloomyandy : Yes, the ARM manual does not state there are issues, but if you read the example of enabling interrupts, then you get the problem on disabling interrupts... IDK why they documented it so confusingly...

The "margin" i talk about should also consider the time spent in higher priority interrupts, if there are any. SysTick usually is just incrementing a counter, so 1uS should be more than enough. But yes, if we fail to properly guess, then problems will happen...
As far as i recall, SysTick cant be moved to a lower priority, but all the others SHOULD be moved to avoid troubles. There is no need for USART/USB/SPI/SDIO to run at higher priority than the stepper

@ejtagle
Copy link
Contributor

ejtagle commented May 17, 2019

@hobiseven : There is already code to properly handle not having a shadow register. The shadow register exists to avoid glitchs in applications where a glitch is not admisible, even at the cost of incorrect timing. In our case, we prefer perfect timing (if possible)

@gloomyandy
Copy link
Contributor

@ejtagle Perhaps we should take this discussion elsewhere (discord?)?

Maybe I'm just being dim but I don't see anything in the enable discussion or examples (that are pretty clear), that shows that interrupts can be called following CPSID. They do show that interrupts become pending, but that is not the same as calling them. All of the examples seem to show how a pending interrupt may be delayed by a few instructions after an enable (and steps to take to ensure that it will be called before another disable). I've also spent some time trying to find other sources of documentation that give details of how CPSID operates and I can't find any mention of it allowing interrupt code to be called after the CPSID instruction has been executed. Do you have any references?

But anyway I've disrupted this discussion enough with this, but I would like to get a final answer to it as it seems like a pretty crucial operation!

@thinkyhead
Copy link
Member

What is the current consensus on this PR? Does it still need more work before it will solve the issue properly, or is it okay as a workaround for now?

@tpruvot
Copy link
Contributor Author

tpruvot commented May 24, 2019

rebased after some cleanup

@hobiseven
Copy link

Fix fully ok now. And does what the initial code intended to do. No more isr cycle delay.
Tested ok on stm32f103 v and z flavors

@pinchies
Copy link
Contributor

pinchies commented Jun 8, 2019

this being pushed soon? Both myself and my testers are very satisfied with the results.

@Phr3d13
Copy link
Contributor

Phr3d13 commented Jun 10, 2019

Yeah, I'm interested to see if this fixes any issues with the two stm32-based boards I'm tinkering around with.

+ allow timer 8 in HAL and clean outdated comments/config

Without the timer reload feature, some stepper ISR could be skipped
which leads into a small time break in stepper pulses (around 16ms)
and can cause layer shifting if the bed is heavy.

To explain easily the behavoir, its like a car driver hiting a wall.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants