drivers: serial: uart_ns16550: add missing isr locking #23693
All checks are passing now. Tip: The bot edits this comment instead of posting a new one, so you can check the comment's history to see earlier messages. |
dbd47d9 to 4436150
06a850b to 2c0074c
The automated tests are passing, however the issue/bug #23026 is still observed with: I'll keep it as WIP until it fixes the issue. |
2c0074c to 3a9ed99
To reproduce the failed Shippable test - The test is successful before this patch, but fails with it. Trying to figure out why then I'll submit an update. |
3a9ed99 to 4abc53b
This 1 test is still failing, but moving out of Draft will get more eyes on it as a PR so we can get code review. |
please do not put bug numbers in commit titles:
3e55462 to 7b3bf49
I tested today on master 7201c1d - No modification or cherry-pick.
Qemu_x86_64
Up_squared
Qemu_x86_64
Up_squared
Qemu_x86_64
Up_squared
Looks like shippable is still getting a stack overflow failure in the mailbox test:
This looks like a stack overflow: it's trying to write to a guard page at 0x11dff8, and notice that RSP is 0x000000000011e000. It's probably one of the test threads. Can you find out which thread is 0x000000000011b050 and kick up its stack space a bit? Are the stack sizes in the test adding CONFIG_TEST_EXTRA_STACKSIZE (or whatever it's called) to their stack sizes?
Thanks for looking into it! I tried this in the past,
but no effect. I'm not sure if it uses CONFIG_TEST_EXTRA_STACKSIZE, I'll take a look. Thanks for pointing that out. |
7b3bf49 to d337e80
I added CONFIG_IPM_CONSOLE_STACK_SIZE=2048 in |
Still investigating why test_uart_fifo_fill is getting stuck on qemu_x86, qemu_x86_64, and up_squared with this PR. The same happens with samples/subsys/console/echo on qemu_x86_64.
@jenmwms please let me know when this is ready to merge |
drivers/serial/uart_ns16550.c
Outdated
@@ -505,11 +521,15 @@ static int uart_ns16550_poll_in(struct device *dev, unsigned char *c)
static void uart_ns16550_poll_out(struct device *dev,
				  unsigned char c)
{
	k_spinlock_key_t key = k_spin_lock(&DEV_DATA(dev)->lock);

	/* wait for transmitter to ready to accept a character */
	while ((INBYTE(LSR(dev)) & LSR_THRE) == 0) {
Relevant to the discussion earlier: this is a potential deadlock right here, though unlikely to be the source of the hang mentioned above. This is going to spin with the lock held waiting for the hardware FIFO to drain. But some uarts (at least in theory) have hardware implementations of flow control, which means that the device on the other side is capable of stalling the FIFO at will and causing a hang locally.
This while loop needs to go outside a single block that does (atomically) "can I send a byte? if so, send it and break out of the loop". And it's probably good practice to put in a busy wait of ~1 byte transfer time (87us at 115200bps) to avoid spamming the lock and/or banging the hardware too fast.
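A minimal sketch of the structure suggested above, written against a simulated UART so it can run anywhere. Every name here (`mock_uart`, `mock_lock`, `mock_busy_wait`, `try_send_byte`, `poll_out`) is a hypothetical stand-in and not the driver's or Zephyr's API; only the `LSR_THRE` status bit mirrors the 16550 register layout. The 87 us figure assumes a 10-bit frame (start, 8 data, stop) at 115200 bps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Everything below is a hypothetical stand-in, not the Zephyr or ns16550 API. */
#define LSR_THRE 0x20u /* "transmit holding register empty" status bit */
/* One 10-bit frame (start + 8 data + stop) at 115200 bps, rounded up: 87 us. */
#define BYTE_TIME_US ((10u * 1000000u + 115200u - 1u) / 115200u)

struct mock_uart {
	uint8_t lsr;          /* simulated line status register */
	uint8_t thr;          /* simulated transmit holding register */
	unsigned busy_polls;  /* status reads left before THRE is raised */
	unsigned waited_us;   /* total time spent in the simulated busy wait */
	bool locked;          /* simulated spinlock state */
};

static void mock_lock(struct mock_uart *u)
{
	assert(!u->locked); /* taking a held spinlock would deadlock */
	u->locked = true;
}

static void mock_unlock(struct mock_uart *u)
{
	u->locked = false;
}

/* Simulated register read: the transmitter frees up after busy_polls reads. */
static uint8_t mock_read_lsr(struct mock_uart *u)
{
	if (u->busy_polls > 0 && --u->busy_polls == 0) {
		u->lsr |= LSR_THRE;
	}
	return u->lsr;
}

static void mock_busy_wait(struct mock_uart *u, unsigned usec)
{
	u->waited_us += usec;
}

/* Bounded critical section: check the status and send at most one byte. */
static bool try_send_byte(struct mock_uart *u, uint8_t c)
{
	bool sent = false;

	mock_lock(u);
	if ((mock_read_lsr(u) & LSR_THRE) != 0) {
		u->thr = c;
		u->lsr &= (uint8_t)~LSR_THRE; /* transmitter is busy again */
		sent = true;
	}
	mock_unlock(u);
	return sent;
}

/* The retry loop lives outside the lock, so a stalled FIFO never holds it. */
static void poll_out(struct mock_uart *u, uint8_t c)
{
	while (!try_send_byte(u, c)) {
		mock_busy_wait(u, BYTE_TIME_US);
	}
}
```

If the remote side stalls the transmitter, this loop still spins, but the lock is released between attempts, so an ISR or another CPU can take it in the meantime.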
ah, wonderful. makes sense
That sounds similar to what's going on in https://github.com/zephyrproject-rtos/zephyr/pull/25064/files? I tried that approach but did not resolve the hang. Still investigating
@andyross Interesting. There are a few functions that seem to fit this scenario...perhaps I need to make the same adjustment with a while loop outside a single block that does the checking/doing, yeah? uart_ns16550_poll_out, uart_ns16550_poll_in, uart_ns16550_fifo_fill, and uart_ns16550_fifo_read.
FWIW: I tried fixing all those too but still hanging as before.
@jenmwms: Mind that there's a huge difference between poll_out/poll_in vs fifo_fill/fifo_read. Please refer to the docstrings: https://github.com/zephyrproject-rtos/zephyr/blob/master/include/drivers/uart.h#L703 . To quote important points:
This function is expected to be called from UART interrupt handler (ISR)
Result of calling this function not from an ISR is undefined
Likewise, not calling this function from an ISR if uart_irq_tx_ready() returns true may lead to undefined behavior
As pointed out by @andyross, normally, one doesn't use locking in ISRs. Thus, a solution suitable for poll_out/poll_in wouldn't be suitable for fifo_fill/fifo_read, and vice versa: a solution for fifo_fill/fifo_read might not be the best for poll_out/poll_in.
@pfalcon Thanks for the info and link, it is helpful. I'll keep that in mind as I keep working on this.
@jenmwms Almost always yes. The point of a spinlock[1] is that it needs to enclose a critical section with firmly bounded runtime. It's not like you can never use a "while" in there syntactically, but never in a circumstance where you aren't 100% sure exactly how long it will be until you unlock.
[1] In production code. In the test suite we do all kinds of abusive things to locking paradigms to isolate stuff we want to test.
@pfalcon: You still need locking in ISRs, though the rules get more complicated. Almost all Zephyr platforms support nested interrupts, so it's rarely actually true that you know you won't be interrupted. And the driver in question is used on SMP systems, where you can have an interrupt racing against code on another CPU.
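As an illustration of a critical section with firmly bounded runtime, here is a small sketch of a FIFO fill routine against a simulated device. All names (`sim_uart`, `sim_lock`, `sim_fifo_fill`) and the 16-byte FIFO depth are assumptions made for the example, not the ns16550 driver's code. The loop inside the lock runs at most once per FIFO slot and never waits on hardware state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the Zephyr or ns16550 API. */
#define SIM_FIFO_DEPTH 16u /* assumed hardware TX FIFO depth */

struct sim_uart {
	uint8_t fifo[SIM_FIFO_DEPTH]; /* simulated TX FIFO */
	size_t used;                  /* bytes currently queued */
	bool locked;                  /* simulated spinlock state */
};

static void sim_lock(struct sim_uart *u)
{
	assert(!u->locked); /* a second taker would spin forever */
	u->locked = true;
}

static void sim_unlock(struct sim_uart *u)
{
	u->locked = false;
}

/*
 * Copies as many bytes as fit right now and returns the count.
 * The loop runs at most SIM_FIFO_DEPTH times and never waits for the
 * hardware, so the time spent holding the lock is bounded.
 */
static size_t sim_fifo_fill(struct sim_uart *u, const uint8_t *data, size_t len)
{
	size_t n = 0;

	sim_lock(u);
	while (n < len && u->used < SIM_FIFO_DEPTH) {
		u->fifo[u->used++] = data[n++];
	}
	sim_unlock(u);
	return n;
}
```

The caller gets a short count when the FIFO is full and decides when to retry, which keeps any waiting outside the locked region.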
@andyross, Right, someone may need locking in ISRs, and the rules for dealing with those cases are complicated, so I didn't try to mention them in my comment.
Sorry, forgot this was in two places. See comment at #25306 (comment). That's almost certainly where the deadlock is.
Tagged as DNM until you've had a chance to fix any unresolved issues. |
deadlocks on a few tests
The existing ns16550 UART driver lacked ISR locking, which affected the IO APIC working in fixed delivery mode on SMP x86_64 systems. This commit adds an ISR locking mechanism using a spinlock for the interrupt-related services. CONFIG_IPM_CONSOLE_STACK_SIZE is increased to lift the stack size limitation experienced in the IPM driver test with this spinlock implementation.

Fixes zephyrproject-rtos#23026

Signed-off-by: Jennifer Williams <jennifer.m.williams@intel.com>
d337e80 to e948cee
This suggested edit still remains (but could be an enhancement in the interest of time?) from @andyross to implement:
Hi, sorry to revive this discussion. As explained by @andyross, blocking all interrupts while waiting for the UART FIFO to be empty is a very bad idea:
I don't understand how this patch could be merged as-is. Plus I don't see the need to lock interrupts while waiting for the UART fifo to be empty.
ISR locking was missing in the UART driver for x86_64. This PR adds a spinlock mechanism to interrupt related services in the driver.
Fixes #23026