Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drivers: eth: stm32: Fix driver crash caused by RX IRQ trigger #25393

Merged
merged 1 commit into from
May 18, 2020

Conversation

bwasim
Copy link
Contributor

@bwasim bwasim commented May 16, 2020

All initialization of the Ethernet interface is done in the eth_initialize function which is invoked by the boot code. This function sets up DMA, programs the Ethernet module and enables IRQs. However, this function does not setup "netif" interface info which is done when the ethernet device is enumerated by the NET stack via the "iface_api.init" func. However, after the eth_initialize func is called, it is possible that the system receives RX interrupts, and the "rx_thread" accesses the "netif" pointer to get iface info. However, because the "netif" info is not necessarily populated at this time, we get a crash (as OS does NULL access).

Fixed by enabling Ethernet IRQ after the interface is properly setup.

Tested on Nucleo F767Zi board.

Fixes #25408

Signed-off-by: Bilal Wasim bilalwasim676@gmail.com

@bwasim
Copy link
Contributor Author

bwasim commented May 16, 2020

FYI, the problem shown by CI has nothing to do with the change set..

 * [new branch]      master     -> refs/west/master
HEAD is now at 7de2daa zephyr: cmake: TF-M libraries to be used by the ns-app
HEAD is now at 7de2daa zephyr: cmake: TF-M libraries to be used by the ns-app
ERROR: update failed for project hal_atmel

@bwasim bwasim closed this May 16, 2020
@bwasim bwasim reopened this May 16, 2020
Copy link
Member

@jukkar jukkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to the fix. If system has VLANs enabled, then this function will be called multiple times. Does it cause any side effects if the config function is also called multiple times? I see that mcux driver has same issue.

@erwango erwango added platform: STM32 ST Micro STM32 area: Debugging bug The issue is a bug, or the PR is fixing a bug and removed area: Debugging labels May 18, 2020
@erwango
Copy link
Member

erwango commented May 18, 2020

@bwasim , txs for this fix. However, since we're in V2.3.0 release process, we'll only merge bug fixes that matches a known issue. Can you add a issue for this bug ?
Also, I would be interested to understand how this issue happen, is that a regression, does it happen in particular cases only ?
Txs

@bwasim
Copy link
Contributor Author

bwasim commented May 18, 2020

Also, I would be interested to understand how this issue happen, is that a regression, does it happen in particular cases only ?

This is not a regression and has been present in the driver for a long time. The problem appears because we enable interrupts before the "netif" pointer is populated in the device structure.

If we get an RX interrupt before the population of the pointer, we see a crash as the "rx_thread" uses this information.

Can you add a issue for this bug ?

Given that this is not a regression, do you still want me to open a bug report for this ?

@bwasim
Copy link
Contributor Author

bwasim commented May 18, 2020

+1 to the fix. If system has VLANs enabled, then this function will be called multiple times. Does it cause any side effects if the config function is also called multiple times? I see that mcux driver has same issue.

No problem in calling this function multiple times, though I think that it will be called only once per Ethernet instance.

If you want, I can add changes for the MCUx driver also, or create a separate issue / PR for it.

@pfalcon
Copy link
Contributor

pfalcon commented May 18, 2020

Given that this is not a regression, do you still want me to open a bug report for this ?

@bwasim, That's a requirement for bugfixes merged during pre-release freeze window (as it is now). So, if you'd like to get this into the 2.3 release, please create such a ticket/update commit message with "Fixes: #ticket_no".

@bwasim bwasim force-pushed the f767zi-eth-crash branch from 0880d9f to ae5b59b Compare May 18, 2020 09:57
@bwasim
Copy link
Contributor Author

bwasim commented May 18, 2020

Given that this is not a regression, do you still want me to open a bug report for this ?

@bwasim, That's a requirement for bugfixes merged during pre-release freeze window (as it is now). So, if you'd like to get this into the 2.3 release, please create such a ticket/update commit message with "Fixes: #ticket_no".

Created #25408 and updated commit message. Thanks..

@pfalcon pfalcon added this to the v2.3.0 milestone May 18, 2020
@jukkar
Copy link
Member

jukkar commented May 18, 2020

No problem in calling this function multiple times, though I think that it will be called only once per Ethernet instance.

Yes, but if VLANs are enabled, then there will be one Ethernet device, but multiple network interfaces that are tied to that device. So the network interface init function will be called multiple times in that case. The mcux init seems to do things the same way as stm32 one, gmac driver checks this and only does relevant "one time things" only once. If you are into it, then separate PR could be created that fixes this both here and mcux.
Note that in the driver eth_iface_init(), the

	if (dev_data->iface == NULL) {
		dev_data->iface = iface;
	}

could be used for this purposes, so that the things that are suppose to be done only once could be placed inside the if.

@erwango
Copy link
Member

erwango commented May 18, 2020

This is not a regression and has been present in the driver for a long time. The problem appears because we enable interrupts before the "netif" pointer is populated in the device structure.

@bwasim I was trying to understand how happen it was not detected/reported before.

@erwango erwango added priority: high High impact/importance bug area: Ethernet labels May 18, 2020
All initialization of the Ethernet interface is done in the
eth_initialize function which is invoked by the boot code.
This function sets up DMA, programs the Ethernet module and
enables IRQs. However, this function does not setup "netif"
interface info which is done when the ethernet device is
enumerated by the NET stack via the "iface_api.init" func.
However, after the eth_initialize func is called, it is
possible that the system receives RX interrupts, and the
"rx_thread" accesses the "netif" pointer to get iface info.
However, because the "netif" info is not necessarily
populated at this time, we get a crash (as OS does NULL
access).

Fixed by enabling Ethernet IRQ after the interface is
properly setup.

Tested on Nucleo F767Zi board.

Fixes zephyrproject-rtos#25408

Signed-off-by: Bilal Wasim <bilalwasim676@gmail.com>
@bwasim bwasim force-pushed the f767zi-eth-crash branch from ae5b59b to 563c26f Compare May 18, 2020 12:31
@bwasim
Copy link
Contributor Author

bwasim commented May 18, 2020

The mcux init seems to do things the same way as stm32 one, gmac driver checks this and only does relevant "one time things" only once.

Looking at the code, it shouldn't be a problem if we do it multiple times but I've updated to do this only once in the STM32 Ethernet device because that makes more logical sense. I see that the gmac / mcux have the same problem in them which can be addressed in separate PR..

@bwasim I was trying to understand how happen it was not detected/reported before.

@erwango , this problem only shows up if there is continuous data on the network while Ethernet is initializing which results in interrupts being triggered as soon as we enable IRQ / perform hardware init.. This use-case is reasonably common in local networks, but I don't think this was tested before..

@bwasim bwasim requested a review from jukkar May 18, 2020 13:23
@carlescufi carlescufi merged commit 10a0501 into zephyrproject-rtos:master May 18, 2020
@bwasim bwasim deleted the f767zi-eth-crash branch May 18, 2020 17:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Ethernet bug The issue is a bug, or the PR is fixing a bug platform: STM32 ST Micro STM32 priority: high High impact/importance bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

STM32 Ethernet Driver: Fix driver crash caused by RX IRQ trigger
5 participants