-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
soundwire: bus: fix io error when processing alert event #1589
soundwire: bus: fix io error when processing alert event #1589
Conversation
@YvonneYang2 hi, can you help to validate this PR on soundwire-latest on comet lake for PV? please remove RP1580 first. Thanks! |
@RanderWang |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry @RanderWang I think we need more details here, see below.
drivers/soundwire/bus.c
Outdated
@@ -880,20 +880,26 @@ static int sdw_handle_slave_alerts(struct sdw_slave *slave) | |||
|
|||
sdw_modify_slave_status(slave, SDW_SLAVE_ALERT); | |||
|
|||
ret = pm_runtime_get_sync(&slave->dev); | |||
if (ret < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add the usual && ret != -EACCESS
this is needed in case the device doesn't support pm_runtime
@@ -1013,6 +1019,10 @@ static int sdw_handle_slave_alerts(struct sdw_slave *slave) | |||
if (count == SDW_READ_INTR_CLEAR_RETRY) | |||
dev_warn(slave->bus->dev, "Reached MAX_RETRY on alert read\n"); | |||
|
|||
io_err: | |||
pm_runtime_mark_last_busy(&slave->dev); | |||
pm_runtime_put(&slave->dev); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commit message doesn't really describe the issue and I am having a hard time convincing myself.
An alert can only come when the bus is operational and the device is enumerated with a non-zero device number.
So in theory read/write access to registers should not be an issue.
It would really help if you could describe in more details what the root cause is, so that we can justify this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@plbossart For rt711, the suspend function only change regmap to cache only. And in this case, slave cant access its register when processing jack event. how about to get pm up when dispatching irq event to slave only ? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RanderWang I see your point but help me understand when the Slave device would be pm_runtime suspended when you get an interrupt. is this in the case where the master is active but the slave suspends? Is this in the case where you do clock stop with the DSP active? Is this in the case where you do clock stop and a complete re-enumeration.
I trust that this is real, but since I didn't do any experiment on my side I can only rely on your explanation to reconstruct the idea. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RanderWang I see your point but help me understand when the Slave device would be pm_runtime suspended when you get an interrupt. is this in the case where the master is active but the slave suspends? Is this in the case where you do clock stop with the DSP active?
Is this in the case where you do clock stop and a complete re-enumeration.
if DSP is active, so device is in D0ix, driver doesn't need a complete re-enumeration and clock stop can works as the flow defined in spec. If device is in D3 and we still need to enable clock stop, slave may send unattached or alert event (actually alert event is rare and it should be incorrect for slave lost sync with master, and I got it only on one device) to master when plugging a headphone. I got Io timeout when alert event was sent to master. This pm sync can fix this io timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, not getting your points. There are multiple devices, the PCI, master and Slave ones.
There is also a difference between the wake in clock stop, and alerts that can only happen when the device is attached and with a non-zero Device Number.
If the PCI device is in D3, the wake from the Slave will result in a bus reset and re-enumeration. The Slave will become re-attached.
So overall I just don't have any idea of when this IO error happens? Can you try to write a sequence (maybe with bullet points to that the transitions are clear), that may be more helpful than sentences with and/or statements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, not getting your points. There are multiple devices, the PCI, master and Slave ones.
There is also a difference between the wake in clock stop, and alerts that can only happen when the device is attached and with a non-zero Device Number.
If the PCI device is in D3, the wake from the Slave will result in a bus reset and re-enumeration. The Slave will become re-attached.
So overall I just don't have any idea of when this IO error happens? Can you try to write a sequence (maybe with bullet points to that the transitions are clear), that may be more helpful than sentences with and/or statements.
yes, it is good idea to make it more clear
if the PCI device is in D3 and clock stop mode 0 is enabled
(1) plug a headphone, the alert event sent by slave is captured by glue hw, not master
(2) glue hw will wake up pci device and power on master, then initialize the master
(3) master enable clk and bus data to communicate with slave
(4) slave found the synchronization is not correct, and send alert event or unattached event to master. here I don't know why slave send alert event, and at this time IO transfer timed out when accessing slave register (sdw_read(slave, SDW_SCP_INT1)). And I also checked slave status by cdns_readl(cdns, CDNS_MCP_SLAVE_STATE) when IO transfer timed out for alert event, the status showed slave was unattached! So I think if master is powered off, we need to reset bus before communicating with slave.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something is very odd here.
I reproduced this error when the PCI device is active, the master is active and the slave device is suspended
That's not at all what your description is.
I took your code but I'd really like to understand this better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@plbossart , I mentioned in my comments: it is rare, and last week it also could be reproduced on one device at @bardliao. He tested my branch https://github.com/RanderWang/linux/tree/my_clk_stop_jack_work and also got such issue.
I could easily reproduced it with the branch in this PR #1498 : in this branch, I didn't reset bus and clear slave status to unattached.
please check my comments and log. #1498 (comment)
please check line 307 in the log: master got a alert event
https://github.com/thesofproject/linux/files/3874045/sdw_io_timeout.log
When a Slave device becomes synchronized with the bus, it may report its presence in PING frames, as well as optionally asserting an in-band PREQ signal. The bus driver will detect a new Device0, start the enumeration process and assign it a non-zero device number. The SoundWire enumeration provides an arbitration to deal with multiple Slaves reporting ATTACHED at the same time. The bus driver will also invoke the driver .probe() callback associated with this device. The probe() depends on the Linux device core, which handles the match operations and may result in modules being loaded. Once the non-zero device number is programmed, the Slave will report its new status in PING frames and the Master hardware will typically report this status change with an interrupt. At this point, the .update_status() callback of the codec driver will be invoked (usually from an interrupt thread or workqueue scheduled from the interrupt thread). The first race condition which can happen is between the .probe(), which allocates the resources, and .update_status() where initializations are typically handled. The .probe() is only called once during the initial boot, while .update_status() will be called for every bus hardware reset and if the Slave device loses synchronization (an unlikely event but with non-zero probability). The time difference between the end of the enumeration process and a change of status reported by the hardware may be as small as one SoundWire PING frame. The scheduling of the interrupt thread, which invokes .update_status() is not deterministic, but can be small enough to create a race condition. With a 48 kHz frame rate and ideal scheduling cases, the .probe() may be pre-empted within double-digit microseconds. Since there is no guarantee that the .probe() completes by the time .update_status() is invoked as a result of an interrupt, it's not unusual for the .update_status() to rely on data structures that have not been allocated yet, leading to kernel oopses. This patch adds a probe_complete utility, which is used in the sdw_update_slave_status() routine. The codec driver does not need to do anything and can safely assume all resources are allocated in its update_status() callback. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
When the Master starts the bus (be it during the initial boot or system resume), it usually performs a HardReset to make sure electrical levels are correct, then enables the control channel. While the PM framework guarantees that the Slave devices will only become 'active' once the Master completes the bus initialization, there is still a risk of a race condition: the Slave enumeration is handled in a separate interrupt thread triggered by hardware status changes, so the Slave device may not be ready to accept commands when the Slave driver tries to access the registers and restore settings in its resume or pm_runtime_resume callbacks. In those cases, any read/write commands from/to the Slave device will result in a timeout. This patch adds an enumeration_complete structure. When the bus is goes through a HardReset sequence and restarted, the Slave will be marked as UNATTACHED, which will result in a call to init_completion(). When the Slave reports its presence during PING frames as a non-zero Device, the Master hardware will issue an interrupt and the bus driver will invoke complete(). The order between init_completion()/complete() is predictable since this is a Master-initiated transition. The Slave driver may use wait_for_completion() in its resume callback. When regmap is used, the Slave driver will typically set its regmap in cache-only mode on suspend, then on resume block on wait_for_completion(&enumeration_complete) to guarantee it is safe to start read/write transactions. It may then exit the cache-only mode and use a regmap_sync to restore settings. All these steps are optional, their use completely depends on the Slave device capabilities and how the Slave driver is implemented. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Slave drivers may have different ways of handling their settings, with or without regmap. During the integration of codec drivers, done in partnership between Intel and Realtek, it became desirable to implement a predictable order between low-level initializations performed in .update_status() (invoked by an interrupt thread) and the settings restored in the resume steps (invoked by the PM core). This patch builds on the previous solution to wait for the Slave device to be fully enumerated. The complete() in this case is signaled not before the .update_status() is called, but after .update_status() returns. Without this patch, the settings were not properly restored, leading to timing-dependent 'no sound after resume' or 'no headset detected after resume' bug reports. Depending on how initialization is handled, a Slave device driver may wait for enumeration_complete, or for initialization_complete, both are valid synchronization points. They are initialized at the same time, they only differ on when complete() is invoked. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
…nces The Slave device initialization can be split in 4 different cases: 1. Master-initiated hardware reset, system suspend-resume and pm_runtime based on clock-stop mode1. To avoid timeouts and a bad audio experience, the Slave device resume operations need to wait for the Slave device to be re-enumerated and its settings restored. 2. Exit from clock-stop mode0. In this case, the Slave device is required to remain enumerated and its context preserved while the clock is stopped, so no re-initialization or wait_for_completion() is necessary. 3. Slave-initiated pm_runtime D3 transition. With the parent child relationship, it is possible that a Slave device becomes 'suspended' while its parent is still 'active' with the bus clock still toggling. In this case, during the pm_runtime resume operation, there is no need to wait for any settings to be restored. 4. Slave reset (sync loss or implementation-defined). In that case the bus remains operational and the Slave device will be re-initialized when it becomes ATTACHED again. In previous patches, we suggested the use of wait_for_completion() to deal with the case #1, but case thesofproject#2 and thesofproject#3 do not need any wait. To account for those differences, this patch adds an unattach_request field. The field is explicitly set by the Master for the case #1, and if non-zero the Slave device shall wait on resume. In all other cases, the Slave resume operations can proceed without wait. The only request tracked so far is Master HardReset, but the request is declared as a bit mask for future extensions (if needed). The definition for this value is added in bus.h and does not need to be exposed in sdw.h Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
The current interfaces between ASoC and SoundWire are limited by the platform_device infrastructure to an init() and exit() (mapped to the platform driver.probe and .remove) To help with the platform detection, machine driver selection and management of power dependencies between DSP and SoundWire IP, the ASoC side requires: a) an ACPI scan helper, to report if any devices are exposed in the DSDT tables, and if any links are disabled by the BIOS. b) a probe helper that allocates the resources without actually starting the bus. c) a startup helper which does start the bus when all power dependencies are settled. d) an exit helper to free all resources e) an interrupt_enable/disable helper, typically invoked after the startup helper but also used in suspend routines. This patch moves all required interfaces to sdw_intel.h, mainly to allow SoundWire and ASoC parts to be merged separately once the header files are shared between trees. To avoid compilation issues, the conflicts in intel_init.c are blindly removed. This would in theory prevent the code from working, but since there are no users of the Intel Soundwire driver this has no impact. Functionality will be restored when the removal of platform devices is complete. Support for SoundWire + SOF builds will only be provided once all the required pieces are upstream. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
…erations The SoundWire DAIs for Intel platform are created in drivers/soundwire/intel.c, while the communication with the Intel DSP is all controlled in soc/sof/intel When the DAI status changes, a callback is used to bridge the gap between the two subsystems. The naming of the existing 'config_stream' callback does not map well with any of ALSA/ASoC concepts. This patch renames it as 'params_stream' to be more self-explanatory. A new 'free_stream' callback is added in case any resources allocated in the 'params_stream' stage need to be released. In the SOF implementation, this is used in the hw_free case to release the DMA channels over IPC. These two callbacks now rely on structures which expose the link_id and alh_stream_id (required by the firmware IPC), instead of a list of parameters. The 'void *' definitions are changed to use explicit types, as suggested on alsa-devel during earlier reviews. Signed-off-by: Rander Wang <rander.wang@linux.intel.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
The existing use of 6 handlers is problematic in MSI mode. Update headers so that all shared interrupts can be handled with a single handler. Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
48a6255
to
3deb2e8
Compare
1c5ee97
to
65c317c
Compare
65c317c
to
3c1b01d
Compare
3c1b01d
to
90bbb70
Compare
3deb2e8
to
ad96056
Compare
…read In MSI mode, the use of separate handlers and threads for the Intel IPC, stream and SoundWire shared interrupt leads to timeouts and lost interrupts. The solution is to merge all interrupt handling across all links with a single thread function. The use of a linked list enables this thread function to walk through all contexts and figure out which link needs attention. Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
In ClockStop mode, the PCI device will be notified of a wake, which will be handled from an interrupt thread. Signed-off-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Some of the Intel SoundWire SHIM registers contain fields for different links. Without protection, the master drivers for the different links will access these shared registers, leading to invalid configurations and timeouts (specifically when changing CPA/SPA power-related registers and polling for the changes to be applied). A mutex is added to make sure all rmw access to those registers are serialized. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Due to power rail dependencies, the SoundWire Master driver cannot make decisions on its own when entering pm runtime suspend. Add quirk mask for each link, so that the SOF parent driver can inform the SoundWire master driver of the desired behavior: a) leave clock on b) power-off instead of clock stop c) power-off if all devices cannot generate wakes d) force bus reset on clock restart Note that for now the interface with the SOF driver relies on a single mask for all links. If needed, the interface might be modified at a later point to provide more freedom. The code at the lower level does not assume any commonality between links. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Another Dell laptop uses SOF with CML platform Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
We will free params when we goto out in sdw_compute_port_params(). But we can't free params if it is not allocated successfully. Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Fix cppcheck warning: drivers/soundwire/cadence_master.c:992:9: style: Variable 'offset' is assigned a value that is never used. [unreadVariable] offset += stream->num_out; ^ Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
…nto integration/soundwire-latest Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
…ntegration/soundwire-latest Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
…egration/soundwire-latest Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
…o integration/soundwire-latest Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
…rs' into integration/soundwire-latest Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
…into integration/soundwire-latest Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
90bbb70
to
ea8a2e0
Compare
ea8a2e0
to
cc704f5
Compare
There are two types of io error when processing alert event. The first one is: master receives an alert event for jack event and invokes implement_def function in slave to check jack status. At this time codec is just suspended, so io registers can't be accessed. Another one is: when waking up from clock stop state and bus needs a complete re-enumeration for synchronization issue , slave register can't be accessed. This patch wakes up codec and wait for initialization complete when processing slave alert event, so that io registers can be accessed. Signed-off-by: Rander Wang <rander.wang@intel.com>
update my comments based on the feedback from @ranj063 |
c6a78ee
to
c459c02
Compare
f4a944b
to
7401f8f
Compare
this patch is already merged on integration/soundwire-debug-fixes |
When audio system is going into suspend state, plug and unplug
headphone a few times, then codec driver will report io error
when doing jack detection. At this time codec is just suspended,
so io registers can't be accessed.
This patch wakes up codec when processing slave alert interrupt
triggered by jack or other events, so that io registers can be
accessed.