-
Notifications
You must be signed in to change notification settings - Fork 5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Small optimizations #781
Small optimizations #781
Conversation
jeanlemotan
commented
Feb 1, 2015
- Optimized redundant dereferencing of the state
- Cached the register for FIQ_WRITE and FIQ_READ
- collapsed come redundant branches and added early outs where possible
- Optimized redundant dereferencing of the state - Cached the register for FIQ_WRITE and FIQ_READ - collapsed come redundant branches and added early outs where possible
@P33M for review. |
Have you looked at the assembler output for the functions before and after your "optimisations"? GCC will perform the computation of an offset once if the same offset is referenced multiple times in a function call. This applies to both register offsets and struct offsets. I suspect that GCC is already doing automatically what you are doing manually. Likewise in fiq_fsm_tt_in_use() - is there any evidence in the emitted code that the fast return is any quicker in terms of number of instructions than the subsequent value test and break? Your choice of hw register name causes the code to be less readable - "reg + offset"? Shouldn't that be a "channel_base + offset"? |
You're right, reg_base would be clearer than "reg" as some offset is always added. The early return in fiq_fsm_tt_in_use is mostly for clarity IMO. The code was essentially breaking the loop to return if the channel was used and I think the intention is clearer now. Collapsing the branch (st->channel[i].fsm == FIQ_PASSTHROUGH) into the switch statement is probably irrelevant in terms of speed but again - I think it streamlines the function a bit. All the decisions based on the fsm state are taken in the switch. |
Here's the asm for fiq_fsm_tt_in_use: Before the commit: After: The difference is not significant - only 2 instruction. It looks that GCC does better register allocation. I expected a more dramatic difference tbh. So I would discard the register caching and the st->channel[n] change as it's very big and with insignificant impact. |
I'm closing this. I will make a new pull request for the cosmetic changes later on. |
The renesas-intc-irqpin interrupt controller is cascaded to the GIC. Hence when propagating wake-up settings to its parent interrupt controller, the following lockdep warning is printed: ============================================= [ INFO: possible recursive locking detected ] 4.2.0-armadillo-10725-g50fcd7643c034198 raspberrypi#781 Not tainted --------------------------------------------- s2ram/1179 is trying to acquire lock: (&irq_desc_lock_class){-.-...}, at: [<c005bb54>] __irq_get_desc_lock+0x78/0x94 but task is already holding lock: (&irq_desc_lock_class){-.-...}, at: [<c005bb54>] __irq_get_desc_lock+0x78/0x94 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&irq_desc_lock_class); lock(&irq_desc_lock_class); *** DEADLOCK *** May be due to missing lock nesting notation 7 locks held by s2ram/1179: #0: (sb_writers#7){.+.+.+}, at: [<c00c9708>] __sb_start_write+0x64/0xb8 #1: (&of->mutex){+.+.+.}, at: [<c0125a00>] kernfs_fop_write+0x78/0x1a0 #2: (s_active#23){.+.+.+}, at: [<c0125a08>] kernfs_fop_write+0x80/0x1a0 #3: (autosleep_lock){+.+.+.}, at: [<c0058244>] pm_autosleep_lock+0x18/0x20 #4: (pm_mutex){+.+.+.}, at: [<c0057e50>] pm_suspend+0x54/0x248 #5: (&dev->mutex){......}, at: [<c0243a20>] __device_suspend+0xdc/0x240 #6: (&irq_desc_lock_class){-.-...}, at: [<c005bb54>] __irq_get_desc_lock+0x78/0x94 stack backtrace: CPU: 0 PID: 1179 Comm: s2ram Not tainted 4.2.0-armadillo-10725-g50fcd7643c034198 Hardware name: Generic R8A7740 (Flattened Device Tree) [<c00129f4>] (dump_backtrace) from [<c0012bec>] (show_stack+0x18/0x1c) [<c0012bd4>] (show_stack) from [<c03f5d94>] (dump_stack+0x20/0x28) [<c03f5d74>] (dump_stack) from [<c00514d4>] (__lock_acquire+0x67c/0x1b88) [<c0050e58>] (__lock_acquire) from [<c0052df8>] (lock_acquire+0x9c/0xbc) [<c0052d5c>] (lock_acquire) from [<c03fb068>] (_raw_spin_lock_irqsave+0x44/0x58) [<c03fb024>] (_raw_spin_lock_irqsave) from [<c005bb54>] (__irq_get_desc_lock+0x78/0x94 [<c005badc>] (__irq_get_desc_lock) from [<c005c3d8>] (irq_set_irq_wake+0x28/0x100) [<c005c3b0>] (irq_set_irq_wake) from [<c01e50d0>] (intc_irqpin_irq_set_wake+0x24/0x4c) [<c01e50ac>] (intc_irqpin_irq_set_wake) from [<c005c17c>] (set_irq_wake_real+0x3c/0x50 [<c005c140>] (set_irq_wake_real) from [<c005c414>] (irq_set_irq_wake+0x64/0x100) [<c005c3b0>] (irq_set_irq_wake) from [<c02a19b4>] (gpio_keys_suspend+0x60/0xa0) [<c02a1954>] (gpio_keys_suspend) from [<c023b750>] (platform_pm_suspend+0x3c/0x5c) Avoid this false positive by using a separate lockdep class for INTC External IRQ Pin interrupts. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Cc: Magnus Damm <magnus.damm@gmail.com> Cc: Jason Cooper <jason@lakedaemon.net> Link: http://lkml.kernel.org/r/1441798974-25716-3-git-send-email-geert%2Brenesas@glider.be Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ Upstream commit f0212a5 ] Running with KASAN on a VIM3L systems leads to the following splat when probing the Ethernet device: ================================================================== BUG: KASAN: global-out-of-bounds in _get_maxdiv+0x74/0xd8 Read of size 4 at addr ffffa000090615f4 by task systemd-udevd/139 CPU: 1 PID: 139 Comm: systemd-udevd Tainted: G E 5.7.0-rc1-00101-g8624b7577b9c raspberrypi#781 Hardware name: amlogic w400/w400, BIOS 2020.01-rc5 03/12/2020 Call trace: dump_backtrace+0x0/0x2a0 show_stack+0x20/0x30 dump_stack+0xec/0x148 print_address_description.isra.12+0x70/0x35c __kasan_report+0xfc/0x1d4 kasan_report+0x4c/0x68 __asan_load4+0x9c/0xd8 _get_maxdiv+0x74/0xd8 clk_divider_bestdiv+0x74/0x5e0 clk_divider_round_rate+0x80/0x1a8 clk_core_determine_round_nolock.part.9+0x9c/0xd0 clk_core_round_rate_nolock+0xf0/0x108 clk_hw_round_rate+0xac/0xf0 clk_factor_round_rate+0xb8/0xd0 clk_core_determine_round_nolock.part.9+0x9c/0xd0 clk_core_round_rate_nolock+0xf0/0x108 clk_core_round_rate_nolock+0xbc/0x108 clk_core_set_rate_nolock+0xc4/0x2e8 clk_set_rate+0x58/0xe0 meson8b_dwmac_probe+0x588/0x72c [dwmac_meson8b] platform_drv_probe+0x78/0xd8 really_probe+0x158/0x610 driver_probe_device+0x140/0x1b0 device_driver_attach+0xa4/0xb0 __driver_attach+0xcc/0x1c8 bus_for_each_dev+0xf4/0x168 driver_attach+0x3c/0x50 bus_add_driver+0x238/0x2e8 driver_register+0xc8/0x1e8 __platform_driver_register+0x88/0x98 meson8b_dwmac_driver_init+0x28/0x1000 [dwmac_meson8b] do_one_initcall+0xa8/0x328 do_init_module+0xe8/0x368 load_module+0x3300/0x36b0 __do_sys_finit_module+0x120/0x1a8 __arm64_sys_finit_module+0x4c/0x60 el0_svc_common.constprop.2+0xe4/0x268 do_el0_svc+0x98/0xa8 el0_svc+0x24/0x68 el0_sync_handler+0x12c/0x318 el0_sync+0x158/0x180 The buggy address belongs to the variable: div_table.63646+0x34/0xfffffffffffffa40 [dwmac_meson8b] Memory state around the buggy address: ffffa00009061480: fa fa fa fa 00 00 00 01 fa fa fa fa 00 00 00 00 ffffa00009061500: 05 fa fa fa fa fa fa fa 00 04 fa fa fa fa fa fa >ffffa00009061580: 00 03 fa fa fa fa fa fa 00 00 00 00 00 00 fa fa ^ ffffa00009061600: fa fa fa fa 00 01 fa fa fa fa fa fa 01 fa fa fa ffffa00009061680: fa fa fa fa 00 01 fa fa fa fa fa fa 04 fa fa fa ================================================================== Digging into this indeed shows that the clock divider array is lacking a final fence, and that the clock subsystems goes in the weeds. Oh well. Let's add the empty structure that indicates the end of the array. Fixes: bd6f485 ("net: stmmac: dwmac-meson8b: Fix the RGMII TX delay on Meson8b/8m2 SoCs") Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f0212a5 ] Running with KASAN on a VIM3L systems leads to the following splat when probing the Ethernet device: ================================================================== BUG: KASAN: global-out-of-bounds in _get_maxdiv+0x74/0xd8 Read of size 4 at addr ffffa000090615f4 by task systemd-udevd/139 CPU: 1 PID: 139 Comm: systemd-udevd Tainted: G E 5.7.0-rc1-00101-g8624b7577b9c #781 Hardware name: amlogic w400/w400, BIOS 2020.01-rc5 03/12/2020 Call trace: dump_backtrace+0x0/0x2a0 show_stack+0x20/0x30 dump_stack+0xec/0x148 print_address_description.isra.12+0x70/0x35c __kasan_report+0xfc/0x1d4 kasan_report+0x4c/0x68 __asan_load4+0x9c/0xd8 _get_maxdiv+0x74/0xd8 clk_divider_bestdiv+0x74/0x5e0 clk_divider_round_rate+0x80/0x1a8 clk_core_determine_round_nolock.part.9+0x9c/0xd0 clk_core_round_rate_nolock+0xf0/0x108 clk_hw_round_rate+0xac/0xf0 clk_factor_round_rate+0xb8/0xd0 clk_core_determine_round_nolock.part.9+0x9c/0xd0 clk_core_round_rate_nolock+0xf0/0x108 clk_core_round_rate_nolock+0xbc/0x108 clk_core_set_rate_nolock+0xc4/0x2e8 clk_set_rate+0x58/0xe0 meson8b_dwmac_probe+0x588/0x72c [dwmac_meson8b] platform_drv_probe+0x78/0xd8 really_probe+0x158/0x610 driver_probe_device+0x140/0x1b0 device_driver_attach+0xa4/0xb0 __driver_attach+0xcc/0x1c8 bus_for_each_dev+0xf4/0x168 driver_attach+0x3c/0x50 bus_add_driver+0x238/0x2e8 driver_register+0xc8/0x1e8 __platform_driver_register+0x88/0x98 meson8b_dwmac_driver_init+0x28/0x1000 [dwmac_meson8b] do_one_initcall+0xa8/0x328 do_init_module+0xe8/0x368 load_module+0x3300/0x36b0 __do_sys_finit_module+0x120/0x1a8 __arm64_sys_finit_module+0x4c/0x60 el0_svc_common.constprop.2+0xe4/0x268 do_el0_svc+0x98/0xa8 el0_svc+0x24/0x68 el0_sync_handler+0x12c/0x318 el0_sync+0x158/0x180 The buggy address belongs to the variable: div_table.63646+0x34/0xfffffffffffffa40 [dwmac_meson8b] Memory state around the buggy address: ffffa00009061480: fa fa fa fa 00 00 00 01 fa fa fa fa 00 00 00 00 ffffa00009061500: 05 fa fa fa fa fa fa fa 00 04 fa fa fa fa fa fa >ffffa00009061580: 00 03 fa fa fa fa fa fa 00 00 00 00 00 00 fa fa ^ ffffa00009061600: fa fa fa fa 00 01 fa fa fa fa fa fa 01 fa fa fa ffffa00009061680: fa fa fa fa 00 01 fa fa fa fa fa fa 04 fa fa fa ================================================================== Digging into this indeed shows that the clock divider array is lacking a final fence, and that the clock subsystems goes in the weeds. Oh well. Let's add the empty structure that indicates the end of the array. Fixes: bd6f485 ("net: stmmac: dwmac-meson8b: Fix the RGMII TX delay on Meson8b/8m2 SoCs") Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f0212a5 ] Running with KASAN on a VIM3L systems leads to the following splat when probing the Ethernet device: ================================================================== BUG: KASAN: global-out-of-bounds in _get_maxdiv+0x74/0xd8 Read of size 4 at addr ffffa000090615f4 by task systemd-udevd/139 CPU: 1 PID: 139 Comm: systemd-udevd Tainted: G E 5.7.0-rc1-00101-g8624b7577b9c #781 Hardware name: amlogic w400/w400, BIOS 2020.01-rc5 03/12/2020 Call trace: dump_backtrace+0x0/0x2a0 show_stack+0x20/0x30 dump_stack+0xec/0x148 print_address_description.isra.12+0x70/0x35c __kasan_report+0xfc/0x1d4 kasan_report+0x4c/0x68 __asan_load4+0x9c/0xd8 _get_maxdiv+0x74/0xd8 clk_divider_bestdiv+0x74/0x5e0 clk_divider_round_rate+0x80/0x1a8 clk_core_determine_round_nolock.part.9+0x9c/0xd0 clk_core_round_rate_nolock+0xf0/0x108 clk_hw_round_rate+0xac/0xf0 clk_factor_round_rate+0xb8/0xd0 clk_core_determine_round_nolock.part.9+0x9c/0xd0 clk_core_round_rate_nolock+0xf0/0x108 clk_core_round_rate_nolock+0xbc/0x108 clk_core_set_rate_nolock+0xc4/0x2e8 clk_set_rate+0x58/0xe0 meson8b_dwmac_probe+0x588/0x72c [dwmac_meson8b] platform_drv_probe+0x78/0xd8 really_probe+0x158/0x610 driver_probe_device+0x140/0x1b0 device_driver_attach+0xa4/0xb0 __driver_attach+0xcc/0x1c8 bus_for_each_dev+0xf4/0x168 driver_attach+0x3c/0x50 bus_add_driver+0x238/0x2e8 driver_register+0xc8/0x1e8 __platform_driver_register+0x88/0x98 meson8b_dwmac_driver_init+0x28/0x1000 [dwmac_meson8b] do_one_initcall+0xa8/0x328 do_init_module+0xe8/0x368 load_module+0x3300/0x36b0 __do_sys_finit_module+0x120/0x1a8 __arm64_sys_finit_module+0x4c/0x60 el0_svc_common.constprop.2+0xe4/0x268 do_el0_svc+0x98/0xa8 el0_svc+0x24/0x68 el0_sync_handler+0x12c/0x318 el0_sync+0x158/0x180 The buggy address belongs to the variable: div_table.63646+0x34/0xfffffffffffffa40 [dwmac_meson8b] Memory state around the buggy address: ffffa00009061480: fa fa fa fa 00 00 00 01 fa fa fa fa 00 00 00 00 ffffa00009061500: 05 fa fa fa fa fa fa fa 00 04 fa fa fa fa fa fa >ffffa00009061580: 00 03 fa fa fa fa fa fa 00 00 00 00 00 00 fa fa ^ ffffa00009061600: fa fa fa fa 00 01 fa fa fa fa fa fa 01 fa fa fa ffffa00009061680: fa fa fa fa 00 01 fa fa fa fa fa fa 04 fa fa fa ================================================================== Digging into this indeed shows that the clock divider array is lacking a final fence, and that the clock subsystems goes in the weeds. Oh well. Let's add the empty structure that indicates the end of the array. Fixes: bd6f485 ("net: stmmac: dwmac-meson8b: Fix the RGMII TX delay on Meson8b/8m2 SoCs") Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>