Building with O= is broken #21

turl · 2012-05-19T15:40:39Z

No description provided.

amery · 2012-05-19T15:45:04Z

confirmed:

make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- O=foo uImage
...
  LD      arch/arm/mm/built-in.o
  LD      arch/arm/common/built-in.o
  CC      arch/arm/mach-sun4i/clock/ccmu/ccm_mod_clk.o
$PWD/arch/arm/mach-sun4i/clock/ccmu/ccm_mod_clk.c:2557:1: fatal error: opening dependency file arch/arm/mach-sun4i/clock/ccmu/.ccm_mod_clk.o.d: No such file or directory
compilation terminated.
make[3]: *** [arch/arm/mach-sun4i/clock/ccmu/ccm_mod_clk.o] Error 1
make[2]: *** [arch/arm/mach-sun4i/clock] Error 2
make[1]: *** [arch/arm/mach-sun4i] Error 2
make: *** [sub-make] Error 2

Fixes issue #21 on amery/linux-allwinner

Fix build using O= (issue #21) and inline build on CM9

amery · 2012-05-19T17:16:03Z

O= doesn't like dirs with objects but without Makefiles. thanks Turl.

Fixes issue #21 on amery/linux-allwinner

When hot-adding a CPU, the system outputs following messages since node_to_cpumask_map[2] was not allocated memory. Booting Node 2 Processor 32 APIC 0xc0 node_to_cpumask_map[2] NULL Pid: 0, comm: swapper/32 Tainted: G A 3.3.5-acd #21 Call Trace: [<ffffffff81048845>] debug_cpumask_set_cpu+0x155/0x160 [<ffffffff8105e28a>] ? add_timer_on+0xaa/0x120 [<ffffffff8150665f>] numa_add_cpu+0x1e/0x22 [<ffffffff815020bb>] identify_cpu+0x1df/0x1e4 [<ffffffff815020d6>] identify_econdary_cpu+0x16/0x1d [<ffffffff81504614>] smp_store_cpu_info+0x3c/0x3e [<ffffffff81505263>] smp_callin+0x139/0x1be [<ffffffff815052fb>] start_secondary+0x13/0xeb The reason is that the bit of node 2 was not set at numa_nodes_parsed. numa_nodes_parsed is set by only acpi_numa_processor_affinity_init / acpi_numa_x2apic_affinity_init. Thus even if hot-added memory which is same PXM as hot-added CPU is written in ACPI SRAT Table, if the hot-added CPU is not written in ACPI SRAT table, numa_nodes_parsed is not set. But according to ACPI Spec Rev 5.0, it says about ACPI SRAT table as follows: This optional table provides information that allows OSPM to associate processors and memory ranges, including ranges of memory provided by hot-added memory devices, with system localities / proximity domains and clock domains. It means that ACPI SRAT table only provides information for CPUs present at boot time and for memory including hot-added memory. So hot-added memory is written in ACPI SRAT table, but hot-added CPU is not written in it. Thus numa_nodes_parsed should be set by not only acpi_numa_processor_affinity_init / acpi_numa_x2apic_affinity_init but also acpi_numa_memory_affinity_init for the case. Additionally, if system has cpuless memory node, acpi_numa_processor_affinity_init / acpi_numa_x2apic_affinity_init cannot set numa_nodes_parseds since these functions cannot find cpu description for the node. In this case, numa_nodes_parsed needs to be set by acpi_numa_memory_affinity_init. Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: liuj97@gmail.com Cc: kosaki.motohiro@gmail.com Link: http://lkml.kernel.org/r/4FCC2098.4030007@jp.fujitsu.com [ merged it ] Signed-off-by: Ingo Molnar <mingo@kernel.org>

Fixes issue #21 on amery/linux-allwinner

The warning below triggers on AMD MCM packages because physical package IDs on the cores of a _physical_ socket are the same. I.e., this field says which CPUs belong to the same physical package. However, the same two CPUs belong to two different internal, i.e. "logical" nodes in the same physical socket which is reflected in the CPU-to-node map on x86 with NUMA. Which makes this check wrong on the above topologies so circumvent it. [ 0.444413] Booting Node 0, Processors #1 #2 #3 #4 #5 Ok. [ 0.461388] ------------[ cut here ]------------ [ 0.465997] WARNING: at arch/x86/kernel/smpboot.c:310 topology_sane.clone.1+0x6e/0x81() [ 0.473960] Hardware name: Dinar [ 0.477170] sched: CPU #6's mc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency. [ 0.486860] Booting Node 1, Processors #6 [ 0.491104] Modules linked in: [ 0.494141] Pid: 0, comm: swapper/6 Not tainted 3.4.0+ #1 [ 0.499510] Call Trace: [ 0.501946] [<ffffffff8144bf92>] ? topology_sane.clone.1+0x6e/0x81 [ 0.508185] [<ffffffff8102f1fc>] warn_slowpath_common+0x85/0x9d [ 0.514163] [<ffffffff8102f2b7>] warn_slowpath_fmt+0x46/0x48 [ 0.519881] [<ffffffff8144bf92>] topology_sane.clone.1+0x6e/0x81 [ 0.525943] [<ffffffff8144c234>] set_cpu_sibling_map+0x251/0x371 [ 0.532004] [<ffffffff8144c4ee>] start_secondary+0x19a/0x218 [ 0.537729] ---[ end trace 4eaa2a86a8e2da22 ]--- [ 0.628197] #7 #8 #9 #10 #11 Ok. [ 0.807108] Booting Node 3, Processors #12 #13 #14 #15 #16 #17 Ok. [ 0.897587] Booting Node 2, Processors #18 #19 #20 #21 #22 #23 Ok. [ 0.917443] Brought up 24 CPUs We ran a topology sanity check test we have here on it and it all looks ok... hopefully :). Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120529135442.GE29157@aftab.osrc.amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>

Fixes issue #21 on amery/linux-allwinner

…d reasons commit 5cf02d0 upstream. We've had some reports of a deadlock where rpciod ends up with a stack trace like this: PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 #24 [ffff8810343bfee8] kthread at ffffffff8108dd96 #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock. Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

…d reasons We've had some reports of a deadlock where rpciod ends up with a stack trace like this: PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 #24 [ffff8810343bfee8] kthread at ffffffff8108dd96 #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock. Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org

Several people reported the warning: "kernel BUG at kernel/timer.c:729!" and the stack trace is: #7 [ffff880214d25c10] mod_timer+501 at ffffffff8106d905 #8 [ffff880214d25c50] br_multicast_del_pg.isra.20+261 at ffffffffa0731d25 [bridge] #9 [ffff880214d25c80] br_multicast_disable_port+88 at ffffffffa0732948 [bridge] #10 [ffff880214d25cb0] br_stp_disable_port+154 at ffffffffa072bcca [bridge] #11 [ffff880214d25ce8] br_device_event+520 at ffffffffa072a4e8 [bridge] #12 [ffff880214d25d18] notifier_call_chain+76 at ffffffff8164aafc #13 [ffff880214d25d50] raw_notifier_call_chain+22 at ffffffff810858f6 #14 [ffff880214d25d60] call_netdevice_notifiers+45 at ffffffff81536aad #15 [ffff880214d25d80] dev_close_many+183 at ffffffff81536d17 #16 [ffff880214d25dc0] rollback_registered_many+168 at ffffffff81537f68 #17 [ffff880214d25de8] rollback_registered+49 at ffffffff81538101 #18 [ffff880214d25e10] unregister_netdevice_queue+72 at ffffffff815390d8 #19 [ffff880214d25e30] __tun_detach+272 at ffffffffa074c2f0 [tun] #20 [ffff880214d25e88] tun_chr_close+45 at ffffffffa074c4bd [tun] #21 [ffff880214d25ea8] __fput+225 at ffffffff8119b1f1 #22 [ffff880214d25ef0] ____fput+14 at ffffffff8119b3fe #23 [ffff880214d25f00] task_work_run+159 at ffffffff8107cf7f #24 [ffff880214d25f30] do_notify_resume+97 at ffffffff810139e1 #25 [ffff880214d25f50] int_signal+18 at ffffffff8164f292 this is due to I forgot to check if mp->timer is armed in br_multicast_del_pg(). This bug is introduced by commit 9f00b2e (bridge: only expire the mdb entry when query is received). Same for __br_mdb_del(). Tested-by: poma <pomidorabelisima@gmail.com> Reported-by: LiYonghua <809674045@qq.com> Reported-by: Robert Hancock <hancockrwd@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>

When booting secondary CPUs, announce_cpu() is called to show which cpu has been brought up. For example: [ 0.402751] smpboot: Booting Node 0, Processors #1 #2 #3 #4 #5 OK [ 0.525667] smpboot: Booting Node 1, Processors #6 #7 #8 #9 #10 #11 OK [ 0.755592] smpboot: Booting Node 0, Processors #12 #13 #14 #15 #16 #17 OK [ 0.890495] smpboot: Booting Node 1, Processors #18 #19 #20 #21 #22 #23 But the last "OK" is lost, because 'nr_cpu_ids-1' represents the maximum possible cpu id. It should use the maximum present cpu id in case not all CPUs booted up. Signed-off-by: Libin <huawei.libin@huawei.com> Cc: <guohanjun@huawei.com> Cc: <wangyijing@huawei.com> Cc: <fenghua.yu@intel.com> Cc: <paul.gortmaker@windriver.com> Link: http://lkml.kernel.org/r/1378378676-18276-1-git-send-email-huawei.libin@huawei.com [ tweaked the changelog, removed unnecessary line break, tweaked the format to align the fields vertically. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>

We call cpu_cluster_pm_enter for dev->cpu == 0 only, but cpu_cluster_pm_exit called without that check. Because of that unhandled page fault may happen: [ 3.803405] Unable to handle kernel paging request at virtual address 00002500 [ 3.810974] pgd = c0004000 [ 3.813812] [00002500] *pgd=00000000 [ 3.817596] Internal error: Oops: 5 [#1] SMP ARM [ 3.822418] Modules linked in: [ 3.825653] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.11.0-rc6+ #21 [ 3.832397] task: ed86ef40 ti: ed896000 task.ti: ed896000 [ 3.838073] PC is at irq_notifier+0x234/0x25c [ 3.842651] LR is at irq_notifier+0x218/0x25c [ 3.847229] pc : [<c0029ed8>] lr : [<c0029ebc>] psr: 80000193 [ 3.847229] sp : ed897ee8 ip : 00000005 fp : 00000001 [ 3.859283] r10: c0b395f0 r9 : c0b30594 r8 : c0b8c2ac [ 3.864776] r7 : ffffffff r6 : 00000000 r5 : 00000005 r4 : 00000000 [ 3.871643] r3 : 00002500 r2 : 00000000 r1 : 00000005 r0 : 44302244 [ 3.878479] Flags: Nzcv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel [ 3.886260] Control: 10c5387d Table: 8000404a DAC: 00000015 [ 3.892272] Process swapper/1 (pid: 0, stack limit = 0xed896240) [ 3.898590] Stack: (0xed897ee8 to 0xed898000) [ 3.903167] 7ee0: c0979c3a 00000001 ed897ef8 ed896000 c0014f7c 00000000 [ 3.911743] 7f00: 00000005 00000000 ffffffff c0b8c2ac c0b395f0 c077c04c c0c94b48 c0b3953c [ 3.920318] 7f20: c0bcd928 00000002 c0b39524 c00cfad8 00000000 ffffffff 00000000 c00cfb10 [ 3.928924] 7f40: c14e62c0 c002c1c8 c002c0ac c14e62c0 00000002 e251c37d 00000000 c0b39548 [ 3.937499] 7f60: c0b395f0 c05a1bc4 e251c37d 00000000 00000005 c05a3870 edc90380 edc90380 [ 3.946105] 7f80: edc90394 c14e62c0 c0b39548 00000002 c0784064 c05a3c78 c0b395e0 c14e62c0 [ 3.954681] 7fa0: 00000002 c0b39548 c0bc9db8 00000000 00000001 c05a1dc0 ed896000 00000015 [ 3.963287] 7fc0: c0bc9db8 ed896000 8000406a c0b30594 c0784064 c000e504 00000746 c007a528 [ 3.971862] 7fe0: 00000001 0000001d 600001d3 c0bcc004 00000000 800086c4 ee0aa6a7 d2aabaa9 [ 3.980499] [<c0029ed8>] (irq_notifier+0x234/0x25c) from [<c077c04c>] (notifier_call_chain+0x38/0x68) [ 3.990173] [<c077c04c>] (notifier_call_chain+0x38/0x68) from [<c00cfad8>] (cpu_pm_notify+0x20/0x38) [ 3.999786] [<c00cfad8>] (cpu_pm_notify+0x20/0x38) from [<c00cfb10>] (cpu_cluster_pm_exit+0x20/0x50) [ 4.009399] [<c00cfb10>] (cpu_cluster_pm_exit+0x20/0x50) from [<c002c1c8>] (omap_enter_idle_coupled+0x11c/0x14c) [ 4.020111] [<c002c1c8>] (omap_enter_idle_coupled+0x11c/0x14c) from [<c05a1bc4>] (cpuidle_enter_state+0x40/0xec) [ 4.030822] [<c05a1bc4>] (cpuidle_enter_state+0x40/0xec) from [<c05a3c78>] (cpuidle_enter_state_coupled+0x1f4/0x240) [ 4.041870] [<c05a3c78>] (cpuidle_enter_state_coupled+0x1f4/0x240) from [<c05a1dc0>] (cpuidle_idle_call+0x150/0x228) [ 4.052947] [<c05a1dc0>] (cpuidle_idle_call+0x150/0x228) from [<c000e504>] (arch_cpu_idle+0x8/0x38) [ 4.062499] [<c000e504>] (arch_cpu_idle+0x8/0x38) from [<c007a528>] (cpu_startup_entry+0x178/0x1e4) [ 4.071990] [<c007a528>] (cpu_startup_entry+0x178/0x1e4) from [<800086c4>] (0x800086c4) [ 4.080383] Code: e5922288 03a03b0a 13a03c25 e0823003 (e5932000) [ 4.086791] ---[ end trace d83954a84a6fa69e ]--- It is supposed that sar_base is initialized in irq_save_context, which is called on CPU_CLUSTER_PM_ENTER notification. If this notification has been missed and CPU_CLUSTER_PM_EXIT is received sar_base is NULL. Fix it by calling CPU_CLUSTER_PM_{ENTER,EXIT} under the same condition. Signed-off-by: Vladimir Murzin <murzin.v@gmail.com> Acked-by: Kevin Hilman <khilman@linaro.org> Signed-off-by: Tony Lindgren <tony@atomide.com>

As the new x86 CPU bootup printout format code maintainer, I am taking immediate action to improve and clean (and thus indulge my OCD) the reporting of the cores when coming up online. Fix padding to a right-hand alignment, cleanup code and bind reporting width to the max number of supported CPUs on the system, like this: [ 0.074509] smpboot: Booting Node 0, Processors: #1 #2 #3 #4 #5 #6 #7 OK [ 0.644008] smpboot: Booting Node 1, Processors: #8 #9 #10 #11 #12 #13 #14 #15 OK [ 1.245006] smpboot: Booting Node 2, Processors: #16 #17 #18 #19 #20 #21 #22 #23 OK [ 1.864005] smpboot: Booting Node 3, Processors: #24 #25 #26 #27 #28 #29 #30 #31 OK [ 2.489005] smpboot: Booting Node 4, Processors: #32 #33 #34 #35 #36 #37 #38 #39 OK [ 3.093005] smpboot: Booting Node 5, Processors: #40 #41 #42 #43 #44 #45 #46 #47 OK [ 3.698005] smpboot: Booting Node 6, Processors: #48 #49 #50 #51 #52 #53 #54 #55 OK [ 4.304005] smpboot: Booting Node 7, Processors: #56 #57 #58 #59 #60 #61 #62 #63 OK [ 4.961413] Brought up 64 CPUs and this: [ 0.072367] smpboot: Booting Node 0, Processors: #1 #2 #3 #4 #5 #6 #7 OK [ 0.686329] Brought up 8 CPUs Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Libin <huawei.libin@huawei.com> Cc: wangyijing@huawei.com Cc: fenghua.yu@intel.com Cc: guohanjun@huawei.com Cc: paul.gortmaker@windriver.com Link: http://lkml.kernel.org/r/20130927143554.GF4422@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>

Turn it into (for example): [ 0.073380] x86: Booting SMP configuration: [ 0.074005] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 [ 0.603005] .... node #1, CPUs: #8 #9 #10 #11 #12 #13 #14 #15 [ 1.200005] .... node #2, CPUs: #16 #17 #18 #19 #20 #21 #22 #23 [ 1.796005] .... node #3, CPUs: #24 #25 #26 #27 #28 #29 #30 #31 [ 2.393005] .... node #4, CPUs: #32 #33 #34 #35 #36 #37 #38 #39 [ 2.996005] .... node #5, CPUs: #40 #41 #42 #43 #44 #45 #46 #47 [ 3.600005] .... node #6, CPUs: #48 #49 #50 #51 #52 #53 #54 #55 [ 4.202005] .... node #7, CPUs: #56 #57 #58 #59 #60 #61 #62 #63 [ 4.811005] .... node #8, CPUs: #64 #65 #66 #67 #68 #69 #70 #71 [ 5.421006] .... node #9, CPUs: #72 #73 #74 #75 #76 #77 #78 #79 [ 6.032005] .... node #10, CPUs: #80 #81 #82 #83 #84 #85 #86 #87 [ 6.648006] .... node #11, CPUs: #88 #89 #90 #91 #92 #93 #94 #95 [ 7.262005] .... node #12, CPUs: #96 #97 #98 #99 #100 #101 #102 #103 [ 7.865005] .... node #13, CPUs: #104 #105 #106 #107 #108 #109 #110 #111 [ 8.466005] .... node #14, CPUs: #112 #113 #114 #115 #116 #117 #118 #119 [ 9.073006] .... node #15, CPUs: #120 #121 #122 #123 #124 #125 #126 #127 [ 9.679901] x86: Booted up 16 nodes, 128 CPUs and drop useless elements. Change num_digits() to hpa's division-avoiding, cell-phone-typed version which he went at great lengths and pains to submit on a Saturday evening. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: huawei.libin@huawei.com Cc: wangyijing@huawei.com Cc: fenghua.yu@intel.com Cc: guohanjun@huawei.com Cc: paul.gortmaker@windriver.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20130930095624.GB16383@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>

clk_disable_unprepare in remove causes an imbalance and hence gives the below crash on module remove. While at it also remove some duplicate code from probe. / $ rmmod i2c-exynos5 [ 6.996374] ------------[ cut here ]------------ [ 6.999523] WARNING: CPU: 2 PID: 1137 at drivers/clk/clk.c:842 clk_disable+0x18/0x24() [ 7.007403] Modules linked in: i2c_exynos5(-) [ 7.011747] CPU: 2 PID: 1137 Comm: rmmod Not tainted 3.12.0-next-20131105-00083-g16f4799-dirty #21 [ 7.020696] [<c0014e0c>] (unwind_backtrace+0x0/0xf4) from [<c0011784>] (show_stack+0x10/0x14) [ 7.029190] [<c0011784>] (show_stack+0x10/0x14) from [<c037acd4>] (dump_stack+0x7c/0xb0) [ 7.037255] [<c037acd4>] (dump_stack+0x7c/0xb0) from [<c001e0ac>] (warn_slowpath_common+0x6c/0x88) [ 7.046190] [<c001e0ac>] (warn_slowpath_common+0x6c/0x88) from [<c001e164>] (warn_slowpath_null+0x1c/0x24) [ 7.055818] [<c001e164>] (warn_slowpath_null+0x1c/0x24) from [<c02dcde4>] (clk_disable+0x18/0x24) [ 7.064670] [<c02dcde4>] (clk_disable+0x18/0x24) from [<bf0002d4>] (exynos5_i2c_remove+0x1c/0x34 [i2c_exynos5]) [ 7.074736] [<bf0002d4>] (exynos5_i2c_remove+0x1c/0x34 [i2c_exynos5]) from [<c02274a8>] (__device_release_driver+0x58/0xb0) [ 7.085836] [<c02274a8>] (__device_release_driver+0x58/0xb0) from [<c0227b88>] (driver_detach+0xac/0xb0) [ 7.095291] [<c0227b88>] (driver_detach+0xac/0xb0) from [<c02271c0>] (bus_remove_driver+0x4c/0xa0) [ 7.104227] [<c02271c0>] (bus_remove_driver+0x4c/0xa0) from [<c00725dc>] (SyS_delete_module+0x124/0x194) [ 7.113682] [<c00725dc>] (SyS_delete_module+0x124/0x194) from [<c000e2e0>] (ret_fast_syscall+0x0/0x30) [ 7.122957] ---[ end trace 23bb6e4e0bf52196 ]--- Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Acked-by: Naveen Krishna Chatradhi <ch.naveen@samsung.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

cond_resched_lock(cinfo->lock) is called everywhere else while holding the cinfo->lock spinlock. Not holding this lock while calling transfer_commit_list in filelayout_recover_commit_reqs causes the BUG below. It's true that we can't hold this lock while calling pnfs_put_lseg, because that might try to lock the inode lock - which might be the same lock as cinfo->lock. To reproduce, mount a 2 DS pynfs server and run an O_DIRECT command that crosses a stripe boundary and is not page aligned, such as: dd if=/dev/zero of=/mnt/f bs=17000 count=1 oflag=direct BUG: sleeping function called from invalid context at linux/fs/nfs/nfs4filelayout.c:1161 in_atomic(): 0, irqs_disabled(): 0, pid: 27, name: kworker/0:1 2 locks held by kworker/0:1/27: #0: (events){.+.+.+}, at: [<ffffffff810501d7>] process_one_work+0x175/0x3a5 #1: ((&dreq->work)){+.+...}, at: [<ffffffff810501d7>] process_one_work+0x175/0x3a5 CPU: 0 PID: 27 Comm: kworker/0:1 Not tainted 3.13.0-rc3-branch-dros_testing+ #21 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 Workqueue: events nfs_direct_write_schedule_work [nfs] 0000000000000000 ffff88007a39bbb8 ffffffff81491256 ffff88007b87a130 ffff88007a39bbd8 ffffffff8105f103 ffff880079614000 ffff880079617d40 ffff88007a39bc20 ffffffffa011603e ffff880078988b98 0000000000000000 Call Trace: [<ffffffff81491256>] dump_stack+0x4d/0x66 [<ffffffff8105f103>] __might_sleep+0x100/0x105 [<ffffffffa011603e>] transfer_commit_list+0x94/0xf1 [nfs_layout_nfsv41_files] [<ffffffffa01160d6>] filelayout_recover_commit_reqs+0x3b/0x68 [nfs_layout_nfsv41_files] [<ffffffffa00ba53a>] nfs_direct_write_reschedule+0x9f/0x1d6 [nfs] [<ffffffff810705df>] ? mark_lock+0x1df/0x224 [<ffffffff8106e617>] ? trace_hardirqs_off_caller+0x37/0xa4 [<ffffffff8106e691>] ? trace_hardirqs_off+0xd/0xf [<ffffffffa00ba8f8>] nfs_direct_write_schedule_work+0x9d/0xb7 [nfs] [<ffffffff810501d7>] ? process_one_work+0x175/0x3a5 [<ffffffff81050258>] process_one_work+0x1f6/0x3a5 [<ffffffff810501d7>] ? process_one_work+0x175/0x3a5 [<ffffffff8105187e>] worker_thread+0x149/0x1f5 [<ffffffff81051735>] ? rescuer_thread+0x28d/0x28d [<ffffffff81056d74>] kthread+0xd2/0xda [<ffffffff81056ca2>] ? __kthread_parkme+0x61/0x61 [<ffffffff8149e66c>] ret_from_fork+0x7c/0xb0 [<ffffffff81056ca2>] ? __kthread_parkme+0x61/0x61 Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

When a tx timeout occured, et131x_tx_timeout closed and re-opened the device to fix. As et131x_close called free_irq(), bad things ensued (see warning trace below), namely a storm of errors and warnings. Fixed by replacing the close() and open() calls with just the relevant functions previously called from these. Verified on an ET-1310 device. Signed-off-by: Mark Einon <mark.einon@gmail.com> ---------- Aug 2 21:26:08 msilap kernel: [ 6484.816024] ------------[ cut here ]------------ Aug 2 21:26:08 msilap kernel: [ 6484.816039] WARNING: at /home/mark/Source/staging-2.6/net/sched/sch_generic.c:255 dev_watchdog+0xf0/0x150() Aug 2 21:26:08 msilap kernel: [ 6484.816045] Hardware name: MS-1727 Aug 2 21:26:08 msilap kernel: [ 6484.816050] NETDEV WATCHDOG: eth1 (et131x): transmit queue 0 timed out Aug 2 21:26:08 msilap kernel: [ 6484.816054] Modules linked in: et131x(C) aes_generic tcp_lp fuse nouveau ttm drm_kms_helper drm i2c_algo_bit sunrpc ipv6 cpufreq_ondemand acpi_cpufreq freq_table mperf uinput arc4 iwlagn snd_hda_codec_hdmi mac80211 sdhci_pci sdhci snd_hda_codec_realtek cfg80211 snd_hda_intel snd_hda_codec mmc_core snd_hwdep firewire_ohci firewire_core jmb38x_ms snd_seq snd_seq_device snd_pcm mxm_wmi ir_lirc_codec lirc_dev ir_sony_decoder i2c_i801 ir_jvc_decoder ir_rc6_decoder rc_rc6_mce i7core_edac edac_core snd_timer pcspkr rfkill r8169 mii i2c_core video ir_rc5_decoder memstick iTCO_wdt iTCO_vendor_support ir_nec_decoder ene_ir rc_core joydev microcode wmi snd crc_itu_t soundcore snd_page_alloc [last unloaded: et131x] Aug 2 21:26:08 msilap kernel: [ 6484.816163] Pid: 0, comm: kworker/0:1 Tainted: G C 3.0.0-rc6+ torvalds#21 Aug 2 21:26:08 msilap kernel: [ 6484.816167] Call Trace: Aug 2 21:26:08 msilap kernel: [ 6484.816171] <IRQ> [<ffffffff8103caf4>] warn_slowpath_common+0x85/0x9d Aug 2 21:26:08 msilap kernel: [ 6484.816189] [<ffffffff8103cbaf>] warn_slowpath_fmt+0x46/0x48 Aug 2 21:26:08 msilap kernel: [ 6484.816196] [<ffffffff813aa6fb>] ? netif_tx_lock+0x4a/0x7b Aug 2 21:26:08 msilap kernel: [ 6484.816203] [<ffffffff813aa86f>] dev_watchdog+0xf0/0x150 Aug 2 21:26:08 msilap kernel: [ 6484.816211] [<ffffffff810497b6>] run_timer_softirq+0x1a9/0x279 Aug 2 21:26:08 msilap kernel: [ 6484.816218] [<ffffffff813aa77f>] ? netif_tx_unlock+0x53/0x53 Aug 2 21:26:08 msilap kernel: [ 6484.816226] [<ffffffff810429d7>] __do_softirq+0xd5/0x1a4 Aug 2 21:26:08 msilap kernel: [ 6484.816234] [<ffffffff810079ed>] ? paravirt_read_tsc+0x9/0xd Aug 2 21:26:08 msilap kernel: [ 6484.816245] [<ffffffff8144561c>] call_softirq+0x1c/0x26 Aug 2 21:26:08 msilap kernel: [ 6484.816251] [<ffffffff81003b9f>] do_softirq+0x46/0x83 Aug 2 21:26:08 msilap kernel: [ 6484.816257] [<ffffffff81042ca2>] irq_exit+0x52/0x9b Aug 2 21:26:08 msilap kernel: [ 6484.816265] [<ffffffff81445749>] smp_apic_timer_interrupt+0x7c/0x8a Aug 2 21:26:08 msilap kernel: [ 6484.816273] [<ffffffff814450d3>] apic_timer_interrupt+0x13/0x20 Aug 2 21:26:08 msilap kernel: [ 6484.816278] <EOI> [<ffffffff810079ed>] ? paravirt_read_tsc+0x9/0xd Aug 2 21:26:08 msilap kernel: [ 6484.816291] [<ffffffff8124ad87>] ? intel_idle+0xd1/0xf8 Aug 2 21:26:08 msilap kernel: [ 6484.816297] [<ffffffff8124ad69>] ? intel_idle+0xb3/0xf8 Aug 2 21:26:08 msilap kernel: [ 6484.816306] [<ffffffff8136a767>] cpuidle_idle_call+0xe2/0x160 Aug 2 21:26:08 msilap kernel: [ 6484.816315] [<ffffffff81001293>] cpu_idle+0xaa/0xcc Aug 2 21:26:08 msilap kernel: [ 6484.816323] [<ffffffff81436b4e>] start_secondary+0x248/0x24f Aug 2 21:26:08 msilap kernel: [ 6484.816329] ---[ end trace 10ae1b2c6bae932f ]--- Aug 2 21:26:08 msilap kernel: [ 6484.816337] et131x 0000:02:00.0: Send stuck - reset. tcb->WrIndex 0, flags 0x00000000 Aug 2 21:26:08 msilap kernel: [ 6484.816344] ------------[ cut here ]------------ Aug 2 21:26:08 msilap kernel: [ 6484.816353] WARNING: at /home/mark/Source/staging-2.6/kernel/irq/manage.c:1131 __free_irq+0x58/0x192() Aug 2 21:26:08 msilap kernel: [ 6484.816358] Hardware name: MS-1727 Aug 2 21:26:08 msilap kernel: [ 6484.816362] Trying to free IRQ 16 from IRQ context! Aug 2 21:26:08 msilap kernel: [ 6484.816365] Modules linked in: et131x(C) aes_generic tcp_lp fuse nouveau ttm drm_kms_helper drm i2c_algo_bit sunrpc ipv6 cpufreq_ondemand acpi_cpufreq freq_table mperf uinput arc4 iwlagn snd_hda_codec_hdmi mac80211 sdhci_pci sdhci snd_hda_codec_realtek cfg80211 snd_hda_intel snd_hda_codec mmc_core snd_hwdep firewire_ohci firewire_core jmb38x_ms snd_seq snd_seq_device snd_pcm mxm_wmi ir_lirc_codec lirc_dev ir_sony_decoder i2c_i801 ir_jvc_decoder ir_rc6_decoder rc_rc6_mce i7core_edac edac_core snd_timer pcspkr rfkill r8169 mii i2c_core video ir_rc5_decoder memstick iTCO_wdt iTCO_vendor_support ir_nec_decoder ene_ir rc_core joydev microcode wmi snd crc_itu_t soundcore snd_page_alloc [last unloaded: et131x] Aug 2 21:26:08 msilap kernel: [ 6484.816459] Pid: 0, comm: kworker/0:1 Tainted: G WC 3.0.0-rc6+ torvalds#21 Aug 2 21:26:08 msilap kernel: [ 6484.816464] Call Trace: Aug 2 21:26:08 msilap kernel: [ 6484.816467] <IRQ> [<ffffffff8103caf4>] warn_slowpath_common+0x85/0x9d Aug 2 21:26:08 msilap kernel: [ 6484.816480] [<ffffffff8103cbaf>] warn_slowpath_fmt+0x46/0x48 Aug 2 21:26:08 msilap kernel: [ 6484.816488] [<ffffffff81092276>] __free_irq+0x58/0x192 Aug 2 21:26:08 msilap kernel: [ 6484.816496] [<ffffffff8109240e>] free_irq+0x5e/0x77 Aug 2 21:26:08 msilap kernel: [ 6484.816506] [<ffffffffa00bf58e>] et131x_close+0x4b/0x5e [et131x] Aug 2 21:26:08 msilap kernel: [ 6484.816516] [<ffffffffa00bf64d>] et131x_tx_timeout+0xac/0xc7 [et131x] Aug 2 21:26:08 msilap kernel: [ 6484.816523] [<ffffffff813aa883>] dev_watchdog+0x104/0x150 Aug 2 21:26:08 msilap kernel: [ 6484.816530] [<ffffffff810497b6>] run_timer_softirq+0x1a9/0x279 Aug 2 21:26:08 msilap kernel: [ 6484.816537] [<ffffffff813aa77f>] ? netif_tx_unlock+0x53/0x53 Aug 2 21:26:08 msilap kernel: [ 6484.816544] [<ffffffff810429d7>] __do_softirq+0xd5/0x1a4 Aug 2 21:26:08 msilap kernel: [ 6484.816552] [<ffffffff810079ed>] ? paravirt_read_tsc+0x9/0xd Aug 2 21:26:08 msilap kernel: [ 6484.816560] [<ffffffff8144561c>] call_softirq+0x1c/0x26 Aug 2 21:26:08 msilap kernel: [ 6484.816566] [<ffffffff81003b9f>] do_softirq+0x46/0x83 Aug 2 21:26:08 msilap kernel: [ 6484.816572] [<ffffffff81042ca2>] irq_exit+0x52/0x9b Aug 2 21:26:08 msilap kernel: [ 6484.816580] [<ffffffff81445749>] smp_apic_timer_interrupt+0x7c/0x8a Aug 2 21:26:08 msilap kernel: [ 6484.816588] [<ffffffff814450d3>] apic_timer_interrupt+0x13/0x20 Aug 2 21:26:08 msilap kernel: [ 6484.816592] <EOI> [<ffffffff810079ed>] ? paravirt_read_tsc+0x9/0xd Aug 2 21:26:08 msilap kernel: [ 6484.816604] [<ffffffff8124ad87>] ? intel_idle+0xd1/0xf8 Aug 2 21:26:08 msilap kernel: [ 6484.816610] [<ffffffff8124ad69>] ? intel_idle+0xb3/0xf8 Aug 2 21:26:08 msilap kernel: [ 6484.816617] [<ffffffff8136a767>] cpuidle_idle_call+0xe2/0x160 Aug 2 21:26:08 msilap kernel: [ 6484.816625] [<ffffffff81001293>] cpu_idle+0xaa/0xcc Aug 2 21:26:08 msilap kernel: [ 6484.816632] [<ffffffff81436b4e>] start_secondary+0x248/0x24f Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

At a boot time I observed following bug: BUG: unable to handle kernel paging request at ffff8800a4244000 IP: [<ffffffff81275b5b>] memcpy+0xb/0x120 PGD 1816063 PUD 1fe7d067 PMD 1ff9f067 PTE 80000000a4244160 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC CPU 0 Modules linked in: btusb bluetooth brcmsmac brcmutil crc8 cordic b43 radeon(+) mac80211 cfg80211 ttm ohci_hcd drm_kms_helper rfkill drm ssb agpgart mmc_core sp5100_tco video battery ac thermal processor rtc_cmos thermal_sys snd_hda_codec_hdmi joydev snd_hda_codec_conexant button bcma pcmcia snd_hda_intel snd_hda_codec snd_hwdep snd_pcm shpchp pcmcia_core k8temp snd_timer atl1c snd psmouse hwmon i2c_piix4 i2c_algo_bit soundcore evdev i2c_core ehci_hcd sg serio_raw snd_page_alloc loop btrfs Pid: 1008, comm: modprobe Not tainted 3.3.0-rc1 torvalds#21 LENOVO 20046 /AMD CRB RIP: 0010:[<ffffffff81275b5b>] [<ffffffff81275b5b>] memcpy+0xb/0x120 RSP: 0018:ffff8800aa72db00 EFLAGS: 00010246 RAX: ffff8800a4150000 RBX: 0000000000001000 RCX: 0000000000000087 RDX: 0000000000000000 RSI: ffff8800a4244000 RDI: ffff8800a4150bc8 RBP: ffff8800aa72db78 R08: 0000000000000010 R09: ffffffff8174bbec R10: ffffffff812ee010 R11: 0000000000000001 R12: 0000000000001000 R13: 0000000000010000 R14: ffff8800a4140000 R15: ffff8800aaba1800 FS: 00007ff9a3bd4720(0000) GS:ffff8800afa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffff8800a4244000 CR3: 00000000a9c18000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process modprobe (pid: 1008, threadinfo ffff8800aa72c000, task ffff8800aa0e4000) Stack: ffffffffa04e7c7b 0000000000000001 0000000000010000 ffff8800aa72db28 ffffffff00000001 0000000000001000 ffffffff8113cbef 0000000000000020 ffff8800a4243420 ffff880000000002 ffff8800aa72db08 ffff8800a9d42000 Call Trace: [<ffffffffa04e7c7b>] ? radeon_atrm_get_bios_chunk+0x8b/0xd0 [radeon] [<ffffffff8113cbef>] ? kmalloc_order_trace+0x3f/0xb0 [<ffffffffa04a9298>] radeon_get_bios+0x68/0x2f0 [radeon] [<ffffffffa04c7a30>] rv770_init+0x40/0x280 [radeon] [<ffffffffa047d740>] radeon_device_init+0x560/0x600 [radeon] [<ffffffffa047ef4f>] radeon_driver_load_kms+0xaf/0x170 [radeon] [<ffffffffa043cdde>] drm_get_pci_dev+0x18e/0x2c0 [drm] [<ffffffffa04e7e95>] radeon_pci_probe+0xad/0xb5 [radeon] [<ffffffff81296c5f>] local_pci_probe+0x5f/0xd0 [<ffffffff81297418>] pci_device_probe+0x88/0xb0 [<ffffffff813417aa>] ? driver_sysfs_add+0x7a/0xb0 [<ffffffff813418d8>] really_probe+0x68/0x180 [<ffffffff81341be5>] driver_probe_device+0x45/0x70 [<ffffffff81341cb3>] __driver_attach+0xa3/0xb0 [<ffffffff81341c10>] ? driver_probe_device+0x70/0x70 [<ffffffff813400ce>] bus_for_each_dev+0x5e/0x90 [<ffffffff8134172e>] driver_attach+0x1e/0x20 [<ffffffff81341298>] bus_add_driver+0xc8/0x280 [<ffffffff813422c6>] driver_register+0x76/0x140 [<ffffffff812976d6>] __pci_register_driver+0x66/0xe0 [<ffffffffa043d021>] drm_pci_init+0x111/0x120 [drm] [<ffffffff8133c67a>] ? vga_switcheroo_register_handler+0x3a/0x60 [<ffffffffa0229000>] ? 0xffffffffa0228fff [<ffffffffa02290ec>] radeon_init+0xec/0xee [radeon] [<ffffffff810002f2>] do_one_initcall+0x42/0x180 [<ffffffff8109d8d2>] sys_init_module+0x92/0x1e0 [<ffffffff815407a9>] system_call_fastpath+0x16/0x1b Code: 58 2a 43 50 88 43 4e 48 83 c4 08 5b c9 c3 66 90 e8 cb fd ff ff eb e6 90 90 90 90 90 90 90 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c RIP [<ffffffff81275b5b>] memcpy+0xb/0x120 RSP <ffff8800aa72db00> CR2: ffff8800a4244000 ---[ end trace fcffa1599cf56382 ]--- Call to acpi_evaluate_object() not always returns 4096 bytes chunks, on my system it can return 2048 bytes chunk, so pass the length of retrieved chunk to memcpy(), not the length of the recieving buffer. Signed-off-by: Igor Murzov <e-mail@date.by> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>

If the netdev is already in NETREG_UNREGISTERING/_UNREGISTERED state, do not update the real num tx queues. netdev_queue_update_kobjects() is already called via remove_queue_kobjects() at NETREG_UNREGISTERING time. So, when upper layer driver, e.g., FCoE protocol stack is monitoring the netdev event of NETDEV_UNREGISTER and calls back to LLD ndo_fcoe_disable() to remove extra queues allocated for FCoE, the associated txq sysfs kobjects are already removed, and trying to update the real num queues would cause something like below: ... PID: 25138 TASK: ffff88021e64c440 CPU: 3 COMMAND: "kworker/3:3" #0 [ffff88021f007760] machine_kexec at ffffffff810226d9 #1 [ffff88021f0077d0] crash_kexec at ffffffff81089d2d #2 [ffff88021f0078a0] oops_end at ffffffff813bca78 #3 [ffff88021f0078d0] no_context at ffffffff81029e72 #4 [ffff88021f007920] __bad_area_nosemaphore at ffffffff8102a155 #5 [ffff88021f0079f0] bad_area_nosemaphore at ffffffff8102a23e torvalds#6 [ffff88021f007a00] do_page_fault at ffffffff813bf32e torvalds#7 [ffff88021f007b10] page_fault at ffffffff813bc045 [exception RIP: sysfs_find_dirent+17] RIP: ffffffff81178611 RSP: ffff88021f007bc0 RFLAGS: 00010246 RAX: ffff88021e64c440 RBX: ffffffff8156cc63 RCX: 0000000000000004 RDX: ffffffff8156cc63 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88021f007be0 R8: 0000000000000004 R9: 0000000000000008 R10: ffffffff816fed00 R11: 0000000000000004 R12: 0000000000000000 R13: ffffffff8156cc63 R14: 0000000000000000 R15: ffff8802222a0000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 torvalds#8 [ffff88021f007be8] sysfs_get_dirent at ffffffff81178c07 torvalds#9 [ffff88021f007c18] sysfs_remove_group at ffffffff8117ac27 torvalds#10 [ffff88021f007c48] netdev_queue_update_kobjects at ffffffff813178f9 torvalds#11 [ffff88021f007c88] netif_set_real_num_tx_queues at ffffffff81303e38 torvalds#12 [ffff88021f007cc8] ixgbe_set_num_queues at ffffffffa0249763 [ixgbe] torvalds#13 [ffff88021f007cf8] ixgbe_init_interrupt_scheme at ffffffffa024ea89 [ixgbe] torvalds#14 [ffff88021f007d48] ixgbe_fcoe_disable at ffffffffa0267113 [ixgbe] torvalds#15 [ffff88021f007d68] vlan_dev_fcoe_disable at ffffffffa014fef5 [8021q] torvalds#16 [ffff88021f007d78] fcoe_interface_cleanup at ffffffffa02b7dfd [fcoe] torvalds#17 [ffff88021f007df8] fcoe_destroy_work at ffffffffa02b7f08 [fcoe] torvalds#18 [ffff88021f007e18] process_one_work at ffffffff8105d7ca torvalds#19 [ffff88021f007e68] worker_thread at ffffffff81060513 torvalds#20 [ffff88021f007ee8] kthread at ffffffff810648b6 torvalds#21 [ffff88021f007f48] kernel_thread_helper at ffffffff813c40f4 Signed-off-by: Yi Zou <yi.zou@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Tested-by: Stephen Ko <stephen.s.ko@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Printing the "start_ip" for every secondary cpu is very noisy on a large system - and doesn't add any value. Drop this message. Console log before: Booting Node 0, Processors #1 smpboot cpu 1: start_ip = 96000 #2 smpboot cpu 2: start_ip = 96000 #3 smpboot cpu 3: start_ip = 96000 #4 smpboot cpu 4: start_ip = 96000 ... torvalds#31 smpboot cpu 31: start_ip = 96000 Brought up 32 CPUs Console log after: Booting Node 0, Processors #1 #2 #3 #4 #5 torvalds#6 torvalds#7 Ok. Booting Node 1, Processors torvalds#8 torvalds#9 torvalds#10 torvalds#11 torvalds#12 torvalds#13 torvalds#14 torvalds#15 Ok. Booting Node 0, Processors torvalds#16 torvalds#17 torvalds#18 torvalds#19 torvalds#20 torvalds#21 torvalds#22 torvalds#23 Ok. Booting Node 1, Processors torvalds#24 torvalds#25 torvalds#26 torvalds#27 torvalds#28 torvalds#29 torvalds#30 torvalds#31 Brought up 32 CPUs Acked-by: Borislav Petkov <bp@amd64.org> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/4f452eb42507460426@agluck-desktop.sc.intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>

I'm getting the spew below when booting with Haswell (Xeon E5-2699 v3) CPUs and the "Cluster-on-Die" (CoD) feature enabled in the BIOS. It seems similar to the issue that some folks from AMD ran in to on their systems and addressed in this commit: 161270f ("x86/smp: Fix topology checks on AMD MCM CPUs") Both these Intel and AMD systems break an assumption which is being enforced by topology_sane(): a socket may not contain more than one NUMA node. AMD special-cased their system by looking for a cpuid flag. The Intel mode is dependent on BIOS options and I do not know of a way which it is enumerated other than the tables being parsed during the CPU bringup process. In other words, we have to trust the ACPI tables <shudder>. This detects the situation where a NUMA node occurs at a place in the middle of the "CPU" sched domains. It replaces the default topology with one that relies on the NUMA information from the firmware (SRAT table) for all levels of sched domains above the hyperthreads. This also fixes a sysfs bug. We used to freak out when we saw the "mc" group cross a node boundary, so we stopped building the MC group. MC gets exported as the 'core_siblings_list' in /sys/devices/system/cpu/cpu*/topology/ and this caused CPUs with the same 'physical_package_id' to not be listed together in 'core_siblings_list'. This violates a statement from Documentation/ABI/testing/sysfs-devices-system-cpu: core_siblings: internal kernel map of cpu#'s hardware threads within the same physical_package_id. core_siblings_list: human-readable list of the logical CPU numbers within the same physical_package_id as cpu#. The sysfs effects here cause an issue with the hwloc tool where it gets confused and thinks there are more sockets than are physically present. Before this patch, there are two packages: # cd /sys/devices/system/cpu/ # cat cpu*/topology/physical_package_id | sort | uniq -c 18 0 18 1 But 4 _sets_ of core siblings: # cat cpu*/topology/core_siblings_list | sort | uniq -c 9 0-8 9 18-26 9 27-35 9 9-17 After this set, there are only 2 sets of core siblings, which is what we expect for a 2-socket system. # cat cpu*/topology/physical_package_id | sort | uniq -c 18 0 18 1 # cat cpu*/topology/core_siblings_list | sort | uniq -c 18 0-17 18 18-35 Example spew: ... NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter. #2 #3 #4 #5 #6 #7 #8 .... node #1, CPUs: #9 ------------[ cut here ]------------ WARNING: CPU: 9 PID: 0 at /home/ak/hle/linux-hle-2.6/arch/x86/kernel/smpboot.c:306 topology_sane.isra.2+0x74/0x90() sched: CPU #9's mc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency. Modules linked in: CPU: 9 PID: 0 Comm: swapper/9 Not tainted 3.17.0-rc1-00293-g8e01c4d-dirty torvalds#631 Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRNDSDP1.86B.0036.R05.1407140519 07/14/2014 0000000000000009 ffff88046ddabe00 ffffffff8172e485 ffff88046ddabe48 ffff88046ddabe38 ffffffff8109691d 000000000000b001 0000000000000009 ffff88086fc12580 000000000000b020 0000000000000009 ffff88046ddabe98 Call Trace: [<ffffffff8172e485>] dump_stack+0x45/0x56 [<ffffffff8109691d>] warn_slowpath_common+0x7d/0xa0 [<ffffffff8109698c>] warn_slowpath_fmt+0x4c/0x50 [<ffffffff81074f94>] topology_sane.isra.2+0x74/0x90 [<ffffffff8107530e>] set_cpu_sibling_map+0x31e/0x4f0 [<ffffffff8107568d>] start_secondary+0x1ad/0x240 ---[ end trace 3fe5f587a9fcde61 ]--- #10 #11 #12 #13 #14 #15 #16 #17 .... node #2, CPUs: #18 #19 #20 #21 #22 #23 #24 #25 #26 .... node #3, CPUs: #27 #28 #29 #30 #31 #32 #33 #34 #35 Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> [ Added LLC domain and s/match_mc/match_die/ ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: David Rientjes <rientjes@google.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: brice.goglin@gmail.com Cc: "H. Peter Anvin" <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/20140918193334.C065EBCE@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>

This patch wires up the new syscall sys_bpf() on powerpc. Passes the tests in samples/bpf: #0 add+sub+mul OK #1 unreachable OK #2 unreachable2 OK #3 out of range jump OK #4 out of range jump2 OK #5 test1 ld_imm64 OK #6 test2 ld_imm64 OK #7 test3 ld_imm64 OK #8 test4 ld_imm64 OK #9 test5 ld_imm64 OK #10 no bpf_exit OK #11 loop (back-edge) OK #12 loop2 (back-edge) OK #13 conditional loop OK #14 read uninitialized register OK #15 read invalid register OK #16 program doesn't init R0 before exit OK #17 stack out of bounds OK #18 invalid call insn1 OK #19 invalid call insn2 OK #20 invalid function call OK #21 uninitialized stack1 OK #22 uninitialized stack2 OK #23 check valid spill/fill OK #24 check corrupted spill/fill OK #25 invalid src register in STX OK #26 invalid dst register in STX OK #27 invalid dst register in ST OK #28 invalid src register in LDX OK #29 invalid dst register in LDX OK #30 junk insn OK #31 junk insn2 OK #32 junk insn3 OK #33 junk insn4 OK #34 junk insn5 OK #35 misaligned read from stack OK #36 invalid map_fd for function call OK #37 don't check return value before access OK #38 access memory with incorrect alignment OK #39 sometimes access memory with incorrect alignment OK #40 jump test 1 OK #41 jump test 2 OK #42 jump test 3 OK #43 jump test 4 OK Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> [mpe: test using samples/bpf] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

ARM has some private syscalls (for example, set_tls(2)) which lie outside the range of NR_syscalls. If any of these are called while syscall tracing is being performed, out-of-bounds array access will occur in the ftrace and perf sys_{enter,exit} handlers. # trace-cmd record -e raw_syscalls:* true && trace-cmd report ... true-653 [000] 384.675777: sys_enter: NR 192 (0, 1000, 3, 4000022, ffffffff, 0) true-653 [000] 384.675812: sys_exit: NR 192 = 1995915264 true-653 [000] 384.675971: sys_enter: NR 983045 (76f74480, 76f74000, 76f74b28, 76f74480, 76f76f74, 1) true-653 [000] 384.675988: sys_exit: NR 983045 = 0 ... # trace-cmd record -e syscalls:* true [ 17.289329] Unable to handle kernel paging request at virtual address aaaaaace [ 17.289590] pgd = 9e71c000 [ 17.289696] [aaaaaace] *pgd=00000000 [ 17.289985] Internal error: Oops: 5 [#1] PREEMPT SMP ARM [ 17.290169] Modules linked in: [ 17.290391] CPU: 0 PID: 704 Comm: true Not tainted 3.18.0-rc2+ #21 [ 17.290585] task: 9f4dab00 ti: 9e710000 task.ti: 9e710000 [ 17.290747] PC is at ftrace_syscall_enter+0x48/0x1f8 [ 17.290866] LR is at syscall_trace_enter+0x124/0x184 Fix this by ignoring out-of-NR_syscalls-bounds syscall numbers. Commit cd0980f "tracing: Check invalid syscall nr while tracing syscalls" added the check for less than zero, but it should have also checked for greater than NR_syscalls. Link: http://lkml.kernel.org/p/1414620418-29472-1-git-send-email-rabin@rab.in Fixes: cd0980f "tracing: Check invalid syscall nr while tracing syscalls" Cc: stable@vger.kernel.org # 2.6.33+ Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

On i.MX6SX sdb platform, there has two same enet MACs, after system up, just eth0 is up, and then do suspend/resume test: [ 50.437967] PM: Syncing filesystems ... done. [ 50.476924] Freezing user space processes ... (elapsed 0.005 seconds) done. [ 50.490093] Freezing remaining freezable tasks ... (elapsed 0.004 seconds) done. [ 50.559771] ------------[ cut here ]------------ [ 50.564453] WARNING: CPU: 0 PID: 575 at drivers/clk/clk.c:851 __clk_disable+0x60/0x6c() [ 50.572475] Modules linked in: [ 50.575578] CPU: 0 PID: 575 Comm: sh Not tainted 3.18.0-rc2-next-20141031-00007-gf61135b #21 [ 50.584031] Backtrace: [ 50.586550] [<80011ecc>] (dump_backtrace) from [<8001206c>] (show_stack+0x18/0x1c) [ 50.594136] r6:808a7a54 r5:00000000 r4:00000000 r3:00000000 [ 50.599920] [<80012054>] (show_stack) from [<806ab3c0>] (dump_stack+0x80/0x9c) [ 50.607187] [<806ab340>] (dump_stack) from [<8002a3e8>] (warn_slowpath_common+0x6c/0x8c) [ 50.615294] r5:00000353 r4:00000000 [ 50.618940] [<8002a37c>] (warn_slowpath_common) from [<8002a42c>] (warn_slowpath_null+0x24/0x2c) [ 50.627738] r8:00000000 r7:be144c44 r6:be015600 r5:80070013 r4:be015600 [ 50.634573] [<8002a408>] (warn_slowpath_null) from [<804f8d4c>] (__clk_disable+0x60/0x6c) [ 50.642777] [<804f8cec>] (__clk_disable) from [<804f8e5c>] (clk_disable+0x2c/0x38) [ 50.650359] r4:be015600 r3:00000000 [ 50.654006] [<804f8e30>] (clk_disable) from [<80420ab4>] (fec_enet_clk_enable+0xc4/0x258) [ 50.662196] r5:be3cb620 r4:be3cb000 [ 50.665838] [<804209f0>] (fec_enet_clk_enable) from [<80421178>] (fec_suspend+0x30/0x180) [ 50.674026] r7:be144c44 r6:be144c10 r5:8037f5a4 r4:be3cb000 [ 50.679802] [<80421148>] (fec_suspend) from [<8037f5d8>] (platform_pm_suspend+0x34/0x64) [ 50.687906] r10:00000000 r9:00000000 r8:00000000 r7:be144c44 r6:be144c10 r5:8037f5a4 [ 50.695852] r4:be144c10 r3:80421148 [ 50.699511] [<8037f5a4>] (platform_pm_suspend) from [<8038784c>] (dpm_run_callback.isra.14+0x34/0x6c) [ 50.708764] [<80387818>] (dpm_run_callback.isra.14) from [<80387f00>] (__device_suspend+0x12c/0x2a4) [ 50.717909] r9:8098ec8c r8:80973bec r6:00000002 r5:811c7038 r4:be144c10 [ 50.724746] [<80387dd4>] (__device_suspend) from [<803894fc>] (dpm_suspend+0x64/0x224) [ 50.732675] r8:80973bec r7:be144c10 r6:8098ec24 r5:811c7038 r4:be144cc4 [ 50.739509] [<80389498>] (dpm_suspend) from [<8038999c>] (dpm_suspend_start+0x60/0x68) [ 50.747438] r10:8082fa24 r9:00000000 r8:00000004 r7:00000003 r6:00000000 r5:8116ec80 [ 50.755386] r4:00000002 [ 50.757969] [<8038993c>] (dpm_suspend_start) from [<800679d8>] (suspend_devices_and_enter+0x90/0x3ec) [ 50.767202] r4:00000003 r3:8116eca0 [ 50.770843] [<80067948>] (suspend_devices_and_enter) from [<80067f40>] (pm_suspend+0x20c/0x2a4) [ 50.779553] r8:00000004 r7:00000003 r6:00000000 r5:8116ec8c r4:00000003 [ 50.786394] [<80067d34>] (pm_suspend) from [<80066858>] (state_store+0x70/0xc0) [ 50.793718] r6:8116ec90 r5:00000003 r4:bd88a800 r3:0000006d [ 50.799496] [<800667e8>] (state_store) from [<802b0384>] (kobj_attr_store+0x1c/0x28) [ 50.807251] r10:bd399f78 r8:00000000 r7:bd88a800 r6:bd88a800 r5:00000004 r4:bd085680 [ 50.815219] [<802b0368>] (kobj_attr_store) from [<80153090>] (sysfs_kf_write+0x54/0x58) [ 50.823252] [<8015303c>] (sysfs_kf_write) from [<80151fd8>] (kernfs_fop_write+0xd0/0x194) [ 50.831441] r6:00000004 r5:bd08568c r4:bd085680 r3:8015303c [ 50.837220] [<80151f08>] (kernfs_fop_write) from [<800eddb4>] (vfs_write+0xb8/0x1a8) [ 50.844975] r10:00000000 r9:00000000 r8:00000000 r7:bd399f78 r6:01336408 r5:00000004 [ 50.852924] r4:bc584dc0 [ 50.855505] [<800edcfc>] (vfs_write) from [<800ee0b8>] (SyS_write+0x48/0x88) [ 50.862567] r10:00000000 r8:00000000 r7:01336408 r6:00000004 r5:bc584dc0 r4:bc584dc0 [ 50.870537] [<800ee070>] (SyS_write) from [<8000eb00>] (ret_fast_syscall+0x0/0x48) [ 50.878120] r9:bd398000 r8:8000ecc4 r7:00000004 r6:76f42b48 r5:01336408 r4:00000004 [ 50.885983] ---[ end trace 7545115d752a316a ]--- [ 50.890765] ------------[ cut here ]------------ The root cause is that eth1 is not opened and clock is not enabled, and .suspend() still call .fec_enet_clk_enable() to disable clock. To avoid the broken, let it check network device up status by calling .netif_running() before disable/enable clocks. Signed-off-by: Fugang Duan <B38611@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>

[ Upstream commit b058809 ] A bug is present in GDB which causes early string termination when parsing variables. This has been reported [0], but we should ensure that we can support at least basic printing of the core kernel strings. For current gdb version (has been tested with 7.3 and 8.1), 'lx-version' only prints one character. (gdb) lx-version L(gdb) This can be fixed by casting 'linux_banner' as (char *). (gdb) lx-version Linux version 4.19.0-rc1+ (changbin@acer) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) linux-sunxi#21 SMP Sat Sep 1 21:43:30 CST 2018 [0] https://sourceware.org/bugzilla/show_bug.cgi?id=20077 [kbingham@kernel.org: add detail to commit message] Link: http://lkml.kernel.org/r/20181111162035.8356-1-kieran.bingham@ideasonboard.com Fixes: 2d061d9 ("scripts/gdb: add version command") Signed-off-by: Du Changbin <changbin.du@gmail.com> Signed-off-by: Kieran Bingham <kbingham@kernel.org> Acked-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

Eric has reported on OpenWrt's bug tracking system[1], that he's not able to use USB devices on his WNDR3400v2 device after the boot, until he turns on GPIO #21 manually through sysfs. 1. https://bugs.openwrt.org/index.php?do=details&task_id=2170 Cc: Rafał Miłecki <zajec5@gmail.com> Cc: Hauke Mehrtens <hauke@hauke-m.de> Reported-by: Eric Bohlman <ericbohlman@gmail.com> Tested-by: Eric Bohlman <ericbohlman@gmail.com> Signed-off-by: Petr Štetiar <ynezz@true.cz> Signed-off-by: Paul Burton <paul.burton@mips.com>

[ Upstream commit cdb8faa ] Eric has reported on OpenWrt's bug tracking system[1], that he's not able to use USB devices on his WNDR3400v2 device after the boot, until he turns on GPIO linux-sunxi#21 manually through sysfs. 1. https://bugs.openwrt.org/index.php?do=details&task_id=2170 Cc: Rafał Miłecki <zajec5@gmail.com> Cc: Hauke Mehrtens <hauke@hauke-m.de> Reported-by: Eric Bohlman <ericbohlman@gmail.com> Tested-by: Eric Bohlman <ericbohlman@gmail.com> Signed-off-by: Petr Štetiar <ynezz@true.cz> Signed-off-by: Paul Burton <paul.burton@mips.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

af_inet sets sock->sk to NULL which trips strparser over: BUG: kernel NULL pointer dereference, address: 0000000000000012 PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 7 PID: 0 Comm: swapper/7 Not tainted 5.2.0-rc1-00139-g14629453a6d3 linux-sunxi#21 RIP: 0010:tcp_peek_len+0x10/0x60 RSP: 0018:ffffc02e41c54b98 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff9cf924c4e030 RCX: 0000000000000051 RDX: 0000000000000000 RSI: 000000000000000c RDI: ffff9cf97128f480 RBP: ffff9cf9365e0300 R08: ffff9cf94fe7d2c0 R09: 0000000000000000 R10: 000000000000036b R11: ffff9cf939735e00 R12: ffff9cf91ad9ae40 R13: ffff9cf924c4e000 R14: ffff9cf9a8fcbaae R15: 0000000000000020 FS: 0000000000000000(0000) GS:ffff9cf9af7c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000012 CR3: 000000013920a003 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> strp_data_ready+0x48/0x90 tls_data_ready+0x22/0xd0 [tls] tcp_rcv_established+0x569/0x620 tcp_v4_do_rcv+0x127/0x1e0 tcp_v4_rcv+0xad7/0xbf0 ip_protocol_deliver_rcu+0x2c/0x1c0 ip_local_deliver_finish+0x41/0x50 ip_local_deliver+0x6b/0xe0 ? ip_protocol_deliver_rcu+0x1c0/0x1c0 ip_rcv+0x52/0xd0 ? ip_rcv_finish_core.isra.20+0x380/0x380 __netif_receive_skb_one_core+0x7e/0x90 netif_receive_skb_internal+0x42/0xf0 napi_gro_receive+0xed/0x150 nfp_net_poll+0x7a2/0xd30 [nfp] ? kmem_cache_free_bulk+0x286/0x310 net_rx_action+0x149/0x3b0 __do_softirq+0xe3/0x30a ? handle_irq_event_percpu+0x6a/0x80 irq_exit+0xe8/0xf0 do_IRQ+0x85/0xd0 common_interrupt+0xf/0xf </IRQ> RIP: 0010:cpuidle_enter_state+0xbc/0x450 To avoid this issue set sock->sk after sk_prot->close. My grepping and testing did not discover any code which would depend on the current behaviour. Fixes: c46234e ("tls: RX path for ktls") Reported-by: David Beckett <david.beckett@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>

[ Upstream commit 2542604 ] When a matchall classifier is added, there is a small time interval in which tp->root is NULL. If we receive a packet in this small time slice a NULL pointer dereference will happen, leading to a kernel panic: # tc qdisc replace dev eth0 ingress # tc filter add dev eth0 parent ffff: matchall action gact drop Unable to handle kernel NULL pointer dereference at virtual address 0000000000000034 Mem abort info: ESR = 0x96000005 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000005 CM = 0, WnR = 0 user pgtable: 4k pages, 39-bit VAs, pgdp = 00000000a623d530 [0000000000000034] pgd=0000000000000000, pud=0000000000000000 Internal error: Oops: 96000005 [jwrdegoede#1] SMP Modules linked in: cls_matchall sch_ingress nls_iso8859_1 nls_cp437 vfat fat m25p80 spi_nor mtd xhci_plat_hcd xhci_hcd phy_generic sfp mdio_i2c usbcore i2c_mv64xxx marvell10g mvpp2 usb_common spi_orion mvmdio i2c_core sbsa_gwdt phylink ip_tables x_tables autofs4 Process ksoftirqd/0 (pid: 9, stack limit = 0x0000000009de7d62) CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 5.1.0-rc6 linux-sunxi#21 Hardware name: Marvell 8040 MACCHIATOBin Double-shot (DT) pstate: 40000005 (nZcv daif -PAN -UAO) pc : mall_classify+0x28/0x78 [cls_matchall] lr : tcf_classify+0x78/0x138 sp : ffffff80109db9d0 x29: ffffff80109db9d0 x28: ffffffc426058800 x27: 0000000000000000 x26: ffffffc425b0dd00 x25: 0000000020000000 x24: 0000000000000000 x23: ffffff80109dbac0 x22: 0000000000000001 x21: ffffffc428ab5100 x20: ffffffc425b0dd00 x19: ffffff80109dbac0 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000000 x13: ffffffbf108ad288 x12: dead000000000200 x11: 00000000f0000000 x10: 0000000000000001 x9 : ffffffbf1089a220 x8 : 0000000000000001 x7 : ffffffbebffaa950 x6 : 0000000000000000 x5 : 000000442d6ba000 x4 : 0000000000000000 x3 : ffffff8008735ad8 x2 : ffffff80109dbac0 x1 : ffffffc425b0dd00 x0 : ffffff8010592078 Call trace: mall_classify+0x28/0x78 [cls_matchall] tcf_classify+0x78/0x138 __netif_receive_skb_core+0x29c/0xa20 __netif_receive_skb_one_core+0x34/0x60 __netif_receive_skb+0x28/0x78 netif_receive_skb_internal+0x2c/0xc0 napi_gro_receive+0x1a0/0x1d8 mvpp2_poll+0x928/0xb18 [mvpp2] net_rx_action+0x108/0x378 __do_softirq+0x128/0x320 run_ksoftirqd+0x44/0x60 smpboot_thread_fn+0x168/0x1b0 kthread+0x12c/0x130 ret_from_fork+0x10/0x1c Code: aa0203f3 aa1e03e0 d503201f f9400684 (b9403480) ---[ end trace fc71e2ef7b8ab5a5 ]--- Kernel panic - not syncing: Fatal exception in interrupt SMP: stopping secondary CPUs Kernel Offset: disabled CPU features: 0x002,00002000 Memory Limit: none Rebooting in 1 seconds.. Fix this by adding a NULL check in mall_classify(). Fixes: ed76f5e ("net: sched: protect filter_chain list with filter_chain_lock mutex") Signed-off-by: Matteo Croce <mcroce@redhat.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

A deadlock with this stacktrace was observed. The loop thread does a GFP_KERNEL allocation, it calls into dm-bufio shrinker and the shrinker depends on I/O completion in the dm-bufio subsystem. In order to fix the deadlock (and other similar ones), we set the flag PF_MEMALLOC_NOIO at loop thread entry. PID: 474 TASK: ffff8813e11f4600 CPU: 10 COMMAND: "kswapd0" #0 [ffff8813dedfb938] __schedule at ffffffff8173f405 #1 [ffff8813dedfb990] schedule at ffffffff8173fa27 #2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec #3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186 #4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f #5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8 #6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81 linux-sunxi#7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio] linux-sunxi#8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio] linux-sunxi#9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio] linux-sunxi#10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce linux-sunxi#11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778 linux-sunxi#12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f linux-sunxi#13 [ffff8813dedfbec0] kthread at ffffffff810a8428 linux-sunxi#14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242 PID: 14127 TASK: ffff881455749c00 CPU: 11 COMMAND: "loop1" #0 [ffff88272f5af228] __schedule at ffffffff8173f405 #1 [ffff88272f5af280] schedule at ffffffff8173fa27 #2 [ffff88272f5af2a0] schedule_preempt_disabled at ffffffff8173fd5e #3 [ffff88272f5af2b0] __mutex_lock_slowpath at ffffffff81741fb5 #4 [ffff88272f5af330] mutex_lock at ffffffff81742133 #5 [ffff88272f5af350] dm_bufio_shrink_count at ffffffffa03865f9 [dm_bufio] #6 [ffff88272f5af380] shrink_slab at ffffffff811a86bd linux-sunxi#7 [ffff88272f5af470] shrink_zone at ffffffff811ad778 linux-sunxi#8 [ffff88272f5af500] do_try_to_free_pages at ffffffff811adb34 linux-sunxi#9 [ffff88272f5af590] try_to_free_pages at ffffffff811adef8 linux-sunxi#10 [ffff88272f5af610] __alloc_pages_nodemask at ffffffff811a09c3 linux-sunxi#11 [ffff88272f5af710] alloc_pages_current at ffffffff811e8b71 linux-sunxi#12 [ffff88272f5af760] new_slab at ffffffff811f4523 linux-sunxi#13 [ffff88272f5af7b0] __slab_alloc at ffffffff8173a1b5 linux-sunxi#14 [ffff88272f5af880] kmem_cache_alloc at ffffffff811f484b linux-sunxi#15 [ffff88272f5af8d0] do_blockdev_direct_IO at ffffffff812535b3 linux-sunxi#16 [ffff88272f5afb00] __blockdev_direct_IO at ffffffff81255dc3 linux-sunxi#17 [ffff88272f5afb30] xfs_vm_direct_IO at ffffffffa01fe3fc [xfs] linux-sunxi#18 [ffff88272f5afb90] generic_file_read_iter at ffffffff81198994 linux-sunxi#19 [ffff88272f5afc50] __dta_xfs_file_read_iter_2398 at ffffffffa020c970 [xfs] linux-sunxi#20 [ffff88272f5afcc0] lo_rw_aio at ffffffffa0377042 [loop] linux-sunxi#21 [ffff88272f5afd70] loop_queue_work at ffffffffa0377c3b [loop] linux-sunxi#22 [ffff88272f5afe60] kthread_worker_fn at ffffffff810a8a0c linux-sunxi#23 [ffff88272f5afec0] kthread at ffffffff810a8428 linux-sunxi#24 [ffff88272f5aff50] ret_from_fork at ffffffff81745242 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>

Without this patch, the MAP_SYNC test case will cause a print_bad_pte warning on arm64 as follows: [ 25.542693] BUG: Bad page map in process mapdax333 pte:2e8000448800f53 pmd:41ff5f003 [ 25.546360] page:ffff7e0010220000 refcount:1 mapcount:-1 mapping:ffff8003e29c7440 index:0x0 [ 25.550281] ext4_dax_aops [ 25.550282] name:"__aaabbbcccddd__" [ 25.551553] flags: 0x3ffff0000001002(referenced|reserved) [ 25.555802] raw: 03ffff0000001002 ffff8003dfffa908 0000000000000000 ffff8003e29c7440 [ 25.559446] raw: 0000000000000000 0000000000000000 00000001fffffffe 0000000000000000 [ 25.563075] page dumped because: bad pte [ 25.564938] addr:0000ffffbe05b000 vm_flags:208000fb anon_vma:0000000000000000 mapping:ffff8003e29c7440 index:0 [ 25.574272] file:__aaabbbcccddd__ fault:ext4_dax_fault mmmmap:ext4_file_mmap readpage:0x0 [ 25.578799] CPU: 1 PID: 1180 Comm: mapdax333 Not tainted 5.2.0+ linux-sunxi#21 [ 25.581702] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 [ 25.585624] Call trace: [ 25.587008] dump_backtrace+0x0/0x178 [ 25.588799] show_stack+0x24/0x30 [ 25.590328] dump_stack+0xa8/0xcc [ 25.591901] print_bad_pte+0x18c/0x218 [ 25.593628] unmap_page_range+0x778/0xc00 [ 25.595506] unmap_single_vma+0x94/0xe8 [ 25.597304] unmap_vmas+0x90/0x108 [ 25.598901] unmap_region+0xc0/0x128 [ 25.600566] __do_munmap+0x284/0x3f0 [ 25.602245] __vm_munmap+0x78/0xe0 [ 25.603820] __arm64_sys_munmap+0x34/0x48 [ 25.605709] el0_svc_common.constprop.0+0x78/0x168 [ 25.607956] el0_svc_handler+0x34/0x90 [ 25.609698] el0_svc+0x8/0xc [...] The root cause is in _vm_normal_page, without the PTE_SPECIAL bit, the return value will be incorrectly set to pfn_to_page(pfn) instead of NULL. Besides, this patch also rewrite the pmd_mkdevmap to avoid setting PTE_SPECIAL for pmd The MAP_SYNC test case is as follows(Provided by Yibo Cai) $#include <stdio.h> $#include <string.h> $#include <unistd.h> $#include <sys/file.h> $#include <sys/mman.h> $#ifndef MAP_SYNC $#define MAP_SYNC 0x80000 $#endif /* mount -o dax /dev/pmem0 /mnt */ $#define F "/mnt/__aaabbbcccddd__" int main(void) { int fd; char buf[4096]; void *addr; if ((fd = open(F, O_CREAT|O_TRUNC|O_RDWR, 0644)) < 0) { perror("open1"); return 1; } if (write(fd, buf, 4096) != 4096) { perror("lseek"); return 1; } addr = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_SYNC, fd, 0); if (addr == MAP_FAILED) { perror("mmap"); printf("did you mount with '-o dax'?\n"); return 1; } memset(addr, 0x55, 4096); if (munmap(addr, 4096) == -1) { perror("munmap"); return 1; } close(fd); return 0; } Fixes: 73b20c8 ("arm64: mm: implement pte_devmap support") Reported-by: Yibo Cai <Yibo.Cai@arm.com> Acked-by: Will Deacon <will@kernel.org> Acked-by: Robin Murphy <Robin.Murphy@arm.com> Signed-off-by: Jia He <justin.he@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

commit d0a255e upstream. A deadlock with this stacktrace was observed. The loop thread does a GFP_KERNEL allocation, it calls into dm-bufio shrinker and the shrinker depends on I/O completion in the dm-bufio subsystem. In order to fix the deadlock (and other similar ones), we set the flag PF_MEMALLOC_NOIO at loop thread entry. PID: 474 TASK: ffff8813e11f4600 CPU: 10 COMMAND: "kswapd0" #0 [ffff8813dedfb938] __schedule at ffffffff8173f405 jwrdegoede#1 [ffff8813dedfb990] schedule at ffffffff8173fa27 jwrdegoede#2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec jwrdegoede#3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186 jwrdegoede#4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f jwrdegoede#5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8 jwrdegoede#6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81 linux-sunxi#7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio] linux-sunxi#8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio] linux-sunxi#9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio] linux-sunxi#10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce linux-sunxi#11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778 linux-sunxi#12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f linux-sunxi#13 [ffff8813dedfbec0] kthread at ffffffff810a8428 linux-sunxi#14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242 PID: 14127 TASK: ffff881455749c00 CPU: 11 COMMAND: "loop1" #0 [ffff88272f5af228] __schedule at ffffffff8173f405 jwrdegoede#1 [ffff88272f5af280] schedule at ffffffff8173fa27 jwrdegoede#2 [ffff88272f5af2a0] schedule_preempt_disabled at ffffffff8173fd5e jwrdegoede#3 [ffff88272f5af2b0] __mutex_lock_slowpath at ffffffff81741fb5 jwrdegoede#4 [ffff88272f5af330] mutex_lock at ffffffff81742133 jwrdegoede#5 [ffff88272f5af350] dm_bufio_shrink_count at ffffffffa03865f9 [dm_bufio] jwrdegoede#6 [ffff88272f5af380] shrink_slab at ffffffff811a86bd linux-sunxi#7 [ffff88272f5af470] shrink_zone at ffffffff811ad778 linux-sunxi#8 [ffff88272f5af500] do_try_to_free_pages at ffffffff811adb34 linux-sunxi#9 [ffff88272f5af590] try_to_free_pages at ffffffff811adef8 linux-sunxi#10 [ffff88272f5af610] __alloc_pages_nodemask at ffffffff811a09c3 linux-sunxi#11 [ffff88272f5af710] alloc_pages_current at ffffffff811e8b71 linux-sunxi#12 [ffff88272f5af760] new_slab at ffffffff811f4523 linux-sunxi#13 [ffff88272f5af7b0] __slab_alloc at ffffffff8173a1b5 linux-sunxi#14 [ffff88272f5af880] kmem_cache_alloc at ffffffff811f484b linux-sunxi#15 [ffff88272f5af8d0] do_blockdev_direct_IO at ffffffff812535b3 linux-sunxi#16 [ffff88272f5afb00] __blockdev_direct_IO at ffffffff81255dc3 linux-sunxi#17 [ffff88272f5afb30] xfs_vm_direct_IO at ffffffffa01fe3fc [xfs] linux-sunxi#18 [ffff88272f5afb90] generic_file_read_iter at ffffffff81198994 linux-sunxi#19 [ffff88272f5afc50] __dta_xfs_file_read_iter_2398 at ffffffffa020c970 [xfs] linux-sunxi#20 [ffff88272f5afcc0] lo_rw_aio at ffffffffa0377042 [loop] linux-sunxi#21 [ffff88272f5afd70] loop_queue_work at ffffffffa0377c3b [loop] linux-sunxi#22 [ffff88272f5afe60] kthread_worker_fn at ffffffff810a8a0c linux-sunxi#23 [ffff88272f5afec0] kthread at ffffffff810a8428 linux-sunxi#24 [ffff88272f5aff50] ret_from_fork at ffffffff81745242 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Some QPs (e.g. XRC QP) are not tracked in kernel, in this case they have an invalid res and should not be bound to any dynamically-allocated counter in auto mode. This fixes below call trace: BUG: kernel NULL pointer dereference, address: 0000000000000390 PGD 80000001a7233067 P4D 80000001a7233067 PUD 1a7215067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 2 PID: 24822 Comm: ibv_xsrq_pingpo Not tainted 5.4.0-rc5+ linux-sunxi#21 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014 RIP: 0010:rdma_counter_bind_qp_auto+0x142/0x270 [ib_core] Code: e1 48 85 c0 48 89 c2 0f 84 bc 00 00 00 49 8b 06 48 39 42 48 75 d6 40 3a aa 90 00 00 00 75 cd 49 8b 86 00 01 00 00 48 8b 4a 28 <8b> 80 90 03 00 00 39 81 90 03 00 00 75 b4 85 c0 74 b0 48 8b 04 24 RSP: 0018:ffffc900003f39c0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 RDX: ffff88820020ec00 RSI: 0000000000000004 RDI: ffffffffffffffc0 RBP: 0000000000000001 R08: ffff888224149ff0 R09: ffffc900003f3968 R10: ffffffffffffffff R11: ffff8882249c5848 R12: ffffffffffffffff R13: ffff88821d5aca50 R14: ffff8881f7690800 R15: ffff8881ff890000 FS: 00007fe53a3e1740(0000) GS:ffff888237b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000390 CR3: 00000001a7292006 CR4: 00000000003606a0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: _ib_modify_qp+0x3a4/0x3f0 [ib_core] ? lookup_get_idr_uobject.part.8+0x23/0x40 [ib_uverbs] modify_qp+0x322/0x3e0 [ib_uverbs] ib_uverbs_modify_qp+0x43/0x70 [ib_uverbs] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xb1/0xf0 [ib_uverbs] ib_uverbs_run_method+0x6be/0x760 [ib_uverbs] ? uverbs_disassociate_api+0xd0/0xd0 [ib_uverbs] ib_uverbs_cmd_verbs+0x18d/0x3a0 [ib_uverbs] ? get_acl+0x1a/0x120 ? __alloc_pages_nodemask+0x15d/0x2c0 ib_uverbs_ioctl+0xa7/0x110 [ib_uverbs] do_vfs_ioctl+0xa5/0x610 ksys_ioctl+0x60/0x90 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x48/0x110 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 99fa331 ("RDMA/counter: Add "auto" configuration mode support") Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Ido Kalir <idok@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20191212091214.315005-2-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>

When experimenting with bpf_send_signal() helper in our production environment (5.2 based), we experienced a deadlock in NMI mode: #5 [ffffc9002219f770] queued_spin_lock_slowpath at ffffffff8110be24 #6 [ffffc9002219f770] _raw_spin_lock_irqsave at ffffffff81a43012 linux-sunxi#7 [ffffc9002219f780] try_to_wake_up at ffffffff810e7ecd linux-sunxi#8 [ffffc9002219f7e0] signal_wake_up_state at ffffffff810c7b55 linux-sunxi#9 [ffffc9002219f7f0] __send_signal at ffffffff810c8602 linux-sunxi#10 [ffffc9002219f830] do_send_sig_info at ffffffff810ca31a linux-sunxi#11 [ffffc9002219f868] bpf_send_signal at ffffffff8119d227 linux-sunxi#12 [ffffc9002219f988] bpf_overflow_handler at ffffffff811d4140 linux-sunxi#13 [ffffc9002219f9e0] __perf_event_overflow at ffffffff811d68cf linux-sunxi#14 [ffffc9002219fa10] perf_swevent_overflow at ffffffff811d6a09 linux-sunxi#15 [ffffc9002219fa38] ___perf_sw_event at ffffffff811e0f47 linux-sunxi#16 [ffffc9002219fc30] __schedule at ffffffff81a3e04d linux-sunxi#17 [ffffc9002219fc90] schedule at ffffffff81a3e219 linux-sunxi#18 [ffffc9002219fca0] futex_wait_queue_me at ffffffff8113d1b9 linux-sunxi#19 [ffffc9002219fcd8] futex_wait at ffffffff8113e529 linux-sunxi#20 [ffffc9002219fdf0] do_futex at ffffffff8113ffbc linux-sunxi#21 [ffffc9002219fec0] __x64_sys_futex at ffffffff81140d1c linux-sunxi#22 [ffffc9002219ff38] do_syscall_64 at ffffffff81002602 linux-sunxi#23 [ffffc9002219ff50] entry_SYSCALL_64_after_hwframe at ffffffff81c00068 The above call stack is actually very similar to an issue reported by Commit eac9153 ("bpf/stackmap: Fix deadlock with rq_lock in bpf_get_stack()") by Song Liu. The only difference is bpf_send_signal() helper instead of bpf_get_stack() helper. The above deadlock is triggered with a perf_sw_event. Similar to Commit eac9153, the below almost identical reproducer used tracepoint point sched/sched_switch so the issue can be easily caught. /* stress_test.c */ #include <stdio.h> #include <stdlib.h> #include <sys/mman.h> #include <pthread.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #define THREAD_COUNT 1000 char *filename; void *worker(void *p) { void *ptr; int fd; char *pptr; fd = open(filename, O_RDONLY); if (fd < 0) return NULL; while (1) { struct timespec ts = {0, 1000 + rand() % 2000}; ptr = mmap(NULL, 4096 * 64, PROT_READ, MAP_PRIVATE, fd, 0); usleep(1); if (ptr == MAP_FAILED) { printf("failed to mmap\n"); break; } munmap(ptr, 4096 * 64); usleep(1); pptr = malloc(1); usleep(1); pptr[0] = 1; usleep(1); free(pptr); usleep(1); nanosleep(&ts, NULL); } close(fd); return NULL; } int main(int argc, char *argv[]) { void *ptr; int i; pthread_t threads[THREAD_COUNT]; if (argc < 2) return 0; filename = argv[1]; for (i = 0; i < THREAD_COUNT; i++) { if (pthread_create(threads + i, NULL, worker, NULL)) { fprintf(stderr, "Error creating thread\n"); return 0; } } for (i = 0; i < THREAD_COUNT; i++) pthread_join(threads[i], NULL); return 0; } and the following command: 1. run `stress_test /bin/ls` in one windown 2. hack bcc trace.py with the following change: --- a/tools/trace.py +++ b/tools/trace.py @@ -513,6 +513,7 @@ BPF_PERF_OUTPUT(%s); __data.tgid = __tgid; __data.pid = __pid; bpf_get_current_comm(&__data.comm, sizeof(__data.comm)); + bpf_send_signal(10); %s %s %s.perf_submit(%s, &__data, sizeof(__data)); 3. in a different window run ./trace.py -p $(pidof stress_test) t:sched:sched_switch The deadlock can be reproduced in our production system. Similar to Song's fix, the fix is to delay sending signal if irqs is disabled to avoid deadlocks involving with rq_lock. With this change, my above stress-test in our production system won't cause deadlock any more. I also implemented a scale-down version of reproducer in the selftest (a subsequent commit). With latest bpf-next, it complains for the following potential deadlock. [ 32.832450] -> #1 (&p->pi_lock){-.-.}: [ 32.833100] _raw_spin_lock_irqsave+0x44/0x80 [ 32.833696] task_rq_lock+0x2c/0xa0 [ 32.834182] task_sched_runtime+0x59/0xd0 [ 32.834721] thread_group_cputime+0x250/0x270 [ 32.835304] thread_group_cputime_adjusted+0x2e/0x70 [ 32.835959] do_task_stat+0x8a7/0xb80 [ 32.836461] proc_single_show+0x51/0xb0 ... [ 32.839512] -> #0 (&(&sighand->siglock)->rlock){....}: [ 32.840275] __lock_acquire+0x1358/0x1a20 [ 32.840826] lock_acquire+0xc7/0x1d0 [ 32.841309] _raw_spin_lock_irqsave+0x44/0x80 [ 32.841916] __lock_task_sighand+0x79/0x160 [ 32.842465] do_send_sig_info+0x35/0x90 [ 32.842977] bpf_send_signal+0xa/0x10 [ 32.843464] bpf_prog_bc13ed9e4d3163e3_send_signal_tp_sched+0x465/0x1000 [ 32.844301] trace_call_bpf+0x115/0x270 [ 32.844809] perf_trace_run_bpf_submit+0x4a/0xc0 [ 32.845411] perf_trace_sched_switch+0x10f/0x180 [ 32.846014] __schedule+0x45d/0x880 [ 32.846483] schedule+0x5f/0xd0 ... [ 32.853148] Chain exists of: [ 32.853148] &(&sighand->siglock)->rlock --> &p->pi_lock --> &rq->lock [ 32.853148] [ 32.854451] Possible unsafe locking scenario: [ 32.854451] [ 32.855173] CPU0 CPU1 [ 32.855745] ---- ---- [ 32.856278] lock(&rq->lock); [ 32.856671] lock(&p->pi_lock); [ 32.857332] lock(&rq->lock); [ 32.857999] lock(&(&sighand->siglock)->rlock); Deadlock happens on CPU0 when it tries to acquire &sighand->siglock but it has been held by CPU1 and CPU1 tries to grab &rq->lock and cannot get it. This is not exactly the callstack in our production environment, but sympotom is similar and both locks are using spin_lock_irqsave() to acquire the lock, and both involves rq_lock. The fix to delay sending signal when irq is disabled also fixed this issue. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200304191104.2796501-1-yhs@fb.com

[ Upstream commit e24c644 ] I compiled with AddressSanitizer and I had these memory leaks while I was using the tep_parse_format function: Direct leak of 28 byte(s) in 4 object(s) allocated from: #0 0x7fb07db49ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe) jwrdegoede#1 0x7fb07a724228 in extend_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:985 jwrdegoede#2 0x7fb07a724c21 in __read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1140 jwrdegoede#3 0x7fb07a724f78 in read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1206 jwrdegoede#4 0x7fb07a725191 in __read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1291 jwrdegoede#5 0x7fb07a7251df in read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1299 jwrdegoede#6 0x7fb07a72e6c8 in process_dynamic_array_len /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:2849 linux-sunxi#7 0x7fb07a7304b8 in process_function /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3161 linux-sunxi#8 0x7fb07a730900 in process_arg_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3207 linux-sunxi#9 0x7fb07a727c0b in process_arg /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1786 linux-sunxi#10 0x7fb07a731080 in event_read_print_args /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3285 linux-sunxi#11 0x7fb07a731722 in event_read_print /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3369 linux-sunxi#12 0x7fb07a740054 in __tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6335 linux-sunxi#13 0x7fb07a74047a in __parse_event /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6389 linux-sunxi#14 0x7fb07a740536 in tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6431 linux-sunxi#15 0x7fb07a785acf in parse_event ../../../src/fs-src/fs.c:251 linux-sunxi#16 0x7fb07a785ccd in parse_systems ../../../src/fs-src/fs.c:284 linux-sunxi#17 0x7fb07a786fb3 in read_metadata ../../../src/fs-src/fs.c:593 linux-sunxi#18 0x7fb07a78760e in ftrace_fs_source_init ../../../src/fs-src/fs.c:727 linux-sunxi#19 0x7fb07d90c19c in add_component_with_init_method_data ../../../../src/lib/graph/graph.c:1048 linux-sunxi#20 0x7fb07d90c87b in add_source_component_with_initialize_method_data ../../../../src/lib/graph/graph.c:1127 linux-sunxi#21 0x7fb07d90c92a in bt_graph_add_source_component ../../../../src/lib/graph/graph.c:1152 linux-sunxi#22 0x55db11aa632e in cmd_run_ctx_create_components_from_config_components ../../../src/cli/babeltrace2.c:2252 linux-sunxi#23 0x55db11aa6fda in cmd_run_ctx_create_components ../../../src/cli/babeltrace2.c:2347 linux-sunxi#24 0x55db11aa780c in cmd_run ../../../src/cli/babeltrace2.c:2461 linux-sunxi#25 0x55db11aa8a7d in main ../../../src/cli/babeltrace2.c:2673 linux-sunxi#26 0x7fb07d5460b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2) The token variable in the process_dynamic_array_len function is allocated in the read_expect_type function, but is not freed before calling the read_token function. Free the token variable before calling read_token in order to plug the leak. Signed-off-by: Philippe Duplessis-Guindon <pduplessis@efficios.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/linux-trace-devel/20200730150236.5392-1-pduplessis@efficios.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

This fix is for a failure that occurred in the DWARF unwind perf test. Stack unwinders may probe memory when looking for frames. Memory sanitizer will poison and track uninitialized memory on the stack, and on the heap if the value is copied to the heap. This can lead to false memory sanitizer failures for the use of an uninitialized value. Avoid this problem by removing the poison on the copied stack. The full msan failure with track origins looks like: ==2168==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x559ceb10755b in handle_cfi elfutils/libdwfl/frame_unwind.c:648:8 #1 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4 #2 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7 #3 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10 #4 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17 #5 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17 #6 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14 linux-sunxi#7 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10 linux-sunxi#8 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8 linux-sunxi#9 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8 linux-sunxi#10 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26 linux-sunxi#11 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0) linux-sunxi#12 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2 linux-sunxi#13 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9 linux-sunxi#14 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9 linux-sunxi#15 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8 linux-sunxi#16 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9 linux-sunxi#17 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9 linux-sunxi#18 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4 linux-sunxi#19 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9 linux-sunxi#20 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11 linux-sunxi#21 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8 linux-sunxi#22 0x559cea95fbce in run_argv tools/perf/perf.c:409:2 linux-sunxi#23 0x559cea95fbce in main tools/perf/perf.c:539:3 Uninitialized value was stored to memory at #0 0x559ceb106acf in __libdwfl_frame_reg_set elfutils/libdwfl/frame_unwind.c:77:22 #1 0x559ceb106acf in handle_cfi elfutils/libdwfl/frame_unwind.c:627:13 #2 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4 #3 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7 #4 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10 #5 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17 #6 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17 linux-sunxi#7 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14 linux-sunxi#8 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10 linux-sunxi#9 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8 linux-sunxi#10 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8 linux-sunxi#11 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26 linux-sunxi#12 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0) linux-sunxi#13 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2 linux-sunxi#14 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9 linux-sunxi#15 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9 linux-sunxi#16 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8 linux-sunxi#17 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9 linux-sunxi#18 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9 linux-sunxi#19 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4 linux-sunxi#20 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9 linux-sunxi#21 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11 linux-sunxi#22 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8 linux-sunxi#23 0x559cea95fbce in run_argv tools/perf/perf.c:409:2 linux-sunxi#24 0x559cea95fbce in main tools/perf/perf.c:539:3 Uninitialized value was stored to memory at #0 0x559ceb106a54 in handle_cfi elfutils/libdwfl/frame_unwind.c:613:9 #1 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4 #2 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7 #3 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10 #4 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17 #5 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17 #6 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14 linux-sunxi#7 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10 linux-sunxi#8 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8 linux-sunxi#9 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8 linux-sunxi#10 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26 linux-sunxi#11 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0) linux-sunxi#12 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2 linux-sunxi#13 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9 linux-sunxi#14 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9 linux-sunxi#15 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8 linux-sunxi#16 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9 linux-sunxi#17 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9 linux-sunxi#18 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4 linux-sunxi#19 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9 linux-sunxi#20 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11 linux-sunxi#21 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8 linux-sunxi#22 0x559cea95fbce in run_argv tools/perf/perf.c:409:2 linux-sunxi#23 0x559cea95fbce in main tools/perf/perf.c:539:3 Uninitialized value was stored to memory at #0 0x559ceaff8800 in memory_read tools/perf/util/unwind-libdw.c:156:10 #1 0x559ceb10f053 in expr_eval elfutils/libdwfl/frame_unwind.c:501:13 #2 0x559ceb1060cc in handle_cfi elfutils/libdwfl/frame_unwind.c:603:18 #3 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4 #4 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7 #5 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10 #6 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17 linux-sunxi#7 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17 linux-sunxi#8 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14 linux-sunxi#9 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10 linux-sunxi#10 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8 linux-sunxi#11 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8 linux-sunxi#12 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26 linux-sunxi#13 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0) linux-sunxi#14 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2 linux-sunxi#15 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9 linux-sunxi#16 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9 linux-sunxi#17 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8 linux-sunxi#18 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9 linux-sunxi#19 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9 linux-sunxi#20 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4 linux-sunxi#21 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9 linux-sunxi#22 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11 linux-sunxi#23 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8 linux-sunxi#24 0x559cea95fbce in run_argv tools/perf/perf.c:409:2 linux-sunxi#25 0x559cea95fbce in main tools/perf/perf.c:539:3 Uninitialized value was stored to memory at #0 0x559cea9027d9 in __msan_memcpy llvm/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:1558:3 #1 0x559cea9d2185 in sample_ustack tools/perf/arch/x86/tests/dwarf-unwind.c:41:2 #2 0x559cea9d202c in test__arch_unwind_sample tools/perf/arch/x86/tests/dwarf-unwind.c:72:9 #3 0x559ceabc9cbd in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:106:6 #4 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26 #5 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0) #6 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2 linux-sunxi#7 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9 linux-sunxi#8 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9 linux-sunxi#9 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8 linux-sunxi#10 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9 linux-sunxi#11 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9 linux-sunxi#12 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4 linux-sunxi#13 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9 linux-sunxi#14 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11 linux-sunxi#15 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8 linux-sunxi#16 0x559cea95fbce in run_argv tools/perf/perf.c:409:2 linux-sunxi#17 0x559cea95fbce in main tools/perf/perf.c:539:3 Uninitialized value was created by an allocation of 'bf' in the stack frame of function 'perf_event__synthesize_mmap_events' #0 0x559ceafc5f60 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:445 SUMMARY: MemorySanitizer: use-of-uninitialized-value elfutils/libdwfl/frame_unwind.c:648:8 in handle_cfi Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: clang-built-linux@googlegroups.com Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandeep Dasgupta <sdasgup@google.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20201113182053.754625-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

When using kprobe on powerpc booke series processor, Oops happens as show bellow: / # echo "p:myprobe do_nanosleep" > /sys/kernel/debug/tracing/kprobe_events / # echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable / # sleep 1 [ 50.076730] Oops: Exception in kernel mode, sig: 5 [#1] [ 50.077017] BE PAGE_SIZE=4K SMP NR_CPUS=24 QEMU e500 [ 50.077221] Modules linked in: [ 50.077462] CPU: 0 PID: 77 Comm: sleep Not tainted 5.14.0-rc4-00022-g251a1524293d linux-sunxi#21 [ 50.077887] NIP: c0b9c4e0 LR: c00ebecc CTR: 00000000 [ 50.078067] REGS: c3883de0 TRAP: 0700 Not tainted (5.14.0-rc4-00022-g251a1524293d) [ 50.078349] MSR: 00029000 <CE,EE,ME> CR: 24000228 XER: 20000000 [ 50.078675] [ 50.078675] GPR00: c00ebdf0 c3883e90 c313e300 c3883ea0 00000001 00000000 c3883ecc 00000001 [ 50.078675] GPR08: c100598c c00ea250 00000004 00000000 24000222 102490c2 bff4180c 101e60d4 [ 50.078675] GPR16: 00000000 102454ac 00000040 10240000 10241100 102410f 10240000 00500000 [ 50.078675] GPR24: 00000002 00000000 c3883ea0 00000001 00000000 0000c350 3b9b8d50 00000000 [ 50.080151] NIP [c0b9c4e0] do_nanosleep+0x0/0x190 [ 50.080352] LR [c00ebecc] hrtimer_nanosleep+0x14c/0x1e0 [ 50.080638] Call Trace: [ 50.080801] [c3883e90] [c00ebdf0] hrtimer_nanosleep+0x70/0x1e0 (unreliable) [ 50.081110] [c3883f00] [c00ec004] sys_nanosleep_time32+0xa4/0x110 [ 50.081336] [c3883f40] [c001509c] ret_from_syscall+0x0/0x28 [ 50.081541] --- interrupt: c00 at 0x100a4d08 [ 50.081749] NIP: 100a4d08 LR: 101b5234 CTR: 00000003 [ 50.081931] REGS: c3883f50 TRAP: 0c00 Not tainted (5.14.0-rc4-00022-g251a1524293d) [ 50.082183] MSR: 0002f902 <CE,EE,PR,FP,ME> CR: 24000222 XER: 00000000 [ 50.082457] [ 50.082457] GPR00: 000000a2 bf980040 1024b4d0 bf980084 bf980084 64000000 00555345 fefefeff [ 50.082457] GPR08: 7f7f7f7f 101e0000 00000069 00000003 28000422 102490c2 bff4180c 101e60d4 [ 50.082457] GPR16: 00000000 102454ac 00000040 10240000 10241100 102410f 10240000 00500000 [ 50.082457] GPR24: 00000002 bf9803f4 10240000 00000000 00000000 100039e0 00000000 102444e8 [ 50.083789] NIP [100a4d08] 0x100a4d08 [ 50.083917] LR [101b5234] 0x101b5234 [ 50.084042] --- interrupt: c00 [ 50.084238] Instruction dump: [ 50.084483] 4bfffc40 60000000 60000000 60000000 9421fff0 39400402 914200c0 38210010 [ 50.084841] 4bfffc20 00000000 00000000 00000000 <7fe00008> 7c0802a6 7c892378 93c10048 [ 50.085487] ---[ end trace f6fffe98e2fa8f3e ]--- [ 50.085678] Trace/breakpoint trap There is no real mode for booke arch and the MMU translation is always on. The corresponding MSR_IS/MSR_DS bit in booke is used to switch the address space, but not for real mode judgment. Fixes: 21f8b2f ("powerpc/kprobes: Ignore traps that happened in real mode") Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210809023658.218915-1-pulehui@huawei.com

[ Upstream commit 43e8f76 ] When using kprobe on powerpc booke series processor, Oops happens as show bellow: / # echo "p:myprobe do_nanosleep" > /sys/kernel/debug/tracing/kprobe_events / # echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable / # sleep 1 [ 50.076730] Oops: Exception in kernel mode, sig: 5 [jwrdegoede#1] [ 50.077017] BE PAGE_SIZE=4K SMP NR_CPUS=24 QEMU e500 [ 50.077221] Modules linked in: [ 50.077462] CPU: 0 PID: 77 Comm: sleep Not tainted 5.14.0-rc4-00022-g251a1524293d linux-sunxi#21 [ 50.077887] NIP: c0b9c4e0 LR: c00ebecc CTR: 00000000 [ 50.078067] REGS: c3883de0 TRAP: 0700 Not tainted (5.14.0-rc4-00022-g251a1524293d) [ 50.078349] MSR: 00029000 <CE,EE,ME> CR: 24000228 XER: 20000000 [ 50.078675] [ 50.078675] GPR00: c00ebdf0 c3883e90 c313e300 c3883ea0 00000001 00000000 c3883ecc 00000001 [ 50.078675] GPR08: c100598c c00ea250 00000004 00000000 24000222 102490c2 bff4180c 101e60d4 [ 50.078675] GPR16: 00000000 102454ac 00000040 10240000 10241100 102410f 10240000 00500000 [ 50.078675] GPR24: 00000002 00000000 c3883ea0 00000001 00000000 0000c350 3b9b8d50 00000000 [ 50.080151] NIP [c0b9c4e0] do_nanosleep+0x0/0x190 [ 50.080352] LR [c00ebecc] hrtimer_nanosleep+0x14c/0x1e0 [ 50.080638] Call Trace: [ 50.080801] [c3883e90] [c00ebdf0] hrtimer_nanosleep+0x70/0x1e0 (unreliable) [ 50.081110] [c3883f00] [c00ec004] sys_nanosleep_time32+0xa4/0x110 [ 50.081336] [c3883f40] [c001509c] ret_from_syscall+0x0/0x28 [ 50.081541] --- interrupt: c00 at 0x100a4d08 [ 50.081749] NIP: 100a4d08 LR: 101b5234 CTR: 00000003 [ 50.081931] REGS: c3883f50 TRAP: 0c00 Not tainted (5.14.0-rc4-00022-g251a1524293d) [ 50.082183] MSR: 0002f902 <CE,EE,PR,FP,ME> CR: 24000222 XER: 00000000 [ 50.082457] [ 50.082457] GPR00: 000000a2 bf980040 1024b4d0 bf980084 bf980084 64000000 00555345 fefefeff [ 50.082457] GPR08: 7f7f7f7f 101e0000 00000069 00000003 28000422 102490c2 bff4180c 101e60d4 [ 50.082457] GPR16: 00000000 102454ac 00000040 10240000 10241100 102410f 10240000 00500000 [ 50.082457] GPR24: 00000002 bf9803f4 10240000 00000000 00000000 100039e0 00000000 102444e8 [ 50.083789] NIP [100a4d08] 0x100a4d08 [ 50.083917] LR [101b5234] 0x101b5234 [ 50.084042] --- interrupt: c00 [ 50.084238] Instruction dump: [ 50.084483] 4bfffc40 60000000 60000000 60000000 9421fff0 39400402 914200c0 38210010 [ 50.084841] 4bfffc20 00000000 00000000 00000000 <7fe00008> 7c0802a6 7c892378 93c10048 [ 50.085487] ---[ end trace f6fffe98e2fa8f3e ]--- [ 50.085678] Trace/breakpoint trap There is no real mode for booke arch and the MMU translation is always on. The corresponding MSR_IS/MSR_DS bit in booke is used to switch the address space, but not for real mode judgment. Fixes: 21f8b2f ("powerpc/kprobes: Ignore traps that happened in real mode") Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210809023658.218915-1-pulehui@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>

The id of slv_cnoc_mnoc_cfg node is mistakenly coded as id of slv_blsp_1. It causes the following warning on slv_blsp_1 node adding. Correct the id of slv_cnoc_mnoc_cfg node. [ 1.948180] ------------[ cut here ]------------ [ 1.954122] WARNING: CPU: 2 PID: 7 at drivers/interconnect/core.c:962 icc_node_add+0xe4/0xf8 [ 1.958994] Modules linked in: [ 1.967399] CPU: 2 PID: 7 Comm: kworker/u16:0 Not tainted 5.14.0-rc6-next-20210818 linux-sunxi#21 [ 1.970275] Hardware name: Xiaomi Redmi Note 7 (DT) [ 1.978169] Workqueue: events_unbound deferred_probe_work_func [ 1.982945] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 1.988849] pc : icc_node_add+0xe4/0xf8 [ 1.995699] lr : qnoc_probe+0x350/0x438 [ 1.999519] sp : ffff80001008bb10 [ 2.003337] x29: ffff80001008bb10 x28: 000000000000001a x27: ffffb83ddc61ee28 [ 2.006818] x26: ffff2fe341d44080 x25: ffff2fe340f3aa80 x24: ffffb83ddc98f0e8 [ 2.013938] x23: 0000000000000024 x22: ffff2fe3408b7400 x21: 0000000000000000 [ 2.021054] x20: ffff2fe3408b7410 x19: ffff2fe341d44080 x18: 0000000000000010 [ 2.028173] x17: ffff2fe3bdd0aac0 x16: 0000000000000281 x15: ffff2fe3400f5528 [ 2.035290] x14: 000000000000013f x13: ffff2fe3400f5528 x12: 00000000ffffffea [ 2.042410] x11: ffffb83ddc9109d0 x10: ffffb83ddc8f8990 x9 : ffffb83ddc8f89e8 [ 2.049527] x8 : 0000000000017fe8 x7 : c0000000ffffefff x6 : 0000000000000001 [ 2.056645] x5 : 0000000000057fa8 x4 : 0000000000000000 x3 : ffffb83ddc9903b0 [ 2.063764] x2 : 1a1f6fde34d45500 x1 : ffff2fe340f3a880 x0 : ffff2fe340f3a880 [ 2.070882] Call trace: [ 2.077989] icc_node_add+0xe4/0xf8 [ 2.080247] qnoc_probe+0x350/0x438 [ 2.083718] platform_probe+0x68/0xd8 [ 2.087191] really_probe+0xb8/0x300 [ 2.091011] __driver_probe_device+0x78/0xe0 [ 2.094659] driver_probe_device+0x80/0x110 [ 2.098911] __device_attach_driver+0x90/0xe0 [ 2.102818] bus_for_each_drv+0x78/0xc8 [ 2.107331] __device_attach+0xf0/0x150 [ 2.110977] device_initial_probe+0x14/0x20 [ 2.114796] bus_probe_device+0x9c/0xa8 [ 2.118963] deferred_probe_work_func+0x88/0xc0 [ 2.122784] process_one_work+0x1a4/0x338 [ 2.127296] worker_thread+0x1f8/0x420 [ 2.131464] kthread+0x150/0x160 [ 2.135107] ret_from_fork+0x10/0x20 [ 2.138495] ---[ end trace 5eea8768cb620e87 ]--- Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Fixes: f80a1d4 ("interconnect: qcom: Add SDM660 interconnect provider driver") Link: https://lore.kernel.org/r/20210823014003.31391-1-shawn.guo@linaro.org Signed-off-by: Georgi Djakov <djakov@kernel.org>

Attempting to defragment a Btrfs file containing a transparent huge page immediately deadlocks with the following stack trace: #0 context_switch (kernel/sched/core.c:4940:2) #1 __schedule (kernel/sched/core.c:6287:8) #2 schedule (kernel/sched/core.c:6366:3) #3 io_schedule (kernel/sched/core.c:8389:2) #4 wait_on_page_bit_common (mm/filemap.c:1356:4) #5 __lock_page (mm/filemap.c:1648:2) #6 lock_page (./include/linux/pagemap.h:625:3) linux-sunxi#7 pagecache_get_page (mm/filemap.c:1910:4) linux-sunxi#8 find_or_create_page (./include/linux/pagemap.h:420:9) linux-sunxi#9 defrag_prepare_one_page (fs/btrfs/ioctl.c:1068:9) linux-sunxi#10 defrag_one_range (fs/btrfs/ioctl.c:1326:14) linux-sunxi#11 defrag_one_cluster (fs/btrfs/ioctl.c:1421:9) linux-sunxi#12 btrfs_defrag_file (fs/btrfs/ioctl.c:1523:9) linux-sunxi#13 btrfs_ioctl_defrag (fs/btrfs/ioctl.c:3117:9) linux-sunxi#14 btrfs_ioctl (fs/btrfs/ioctl.c:4872:10) linux-sunxi#15 vfs_ioctl (fs/ioctl.c:51:10) linux-sunxi#16 __do_sys_ioctl (fs/ioctl.c:874:11) linux-sunxi#17 __se_sys_ioctl (fs/ioctl.c:860:1) linux-sunxi#18 __x64_sys_ioctl (fs/ioctl.c:860:1) linux-sunxi#19 do_syscall_x64 (arch/x86/entry/common.c:50:14) linux-sunxi#20 do_syscall_64 (arch/x86/entry/common.c:80:7) linux-sunxi#21 entry_SYSCALL_64+0x7c/0x15b (arch/x86/entry/entry_64.S:113) A huge page is represented by a compound page, which consists of a struct page for each PAGE_SIZE page within the huge page. The first struct page is the "head page", and the remaining are "tail pages". Defragmentation attempts to lock each page in the range. However, lock_page() on a tail page actually locks the corresponding head page. So, if defragmentation tries to lock more than one struct page in a compound page, it tries to lock the same head page twice and deadlocks with itself. Ideally, we should be able to defragment transparent huge pages. However, THP for filesystems is currently read-only, so a lot of code is not ready to use huge pages for I/O. For now, let's just return ETXTBUSY. This can be reproduced with the following on a kernel with CONFIG_READ_ONLY_THP_FOR_FS=y: $ cat create_thp_file.c #include <fcntl.h> #include <stdbool.h> #include <stdio.h> #include <stdint.h> #include <stdlib.h> #include <unistd.h> #include <sys/mman.h> static const char zeroes[1024 * 1024]; static const size_t FILE_SIZE = 2 * 1024 * 1024; int main(int argc, char **argv) { if (argc != 2) { fprintf(stderr, "usage: %s PATH\n", argv[0]); return EXIT_FAILURE; } int fd = creat(argv[1], 0777); if (fd == -1) { perror("creat"); return EXIT_FAILURE; } size_t written = 0; while (written < FILE_SIZE) { ssize_t ret = write(fd, zeroes, sizeof(zeroes) < FILE_SIZE - written ? sizeof(zeroes) : FILE_SIZE - written); if (ret < 0) { perror("write"); return EXIT_FAILURE; } written += ret; } close(fd); fd = open(argv[1], O_RDONLY); if (fd == -1) { perror("open"); return EXIT_FAILURE; } /* * Reserve some address space so that we can align the file mapping to * the huge page size. */ void *placeholder_map = mmap(NULL, FILE_SIZE * 2, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); if (placeholder_map == MAP_FAILED) { perror("mmap (placeholder)"); return EXIT_FAILURE; } void *aligned_address = (void *)(((uintptr_t)placeholder_map + FILE_SIZE - 1) & ~(FILE_SIZE - 1)); void *map = mmap(aligned_address, FILE_SIZE, PROT_READ | PROT_EXEC, MAP_SHARED | MAP_FIXED, fd, 0); if (map == MAP_FAILED) { perror("mmap"); return EXIT_FAILURE; } if (madvise(map, FILE_SIZE, MADV_HUGEPAGE) < 0) { perror("madvise"); return EXIT_FAILURE; } char *line = NULL; size_t line_capacity = 0; FILE *smaps_file = fopen("/proc/self/smaps", "r"); if (!smaps_file) { perror("fopen"); return EXIT_FAILURE; } for (;;) { for (size_t off = 0; off < FILE_SIZE; off += 4096) ((volatile char *)map)[off]; ssize_t ret; bool this_mapping = false; while ((ret = getline(&line, &line_capacity, smaps_file)) > 0) { unsigned long start, end, huge; if (sscanf(line, "%lx-%lx", &start, &end) == 2) { this_mapping = (start <= (uintptr_t)map && (uintptr_t)map < end); } else if (this_mapping && sscanf(line, "FilePmdMapped: %ld", &huge) == 1 && huge > 0) { return EXIT_SUCCESS; } } sleep(6); rewind(smaps_file); fflush(smaps_file); } } $ ./create_thp_file huge $ btrfs fi defrag -czstd ./huge Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

[ Upstream commit 4224cfd ] When bringing down the netdevice or system shutdown, a panic can be triggered while accessing the sysfs path because the device is already removed. [ 755.549084] mlx5_core 0000:12:00.1: Shutdown was called [ 756.404455] mlx5_core 0000:12:00.0: Shutdown was called ... [ 757.937260] BUG: unable to handle kernel NULL pointer dereference at (null) [ 758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280 crash> bt ... PID: 12649 TASK: ffff8924108f2100 CPU: 1 COMMAND: "amsd" ... linux-sunxi#9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778 [exception RIP: dma_pool_alloc+0x1ab] RIP: ffffffff8ee11acb RSP: ffff89240e1a3968 RFLAGS: 00010046 RAX: 0000000000000246 RBX: ffff89243d874100 RCX: 0000000000001000 RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff89243d874090 RBP: ffff89240e1a39c0 R8: 000000000001f080 R9: ffff8905ffc03c00 R10: ffffffffc04680d4 R11: ffffffff8edde9fd R12: 00000000000080d0 R13: ffff89243d874090 R14: ffff89243d874080 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 linux-sunxi#10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core] linux-sunxi#11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core] linux-sunxi#12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core] linux-sunxi#13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core] linux-sunxi#14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core] linux-sunxi#15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core] linux-sunxi#16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core] linux-sunxi#17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46 linux-sunxi#18 [ffff89240e1a3d48] speed_show at ffffffff8f277208 linux-sunxi#19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3 linux-sunxi#20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf linux-sunxi#21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596 linux-sunxi#22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10 linux-sunxi#23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5 linux-sunxi#24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff linux-sunxi#25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f linux-sunxi#26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92 crash> net_device.state ffff89443b0c0000 state = 0x5 (__LINK_STATE_START| __LINK_STATE_NOCARRIER) To prevent this scenario, we also make sure that the netdevice is present. Signed-off-by: suresh kumar <suresh2514@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit 9de255c ] Commit b743512 ("uio: fix a sleep-in-atomic-context bug in uio_dmem_genirq_irqcontrol()") started calling disable_irq() without holding the spinlock because it can sleep. However, that fix introduced another bug: if interrupt is already disabled and a new disable request comes in, then the spinlock is not unlocked: root@localhost:~# printf '\x00\x00\x00\x00' > /dev/uio0 root@localhost:~# printf '\x00\x00\x00\x00' > /dev/uio0 root@localhost:~# [ 14.851538] BUG: scheduling while atomic: bash/223/0x00000002 [ 14.851991] Modules linked in: uio_dmem_genirq uio myfpga(OE) bochs drm_vram_helper drm_ttm_helper ttm drm_kms_helper drm snd_pcm ppdev joydev psmouse snd_timer snd e1000fb_sys_fops syscopyarea parport sysfillrect soundcore sysimgblt input_leds pcspkr i2c_piix4 serio_raw floppy evbug qemu_fw_cfg mac_hid pata_acpi ip_tables x_tables autofs4 [last unloaded: parport_pc] [ 14.854206] CPU: 0 PID: 223 Comm: bash Tainted: G OE 6.0.0-rc7 linux-sunxi#21 [ 14.854786] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 [ 14.855664] Call Trace: [ 14.855861] <TASK> [ 14.856025] dump_stack_lvl+0x4d/0x67 [ 14.856325] dump_stack+0x14/0x1a [ 14.856583] __schedule_bug.cold+0x4b/0x5c [ 14.856915] __schedule+0xe81/0x13d0 [ 14.857199] ? idr_find+0x13/0x20 [ 14.857456] ? get_work_pool+0x2d/0x50 [ 14.857756] ? __flush_work+0x233/0x280 [ 14.858068] ? __schedule+0xa95/0x13d0 [ 14.858307] ? idr_find+0x13/0x20 [ 14.858519] ? get_work_pool+0x2d/0x50 [ 14.858798] schedule+0x6c/0x100 [ 14.859009] schedule_hrtimeout_range_clock+0xff/0x110 [ 14.859335] ? tty_write_room+0x1f/0x30 [ 14.859598] ? n_tty_poll+0x1ec/0x220 [ 14.859830] ? tty_ldisc_deref+0x1a/0x20 [ 14.860090] schedule_hrtimeout_range+0x17/0x20 [ 14.860373] do_select+0x596/0x840 [ 14.860627] ? __kernel_text_address+0x16/0x50 [ 14.860954] ? poll_freewait+0xb0/0xb0 [ 14.861235] ? poll_freewait+0xb0/0xb0 [ 14.861517] ? rpm_resume+0x49d/0x780 [ 14.861798] ? common_interrupt+0x59/0xa0 [ 14.862127] ? asm_common_interrupt+0x2b/0x40 [ 14.862511] ? __uart_start.isra.0+0x61/0x70 [ 14.862902] ? __check_object_size+0x61/0x280 [ 14.863255] core_sys_select+0x1c6/0x400 [ 14.863575] ? vfs_write+0x1c9/0x3d0 [ 14.863853] ? vfs_write+0x1c9/0x3d0 [ 14.864121] ? _copy_from_user+0x45/0x70 [ 14.864526] do_pselect.constprop.0+0xb3/0xf0 [ 14.864893] ? do_syscall_64+0x6d/0x90 [ 14.865228] ? do_syscall_64+0x6d/0x90 [ 14.865556] __x64_sys_pselect6+0x76/0xa0 [ 14.865906] do_syscall_64+0x60/0x90 [ 14.866214] ? syscall_exit_to_user_mode+0x2a/0x50 [ 14.866640] ? do_syscall_64+0x6d/0x90 [ 14.866972] ? do_syscall_64+0x6d/0x90 [ 14.867286] ? do_syscall_64+0x6d/0x90 [ 14.867626] entry_SYSCALL_64_after_hwframe+0x63/0xcd [...] stripped [ 14.872959] </TASK> ('myfpga' is a simple 'uio_dmem_genirq' driver I wrote to test this) The implementation of "uio_dmem_genirq" was based on "uio_pdrv_genirq" and it is used in a similar manner to the "uio_pdrv_genirq" driver with respect to interrupt configuration and handling. At the time "uio_dmem_genirq" was introduced, both had the same implementation of the 'uio_info' handlers irqcontrol() and handler(). Then commit 34cb275 ("UIO: Fix concurrency issue"), which was only applied to "uio_pdrv_genirq", ended up making them a little different. That commit, among other things, changed disable_irq() to disable_irq_nosync() in the implementation of irqcontrol(). The motivation there was to avoid a deadlock between irqcontrol() and handler(), since it added a spinlock in the irq handler, and disable_irq() waits for the completion of the irq handler. By changing disable_irq() to disable_irq_nosync() in irqcontrol(), we also avoid the sleeping-while-atomic bug that commit b743512 ("uio: fix a sleep-in-atomic-context bug in uio_dmem_genirq_irqcontrol()") was trying to fix. Thus, this fixes the missing unlock in irqcontrol() by importing the implementation of irqcontrol() handler from the "uio_pdrv_genirq" driver. In the end, it reverts commit b743512 ("uio: fix a sleep-in-atomic-context bug in uio_dmem_genirq_irqcontrol()") and change disable_irq() to disable_irq_nosync(). It is worth noting that this still does not address the concurrency issue fixed by commit 34cb275 ("UIO: Fix concurrency issue"). It will be addressed separately in the next commits. Split out from commit 34cb275 ("UIO: Fix concurrency issue"). Fixes: b743512 ("uio: fix a sleep-in-atomic-context bug in uio_dmem_genirq_irqcontrol()") Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com> Link: https://lore.kernel.org/r/20220930224100.816175-2-rafaelmendsr@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit 4e264be ] When a system with E810 with existing VFs gets rebooted the following hang may be observed. Pid 1 is hung in iavf_remove(), part of a network driver: PID: 1 TASK: ffff965400e5a340 CPU: 24 COMMAND: "systemd-shutdow" #0 [ffffaad04005fa50] __schedule at ffffffff8b3239cb jwrdegoede#1 [ffffaad04005fae8] schedule at ffffffff8b323e2d jwrdegoede#2 [ffffaad04005fb00] schedule_hrtimeout_range_clock at ffffffff8b32cebc jwrdegoede#3 [ffffaad04005fb80] usleep_range_state at ffffffff8b32c930 jwrdegoede#4 [ffffaad04005fbb0] iavf_remove at ffffffffc12b9b4c [iavf] jwrdegoede#5 [ffffaad04005fbf0] pci_device_remove at ffffffff8add7513 jwrdegoede#6 [ffffaad04005fc10] device_release_driver_internal at ffffffff8af08baa linux-sunxi#7 [ffffaad04005fc40] pci_stop_bus_device at ffffffff8adcc5fc linux-sunxi#8 [ffffaad04005fc60] pci_stop_and_remove_bus_device at ffffffff8adcc81e linux-sunxi#9 [ffffaad04005fc70] pci_iov_remove_virtfn at ffffffff8adf9429 linux-sunxi#10 [ffffaad04005fca8] sriov_disable at ffffffff8adf98e4 linux-sunxi#11 [ffffaad04005fcc8] ice_free_vfs at ffffffffc04bb2c8 [ice] linux-sunxi#12 [ffffaad04005fd10] ice_remove at ffffffffc04778fe [ice] linux-sunxi#13 [ffffaad04005fd38] ice_shutdown at ffffffffc0477946 [ice] linux-sunxi#14 [ffffaad04005fd50] pci_device_shutdown at ffffffff8add58f1 linux-sunxi#15 [ffffaad04005fd70] device_shutdown at ffffffff8af05386 linux-sunxi#16 [ffffaad04005fd98] kernel_restart at ffffffff8a92a870 linux-sunxi#17 [ffffaad04005fda8] __do_sys_reboot at ffffffff8a92abd6 linux-sunxi#18 [ffffaad04005fee0] do_syscall_64 at ffffffff8b317159 linux-sunxi#19 [ffffaad04005ff08] __context_tracking_enter at ffffffff8b31b6fc linux-sunxi#20 [ffffaad04005ff18] syscall_exit_to_user_mode at ffffffff8b31b50d linux-sunxi#21 [ffffaad04005ff28] do_syscall_64 at ffffffff8b317169 linux-sunxi#22 [ffffaad04005ff50] entry_SYSCALL_64_after_hwframe at ffffffff8b40009b RIP: 00007f1baa5c13d7 RSP: 00007fffbcc55a98 RFLAGS: 00000202 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1baa5c13d7 RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead RBP: 00007fffbcc55ca0 R8: 0000000000000000 R9: 00007fffbcc54e90 R10: 00007fffbcc55050 R11: 0000000000000202 R12: 0000000000000005 R13: 0000000000000000 R14: 00007fffbcc55af0 R15: 0000000000000000 ORIG_RAX: 00000000000000a9 CS: 0033 SS: 002b During reboot all drivers PM shutdown callbacks are invoked. In iavf_shutdown() the adapter state is changed to __IAVF_REMOVE. In ice_shutdown() the call chain above is executed, which at some point calls iavf_remove(). However iavf_remove() expects the VF to be in one of the states __IAVF_RUNNING, __IAVF_DOWN or __IAVF_INIT_FAILED. If that's not the case it sleeps forever. So if iavf_shutdown() gets invoked before iavf_remove() the system will hang indefinitely because the adapter is already in state __IAVF_REMOVE. Fix this by returning from iavf_remove() if the state is __IAVF_REMOVE, as we already went through iavf_shutdown(). Fixes: 9745780 ("iavf: Add waiting so the port is initialized in remove") Fixes: a841733 ("iavf: Fix race condition between iavf_shutdown and iavf_remove") Reported-by: Marius Cornea <mcornea@redhat.com> Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

The cited commit adds a compeletion to remove dependency on rtnl lock. But it causes a deadlock for multiple encapsulations: crash> bt ffff8aece8a64000 PID: 1514557 TASK: ffff8aece8a64000 CPU: 3 COMMAND: "tc" #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418 #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898 #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8 #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core] #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core] linux-sunxi#7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core] linux-sunxi#8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core] linux-sunxi#9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core] linux-sunxi#10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core] linux-sunxi#11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core] linux-sunxi#12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8 linux-sunxi#13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower] linux-sunxi#14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower] linux-sunxi#15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047 linux-sunxi#16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31 linux-sunxi#17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853 linux-sunxi#18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835 linux-sunxi#19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27 linux-sunxi#20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245 linux-sunxi#21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482 linux-sunxi#22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a linux-sunxi#23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2 linux-sunxi#24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2 linux-sunxi#25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f linux-sunxi#26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8 linux-sunxi#27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c crash> bt 0xffff8aeb07544000 PID: 1110766 TASK: ffff8aeb07544000 CPU: 0 COMMAND: "kworker/u20:9" #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418 #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88 #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core] #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core] #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core] linux-sunxi#7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c linux-sunxi#8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012 linux-sunxi#9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d linux-sunxi#10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f After the first encap is attached, flow will be added to encap entry's flows list. If neigh update is running at this time, the following encaps of the flow can't hold the encap_tbl_lock and sleep. If neigh update thread is waiting for that flow's init_done, deadlock happens. Fix it by holding lock outside of the for loop. If neigh update is running, prevent encap flows from offloading. Since the lock is held outside of the for loop, concurrent creation of encap entries is not allowed. So remove unnecessary wait_for_completion call for res_ready. Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

The following processes run into a deadlock. CPU 41 was waiting for CPU 29 to handle a CSD request while holding spinlock "crashdump_lock", but CPU 29 was hung by that spinlock with IRQs disabled. PID: 17360 TASK: ffff95c1090c5c40 CPU: 41 COMMAND: "mrdiagd" !# 0 [ffffb80edbf37b58] __read_once_size at ffffffff9b871a40 include/linux/compiler.h:185:0 !# 1 [ffffb80edbf37b58] atomic_read at ffffffff9b871a40 arch/x86/include/asm/atomic.h:27:0 !# 2 [ffffb80edbf37b58] dump_stack at ffffffff9b871a40 lib/dump_stack.c:54:0 # 3 [ffffb80edbf37b78] csd_lock_wait_toolong at ffffffff9b131ad5 kernel/smp.c:364:0 # 4 [ffffb80edbf37b78] __csd_lock_wait at ffffffff9b131ad5 kernel/smp.c:384:0 # 5 [ffffb80edbf37bf8] csd_lock_wait at ffffffff9b13267a kernel/smp.c:394:0 # 6 [ffffb80edbf37bf8] smp_call_function_many at ffffffff9b13267a kernel/smp.c:843:0 # 7 [ffffb80edbf37c50] smp_call_function at ffffffff9b13279d kernel/smp.c:867:0 # 8 [ffffb80edbf37c50] on_each_cpu at ffffffff9b13279d kernel/smp.c:976:0 # 9 [ffffb80edbf37c78] flush_tlb_kernel_range at ffffffff9b085c4b arch/x86/mm/tlb.c:742:0 linux-sunxi#10 [ffffb80edbf37cb8] __purge_vmap_area_lazy at ffffffff9b23a1e0 mm/vmalloc.c:701:0 linux-sunxi#11 [ffffb80edbf37ce0] try_purge_vmap_area_lazy at ffffffff9b23a2cc mm/vmalloc.c:722:0 linux-sunxi#12 [ffffb80edbf37ce0] free_vmap_area_noflush at ffffffff9b23a2cc mm/vmalloc.c:754:0 linux-sunxi#13 [ffffb80edbf37cf8] free_unmap_vmap_area at ffffffff9b23bb3b mm/vmalloc.c:764:0 linux-sunxi#14 [ffffb80edbf37cf8] remove_vm_area at ffffffff9b23bb3b mm/vmalloc.c:1509:0 linux-sunxi#15 [ffffb80edbf37d18] __vunmap at ffffffff9b23bb8a mm/vmalloc.c:1537:0 linux-sunxi#16 [ffffb80edbf37d40] vfree at ffffffff9b23bc85 mm/vmalloc.c:1612:0 linux-sunxi#17 [ffffb80edbf37d58] megasas_free_host_crash_buffer [megaraid_sas] at ffffffffc020b7f2 drivers/scsi/megaraid/megaraid_sas_fusion.c:3932:0 linux-sunxi#18 [ffffb80edbf37d80] fw_crash_state_store [megaraid_sas] at ffffffffc01f804d drivers/scsi/megaraid/megaraid_sas_base.c:3291:0 linux-sunxi#19 [ffffb80edbf37dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0 linux-sunxi#20 [ffffb80edbf37dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0 linux-sunxi#21 [ffffb80edbf37de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0 linux-sunxi#22 [ffffb80edbf37e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0 linux-sunxi#23 [ffffb80edbf37ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0 linux-sunxi#24 [ffffb80edbf37ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0 linux-sunxi#25 [ffffb80edbf37ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0 linux-sunxi#26 [ffffb80edbf37f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0 linux-sunxi#27 [ffffb80edbf37f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0 PID: 17355 TASK: ffff95c1090c3d80 CPU: 29 COMMAND: "mrdiagd" !# 0 [ffffb80f2d3c7d30] __read_once_size at ffffffff9b0f2ab0 include/linux/compiler.h:185:0 !# 1 [ffffb80f2d3c7d30] native_queued_spin_lock_slowpath at ffffffff9b0f2ab0 kernel/locking/qspinlock.c:368:0 # 2 [ffffb80f2d3c7d58] pv_queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/paravirt.h:674:0 # 3 [ffffb80f2d3c7d58] queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/qspinlock.h:53:0 # 4 [ffffb80f2d3c7d68] queued_spin_lock at ffffffff9b8961a6 include/asm-generic/qspinlock.h:90:0 # 5 [ffffb80f2d3c7d68] do_raw_spin_lock_flags at ffffffff9b8961a6 include/linux/spinlock.h:173:0 # 6 [ffffb80f2d3c7d68] __raw_spin_lock_irqsave at ffffffff9b8961a6 include/linux/spinlock_api_smp.h:122:0 # 7 [ffffb80f2d3c7d68] _raw_spin_lock_irqsave at ffffffff9b8961a6 kernel/locking/spinlock.c:160:0 # 8 [ffffb80f2d3c7d88] fw_crash_buffer_store [megaraid_sas] at ffffffffc01f8129 drivers/scsi/megaraid/megaraid_sas_base.c:3205:0 # 9 [ffffb80f2d3c7dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0 linux-sunxi#10 [ffffb80f2d3c7dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0 linux-sunxi#11 [ffffb80f2d3c7de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0 linux-sunxi#12 [ffffb80f2d3c7e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0 linux-sunxi#13 [ffffb80f2d3c7ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0 linux-sunxi#14 [ffffb80f2d3c7ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0 linux-sunxi#15 [ffffb80f2d3c7ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0 linux-sunxi#16 [ffffb80f2d3c7f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0 linux-sunxi#17 [ffffb80f2d3c7f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0 The lock is used to synchronize different sysfs operations, it doesn't protect any resource that will be touched by an interrupt. Consequently it's not required to disable IRQs. Replace the spinlock with a mutex to fix the deadlock. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Link: https://lore.kernel.org/r/20230828221018.19471-1-junxiao.bi@oracle.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

commit 133466c ("net: stmmac: use per-queue 64 bit statistics where necessary") caused one regression as found by Uwe, the backtrace looks like: INFO: trying to register non-static key. The code is fine but needs lockdep annotation, or maybe you didn't initialize this object before use? turning off the locking correctness validator. CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.5.0-rc1-00449-g133466c3bbe1-dirty linux-sunxi#21 Hardware name: STM32 (Device Tree Support) unwind_backtrace from show_stack+0x18/0x1c show_stack from dump_stack_lvl+0x60/0x90 dump_stack_lvl from register_lock_class+0x98c/0x99c register_lock_class from __lock_acquire+0x74/0x293c __lock_acquire from lock_acquire+0x134/0x398 lock_acquire from stmmac_get_stats64+0x2ac/0x2fc stmmac_get_stats64 from dev_get_stats+0x44/0x130 dev_get_stats from rtnl_fill_stats+0x38/0x120 rtnl_fill_stats from rtnl_fill_ifinfo+0x834/0x17f4 rtnl_fill_ifinfo from rtmsg_ifinfo_build_skb+0xc0/0x144 rtmsg_ifinfo_build_skb from rtmsg_ifinfo+0x50/0x88 rtmsg_ifinfo from __dev_notify_flags+0xc0/0xec __dev_notify_flags from dev_change_flags+0x50/0x5c dev_change_flags from ip_auto_config+0x2f4/0x1260 ip_auto_config from do_one_initcall+0x70/0x35c do_one_initcall from kernel_init_freeable+0x2ac/0x308 kernel_init_freeable from kernel_init+0x1c/0x138 kernel_init from ret_from_fork+0x14/0x2c The reason is the rxq|txq_stats structures are not what expected because stmmac_open() -> __stmmac_open() the structure is overwritten by "memcpy(&priv->dma_conf, dma_conf, sizeof(*dma_conf));" This causes the well initialized syncp member of rxq|txq_stats is overwritten unexpectedly as pointed out by Johannes and Uwe. Fix this issue by moving rxq|txq_stats back to stmmac_extra_stats. For SMP cache friendly, we also mark stmmac_txq_stats and stmmac_rxq_stats as ____cacheline_aligned_in_smp. Fixes: 133466c ("net: stmmac: use per-queue 64 bit statistics where necessary") Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Link: https://lore.kernel.org/r/20230917165328.3403-1-jszhang@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>

[ Upstream commit e3e82fc ] When creating ceq_0 during probing irdma, cqp.sc_cqp will be sent as a cqp_request to cqp->sc_cqp.sq_ring. If the request is pending when removing the irdma driver or unplugging its aux device, cqp.sc_cqp will be dereferenced as wrong struct in irdma_free_pending_cqp_request(). PID: 3669 TASK: ffff88aef892c000 CPU: 28 COMMAND: "kworker/28:0" #0 [fffffe0000549e38] crash_nmi_callback at ffffffff810e3a34 jwrdegoede#1 [fffffe0000549e40] nmi_handle at ffffffff810788b2 jwrdegoede#2 [fffffe0000549ea0] default_do_nmi at ffffffff8107938f jwrdegoede#3 [fffffe0000549eb8] do_nmi at ffffffff81079582 jwrdegoede#4 [fffffe0000549ef0] end_repeat_nmi at ffffffff82e016b4 [exception RIP: native_queued_spin_lock_slowpath+1291] RIP: ffffffff8127e72b RSP: ffff88aa841ef778 RFLAGS: 00000046 RAX: 0000000000000000 RBX: ffff88b01f849700 RCX: ffffffff8127e47e RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff83857ec0 RBP: ffff88afe3e4efc8 R8: ffffed15fc7c9dfa R9: ffffed15fc7c9dfa R10: 0000000000000001 R11: ffffed15fc7c9df9 R12: 0000000000740000 R13: ffff88b01f849708 R14: 0000000000000003 R15: ffffed1603f092e1 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 -- <NMI exception stack> -- jwrdegoede#5 [ffff88aa841ef778] native_queued_spin_lock_slowpath at ffffffff8127e72b jwrdegoede#6 [ffff88aa841ef7b0] _raw_spin_lock_irqsave at ffffffff82c22aa4 linux-sunxi#7 [ffff88aa841ef7c8] __wake_up_common_lock at ffffffff81257363 linux-sunxi#8 [ffff88aa841ef888] irdma_free_pending_cqp_request at ffffffffa0ba12cc [irdma] linux-sunxi#9 [ffff88aa841ef958] irdma_cleanup_pending_cqp_op at ffffffffa0ba1469 [irdma] linux-sunxi#10 [ffff88aa841ef9c0] irdma_ctrl_deinit_hw at ffffffffa0b2989f [irdma] linux-sunxi#11 [ffff88aa841efa28] irdma_remove at ffffffffa0b252df [irdma] linux-sunxi#12 [ffff88aa841efae8] auxiliary_bus_remove at ffffffff8219afdb linux-sunxi#13 [ffff88aa841efb00] device_release_driver_internal at ffffffff821882e6 linux-sunxi#14 [ffff88aa841efb38] bus_remove_device at ffffffff82184278 linux-sunxi#15 [ffff88aa841efb88] device_del at ffffffff82179d23 linux-sunxi#16 [ffff88aa841efc48] ice_unplug_aux_dev at ffffffffa0eb1c14 [ice] linux-sunxi#17 [ffff88aa841efc68] ice_service_task at ffffffffa0d88201 [ice] linux-sunxi#18 [ffff88aa841efde8] process_one_work at ffffffff811c589a linux-sunxi#19 [ffff88aa841efe60] worker_thread at ffffffff811c71ff linux-sunxi#20 [ffff88aa841eff10] kthread at ffffffff811d87a0 linux-sunxi#21 [ffff88aa841eff50] ret_from_fork at ffffffff82e0022f Fixes: 44d9e52 ("RDMA/irdma: Implement device initialization definitions") Link: https://lore.kernel.org/r/20231130081415.891006-1-lishifeng@sangfor.com.cn Suggested-by: "Ismail, Mustafa" <mustafa.ismail@intel.com> Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit 37c3b9f ] The cited commit adds a compeletion to remove dependency on rtnl lock. But it causes a deadlock for multiple encapsulations: crash> bt ffff8aece8a64000 PID: 1514557 TASK: ffff8aece8a64000 CPU: 3 COMMAND: "tc" #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45 jwrdegoede#1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418 jwrdegoede#2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898 jwrdegoede#3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8 jwrdegoede#4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb jwrdegoede#5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core] jwrdegoede#6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core] linux-sunxi#7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core] linux-sunxi#8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core] linux-sunxi#9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core] linux-sunxi#10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core] linux-sunxi#11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core] linux-sunxi#12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8 linux-sunxi#13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower] linux-sunxi#14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower] linux-sunxi#15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047 linux-sunxi#16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31 linux-sunxi#17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853 linux-sunxi#18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835 linux-sunxi#19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27 linux-sunxi#20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245 linux-sunxi#21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482 linux-sunxi#22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a linux-sunxi#23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2 linux-sunxi#24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2 linux-sunxi#25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f linux-sunxi#26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8 linux-sunxi#27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c crash> bt 0xffff8aeb07544000 PID: 1110766 TASK: ffff8aeb07544000 CPU: 0 COMMAND: "kworker/u20:9" #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45 jwrdegoede#1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418 jwrdegoede#2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88 jwrdegoede#3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b jwrdegoede#4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core] jwrdegoede#5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core] jwrdegoede#6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core] linux-sunxi#7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c linux-sunxi#8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012 linux-sunxi#9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d linux-sunxi#10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f After the first encap is attached, flow will be added to encap entry's flows list. If neigh update is running at this time, the following encaps of the flow can't hold the encap_tbl_lock and sleep. If neigh update thread is waiting for that flow's init_done, deadlock happens. Fix it by holding lock outside of the for loop. If neigh update is running, prevent encap flows from offloading. Since the lock is held outside of the for loop, concurrent creation of encap entries is not allowed. So remove unnecessary wait_for_completion call for res_ready. Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

The code in ocfs2_dio_end_io_write() estimates number of necessary transaction credits using ocfs2_calc_extend_credits(). This however does not take into account that the IO could be arbitrarily large and can contain arbitrary number of extents. Extent tree manipulations do often extend the current transaction but not in all of the cases. For example if we have only single block extents in the tree, ocfs2_mark_extent_written() will end up calling ocfs2_replace_extent_rec() all the time and we will never extend the current transaction and eventually exhaust all the transaction credits if the IO contains many single block extents. Once that happens a WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to this error. This was actually triggered by one of our customers on a heavily fragmented OCFS2 filesystem. To fix the issue make sure the transaction always has enough credits for one extent insert before each call of ocfs2_mark_extent_written(). Heming Zhao said: ------ PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error" PID: xxx TASK: xxxx CPU: 5 COMMAND: "SubmitThread-CA" #0 machine_kexec at ffffffff8c069932 #1 __crash_kexec at ffffffff8c1338fa #2 panic at ffffffff8c1d69b9 #3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2] #4 __ocfs2_abort at ffffffffc0c88387 [ocfs2] #5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2] #6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2] linux-sunxi#7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2] linux-sunxi#8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2] linux-sunxi#9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2] linux-sunxi#10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2] linux-sunxi#11 dio_complete at ffffffff8c2b9fa7 linux-sunxi#12 do_blockdev_direct_IO at ffffffff8c2bc09f linux-sunxi#13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2] linux-sunxi#14 generic_file_direct_write at ffffffff8c1dcf14 linux-sunxi#15 __generic_file_write_iter at ffffffff8c1dd07b linux-sunxi#16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2] linux-sunxi#17 aio_write at ffffffff8c2cc72e linux-sunxi#18 kmem_cache_alloc at ffffffff8c248dde linux-sunxi#19 do_io_submit at ffffffff8c2ccada linux-sunxi#20 do_syscall_64 at ffffffff8c004984 linux-sunxi#21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba Link: https://lkml.kernel.org/r/20240617095543.6971-1-jack@suse.cz Link: https://lkml.kernel.org/r/20240614145243.8837-1-jack@suse.cz Fixes: c15471f ("ocfs2: fix sparse file & data ordering issue in direct io") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

In the buffered write path, the dirty page owns the qgroup reserve until it creates an ordered_extent. Therefore, any errors that occur before the ordered_extent is created must free that reservation, or else the space is leaked. The fstest generic/475 exercises various IO error paths, and is able to trigger errors in cow_file_range where we fail to get to allocating the ordered extent. Note that because we *do* clear delalloc, we are likely to remove the inode from the delalloc list, so the inodes/pages to not have invalidate/launder called on them in the commit abort path. This results in failures at the unmount stage of the test that look like: BTRFS: error (device dm-8 state EA) in cleanup_transaction:2018: errno=-5 IO failure BTRFS: error (device dm-8 state EA) in btrfs_replace_file_extents:2416: errno=-5 IO failure BTRFS warning (device dm-8 state EA): qgroup 0/5 has unreleased space, type 0 rsv 28672 ------------[ cut here ]------------ WARNING: CPU: 3 PID: 22588 at fs/btrfs/disk-io.c:4333 close_ctree+0x222/0x4d0 [btrfs] Modules linked in: btrfs blake2b_generic libcrc32c xor zstd_compress raid6_pq CPU: 3 PID: 22588 Comm: umount Kdump: loaded Tainted: G W 6.10.0-rc7-gab56fde445b8 linux-sunxi#21 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 RIP: 0010:close_ctree+0x222/0x4d0 [btrfs] RSP: 0018:ffffb4465283be00 EFLAGS: 00010202 RAX: 0000000000000001 RBX: ffffa1a1818e1000 RCX: 0000000000000001 RDX: 0000000000000000 RSI: ffffb4465283bbe0 RDI: ffffa1a19374fcb8 RBP: ffffa1a1818e13c0 R08: 0000000100028b16 R09: 0000000000000000 R10: 0000000000000003 R11: 0000000000000003 R12: ffffa1a18ad7972c R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f9168312b80(0000) GS:ffffa1a4afcc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f91683c9140 CR3: 000000010acaa000 CR4: 00000000000006f0 Call Trace: <TASK> ? close_ctree+0x222/0x4d0 [btrfs] ? __warn.cold+0x8e/0xea ? close_ctree+0x222/0x4d0 [btrfs] ? report_bug+0xff/0x140 ? handle_bug+0x3b/0x70 ? exc_invalid_op+0x17/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? close_ctree+0x222/0x4d0 [btrfs] generic_shutdown_super+0x70/0x160 kill_anon_super+0x11/0x40 btrfs_kill_super+0x11/0x20 [btrfs] deactivate_locked_super+0x2e/0xa0 cleanup_mnt+0xb5/0x150 task_work_run+0x57/0x80 syscall_exit_to_user_mode+0x121/0x130 do_syscall_64+0xab/0x1a0 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f916847a887 ---[ end trace 0000000000000000 ]--- BTRFS error (device dm-8 state EA): qgroup reserved space leaked Cases 2 and 3 in the out_reserve path both pertain to this type of leak and must free the reserved qgroup data. Because it is already an error path, I opted not to handle the possible errors in btrfs_free_qgroup_data. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>

[ Upstream commit 769e6a1 ] ui_browser__show() is capturing the input title that is stack allocated memory in hist_browser__run(). Avoid a use after return by strdup-ing the string. Committer notes: Further explanation from Ian Rogers: My command line using tui is: $ sudo bash -c 'rm /tmp/asan.log*; export ASAN_OPTIONS="log_path=/tmp/asan.log"; /tmp/perf/perf mem record -a sleep 1; /tmp/perf/perf mem report' I then go to the perf annotate view and quit. This triggers the asan error (from the log file): ``` ==1254591==ERROR: AddressSanitizer: stack-use-after-return on address 0x7f2813331920 at pc 0x7f28180 65991 bp 0x7fff0a21c750 sp 0x7fff0a21bf10 READ of size 80 at 0x7f2813331920 thread T0 #0 0x7f2818065990 in __interceptor_strlen ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:461 jwrdegoede#1 0x7f2817698251 in SLsmg_write_wrapped_string (/lib/x86_64-linux-gnu/libslang.so.2+0x98251) jwrdegoede#2 0x7f28176984b9 in SLsmg_write_nstring (/lib/x86_64-linux-gnu/libslang.so.2+0x984b9) jwrdegoede#3 0x55c94045b365 in ui_browser__write_nstring ui/browser.c:60 jwrdegoede#4 0x55c94045c558 in __ui_browser__show_title ui/browser.c:266 jwrdegoede#5 0x55c94045c776 in ui_browser__show ui/browser.c:288 jwrdegoede#6 0x55c94045c06d in ui_browser__handle_resize ui/browser.c:206 linux-sunxi#7 0x55c94047979b in do_annotate ui/browsers/hists.c:2458 linux-sunxi#8 0x55c94047fb17 in evsel__hists_browse ui/browsers/hists.c:3412 linux-sunxi#9 0x55c940480a0c in perf_evsel_menu__run ui/browsers/hists.c:3527 linux-sunxi#10 0x55c940481108 in __evlist__tui_browse_hists ui/browsers/hists.c:3613 linux-sunxi#11 0x55c9404813f7 in evlist__tui_browse_hists ui/browsers/hists.c:3661 linux-sunxi#12 0x55c93ffa253f in report__browse_hists tools/perf/builtin-report.c:671 linux-sunxi#13 0x55c93ffa58ca in __cmd_report tools/perf/builtin-report.c:1141 linux-sunxi#14 0x55c93ffaf159 in cmd_report tools/perf/builtin-report.c:1805 linux-sunxi#15 0x55c94000c05c in report_events tools/perf/builtin-mem.c:374 linux-sunxi#16 0x55c94000d96d in cmd_mem tools/perf/builtin-mem.c:516 linux-sunxi#17 0x55c9400e44ee in run_builtin tools/perf/perf.c:350 linux-sunxi#18 0x55c9400e4a5a in handle_internal_command tools/perf/perf.c:403 linux-sunxi#19 0x55c9400e4e22 in run_argv tools/perf/perf.c:447 linux-sunxi#20 0x55c9400e53ad in main tools/perf/perf.c:561 linux-sunxi#21 0x7f28170456c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 linux-sunxi#22 0x7f2817045784 in __libc_start_main_impl ../csu/libc-start.c:360 linux-sunxi#23 0x55c93ff544c0 in _start (/tmp/perf/perf+0x19a4c0) (BuildId: 84899b0e8c7d3a3eaa67b2eb35e3d8b2f8cd4c93) Address 0x7f2813331920 is located in stack of thread T0 at offset 32 in frame #0 0x55c94046e85e in hist_browser__run ui/browsers/hists.c:746 This frame has 1 object(s): [32, 192) 'title' (line 747) <== Memory access at offset 32 is inside this variable HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork ``` hist_browser__run isn't on the stack so the asan error looks legit. There's no clean init/exit on struct ui_browser so I may be trading a use-after-return for a memory leak, but that seems look a good trade anyway. Fixes: 05e8b08 ("perf ui browser: Stop using 'self'") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240507183545.1236093-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

commit be346c1 upstream. The code in ocfs2_dio_end_io_write() estimates number of necessary transaction credits using ocfs2_calc_extend_credits(). This however does not take into account that the IO could be arbitrarily large and can contain arbitrary number of extents. Extent tree manipulations do often extend the current transaction but not in all of the cases. For example if we have only single block extents in the tree, ocfs2_mark_extent_written() will end up calling ocfs2_replace_extent_rec() all the time and we will never extend the current transaction and eventually exhaust all the transaction credits if the IO contains many single block extents. Once that happens a WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to this error. This was actually triggered by one of our customers on a heavily fragmented OCFS2 filesystem. To fix the issue make sure the transaction always has enough credits for one extent insert before each call of ocfs2_mark_extent_written(). Heming Zhao said: ------ PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error" PID: xxx TASK: xxxx CPU: 5 COMMAND: "SubmitThread-CA" #0 machine_kexec at ffffffff8c069932 jwrdegoede#1 __crash_kexec at ffffffff8c1338fa jwrdegoede#2 panic at ffffffff8c1d69b9 jwrdegoede#3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2] jwrdegoede#4 __ocfs2_abort at ffffffffc0c88387 [ocfs2] jwrdegoede#5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2] jwrdegoede#6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2] linux-sunxi#7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2] linux-sunxi#8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2] linux-sunxi#9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2] linux-sunxi#10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2] linux-sunxi#11 dio_complete at ffffffff8c2b9fa7 linux-sunxi#12 do_blockdev_direct_IO at ffffffff8c2bc09f linux-sunxi#13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2] linux-sunxi#14 generic_file_direct_write at ffffffff8c1dcf14 linux-sunxi#15 __generic_file_write_iter at ffffffff8c1dd07b linux-sunxi#16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2] linux-sunxi#17 aio_write at ffffffff8c2cc72e linux-sunxi#18 kmem_cache_alloc at ffffffff8c248dde linux-sunxi#19 do_io_submit at ffffffff8c2ccada linux-sunxi#20 do_syscall_64 at ffffffff8c004984 linux-sunxi#21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba Link: https://lkml.kernel.org/r/20240617095543.6971-1-jack@suse.cz Link: https://lkml.kernel.org/r/20240614145243.8837-1-jack@suse.cz Fixes: c15471f ("ocfs2: fix sparse file & data ordering issue in direct io") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

…s_lock For storing a value to a queue attribute, the queue_attr_store function first freezes the queue (->q_usage_counter(io)) and then acquire ->sysfs_lock. This seems not correct as the usual ordering should be to acquire ->sysfs_lock before freezing the queue. This incorrect ordering causes the following lockdep splat which we are able to reproduce always simply by accessing /sys/kernel/debug file using ls command: [ 57.597146] WARNING: possible circular locking dependency detected [ 57.597154] 6.12.0-10553-gb86545e02e8c linux-sunxi#20 Tainted: G W [ 57.597162] ------------------------------------------------------ [ 57.597168] ls/4605 is trying to acquire lock: [ 57.597176] c00000003eb56710 (&mm->mmap_lock){++++}-{4:4}, at: __might_fault+0x58/0xc0 [ 57.597200] but task is already holding lock: [ 57.597207] c0000018e27c6810 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: iterate_dir+0x94/0x1d4 [ 57.597226] which lock already depends on the new lock. [ 57.597233] the existing dependency chain (in reverse order) is: [ 57.597241] -> #5 (&sb->s_type->i_mutex_key#3){++++}-{4:4}: [ 57.597255] down_write+0x6c/0x18c [ 57.597264] start_creating+0xb4/0x24c [ 57.597274] debugfs_create_dir+0x2c/0x1e8 [ 57.597283] blk_register_queue+0xec/0x294 [ 57.597292] add_disk_fwnode+0x2e4/0x548 [ 57.597302] brd_alloc+0x2c8/0x338 [ 57.597309] brd_init+0x100/0x178 [ 57.597317] do_one_initcall+0x88/0x3e4 [ 57.597326] kernel_init_freeable+0x3cc/0x6e0 [ 57.597334] kernel_init+0x34/0x1cc [ 57.597342] ret_from_kernel_user_thread+0x14/0x1c [ 57.597350] -> #4 (&q->debugfs_mutex){+.+.}-{4:4}: [ 57.597362] __mutex_lock+0xfc/0x12a0 [ 57.597370] blk_register_queue+0xd4/0x294 [ 57.597379] add_disk_fwnode+0x2e4/0x548 [ 57.597388] brd_alloc+0x2c8/0x338 [ 57.597395] brd_init+0x100/0x178 [ 57.597402] do_one_initcall+0x88/0x3e4 [ 57.597410] kernel_init_freeable+0x3cc/0x6e0 [ 57.597418] kernel_init+0x34/0x1cc [ 57.597426] ret_from_kernel_user_thread+0x14/0x1c [ 57.597434] -> #3 (&q->sysfs_lock){+.+.}-{4:4}: [ 57.597446] __mutex_lock+0xfc/0x12a0 [ 57.597454] queue_attr_store+0x9c/0x110 [ 57.597462] sysfs_kf_write+0x70/0xb0 [ 57.597471] kernfs_fop_write_iter+0x1b0/0x2ac [ 57.597480] vfs_write+0x3dc/0x6e8 [ 57.597488] ksys_write+0x84/0x140 [ 57.597495] system_call_exception+0x130/0x360 [ 57.597504] system_call_common+0x160/0x2c4 [ 57.597516] -> #2 (&q->q_usage_counter(io)linux-sunxi#21){++++}-{0:0}: [ 57.597530] __submit_bio+0x5ec/0x828 [ 57.597538] submit_bio_noacct_nocheck+0x1e4/0x4f0 [ 57.597547] iomap_readahead+0x2a0/0x448 [ 57.597556] xfs_vm_readahead+0x28/0x3c [ 57.597564] read_pages+0x88/0x41c [ 57.597571] page_cache_ra_unbounded+0x1ac/0x2d8 [ 57.597580] filemap_get_pages+0x188/0x984 [ 57.597588] filemap_read+0x13c/0x4bc [ 57.597596] xfs_file_buffered_read+0x88/0x17c [ 57.597605] xfs_file_read_iter+0xac/0x158 [ 57.597614] vfs_read+0x2d4/0x3b4 [ 57.597622] ksys_read+0x84/0x144 [ 57.597629] system_call_exception+0x130/0x360 [ 57.597637] system_call_common+0x160/0x2c4 [ 57.597647] -> #1 (mapping.invalidate_lock#2){++++}-{4:4}: [ 57.597661] down_read+0x6c/0x220 [ 57.597669] filemap_fault+0x870/0x100c [ 57.597677] xfs_filemap_fault+0xc4/0x18c [ 57.597684] __do_fault+0x64/0x164 [ 57.597693] __handle_mm_fault+0x1274/0x1dac [ 57.597702] handle_mm_fault+0x248/0x484 [ 57.597711] ___do_page_fault+0x428/0xc0c [ 57.597719] hash__do_page_fault+0x30/0x68 [ 57.597727] do_hash_fault+0x90/0x35c [ 57.597736] data_access_common_virt+0x210/0x220 [ 57.597745] _copy_from_user+0xf8/0x19c [ 57.597754] sel_write_load+0x178/0xd54 [ 57.597762] vfs_write+0x108/0x6e8 [ 57.597769] ksys_write+0x84/0x140 [ 57.597777] system_call_exception+0x130/0x360 [ 57.597785] system_call_common+0x160/0x2c4 [ 57.597794] -> #0 (&mm->mmap_lock){++++}-{4:4}: [ 57.597806] __lock_acquire+0x17cc/0x2330 [ 57.597814] lock_acquire+0x138/0x400 [ 57.597822] __might_fault+0x7c/0xc0 [ 57.597830] filldir64+0xe8/0x390 [ 57.597839] dcache_readdir+0x80/0x2d4 [ 57.597846] iterate_dir+0xd8/0x1d4 [ 57.597855] sys_getdents64+0x88/0x2d4 [ 57.597864] system_call_exception+0x130/0x360 [ 57.597872] system_call_common+0x160/0x2c4 [ 57.597881] other info that might help us debug this: [ 57.597888] Chain exists of: &mm->mmap_lock --> &q->debugfs_mutex --> &sb->s_type->i_mutex_key#3 [ 57.597905] Possible unsafe locking scenario: [ 57.597911] CPU0 CPU1 [ 57.597917] ---- ---- [ 57.597922] rlock(&sb->s_type->i_mutex_key#3); [ 57.597932] lock(&q->debugfs_mutex); [ 57.597940] lock(&sb->s_type->i_mutex_key#3); [ 57.597950] rlock(&mm->mmap_lock); [ 57.597958] *** DEADLOCK *** [ 57.597965] 2 locks held by ls/4605: [ 57.597971] #0: c0000000137c12f8 (&f->f_pos_lock){+.+.}-{4:4}, at: fdget_pos+0xcc/0x154 [ 57.597989] #1: c0000018e27c6810 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: iterate_dir+0x94/0x1d4 Prevent the above lockdep warning by acquiring ->sysfs_lock before freezing the queue while storing a queue attribute in queue_attr_store function. Later, we also found[1] another function __blk_mq_update_nr_ hw_queues where we first freeze queue and then acquire the ->sysfs_lock. So we've also updated lock ordering in __blk_mq_update_nr_hw_queues function and ensured that in all code paths we follow the correct lock ordering i.e. acquire ->sysfs_lock before freezing the queue. [1] https://lore.kernel.org/all/CAFj5m9Ke8+EHKQBs_Nk6hqd=LGXtk4mUxZUN5==ZcCjnZSBwHw@mail.gmail.com/ Reported-by: kjain@linux.ibm.com Fixes: af28141 ("block: freeze the queue in queue_attr_store") Tested-by: kjain@linux.ibm.com Cc: hch@lst.de Cc: axboe@kernel.dk Cc: ritesh.list@gmail.com Cc: ming.lei@redhat.com Cc: gjoyce@linux.ibm.com Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241210144222.1066229-1-nilay@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>

turl mentioned this issue May 19, 2012

Fix issue #21 #22

Merged

amery pushed a commit that referenced this issue May 19, 2012

Fix build when using O=

4a92fcd

Fixes issue #21 on amery/linux-allwinner

amery added a commit that referenced this issue May 19, 2012

Merge pull request #22 from allwinner-dev-team/allwinner-3.0

f8959c5

Fix build using O= (issue #21) and inline build on CM9

amery closed this as completed May 19, 2012

amery pushed a commit that referenced this issue May 26, 2012

Fix build when using O=

53959ae

Fixes issue #21 on amery/linux-allwinner

amery pushed a commit that referenced this issue Jun 11, 2012

Fix build when using O=

a940ef2

Fixes issue #21 on amery/linux-allwinner

amery pushed a commit that referenced this issue Jul 2, 2012

Fix build when using O=

1dce0db

Fixes issue #21 on amery/linux-allwinner

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Building with O= is broken #21

Building with O= is broken #21

turl commented May 19, 2012

amery commented May 19, 2012

amery commented May 19, 2012

Building with O= is broken #21

Building with O= is broken #21

Comments

turl commented May 19, 2012

amery commented May 19, 2012

amery commented May 19, 2012