Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

linux-image-6.8.0-101041-tuxedo keyboard not working after resume from suspend #10

Closed
fvalasiad opened this issue Sep 4, 2024 · 1 comment

Comments

@fvalasiad
Copy link

Hello!

Just wanted to point out that the aforementioned kernel breaks my polaris 15 keyboard after resume from suspend.

The keyboard just doesn't work, funnily enough kde connect's keyboard works though!

External USB plugged keyboards don't work either!

Booting into the older 6.5 kernel fixes this.

Additionally I've noticed that even suspend sometimes fails, only locking KDE. Suspending again usually works correctly though.

I am on default TuxedoOS install.

tuxedo-bot pushed a commit that referenced this issue Sep 5, 2024
A sysfs reader can race with a device reset or removal, attempting to
read device state when the device is not actually present. eg:

     [exception RIP: qed_get_current_link+17]
  #8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede]
  #9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3
 #10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4
 #11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300
 #12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c
 #13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b
 #14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3
 #15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1
 #16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f
 #17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb

 crash> struct net_device.state ffff9a9d21336000
    state = 5,

state 5 is __LINK_STATE_START (0b1) and __LINK_STATE_NOCARRIER (0b100).
The device is not present, note lack of __LINK_STATE_PRESENT (0b10).

This is the same sort of panic as observed in commit 4224cfd
("net-sysfs: add check for netdevice being present to speed_show").

There are many other callers of __ethtool_get_link_ksettings() which
don't have a device presence check.

Move this check into ethtool to protect all callers.

Fixes: d519e17 ("net: export device speed and duplex via sysfs")
Fixes: 4224cfd ("net-sysfs: add check for netdevice being present to speed_show")
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Link: https://patch.msgid.link/8bae218864beaa44ed01628140475b9bf641c5b0.1724393671.git.jamie.bainbridge@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
tuxedo-bot pushed a commit that referenced this issue Sep 5, 2024
BugLink: https://bugs.launchpad.net/bugs/1942215

When the Timer operation is called, there are no arguments, and
acpi_ex_resolve_operands will be called with an out-of-bounds stack pointer
as num_operands is 0.

This does not usually cause any problems, as acpi_ex_resolve_operands will
ignore the parameter when the operation requires no arguments, as is the
case.

However, when the code is compiled with UBSAN, it will trigger, leading to
an oops with invalid opcode on Linux.

Fix it by using a NULL parameter when num_operands is 0.

[    8.285428] invalid opcode: 0000 [#1] SMP NOPTI
[    8.286436] CPU: 18 PID: 1522 Comm: systemd-udevd Not tainted 5.15.0-10-generic #10
[    8.287505] Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.0D.01.0395.022720191340 02/27/2019
[    8.288495] RIP: 0010:acpi_ds_exec_end_op+0x1be/0x7a6
[    8.289658] Code: 7b 0a 48 89 da 44 89 45 d4 48 98 48 8d 34 c3 e8 f8 3c 01 00 44 8b 45 d4 85 c0 41 89 c6 75 22 eb 9e 44 89 c0 41 80 f8 0b 76 02 <0f> 0b 48 8b 04 c5 c0 c0 ca aa 48 89 df ff d0 0f 1f 00 41 89 c4 eb
[    8.291858] RSP: 0018:ffffc38561a3f6d0 EFLAGS: 00010286
[    8.292888] RAX: 0000000000000000 RBX: ffffa0aa87c91800 RCX: 0000000000000040
[    8.294056] RDX: ffffffffffffffff RSI: ffffffffaacabf40 RDI: 00000000000002cb
[    8.295839] RBP: ffffc38561a3f700 R08: 0000000000000000 R09: ffffa0aa9f5a1000
[    8.296030] IPMI message handler: version 39.2
[    8.297554] R10: ffffa0aa89cdec00 R11: 0000000000000003 R12: 0000000000000000
[    8.297556] R13: ffffa0aa9f5a10a0 R14: 0000000000000000 R15: 0000000000000000
[    8.297558] FS:  00007f68ba26b8c0(0000) GS:ffffa0d60ca80000(0000) knlGS:0000000000000000
[    8.297560] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    8.297561] CR2: 00007fdbb3b9eec8 CR3: 00000001176ba001 CR4: 00000000007706e0
[    8.297563] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    8.297564] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    8.297565] PKRU: 55555554
[    8.297566] Call Trace:
[    8.297569]  acpi_ps_parse_loop+0x587/0x660
[    8.297574]  acpi_ps_parse_aml+0x1af/0x552
[    8.297578]  acpi_ps_execute_method+0x208/0x2ca
[    8.297580]  acpi_ns_evaluate+0x34e/0x4f0
[    8.297583]  acpi_evaluate_object+0x18e/0x3b4
[    8.297587]  acpi_evaluate_dsm+0xb3/0x120
[    8.297593]  ? acpi_evaluate_dsm+0xb3/0x120
[    8.297596]  nfit_intel_shutdown_status+0xed/0x1a0 [nfit]
[    8.297606]  acpi_nfit_add_dimm+0x3cb/0x660 [nfit]
[    8.297614]  acpi_nfit_register_dimms+0x141/0x460 [nfit]
[    8.297620]  acpi_nfit_init+0x54f/0x620 [nfit]
[    8.327895]  acpi_nfit_add+0x18c/0x1f0 [nfit]
[    8.329341]  acpi_device_probe+0x49/0x170
[    8.330815]  really_probe+0x209/0x410
[    8.330820]  __driver_probe_device+0x109/0x180
[    8.330823]  driver_probe_device+0x23/0x90
[    8.330825]  __driver_attach+0xac/0x1b0
[    8.330828]  ? __device_attach_driver+0xe0/0xe0
[    8.330831]  bus_for_each_dev+0x7c/0xc0
[    8.330834]  driver_attach+0x1e/0x20
[    8.330835]  bus_add_driver+0x135/0x1f0
[    8.330837]  driver_register+0x95/0xf0
[    8.330840]  acpi_bus_register_driver+0x39/0x50
[    8.344874]  nfit_init+0x168/0x1000 [nfit]
[    8.344882]  ? 0xffffffffc0735000
[    8.344885]  do_one_initcall+0x46/0x1d0
[    8.350927]  ? kmem_cache_alloc_trace+0x18c/0x2c0
[    8.350933]  do_init_module+0x62/0x290
[    8.350940]  load_module+0xaa3/0xb30
[    8.350944]  __do_sys_finit_module+0xbf/0x120
[    8.350948]  __x64_sys_finit_module+0x18/0x20
[    8.350951]  do_syscall_64+0x59/0xc0
[    8.350955]  ? exit_to_user_mode_prepare+0x37/0xb0
[    8.350959]  ? irqentry_exit_to_user_mode+0x9/0x20
[    8.350963]  ? irqentry_exit+0x19/0x30
[    8.350965]  ? exc_page_fault+0x89/0x160
[    8.350968]  ? asm_exc_page_fault+0x8/0x30
[    8.350971]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[    8.350975] RIP: 0033:0x7f68ba7fc94d
[    8.350978] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b3 64 0f 00 f7 d8 64 89 01 48
[    8.350980] RSP: 002b:00007ffc7e0b93c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[    8.350984] RAX: ffffffffffffffda RBX: 000055bbb29a4a00 RCX: 00007f68ba7fc94d
[    8.350985] RDX: 0000000000000000 RSI: 00007f68ba9923fe RDI: 0000000000000006
[    8.350987] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
[    8.350988] R10: 0000000000000006 R11: 0000000000000246 R12: 00007f68ba9923fe
[    8.350989] R13: 000055bbb28e3a20 R14: 000055bbb297d940 R15: 000055bbb297ea60

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
@tuxedo-wse
Copy link
Contributor

please reopen the issue over at https://gitlab.com/tuxedocomputers/development/packages/linux and add whitch generation of polaris you have a problem with

tuxedo-bot pushed a commit that referenced this issue Sep 11, 2024
BugLink: https://bugs.launchpad.net/bugs/1942215

When the Timer operation is called, there are no arguments, and
acpi_ex_resolve_operands will be called with an out-of-bounds stack pointer
as num_operands is 0.

This does not usually cause any problems, as acpi_ex_resolve_operands will
ignore the parameter when the operation requires no arguments, as is the
case.

However, when the code is compiled with UBSAN, it will trigger, leading to
an oops with invalid opcode on Linux.

Fix it by using a NULL parameter when num_operands is 0.

[    8.285428] invalid opcode: 0000 [#1] SMP NOPTI
[    8.286436] CPU: 18 PID: 1522 Comm: systemd-udevd Not tainted 5.15.0-10-generic #10
[    8.287505] Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.0D.01.0395.022720191340 02/27/2019
[    8.288495] RIP: 0010:acpi_ds_exec_end_op+0x1be/0x7a6
[    8.289658] Code: 7b 0a 48 89 da 44 89 45 d4 48 98 48 8d 34 c3 e8 f8 3c 01 00 44 8b 45 d4 85 c0 41 89 c6 75 22 eb 9e 44 89 c0 41 80 f8 0b 76 02 <0f> 0b 48 8b 04 c5 c0 c0 ca aa 48 89 df ff d0 0f 1f 00 41 89 c4 eb
[    8.291858] RSP: 0018:ffffc38561a3f6d0 EFLAGS: 00010286
[    8.292888] RAX: 0000000000000000 RBX: ffffa0aa87c91800 RCX: 0000000000000040
[    8.294056] RDX: ffffffffffffffff RSI: ffffffffaacabf40 RDI: 00000000000002cb
[    8.295839] RBP: ffffc38561a3f700 R08: 0000000000000000 R09: ffffa0aa9f5a1000
[    8.296030] IPMI message handler: version 39.2
[    8.297554] R10: ffffa0aa89cdec00 R11: 0000000000000003 R12: 0000000000000000
[    8.297556] R13: ffffa0aa9f5a10a0 R14: 0000000000000000 R15: 0000000000000000
[    8.297558] FS:  00007f68ba26b8c0(0000) GS:ffffa0d60ca80000(0000) knlGS:0000000000000000
[    8.297560] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    8.297561] CR2: 00007fdbb3b9eec8 CR3: 00000001176ba001 CR4: 00000000007706e0
[    8.297563] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    8.297564] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    8.297565] PKRU: 55555554
[    8.297566] Call Trace:
[    8.297569]  acpi_ps_parse_loop+0x587/0x660
[    8.297574]  acpi_ps_parse_aml+0x1af/0x552
[    8.297578]  acpi_ps_execute_method+0x208/0x2ca
[    8.297580]  acpi_ns_evaluate+0x34e/0x4f0
[    8.297583]  acpi_evaluate_object+0x18e/0x3b4
[    8.297587]  acpi_evaluate_dsm+0xb3/0x120
[    8.297593]  ? acpi_evaluate_dsm+0xb3/0x120
[    8.297596]  nfit_intel_shutdown_status+0xed/0x1a0 [nfit]
[    8.297606]  acpi_nfit_add_dimm+0x3cb/0x660 [nfit]
[    8.297614]  acpi_nfit_register_dimms+0x141/0x460 [nfit]
[    8.297620]  acpi_nfit_init+0x54f/0x620 [nfit]
[    8.327895]  acpi_nfit_add+0x18c/0x1f0 [nfit]
[    8.329341]  acpi_device_probe+0x49/0x170
[    8.330815]  really_probe+0x209/0x410
[    8.330820]  __driver_probe_device+0x109/0x180
[    8.330823]  driver_probe_device+0x23/0x90
[    8.330825]  __driver_attach+0xac/0x1b0
[    8.330828]  ? __device_attach_driver+0xe0/0xe0
[    8.330831]  bus_for_each_dev+0x7c/0xc0
[    8.330834]  driver_attach+0x1e/0x20
[    8.330835]  bus_add_driver+0x135/0x1f0
[    8.330837]  driver_register+0x95/0xf0
[    8.330840]  acpi_bus_register_driver+0x39/0x50
[    8.344874]  nfit_init+0x168/0x1000 [nfit]
[    8.344882]  ? 0xffffffffc0735000
[    8.344885]  do_one_initcall+0x46/0x1d0
[    8.350927]  ? kmem_cache_alloc_trace+0x18c/0x2c0
[    8.350933]  do_init_module+0x62/0x290
[    8.350940]  load_module+0xaa3/0xb30
[    8.350944]  __do_sys_finit_module+0xbf/0x120
[    8.350948]  __x64_sys_finit_module+0x18/0x20
[    8.350951]  do_syscall_64+0x59/0xc0
[    8.350955]  ? exit_to_user_mode_prepare+0x37/0xb0
[    8.350959]  ? irqentry_exit_to_user_mode+0x9/0x20
[    8.350963]  ? irqentry_exit+0x19/0x30
[    8.350965]  ? exc_page_fault+0x89/0x160
[    8.350968]  ? asm_exc_page_fault+0x8/0x30
[    8.350971]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[    8.350975] RIP: 0033:0x7f68ba7fc94d
[    8.350978] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b3 64 0f 00 f7 d8 64 89 01 48
[    8.350980] RSP: 002b:00007ffc7e0b93c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[    8.350984] RAX: ffffffffffffffda RBX: 000055bbb29a4a00 RCX: 00007f68ba7fc94d
[    8.350985] RDX: 0000000000000000 RSI: 00007f68ba9923fe RDI: 0000000000000006
[    8.350987] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
[    8.350988] R10: 0000000000000006 R11: 0000000000000246 R12: 00007f68ba9923fe
[    8.350989] R13: 000055bbb28e3a20 R14: 000055bbb297d940 R15: 000055bbb297ea60

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
tuxedo-bot pushed a commit that referenced this issue Sep 13, 2024
BugLink: https://bugs.launchpad.net/bugs/2076435

commit be346c1 upstream.

The code in ocfs2_dio_end_io_write() estimates number of necessary
transaction credits using ocfs2_calc_extend_credits().  This however does
not take into account that the IO could be arbitrarily large and can
contain arbitrary number of extents.

Extent tree manipulations do often extend the current transaction but not
in all of the cases.  For example if we have only single block extents in
the tree, ocfs2_mark_extent_written() will end up calling
ocfs2_replace_extent_rec() all the time and we will never extend the
current transaction and eventually exhaust all the transaction credits if
the IO contains many single block extents.  Once that happens a
WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in
jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to
this error.  This was actually triggered by one of our customers on a
heavily fragmented OCFS2 filesystem.

To fix the issue make sure the transaction always has enough credits for
one extent insert before each call of ocfs2_mark_extent_written().

Heming Zhao said:

------
PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error"

PID: xxx  TASK: xxxx  CPU: 5  COMMAND: "SubmitThread-CA"
  #0 machine_kexec at ffffffff8c069932
  #1 __crash_kexec at ffffffff8c1338fa
  #2 panic at ffffffff8c1d69b9
  #3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2]
  #4 __ocfs2_abort at ffffffffc0c88387 [ocfs2]
  #5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2]
  #6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2]
  #7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2]
  #8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2]
  #9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2]
#10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2]
#11 dio_complete at ffffffff8c2b9fa7
#12 do_blockdev_direct_IO at ffffffff8c2bc09f
#13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2]
#14 generic_file_direct_write at ffffffff8c1dcf14
#15 __generic_file_write_iter at ffffffff8c1dd07b
#16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2]
#17 aio_write at ffffffff8c2cc72e
#18 kmem_cache_alloc at ffffffff8c248dde
#19 do_io_submit at ffffffff8c2ccada
#20 do_syscall_64 at ffffffff8c004984
#21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba

Link: https://lkml.kernel.org/r/20240617095543.6971-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20240614145243.8837-1-jack@suse.cz
Fixes: c15471f ("ocfs2: fix sparse file & data ordering issue in direct io")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Portia Stephens <portia.stephens@canonical.com>
Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
tuxedo-bot pushed a commit that referenced this issue Sep 13, 2024
BugLink: https://bugs.launchpad.net/bugs/2077600

commit d99fbd9 upstream.

Bos can be put with multiple unrelated dma-resv locks held. But
imported bos attempt to grab the bo dma-resv during dma-buf detach
that typically happens during cleanup. That leads to lockde splats
similar to the below and a potential ABBA deadlock.

Fix this by always taking the delayed workqueue cleanup path for
imported bos.

Requesting stable fixes from when the Xe driver was introduced,
since its usage of drm_exec and wide vm dma_resvs appear to be
the first reliable trigger of this.

[22982.116427] ============================================
[22982.116428] WARNING: possible recursive locking detected
[22982.116429] 6.10.0-rc2+ #10 Tainted: G     U  W
[22982.116430] --------------------------------------------
[22982.116430] glxgears:sh0/5785 is trying to acquire lock:
[22982.116431] ffff8c2bafa539a8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: dma_buf_detach+0x3b/0xf0
[22982.116438]
               but task is already holding lock:
[22982.116438] ffff8c2d9aba6da8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_exec_lock_obj+0x49/0x2b0 [drm_exec]
[22982.116442]
               other info that might help us debug this:
[22982.116442]  Possible unsafe locking scenario:

[22982.116443]        CPU0
[22982.116444]        ----
[22982.116444]   lock(reservation_ww_class_mutex);
[22982.116445]   lock(reservation_ww_class_mutex);
[22982.116447]
                *** DEADLOCK ***

[22982.116447]  May be due to missing lock nesting notation

[22982.116448] 5 locks held by glxgears:sh0/5785:
[22982.116449]  #0: ffff8c2d9aba58c8 (&xef->vm.lock){+.+.}-{3:3}, at: xe_file_close+0xde/0x1c0 [xe]
[22982.116507]  #1: ffff8c2e28cc8480 (&vm->lock){++++}-{3:3}, at: xe_vm_close_and_put+0x161/0x9b0 [xe]
[22982.116578]  #2: ffff8c2e31982970 (&val->lock){.+.+}-{3:3}, at: xe_validation_ctx_init+0x6d/0x70 [xe]
[22982.116647]  #3: ffffacdc469478a8 (reservation_ww_class_acquire){+.+.}-{0:0}, at: xe_vma_destroy_unlocked+0x7f/0xe0 [xe]
[22982.116716]  #4: ffff8c2d9aba6da8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_exec_lock_obj+0x49/0x2b0 [drm_exec]
[22982.116719]
               stack backtrace:
[22982.116720] CPU: 8 PID: 5785 Comm: glxgears:sh0 Tainted: G     U  W          6.10.0-rc2+ #10
[22982.116721] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 2001 02/01/2023
[22982.116723] Call Trace:
[22982.116724]  <TASK>
[22982.116725]  dump_stack_lvl+0x77/0xb0
[22982.116727]  __lock_acquire+0x1232/0x2160
[22982.116730]  lock_acquire+0xcb/0x2d0
[22982.116732]  ? dma_buf_detach+0x3b/0xf0
[22982.116734]  ? __lock_acquire+0x417/0x2160
[22982.116736]  __ww_mutex_lock.constprop.0+0xd0/0x13b0
[22982.116738]  ? dma_buf_detach+0x3b/0xf0
[22982.116741]  ? dma_buf_detach+0x3b/0xf0
[22982.116743]  ? ww_mutex_lock+0x2b/0x90
[22982.116745]  ww_mutex_lock+0x2b/0x90
[22982.116747]  dma_buf_detach+0x3b/0xf0
[22982.116749]  drm_prime_gem_destroy+0x2f/0x40 [drm]
[22982.116775]  xe_ttm_bo_destroy+0x32/0x220 [xe]
[22982.116818]  ? __mutex_unlock_slowpath+0x3a/0x290
[22982.116821]  drm_exec_unlock_all+0xa1/0xd0 [drm_exec]
[22982.116823]  drm_exec_fini+0x12/0xb0 [drm_exec]
[22982.116824]  xe_validation_ctx_fini+0x15/0x40 [xe]
[22982.116892]  xe_vma_destroy_unlocked+0xb1/0xe0 [xe]
[22982.116959]  xe_vm_close_and_put+0x41a/0x9b0 [xe]
[22982.117025]  ? xa_find+0xe3/0x1e0
[22982.117028]  xe_file_close+0x10a/0x1c0 [xe]
[22982.117074]  drm_file_free+0x22a/0x280 [drm]
[22982.117099]  drm_release_noglobal+0x22/0x70 [drm]
[22982.117119]  __fput+0xf1/0x2d0
[22982.117122]  task_work_run+0x59/0x90
[22982.117125]  do_exit+0x330/0xb40
[22982.117127]  do_group_exit+0x36/0xa0
[22982.117129]  get_signal+0xbd2/0xbe0
[22982.117131]  arch_do_signal_or_restart+0x3e/0x240
[22982.117134]  syscall_exit_to_user_mode+0x1e7/0x290
[22982.117137]  do_syscall_64+0xa1/0x180
[22982.117139]  ? lock_acquire+0xcb/0x2d0
[22982.117140]  ? __set_task_comm+0x28/0x1e0
[22982.117141]  ? find_held_lock+0x2b/0x80
[22982.117144]  ? __set_task_comm+0xe1/0x1e0
[22982.117145]  ? lock_release+0xca/0x290
[22982.117147]  ? __do_sys_prctl+0x245/0xab0
[22982.117149]  ? lockdep_hardirqs_on_prepare+0xde/0x190
[22982.117150]  ? syscall_exit_to_user_mode+0xb0/0x290
[22982.117152]  ? do_syscall_64+0xa1/0x180
[22982.117154]  ? __lock_acquire+0x417/0x2160
[22982.117155]  ? reacquire_held_locks+0xd1/0x1f0
[22982.117156]  ? do_user_addr_fault+0x30c/0x790
[22982.117158]  ? lock_acquire+0xcb/0x2d0
[22982.117160]  ? find_held_lock+0x2b/0x80
[22982.117162]  ? do_user_addr_fault+0x357/0x790
[22982.117163]  ? lock_release+0xca/0x290
[22982.117164]  ? do_user_addr_fault+0x361/0x790
[22982.117166]  ? trace_hardirqs_off+0x4b/0xc0
[22982.117168]  ? clear_bhb_loop+0x45/0xa0
[22982.117170]  ? clear_bhb_loop+0x45/0xa0
[22982.117172]  ? clear_bhb_loop+0x45/0xa0
[22982.117174]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[22982.117176] RIP: 0033:0x7f943d267169
[22982.117192] Code: Unable to access opcode bytes at 0x7f943d26713f.
[22982.117193] RSP: 002b:00007f9430bffc80 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[22982.117195] RAX: fffffffffffffe00 RBX: 0000000000000000 RCX: 00007f943d267169
[22982.117196] RDX: 0000000000000000 RSI: 0000000000000189 RDI: 00005622f89579d0
[22982.117197] RBP: 00007f9430bffcb0 R08: 0000000000000000 R09: 00000000ffffffff
[22982.117198] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[22982.117199] R13: 0000000000000000 R14: 0000000000000000 R15: 00005622f89579d0
[22982.117202]  </TASK>

Fixes: dd08ebf ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Christian König <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Cc: intel-xe@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v6.8+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240628153848.4989-1-thomas.hellstrom@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Portia Stephens <portia.stephens@canonical.com>
Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
tuxedo-bot pushed a commit that referenced this issue Sep 17, 2024
BugLink: https://bugs.launchpad.net/bugs/1942215

When the Timer operation is called, there are no arguments, and
acpi_ex_resolve_operands will be called with an out-of-bounds stack pointer
as num_operands is 0.

This does not usually cause any problems, as acpi_ex_resolve_operands will
ignore the parameter when the operation requires no arguments, as is the
case.

However, when the code is compiled with UBSAN, it will trigger, leading to
an oops with invalid opcode on Linux.

Fix it by using a NULL parameter when num_operands is 0.

[    8.285428] invalid opcode: 0000 [#1] SMP NOPTI
[    8.286436] CPU: 18 PID: 1522 Comm: systemd-udevd Not tainted 5.15.0-10-generic #10
[    8.287505] Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.0D.01.0395.022720191340 02/27/2019
[    8.288495] RIP: 0010:acpi_ds_exec_end_op+0x1be/0x7a6
[    8.289658] Code: 7b 0a 48 89 da 44 89 45 d4 48 98 48 8d 34 c3 e8 f8 3c 01 00 44 8b 45 d4 85 c0 41 89 c6 75 22 eb 9e 44 89 c0 41 80 f8 0b 76 02 <0f> 0b 48 8b 04 c5 c0 c0 ca aa 48 89 df ff d0 0f 1f 00 41 89 c4 eb
[    8.291858] RSP: 0018:ffffc38561a3f6d0 EFLAGS: 00010286
[    8.292888] RAX: 0000000000000000 RBX: ffffa0aa87c91800 RCX: 0000000000000040
[    8.294056] RDX: ffffffffffffffff RSI: ffffffffaacabf40 RDI: 00000000000002cb
[    8.295839] RBP: ffffc38561a3f700 R08: 0000000000000000 R09: ffffa0aa9f5a1000
[    8.296030] IPMI message handler: version 39.2
[    8.297554] R10: ffffa0aa89cdec00 R11: 0000000000000003 R12: 0000000000000000
[    8.297556] R13: ffffa0aa9f5a10a0 R14: 0000000000000000 R15: 0000000000000000
[    8.297558] FS:  00007f68ba26b8c0(0000) GS:ffffa0d60ca80000(0000) knlGS:0000000000000000
[    8.297560] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    8.297561] CR2: 00007fdbb3b9eec8 CR3: 00000001176ba001 CR4: 00000000007706e0
[    8.297563] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    8.297564] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    8.297565] PKRU: 55555554
[    8.297566] Call Trace:
[    8.297569]  acpi_ps_parse_loop+0x587/0x660
[    8.297574]  acpi_ps_parse_aml+0x1af/0x552
[    8.297578]  acpi_ps_execute_method+0x208/0x2ca
[    8.297580]  acpi_ns_evaluate+0x34e/0x4f0
[    8.297583]  acpi_evaluate_object+0x18e/0x3b4
[    8.297587]  acpi_evaluate_dsm+0xb3/0x120
[    8.297593]  ? acpi_evaluate_dsm+0xb3/0x120
[    8.297596]  nfit_intel_shutdown_status+0xed/0x1a0 [nfit]
[    8.297606]  acpi_nfit_add_dimm+0x3cb/0x660 [nfit]
[    8.297614]  acpi_nfit_register_dimms+0x141/0x460 [nfit]
[    8.297620]  acpi_nfit_init+0x54f/0x620 [nfit]
[    8.327895]  acpi_nfit_add+0x18c/0x1f0 [nfit]
[    8.329341]  acpi_device_probe+0x49/0x170
[    8.330815]  really_probe+0x209/0x410
[    8.330820]  __driver_probe_device+0x109/0x180
[    8.330823]  driver_probe_device+0x23/0x90
[    8.330825]  __driver_attach+0xac/0x1b0
[    8.330828]  ? __device_attach_driver+0xe0/0xe0
[    8.330831]  bus_for_each_dev+0x7c/0xc0
[    8.330834]  driver_attach+0x1e/0x20
[    8.330835]  bus_add_driver+0x135/0x1f0
[    8.330837]  driver_register+0x95/0xf0
[    8.330840]  acpi_bus_register_driver+0x39/0x50
[    8.344874]  nfit_init+0x168/0x1000 [nfit]
[    8.344882]  ? 0xffffffffc0735000
[    8.344885]  do_one_initcall+0x46/0x1d0
[    8.350927]  ? kmem_cache_alloc_trace+0x18c/0x2c0
[    8.350933]  do_init_module+0x62/0x290
[    8.350940]  load_module+0xaa3/0xb30
[    8.350944]  __do_sys_finit_module+0xbf/0x120
[    8.350948]  __x64_sys_finit_module+0x18/0x20
[    8.350951]  do_syscall_64+0x59/0xc0
[    8.350955]  ? exit_to_user_mode_prepare+0x37/0xb0
[    8.350959]  ? irqentry_exit_to_user_mode+0x9/0x20
[    8.350963]  ? irqentry_exit+0x19/0x30
[    8.350965]  ? exc_page_fault+0x89/0x160
[    8.350968]  ? asm_exc_page_fault+0x8/0x30
[    8.350971]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[    8.350975] RIP: 0033:0x7f68ba7fc94d
[    8.350978] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b3 64 0f 00 f7 d8 64 89 01 48
[    8.350980] RSP: 002b:00007ffc7e0b93c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[    8.350984] RAX: ffffffffffffffda RBX: 000055bbb29a4a00 RCX: 00007f68ba7fc94d
[    8.350985] RDX: 0000000000000000 RSI: 00007f68ba9923fe RDI: 0000000000000006
[    8.350987] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
[    8.350988] R10: 0000000000000006 R11: 0000000000000246 R12: 00007f68ba9923fe
[    8.350989] R13: 000055bbb28e3a20 R14: 000055bbb297d940 R15: 000055bbb297ea60

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
tuxedo-bot pushed a commit that referenced this issue Sep 27, 2024
BugLink: https://bugs.launchpad.net/bugs/2076435

commit be346c1 upstream.

The code in ocfs2_dio_end_io_write() estimates number of necessary
transaction credits using ocfs2_calc_extend_credits().  This however does
not take into account that the IO could be arbitrarily large and can
contain arbitrary number of extents.

Extent tree manipulations do often extend the current transaction but not
in all of the cases.  For example if we have only single block extents in
the tree, ocfs2_mark_extent_written() will end up calling
ocfs2_replace_extent_rec() all the time and we will never extend the
current transaction and eventually exhaust all the transaction credits if
the IO contains many single block extents.  Once that happens a
WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in
jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to
this error.  This was actually triggered by one of our customers on a
heavily fragmented OCFS2 filesystem.

To fix the issue make sure the transaction always has enough credits for
one extent insert before each call of ocfs2_mark_extent_written().

Heming Zhao said:

------
PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error"

PID: xxx  TASK: xxxx  CPU: 5  COMMAND: "SubmitThread-CA"
  #0 machine_kexec at ffffffff8c069932
  #1 __crash_kexec at ffffffff8c1338fa
  #2 panic at ffffffff8c1d69b9
  #3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2]
  #4 __ocfs2_abort at ffffffffc0c88387 [ocfs2]
  #5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2]
  #6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2]
  #7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2]
  #8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2]
  #9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2]
#10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2]
#11 dio_complete at ffffffff8c2b9fa7
#12 do_blockdev_direct_IO at ffffffff8c2bc09f
#13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2]
#14 generic_file_direct_write at ffffffff8c1dcf14
#15 __generic_file_write_iter at ffffffff8c1dd07b
#16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2]
#17 aio_write at ffffffff8c2cc72e
#18 kmem_cache_alloc at ffffffff8c248dde
#19 do_io_submit at ffffffff8c2ccada
#20 do_syscall_64 at ffffffff8c004984
#21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba

Link: https://lkml.kernel.org/r/20240617095543.6971-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20240614145243.8837-1-jack@suse.cz
Fixes: c15471f ("ocfs2: fix sparse file & data ordering issue in direct io")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Portia Stephens <portia.stephens@canonical.com>
Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
tuxedo-bot pushed a commit that referenced this issue Sep 27, 2024
BugLink: https://bugs.launchpad.net/bugs/2077600

commit d99fbd9 upstream.

Bos can be put with multiple unrelated dma-resv locks held. But
imported bos attempt to grab the bo dma-resv during dma-buf detach
that typically happens during cleanup. That leads to lockde splats
similar to the below and a potential ABBA deadlock.

Fix this by always taking the delayed workqueue cleanup path for
imported bos.

Requesting stable fixes from when the Xe driver was introduced,
since its usage of drm_exec and wide vm dma_resvs appear to be
the first reliable trigger of this.

[22982.116427] ============================================
[22982.116428] WARNING: possible recursive locking detected
[22982.116429] 6.10.0-rc2+ #10 Tainted: G     U  W
[22982.116430] --------------------------------------------
[22982.116430] glxgears:sh0/5785 is trying to acquire lock:
[22982.116431] ffff8c2bafa539a8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: dma_buf_detach+0x3b/0xf0
[22982.116438]
               but task is already holding lock:
[22982.116438] ffff8c2d9aba6da8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_exec_lock_obj+0x49/0x2b0 [drm_exec]
[22982.116442]
               other info that might help us debug this:
[22982.116442]  Possible unsafe locking scenario:

[22982.116443]        CPU0
[22982.116444]        ----
[22982.116444]   lock(reservation_ww_class_mutex);
[22982.116445]   lock(reservation_ww_class_mutex);
[22982.116447]
                *** DEADLOCK ***

[22982.116447]  May be due to missing lock nesting notation

[22982.116448] 5 locks held by glxgears:sh0/5785:
[22982.116449]  #0: ffff8c2d9aba58c8 (&xef->vm.lock){+.+.}-{3:3}, at: xe_file_close+0xde/0x1c0 [xe]
[22982.116507]  #1: ffff8c2e28cc8480 (&vm->lock){++++}-{3:3}, at: xe_vm_close_and_put+0x161/0x9b0 [xe]
[22982.116578]  #2: ffff8c2e31982970 (&val->lock){.+.+}-{3:3}, at: xe_validation_ctx_init+0x6d/0x70 [xe]
[22982.116647]  #3: ffffacdc469478a8 (reservation_ww_class_acquire){+.+.}-{0:0}, at: xe_vma_destroy_unlocked+0x7f/0xe0 [xe]
[22982.116716]  #4: ffff8c2d9aba6da8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_exec_lock_obj+0x49/0x2b0 [drm_exec]
[22982.116719]
               stack backtrace:
[22982.116720] CPU: 8 PID: 5785 Comm: glxgears:sh0 Tainted: G     U  W          6.10.0-rc2+ #10
[22982.116721] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 2001 02/01/2023
[22982.116723] Call Trace:
[22982.116724]  <TASK>
[22982.116725]  dump_stack_lvl+0x77/0xb0
[22982.116727]  __lock_acquire+0x1232/0x2160
[22982.116730]  lock_acquire+0xcb/0x2d0
[22982.116732]  ? dma_buf_detach+0x3b/0xf0
[22982.116734]  ? __lock_acquire+0x417/0x2160
[22982.116736]  __ww_mutex_lock.constprop.0+0xd0/0x13b0
[22982.116738]  ? dma_buf_detach+0x3b/0xf0
[22982.116741]  ? dma_buf_detach+0x3b/0xf0
[22982.116743]  ? ww_mutex_lock+0x2b/0x90
[22982.116745]  ww_mutex_lock+0x2b/0x90
[22982.116747]  dma_buf_detach+0x3b/0xf0
[22982.116749]  drm_prime_gem_destroy+0x2f/0x40 [drm]
[22982.116775]  xe_ttm_bo_destroy+0x32/0x220 [xe]
[22982.116818]  ? __mutex_unlock_slowpath+0x3a/0x290
[22982.116821]  drm_exec_unlock_all+0xa1/0xd0 [drm_exec]
[22982.116823]  drm_exec_fini+0x12/0xb0 [drm_exec]
[22982.116824]  xe_validation_ctx_fini+0x15/0x40 [xe]
[22982.116892]  xe_vma_destroy_unlocked+0xb1/0xe0 [xe]
[22982.116959]  xe_vm_close_and_put+0x41a/0x9b0 [xe]
[22982.117025]  ? xa_find+0xe3/0x1e0
[22982.117028]  xe_file_close+0x10a/0x1c0 [xe]
[22982.117074]  drm_file_free+0x22a/0x280 [drm]
[22982.117099]  drm_release_noglobal+0x22/0x70 [drm]
[22982.117119]  __fput+0xf1/0x2d0
[22982.117122]  task_work_run+0x59/0x90
[22982.117125]  do_exit+0x330/0xb40
[22982.117127]  do_group_exit+0x36/0xa0
[22982.117129]  get_signal+0xbd2/0xbe0
[22982.117131]  arch_do_signal_or_restart+0x3e/0x240
[22982.117134]  syscall_exit_to_user_mode+0x1e7/0x290
[22982.117137]  do_syscall_64+0xa1/0x180
[22982.117139]  ? lock_acquire+0xcb/0x2d0
[22982.117140]  ? __set_task_comm+0x28/0x1e0
[22982.117141]  ? find_held_lock+0x2b/0x80
[22982.117144]  ? __set_task_comm+0xe1/0x1e0
[22982.117145]  ? lock_release+0xca/0x290
[22982.117147]  ? __do_sys_prctl+0x245/0xab0
[22982.117149]  ? lockdep_hardirqs_on_prepare+0xde/0x190
[22982.117150]  ? syscall_exit_to_user_mode+0xb0/0x290
[22982.117152]  ? do_syscall_64+0xa1/0x180
[22982.117154]  ? __lock_acquire+0x417/0x2160
[22982.117155]  ? reacquire_held_locks+0xd1/0x1f0
[22982.117156]  ? do_user_addr_fault+0x30c/0x790
[22982.117158]  ? lock_acquire+0xcb/0x2d0
[22982.117160]  ? find_held_lock+0x2b/0x80
[22982.117162]  ? do_user_addr_fault+0x357/0x790
[22982.117163]  ? lock_release+0xca/0x290
[22982.117164]  ? do_user_addr_fault+0x361/0x790
[22982.117166]  ? trace_hardirqs_off+0x4b/0xc0
[22982.117168]  ? clear_bhb_loop+0x45/0xa0
[22982.117170]  ? clear_bhb_loop+0x45/0xa0
[22982.117172]  ? clear_bhb_loop+0x45/0xa0
[22982.117174]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[22982.117176] RIP: 0033:0x7f943d267169
[22982.117192] Code: Unable to access opcode bytes at 0x7f943d26713f.
[22982.117193] RSP: 002b:00007f9430bffc80 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[22982.117195] RAX: fffffffffffffe00 RBX: 0000000000000000 RCX: 00007f943d267169
[22982.117196] RDX: 0000000000000000 RSI: 0000000000000189 RDI: 00005622f89579d0
[22982.117197] RBP: 00007f9430bffcb0 R08: 0000000000000000 R09: 00000000ffffffff
[22982.117198] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[22982.117199] R13: 0000000000000000 R14: 0000000000000000 R15: 00005622f89579d0
[22982.117202]  </TASK>

Fixes: dd08ebf ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Christian König <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Cc: intel-xe@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v6.8+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240628153848.4989-1-thomas.hellstrom@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Portia Stephens <portia.stephens@canonical.com>
Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
tuxedo-bot pushed a commit that referenced this issue Oct 8, 2024
BugLink: https://bugs.launchpad.net/bugs/2080594

[ Upstream commit a699781 ]

A sysfs reader can race with a device reset or removal, attempting to
read device state when the device is not actually present. eg:

     [exception RIP: qed_get_current_link+17]
  #8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede]
  #9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3
 #10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4
 #11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300
 #12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c
 #13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b
 #14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3
 #15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1
 #16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f
 #17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb

 crash> struct net_device.state ffff9a9d21336000
    state = 5,

state 5 is __LINK_STATE_START (0b1) and __LINK_STATE_NOCARRIER (0b100).
The device is not present, note lack of __LINK_STATE_PRESENT (0b10).

This is the same sort of panic as observed in commit 4224cfd
("net-sysfs: add check for netdevice being present to speed_show").

There are many other callers of __ethtool_get_link_ksettings() which
don't have a device presence check.

Move this check into ethtool to protect all callers.

Fixes: d519e17 ("net: export device speed and duplex via sysfs")
Fixes: 4224cfd ("net-sysfs: add check for netdevice being present to speed_show")
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Link: https://patch.msgid.link/8bae218864beaa44ed01628140475b9bf641c5b0.1724393671.git.jamie.bainbridge@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
tuxedo-bot pushed a commit that referenced this issue Nov 15, 2024
BugLink: https://bugs.launchpad.net/bugs/2086242

commit 7d59ac07ccb58f8f604f8057db63b8efcebeb3de upstream.

Attaching SST PCI device to VM causes "BUG: KASAN: slab-out-of-bounds".
kasan report:
[   19.411889] ==================================================================
[   19.413702] BUG: KASAN: slab-out-of-bounds in _isst_if_get_pci_dev+0x3d5/0x400 [isst_if_common]
[   19.415634] Read of size 8 at addr ffff888829e65200 by task cpuhp/16/113
[   19.417368]
[   19.418627] CPU: 16 PID: 113 Comm: cpuhp/16 Tainted: G            E      6.9.0 #10
[   19.420435] Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference Platform, BIOS VMW201.00V.20192059.B64.2207280713 07/28/2022
[   19.422687] Call Trace:
[   19.424091]  <TASK>
[   19.425448]  dump_stack_lvl+0x5d/0x80
[   19.426963]  ? _isst_if_get_pci_dev+0x3d5/0x400 [isst_if_common]
[   19.428694]  print_report+0x19d/0x52e
[   19.430206]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[   19.431837]  ? _isst_if_get_pci_dev+0x3d5/0x400 [isst_if_common]
[   19.433539]  kasan_report+0xf0/0x170
[   19.435019]  ? _isst_if_get_pci_dev+0x3d5/0x400 [isst_if_common]
[   19.436709]  _isst_if_get_pci_dev+0x3d5/0x400 [isst_if_common]
[   19.438379]  ? __pfx_sched_clock_cpu+0x10/0x10
[   19.439910]  isst_if_cpu_online+0x406/0x58f [isst_if_common]
[   19.441573]  ? __pfx_isst_if_cpu_online+0x10/0x10 [isst_if_common]
[   19.443263]  ? ttwu_queue_wakelist+0x2c1/0x360
[   19.444797]  cpuhp_invoke_callback+0x221/0xec0
[   19.446337]  cpuhp_thread_fun+0x21b/0x610
[   19.447814]  ? __pfx_cpuhp_thread_fun+0x10/0x10
[   19.449354]  smpboot_thread_fn+0x2e7/0x6e0
[   19.450859]  ? __pfx_smpboot_thread_fn+0x10/0x10
[   19.452405]  kthread+0x29c/0x350
[   19.453817]  ? __pfx_kthread+0x10/0x10
[   19.455253]  ret_from_fork+0x31/0x70
[   19.456685]  ? __pfx_kthread+0x10/0x10
[   19.458114]  ret_from_fork_asm+0x1a/0x30
[   19.459573]  </TASK>
[   19.460853]
[   19.462055] Allocated by task 1198:
[   19.463410]  kasan_save_stack+0x30/0x50
[   19.464788]  kasan_save_track+0x14/0x30
[   19.466139]  __kasan_kmalloc+0xaa/0xb0
[   19.467465]  __kmalloc+0x1cd/0x470
[   19.468748]  isst_if_cdev_register+0x1da/0x350 [isst_if_common]
[   19.470233]  isst_if_mbox_init+0x108/0xff0 [isst_if_mbox_msr]
[   19.471670]  do_one_initcall+0xa4/0x380
[   19.472903]  do_init_module+0x238/0x760
[   19.474105]  load_module+0x5239/0x6f00
[   19.475285]  init_module_from_file+0xd1/0x130
[   19.476506]  idempotent_init_module+0x23b/0x650
[   19.477725]  __x64_sys_finit_module+0xbe/0x130
[   19.476506]  idempotent_init_module+0x23b/0x650
[   19.477725]  __x64_sys_finit_module+0xbe/0x130
[   19.478920]  do_syscall_64+0x82/0x160
[   19.480036]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   19.481292]
[   19.482205] The buggy address belongs to the object at ffff888829e65000
 which belongs to the cache kmalloc-512 of size 512
[   19.484818] The buggy address is located 0 bytes to the right of
 allocated 512-byte region [ffff888829e65000, ffff888829e65200)
[   19.487447]
[   19.488328] The buggy address belongs to the physical page:
[   19.489569] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888829e60c00 pfn:0x829e60
[   19.491140] head: order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[   19.492466] anon flags: 0x57ffffc0000840(slab|head|node=1|zone=2|lastcpupid=0x1fffff)
[   19.493914] page_type: 0xffffffff()
[   19.494988] raw: 0057ffffc0000840 ffff88810004cc80 0000000000000000 0000000000000001
[   19.496451] raw: ffff888829e60c00 0000000080200018 00000001ffffffff 0000000000000000
[   19.497906] head: 0057ffffc0000840 ffff88810004cc80 0000000000000000 0000000000000001
[   19.499379] head: ffff888829e60c00 0000000080200018 00000001ffffffff 0000000000000000
[   19.500844] head: 0057ffffc0000003 ffffea0020a79801 ffffea0020a79848 00000000ffffffff
[   19.502316] head: 0000000800000000 0000000000000000 00000000ffffffff 0000000000000000
[   19.503784] page dumped because: kasan: bad access detected
[   19.505058]
[   19.505970] Memory state around the buggy address:
[   19.507172]  ffff888829e65100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   19.508599]  ffff888829e65180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   19.510013] >ffff888829e65200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   19.510014]                    ^
[   19.510016]  ffff888829e65280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   19.510018]  ffff888829e65300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   19.515367] ==================================================================

The reason for this error is physical_package_ids assigned by VMware VMM
are not continuous and have gaps. This will cause value returned by
topology_physical_package_id() to be more than topology_max_packages().

Here the allocation uses topology_max_packages(). The call to
topology_max_packages() returns maximum logical package ID not physical
ID. Hence use topology_logical_package_id() instead of
topology_physical_package_id().

Fixes: 9a1aac8 ("platform/x86: ISST: PUNIT device mapping with Sub-NUMA clustering")
Cc: stable@vger.kernel.org
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Zach Wade <zachwade.k@gmail.com>
Link: https://lore.kernel.org/r/20240923144508.1764-1-zachwade.k@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
tuxedo-bot pushed a commit that referenced this issue Nov 15, 2024
BugLink: https://bugs.launchpad.net/bugs/2086242

commit ac01c8c4246546fd8340a232f3ada1921dc0ee48 upstream.

AddressSanitizer found a use-after-free bug in the symbol code which
manifested as 'perf top' segfaulting.

  ==1238389==ERROR: AddressSanitizer: heap-use-after-free on address 0x60b00c48844b at pc 0x5650d8035961 bp 0x7f751aaecc90 sp 0x7f751aaecc80
  READ of size 1 at 0x60b00c48844b thread T193
      #0 0x5650d8035960 in _sort__sym_cmp util/sort.c:310
      #1 0x5650d8043744 in hist_entry__cmp util/hist.c:1286
      #2 0x5650d8043951 in hists__findnew_entry util/hist.c:614
      #3 0x5650d804568f in __hists__add_entry util/hist.c:754
      #4 0x5650d8045bf9 in hists__add_entry util/hist.c:772
      #5 0x5650d8045df1 in iter_add_single_normal_entry util/hist.c:997
      #6 0x5650d8043326 in hist_entry_iter__add util/hist.c:1242
      #7 0x5650d7ceeefe in perf_event__process_sample /home/matt/src/linux/tools/perf/builtin-top.c:845
      #8 0x5650d7ceeefe in deliver_event /home/matt/src/linux/tools/perf/builtin-top.c:1208
      #9 0x5650d7fdb51b in do_flush util/ordered-events.c:245
      #10 0x5650d7fdb51b in __ordered_events__flush util/ordered-events.c:324
      #11 0x5650d7ced743 in process_thread /home/matt/src/linux/tools/perf/builtin-top.c:1120
      #12 0x7f757ef1f133 in start_thread nptl/pthread_create.c:442
      #13 0x7f757ef9f7db in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

When updating hist maps it's also necessary to update the hist symbol
reference because the old one gets freed in map__put().

While this bug was probably introduced with 5c24b67 ("perf
tools: Replace map->referenced & maps->removed_maps with map->refcnt"),
the symbol objects were leaked until c087e94 ("perf machine:
Fix refcount usage when processing PERF_RECORD_KSYMBOL") was merged so
the bug was masked.

Fixes: c087e94 ("perf machine: Fix refcount usage when processing PERF_RECORD_KSYMBOL")
Reported-by: Yunzhao Li <yunzhao@cloudflare.com>
Signed-off-by: Matt Fleming (Cloudflare) <matt@readmodwrite.com>
Cc: Ian Rogers <irogers@google.com>
Cc: kernel-team@cloudflare.com
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: stable@vger.kernel.org # v5.13+
Link: https://lore.kernel.org/r/20240815142212.3834625-1-matt@readmodwrite.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
tuxedo-bot pushed a commit that referenced this issue Nov 15, 2024
BugLink: https://bugs.launchpad.net/bugs/2086242

commit 9af2efee41b27a0f386fb5aa95d8d0b4b5d9fede upstream.

The fields in the hist_entry are filled on-demand which means they only
have meaningful values when relevant sort keys are used.

So if neither of 'dso' nor 'sym' sort keys are used, the map/symbols in
the hist entry can be garbage.  So it shouldn't access it
unconditionally.

I got a segfault, when I wanted to see cgroup profiles.

  $ sudo perf record -a --all-cgroups --synth=cgroup true

  $ sudo perf report -s cgroup

  Program received signal SIGSEGV, Segmentation fault.
  0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48
  48		return RC_CHK_ACCESS(map)->dso;
  (gdb) bt
  #0  0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48
  #1  0x00005555557aa39b in map__load (map=0x0) at util/map.c:344
  #2  0x00005555557aa592 in map__find_symbol (map=0x0, addr=140736115941088) at util/map.c:385
  #3  0x00005555557ef000 in hists__findnew_entry (hists=0x555556039d60, entry=0x7fffffffa4c0, al=0x7fffffffa8c0, sample_self=true)
      at util/hist.c:644
  #4  0x00005555557ef61c in __hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0,
      block_info=0x0, sample=0x7fffffffaa90, sample_self=true, ops=0x0) at util/hist.c:761
  #5  0x00005555557ef71f in hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0,
      sample=0x7fffffffaa90, sample_self=true) at util/hist.c:779
  #6  0x00005555557f00fb in iter_add_single_normal_entry (iter=0x7fffffffa900, al=0x7fffffffa8c0) at util/hist.c:1015
  #7  0x00005555557f09a7 in hist_entry_iter__add (iter=0x7fffffffa900, al=0x7fffffffa8c0, max_stack_depth=127, arg=0x7fffffffbce0)
      at util/hist.c:1260
  #8  0x00005555555ba7ce in process_sample_event (tool=0x7fffffffbce0, event=0x7ffff7c14128, sample=0x7fffffffaa90, evsel=0x555556039ad0,
      machine=0x5555560388e8) at builtin-report.c:334
  #9  0x00005555557b30c8 in evlist__deliver_sample (evlist=0x555556039010, tool=0x7fffffffbce0, event=0x7ffff7c14128,
      sample=0x7fffffffaa90, evsel=0x555556039ad0, machine=0x5555560388e8) at util/session.c:1232
  #10 0x00005555557b32bc in machines__deliver_event (machines=0x5555560388e8, evlist=0x555556039010, event=0x7ffff7c14128,
      sample=0x7fffffffaa90, tool=0x7fffffffbce0, file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1271
  #11 0x00005555557b3848 in perf_session__deliver_event (session=0x5555560386d0, event=0x7ffff7c14128, tool=0x7fffffffbce0,
      file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1354
  #12 0x00005555557affaf in ordered_events__deliver_event (oe=0x555556038e60, event=0x555556135aa0) at util/session.c:132
  #13 0x00005555557bb605 in do_flush (oe=0x555556038e60, show_progress=false) at util/ordered-events.c:245
  #14 0x00005555557bb95c in __ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND, timestamp=0) at util/ordered-events.c:324
  #15 0x00005555557bba46 in ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND) at util/ordered-events.c:342
  #16 0x00005555557b1b3b in perf_event__process_finished_round (tool=0x7fffffffbce0, event=0x7ffff7c15bb8, oe=0x555556038e60)
      at util/session.c:780
  #17 0x00005555557b3b27 in perf_session__process_user_event (session=0x5555560386d0, event=0x7ffff7c15bb8, file_offset=117688,
      file_path=0x555556038ff0 "perf.data") at util/session.c:1406

As you can see the entry->ms.map was NULL even if he->ms.map has a
value.  This is because 'sym' sort key is not given, so it cannot assume
whether he->ms.sym and entry->ms.sym is the same.  I only checked the
'sym' sort key here as it implies 'dso' behavior (so maps are the same).

Fixes: ac01c8c4246546fd ("perf hist: Update hist symbol when updating maps")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Matt Fleming <matt@readmodwrite.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240826221045.1202305-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants