Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arch: loongarch: replace BPI implementation with a minimal patchset #356

Merged
merged 14 commits into from
Aug 14, 2024

Conversation

MingcongBai
Copy link
Contributor

Introduce old-world firmware boot support with a more minimal patchset. Also fixes shutdown on systems that do not declare ACPI S5 support:

  • loongarch/reset: use general efi poweroff
  • loongarch: fix missing dependency info in DSDT
  • loongarch: correct missing offset of PCI root controller in DSDT table
  • loongarch: add MADT ACPI table conversion
  • loongarch: parse BPI data and add memory mapping
  • loongarch: basic boot support for legacy firmware
  • Revert "LoongArch: add kernel setvirtmap for runtime"
  • Revert "LoongArch: Old BPI compatibility"
  • Revert "LoongArch: Fix virtual machine startup error"
  • Revert "LoongArch: Fixed EIOINTC structure members"
  • Revert "LoongArch: use arch specific phys_to_dma"
  • Revert "LoongArch: fix KASLR can not be disabled by nokaslr when boot from old BPI"

Tested on ML5A, Lenovo Kaitian M540z, and Excelsior L71 (laptop).

@@ -44,6 +44,15 @@ static inline u32 get_acpi_id_for_cpu(unsigned int cpu)
return acpi_core_pic[cpu_logical_map(cpu)].processor_id;
}

#define ACPI_HAVE_ARCH_TABLE_OVERRIDE
extern void acpi_arch_os_table_override (struct acpi_table_header *existing_table, struct acpi_table_header **new_table);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Prefer a maximum 75 chars per line (possible unwrapped commit description?)

@@ -11,7 +11,6 @@ obj-y += head.o cpu-probe.o cacheinfo.o env.o setup.o entry.o genex.o \
traps.o irq.o idle.o process.o dma.o mem.o io.o reset.o switch.o \
elf.o syscall.o signal.o time.o topology.o inst.o ptrace.o vdso.o \
alternative.o unwind.o
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ERROR: patch seems to be corrupt (line wrapped?)

long nid = (daddr >> node_id_offset) & 0xf;

return ((nid << node_id_offset) ^ daddr) | (nid << 44);
}

void acpi_arch_dma_setup(struct device *dev)
{
int ret;
u64 mask, end = 0;
const struct bus_dma_region *map = NULL;

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Prefer a maximum 75 chars per line (possible unwrapped commit description?)

#define LOONGSON3_BOOT_MEM_MAP_MAX 128
#define RT_MAP_START 100
#define FIX_MAP_ENTRY 32
#define LOONGARCH_BPI_GUID EFI_GUID(0x4660f721, 0x2ec5, 0x416a, 0x89, 0x9a, 0x43, 0x18, 0x02, 0x50, 0xa0, 0xc9)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Prefer a maximum 75 chars per line (possible unwrapped commit description?)

void __init memblock_init(void)
{
u32 i, mem_type;
u32 mem_type;
u64 mem_start, mem_end, mem_size;
efi_memory_desc_t *md;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Prefer a maximum 75 chars per line (possible unwrapped commit description?)

@@ -23,6 +23,8 @@ struct exit_boot_struct {
int runtime_entry_count;
};

static int is_oldworld = 0;

static efi_status_t exit_boot_func(struct efi_boot_memmap *map, void *priv)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Prefer a maximum 75 chars per line (possible unwrapped commit description?)

int cores = (cpu_has_hypervisor ? MAX_CORES_PER_EIO_NODE : CORES_PER_EIO_NODE);

return cpu_logical_map(cpu) / cores;
return cpu_logical_map(cpu) / CORES_PER_EIO_NODE;
}

static void eiointc_set_irq_route(int pos, unsigned int cpu, unsigned int mnode, nodemask_t *node_map)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Prefer a maximum 75 chars per line (possible unwrapped commit description?)

@@ -762,6 +762,16 @@ static inline u64 acpi_arch_get_root_pointer(void)
int acpi_get_local_address(acpi_handle handle, u32 *addr);
const char *acpi_get_subsystem_id(acpi_handle handle);

#ifndef ACPI_HAVE_ARCH_TABLE_OVERRIDE
static inline void acpi_arch_os_table_override (struct acpi_table_header *existing_table, struct acpi_table_header **new_table){
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Prefer a maximum 75 chars per line (possible unwrapped commit description?)

@@ -133,6 +133,10 @@ static inline void pci_acpi_remove_edr_notifier(struct pci_dev *pdev) { }
int pci_acpi_set_companion_lookup_hook(struct acpi_device *(*func)(struct pci_dev *));
void pci_acpi_clear_companion_lookup_hook(void);

#ifndef ACPI_HAVE_ARCH_PCI_ROOT_RES_FILTER
static inline void acpi_arch_pci_probe_root_dev_filter(struct resource_entry *entry) { }
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Prefer a maximum 75 chars per line (possible unwrapped commit description?)

@Avenger-285714 Avenger-285714 changed the title arch: loongarch: replace BPI implementation with a minimal patchset [WIP] arch: loongarch: replace BPI implementation with a minimal patchset Aug 9, 2024
Currently efi_shutdown_init can register a general sys_off handler,
efi_power_off, which will be called during do_kernel_power_off and shut
the machine off via efi runtime services. So enable this by providing
efi_poweroff_required, like arm and x86, and prevent directly calling
efi.reset_system in machine_power_off.

Signed-off-by: Miao Wang <shankerwangmiao@gmail.com>
The addresses passed from legacy firmware in efi system tables, the
initrd table, the efi system memory mapping table are virtual addresses.
This patch converts all that addresses back to physical addresses, to
make sure all the following booting processes can run smoothly like on
modern firmwares. The conversion happens unconditionally, since physical
addresses passed in by modern firmwares remain unchanged after the
conversion.

In the EFI Stub, the mapping entries given by GetMemoryMap() on legacy
firmwares contain virtual addresses in the phys_addr fields and 0x1xxxx
addresses in the virt_addr fields, causing the later call to
SetVirtualAddressMap() fails. This patch fixes this by correcting the
addresses in the virt_addr fields. This patch detects the existence of
the legacy firmware by reading DMW1 CSR, as done in the legacy loongarch
GRUB port. Only if legacy firmwares detected, this correction happens.

With this patch, the linux kernel is basically able to boot.
@MingcongBai MingcongBai force-pushed the bai/loong64-clean-bpi-support branch from 860c024 to 3fb1481 Compare August 14, 2024 09:09
@@ -44,6 +44,15 @@ static inline u32 get_acpi_id_for_cpu(unsigned int cpu)
return acpi_core_pic[cpu_logical_map(cpu)].processor_id;
}

#define ACPI_HAVE_ARCH_TABLE_OVERRIDE
extern void acpi_arch_os_table_override (struct acpi_table_header *existing_table, struct acpi_table_header **new_table);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Prefer a maximum 75 chars per line (possible unwrapped commit description?)

@@ -11,7 +11,6 @@ obj-y += head.o cpu-probe.o cacheinfo.o env.o setup.o entry.o genex.o \
traps.o irq.o idle.o process.o dma.o mem.o io.o reset.o switch.o \
elf.o syscall.o signal.o time.o topology.o inst.o ptrace.o vdso.o \
alternative.o unwind.o

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ERROR: Avoid using diff content in the commit message - patch(1) might not work

@@ -11,7 +11,6 @@ obj-y += head.o cpu-probe.o cacheinfo.o env.o setup.o entry.o genex.o \
traps.o irq.o idle.o process.o dma.o mem.o io.o reset.o switch.o \
elf.o syscall.o signal.o time.o topology.o inst.o ptrace.o vdso.o \
alternative.o unwind.o

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ERROR: patch seems to be corrupt (line wrapped?)

long nid = (daddr >> node_id_offset) & 0xf;

return ((nid << node_id_offset) ^ daddr) | (nid << 44);
}

void acpi_arch_dma_setup(struct device *dev)
{
int ret;
u64 mask, end = 0;
const struct bus_dma_region *map = NULL;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Prefer a maximum 75 chars per line (possible unwrapped commit description?)

#define LOONGSON3_BOOT_MEM_MAP_MAX 128
#define RT_MAP_START 100
#define FIX_MAP_ENTRY 32
#define LOONGARCH_BPI_GUID EFI_GUID(0x4660f721, 0x2ec5, 0x416a, 0x89, 0x9a, 0x43, 0x18, 0x02, 0x50, 0xa0, 0xc9)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Prefer a maximum 75 chars per line (possible unwrapped commit description?)

void __init memblock_init(void)
{
u32 i, mem_type;
u32 mem_type;
u64 mem_start, mem_end, mem_size;
efi_memory_desc_t *md;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Prefer a maximum 75 chars per line (possible unwrapped commit description?)

@@ -23,6 +23,8 @@ struct exit_boot_struct {
int runtime_entry_count;
};

static int is_oldworld = 0;

static efi_status_t exit_boot_func(struct efi_boot_memmap *map, void *priv)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Prefer a maximum 75 chars per line (possible unwrapped commit description?)

On machines with legacy firmware with BPI version BPI01000, i.e. the
older version, the content of the MADT ACPI table is not following the
later finalized ACPI standard. A conversion of the content is added by
this patch. Also the patch generates and adds a MCFG table accordingly.

The above behavior is only enabled when BPI data and the BPI01000 version
is detected.
On machines with legacy firmware with BPI version BPI01000, the PCI root
controller information in the DSDT table lacks AddressTranslation in its
WordIO resource defination, causing the failure of the registration for the
LIO address space. This patch corrects this issue when the given offset
is zero for the PCI root controller.

This patch also fixes the lack of the leading 16K, i.e. ISA_IOSIZE, in
the defination of WordIO resource. This is because that address range is
registered unconditionally on the legacy loongarch linux port.

This patch also fixes the start addresses or end addresses of WordIO
resource not aligned to the page size by rouding the addresses up to the
nearest page starting.

The above behavior is only enabled when BPI data and the BPI01000 version
is detected.
On machines with legacy firmware with BPI01000 version, the devices on
the LIO bus lackes the dependency on the PCI root controller, causing
the memory-mapped address of the legacy IO ports read before the setup
of the mapping, resulting in kernel panic. Such DSDT can work on the
legacy loongarch linux port because the leading 16K is unconditionally
registered, before the enumeration of the devices in the DSDT table.

This patch addes such dependency info, to order the initialization of
the devices on the LIO bus after the initialization of the PCI root
controller, fixing this problem. However, the addition should be done on
each possible LIO device, and currently the patch only includes the
legacy EC device on some laptops located at the path \_SB.PCI0.LPC.EC.
Thus, this patch will be improved to include more devices.

The above behavior is only enabled when BPI data and the BPI01000 version
is detected.
Loongarch CPUs are using HT bus interface, which is only 40-bit wide,
to communicate with the bridge chipset. However, the actual memory
bus of the CPUs is 48-bit wide, and the available address space of DMA
requests of PCI devices is also larger than the 40-bit address space.
The address space of loongarch systems is not continuous, the bits
[47:44] of which are used to denote the belonging node id, which is
far beyond the space provided by the 40-bit wide HT bus. As a result,
a translation on the both LS7A and CPU sides is needed.

The translation happens on the LS7A side is controlled by the higher
half of the register, HT_ROUTE. Bits [12:8] denotes dma_node_id_offset
and bits[15:13] denotes dma_node_id_offset_mapped. The behavior of the
translation is that the chip extracts the node id from bit
dma_node_id_offset + 36 of a DMA address, places it at bit
dma_node_id_offset_mapped + 32, and generates the address on the HT bus.
On the CPU side, an alike translation happens, to convert the address on
the HT bus back to a proper memory address.

On machines with legacy firmware with BPI01000 version,
dma_node_id_offset is configured with 0, resulting the address which
should be used by the DMA engine of a PCI device differs from the
actural physical memeory address, which requires a pair of arch-specific
phys_to_dma and dma_to_phys functions or setting up the whole mapping
in the _DMA method of the PCI root device in the DSDT table.

The former method requires we add back the prevoiusly removed functions,
and the latter method degrades the performance since when the
translation happens, the mapping table is scanned linearly.

This patch addresses this issue by directly setting dma_node_id_offset
to 8, like what is done by modern firmwares, making the address used by
the DMA engine just the same as the actual physical memory address, and
eliminating the need of DMA address translation on the kernel side.
On machines with legacy firmware with BPI01000 version, the
HT_RX_INT_TRANS register on the node 5, i.e. the node connected with the
second LS7A bridge chip, is not initialized correctly, causing the
failure of the delivery of interrupts from PCIe devices connected to the
second LS7A bridge chip.

This patch fixes this by correctly pointing the HT_RX_INT_TRANS register
to the EXT_IOI_SEND_OFF register on the corresponding node.
@MingcongBai MingcongBai force-pushed the bai/loong64-clean-bpi-support branch from 3fb1481 to df06a5a Compare August 14, 2024 11:13
@@ -44,6 +44,15 @@ static inline u32 get_acpi_id_for_cpu(unsigned int cpu)
return acpi_core_pic[cpu_logical_map(cpu)].processor_id;
}

#define ACPI_HAVE_ARCH_TABLE_OVERRIDE
extern void acpi_arch_os_table_override (struct acpi_table_header *existing_table, struct acpi_table_header **new_table);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Prefer a maximum 75 chars per line (possible unwrapped commit description?)

@@ -11,7 +11,6 @@ obj-y += head.o cpu-probe.o cacheinfo.o env.o setup.o entry.o genex.o \
traps.o irq.o idle.o process.o dma.o mem.o io.o reset.o switch.o \
elf.o syscall.o signal.o time.o topology.o inst.o ptrace.o vdso.o \
alternative.o unwind.o

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ERROR: Avoid using diff content in the commit message - patch(1) might not work

@@ -11,7 +11,6 @@ obj-y += head.o cpu-probe.o cacheinfo.o env.o setup.o entry.o genex.o \
traps.o irq.o idle.o process.o dma.o mem.o io.o reset.o switch.o \
elf.o syscall.o signal.o time.o topology.o inst.o ptrace.o vdso.o \
alternative.o unwind.o

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ERROR: patch seems to be corrupt (line wrapped?)

long nid = (daddr >> node_id_offset) & 0xf;

return ((nid << node_id_offset) ^ daddr) | (nid << 44);
}

void acpi_arch_dma_setup(struct device *dev)
{
int ret;
u64 mask, end = 0;
const struct bus_dma_region *map = NULL;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Prefer a maximum 75 chars per line (possible unwrapped commit description?)

#define LOONGSON3_BOOT_MEM_MAP_MAX 128
#define RT_MAP_START 100
#define FIX_MAP_ENTRY 32
#define LOONGARCH_BPI_GUID EFI_GUID(0x4660f721, 0x2ec5, 0x416a, 0x89, 0x9a, 0x43, 0x18, 0x02, 0x50, 0xa0, 0xc9)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Prefer a maximum 75 chars per line (possible unwrapped commit description?)

void __init memblock_init(void)
{
u32 i, mem_type;
u32 mem_type;
u64 mem_start, mem_end, mem_size;
efi_memory_desc_t *md;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Prefer a maximum 75 chars per line (possible unwrapped commit description?)

@deepin-ci-robot
Copy link

deepin pr auto review

ACPI: Fix the detection of old world firmware

This patch fixes the detection of old world firmware.

Signed-off-by: Alvaro Saurin ossa-git@ossa.org

@@ -23,6 +23,8 @@ struct exit_boot_struct {
int runtime_entry_count;
};

static int is_oldworld = 0;

static efi_status_t exit_boot_func(struct efi_boot_memmap *map, void *priv)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Prefer a maximum 75 chars per line (possible unwrapped commit description?)

int cores = (cpu_has_hypervisor ? MAX_CORES_PER_EIO_NODE : CORES_PER_EIO_NODE);

return cpu_logical_map(cpu) / cores;
return cpu_logical_map(cpu) / CORES_PER_EIO_NODE;
}

static void eiointc_set_irq_route(int pos, unsigned int cpu, unsigned int mnode, nodemask_t *node_map)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Prefer a maximum 75 chars per line (possible unwrapped commit description?)

@@ -762,6 +762,16 @@ static inline u64 acpi_arch_get_root_pointer(void)
int acpi_get_local_address(acpi_handle handle, u32 *addr);
const char *acpi_get_subsystem_id(acpi_handle handle);

#ifndef ACPI_HAVE_ARCH_TABLE_OVERRIDE
static inline void acpi_arch_os_table_override (struct acpi_table_header *existing_table, struct acpi_table_header **new_table){

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Prefer a maximum 75 chars per line (possible unwrapped commit description?)

@@ -133,6 +133,10 @@ static inline void pci_acpi_remove_edr_notifier(struct pci_dev *pdev) { }
int pci_acpi_set_companion_lookup_hook(struct acpi_device *(*func)(struct pci_dev *));
void pci_acpi_clear_companion_lookup_hook(void);

#ifndef ACPI_HAVE_ARCH_PCI_ROOT_RES_FILTER
static inline void acpi_arch_pci_probe_root_dev_filter(struct resource_entry *entry) { }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Prefer a maximum 75 chars per line (possible unwrapped commit description?)

@deepin-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from mingcongbai. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@MingcongBai MingcongBai changed the title [WIP] arch: loongarch: replace BPI implementation with a minimal patchset arch: loongarch: replace BPI implementation with a minimal patchset Aug 14, 2024
@MingcongBai
Copy link
Contributor Author

Tested good on the following:

  • Lenovo Kaitian M540z
  • Excelsior L71
  • 706 ML5A
  • 4C5LG (4-way 3C5000L/7A1000)

Also backported to rolling-stable (6.9.6).

@Avenger-285714 Avenger-285714 merged commit 409da4e into linux-6.6.y Aug 14, 2024
5 of 11 checks passed
@Avenger-285714
Copy link
Collaborator

Great!

@opsiff opsiff deleted the bai/loong64-clean-bpi-support branch September 21, 2024 09:15
@opsiff opsiff restored the bai/loong64-clean-bpi-support branch September 21, 2024 09:15
Avenger-285714 pushed a commit to Avenger-285714/DeepinKernel that referenced this pull request Nov 23, 2024
commit 4aa923a6e6406b43566ef6ac35a3d9a3197fa3e8 upstream.

KASAN reports that the GPU metrics table allocated in
vangogh_tables_init() is not large enough for the memset done in
smu_cmn_init_soft_gpu_metrics(). Condensed report follows:

[   33.861314] BUG: KASAN: slab-out-of-bounds in smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu]
[   33.861799] Write of size 168 at addr ffff888129f59500 by task mangoapp/1067
...
[   33.861808] CPU: 6 UID: 1000 PID: 1067 Comm: mangoapp Tainted: G        W          6.12.0-rc4 deepin-community#356 1a56f59a8b5182eeaf67eb7cb8b13594dd23b544
[   33.861816] Tainted: [W]=WARN
[   33.861818] Hardware name: Valve Galileo/Galileo, BIOS F7G0107 12/01/2023
[   33.861822] Call Trace:
[   33.861826]  <TASK>
[   33.861829]  dump_stack_lvl+0x66/0x90
[   33.861838]  print_report+0xce/0x620
[   33.861853]  kasan_report+0xda/0x110
[   33.862794]  kasan_check_range+0xfd/0x1a0
[   33.862799]  __asan_memset+0x23/0x40
[   33.862803]  smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.863306]  vangogh_get_gpu_metrics_v2_4+0x123/0xad0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.864257]  vangogh_common_get_gpu_metrics+0xb0c/0xbc0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.865682]  amdgpu_dpm_get_gpu_metrics+0xcc/0x110 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.866160]  amdgpu_get_gpu_metrics+0x154/0x2d0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.867135]  dev_attr_show+0x43/0xc0
[   33.867147]  sysfs_kf_seq_show+0x1f1/0x3b0
[   33.867155]  seq_read_iter+0x3f8/0x1140
[   33.867173]  vfs_read+0x76c/0xc50
[   33.867198]  ksys_read+0xfb/0x1d0
[   33.867214]  do_syscall_64+0x90/0x160
...
[   33.867353] Allocated by task 378 on cpu 7 at 22.794876s:
[   33.867358]  kasan_save_stack+0x33/0x50
[   33.867364]  kasan_save_track+0x17/0x60
[   33.867367]  __kasan_kmalloc+0x87/0x90
[   33.867371]  vangogh_init_smc_tables+0x3f9/0x840 [amdgpu]
[   33.867835]  smu_sw_init+0xa32/0x1850 [amdgpu]
[   33.868299]  amdgpu_device_init+0x467b/0x8d90 [amdgpu]
[   33.868733]  amdgpu_driver_load_kms+0x19/0xf0 [amdgpu]
[   33.869167]  amdgpu_pci_probe+0x2d6/0xcd0 [amdgpu]
[   33.869608]  local_pci_probe+0xda/0x180
[   33.869614]  pci_device_probe+0x43f/0x6b0

Empirically we can confirm that the former allocates 152 bytes for the
table, while the latter memsets the 168 large block.

Root cause appears that when GPU metrics tables for v2_4 parts were added
it was not considered to enlarge the table to fit.

The fix in this patch is rather "brute force" and perhaps later should be
done in a smarter way, by extracting and consolidating the part version to
size logic to a common helper, instead of brute forcing the largest
possible allocation. Nevertheless, for now this works and fixes the out of
bounds write.

v2:
 * Drop impossible v3_0 case. (Mario)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: 41cec40 ("drm/amd/pm: Vangogh: Add new gpu_metrics_v2_4 to acquire gpu_metrics")
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Evan Quan <evan.quan@amd.com>
Cc: Wenyou Yang <WenYou.Yang@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20241025145639.19124-1-tursulin@igalia.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0880f58f9609f0200483a49429af0f050d281703)
Cc: stable@vger.kernel.org # v6.6+
Signed-off-by: Bin Lan <bin.lan.cn@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
opsiff pushed a commit that referenced this pull request Nov 23, 2024
commit 4aa923a6e6406b43566ef6ac35a3d9a3197fa3e8 upstream.

KASAN reports that the GPU metrics table allocated in
vangogh_tables_init() is not large enough for the memset done in
smu_cmn_init_soft_gpu_metrics(). Condensed report follows:

[   33.861314] BUG: KASAN: slab-out-of-bounds in smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu]
[   33.861799] Write of size 168 at addr ffff888129f59500 by task mangoapp/1067
...
[   33.861808] CPU: 6 UID: 1000 PID: 1067 Comm: mangoapp Tainted: G        W          6.12.0-rc4 #356 1a56f59a8b5182eeaf67eb7cb8b13594dd23b544
[   33.861816] Tainted: [W]=WARN
[   33.861818] Hardware name: Valve Galileo/Galileo, BIOS F7G0107 12/01/2023
[   33.861822] Call Trace:
[   33.861826]  <TASK>
[   33.861829]  dump_stack_lvl+0x66/0x90
[   33.861838]  print_report+0xce/0x620
[   33.861853]  kasan_report+0xda/0x110
[   33.862794]  kasan_check_range+0xfd/0x1a0
[   33.862799]  __asan_memset+0x23/0x40
[   33.862803]  smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.863306]  vangogh_get_gpu_metrics_v2_4+0x123/0xad0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.864257]  vangogh_common_get_gpu_metrics+0xb0c/0xbc0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.865682]  amdgpu_dpm_get_gpu_metrics+0xcc/0x110 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.866160]  amdgpu_get_gpu_metrics+0x154/0x2d0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.867135]  dev_attr_show+0x43/0xc0
[   33.867147]  sysfs_kf_seq_show+0x1f1/0x3b0
[   33.867155]  seq_read_iter+0x3f8/0x1140
[   33.867173]  vfs_read+0x76c/0xc50
[   33.867198]  ksys_read+0xfb/0x1d0
[   33.867214]  do_syscall_64+0x90/0x160
...
[   33.867353] Allocated by task 378 on cpu 7 at 22.794876s:
[   33.867358]  kasan_save_stack+0x33/0x50
[   33.867364]  kasan_save_track+0x17/0x60
[   33.867367]  __kasan_kmalloc+0x87/0x90
[   33.867371]  vangogh_init_smc_tables+0x3f9/0x840 [amdgpu]
[   33.867835]  smu_sw_init+0xa32/0x1850 [amdgpu]
[   33.868299]  amdgpu_device_init+0x467b/0x8d90 [amdgpu]
[   33.868733]  amdgpu_driver_load_kms+0x19/0xf0 [amdgpu]
[   33.869167]  amdgpu_pci_probe+0x2d6/0xcd0 [amdgpu]
[   33.869608]  local_pci_probe+0xda/0x180
[   33.869614]  pci_device_probe+0x43f/0x6b0

Empirically we can confirm that the former allocates 152 bytes for the
table, while the latter memsets the 168 large block.

Root cause appears that when GPU metrics tables for v2_4 parts were added
it was not considered to enlarge the table to fit.

The fix in this patch is rather "brute force" and perhaps later should be
done in a smarter way, by extracting and consolidating the part version to
size logic to a common helper, instead of brute forcing the largest
possible allocation. Nevertheless, for now this works and fixes the out of
bounds write.

v2:
 * Drop impossible v3_0 case. (Mario)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: 41cec40 ("drm/amd/pm: Vangogh: Add new gpu_metrics_v2_4 to acquire gpu_metrics")
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Evan Quan <evan.quan@amd.com>
Cc: Wenyou Yang <WenYou.Yang@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20241025145639.19124-1-tursulin@igalia.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0880f58f9609f0200483a49429af0f050d281703)
Cc: stable@vger.kernel.org # v6.6+
Signed-off-by: Bin Lan <bin.lan.cn@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit f111de0f010308949254ee1cc45df8e6b8e1d7d4)
Avenger-285714 pushed a commit to Avenger-285714/DeepinKernel that referenced this pull request Nov 23, 2024
commit 4aa923a6e6406b43566ef6ac35a3d9a3197fa3e8 upstream.

KASAN reports that the GPU metrics table allocated in
vangogh_tables_init() is not large enough for the memset done in
smu_cmn_init_soft_gpu_metrics(). Condensed report follows:

[   33.861314] BUG: KASAN: slab-out-of-bounds in smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu]
[   33.861799] Write of size 168 at addr ffff888129f59500 by task mangoapp/1067
...
[   33.861808] CPU: 6 UID: 1000 PID: 1067 Comm: mangoapp Tainted: G        W          6.12.0-rc4 deepin-community#356 1a56f59a8b5182eeaf67eb7cb8b13594dd23b544
[   33.861816] Tainted: [W]=WARN
[   33.861818] Hardware name: Valve Galileo/Galileo, BIOS F7G0107 12/01/2023
[   33.861822] Call Trace:
[   33.861826]  <TASK>
[   33.861829]  dump_stack_lvl+0x66/0x90
[   33.861838]  print_report+0xce/0x620
[   33.861853]  kasan_report+0xda/0x110
[   33.862794]  kasan_check_range+0xfd/0x1a0
[   33.862799]  __asan_memset+0x23/0x40
[   33.862803]  smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.863306]  vangogh_get_gpu_metrics_v2_4+0x123/0xad0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.864257]  vangogh_common_get_gpu_metrics+0xb0c/0xbc0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.865682]  amdgpu_dpm_get_gpu_metrics+0xcc/0x110 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.866160]  amdgpu_get_gpu_metrics+0x154/0x2d0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.867135]  dev_attr_show+0x43/0xc0
[   33.867147]  sysfs_kf_seq_show+0x1f1/0x3b0
[   33.867155]  seq_read_iter+0x3f8/0x1140
[   33.867173]  vfs_read+0x76c/0xc50
[   33.867198]  ksys_read+0xfb/0x1d0
[   33.867214]  do_syscall_64+0x90/0x160
...
[   33.867353] Allocated by task 378 on cpu 7 at 22.794876s:
[   33.867358]  kasan_save_stack+0x33/0x50
[   33.867364]  kasan_save_track+0x17/0x60
[   33.867367]  __kasan_kmalloc+0x87/0x90
[   33.867371]  vangogh_init_smc_tables+0x3f9/0x840 [amdgpu]
[   33.867835]  smu_sw_init+0xa32/0x1850 [amdgpu]
[   33.868299]  amdgpu_device_init+0x467b/0x8d90 [amdgpu]
[   33.868733]  amdgpu_driver_load_kms+0x19/0xf0 [amdgpu]
[   33.869167]  amdgpu_pci_probe+0x2d6/0xcd0 [amdgpu]
[   33.869608]  local_pci_probe+0xda/0x180
[   33.869614]  pci_device_probe+0x43f/0x6b0

Empirically we can confirm that the former allocates 152 bytes for the
table, while the latter memsets the 168 large block.

Root cause appears that when GPU metrics tables for v2_4 parts were added
it was not considered to enlarge the table to fit.

The fix in this patch is rather "brute force" and perhaps later should be
done in a smarter way, by extracting and consolidating the part version to
size logic to a common helper, instead of brute forcing the largest
possible allocation. Nevertheless, for now this works and fixes the out of
bounds write.

v2:
 * Drop impossible v3_0 case. (Mario)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: 41cec40 ("drm/amd/pm: Vangogh: Add new gpu_metrics_v2_4 to acquire gpu_metrics")
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Evan Quan <evan.quan@amd.com>
Cc: Wenyou Yang <WenYou.Yang@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20241025145639.19124-1-tursulin@igalia.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0880f58f9609f0200483a49429af0f050d281703)
Cc: stable@vger.kernel.org # v6.6+
Signed-off-by: Bin Lan <bin.lan.cn@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Avenger-285714 pushed a commit that referenced this pull request Nov 24, 2024
commit 4aa923a6e6406b43566ef6ac35a3d9a3197fa3e8 upstream.

KASAN reports that the GPU metrics table allocated in
vangogh_tables_init() is not large enough for the memset done in
smu_cmn_init_soft_gpu_metrics(). Condensed report follows:

[   33.861314] BUG: KASAN: slab-out-of-bounds in smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu]
[   33.861799] Write of size 168 at addr ffff888129f59500 by task mangoapp/1067
...
[   33.861808] CPU: 6 UID: 1000 PID: 1067 Comm: mangoapp Tainted: G        W          6.12.0-rc4 #356 1a56f59a8b5182eeaf67eb7cb8b13594dd23b544
[   33.861816] Tainted: [W]=WARN
[   33.861818] Hardware name: Valve Galileo/Galileo, BIOS F7G0107 12/01/2023
[   33.861822] Call Trace:
[   33.861826]  <TASK>
[   33.861829]  dump_stack_lvl+0x66/0x90
[   33.861838]  print_report+0xce/0x620
[   33.861853]  kasan_report+0xda/0x110
[   33.862794]  kasan_check_range+0xfd/0x1a0
[   33.862799]  __asan_memset+0x23/0x40
[   33.862803]  smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.863306]  vangogh_get_gpu_metrics_v2_4+0x123/0xad0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.864257]  vangogh_common_get_gpu_metrics+0xb0c/0xbc0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.865682]  amdgpu_dpm_get_gpu_metrics+0xcc/0x110 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.866160]  amdgpu_get_gpu_metrics+0x154/0x2d0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.867135]  dev_attr_show+0x43/0xc0
[   33.867147]  sysfs_kf_seq_show+0x1f1/0x3b0
[   33.867155]  seq_read_iter+0x3f8/0x1140
[   33.867173]  vfs_read+0x76c/0xc50
[   33.867198]  ksys_read+0xfb/0x1d0
[   33.867214]  do_syscall_64+0x90/0x160
...
[   33.867353] Allocated by task 378 on cpu 7 at 22.794876s:
[   33.867358]  kasan_save_stack+0x33/0x50
[   33.867364]  kasan_save_track+0x17/0x60
[   33.867367]  __kasan_kmalloc+0x87/0x90
[   33.867371]  vangogh_init_smc_tables+0x3f9/0x840 [amdgpu]
[   33.867835]  smu_sw_init+0xa32/0x1850 [amdgpu]
[   33.868299]  amdgpu_device_init+0x467b/0x8d90 [amdgpu]
[   33.868733]  amdgpu_driver_load_kms+0x19/0xf0 [amdgpu]
[   33.869167]  amdgpu_pci_probe+0x2d6/0xcd0 [amdgpu]
[   33.869608]  local_pci_probe+0xda/0x180
[   33.869614]  pci_device_probe+0x43f/0x6b0

Empirically we can confirm that the former allocates 152 bytes for the
table, while the latter memsets the 168 large block.

Root cause appears that when GPU metrics tables for v2_4 parts were added
it was not considered to enlarge the table to fit.

The fix in this patch is rather "brute force" and perhaps later should be
done in a smarter way, by extracting and consolidating the part version to
size logic to a common helper, instead of brute forcing the largest
possible allocation. Nevertheless, for now this works and fixes the out of
bounds write.

v2:
 * Drop impossible v3_0 case. (Mario)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: 41cec40 ("drm/amd/pm: Vangogh: Add new gpu_metrics_v2_4 to acquire gpu_metrics")
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Evan Quan <evan.quan@amd.com>
Cc: Wenyou Yang <WenYou.Yang@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20241025145639.19124-1-tursulin@igalia.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0880f58f9609f0200483a49429af0f050d281703)
Cc: stable@vger.kernel.org # v6.6+
Signed-off-by: Bin Lan <bin.lan.cn@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Avenger-285714 pushed a commit to Avenger-285714/DeepinKernel that referenced this pull request Nov 25, 2024
commit 4aa923a6e6406b43566ef6ac35a3d9a3197fa3e8 upstream.

KASAN reports that the GPU metrics table allocated in
vangogh_tables_init() is not large enough for the memset done in
smu_cmn_init_soft_gpu_metrics(). Condensed report follows:

[   33.861314] BUG: KASAN: slab-out-of-bounds in smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu]
[   33.861799] Write of size 168 at addr ffff888129f59500 by task mangoapp/1067
...
[   33.861808] CPU: 6 UID: 1000 PID: 1067 Comm: mangoapp Tainted: G        W          6.12.0-rc4 deepin-community#356 1a56f59a8b5182eeaf67eb7cb8b13594dd23b544
[   33.861816] Tainted: [W]=WARN
[   33.861818] Hardware name: Valve Galileo/Galileo, BIOS F7G0107 12/01/2023
[   33.861822] Call Trace:
[   33.861826]  <TASK>
[   33.861829]  dump_stack_lvl+0x66/0x90
[   33.861838]  print_report+0xce/0x620
[   33.861853]  kasan_report+0xda/0x110
[   33.862794]  kasan_check_range+0xfd/0x1a0
[   33.862799]  __asan_memset+0x23/0x40
[   33.862803]  smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.863306]  vangogh_get_gpu_metrics_v2_4+0x123/0xad0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.864257]  vangogh_common_get_gpu_metrics+0xb0c/0xbc0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.865682]  amdgpu_dpm_get_gpu_metrics+0xcc/0x110 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.866160]  amdgpu_get_gpu_metrics+0x154/0x2d0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.867135]  dev_attr_show+0x43/0xc0
[   33.867147]  sysfs_kf_seq_show+0x1f1/0x3b0
[   33.867155]  seq_read_iter+0x3f8/0x1140
[   33.867173]  vfs_read+0x76c/0xc50
[   33.867198]  ksys_read+0xfb/0x1d0
[   33.867214]  do_syscall_64+0x90/0x160
...
[   33.867353] Allocated by task 378 on cpu 7 at 22.794876s:
[   33.867358]  kasan_save_stack+0x33/0x50
[   33.867364]  kasan_save_track+0x17/0x60
[   33.867367]  __kasan_kmalloc+0x87/0x90
[   33.867371]  vangogh_init_smc_tables+0x3f9/0x840 [amdgpu]
[   33.867835]  smu_sw_init+0xa32/0x1850 [amdgpu]
[   33.868299]  amdgpu_device_init+0x467b/0x8d90 [amdgpu]
[   33.868733]  amdgpu_driver_load_kms+0x19/0xf0 [amdgpu]
[   33.869167]  amdgpu_pci_probe+0x2d6/0xcd0 [amdgpu]
[   33.869608]  local_pci_probe+0xda/0x180
[   33.869614]  pci_device_probe+0x43f/0x6b0

Empirically we can confirm that the former allocates 152 bytes for the
table, while the latter memsets the 168 large block.

Root cause appears that when GPU metrics tables for v2_4 parts were added
it was not considered to enlarge the table to fit.

The fix in this patch is rather "brute force" and perhaps later should be
done in a smarter way, by extracting and consolidating the part version to
size logic to a common helper, instead of brute forcing the largest
possible allocation. Nevertheless, for now this works and fixes the out of
bounds write.

v2:
 * Drop impossible v3_0 case. (Mario)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: 41cec40 ("drm/amd/pm: Vangogh: Add new gpu_metrics_v2_4 to acquire gpu_metrics")
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Evan Quan <evan.quan@amd.com>
Cc: Wenyou Yang <WenYou.Yang@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20241025145639.19124-1-tursulin@igalia.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0880f58f9609f0200483a49429af0f050d281703)
Cc: stable@vger.kernel.org # v6.6+
Signed-off-by: Bin Lan <bin.lan.cn@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants