-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable resizable BAR support #3
Conversation
Thanks for the contribution, sjkelly, and congratulations on the first pull request in this repository :) I need to catch up a bit on the history of resizable BAR support within NVIDIA, and either I or someone else from NVIDIA will get back to this change soon. Thanks again! |
The use of |
You have to adapt your code to the current codebase. Rewriting such a thing is for a separate time not relevant to this PR. |
I am not advocating for a rewrite, but for not introducing more |
Yes, this patch adds 3 gotos. There are 30 gotos in the file already, and Linus himself says gotos are good kernel coding style. I see no problem. https://web.archive.org/web/20130521051957/http:/kerneltrap.org/node/553/2131 |
@TheRawMeatball Look at the current file. It's how the codebase was written. Removing the usage of goto is not relevant to this PR. |
One of the gotos is not required, since we dropped one of the error paths during rebase, so @sjkelly will remove that. But yes, other than that, gotos are fairly idiomatic in Linux. Much of the hate against gotos is based on the Dijkstra article, but there's some missing context there. Dijkstra was not really arguing against local gotos, but against global gotos that jump all over the place. Regardless, may I suggest not turning this PR into a discussion on the merits of gotos to allow the NVIDIA folks space for a proper review :)? |
The worst part about using a GOTO is how other programmers will react to it. |
I have a feeling there are firmware edgecases where an SMM handler or ACPI code could be holding on to the old BAR address. This kind of code is common for NVIDIA laptops and can vary between vendors or even models. |
Yes, it's possible, though in that situation, the firmware is supposed to ask the kernel not to move the PCIe BARs: https://github.com/torvalds/linux/blob/f400bea2d44beec76f7e7f45e5372ef790336a4d/drivers/acpi/pci_root.c#L919-L927. This patch respects that setting. I'm also not sure whether there are currently any laptop GPUs that include ReBAR support, since that's quite a new feature. |
It is good there is a way for firmaare to communicate this case. Yes, there are laptops that support resizeable bar, it became standard with RTX 3000 I believe. |
Most 30 series laptops support resizeable BAR as far as I'm aware, I know that the Omen 15 supports it but usually I don't think it's visible in the UEFI like on desktop UEFIs. |
could easily replace the gotos with a |
Hi @sjkelly, if I understand the limitation of this patch correctly, it will only resize BAR1 if there is already enough address space available to do so. In other words, we cannot change the address space window that is already allocated to the GPU and can only operate within it. Can you please share a system configuration where these conditions exist and the patch successfully resizes BAR1? |
Yes, this is correct.
No, this is not. The resources allocated to the GPU (BAR1 and BAR3) are released. The restriction is that address space needs to be available within the PCIe bridge that's upstream of the GPU. If there are no other devices in this PCIe bridge or the upstream PCIe bridge is the root complex this is not an issue, because the kernel will resize it (well, there's an issue about the size of the root complex window on AMD CPUs, but Linux already has code to open an additional huge window on boot to support rebar on AMD GPUs). The mentioned restriction comes up when the PCIe topology includes a switch between the root complex and the GPU (pretty common on motherboards) and there is another PCIe device attached to the same switch. In that case, Linux currently does not have the ability to move the other device's PCIe BARs in order to resize the bridge window, which will likely cause the resize of the GPU BAR to fail.
We have successfully tested this on a number of systems. The one that @sjkelly has locally is a SuperMicro MBD-C9Z390-CG-O motherboard with an RTX A5000 plugged into the closest PCIe slot, with the other PCIe slot on the same PCIe switch left empty. |
Thanks Keno, I'm currently testing this patch internally. |
|
||
if ((pci_resource_flags(pci_dev, NV_GPU_BAR1) & IORESOURCE_UNSET) || | ||
(pci_resource_flags(pci_dev, NV_GPU_BAR3) & IORESOURCE_UNSET)) { | ||
if (requested_size != old_size) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to step through the available sizes down to the original size here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so.
@@ -434,6 +506,12 @@ nv_pci_probe | |||
goto failed; | |||
} | |||
|
|||
if (nv_resize_pcie_bars(pci_dev)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this path runs every time .probe is called, we should add this feature behind a regkey module param at first (see nvrm_registry.h). Once we have good test coverage and feel confident that the feature is robust, we can enable this feature by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean nv-reg.h
? Those expose kernel module options?
Tracking internally via bug 3642641 |
Yes I intend to address the comments unless someone is willing to expedite it internally. |
Resizable BAR support is a PCIe extension that allows resizing a PCIe device's mappable memory/register space (also referred to as BARs - after the Base Address Register that sets up the region). An important use case are GPUs. While data center GPUs, generally have BAR sizes that match the size of video memory, consumer and workstation GPUs generally declare only have 256MiB worth of BARs mapping GPU memory to maintain compatibility with 32bit operating systems. However, for performance (particularly when using PCIe P2P), it is desirable to be able to map the entirety of GPU memory, necessitating resizable BARs. However, while PCIe ReBAR has been a standard for more than 10 years, it was ill used until a few years ago and thus support is lacking. On very recent motherboards (generally after 2020), a BIOS update might be available that causes the firmware to read and reserve space for a ReBAR expansion (or even perform the BAR resize itself). However, older motherbards do not have this support. Fortunately for us, the Linux kernel has some support to do its own PCIe enumeration without relying on the firmware to do everything. Linux even has support for resizable BARs, though in practice there are a number of important limitations: * There is currently no support for movable BARs in Linux. This means that if there are adjacent address space allocations, it is quite possible for BAR resizing to fail. There was a WIP patch series to resolve this issue at https://patchwork.ozlabs.org/project/linux-pci/cover/20201218174011.340514-1-s.miroshnichenko@yadro.com/ but it appears to have faded out.
- check for `pci_rebar_get_possible_sizes` - check for `pci_get_host_bridge` - print info on EOPNOTSUPP when attempting rebar_resize - short circuit return when requesting resize equal to current size
@dmitriyc-nv Sorry, I lost track of this. I see @sjkelly did do the rebase, but it grew conflicts again. I can do another round of rebasing here if you would like. |
No worries, I've continued working on the patch internally and it is going through review. |
Excellent, thank you. Looking forward to it. |
A version of this patch has been merged in 530.30.02. Resizing can be enabled with the following module parameter: NVreg_EnableResizableBar=1 |
Awesome, thanks for pushing this through! |
Resizable BAR support is a PCIe extension that allows resizing a PCIe device's
mappable memory/register space (also referred to as BARs - after the Base
Address Register that sets up the region). An important use case are GPUs.
While data center GPUs, generally have BAR sizes that match the size of video
memory, consumer and workstation GPUs generally declare only have 256MiB worth
of BARs mapping GPU memory to maintain compatibility with 32bit operating
systems. However, for performance (particularly when using PCIe P2P), it is
desirable to be able to map the entirety of GPU memory, necessitating
resizable BARs.
However, while PCIe ReBAR has been a standard for more than 10 years,
it was ill used until a few years ago and thus support is lacking.
On very recent motherboards (generally after 2020), a BIOS update might
be available that causes the firmware to read and reserve space for a
ReBAR expansion (or even perform the BAR resize itself). However, older
motherbards do not have this support.
Fortunately for us, the Linux kernel has some support to do its own PCIe
enumeration without relying on the firmware to do everything. Linux even has
support for resizable BARs, though in practice there are a number of important
limitations:
there are adjacent address space allocations, it is quite possible for BAR
resizing to fail. There was a WIP patch series to resolve this issue at
https://patchwork.ozlabs.org/project/linux-pci/cover/20201218174011.340514-1-s.miroshnichenko@yadro.com/
but it appears to have faded out.