Context
The Hypervisor extension (issue #44) introduces new address translation semantics that the current TLB was not designed for. There are two situations that we need to investigate for possible optimizations.
The Hypervisor extension introduces hypervisor-specific load/store instructions that perform virtual address translation using custom access modes (privilege mode + virtualization mode). The current TLB is designed to support only the current access mode, so these instructions cannot use it and always have to take the slow address translation path, which is bad for performance. Furthermore, having one TLB per access mode would give us the opportunity to reduce TLB shootdowns on access mode switches. However, this may not be that promising: the kernel scheduler already minimizes the number of access mode switches to make good use of the CPU, so the performance gains may be negligible.
The Hypervisor extension also introduces two-stage address translation. We could have a TLB for each stage, but it is not clear whether this would pay off, so it deserves some investigation. For now this is not a big issue, because the current TLB is still used for the whole two-stage process. A TLB dedicated to the second stage would therefore only speed up the second-stage translation on the miss path, and optimizing the miss code path may not be promising.
Possible solutions
For the hypervisor load/store case, we first need to investigate how often hypervisor load/store instructions are really used; if they are rare, this optimization will not give us meaningful benefits. If they are used very often, we could have one TLB per access mode. There are 5 possible access modes (U-mode, M-mode, HS-mode, VU-mode, VS-mode).
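To make the per-access-mode idea concrete, here is a minimal sketch in Python. It is not the emulator's actual TLB: `AccessMode`, `DirectMappedTLB`, `PerModeTLB`, the 256-entry size, and the `slow_walk` callback are all hypothetical names and values chosen for illustration. The point is only that a hypervisor load/store can pick the TLB of the access mode it requests, and that switching modes does not require flushing the other TLBs.

```python
from enum import Enum

PAGE_SHIFT = 12                      # 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1
TLB_SIZE = 256                       # entries per mode; arbitrary for this sketch


class AccessMode(Enum):
    """The five access modes listed in this issue (hypothetical encoding)."""
    U = 0
    M = 1
    HS = 2
    VU = 3
    VS = 4


class DirectMappedTLB:
    """A tiny direct-mapped TLB: one (vpn, ppn) pair per slot."""

    def __init__(self, size=TLB_SIZE):
        self.size = size
        self.entries = [None] * size

    def lookup(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        entry = self.entries[vpn % self.size]
        if entry is not None and entry[0] == vpn:
            return (entry[1] << PAGE_SHIFT) | (vaddr & PAGE_MASK)
        return None                  # miss

    def insert(self, vaddr, paddr):
        vpn = vaddr >> PAGE_SHIFT
        self.entries[vpn % self.size] = (vpn, paddr >> PAGE_SHIFT)

    def flush(self):
        self.entries = [None] * self.size


class PerModeTLB:
    """One independent TLB per access mode."""

    def __init__(self):
        self.tlbs = {mode: DirectMappedTLB() for mode in AccessMode}

    def translate(self, mode, vaddr, slow_walk):
        # `mode` is the access mode requested by the instruction, which for a
        # hypervisor load/store may differ from the current access mode.
        tlb = self.tlbs[mode]
        paddr = tlb.lookup(vaddr)
        if paddr is None:
            paddr = slow_walk(mode, vaddr)   # slow address translation path
            tlb.insert(vaddr, paddr)
        return paddr
```

The cost of this design is roughly five times the TLB memory, which is why measuring how often hypervisor load/store instructions actually run should come first.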
For the two-stage case, we first need to investigate how often the miss path of the two-stage translation is hit; if it is rare, this optimization will not pay off. If it is hit often, we could duplicate the TLB: one for the whole two-stage process, and a second one only for the second stage.
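The duplicated-TLB idea, together with the measurement it depends on, can be sketched as follows. This is an illustrative Python model under simplifying assumptions, not the emulator's code: `TwoStageTranslator`, `stage1_walk`, `stage2_walk`, and the `stats` counters are hypothetical, the TLBs are unbounded dictionaries, and the first-stage walk is treated as an opaque function (a real walk would also translate each page table access through the second stage, which is where a second-stage TLB could help further).

```python
PAGE_SHIFT = 12                      # 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class TwoStageTranslator:
    """Combined TLB for the whole process plus a TLB for the second stage only."""

    def __init__(self, stage1_walk, stage2_walk):
        self.stage1_walk = stage1_walk   # guest virtual  -> guest physical
        self.stage2_walk = stage2_walk   # guest physical -> host physical
        self.combined_tlb = {}           # guest virtual page  -> host physical page
        self.stage2_tlb = {}             # guest physical page -> host physical page
        self.stats = {"combined_hit": 0, "combined_miss": 0,
                      "stage2_hit": 0, "stage2_miss": 0}

    def _stage2(self, gpa):
        gpn = gpa >> PAGE_SHIFT
        if gpn in self.stage2_tlb:
            self.stats["stage2_hit"] += 1
        else:
            self.stats["stage2_miss"] += 1
            self.stage2_tlb[gpn] = self.stage2_walk(gpa) >> PAGE_SHIFT
        return (self.stage2_tlb[gpn] << PAGE_SHIFT) | (gpa & PAGE_MASK)

    def translate(self, gva):
        vpn = gva >> PAGE_SHIFT
        if vpn in self.combined_tlb:
            self.stats["combined_hit"] += 1
        else:
            # Miss path: the second-stage TLB is only consulted here.
            self.stats["combined_miss"] += 1
            gpa = self.stage1_walk(gva)
            self.combined_tlb[vpn] = self._stage2(gpa) >> PAGE_SHIFT
        return (self.combined_tlb[vpn] << PAGE_SHIFT) | (gva & PAGE_MASK)
```

Comparing `combined_miss` against `combined_hit` on a representative workload indicates how often the miss path runs, and `stage2_hit` indicates how many of those misses the second TLB would actually shorten.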