-
Notifications
You must be signed in to change notification settings - Fork 267
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add documentation on current behavior for X86 #212
Conversation
@gchatelet, @Mizux, please review PR |
ping |
d85520f
to
37e4db5
Compare
// Before an application attempts to use the SIMD extensions, | ||
// it should check that they are present on the processor and supported by OS: | ||
// 1. Check that operating system saves and restores XMM/YMM/ZMM/TMM | ||
// registers during context switches. Detect CPUID.1:ECX.OSXSAVE[bit 27] | ||
// this feature flag specifies that operating system provides extended state | ||
// management implied HW support for XSAVE, XRSTOR, XGETBV, XCR0. | ||
// 2. Check that the SIMD extensions supports by hardware. | ||
// Detect CPUID.1:ECX.XSAVE[bit 26] | ||
// 3. Check enabled state in XCR0 via XGETBV. | ||
// 4. Check feature flag for instruction set. | ||
// | ||
// Note: | ||
// It is unwise for an application to rely exclusively on feature flag for | ||
// instruction set or at all on CPUID.1:ECX.XSAVE[bit 26]: These indicate | ||
// hardware support but not operating system support. If XMM/YMM/ZMM/TMM state | ||
// management is not enabled by an operating systems, instructions will #UD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIU this is documenting the current behavior right?
If so I would phrase it differently because it reads as if it was an advice to the user.
Also the section to which you added the documentation only contains the following functions:
static bool HasMask(uint32_t value, uint32_t mask);
static bool HasXmmOsXSave(uint32_t xcr0_eax);
static bool HasYmmOsXSave(uint32_t xcr0_eax);
static bool HasZmmOsXSave(uint32_t xcr0_eax);
static bool HasTmmOsXSave(uint32_t xcr0_eax);
I agree that we could use some documentation to describe what it means to support SIMD instructions but then I would put this section at the top of the file and describe it in terms of process, not in the form of an advice ("it should check", "It is unwise"). Something along these lines:
A note on x86 SIMD instructions availability
--------------------------------------------
A number of conditions need to be met for an application to use SIMD instructions:
1. The CPU itself must support the instruction.
- we use `CPUID` to check whether the feature is supported.
2. The OS must save and restore the associated SIMD register across context switches, we check that:
- the CPU reports supporting hardware context switching instructions via CPUID.1:ECX.XSAVE[bit 26]
- the OS reports supporting hardware context switching instructions via CPUID.1:ECX.OSXSAVE[bit 27]
- the CPU extended control register 0 (XCR0) is set to save and restore the needed SIMD registers
Note that if `XSAVE`/`OSXSAVE` are missing, we delegate the detection to the OS via the `DetectFeaturesFromOs` function or via microarchitecture heuristics.
8fb1ca5
to
8528847
Compare
@gchatelet, I updated PR, but there is a question related to // Returns cache hierarchy informations.
// Can call cpuid multiple times.
// Only works on Intel CPU at the moment.
CacheInfo GetX86CacheInfo(void); |
@toor1245 the comment is outdated, yes please remove the part about "Intel only" |
// Encoding | ||
// ----------------------------------------------------------------------------- | ||
// `X86Info` contains fields such as `vendor` and `brand_string` they are | ||
// represents characters in ASCII encoding. `vendor` length of characters is 13 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"X86Info
contains fields such as vendor
and brand_string
that are ASCII encoded strings."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a bunch of documentation comments. Should be good to go then.
// ----------------------------------------------------------------------------- | ||
// `GetX86Microarchitecture` function consists of check on vendor via | ||
// `IsVendorByX86Info`. We use `CPUID(family, model)` macro to define | ||
// microarchitecture of vendor. In cases where the `family` and `model` is the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"to define the vendor's microarchitecture"
// `IsVendorByX86Info`. We use `CPUID(family, model)` macro to define | ||
// microarchitecture of vendor. In cases where the `family` and `model` is the | ||
// same for several microarchitectures we do a stepping check or in the worst | ||
// case parsing `brand_string` as example `HasSecondFMA`. Details of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"or in the worst case we rely on parsing brand_string
(see HasSecondFMA
for an example)"
// | ||
// CacheInfo X86 | ||
// ----------------------------------------------------------------------------- | ||
// We use common struct `CacheInfo` for all CPU architectures and this struct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"We use the CacheInfo
struct to store information about cache levels. The maximum number of levels is hardcoded but can be increased if needed. We have full support .,."
// ----------------------------------------------------------------------------- | ||
// We use common struct `CacheInfo` for all CPU architectures and this struct | ||
// contains of `levels` with certain maximum number of cache levels, you can | ||
// feel free increase this value if more cache levels are needed. We have full |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use bullet points to split between vendors, e.g.
- Intel:
- modern:
- old processors:
- AMD:...
// Internal structures | ||
// ----------------------------------------------------------------------------- | ||
// We use internal structures such as `Leaves` and `OsPreserves` to hold cpuid | ||
// info and support of registers, since latency of `CPUID` instruction ~100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since latency of CPUID
instruction is around ~100
// | ||
// Internal structures | ||
// ----------------------------------------------------------------------------- | ||
// We use internal structures such as `Leaves` and `OsPreserves` to hold cpuid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"to hold cpuid" -> "to cache the result of cpuid
"
534e586
to
52f863f
Compare
52f863f
to
bd7161a
Compare
// ----------------------------------------------------------------------------- | ||
// We use internal structures such as `Leaves` and `OsPreserves` to cache the | ||
// result of cpuid info and support of registers, since latency of CPUID | ||
// instruction is around ~100, see |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
~100 cycles
I'll squash and merge the PR once the typo is fixed. Thx a lot for the documentation! |
No description provided.