-
Notifications
You must be signed in to change notification settings - Fork 283
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change options based on arch for Mesa #1892
Conversation
"-Dswr-arches=avx,avx2,skx,knl" fails on too many types of system, so removing it
…s into mesa-for-power9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested with easybuilders/easybuild-easyconfigs#9764 and works on Power9, see easybuilders/easybuild-easyconfigs#9764 (comment)
…s into mesa-for-power9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@edmondac could you select the -Dswr-arches
options according the CPU microarchitecture, similar to what is done in the lammps
easyblock?
Co-Authored-By: Sam Moors <smoors@users.noreply.github.com>
Just a few comments: the reason why I put in the all the swr arches unconditionally is that they are selected at runtime, and you may be "cross-compiling". The problem with the recipes was that in the 19.1.* they run into a GCC bug (fixed in GCC 9 IIRC) where |
Hi @bartoldeman - the problem with this approach is that it doesn't build on POWER9 CPUs:
The new approach does, see easybuilders/easybuild-easyconfigs#9764 (comment) Is cross-compilation "supported" by EasyBuild? Not using it, I wouldn't know. So many things auto-detect their CPU features that I'd have thought it would mostly not build with the right support - but I've not tried it. |
Using self.cfg.update('configopts', "-Dswr-arches=avx,avx2,skx,knl") works on our broadwell nodes at least. |
My point was that there is a pending investigation but this was about to be merged. |
Using The current easyblock for Since there is already a PR open (easybuilders/easybuild-easyconfigs#9764) to use the As I said, the current state of this PR seems to be a safer path forward. If any user wants to do cross-compiling, this easyblock already allows the user to do that by manually overriding This is just my advice, I leave the final decision to other maintainers. |
Do I misunderstood the bug report because to me this looks like an issue with a compiler not with the architecture. So this makes this an arch dependent failure which is even worse And I mentioned that this is probably what was mentioned in #1892 (comment) |
easybuild/easyblocks/m/mesa.py
Outdated
# avx512f: AVX-512 Foundation - introduced in Skylake | ||
# avx512er: AVX-512 Exponential and Reciprocal Instructions implemented in Knights Landing | ||
x86_features = {'avx': 'avx', 'avx1.0': 'avx', 'avx2': 'avx2', 'avx512f': 'skx', 'avx512er': 'knl'} | ||
swr_arches = [farch for fname, farch in x86_features.items() if fname in cpu_features] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So after finding that Mesa < 19.3.7 bug, suggested code would be:
if arch == X86_64:
x86_features = {'avx': 'avx', 'avx1.0': 'avx', 'avx2': 'avx2', 'avx512f': 'skx', 'avx512er': 'knl'}
if LooseVersion(self.version) < LooaseVersion('19.3.7'):
swr_arches = [farch for fname, farch in x86_features.items() if fname in cpu_features]
else:
swr_arches = [farch for fname, farch in x86_features.items() ]
Or something similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @lexming here that including this version check is not a good idea, especially since there's a patch available to fix the problem you're dancing around here.
@Flamefire @lexming good enough for both of you? And if so, @edmondac can you make that effect if the4y both agree? |
Again the question: is this a bug related to the current arch or the current compiler? I understood it to be the latter. Then this is not acceptable as it will fail when one tries to build on knl or skylake(?) |
Based on my tests, the current
So the bug occurs in a combination of architecture and compiler when using @akesandgren I would really prefer avoiding the burden and complication of maintaining version checks of the compiler or Mesa unless we find a case where this easyblock fails. This is not currently the case. Moreover (I repeat myself) users wanting to do cross-compiling can set Having said all of this, if a majority of you still prefer to add version checks on the compiler and Mesa, I won't oppose this change obviously. |
And you are still missing the point. To summarize:
See above: The bug is the compiler with older Mesa versions. There was not any evidence that this bug only applies when running on non-AVX-512 archs. The linked issue shows a compile failure, not a runtime failure.
The burden of having a bug that is only reproducible on 1 architecture due to the EasyBlock changing behavior based on the architecture is IMO higher. Furthermore there exists a patch for that which has been mentioned in #1892 (comment): https://github.com/easybuilders/easybuild-easyconfigs/pull/9764/files#diff-a56c37857393d269498e8fd48f40500dR29 |
/facepalm I already tested this easyblock in a Skylake (Intel(R) Xeon(R) Gold 6126) and it works. Precisely with I am tired of repeating myself. All those tests listed in #1892 (comment) with this easyblock as it is in commit cac0c0c are successful. I won't participate in this discussion any more unless you provide some evidence to your claims. So far you have talked a lot, but have not submitted the results of any test. I already provided tests of several Mesa easyconfigs in different architectures, I tested both the current easyblock and your proposed changes, and I am the one who found that we could be affected by bug 105029 by using indiscriminate |
Apologies, it seems it is KNL affected by the bug you mentioned, not skylake. Got fooled by the mentioning of avx512. The upstream fix is this: https://gitlab.freedesktop.org/mesa/mesa/-/commit/f1fbeb1a53 and it seems to be fixed in 18.1.0. So that doesn't look like the bug you encountered. But since no test reports of the failure were submitted despite me asking I couldn't know better.
I did link https://github.com/easybuilders/easybuild-easyconfigs/pull/9764/files#diff-a56c37857393d269498e8fd48f40500dR29 and #1892 (comment) multiple times where all archs are passed for an existing EasyConfig and all tests (submitted when this was merged and current usage) show that using all archs is possible.
And I'm the one who pointed out that we might be affected by the issue reported in #1892 (comment) and asked to verify this. It seems you must have missed #1892 (comment). I did just do this: I used this EC: https://gist.github.com/Flamefire/6730d3b40d75dabf47d9df3a3ca515f3 It is the current Mesa-19.1.7-GCCcore-8.3.0.eb with
So conclusion: We run into https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69471 as mentioned by @bartoldeman in #1892 (comment) and there is a patch available so we can continue to build for all architectures. However building only with the swr for the current arch happens to work as |
Seems like it's time I pitch in here... :) EasyBuild usually builds for the host processor architecture (usually by just using Also listing other values in the I'm not in favor of always listing all supported architectures to try and obtain a Mesa installation that will perform "optimally" on hosts that have newer processors than the build host. That's pretty atypical for EasyBuild, so that shouldn't be the default. In the current state (commit cac0c0c) there's an easy way to override the default of building for the host architecture: as soon as you include @Flamefire I understand your point that obtaining a Mesa installation that will perform well on other (newer) architectures can be useful, but the problem that @bartoldeman fixed with a patch shows that this is also a bit risky. So I would prefer going forward with the current implementation, which fits better with what EasyBuild usually does by default, and is less likely to cause build problems going forward. I hope that settles the discussion. I would like to see some minor stylistic changes to the easyblock (like stretching out the |
Ok, to me it looked like it should do that as it was removed from the EC: https://github.com/easybuilders/easybuild-easyconfigs/pull/9764/files#diff-a56c37857393d269498e8fd48f40500dL64 BTW: What is done for other Software that does runtime selection? Doesn't OpenBLAS or FFTW does something similar? This is why I didn't consider it atypical. |
I consider this approach better than hardcoding
I don't know of any other software which does runtime selection to an arch that is newer than what the software was built for. |
@edmondac Please take a look at bear-rsg#17, I took this a step further, mostly w.r.t. the sanity check... |
cleanup in Mesa easyblock + add custom sanity check step
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
'dirs': [os.path.join('include', 'GL', 'internal')], | ||
} | ||
|
||
if self.swr_arches: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, this is indeed wrong: It expects swr to be build without checking the configopts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Flamefire Did this get fixed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it was in #2006...
arch-specific config opts for Mesa