{vis}[foss/2022b] Paraview 5.11.1 fat build compatible with hardware rendering, software rendering, headless server mode, as well as interactive mode. #18631
Test report by @lcniel |
Note that I have tested this build in a few different ways.
I checked that hardware rendering worked with Note that the libglvnd flag
Interesting, I'm going to build this and see if it supports all the use cases on our end. My feeling says it should not be possible to get a fat binary like this (as I've looked into it before and even Kitware themselves distribute separate versions for headless EGL and osmesa ;-)). |
Btw, what kind of EB build time are you seeing for this config? ParaView has always been one of the slowest packages to build for us. On a 72-core Xeon 8360Y 2.4GHz node my build has been chugging away for 90 minutes now, with configure alone already taking 40 minutes. We've had some FS metadata issues in the past causing slow builds and these might be cropping up again, hence a reference data point would be nice to have. |
I believe that this wasn't possible until fairly recently, but it is now: they've changed their guidelines. Maybe it used to depend directly on GLEW builds which were exclusive. Of course, it will only use one driver at a time. Headless just means that it doesn't support X.
This package alone (I had everything else built already) took less than an hour in total on 16 cores of an Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz, which is consistent with my experience of VTK builds on various systems. ETA: according to the test report I uploaded it took about 28 minutes.
28 minutes for the full ParaView build (with all deps already built) on a 16-core machine? 👀
Yes. For reference - that's similar to what I used to get when I built VTK from scratch in Apptainer for a Mayavi install I used to do on my 12-core Ryzen 3900X workstation, so it doesn't seem that crazy to me. @Micket knows better than me if there are any special hardware quirks that speed the whole thing up on those nodes.
@paulmelis you can view build reports in all merged PRs. Like taking this recent one on our test cluster via boegelbot
I happened to see this very much related issue though: https://gitlab.kitware.com/vtk/vtk/-/issues/18547 Maybe we are getting away without using GLEW for extensions? Or maybe the GLX part is actually not working properly here and software rendering is picked up, or.. maybe EGL rendering is used on the frontend part as well somehow and we just aren't seeing it? (I look forward to the glorious future when osmesa and glx can both go die in a fire and we welcome our new EGL overlords) |
version = '5.11.1'
versionsuffix = '-mpi-egl-osmesa'
If we can make a fat build I think we can exclude the suffixes completely. I think we should just drop the -mpi one as well (if someone wants to make an MPI-less version, let them add -nompi instead!)
We can combine this with bumping it to 2022b toolchain instead, which doesn't yet have a ParaView, and just treat it as the default build henceforth
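To make the naming consequence concrete, here is a minimal sketch of how the easyconfig filename changes once the version suffix is dropped and the toolchain is bumped. The helper `easyconfig_filename` is hypothetical: it only mirrors the naming pattern visible in the filenames mentioned in this PR and is not EasyBuild's own code.

```python
def easyconfig_filename(name, version, toolchain, versionsuffix=""):
    """Hypothetical helper: <name>-<version>-<tcname>-<tcversion><suffix>.eb"""
    tc_name, tc_version = toolchain
    return f"{name}-{version}-{tc_name}-{tc_version}{versionsuffix}.eb"


# Filename as originally proposed (2022a toolchain, explicit suffix)
with_suffix = easyconfig_filename(
    "ParaView", "5.11.1", ("foss", "2022a"), "-mpi-egl-osmesa"
)

# Filename after dropping the suffix and bumping to 2022b
without_suffix = easyconfig_filename("ParaView", "5.11.1", ("foss", "2022b"))

print(with_suffix)
print(without_suffix)
```

With no suffix the fat build simply becomes the default ParaView for that toolchain, which is the point of the suggestion above.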
Ok, if we bump it to 2022b I will have to check all of the versions of the dependencies so that will take some more time, but I think there are GCC 12.2.0 easyconfigs of all the deps so that should be OK.
easybuild/easyconfigs/p/ParaView/ParaView-5.11.1-foss-2022a-mpi-egl-osmesa.eb
sanity_check_commands = ['python -c "import paraview"']

patches = ['ParaView-5.11.1-remove_glew_init_warning.patch']
Please move this right below sources
There's definitely pretty heavy GPU usage going on on the frontend, and the CPU usage is comparatively modest. But the GLEW error message we patched out seems to suggest there is indeed some kind of issue with GLX, so I also wonder what exactly is happening. |
…i-egl-osmesa.eb Co-authored-by: Mikael Öhman <micketeer@gmail.com>
Just tested the build on one of our GPU nodes. These have four A100s, each configured as an X screen. We don't have EGL configured on these nodes (no access to /dev/dri/...), so only GLX will work. Some background: for 5.10 I've provided our users with 2 different builds:
Running 5.11.1-foss-2022a-mpi-egl-osmesa under VirtualGL in a VNC desktop works, but does print these warning messages related to Mesa, which I haven't seen before. It doesn't show up when using ParaView/5.10.1-foss-2022a-mpi. Since the warning comes from libEGL it must be related to the EGL support in the 5.11.1 build, as that's not in our 5.10.1 module.
Rendering within the GUI seems okay and the connection info reports the NVIDIA GPU being used, and I see load in Since we don't have EGL any headless non-X rendering should fall back to OSMesa, and it does (showing the same libEGL warnings). I've been using a small Python script over the years to check what OpenGL and rendering capabilities ParaView reports itself:
When running this with
Checking to see what happens if we try to render something simple shows a number of issues.
The resulting image is black. When trying to force offscreen rendering I think it still attempts to open a window (and fails on that):
So for me the current 5.11.1 build would only provide a working GUI version. Node GPU info (first screen only, other 3 are the same):
@paulmelis Thank you for highlighting this use-case and providing your script. I didn't consider the specific use-case of headless software rendering (my motivation for working on headless rendering has always been to access high-powered GPUs). I don't currently have access to our build environment to mess with the build configurations, but I get the same result as you report (no Mesa offscreen rendering), although I don't get any warnings even on a node without any GPU. My guess is that your use-case would work if we set
Since as far as I can see the main use-case for this would be when EGL is not available, it might make sense to then also set
I would have to check which combinations of these work and which do not to be sure, however, about which is the most appropriate, but an offscreen OSMesa only build sounds like a reasonable option. |
Remove now-obsolete file
OK, so I went over everything again.
This I don't get. EGL does not provide software-based rendering by itself, only OSMESA does. Unless you have your OpenGL environment set up in such a way that the software implementation from the Mesa package is used as a fallback when NVIDIA can't be used (which kind of defeats the purpose of the OSMESA alternative). I really need to play around with your latest easyconfig to get my head around this stuff. Btw, isn't one of the main reasons to go for an EGL build so you won't need an X server and therefore have less of a security and configuration hassle?
With regards to CPU-based OpenGL rendering, how often would you envision that happens in the standalone GUI versus in the Paraview server? For us the software-rendering server is merely a fallback option for really large datasets that don't fit on our GPU nodes. Most uses fit on a single GPU node and so can use a set of GPU-rendering server processes (through X, although I still need to configure EGL to get rid of X). I wish we had some more data on use cases to make a good choice among all the combinations :) I also don't really know how many sites have (only) EGL enabled versus X. |
Managed to build a 2022a version (as that's what we're running) of your ParaView-5.11.1-foss-2022b.eb. GUI version runs fine with NVIDIA OpenGL rendering, no surprise there. Offscreen rendering (i.e. avoiding a window getting created) does not seem to work, though. E.g. The 5.11 version also seems to need an X server for all different modes of operation. |
Looking a bit closer into EGL support on our GPU nodes it is available, as I wasn't aware the device-file configuration for
If I override the EGL ICD search path to
@lcniel Don't you see the same issue when Mesa is loaded? Or have you changed the ICD search paths? |
See this PR of mine: #18630 As I mentioned above in this PR:
Personally, the only thing I would ever need is headless EGL without X, in GPU mode, to run pvserver. Very large datasets can be streamed over multiple GPUs (and I think maybe even nodes) with NVidia Index, but I am still working on a separate build for that (there are CUDA dependencies). In practice, I suspect a chunk of users will want to run ParaView interactively through a virtual desktop client using VirtualGL. Because they find pvserver to be a hassle, or don't have the right version of Paraview, or have occasionally unstable connections and don't want to risk Paraview crashing from that, or they just want to do a quick sanity check. You get the idea. What I'd like is thus for there to be a single build that supports at minimum both of these "typical" options. Software rendering I see more as a bonus or fallback, like you say. If I have to choose between EGL + X and EGL + Headless Software Rendering I would choose EGL + X any day of the week. But as you seem to note I don't think that choice really needs to be made - EGL supports software rendering on GPU-less nodes. If there are very specialized pvpython or pvweb (the latter of which is AFAIK heavily undersupported/abandoned atm, but has promising uses, see Trame) users that need special builds, I would say that is also a separate issue. (It's unfortunate that the various Paraview web client-server apps are so underdeveloped, I tried writing an applet in Trame for simple volume rendering, it was awesome but the lack of an interactive color bar/opacity editor, or the ability to load regular Paraview color maps, kills it. I guess |
On our NVidia VirtualGL login node while starting ParaView I am getting
and for
while for older ParaView we get cleaner output
I would say that this fat build is not correct for now, since it shows only Mesa rendering and disregards the LD_PRELOADed VirtualGL imposter libraries. Anyway, I could use it for PR #18711
Forgot to also update checksum for this patch here after adding description.
Test report by @lcniel |
Test report by @lcniel |
Test report by @akesandgren |
configopts += '-DPARAVIEW_ENABLE_XDMF2=ON '
configopts += '-DPARAVIEW_ENABLE_XDMF3=ON '
configopts += '-DPython3_ROOT_DIR=$EBROOTPYTHON '
configopts += '-DCMAKE_CUDA_ARCHITECTURES=70 72 80 86 87 89 90 90a '
This does not work, it would need " around the list which should be ";" separated, and the list should not be hardcoded but taken from the cuda_compute_capabilities parameter like this
-DCMAKE_CUDA_ARCHITECTURES="%(cuda_cc_cmake)s"
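A minimal sketch of why the unquoted, space-separated form breaks, and what a quoted, ";"-separated value looks like instead. The helper `cmake_cuda_architectures` and the example capability values are hypothetical; in a real easyconfig the `%(cuda_cc_cmake)s` template supplies the value, so this only imitates the format described above.

```python
import shlex


def cmake_cuda_architectures(compute_capabilities):
    """Hypothetical helper: ['7.0', '8.0'] -> '70;80' (a ';'-separated list)."""
    return ";".join(cc.replace(".", "") for cc in compute_capabilities)


capabilities = ["7.0", "8.0", "9.0"]  # example values only
archs = cmake_cuda_architectures(capabilities)

# Unquoted and space-separated: the shell splits the value into separate arguments
broken = "-DCMAKE_CUDA_ARCHITECTURES=70 80 90 "
# Quoted and ';'-separated: a single argument carrying the whole list
fixed = '-DCMAKE_CUDA_ARCHITECTURES="%s" ' % archs

broken_args = shlex.split(broken)
fixed_args = shlex.split(fixed)
print(broken_args)
print(fixed_args)
```

In the broken form only the first architecture stays attached to the option; the remaining numbers reach cmake as stray positional arguments.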
We're also beginning to use this construct instead of the "+=" stuff:
_copts = [
'-Dparam1',
'-Dparam2',
]
configopts = ' '.join(_copts)
to reduce the risk of missing a " " between the arguments.
Test report by @lcniel |
]

_copts = [
    '-DPARAVIEW_INSTALL_DEVELOPMENT_FILES=ON -DPARAVIEW_BUILD_SHARED_LIBS=ON',
This should be split into two lines
And it is OK to keep relevant comments from the "configopts += " version
I will have to see if the style tests accept comments (I know e.g. flake8 is quite touchy about them).
Done
]

_copts = [
    '-DPARAVIEW_INSTALL_DEVELOPMENT_FILES=ON -DPARAVIEW_BUILD_SHARED_LIBS=ON',
Split into two lines here too
Done
Test report by @akesandgren |
LGTM
Going in, thanks @lcniel! |
Test report by @lcniel |
(created using eb --new-pr)
Edit 09-06: With the merging of #18630 and easybuilders/easybuild-easyblocks#2985 this should now function after building more straightforwardly, provided config for EGL exists.