
parallel hdf5 #51

Closed
minrk opened this issue Oct 19, 2016 · 15 comments · Fixed by #90

Comments

minrk commented Oct 19, 2016

Is it possible to get a parallel-enabled build of hdf5? Assuming a serial build is still desirable, how should they be separated? Different package (e.g. hdf5-parallel) or feature tracking in the hdf5 package (like numpy+blas)?

I'm looking at building fenics packages, and one impediment that has been pointed out is that fenics needs parallel hdf5 and vtk, but vtk pulls in this feedstock's serial hdf5.

mikaem commented Oct 19, 2016

@minrk I think the best solution would be to create an hdf5 parallel package. There is also the question of which MPI implementation (mpich/openmpi), so perhaps even a split into hdf5-mpich and hdf5-openmpi is needed? The parallel hdf5 build compiles with mpicc and requires the `--enable-parallel` flag; I'm not sure this can be combined with a serial package. Note that ubuntu has different packages for serial and parallel hdf5, so perhaps it makes sense for conda-forge to have the same?
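
As an illustration of what `--enable-parallel` changes: configuring with `CC=mpicc ./configure --enable-parallel` exposes HDF5's MPI-IO file driver, so a minimal program like the sketch below compiles and links only against a parallel-enabled build (the file name and the build line are illustrative, not from this thread):

```c
/* Probe for a parallel-enabled HDF5: H5Pset_fapl_mpio exists only when
 * HDF5 was configured with --enable-parallel.
 * Build (illustrative): mpicc probe_parallel.c -lhdf5 -o probe_parallel */
#include <mpi.h>
#include <hdf5.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* File-access property list that routes I/O through MPI-IO. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);

    /* All ranks create/open the same file collectively. */
    hid_t file = H5Fcreate("probe.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
    printf("parallel HDF5 is available\n");

    H5Fclose(file);
    H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}
```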

@jakirkham

PR ( #48 ) is an attempt at a thread-safe build. I don't think that is related, though, but feel free to correct me.

@jakirkham

I think the big question to answer is how MPI is handled. Thus far the approach in PR ( conda-forge/staged-recipes#1501 ) of having an MPI feature package seems sensible. If we go that route, we can add a branch at this feedstock for the MPI vs. the non-MPI versions.

Maybe you have MPI preferences, @mikaem. If so, could you weigh in on PR ( conda-forge/staged-recipes#1501 ) if you haven't already?

mikaem commented Oct 20, 2016

@jakirkham PR #48 is not really related.
I like the mpi metapackage. In reality people will want to use either openmpi or mpich, but who's to say which is better? I prefer mpich because it gives me less headache when it comes to installing Fenics, but that hardly counts as a general opinion. I think the performance of the two is pretty much the same, so the metapackage seems like a good solution. I didn't know there could be different branches of the same package, though. Do you have an example using this approach, @jakirkham?

xmjiao commented Mar 15, 2017

Has there been any progress regarding parallel hdf5? I think this is a critical feature to support (for fenics). Thanks!

@jakirkham

If we went ahead with threaded HDF5, previously PR ( #48 ), now PR ( #57 ), how would that impact MPI support in HDF5? I don't have a clear picture of this ATM (as I don't use either), so I would appreciate it if people interested in MPI could weigh in on PR ( #57 ) if it affects your use case, or otherwise let us know that it doesn't.

jakirkham mentioned this issue Mar 28, 2017
@jakirkham

Any thoughts on the comment above?

minrk commented Mar 28, 2017

I simply don't know about the interactions of HDF5 threads and MPI, so I can't really speak to that. Since the MPI build will presumably be another feature/variant, I don't think it should matter much whether the default build has this flag or not. If it conflicts with the MPI build, the MPI build can disable it.

Your point over there about putting thread support in a non-default build makes sense to me if it is really unsafe. I can't speak to its actual safety, though. It sounds like it's not actually less safe, just the same as having no thread support enabled, for some APIs.

minrk commented Mar 28, 2017

Speaking of which, any way I could get a green light on MPI variants?

mikaem commented Mar 28, 2017

The thread support in #57 does not impact the MPI support; they are two different issues. AFAIK you can have an MPI version of HDF5 built with or without threading, though I'm not really an expert on threading.

xmjiao commented Mar 28, 2017

It is a common practice to overlap I/O with computation using multithreading, so HDF5 should be built with the thread-safety option by default. There is some overhead with thread-safety in general, but HDF5 is I/O bound, so the overhead is negligible.

+1 on libhdf5-openmpi and libhdf5-mpich variants.
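
As a sketch of the overlap pattern described above, assuming a thread-safe (`--enable-threadsafe`) HDF5 build; the file name, dataset path, and sizes here are hypothetical:

```c
/* Overlap I/O with computation using a background reader thread.
 * Requires a thread-safe HDF5 build; link with -lhdf5 -lpthread.
 * "data.h5" and "/block" are hypothetical, assumed to hold <= 1024 doubles. */
#include <hdf5.h>
#include <pthread.h>
#include <stdio.h>

static double buf[1024];

static void *reader(void *unused)
{
    (void)unused;
    /* In a thread-safe build, HDF5's global lock serializes these calls;
     * in a non-thread-safe build, calling HDF5 from two threads is unsafe. */
    hid_t file = H5Fopen("data.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dset = H5Dopen2(file, "/block", H5P_DEFAULT);
    H5Dread(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);
    H5Dclose(dset);
    H5Fclose(file);
    return NULL;
}

int main(void)
{
    pthread_t io_thread;
    pthread_create(&io_thread, NULL, reader, NULL);

    /* ... computation proceeds here while the read is in flight ... */

    pthread_join(io_thread, NULL);
    printf("background read finished\n");
    return 0;
}
```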

ocefpaf commented Mar 28, 2017

> so HDF5 should be built with the thread-safety option by default.

That is true pretty much everywhere in the packaging world BTW. conda-forge is behind on this.

minrk commented Feb 6, 2018

I'd love to get this going.

Based on recent experiences and discussions with mumps-mpi and the mpi metapackage, I think features are not the way to go, and that a dedicated hdf5-mpi package is probably the right thing. The current choices seem to be:

  1. one package, differentiated by build string
  2. use a separate package for mpi builds

Downsides to build string vs features:

  • unlike features, build strings don't have a clear no-features default. The empty build string does happen to come first, but I don't think that's something that can be depended upon.
  • it's harder to express that the 'non-mpi' variant should be the default unless specified manually

Upside to build string vs features:

  • Features don't seem to do what people expect, and seem to be discouraged by conda devs (see mpi variants discussion)

Pros for build string vs separate package:

  • single package, single recipe
  • packages that have only runtime dependency on hdf5 can work with both
  • packages that can use either mpi or non-mpi hdf5 can express this dependency, since a plain hdf5 dependency matches both (are there any examples of this, i.e. cases where building against serial hdf5 and running against parallel hdf5 would work?)
  • No need to resolve conflicts between packages providing the same files

Pros for separate package:

  • MPI will never be pulled in implicitly for people depending on hdf5
  • existing packages depending on hdf5 do not need to be updated to add hdf5 *nompi* to avoid pulling in mpi-linked hdf5, which may not be compatible

Cons for separate package:

  • separate packages for one library opens the possibility for envs that try to install both, introducing possible conflicts. Conflicts can be solved by a third metapackage, if desired.
  • separate recipe (typically in a branch, see ptscotch and mumps-mpi) is a bit tedious to maintain

So I would say that the deciding factor is whether packages that link hdf5 without MPI can commonly run against hdf5 that is built with MPI. If that's not true, then I think it would be better to create a separate package, to avoid existing packages pulling in the mpi variants when only the nompi variant will work. If it is true, then using build strings is probably best and simplest.

@ChristopherHogan

What's the status here? Is there anything I can do to help get this going?

minrk commented Nov 20, 2018

hdf5 mpi builds are all working in #90, if folks want to give it a review.
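
For anyone who wants to exercise the MPI builds from #90, a typical collective write looks roughly like the sketch below (one row per rank; the file and dataset names are illustrative and error checking is omitted):

```c
/* Each rank writes its own row of a shared dataset, collectively.
 * Build (illustrative): mpicc collective_write.c -lhdf5
 * Run   (illustrative): mpiexec -n 4 ./a.out */
#include <mpi.h>
#include <hdf5.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Open one file across all ranks via the MPI-IO driver. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("ranks.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* Shared dataset with one row of 4 ints per rank. */
    hsize_t dims[2] = {(hsize_t)size, 4};
    hid_t fspace = H5Screate_simple(2, dims, NULL);
    hid_t dset = H5Dcreate2(file, "rows", H5T_NATIVE_INT, fspace,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Each rank selects its own row in the file. */
    hsize_t start[2] = {(hsize_t)rank, 0}, count[2] = {1, 4};
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);
    hid_t mspace = H5Screate_simple(2, count, NULL);

    /* Collective transfer: all ranks participate in the write. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

    int row[4] = {rank, rank, rank, rank};
    H5Dwrite(dset, H5T_NATIVE_INT, mspace, fspace, dxpl, row);

    H5Pclose(dxpl); H5Sclose(mspace); H5Dclose(dset); H5Sclose(fspace);
    H5Fclose(file); H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}
```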
