f08: resolve osx link constant issues #5682

Merged (7 commits) on Jan 25, 2022

Contributor

@hzhou hzhou commented Nov 17, 2021

Pull Request Description

The old mechanism uses separate symbols, e.g. MPIR_C_MPI_UNWEIGHTED, as "link associated" variables between Fortran and C. However, this mechanism depends on how the system linker behaves and may break, for example, on current osx. Instead, we can simply implement utility C functions to get the actual constant, e.g. MPI_UNWEIGHTED, at runtime. This is more robust and removes the extra complexity of maintaining separate exposed variables.

Other constants, such as MPI_ARGV_NULL, are simple constants. The old mechanism initializes the corresponding MPIR_C_MPI_ARGV_NULL at compile time. While this works, it relies on the knowledge/assumption that these constants are defined in mpi.h. In this PR, we convert those to use the C utility functions as well, for simplification and consistency.
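
As a rough illustration of the getter approach described above, here is a minimal C sketch. It deliberately avoids mpi.h: `EXAMPLE_UNWEIGHTED`, `EXAMPLE_ARGV_NULL`, and the `example_get_*` functions are hypothetical stand-ins chosen for this sketch, not the names or values MPICH actually uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for constants that mpi.h would normally provide.
 * The real MPICH names and values may differ. */
static int example_unweighted_obj;
#define EXAMPLE_UNWEIGHTED (&example_unweighted_obj)   /* address-valued constant */
#define EXAMPLE_ARGV_NULL  ((char **) NULL)            /* simple constant */

/* Getter functions: a Fortran wrapper could call these through bind(C)
 * interfaces instead of binding to an exported C variable, so nothing
 * depends on how the linker resolves a shared data symbol. */
int *example_get_unweighted(void)
{
    return EXAMPLE_UNWEIGHTED;
}

char **example_get_argv_null(void)
{
    return EXAMPLE_ARGV_NULL;
}

/* Sanity check using plain assert; no library-internal headers are needed. */
void example_check_getters(void)
{
    assert(example_get_unweighted() == EXAMPLE_UNWEIGHTED);
    assert(example_get_argv_null() == NULL);
}
```

The point of the design is that a function call is resolved through the ordinary procedure linkage path, which tends to behave the same across linkers, whereas sharing a data symbol between two dynamic libraries is where the platform differences show up.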

Fixes #4374

[skip warnings]

Author Checklist

  • Provide Description
    Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • Commits Follow Good Practice
    Commits are self-contained and do not do two things at once.
    Commit message is of the form: module: short description
    Commit message explains what's in the commit.
  • Passes All Tests
    Whitespace checker. Warnings test. Additional tests via comments.
  • Contribution Agreement
    For non-Argonne authors, check contribution agreement.
    If necessary, request an explicit comment from your company's PR approval manager.

@hzhou hzhou force-pushed the 2111_f08_const branch 2 times, most recently from 1b7939e to bac5eff Compare November 17, 2021 17:50
@hzhou hzhou mentioned this pull request Nov 17, 2021
4 tasks
@hzhou
Contributor Author

hzhou commented Nov 17, 2021

test:mpich/ch3/tcp
test:mpich/ch4/ofi

all ✔️

@hzhou hzhou requested a review from raffenet November 17, 2021 22:16
@raffenet
Contributor

The Jenkins results don't contain f08 tests.

This will probably need a review from Cray before we can merge, as they were the original contributors of this code, and helped write it up in our paper. We should also test with the Cray ftn compiler.

@raffenet
Contributor

For reference: https://www.mcs.anl.gov/papers/P5139-0514.pdf section 3.3 describes the named constant implementation.

@hzhou
Contributor Author

hzhou commented Nov 18, 2021

The Jenkins results don't contain f08 tests.

Only the ch3-osx and ch4-asan/ubsan tests enable f08. The former uses gcc-10 and the latter uses gcc-9. The f08 tests are there.

@hzhou
Contributor Author

hzhou commented Nov 18, 2021

This will probably need a review from Cray before we can merge, as they were the original contributors of this code, and helped write it up in our paper. We should also test with the Cray ftn compiler.

Sure, who should we ask for review? Let me ask on Slack.

@SteveO-Cray SteveO-Cray left a comment

This hopefully will leave a general comment.

I think the Fortran/C interface routines look fine, but I have sent a sample to Bill Long to check the Fortran, since I'm not good at Fortran 2008 either.

The changes in general look good to me, but I don't know python or the python binding script and don't have the time right now to figure it out (and I'm out Friday). I will try and look at this more while we wait for a response from Bill.

My two other questions should not be considered blocking issues.

If you are in a hurry for some reason to merge this PR, I would say merge it (i.e. I approve) if you can build MPICH and pass the F08 regression tests. It would be preferable if you could pass using at least 2 different compilers that support Fortran 2008.

  no|none - No Fortran support
- ],,[enable_fortran=f77,f90])
+ ],,[enable_fortran=all])

So a probably dumb question, but what if someone wants to build MPICH with a Fortran compiler without F08 support? How do they build the tests and run them? Admittedly most compilers are F08 compliant, but there might still be some out there that are not. Or someone might want to build with an old GNU compiler. I don't care, and I doubt HPE cares, but in the interest of portability and compatibility you might want to reconsider.

Or I might be completely missing the point of this change.

Contributor Author

Thanks for the review! It is not dumb at all. How to match the test suite to the mpich build is tricky. The issue we were having is that even though mpich is built with f08, the f08 tests are not enabled unless we hard-code the option into the Jenkins job. We have been missing test coverage because of this.

Later in the configure there is code to check whether the compiler actually has 2008 support:

mpich/test/mpi/configure.ac

Lines 1411 to 1413 in 26cc6a9

if test "$enable_f08" = "yes" ; then
PAC_FC_2008_SUPPORT([enable_f08=yes],[enable_f08=no])
fi

So if the compiler is too "old" (e.g. gcc-8), the tests will be disabled later.

@@ -24,9 +24,6 @@ module mpi_f08_link_constants
type(MPI_Status), bind(C, name="MPIR_F08_MPI_STATUS_IGNORE_OBJ"), target :: MPI_STATUS_IGNORE
type(MPI_Status), dimension(1), bind(C, name="MPIR_F08_MPI_STATUSES_IGNORE_OBJ"), target :: MPI_STATUSES_IGNORE

type(c_ptr), bind(C, name="MPIR_C_MPI_STATUS_IGNORE") :: MPIR_C_MPI_STATUS_IGNORE
type(c_ptr), bind(C, name="MPIR_C_MPI_STATUSES_IGNORE") :: MPIR_C_MPI_STATUSES_IGNORE

! Though these two variables are required by MPI-3 Standard, they are not used in MPICH
type(c_ptr), bind(C, name="MPI_F08_STATUS_IGNORE") :: MPI_F08_STATUS_IGNORE ! Point to MPI_STATUS_IGNORE
type(c_ptr), bind(C, name="MPI_F08_STATUSES_IGNORE") :: MPI_F08_STATUSES_IGNORE ! Point to MPI_STATUSES_IGNORE

Are MPI_F08_STATUS_IGNORE and MPI_F08_STATUSES_IGNORE broken similarly to MPIR_C_MPI_STATUS_IGNORE and MPIR_C_MPI_STATUSES_IGNORE? I know MPICH does not use them, but what if the user does?

Contributor Author

@hzhou hzhou Nov 19, 2021

Both MPI_F08_STATUS_IGNORE and MPI_F08_STATUSES_IGNORE are constants (1) defined in mpi.h and they are statically initialized in src/binding/fortran/use_mpi_f08/cdesc.c. It worked, but it relied on the values being constants defined in mpi.h. The new mechanism should be more robust.

MPI_F08_STATUS_IGNORE and MPI_F08_STATUSES_IGNORE work the same way as MPIR_C_MPI_UNWEIGHTED and MPIR_C_MPI_WEIGHTS_EMPTY, but since the symbols on the C side are missing MPICH_API_PUBLIC and are thus not exposed, they are likely broken. However, nothing checks this or makes it matter.
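
For readers unfamiliar with symbol visibility, the following C sketch shows the general idea behind an export marker. It assumes a GCC- or Clang-compatible compiler; `EXAMPLE_API_PUBLIC` and the other `example_*` names are hypothetical, and MPICH's actual MPICH_API_PUBLIC definition may differ.

```c
#include <assert.h>

/* Hypothetical export macro, assuming a GCC- or Clang-compatible compiler. */
#if defined(__GNUC__)
#define EXAMPLE_API_PUBLIC __attribute__((visibility("default")))
#else
#define EXAMPLE_API_PUBLIC
#endif

/* When a shared library is built with -fvisibility=hidden, only symbols
 * marked like this stay visible to other libraries and user programs. */
EXAMPLE_API_PUBLIC int EXAMPLE_EXPORTED_CONSTANT = 1;

/* Without the marker, this symbol would be hidden in such a build and
 * could not be resolved from outside the library. */
int example_unmarked_constant = 1;

EXAMPLE_API_PUBLIC int example_get_exported_constant(void)
{
    return EXAMPLE_EXPORTED_CONSTANT;
}

/* Within a single translation unit both symbols are reachable; the
 * difference only shows up across shared-library boundaries. */
void example_check_visibility(void)
{
    assert(example_get_exported_constant() == EXAMPLE_EXPORTED_CONSTANT);
    assert(example_unmarked_constant == 1);
}
```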

hzhou added a commit to hzhou/mpich that referenced this pull request Nov 26, 2021
This test is fixed with pmodels#5682. Xfail for now.
hzhou added a commit to hzhou/mpich that referenced this pull request Nov 27, 2021
This test is fixed with pmodels#5682. Xfail for now.
hzhou added a commit to hzhou/mpich that referenced this pull request Nov 29, 2021
This test is fixed with pmodels#5682. Xfail for now.
@hzhou hzhou force-pushed the 2111_f08_const branch 2 times, most recently from 4ecea5a to df3a057 Compare November 30, 2021 18:25
@SteveO-Cray

FYI - I am still waiting on a confirmation from Bill Long about the F08 code. I didn't send him enough info first round (my mistake). I unfortunately did not get back to this PR, will try to get to it yet today.

@hzhou
Contributor Author

hzhou commented Nov 30, 2021

FYI - I am still waiting on a confirmation from Bill Long about the F08 code. I didn't send him enough info first round (my mistake). I unfortunately did not get back to this PR, will try to get to it yet today.

No worries. Meanwhile, I noticed that on osx+intel compiler we can't even resolve MPI_IN_PLACE and MPI_BOTTOM. Those two symbols can't be fixed in the same way as the rest of the symbols since they are directly passed to wrappers_c. I am working on the last commit to see if that is due to a dynamic linker issue. You can ignore the last commit in your review for now.

@hzhou
Contributor Author

hzhou commented Nov 30, 2021

test:mpich/custom/pipeline
label: osx
compiler: intel

✔️ That worked.

@SteveO-Cray

No worries. Meanwhile, I noticed that on osx+intel compiler we can't even resolve MPI_IN_PLACE and MPI_BOTTOM. Those two symbols can't be fixed in the same way as the rest of the symbols since they are directly passed to wrappers_c. I am working on the last commit to see if that is due to dynamic linker issue. You can ignore the last commit in your review for now.

That was part of what I was going to look into (sorry, pulled away again). I wondered why MPI_IN_PLACE and MPI_BOTTOM and possibly others were not affected, I guess they are. I will get to finishing this review at some point.

@hzhou
Contributor Author

hzhou commented Dec 1, 2021

No worries. Meanwhile, I noticed that on osx+intel compiler we can't even resolve MPI_IN_PLACE and MPI_BOTTOM. Those two symbols can't be fixed in the same way as the rest of the symbols since they are directly passed to wrappers_c. I am working on the last commit to see if that is due to dynamic linker issue. You can ignore the last commit in your review for now.

That was part of what I was going to look into (sorry, pulled away again). I wondered why MPI_IN_PLACE and MPI_BOTTOM and possibly others were not affected, I guess they are. I will get to finishing this review at some point.

F08 has two types of interfaces. Functions with choice buffers are directly linked to C wrappers in wrappers_c/f08_cdesc.c. Other functions use Fortran wrappers in wrappers_f/f08ts.f90. Since MPI_IN_PLACE and MPI_BOTTOM are only used in choice buffer parameters, they need a linked C symbol that can be directly accessed in the C wrappers. The other symbols are checked in the Fortran wrappers, so direct linkage can be bypassed using the C "getter" functions in this PR.

However, the status constants (MPI_STATUS_IGNORE, MPI_STATUSES_IGNORE) are used in both C wrappers (e.g. MPI_Recv) and Fortran wrappers (e.g. MPI_Wait), so they probably need both the C symbol linkage and a C getter. This has likely been broken since the beginning and we simply missed it due to lack of tests. I'll verify with a test case and then add an additional commit to fix it.

@hzhou
Contributor Author

hzhou commented Dec 1, 2021

However, the status (MPI_STATUS_IGNORE, MPI_STATUSES_IGNORE) are used in both C wrappers (e.g. MPI_Recv) and Fortran Wrappers (e.g. MPI_Wait), so it probably needs both the C symbol linkage and C getter. This is likely broken since the beginning and we simply missed it due to lack of tests. I'll verify with a test case and then add an additional commit to fix it.

Actually the functions with choice buffer parameters go through both Fortran wrappers and C wrappers, so the status constants are checked in the Fortran wrappers, thus we are good. In fact, we could check MPI_IN_PLACE and MPI_BOTTOM in the Fortran wrappers as well. It may be better since that brings consistency to how we treat these constants. If that is preferred, I can amend the current code.

@hzhou
Contributor Author

hzhou commented Dec 1, 2021

In fact, we could check MPI_IN_PLACE and MPI_BOTTOM in the Fortran wrappers as well.

Alas, we can't. The C wrapper needs to directly access the choice buffer parameter as CFI_cdesc_t, thus we can't replace the constant parameter in the Fortran wrappers. The current code is the only solution.
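
A rough C sketch of why the C wrapper ends up needing a linked symbol for these two constants. It uses a simplified stand-in for CFI_cdesc_t (only a base address field) so it compiles without ISO_Fortran_binding.h; `example_cdesc_t`, `EXAMPLE_F08_IN_PLACE`, `EXAMPLE_C_IN_PLACE`, and `example_choice_buffer` are hypothetical names for illustration, not the MPICH code.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for CFI_cdesc_t from ISO_Fortran_binding.h;
 * only the base address field is modeled here. */
typedef struct {
    void *base_addr;
} example_cdesc_t;

/* Sentinel object that the Fortran-side constant would be bound to by name
 * (hypothetical name). The C wrapper compares against its address, which is
 * why this symbol has to be resolvable at link time. */
int EXAMPLE_F08_IN_PLACE;

/* Stand-in for the C-side constant value (illustrative only). */
#define EXAMPLE_C_IN_PLACE ((void *) -1)

/* Translate a choice buffer: if the descriptor points at the sentinel,
 * substitute the C constant; otherwise pass the user's buffer through. */
void *example_choice_buffer(const example_cdesc_t *desc)
{
    if (desc->base_addr == (void *) &EXAMPLE_F08_IN_PLACE) {
        return EXAMPLE_C_IN_PLACE;
    }
    return desc->base_addr;
}

/* Small self-check covering both the sentinel and a regular buffer. */
void example_check_choice_buffer(void)
{
    int user_buf[4] = { 0 };
    example_cdesc_t in_place = { &EXAMPLE_F08_IN_PLACE };
    example_cdesc_t regular = { user_buf };

    assert(example_choice_buffer(&in_place) == EXAMPLE_C_IN_PLACE);
    assert(example_choice_buffer(&regular) == (void *) user_buf);
}
```

Because the descriptor is handed to C as-is, the only way for the C side to recognize the special constant in this sketch is to compare addresses, so a getter alone would not be enough here.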

@hzhou
Contributor Author

hzhou commented Dec 1, 2021

NOTE: added a commit to clean up MPI_F08_STATUS_IGNORE and MPI_F08_STATUSES_IGNORE

test:mpich/custom/pipeline
label: osx
compiler: intel
✔️

test:mpich/ch3/most
test:mpich/ch4/most

Failed to build f08 tests. Apparently, the linkage of C symbols happens at the stage of linking the user program, thus the symbols need to be visible.

@hzhou
Contributor Author

hzhou commented Dec 1, 2021

test:mpich/ch3/most
test:mpich/ch4/most

✔️

@hzhou
Contributor Author

hzhou commented Jan 7, 2022

@raffenet @SteveO-Cray I don't want this PR to become too stale. Should we try to make a list of concerns (which may include "approval from Bill Long")?

@raffenet
Contributor

raffenet commented Jan 7, 2022

@raffenet @SteveO-Cray I don't want this PR to become too stale. Should we try to make a list of concerns (which may include "approval from Bill Long")?

No other concerns from me.

@SteveO-Cray SteveO-Cray left a comment

Apologies. I forgot about this PR. I have not heard any further comments/objections from Bill and I think we have waited long enough. I think you should proceed to merge.

hzhou added 7 commits January 24, 2022 20:38
  • Extend linebreak for general line with comma separations. This is to
    allow using linebreaking for Fortran declarations.
  • The mechanism for external linkage of global variables from the Fortran
    dynamic library to the C dynamic library is fragile despite the Fortran
    specification. For example, currently this mechanism does not work
    on osx.

    Rather than adding more hacks and workarounds, avoid the mess by
    simply getting the symbol using a C function. The f08 interoperability
    using C functions is fairly robust.
  • These are due to link constants issues, addressed in this PR.
  • Both symbols are not used by the C binding and can be contained in the
    Fortran binding. This avoids the issue of resolving common symbols
    depending on the behavior of the dynamic linker.
  • Both MPI_F08_STATUS_IGNORE and MPI_F08_STATUSES_IGNORE don't need to live
    in the C binding since applications are not supposed to use them unless the
    Fortran binding is linked. Moving them to the Fortran binding avoids the
    linkage dependency on the dynamic linker, which appears very fragile and
    currently does not work on osx.

    Both symbols are C symbols, thus there is no need to declare them in
    mpi_f08_link_constants.f90.

    The original comment -- "Although ..., they are not used in ..." is
    misleading. The two symbols are for application use, regardless of whether
    the implementation uses them or not.

    Defining the symbols in mpi.h as external is safe. Users are not supposed to
    use them unless they link with the Fortran binding, i.e. libmpifort.so, which
    will resolve the symbol.
  • This is needed for the `HAVE_VISIBILITY` option. It is also necessary to
    honor the configure for any potential system features.
  • Use assert to avoid including the whole mpich internal header stack.
@hzhou hzhou merged commit f21944a into pmodels:main Jan 25, 2022
@hzhou hzhou deleted the 2111_f08_const branch January 25, 2022 02:46
Successfully merging this pull request may close these issues.

bug/jenkins: F08 failures on osx
3 participants