Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mpi walker transmission fix #2232

Merged
merged 13 commits into from
Jan 24, 2020

Conversation

PDoakORNL
Copy link
Contributor

putting this here mostly to get CI, but nearly there.

@@ -22,14 +22,14 @@ ENDIF(QMC_OMP)
# Set clang specific flags (which we always want)
ADD_DEFINITIONS( -Drestrict=__restrict__ )

SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fomit-frame-pointer -fstrict-aliasing")
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fomit-frame-pointer -fstrict-aliasing -D__forceinline=inline")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this causing problems in debug build?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it appears that the first -fomit -fno-omit wins.

Copy link
Contributor

@ye-luo ye-luo Jan 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CMAKE_XXX_FLAGS is added to all builds.
For a build release CMake invokes both $CMAKE_XXX_FLAGS $CMAKE_XXX_FLAGS_RELEASE. If all the builds are intended, add to CMAKE_XXX_FLAGS. If release builds are intended, add to CMAKE_XXX_FLAGS_RELEASE

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean the CMAKE_C(XX)_FLAGS should not contain -f-omit-frame-pointer if you don't want that in the flags for the debug build. This fixes that. While leaving this optimization in the default RelWithDebugInfo and Release.

++it_walker_elecs;
}
for(int i = 0; i < mcp_walkers_.size(); ++i)
walker_elecs_[i].get().loadWalker(mcp_walkers_[i], true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like this much better than iterators.

@@ -30,7 +30,7 @@ TEST_CASE("DMCBatched::calc_default_local_walkers", "[drivers]")


auto testWRTWalkersPerRank = [&dtest](int walkers_per_rank) {
DMCBatched dmc_batched(std::move(dtest()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to fix here if it works now, but it is unsafe. When I was working on #2221, I noticed the MCPopulation move constructor and move assign operator cannot be compiler generated by default because ParticleSet cannot be moved. The reason is that the compiler tries to use copy assign operator of ParticleSet which should be either deleted or implemented just like the copy constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since most of this PR has been long exploration of the unintended consequences of "unsafe" things I'd like to just leave this fixed.

@@ -42,7 +42,6 @@ OperatorBase::OperatorBase() : myIndex(-1), Dependants(0), Value(0.0), tWalker(0
*/
void OperatorBase::mw_evaluate(const RefVector<OperatorBase>& O_list, const RefVector<ParticleSet>& P_list)
{
#pragma omp parallel for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not change this. This is needed so far. I verified that this works very well with OMP_NUM_THREADS=16 and 1 crowd in the input.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So now this is OMP_NUM_THREADS=1,16 behavior but induced by forcing one crowd? I'll put this back in for now (as I expect this has to do with AMD support), but we need to continue to abstract threading and the actual runtime thread setup needs to be handled in one place and not via a proprietary colleciton of ENV. The adhoc nature of OMP thread safety and its opaqueness when reading the source and compiling time is going to continue to be an issue until we do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If running 1 crowd is requested, the first OpenMP level is an inactive threading level, there is no need to set nested threading. When using a list of thread numbers like OMP_NUM_THREADS=1,16, they refer to active threading levels. I used this on Summit to compare with the CUDA code which is conceptually 1 crowd.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get what you mean about the semantics of openmp but this throws threading control back to implicit openmp threading. In application threading model we should be using to reason about thread scopes and synchronization this is a 1,16

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can discuss in person with more details. This raw OpenMP implementation doesn't ignore the crowd level concurrency. Both crowd level and walker level are needed.

@PDoakORNL PDoakORNL changed the title [WIP] Mpi walker transmission fix Mpi walker transmission fix Jan 23, 2020
@prckent
Copy link
Contributor

prckent commented Jan 23, 2020

General comment: Getting MPI working with DMC is blocking the release. Once this is in the focus should be on getting correct statistics. Threading philosophy and design discussions will need to be done, but later.

@@ -457,11 +457,13 @@ void DMCBatched::setMultiplicities(const DMCDriverInput& dmcdriver_input,
auto setMultiplicity = [&dmcdriver_input, &rng](MCPWalker& walker) {
constexpr RealType onehalf(0.5);
constexpr RealType cone(1);
RealType M = walker.Weight;
RealType M;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I submitted #2243 to follow up the remaining issue here.

@ye-luo ye-luo merged commit 6c69d7d into QMCPACK:develop Jan 24, 2020
@ye-luo
Copy link
Contributor

ye-luo commented Jan 24, 2020

Not sure if this should fix the non-MPI case as well or not.
I still get short test failure

Command: "/scratch/opt/cleanup/PDoakORNL/tests/scripts/check_scalars.py" "--ns" "3" "--series" "1" "-p" "qmc_short_vmcbatch_dmcbatch" "-e" "2" "--le" "-21.844975 0.00779"
Directory: /scratch/opt/cleanup/PDoakORNL/build_gnu_real/tests/solids/diamondC_2x1x1_pp/short-diamondC_2x1x1_pp-vmcbatch_dmcbatch_sdj-1-16
"short-diamondC_2x1x1_pp-vmcbatch_dmcbatch_sdj-1-16-totenergy" start time: Jan 23 21:26 CST
Output:
----------------------------------------------------------
Tests for series 1
  Testing quantity: LocalEnergy
    reference mean value     : -21.84497500
    reference error bar      :   0.00779000
    computed  mean value     : -21.68150807
    computed  error bar      :   0.01288663
    pass tolerance           :   0.02337000  (  3.00000000 sigma)
    deviation from reference :   0.16346693  ( 20.98420103 sigma)
    error bar of deviation   :   0.01505820
    significance probability :   1.00000000  (gaussian statistics)
    status of this test      :   fail

Test status: fail
<end of output>

@ye-luo
Copy link
Contributor

ye-luo commented Jan 24, 2020

I ran the same input file with 2 MPI and the result is still far from reference value -21.84497500 listed above.

                            LocalEnergy               Variance           ratio 
qmc_short_vmcbatch_dmcbatch  series 0  -21.645694 +/- 0.000000   1.215875 +/- 0.000000   0.0562 
qmc_short_vmcbatch_dmcbatch  series 1  -21.681356 +/- 0.011070   0.988482 +/- 0.012590   0.0456 

More fixes are on the way?

@PDoakORNL PDoakORNL deleted the mpi_walker_transmission_fix branch February 23, 2022 00:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants