-
Notifications
You must be signed in to change notification settings - Fork 139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial LCAO vgl batched implementation using GEMM #4407
Conversation
Lcao vgl offload merge
8b08510
to
e9231a4
Compare
Test this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please take a look at my changes and let me know if you have questions.
//elec.update(); | ||
elec.R[0][0] = 0.0001; | ||
elec.R[0][1] = 0.0; | ||
elec.R[0][2] = 0.0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
elec.R[0] = {0.0001, 0.0, 0.0}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have better idea how to do this throughout the code base.
// auto elec2 = elec.makeClone(); | ||
sposet->evaluateVGL(elec, 0, psiref_0, dpsiref_0, d2psiref_0); | ||
|
||
REQUIRE(std::real(psiref_0[0]) == Approx(-0.001664403313)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please change all the value checks from REQUIRE to CHECK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have better idea how to do this throughout the code base.
e9231a4
to
602599f
Compare
Test this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I changed a lot in the code. So need non-ANL approval.
Hello Ye, thank you for the review and changes. After taking a look, your changes make sense in terms of removing some of the intermediate objects and more readable/compact initialization of other objects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a design issue that OffloadMWVGLArrays need to be multiwalker resources owned around the DiracDeterminantBatched level but are getting defined in method scopes per call.
see: phi_v_g_l in LCAOrbitalSet::mw_evaluateVGL
and basis_mw in LCAOrbitalSet::mw_evaluateVGLImplGEMM
There are a bunch of small things here that make this hard to read and are adding to tech debt.
Requiring manipulation of an XMLTree to write a test is awful but beyond the scope of this PR.
using vgl_type = VectorSoaContainer<T, OHMMS_DIM + 2>; | ||
using vgh_type = VectorSoaContainer<T, 10>; | ||
using vghgh_type = VectorSoaContainer<T, 20>; | ||
using value_type = T; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type
and _type
are redundant here the naming convention is types are leading capital mixed case so:
Value
implies a type already.
Other types here would be properly named Value
Vgl
, Vgh
, Vghgh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess there SoaBasisSetBase::value_type being used. Need to do renaming outside this PR.
std::vector<ValueType>& ratios, | ||
std::vector<GradType>& grads) const | ||
{ | ||
assert(this == &spo_list.getLeader()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs test coverage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add once I have the shared resource set up.
@@ -30,10 +30,11 @@ namespace qmcplusplus | |||
struct LCAOrbitalSet : public SPOSet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a class
What are the thoughts on getting useful performance either with or without updating the old data structures? If the old structures have the wrong layout, any (re)mapping is an undesired cost that will presumably have to be dealt with "sometime". |
Happy to merge but definitely interested in the answer to my above question. I would not be surprised if changing the data layouts in the old code improved its performance. If changes like this are needed, please keep an eye on the carbon molecule performance tests. |
Test this please |
We are still exploring right now. The new layout makes sense only when walkers are batched I assume. Before this PR, we only have one type of implementation (splines) in the new layout. Now we have two. There will be refactoring in the use side as well. Before it becomes clear what are the most suitable APIs, I'd like to hold massive update to the rest of the code base. |
Addresses QMCPACK#4666. mw_evaluate_notranspose and evaluate_notranspose don't produce bitwise identical results. The problem started by QMCPACK#4407 which introduce a batched evaluation and caused bitwise matrix value checking failure. I printed diff and found the value is at the epsilon of float or double. So no concern. The intention of the assertion was to catch any missing D2H transfer earlier and there is no need of recomputing. So the solution is removing the re-computation.
Proposed changes
This pull request introduces progress towards a batched implementation for the LCAO code. A CPU-only GEMM is introduced for the AO->MO transformation for phi, dphi and d2phi. The API is left unchanged and instead a conversion between old data types and the OffloadVGLArray is done to allow the SPOSet base class to maintain compatibility with the spline code. A separate PR could work towards unifying the API for LCAO and splines.
What type(s) of changes does this code introduce?
Does this introduce a breaking change?
What systems has this change been tested on?
He and EtOH which which are incorporated as unit tests.
Checklist