Increase tolerance for k-shell identification in LongRange potentials #2137

Merged: 8 commits merged into QMCPACK:develop from shell_tolerance on Jan 6, 2020

Conversation

@rcclay commented Dec 9, 2019

This is a temporary fix for Issue #2105 until a more general solution can be discussed.

@prckent changed the title from "Increase tolerance for k-shell identification" to "Increase tolerance for k-shell identification in LongRange potentials" on Dec 9, 2019
@prckent commented Dec 9, 2019

OK to test

@jtkrogel commented Dec 9, 2019

It would be very interesting to test the limits of the code following this change, e.g. find the size of cell for which the LR code fails against the reference.

```diff
 for (int ik = 0; ik < numk; ik++)
 {
-  int k_ind = static_cast<int>(ksq_tmp[ik] * 1000);
+  int k_ind = static_cast<int>(ksq_tmp[ik] * 100000000);
```
A reviewer commented on this change:
Should we also change to long int here? I don't see why we are limiting ourselves to int for an operation that has proved delicate.

Another reviewer replied:
Agree. Since numk can be huge, as can all the other integer indices, why not have long ints throughout?
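As an aside, a minimal sketch of the wider-key idea raised in these review comments; the function and container names are illustrative, not QMCPACK's actual structures:

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Sketch: group k-point indices into shells using a 64-bit key so that the
// scaled |k|^2 value cannot overflow a 32-bit int.
std::map<std::int64_t, std::vector<int>> groupShellsByKey(const std::vector<double>& ksq_tmp)
{
  std::map<std::int64_t, std::vector<int>> shells;
  const int numk = static_cast<int>(ksq_tmp.size());
  for (int ik = 0; ik < numk; ik++)
  {
    const auto k_ind = static_cast<std::int64_t>(ksq_tmp[ik] * 100000000);
    shells[k_ind].push_back(ik); // k-points sharing a key land in one shell
  }
  return shells;
}
```

A 64-bit key stays in range even when |k|^2 * 1e8 exceeds 2^31 - 1, which is exactly the overflow scenario discussed below.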

@prckent commented Dec 9, 2019

Well, this update might have already pushed the limits, because CI is failing in Coulomb-related unit tests! Perhaps a cast is needed, as @jtkrogel suggests. I suggest a hard check for possible overflow as well.

Simply not using an integer map and instead using the actual k-vector length as a float would carry much less risk ("premature optimization causes bugs"). To be discussed later, once we have some less wrong / safer code in production.

@jtkrogel commented Dec 9, 2019

We could definitely use a different algorithm to populate the shell table. Simply sorting the ksq_tmp vector and then making a single pass over it, checking the difference between adjacent values against a real-valued tolerance, should allow a more robust tabulation into shells.

Another issue is more fundamental: the radial difference between adjacent shells in any type of rectilinear grid will rapidly decrease with radius, potentially leading to precision problems. It may or may not be an issue in practice, though.
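A minimal sketch of that sort-and-sweep tabulation, assuming ksq_tmp holds double-precision |k|^2 values (the function name and return layout are illustrative):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Sketch: sort k-points by |k|^2 and start a new shell whenever adjacent
// values differ by more than a real-valued tolerance. Returns, per shell,
// the original k-point indices.
std::vector<std::vector<int>> buildShells(const std::vector<double>& ksq, double tol)
{
  std::vector<int> order(ksq.size());
  std::iota(order.begin(), order.end(), 0);
  std::sort(order.begin(), order.end(),
            [&ksq](int a, int b) { return ksq[a] < ksq[b]; });

  std::vector<std::vector<int>> shells;
  for (std::size_t i = 0; i < order.size(); ++i)
  {
    const bool startNewShell =
        shells.empty() || std::abs(ksq[order[i]] - ksq[order[i - 1]]) > tol;
    if (startNewShell)
      shells.emplace_back();
    shells.back().push_back(order[i]);
  }
  return shells;
}
```

Comparing each value against its predecessor in sorted order, rather than against an integer key, avoids both the overflow risk and the rounding-at-a-bin-boundary behavior of the current map.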

@prckent commented Dec 9, 2019

@rcclay Your update arrived as I was commenting. Use a long long for the numk loop as well? All the integer indices here can be large.

@rcclay commented Dec 9, 2019

One thing I'd like to establish is whether there are any actual computational savings by doing this grouping by k-shell. Looking at the actual long-range evaluation, this is done by a sum over rho_k where rho_k is not radially symmetric... the code first sums over every rho_k in the shell, then multiplies the sum by v_|k|. While this is fine, I'm not sure how much we save by doing it this way as opposed to summing over all k-points.
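For reference, a schematic of the two evaluation orders being compared; rhok_prod, shell_of, and vk_shell are placeholders (the per-k-point density product, the k-point-to-shell map, and v(|k|) per shell), not the actual QMCPACK data layout:

```cpp
#include <cstddef>
#include <vector>

// (a) Shell-grouped evaluation: accumulate the sum over rho_k within each
//     shell first, then multiply each shell sum by v(|k|) once.
double evalByShell(const std::vector<double>& rhok_prod,
                   const std::vector<int>& shell_of,
                   const std::vector<double>& vk_shell)
{
  std::vector<double> shell_sum(vk_shell.size(), 0.0);
  for (std::size_t ik = 0; ik < rhok_prod.size(); ++ik)
    shell_sum[shell_of[ik]] += rhok_prod[ik];
  double e = 0.0;
  for (std::size_t s = 0; s < vk_shell.size(); ++s)
    e += vk_shell[s] * shell_sum[s];
  return e;
}

// (b) Direct evaluation: multiply every k-point term by v(|k|) individually.
double evalByKpoint(const std::vector<double>& rhok_prod,
                    const std::vector<int>& shell_of,
                    const std::vector<double>& vk_shell)
{
  double e = 0.0;
  for (std::size_t ik = 0; ik < rhok_prod.size(); ++ik)
    e += vk_shell[shell_of[ik]] * rhok_prod[ik];
  return e;
}
```

The shell-grouped form saves roughly one multiplication per k-point at the cost of the shell bookkeeping, which is the trade-off being questioned here.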

@rcclay commented Dec 9, 2019

@prckent Here's my thinking: provided that int is 32 bits, this will accommodate a k-space grid of roughly 1024x1024x1024 before overflow. If we don't think that's big enough, I'll go ahead and change the loop variable to a 64-bit long long.
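(For context, the arithmetic behind that estimate: 1024 x 1024 x 1024 = 2^30, about 1.07e9, which sits below the 32-bit signed limit of 2^31 - 1, about 2.15e9, so a 32-bit int loop counter covers numk up to roughly that grid size.)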

@prckent commented Dec 9, 2019

@rcclay If we can improve the code by simplifying it as you suggest, that would be a big improvement. However, beware that there could be numerical reasons why things are done the way they are. We can discuss after getting this fix in.

@prckent commented Dec 9, 2019

@rcclay Either use long long, or add an error trap for the case where there are too many k-points for the range of int.
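A minimal sketch of such an error trap (the function name and message text are illustrative):

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>

// Sketch: abort early if the k-point count cannot be represented by the
// 32-bit loop index used in the shell-identification loop.
inline void checkKpointCount(std::size_t numk)
{
  if (numk > static_cast<std::size_t>(std::numeric_limits<int>::max()))
    throw std::runtime_error("Too many k-points for a 32-bit loop index; "
                             "use a 64-bit index in the k-shell code.");
}
```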

@rcclay commented Dec 9, 2019

Rather expectedly, the number of k-shells is pretty sensitive to tolerance in mixed precision. I will need to think about this. This might be a compelling reason to do the full sum over k-points...

@ye-luo commented Dec 9, 2019

My two cents: 1. We need an immediate fix. 2. We need a longer-term fix with more careful thought. The mapping gives me a headache. QE has g-vector sorting, which can be very slow, and parallel sorting is tricky. We need to re-evaluate the cost and see whether the current optimization is worthwhile.

@ye-luo commented Dec 9, 2019

I don’t think the key problem is sorting but some unsafe assumptions made in the past.

@rcclay commented Dec 10, 2019

OK. Making the mixed precision build work with this change is going to be a bit nontrivial. I think this workaround might be better implemented by a user rather than being pushed into the main code... at least until a proper long-term fix is in place.

@prckent commented Dec 10, 2019

Where is the problem? Can you upgrade the precision to double/full throughout the setup code? I would prefer that the mixed-precision code be slower but less dangerous in v3.9.0.

@rcclay commented Dec 12, 2019

The problem is that KContainer uses PosType, which means it's float in mixed precision. If you determine the shell based on |k|^2, you don't really have the precision to resolve much past where the threshold is currently set. This can be worked around by changing the types of KContainer and |k|^2, but it starts looking like a lot of work for what was supposed to be a quick workaround...
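To illustrate the precision limit being described, a standalone snippet (not QMCPACK code) showing that single precision cannot resolve anywhere near the 1e-8 scale implied by the new multiplier once |k|^2 is of order 10:

```cpp
#include <cmath>
#include <cstdio>

int main()
{
  // Spacing between adjacent representable floats near |k|^2 ~ 10:
  const float ksq  = 10.0f;
  const float next = std::nextafter(ksq, 20.0f);
  std::printf("float spacing near %.1f: %.3g\n", ksq, next - ksq); // ~1e-6
  // A 1e8 multiplier asks for 1e-8 resolution, so shells computed from a
  // float |k|^2 can split (or merge) unpredictably at this magnitude.
  return 0;
}
```

This is why working around it means promoting KContainer and the stored |k|^2 to double, as described above.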

@prckent commented Dec 16, 2019

Confirming that this builds with Intel2019.

The "bug" tests for graphene z10 nearly pass, while for z30 the errors are still large. Is this consistent with your runs? z30 seems very slow to converge even when changing LR_dim_cutoff.

Unless you have other suggestions, I think we should merge this, putting mainline in a much safer state.

```
[pk7@oxygen build_intel2019]$ ctest -VV -R "bug-deterministic-grapheneC_1x1_pp-vmc_sdj" |grep -A 10 "ion-ion energy"
1201:   Reference ion-ion energy: 14.002118126747
1201:   QMCPACK   ion-ion energy: 14.003381954628
1201:             ion-ion diff  : 0.0012638278801305
1201:             diff/atom     : 0.00063191394006523
1201:             tolerance     : 0.0003
1201:
1201: Please try increasing the LR_dim_cutoff parameter in the <simulationcell/>
1201: input.  Alternatively, the tolerance can be increased by setting the
1201: LR_tol parameter in <simulationcell/> to a value greater than 0.0003.
1201: If you increase the tolerance, please perform careful checks of energy
1201: differences to ensure this error is controlled for your application.
1201:
--
1203:   Reference ion-ion energy: 81.605978721118
1203:   QMCPACK   ion-ion energy: 82.688199774306
1203:             ion-ion diff  : 1.0822210531883
1203:             diff/atom     : 0.54111052659414
1203:             tolerance     : 0.0003
1203:
1203: Please try increasing the LR_dim_cutoff parameter in the <simulationcell/>
1203: input.  Alternatively, the tolerance can be increased by setting the
1203: LR_tol parameter in <simulationcell/> to a value greater than 0.0003.
1203: If you increase the tolerance, please perform careful checks of energy
1203: differences to ensure this error is controlled for your application.
1203:
Errors while running CTest
```

@rcclay commented Dec 16, 2019

@prckent I've reproduced the failing tests. It looks like for the Z30, I can't increase LR_DIM_CUTOFF much past 20 before the optimized breakup starts turning up singular values in the SVD solve for the breakup coefficients. I will check this against the Ewald handler and see what happens.

@rcclay commented Dec 16, 2019

The plot thickens. Here's what I get when I swap out the Optimized Breakup with the Ewald3D handler. This indicates to me that there might be another issue in the optimized breakup handler.

Here's the Z10 case:

| LR_DIM_CUTOFF | ion-ion energy |
| --- | --- |
| 15 | 13.999613108 |
| 30 | 14.002116515 |

Reference: 14.002118126747

Here's the Z30 case:

| LR_DIM_CUTOFF | ion-ion energy |
| --- | --- |
| 15 | 81.603421755 |
| 30 | 81.605977135 |

Reference: 81.605978721118

Comparing this with the old behavior, it looks like this PR just ensures that the k-space plus short-range sums converge to the correct answer, provided the breakup handler is implemented correctly. Compared to the numbers posted here, the Ewald numbers in the current develop branch are a little farther from the correct answer (by less than 1 mHa), but the large error we're seeing in the failed deterministic test almost certainly has to be localized somewhere in LRHandlerTemp.h.

@prckent commented Jan 6, 2020

@rcclay I suggest we merge this now since it fixes at least one bug, if not the slow convergence of the optimized breakup. Do you agree?

@prckent added this to the v3.9.0 Release milestone on Jan 6, 2020
@rcclay commented Jan 6, 2020

I agree.

@prckent merged commit 6406aa3 into QMCPACK:develop on Jan 6, 2020
@rcclay deleted the shell_tolerance branch on May 8, 2020
@prckent added a commit that referenced this pull request on Feb 8, 2022