Increase tolerance for k-shell identification in LongRange potentials #2137
Conversation
OK to test
It would be very interesting to test the limits of the code following this change, e.g. to find the cell size at which the LR code fails against the reference.
src/LongRange/KContainer.cpp
for (int ik = 0; ik < numk; ik++)
{
-   int k_ind = static_cast<int>(ksq_tmp[ik] * 1000);
+   int k_ind = static_cast<int>(ksq_tmp[ik] * 100000000);
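For readers following along, here is a minimal sketch of what this binning does; the map-of-vectors container is illustrative, not the actual KContainer bookkeeping:

```cpp
#include <map>
#include <vector>

// Sketch only: k-vectors are binned into shells by truncating scaled
// |k|^2 to an integer. Two k-points share a shell iff their truncated
// keys agree, so the multiplier sets the tolerance: 1000 resolves 1e-3
// in |k|^2, while 1e8 resolves 1e-8.
void bin_shells(const std::vector<double>& ksq_tmp,
                std::map<int, std::vector<int>>& shells)
{
  const int numk = static_cast<int>(ksq_tmp.size());
  for (int ik = 0; ik < numk; ik++)
  {
    int k_ind = static_cast<int>(ksq_tmp[ik] * 100000000);
    shells[k_ind].push_back(ik);
  }
}
```

Note that with the 1e8 multiplier, any |k|^2 above roughly 21.5 already exceeds INT_MAX (2^31 - 1 ≈ 2.1e9), which may be related to the overflow and CI concerns raised below.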
Should we also change to `long int` here? I don't see why we are limiting ourselves to `int` for an operation that has proved delicate.
Agree. Since numk can be huge, as can all the other integer indices, why not have long ints throughout?
Well, this update might have already pushed the limits, because the CI is failing in Coulomb-related unit tests! Perhaps a cast is needed, as @jtkrogel suggests. I suggest a hard check for possible overflow as well. Simply not using an integer map and using the actual k-vector length as a float would carry much less risk ("premature optimization causes bugs"). To be discussed later, once we have some less wrong / safer code in production.
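As a hedged sketch of the cast-plus-hard-check idea (a hypothetical helper, not the committed fix):

```cpp
#include <limits>
#include <stdexcept>

// Hypothetical: widen to double before scaling and trap overflow
// instead of silently wrapping into a wrong shell key.
inline long long shell_key(double ksq, double scale = 1.0e8)
{
  const double scaled = ksq * scale;
  if (scaled >= static_cast<double>(std::numeric_limits<long long>::max()))
    throw std::overflow_error("k-shell key: |k|^2 too large for this tolerance");
  return static_cast<long long>(scaled);
}
```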
We could definitely use a different algorithm to populate the shell table; simply sorting the |k|^2 values and grouping nearly equal ones would do it. Another issue is more fundamental: the radial difference between adjacent shells in any type of rectilinear grid will rapidly decrease with radius, potentially leading to precision problems. It may or may not be an issue in practice, though.
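A rough sketch of that sort-based alternative (hypothetical names; the tolerance choice remains the open question, since the inter-shell spacing shrinks at large |k| as noted above):

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical: sort k-points by |k|^2 and start a new shell whenever
// the gap to the previous value exceeds a tolerance. No integer map
// is needed, but tol must still resolve closely spaced shells.
std::vector<int> assign_shells(const std::vector<double>& ksq, double tol)
{
  std::vector<std::size_t> order(ksq.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::sort(order.begin(), order.end(),
            [&ksq](std::size_t a, std::size_t b) { return ksq[a] < ksq[b]; });

  std::vector<int> shell(ksq.size(), 0);
  int s = 0;
  for (std::size_t j = 1; j < order.size(); j++)
  {
    if (ksq[order[j]] - ksq[order[j - 1]] > tol)
      s++;
    shell[order[j]] = s;
  }
  return shell;
}
```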
@rcclay Your update arrived as I was commenting. Use a long long for the numk loop as well? All the integer indices here can be large.
One thing I'd like to establish is whether there are any actual computational savings from this grouping by k-shell. Looking at the actual long-range evaluation, it is a sum over rho_k, where rho_k is not radially symmetric: the code first sums every rho_k in a shell, then multiplies the sum by v_|k|. While this is fine, I'm not sure how much we save by doing it this way as opposed to summing over all k-points.
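For reference, the grouping only factors the radial potential out of the sum (notation mine, not from the code):

$$\sum_{\mathbf{k}} v_{|\mathbf{k}|}\,\rho_{\mathbf{k}} \;=\; \sum_{s} v_{k_s} \sum_{\mathbf{k}\,\in\,\text{shell } s} \rho_{\mathbf{k}},$$

so the saving is one evaluation (and one multiply) of v per shell instead of per k-point; the rho_k work is identical either way.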
@prckent Here's my thinking: provided that int is 32 bits, this will accommodate a k-space grid of roughly 1024x1024x1024 before overflow. If we don't think that's big enough, I'll go ahead and change the loop variable to a 64-bit long long.
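The arithmetic behind that estimate:

$$1024^3 = 2^{30} \approx 1.07 \times 10^9 \;<\; 2^{31} - 1 \approx 2.15 \times 10^9,$$

so a 32-bit signed int can index every point of a 1024^3 grid, while a 2048^3 grid (2^33 points) would overflow it.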
@rcclay If we can improve the code by simplifying it as you suggest, that would be a big improvement. However, beware that there could be numerical reasons why things are done as they are. We can discuss after getting this fix in.
@rcclay long long, or error trap the situation where there are too many k-points for the range of int.
Rather expectedly, the number of k-shells is quite sensitive to the tolerance in mixed precision. I will need to think about this. This might be a compelling reason to do the full sum over k-points...
My two cents: 1. We need an immediate fix. 2. We need a long-term fix with better thinking. The mapping gives me a headache. QE has g-vector sorting, which can be very slow, and parallel sorting is tricky. We need to reevaluate the cost and see whether the current optimization is worthwhile.
I don't think the key problem is sorting, but rather some unsafe assumptions made in the past.
OK. Making the mixed-precision build work with this change is going to be a bit nontrivial. I think this workaround might be better applied by a user rather than pushed into the main code... at least until a proper long-term fix is in place.
Where is the problem? Can you upgrade the precision to double/full throughout the setup code? I would prefer that the mixed-precision code be slower but less dangerous in v3.9.0.
The problem is that the KContainer is PosType, which means it's float in mixed precision. If you base the shell determination on |k|^2, you don't really have the precision to resolve much past where the threshold is currently set. This can be worked around by changing the types of KContainer and |k|^2, but it starts looking like a lot of work for what was supposed to be a quick workaround...
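A sketch of the partial mitigation being weighed here, with illustrative types rather than KContainer's actual layout:

```cpp
// Accumulate |k|^2 in double even when the stored components are
// single precision, so nearby shells are not merged by float
// round-off. The float-valued components themselves still limit the
// achievable resolution, which is why changing KContainer's stored
// types came up as the more thorough (and more invasive) fix.
template <typename RealType>
double ksq_in_double(const RealType k[3])
{
  const double kx = k[0], ky = k[1], kz = k[2];
  return kx * kx + ky * ky + kz * kz;
}
```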
Confirming that this builds with Intel2019. The "bug" tests for graphene z10 nearly pass, while for z30 the errors are still large. Is this consistent with your runs? z30 seems very slow to converge, even with changing LR_dim_cutoff. Unless you have other suggestions, I think we should merge this, putting mainline in a much safer state.
@prckent I've reproduced the failing tests. It looks like for z30, I can't increase LR_dim_cutoff much past 20 before the optimized breakup starts turning up singular values in the SVD solve for the breakup coefficients. I will check this against the Ewald handler and see what happens.
The plot thickens. Here's what I get when I swap out the optimized breakup for the Ewald3D handler. This indicates to me that there might be another issue in the optimized breakup handler. [Plots attached for the Z10 and Z30 cases.] Comparing this with the old behavior, it looks like this PR just ensures that the k-space + short-range sums converge to the correct answer when the breakup handler is implemented correctly... compared to the numbers posted here, the Ewald numbers in the current develop branch are a little farther from the correct answer (to the tune of < 1 mHa), but the big error we're seeing in the failed deterministic test is almost certainly localized somewhere in LRHandlerTemp.h.
@rcclay I suggest we merge this now, since it fixes at least one bug, if not the slow convergence of the optimized breakup. Do you agree?
I agree.
This is a temporary fix for Issue #2105 until a more general solution can be discussed.