Increase tolerance for k-shell identification in LongRange potentials #2137
Conversation
OK to test
It would be very interesting to test the limits of the code following this change, e.g. to find the cell size at which the LR code fails against the reference.
src/LongRange/KContainer.cpp
for (int ik = 0; ik < numk; ik++)
{
-   int k_ind = static_cast<int>(ksq_tmp[ik] * 1000);
+   int k_ind = static_cast<int>(ksq_tmp[ik] * 100000000);
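For readers following along, here is a minimal sketch of what this binning does; the map-of-vectors container is illustrative, not the actual KContainer bookkeeping:

```cpp
#include <map>
#include <vector>

// Sketch only: k-vectors are binned into shells by truncating scaled
// |k|^2 to an integer. Two k-points share a shell iff their truncated
// keys agree, so the multiplier sets the tolerance: 1000 resolves 1e-3
// in |k|^2, while 1e8 resolves 1e-8.
void bin_shells(const std::vector<double>& ksq_tmp,
                std::map<int, std::vector<int>>& shells)
{
  const int numk = static_cast<int>(ksq_tmp.size());
  for (int ik = 0; ik < numk; ik++)
  {
    int k_ind = static_cast<int>(ksq_tmp[ik] * 100000000);
    shells[k_ind].push_back(ik);
  }
}
```

Note that with the 1e8 multiplier, any |k|^2 above roughly 21.5 already exceeds INT_MAX (2^31 - 1 ≈ 2.1e9), which may be related to the overflow and CI concerns raised below.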
Should we also change to `long int` here? I don't see why we are limiting ourselves to `int` for an operation that has proved delicate.
Agree. Since numk can be huge, as can all the other integer indices, why not have long ints throughout?
Well, this update might have already pushed the limits, because the CI is failing in Coulomb-related unit tests! Perhaps a cast is needed, as @jtkrogel suggests. I suggest a hard check for possible overflow as well. Simply not using an integer map and using the actual k-vector length as a float would carry much less risk ("premature optimization causes bugs"). To be discussed later, once we have some less wrong / safer code in production.
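As a hedged sketch of the cast-plus-hard-check idea (a hypothetical helper, not the committed fix):

```cpp
#include <limits>
#include <stdexcept>

// Hypothetical: widen to double before scaling and trap overflow
// instead of silently wrapping into a wrong shell key.
inline long long shell_key(double ksq, double scale = 1.0e8)
{
  const double scaled = ksq * scale;
  if (scaled >= static_cast<double>(std::numeric_limits<long long>::max()))
    throw std::overflow_error("k-shell key: |k|^2 too large for this tolerance");
  return static_cast<long long>(scaled);
}
```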
We could definitely use a different algorithm to populate the shell table; simply sorting the |k|^2 values and grouping nearly equal ones would do it. Another issue is more fundamental: the radial difference between adjacent shells in any type of rectilinear grid will rapidly decrease with radius, potentially leading to precision problems. It may or may not be an issue in practice, though.
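A rough sketch of that sort-based alternative (hypothetical names; the tolerance choice remains the open question, since the inter-shell spacing shrinks at large |k| as noted above):

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical: sort k-points by |k|^2 and start a new shell whenever
// the gap to the previous value exceeds a tolerance. No integer map
// is needed, but tol must still resolve closely spaced shells.
std::vector<int> assign_shells(const std::vector<double>& ksq, double tol)
{
  std::vector<std::size_t> order(ksq.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::sort(order.begin(), order.end(),
            [&ksq](std::size_t a, std::size_t b) { return ksq[a] < ksq[b]; });

  std::vector<int> shell(ksq.size(), 0);
  int s = 0;
  for (std::size_t j = 1; j < order.size(); j++)
  {
    if (ksq[order[j]] - ksq[order[j - 1]] > tol)
      s++;
    shell[order[j]] = s;
  }
  return shell;
}
```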
@rcclay Your update arrived as I was commenting. Use a long long for the numk loop as well? All the integer indices here can be large.
One thing I'd like to establish is whether there are any actual computational savings from this grouping by k-shell. Looking at the actual long-range evaluation, it is a sum over rho_k, where rho_k is not radially symmetric: the code first sums every rho_k in a shell, then multiplies the sum by v_|k|. While this is fine, I'm not sure how much we save by doing it this way as opposed to summing over all k-points.
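For reference, the grouping only factors the radial potential out of the sum (notation mine, not from the code):

$$\sum_{\mathbf{k}} v_{|\mathbf{k}|}\,\rho_{\mathbf{k}} \;=\; \sum_{s} v_{k_s} \sum_{\mathbf{k}\,\in\,\text{shell } s} \rho_{\mathbf{k}},$$

so the saving is one evaluation (and one multiply) of v per shell instead of per k-point; the rho_k work is identical either way.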
@prckent Here's my thinking: provided that int is 32 bits, this will accommodate a k-space grid of roughly 1024x1024x1024 before overflow. If we don't think that's big enough, I'll go ahead and change the loop variable to a 64-bit long long.
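The arithmetic behind that estimate:

$$1024^3 = 2^{30} \approx 1.07 \times 10^9 \;<\; 2^{31} - 1 \approx 2.15 \times 10^9,$$

so a 32-bit signed int can index every point of a 1024^3 grid, while a 2048^3 grid (2^33 points) would overflow it.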
@rcclay If we can improve the code by simplifying it as you suggest, that would be a big improvement. However, beware that there could be numerical reasons why things are done as they are. We can discuss after getting this fix in.
@rcclay long long, or error trap the situation where there are too many k-points for the range of int.
Rather expectedly, the number of k-shells is quite sensitive to the tolerance in mixed precision. I will need to think about this. This might be a compelling reason to do the full sum over k-points...
My two cents: 1. We need an immediate fix. 2. We need a long-term fix with better thinking. The mapping gives me a headache. QE has g-vector sorting, which can be very slow, and parallel sorting is tricky. We need to reevaluate the cost and see whether the current optimization is worthwhile.
I don't think the key problem is sorting, but rather some unsafe assumptions made in the past.
OK. Making the mixed-precision build work with this change is going to be a bit nontrivial. I think this workaround might be better applied by a user rather than pushed into the main code... at least until a proper long-term fix is in place.
Where is the problem? Can you upgrade the precision to double/full throughout the setup code? I would prefer that the mixed-precision code be slower but less dangerous in v3.9.0.
The problem is that the KContainer is PosType, which means it's float in mixed precision. If you base the shell determination on |k|^2, you don't really have the precision to resolve much past where the threshold is currently set. This can be worked around by changing the types of KContainer and |k|^2, but it starts looking like a lot of work for what was supposed to be a quick workaround...
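A sketch of the partial mitigation being weighed here, with illustrative types rather than KContainer's actual layout:

```cpp
// Accumulate |k|^2 in double even when the stored components are
// single precision, so nearby shells are not merged by float
// round-off. The float-valued components themselves still limit the
// achievable resolution, which is why changing KContainer's stored
// types came up as the more thorough (and more invasive) fix.
template <typename RealType>
double ksq_in_double(const RealType k[3])
{
  const double kx = k[0], ky = k[1], kz = k[2];
  return kx * kx + ky * ky + kz * kz;
}
```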
Confirming that this builds with Intel2019. The "bug" tests for graphene z10 nearly pass, while for z30 the errors are still large. Is this consistent with your runs? z30 seems very slow to converge, even with changing LR_dim_cutoff. Unless you have other suggestions, I think we should merge this, putting mainline in a much safer state.
@prckent I've reproduced the failing tests. It looks like for z30, I can't increase LR_dim_cutoff much past 20 before the optimized breakup starts turning up singular values in the SVD solve for the breakup coefficients. I will check this against the Ewald handler and see what happens.
The plot thickens. Here's what I get when I swap out the optimized breakup for the Ewald3D handler. This indicates to me that there might be another issue in the optimized breakup handler. [Plots attached for the Z10 and Z30 cases.] Comparing this with the old behavior, it looks like this PR just ensures that the k-space + short-range sums converge to the correct answer when the breakup handler is implemented correctly... compared to the numbers posted here, the Ewald numbers in the current develop branch are a little farther from the correct answer (to the tune of < 1 mHa), but the big error we're seeing in the failed deterministic test is almost certainly localized somewhere in LRHandlerTemp.h.
@rcclay I suggest we merge this now, since it fixes at least one bug, if not the slow convergence of the optimized breakup. Do you agree?
I agree.
This is a temporary fix for Issue #2105 until a more general solution can be discussed.