Inconsistent results from qmca when analysing multiple files #4556
Comments
FYI: if you lock the equilibration steps, consistent results can be obtained.
|
So this relates just to the default method of selecting the equilibration length? If so, the observed behavior is not a bug. That algorithm uses a random number generator to avoid bias in the selection process. |
I would agree with @prckent that qmca should not give a different set of results.
|
If you choose either edge of the selection window, you will systematically bias the result to be high or low. If you choose the middle, you are biasing the result toward the mean of the last half of the data. Up to you/choose your bias. |
While I appreciate the need to have a random number to avoid bias, the randomness has caught me out several times when looking at run-to-run reproducibility from qmca. There's always the xkcd solution :) |
If you don't want to choose, but like the aesthetics of same-input/same-output, we could just make a hash of the input data and use that to seed the rng. |
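A minimal sketch of the hash-seeding idea, assuming the data for one file is available as a NumPy array. The helper name `rng_from_data` is hypothetical and this is not taken from qmca itself.

```python
import hashlib

import numpy as np


def rng_from_data(data):
    """Return a NumPy Generator seeded from a hash of the array contents.

    Hypothetical helper: identical input data always yields the same
    random stream, independent of what was analysed before it.
    """
    arr = np.ascontiguousarray(data, dtype=np.float64)
    digest = hashlib.sha256(arr.tobytes()).digest()
    # Use the first 8 bytes of the digest as a 64-bit integer seed.
    seed = int.from_bytes(digest[:8], "little")
    return np.random.default_rng(seed)


# Made-up energies, for illustration only.
energies = np.array([-10.2, -10.5, -10.4, -10.45, -10.43])
a = rng_from_data(energies).integers(0, len(energies))
b = rng_from_data(energies).integers(0, len(energies))
# a and b are equal because both generators were seeded from the same bytes.
```

One caveat of this approach is that any change in the bytes (for example, reading the same numbers at a different precision) changes the seed.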
Thoughts here? |
I haven't had any chance since yesterday to examine this but I think at least a simple fix is needed here. First of all, the use of randomness isn't documented or at least I totally missed it and was surprised. This could be fixed with a few extra sentences in the help docs. Second, and I consider this an actual bug, my understanding is that the analysis results today depend on the order that the files are specified in or if they are analyzed separately. One solution would be to use the same seed for each file. Third, if we are going to use randomness at all, there needs to be a way of setting the seed. I favor the XKCD solution suggested by Mark as the default with an option for using a robust initialization from recent numpy. If the randomness is only used to find the equilibration period, I am pretty sure that there are several published/peer-reviewed schemes that do not require any randomness. |
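To illustrate the order dependence described above, here is a toy sketch. It assumes a single random generator is shared across files and is not qmca's actual code. The file names, `pick_cut`, and the fixed seed are all hypothetical.

```python
import numpy as np


def pick_cut(n_steps, rng):
    """Toy stand-in for a randomized choice of equilibration cutoff."""
    return int(rng.integers(0, n_steps // 2))


# Hypothetical file names mapped to their number of steps.
files = {"orig.s000.scalar.dat": 1000, "debug.s000.scalar.dat": 1000}


def analyze_shared(order, seed=4):
    """One generator shared by all files: draws depend on file order."""
    rng = np.random.default_rng(seed)
    return {name: pick_cut(files[name], rng) for name in order}


def analyze_per_file(order, seed=4):
    """A fresh generator with the same seed for each file."""
    return {
        name: pick_cut(files[name], np.random.default_rng(seed))
        for name in order
    }


forward = list(files)
backward = forward[::-1]
# Per-file seeding gives each file the same cutoff in either order.
same = analyze_per_file(forward) == analyze_per_file(backward)
```

With the shared generator, whichever file is listed first consumes the first draw, so a given file's cutoff can change when the order changes.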
The equilibration detection algorithm is heuristic and meant for convenience (good faith attempt at statistically valid result) rather than to be relied on in all cases for production. I will implement a modification that sets the seed reproducibly for each file. In general, a more deeply vetted scheme is desirable. If one is proposed here, I will look into implementing it. |
See #4557 |
I'm concerned about creating a situation where the inconsistent results are more subtle. The hash fix reduces the chances of running into inconsistent results, but doesn't eliminate them completely (will give details on the PR). Is there any way to get the equilibration time as an output from qmca? It shows up in the trace graph, but is there any textual output? Should something be printed by default, at least with "-e auto"? The cost is more complicated output, but it might be a good reminder that the equilibration time is being computed and removed from the data.
Describe the bug
While checking #4549 I noticed that at some recent point we have reordered the scalar.dat output. The columns are correctly labeled. Unfortunately, when files of different versions are mixed, qmca doesn't completely handle the situation and gives inconsistent results. They are nearly OK, which suggests the correct data is being used for the most part.
To Reproduce
Analyze different versions of scalar files simultaneously. In the following notice the debug* results are reported consistently but the orig* ones are not. Quantities other than the energy are suspect, e.g. samples, although I noticed this first by the energy.
Archive of the respective files attached. Used qmca from develop.
Expected behavior
Either abort in this scenario (easiest) or give consistent results.
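The "abort" option could be sketched as a header comparison before analysis. This assumes each file begins with a `#`-prefixed line listing the column labels. The function names are hypothetical and not part of qmca.

```python
def read_columns(path):
    """Return the column labels from the first '#' header line of a file."""
    with open(path) as f:
        for line in f:
            if line.startswith("#"):
                return line.lstrip("#").split()
    raise ValueError(f"no header line found in {path}")


def check_consistent_columns(paths):
    """Raise ValueError if the files do not share the same column order."""
    reference = read_columns(paths[0])
    for path in paths[1:]:
        columns = read_columns(path)
        if columns != reference:
            raise ValueError(
                f"column layout of {path} differs from {paths[0]}"
            )
    return reference
```

Calling `check_consistent_columns` on the list of files before any statistics are computed would stop mixed-version input early rather than produce subtly different numbers.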
System:
Develop, runs & analysis on nitrogen2 with amdclang cpu nightly configuration.
Additional context
scalar.tar.zip