Nexus: use data hashing in heuristic equilibration detection algorithm #4557

Merged
merged 5 commits into from
Apr 26, 2023


jtkrogel
Contributor

@jtkrogel jtkrogel commented Apr 18, 2023

Proposed changes

The heuristic equilibration detection algorithm that qmca uses when -e is unspecified draws a random number to avoid statistical bias in the selection of the equilibration length. As a result, the estimate of the mean varies each time qmca is run. While this is statistically correct, users expect deterministic behavior. This PR seeds the RNG from a hash of the data being analyzed. The RNG is still used, so the selection remains unbiased, but the output is now deterministic for a given data set.
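As a rough illustration of the idea (not the code in this PR), the sketch below seeds a NumPy generator from a digest of the data and then draws a candidate equilibration length. The helper names `data_seed` and `pick_equilibration`, the use of `hashlib.sha256`, and the `[lo, hi)` range are all assumptions made for the example; the hashing call actually used in qmca is not shown in this conversation.

```python
import hashlib

import numpy as np


def data_seed(x):
    """Derive a 32-bit integer seed from the raw bytes of the data (hypothetical helper)."""
    # Normalize to contiguous float64 so equal values always give equal bytes.
    buf = np.ascontiguousarray(x, dtype=np.float64).tobytes()
    digest = hashlib.sha256(buf).digest()
    return int.from_bytes(digest[:4], "little")


def pick_equilibration(x, lo, hi):
    """Draw an equilibration length in [lo, hi), reproducibly for a given x (hypothetical helper)."""
    # The draw is still random with respect to the data, but repeatable run to run.
    rng = np.random.default_rng(data_seed(x))
    return int(rng.integers(lo, hi))


# Stand-in for a LocalEnergy trace; real input would come from a scalar.dat file.
energies = np.linspace(-198.0, -197.0, 500)
first = pick_equilibration(energies, 0, 100)
second = pick_equilibration(energies.copy(), 0, 100)
```

With identical data, `first` and `second` agree, which is the property the outputs below demonstrate for qmca itself.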

Addresses #4556. The reported means are now consistent when multiple files are used (or when the same file is used repeatedly):

>qmca -q e */*/*.scalar.dat
debug/vmc_perf_pbe_u_None_3x3x1_1x1x1_4990_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -197.290314 +/- 0.029507 
debug/vmc_perf_pbe_u_None_3x3x1_1x1x1_5000_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -197.336379 +/- 0.093084 
debug/vmc_perf_pbe_u_None_3x3x1_1x1x1_5010_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -197.176166 +/- 0.030095 
orig/vmc_perf_pbe_u_None_3x3x1_1x1x1_4990_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -198.198560 +/- 0.027040 
orig/vmc_perf_pbe_u_None_3x3x1_1x1x1_5000_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -197.274630 +/- 0.026488 
orig/vmc_perf_pbe_u_None_3x3x1_1x1x1_5010_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -198.168970 +/- 0.031373 

>qmca -q e d*/*/*.scalar.dat
debug/vmc_perf_pbe_u_None_3x3x1_1x1x1_4990_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -197.290314 +/- 0.029507 
debug/vmc_perf_pbe_u_None_3x3x1_1x1x1_5000_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -197.336379 +/- 0.093084 
debug/vmc_perf_pbe_u_None_3x3x1_1x1x1_5010_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -197.176166 +/- 0.030095 

>qmca -q e o*/*/*.scalar.dat
orig/vmc_perf_pbe_u_None_3x3x1_1x1x1_4990_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -198.198560 +/- 0.027040 
orig/vmc_perf_pbe_u_None_3x3x1_1x1x1_5000_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -197.274630 +/- 0.026488 
orig/vmc_perf_pbe_u_None_3x3x1_1x1x1_5010_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -198.168970 +/- 0.031373 

What type(s) of changes does this code introduce?

  • New feature

Does this introduce a breaking change?

  • No

What systems has this change been tested on?

Workstation

Checklist

  • Yes. This PR is up to date with the current state of 'develop'

@markdewing
Contributor

There are still some ways this might not reproduce the value.

  • The hash function (https://docs.python.org/3/reference/datamodel.html#object.__hash__) includes a random salt for string and byte values. The data here is probably only numeric (numpy array?), so this most likely won't be an issue. Since Python function definitions don't declare argument types, the type of x is not immediately obvious.
  • The hash value is not guaranteed to be the same between Python versions (but it usually is).
  • Changing the data (removing the last line, for instance) could result in a larger-than-expected change in the output.

These are probably sufficiently unlikely scenarios that having a consistent output via hashing is the more useful solution, but I did want to get them listed and considered.
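To make the first and third points concrete, here is a minimal sketch. It assumes the data is a float64 array and uses a hypothetical helper, `stable_digest`, built on `hashlib.sha256`; this is a stand-in for illustration, not what the PR does. A digest of the raw bytes does not involve the per-process salt that applies to the built-in hash of str and bytes objects, but it is still sensitive to any change in the data.

```python
import hashlib

import numpy as np


def stable_digest(x):
    """Hex digest of the array's float64 bytes (hypothetical helper, not part of qmca)."""
    buf = np.ascontiguousarray(x, dtype=np.float64).tobytes()
    return hashlib.sha256(buf).hexdigest()


x = np.arange(10, dtype=np.float64)

# Identical data gives an identical digest, independent of PYTHONHASHSEED.
same = stable_digest(x) == stable_digest(x.copy())

# Dropping the last point gives a different digest, hence a different seed
# and possibly a noticeably different equilibration length.
trimmed = stable_digest(x) == stable_digest(x[:-1])
```

Here `same` is true and `trimmed` is false, which matches the trade-off described above: the result is reproducible for fixed data, but not smooth under small edits to the data.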

@jtkrogel
Contributor Author

The data are numeric (numpy float64). If a deterministic solution other than hashing is desired, please state it and I will implement that.

@prckent
Contributor

prckent commented Apr 18, 2023

This would work for me. Have you thought about printing out the equilibration length? Giving an indication that equilibration length was taken into account was a good suggestion. (Definitely could be done in another PR)

@markdewing
Contributor

Test this please

@prckent
Contributor

prckent commented Apr 25, 2023

Test this please

@prckent
Contributor

prckent commented Apr 25, 2023

Test this please

@prckent
Contributor

prckent commented Apr 26, 2023

Test this please

@prckent
Contributor

prckent commented Apr 26, 2023

Test this please

@markdewing markdewing merged commit 421d1b5 into QMCPACK:develop Apr 26, 2023
@prckent prckent mentioned this pull request Aug 18, 2023
3 participants