Nexus: use data hashing in heuristic equilibration detection algorithm #4557

jtkrogel · 2023-04-18T14:50:19Z

Proposed changes

The heuristic equilibration detection algorithm used by qmca when -e is unspecified uses a random number to avoid statistical bias in the selection of the equilibration length. As a result, the estimate of the mean varies each time qmca is run. While this is statistically correct, users expect deterministic behavior. This PR seeds the RNG with the hash of the data being analyzed, which preserves use of the RNG for unbiasedness while presenting deterministic behavior to the user.
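A minimal sketch of the idea follows; it is an illustration and not the code in this PR. The function names `rng_for_data` and `pick_equilibration_length` and the `max_length` parameter are hypothetical, and the random draw is only a placeholder for wherever the real heuristic uses its random number.

```python
import random


def rng_for_data(data):
    """Return an RNG whose seed is derived from the data values.

    Assumption: `data` is a 1D sequence of finite floats (a numpy float64
    array also works, since it can be iterated). Hashes of floats and of
    tuples of floats are not salted, so the seed is the same on every run.
    NaN values hash by identity in recent Python versions and would break
    this property.
    """
    seed = hash(tuple(float(v) for v in data))
    return random.Random(seed)


def pick_equilibration_length(data, max_length):
    """Placeholder for the heuristic: draw one random length in [0, max_length]."""
    rng = rng_for_data(data)
    return rng.randint(0, max_length)


if __name__ == "__main__":
    energies = [-197.31, -197.28, -197.35, -197.29, -197.30]
    first = pick_equilibration_length(energies, 3)
    second = pick_equilibration_length(energies, 3)
    # Same data, same seed, same draw.
    print(first == second)
```

Because the seed depends only on the values being analyzed, the draw for one file does not depend on which other files appear on the command line or on the order in which they are processed.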

Addresses #4556. The reported means are now consistent when multiple files are used (or when the same file is used repeatedly):

>qmca -q e */*/*.scalar.dat
debug/vmc_perf_pbe_u_None_3x3x1_1x1x1_4990_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -197.290314 +/- 0.029507 
debug/vmc_perf_pbe_u_None_3x3x1_1x1x1_5000_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -197.336379 +/- 0.093084 
debug/vmc_perf_pbe_u_None_3x3x1_1x1x1_5010_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -197.176166 +/- 0.030095 
orig/vmc_perf_pbe_u_None_3x3x1_1x1x1_4990_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -198.198560 +/- 0.027040 
orig/vmc_perf_pbe_u_None_3x3x1_1x1x1_5000_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -197.274630 +/- 0.026488 
orig/vmc_perf_pbe_u_None_3x3x1_1x1x1_5010_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -198.168970 +/- 0.031373 

>qmca -q e d*/*/*.scalar.dat
debug/vmc_perf_pbe_u_None_3x3x1_1x1x1_4990_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -197.290314 +/- 0.029507 
debug/vmc_perf_pbe_u_None_3x3x1_1x1x1_5000_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -197.336379 +/- 0.093084 
debug/vmc_perf_pbe_u_None_3x3x1_1x1x1_5010_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -197.176166 +/- 0.030095 

>qmca -q e o*/*/*.scalar.dat
orig/vmc_perf_pbe_u_None_3x3x1_1x1x1_4990_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -198.198560 +/- 0.027040 
orig/vmc_perf_pbe_u_None_3x3x1_1x1x1_5000_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -197.274630 +/- 0.026488 
orig/vmc_perf_pbe_u_None_3x3x1_1x1x1_5010_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -198.168970 +/- 0.031373

What type(s) of changes does this code introduce?

New feature

Does this introduce a breaking change?

No

What systems has this change been tested on?

Workstation

Checklist

Yes. This PR is up to date with the current state of 'develop'

markdewing · 2023-04-18T15:36:54Z

There are still some ways this might not reproduce the value.

The hash function (https://docs.python.org/3/reference/datamodel.html#object.__hash__) includes a random salt for string and byte values. The data here is probably only numeric (numpy array?), so this most likely won't be an issue. Since Python definitions don't have a declared type, the type of x is not immediately obvious.
The hash value is not guaranteed to be the same between Python versions (but it usually is).
Changing the data (removing the last line, for instance) could result in a larger than expected change in the output.

These are probably sufficiently unlikely scenarios that having a consistent output via hashing is the more useful solution, but I did want to get them listed and considered.
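For reference, one way to sidestep the first two points would be to derive the seed from a cryptographic digest of the raw values instead of the built-in hash. The sketch below is an assumption about how that could look, not something in this PR; `stable_seed` is a hypothetical name. The third point still applies: any change to the data gives an unrelated seed.

```python
import hashlib
import random
import struct


def stable_seed(data):
    """Return a 32-bit seed computed from the bytes of the data values.

    Assumption: `data` is a 1D sequence of floats. Each value is packed as a
    little-endian IEEE 754 double, so the byte string (and the SHA-256 digest
    of it) does not depend on hash salting, Python version, or platform
    byte order.
    """
    values = [float(v) for v in data]
    raw = struct.pack("<%dd" % len(values), *values)
    digest = hashlib.sha256(raw).digest()
    # Keep the first 4 bytes so the seed fits in 32 bits.
    return int.from_bytes(digest[:4], "little")


if __name__ == "__main__":
    energies = [-197.31, -197.28, -197.35, -197.29, -197.30]
    rng = random.Random(stable_seed(energies))
    # The same input always gives the same sequence of draws.
    print(rng.random() == random.Random(stable_seed(energies)).random())
```

The cost is one pass over the data to build the byte string, which should be small next to the analysis itself.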

jtkrogel · 2023-04-18T15:44:12Z

The data are numeric (numpy float64). If a deterministic solution other than hashing is desired, please state it and I will implement that.

prckent · 2023-04-18T20:34:50Z

This would work for me. Have you thought about printing out the equilibration length? Giving an indication that equilibration length was taken into account was a good suggestion. (Definitely could be done in another PR)

markdewing · 2023-04-20T15:59:00Z

Test this please

prckent · 2023-04-25T23:08:32Z

Test this please

prckent · 2023-04-25T23:24:59Z

Test this please

prckent · 2023-04-26T12:21:39Z

Test this please

prckent · 2023-04-26T20:01:22Z

Test this please

nexus: use data hashing in heuristic equilibration detection algorithm

da38bd3

jtkrogel mentioned this pull request Apr 18, 2023

Inconsistent results from qmca when analysing multiple files #4556

Open

markdewing approved these changes Apr 20, 2023

Merge branch 'develop' into nx_deterministic_equil

dc06e79

Merge branch 'develop' into nx_deterministic_equil

0471c50

Merge branch 'develop' into nx_deterministic_equil

671999f

Merge branch 'develop' into nx_deterministic_equil

7dc9a02

markdewing merged commit 421d1b5 into QMCPACK:develop Apr 26, 2023

prckent mentioned this pull request Aug 18, 2023

Rc 3170 #4702

Merged
