Constant size matrix errors for estimator properties #2413
Comments
The default build includes SET(WALKER_MAX_PROPERTIES 256 CACHE STRING "Maximum number of properties tracked by walkers"), so this is working as it should. You can't have 8114 properties per walker unless you configure with cmake ... -DWALKER_MAX_PROPERTIES=8114 or larger.
This is not working as it should. From QMCPACK input, users can request the skall estimator which adds elements to the property list so the data can be written to scalar.dat (the only type of output supported for this estimator). The number of elements added depends on the system under study, and so hard wiring it into the build request is untenable. |
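The failure mode described above can be illustrated with a minimal sketch. This is not QMCPACK's actual code: `kMaxProperties`, `FixedProperties`, and `add` are hypothetical names, and the only figure taken from this thread is the default cap of 256. The sketch shows a fixed-capacity buffer rejecting a request whose size is only known at run time, with an error message that names the build option to change.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the build-time cap (the default quoted in this thread is 256).
constexpr std::size_t kMaxProperties = 256;

// Hypothetical fixed-capacity property record for a single walker.
struct FixedProperties
{
  std::array<double, kMaxProperties> values{};
  std::size_t used = 0;

  // Reserve `count` consecutive slots for `owner` and return the index of the first one.
  // Throws if the request does not fit in the remaining compile-time capacity.
  std::size_t add(std::size_t count, const std::string& owner)
  {
    const std::size_t remaining = kMaxProperties - used;
    if (count > remaining)
      throw std::length_error(owner + " requests " + std::to_string(count) +
                              " properties but only " + std::to_string(remaining) +
                              " of " + std::to_string(kMaxProperties) +
                              " remain; rebuild with a larger WALKER_MAX_PROPERTIES");
    const std::size_t first = used;
    used += count;
    return first;
  }
};
```

Because the request size depends on the system under study, no single compile-time value is safe for every input; the message at least tells the user what to change.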
Confirming that removal of the skall estimator from my input results in no abort. |
i.e. exactly the same issue as in #2238 |
This would at least provide a runnable route temporarily. In general, I think having build flags depend on runtime requests (user input) undermines the robustness and usability of the code. |
At issue here are primarily the finite size corrections, which we recently obtained a tool for post-processing: |
Since we don't want people to have to recompile the code for routine finite size corrections, this will need to be addressed somehow. Improving the error message would be a good temporary fix. We ought to be able to compute the max properties before creating any walkers with better plumbing. |
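One way to read "compute the max properties before creating any walkers" is a two-phase setup: collect every estimator's property count first, then size each walker's storage once. The sketch below assumes this reading; `PropertyRequest`, `totalProperties`, `Walker`, and `makeWalkers` are hypothetical names and do not correspond to QMCPACK classes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record of how many property slots one estimator needs.
// In this sketch it is assumed to be known once input parsing has finished.
struct PropertyRequest
{
  const char* name;
  std::size_t count;
};

// Phase 1: sum the slots requested by all estimators.
std::size_t totalProperties(const std::vector<PropertyRequest>& requests)
{
  std::size_t total = 0;
  for (const auto& r : requests)
    total += r.count;
  return total;
}

// Hypothetical walker whose property storage is sized once, at construction.
struct Walker
{
  explicit Walker(std::size_t num_properties) : properties(num_properties, 0.0) {}
  std::vector<double> properties;
};

// Phase 2: create walkers only after the total is known, so no resizing happens later.
std::vector<Walker> makeWalkers(std::size_t num_walkers, const std::vector<PropertyRequest>& requests)
{
  const std::size_t n = totalProperties(requests);
  return std::vector<Walker>(num_walkers, Walker(n));
}
```

With this ordering every walker has the same fixed length for the whole run, so send/receive buffers can still be sized once, but the length comes from the input rather than from a build flag.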
The life cycle of the walker elements and the MPI send/receive buffers was greatly complicated by this. It was hard to track who was modifying properties and when, which left lots of opportunity for overwrites and overreads. I'd be in favor of setting WALKER_MAX_PROPERTIES rather high by default, since it lives in host memory. The only reason it's so low right now is in case someone wants to run a huge number of walkers, in which case you are going to need to make some tradeoffs. That is what we support now, since I have the impression that supporting huge numbers of walkers is more important. However by my calculation at double precision: And of course there could be a cache hit-rate issue caused by this (the local energies are now strided more widely through system RAM). But even with the dynamic Matrix that used to be here there is a possible design problem, although one only worth pursuing if it's proven to be a performance bottleneck. I'd also mention that an input scheme that finished parsing and translating input to an intermediate representation before beginning construction of "simulation" objects would allow WALKER_MAX_PROPERTIES to be dropped while preserving the ability to set the properties data structure at construction time for the walkers. |
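The host-memory tradeoff mentioned above can be estimated with simple arithmetic. The sketch below assumes one contiguous row of doubles per walker, which is an assumption about the layout rather than a statement about QMCPACK's data structure; `propertyBytes` is a hypothetical helper, and the figures from the comment's own calculation are not reproduced here.

```cpp
#include <cstdint>

// This estimate assumes 8-byte doubles, which holds on common platforms.
static_assert(sizeof(double) == 8, "estimate assumes 8-byte double");

// Hypothetical helper: bytes of host memory used by fixed-size property storage,
// assuming one row of `max_properties` doubles per walker (layout is an assumption).
constexpr std::uint64_t propertyBytes(std::uint64_t num_walkers, std::uint64_t max_properties)
{
  return num_walkers * max_properties * sizeof(double);
}
```

Under these assumptions the cost grows linearly in both the walker count and the cap, which is why a high default is cheap for modest walker counts and expensive for very large ones.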
I think another good target for a longer term fix is to move output data from estimators like this one to stat.h5. This would avoid the properties buffer entirely (making a small fixed length of the properties buffer more robust), but would require significant updates to the |
@jtkrogel skall does output to stat.h5; I have been using it exclusively with the qmcfinitesize tool in v3.8.0. The qmcfinitesize tool doesn't do twist averaging or anything similar yet, so all of that postprocessing initially needs to be done by hand from the stat.h5 data and passed to the tool as an ASCII file. I have never actually tested the scalar output for skall with the qmcfinitesize tool, because twist averaging is not currently in the code. |
That's good. This makes it all the easier. I think all that is needed, then, is to transition forces to stat.h5 and generally wall off access to scalar.dat for more than a handful of values per observable (actual scalars). |
Just to add in, If you add |
From this POV, fixes here do not merit an additional release IMO. I would be in favor of removing the |
It is a good suggestion to only support hdf5 output. Then we only have one code path and one format. Win. |
I am still working on this but hit some complications in testing the fix. Should be in tomorrow. |
I am getting fatal aborts when attempting to run with current develop. The inputs work for a build from April 2019 and fail with develop.
The abort is right at the start of VMC, related to property (observable) handling:
This is identical to aborts present in the nightly tests:
https://cdash.qmcpack.org/CDash/testDetails.php?test=8024789&build=113981
The code change that likely introduced the current abort scenario was introduced in #2213. The nightly test goes from steady pass to steady failure around the merge of #2213:
https://cdash.qmcpack.org/CDash/testDetails.php?test=8026616&build=114000
Aborts of this type are similar to those seen in #2238.
The abort likely triggers every time an observable adds properties to scalar.dat. Currently both the skall and force estimators do this.
Given that standard estimators are affected, I think the in-process release (#2412) should be gated until this issue is fixed.