Skip to content

Commit

Permalink
Merge pull request #4475 from kayahans/qmcfit
Browse files Browse the repository at this point in the history
qmc-fit: Add parameter fitting with jackknife for e.g. DFT+U, EXX scans
  • Loading branch information
ye-luo authored Mar 2, 2023
2 parents 8baac43 + 26418aa commit 56e2f6c
Show file tree
Hide file tree
Showing 4 changed files with 254 additions and 48 deletions.
71 changes: 63 additions & 8 deletions docs/analyzing.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1057,15 +1057,13 @@ Production quality checklist

.. _qmcfit:

Using the qmc-fit tool for statistical time step extrapolation and curve fitting
--------------------------------------------------------------------------------
Using the qmc-fit tool for statistical time step extrapolation, trial wavefunction optimization and curve fitting
-----------------------------------------------------------------------------------------------------------------

The ``qmc-fit`` tool is used to provide statistical estimates of
curve-fitting parameters based on QMCPACK data. Although ``qmc-fit``
will eventually support many types of fitted curves (e.g., Morse
potential binding curves and various equation-of-state fitting curves),
it is currently limited to estimating fitting parameters related to time
step extrapolation.
The ``qmc-fit`` tool is used to provide statistical estimates of curve-fitting parameters based on QMCPACK data.
``qmc-fit`` is currently limited to estimating fitting parameters related to time step extrapolation and trial wavefunction
optimization (optimal U for DFT+U, EXX fractions), it will eventually support many types of fitted curves (e.g., Morse
potential binding curves and various equation-of-state fitting curves).

The jackknife statistical technique
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Expand Down Expand Up @@ -1226,6 +1224,63 @@ estimated value of :math:`-3848.28(7)` instead.

Linear (top) and quadratic (bottom) time step fits to DMC data for a 32-atom supercell of MnO obtained with ``qmc-fit``. Zero time step estimates are indicated by the red data point on the left side of either panel.

Performing trial wavefunction optimization fitting, e.g., to find optimal DFT+U
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this section, we use a 24-atom supercell of monolayer FeCl\ :sub:`2`\ as an example system for wavefunction optimization
fitting. Using single determinant DFT wavefunctions, a practical method to perform wavefunction optimization is done through
scanning the Hubbard-U parameter in a DFT+U calculation used to generate the trial wavefunction. Similarly, one can also scan
different exact exchange ratio parameters in hybrid-DFT calculations. Here, we will show an example of this fitting for the
Hubbard-U parameter, but the same procedure can be applied to any single-parameter scans of trial wavefunctions. Data for this
system has been collected in DMC using the following sequence of Hubbard-U values on Fe-d orbitals: :math:`0 1 2 3 4 5` eV. Some
non-zero U value often minimizes the DMC energy, but optimized U values have limited transferability across different systems.
Similar to the procedure for performing timestep statistical fitting, the quality of the input statistics must be checked using
``qmca`` utility to determine the reblocking factor and equilibration periods. Assuming that an equilibration period of initial 50
steps, ``-e 50``, and a reblocking period of 4, ``-b 6``, is sufficient to remove correlations in the statistical local energies,
the ``qmc-fit`` tool can be used in the following way to obtain a quadratic fit of the data:

::

>qmc-fit u -e 50 -b 6 -u "0 1 2 3 4 5" -f quadratic dmc_u_*/dmc.s001.scalar.dat
fit function : quadratic
fitted formula: (-1230.1071 +/- 0.0045) + (-0.0683 +/- 0.0040)*t + (0.00883 +/- 0.00077)*t^2
root 1 minimum_u : 3.87 +/- 0.14 eV
root 1 minimum_e : -1230.2391 +/- 0.0026 Ha
root 1 curvature : 0.0177 +/- 0.0015

Here, ``qmc-fit u`` indicates we are performing a Hubbard-U/exact-exchange ratio fit,
``-u`` option provides a list of Hubbard-U values :math:`0 1 2 3 4 5` corresponding to the auto-sorted
dmc scalar files with wildcard ``dmc_u_*/dmc.s001.scalar.dat``. Here, ``qmc-fit`` command is invoked at a
directory where folders such as ``dmc_u_0_2x2x1, dmc_u_1_2x2x1, dmc_u_2_2x2x1`` reside.
Here, the text output provides the U value (``minimum_u``) and local energies (``minimum_e``)
at the minima of the polynomial which falls within the range of Hubbard-U values provided in the command
line, e.g. from 0 to 5. Therefore, a U value of :math:`3.8(1)` eV minimizes the DMC energy of the system.
The ``curvature`` is printed for informative purposes only, but a curvature with small error bar could
indicate a higher quality polynomial fit. Similar to the timestep fit, a plot of the fit will also
produced as default where the minima of the polynomial is shown as a red dot as in :numref:`fig13`.

Different fitting functions are supported via the ``-f`` option.
Currently supported options include ``quadratic`` (:math:`a+bt+ct^2`), and
``cubic`` (:math:`a+bt+ct^2+dt^3`) and ``quartic`` (:math:`a+bt+ct^2+dt^3+et^4`).
An example of a cubic fit is given as below:

::

>qmc-fit u -e 50 -b 6 -u "0 1 2 3 4 5" -f cubic dmc_u_*/dmc.s001.scalar.dat

fit function : cubic
fitted formula: (-1230.1087 +/- 0.0045) + (-0.0608 +/- 0.0073)*t + (0.0047 +/- 0.0033)*t^2 + (0.00055 +/- 0.00041)*t^3
root 1 minimum_u : 3.85 +/- 0.11 eV
root 1 minimum_e : -1230.2415 +/- 0.0033 Ha
root 1 curvature : 0.0221 +/- 0.0034

.. _fig13:
.. figure:: /figs/qmcfit_hubbard_quadratic.png
:width: 400
:align: center

Quadratic Hubbard-U fits to DMC data for a 24-atom supercell of monolayer FeCl\ :sub:`2`\ obtained with ``qmc-fit``. DMC local energy minima are indicated by the red data point on the bottom halves of either panel.

.. _qdens:

Using the qdens tool to obtain electron densities
Expand Down
Binary file added docs/figs/qmcfit_hubbard_quadratic.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
229 changes: 190 additions & 39 deletions nexus/bin/qmc-fit
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@

import os
import sys
from optparse import OptionParser
import argparse

try:
import numpy as np
Expand Down Expand Up @@ -82,6 +82,10 @@ except:
#end try


roots_n = lambda p, x: np.roots(np.polyder(p[::-1], x))
root_vals_n = lambda p, x: np.polyval(p[::-1], roots_n(p, x))
moment_n = lambda p, x: np.polyval(np.polyder(p[::-1], x), roots_n(p, x-1))


all_fit_functions = obj(
ts = obj(
Expand All @@ -104,6 +108,32 @@ all_fit_functions = obj(
params = [('intercept',lambda p: p[0])],
),
),
u = obj(
quadratic = obj(
nparam = 3,
function = lambda p,t: p[0]+p[1]*t+p[2]*t*t,
format = '{0} + {1}*t + {2}*t^2',
params = [('minimum_x',lambda p: roots_n(p, 1)),
('minimum_e',lambda p: root_vals_n(p,1)),
('curvature',lambda p: moment_n(p, 2))],
),
cubic = obj(
nparam = 4,
function = lambda p,t: p[0]+p[1]*t+p[2]*t*t+p[3]*t*t*t,
format = '{0} + {1}*t + {2}*t^2 + {3}*t^3',
params = [('minimum_x',lambda p: roots_n(p, 1)[moment_n(p,2) > 0]),
('minimum_e',lambda p: root_vals_n(p,1)[moment_n(p,2) > 0]),
('curvature',lambda p: moment_n(p, 2)[moment_n(p,2) > 0])],
),
quartic = obj(
nparam = 5,
function = lambda p,t: p[0]+p[1]*t+p[2]*t*t+p[3]*t*t*t+p[4]*t*t*t*t,
format = '{0} + {1}*t + {2}*t^2 + {3}*t^3 + {4}*t^4',
params = [('minimum_x',lambda p: roots_n(p, 1)[moment_n(p,2) > 0]),
('minimum_e',lambda p: root_vals_n(p,1)[moment_n(p,2) > 0]),
('curvature',lambda p: moment_n(p, 2)[moment_n(p,2) > 0])],
),
),
)

fit_functions = obj()
Expand Down Expand Up @@ -132,6 +162,10 @@ def qmcfit(q,E,fname='linear',minimizer=least_squares):
#end for

# setup initial guess parameters
if fname=='quartic':
pp = np.polyfit(q,E,4)
if fname=='cubic':
pp = np.polyfit(q,E,3)
if fname=='quadratic':
pp = np.polyfit(q,E,2)
else:
Expand Down Expand Up @@ -300,39 +334,9 @@ def parse_list(opt,name,dtype,len1=False):
#end def parse_list


def timestep_fit():
# read command line inputs
usage = '''usage: %prog ts [options] [scalar files]'''
parser = OptionParser(usage=usage,add_help_option=False,version='%prog {}.{}.{}'.format(*nexus_version))

parser.add_option('-f','--fit',dest='fit_function',
default='linear',
help='Fitting function, options are {0} (default=%default).'.format(sorted(fit_functions.keys()))
)
parser.add_option('-s', '--series_start',dest='series_start',
type="int", default=None,
help='Series number for first DMC run. Use to exclude prior VMC scalar files if they have been provided (default=%default).'
)
parser.add_option('-t','--timesteps',dest='timesteps',
default=None,
help='Timesteps corresponding to scalar files, excluding any prior to --series_start (default=%default).'
)
parser.add_option('-e','--equils',dest='equils',
default=None,
help='Equilibration lengths corresponding to scalar files, excluding any prior to --series_start. Can be a single value for all files. If not provided, equilibration periods will be estimated.'
)
parser.add_option('-b','--reblock_factors',dest='reblock_factors',
default=None,
help='Reblocking factors corresponding to scalar files, excluding any prior to --series_start. Can be a single value for all files. If not provided, reblocking factors will be estimated.'
)
parser.add_option('--noplot',dest='noplot',
action='store_true',default=False,
help='Do not show plots. (default=%default).'
)

options,args = parser.parse_args()
scalar_files = list(sorted(args[1:]))
opt = obj(**options.__dict__)
def timestep_fit(args):
opt = obj(**args.__dict__)
scalar_files = sorted(opt.scalar_files)

if len(scalar_files)==0:
log('\n'+parser.format_help().strip()+'\n')
Expand Down Expand Up @@ -400,14 +404,159 @@ def timestep_fit():
#end if
#end def timestep_fit

def hubbard_u_fit(args):
opt = obj(**args.__dict__)
scalar_files = sorted(opt.scalar_files)
if opt.fit_function not in fit_functions:
error('invalid fitting function: {0}\nvalid options are: {1}'.format(opt.fit_function,sorted(fit_functions.keys())))
#end if
# read in scalar energy data
Edata,Emean,Eerror,scalar_files = process_scalar_files(
scalar_files = scalar_files,
series_start = opt.series_start,
equils = opt.equils,
reblock_factors = opt.reblock_factors,
)
hubbard_u = True
if opt.hubbards is not None:
parse_list(opt,'hubbards',float)
if len(Edata)!=len(opt.hubbards):
error('must provide one hubbard_u value per scalar file\nnumber of hubbard_u provided: {0}\nnumber of scalar files provided: {1}\hubbards provided: {2}\nscalar files provided: {3}'.format(len(opt.hubbards),len(scalar_files),opt.hubbards,scalar_files))
#end if
x = opt.hubbards
elif opt.exx is not None:
hubbard_u = False
parse_list(opt,'exx',float)
if len(Edata)!=len(opt.exx):
error('must provide one hubbard_u value per scalar file\nnumber of hubbard_u provided: {0}\nnumber of scalar files provided: {1}\hubbards provided: {2}\nscalar files provided: {3}'.format(len(opt.hubbards),len(scalar_files),opt.hubbards,scalar_files))
#end if
x = opt.exx
else:
log("\n Please provide either EXX or Hubbard_U values")
#end if

parse_list(opt,'equils',int,len1=True)
parse_list(opt,'reblock_factors',int,len1=True)

# perform jackknife analysis of the fit

pf,pmean,perror,auxres = qmcfit(x,Edata,opt.fit_function)

func_info = fit_functions[opt.fit_function]
pvals = []
for n in range(len(pmean)):
pvals.append('({0} +/- {1})'.format(*stat_strings(pmean[n],perror[n])))
#end for

log('\nfit function : '+opt.fit_function)
log('fitted formula: '+func_info.format.format(*pvals))
for pname,pfunc in func_info.params:
for i in range(len(auxres[pname][0])):
pm,pe = stat_strings(*np.array(auxres[pname])[:,i])
unit = ""
if pname == "minimum_e":
param = pname
unit = "Ha"
elif pname == "minimum_x":
if hubbard_u:
param = "U at minimum"
unit = "eV"
else:
param = "EXX at minimum"
unit = ''
else:
param = pname
#end if
log('root {0} {1:<14}: {2} +/- {3} {4}'.format(i+1,param,pm,pe, unit))
#end for
# plot the fit (if available)

if plots_available and not opt.noplot:
lw = 2
ms = 10
ts = x
tsmin = ts.min()
tsmax = ts.max()
tsrange = tsmax-tsmin

U_min,U_err = auxres.minimum_x
E_min,E_err = auxres.minimum_e
real_U_in_range = np.isreal(U_min) & (U_min < tsmax) & (U_min > tsmin)
U_min = np.real(U_min[real_U_in_range])
U_err = np.real(U_err[real_U_in_range])
E_min = np.real(E_min[real_U_in_range])
tsfit = np.linspace(0,1.1*tsmax,400)
Efit = func_info.function(pmean,tsfit)
plt.figure()
plt.plot(tsfit,Efit,'k-',lw=lw)
plt.errorbar(ts,Emean,Eerror,fmt='b.',ms=ms)
plt.errorbar(U_min,E_min,xerr=U_err,fmt='r.',ms=ms)
plt.xlim([tsmin-0.1*tsrange,tsmax + 0.1*tsrange])
if hubbard_u:
plt.xlabel('Hubbard U (eV)')
else:
plt.xlabel('EXX ratio')
plt.ylabel('DMC Energy (Ha)')
plt.show()
#end if
#end def hubbard_u_fit

def parse_args():
"""This utility provides a fit to the one-dimensional parameter scans of QMC
observables. Currently, the functionality in place is to fit linear/quadratic polynomial
fits to the timestep VMC/DMC studies and single parameter optimization of trial wavefunctions
using DMC local energies and quadratic, cubic and quartic fits.
"""

parser = argparse.ArgumentParser(description=parse_args.__doc__,
formatter_class=argparse.ArgumentDefaultsHelpFormatter)

parser.add_argument('fit_type', choices=['ts', 'u'], default = 'ts',
help='One dimensional parameter used to fit QMC local energies. Options are ts for timestep and u for hubbard_u parameter fitting'
)
parser.add_argument('-f','--fit',dest='fit_function',default='linear',
help='Fitting function, options are {0}.'.format(sorted(fit_functions.keys()))
)
# List of 1-D parameters, mutually exclusive
parameters = parser.add_mutually_exclusive_group(required=True)
parameters.add_argument('-t',dest='timesteps',default=None,
help='Timesteps corresponding to scalar files, excluding any prior to --series_start'
)
parameters.add_argument('-u',dest='hubbards',default=None,
help='Hubbard U values (eV)'
)
parameters.add_argument('--exx',dest='exx',default=None,
help='EXX ratios'
)

parser.add_argument('-s', '--series_start',dest='series_start',type=int, default=None,
help='Series number for first DMC run. Use to exclude prior VMC scalar files if they have been provided'
)
parser.add_argument('-e','--equils',dest='equils',default=None,
help='Equilibration lengths corresponding to scalar files, excluding any prior to --series_start. Can be a single value for all files. If not provided, equilibration periods will be estimated.'
)
parser.add_argument('-b','--reblock_factors',dest='reblock_factors',default=None,
help='Reblocking factors corresponding to scalar files, excluding any prior to --series_start. Can be a single value for all files. If not provided, reblocking factors will be estimated.'
)
parser.add_argument('--noplot',dest='noplot',action='store_true',default=False,
help='Do not show plots.'
)
parser.add_argument('scalar_files', nargs='+',
help='Scalar files used in the fit. An explicit list of scalar files with space or a wildcard (e.g. dmc*/dmc.s001.scalar.dat) is acceptable.'
)

args = parser.parse_args()
return parser, args

if __name__=='__main__':
fit_types = sorted(all_fit_functions.keys())
if len(sys.argv)<2:
error('first argument must be type of fit\ne.g. for a timestep fit, type "qmc-fit ts ..."\nvalid fit types are: {0}'.format(fit_types))
parser, args = parse_args()
if len(args.scalar_files) == 0:
log('\n'+'Please provide scalar files'+'\n')
parser.print_help()
exit()
#end if
fit_type = sys.argv[1]
fit_type = args.fit_type
if fit_type in fit_types:
fit_functions.clear()
fit_functions.transfer_from(all_fit_functions[fit_type])
Expand All @@ -416,8 +565,10 @@ if __name__=='__main__':
#end if

if fit_type=='ts':
timestep_fit()
timestep_fit(args)
elif fit_type == 'u':
hubbard_u_fit(args)
else:
error('unsupported fit type: {0}'.format(fit_type))
#end if
#end if
#end if
2 changes: 1 addition & 1 deletion nexus/tests/unit/test_qmc_fit.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ def test_fit():
dmc_infile = os.path.join(dmc_path,'dmc.in.xml')
assert(os.path.exists(dmc_infile))

command = "{} ts --noplot -e 10 -s 1 -t '0.02 0.01 0.005' {}/*scalar*".format(exe,dmc_path)
command = "{} ts --noplot -e 10 -s 1 -t '0.02 0.01 0.005' -f linear {}/*scalar*".format(exe,dmc_path)

out,err,rc = execute(command)

Expand Down

0 comments on commit 56e2f6c

Please sign in to comment.