diff --git a/docs/analyzing.rst b/docs/analyzing.rst index 91b5069dc9..074644fd3a 100644 --- a/docs/analyzing.rst +++ b/docs/analyzing.rst @@ -1057,15 +1057,13 @@ Production quality checklist .. _qmcfit: -Using the qmc-fit tool for statistical time step extrapolation and curve fitting --------------------------------------------------------------------------------- +Using the qmc-fit tool for statistical time step extrapolation, trial wavefunction optimization and curve fitting +----------------------------------------------------------------------------------------------------------------- -The ``qmc-fit`` tool is used to provide statistical estimates of -curve-fitting parameters based on QMCPACK data. Although ``qmc-fit`` -will eventually support many types of fitted curves (e.g., Morse -potential binding curves and various equation-of-state fitting curves), -it is currently limited to estimating fitting parameters related to time -step extrapolation. +The ``qmc-fit`` tool is used to provide statistical estimates of curve-fitting parameters based on QMCPACK data. +``qmc-fit`` is currently limited to estimating fitting parameters related to time step extrapolation and trial wavefunction +optimization (optimal U for DFT+U, EXX fractions), it will eventually support many types of fitted curves (e.g., Morse +potential binding curves and various equation-of-state fitting curves). The jackknife statistical technique ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -1226,6 +1224,63 @@ estimated value of :math:`-3848.28(7)` instead. Linear (top) and quadratic (bottom) time step fits to DMC data for a 32-atom supercell of MnO obtained with ``qmc-fit``. Zero time step estimates are indicated by the red data point on the left side of either panel. +Performing trial wavefunction optimization fitting, e.g., to find optimal DFT+U +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +In this section, we use a 24-atom supercell of monolayer FeCl\ :sub:`2`\ as an example system for wavefunction optimization +fitting. Using single determinant DFT wavefunctions, a practical method to perform wavefunction optimization is done through +scanning the Hubbard-U parameter in a DFT+U calculation used to generate the trial wavefunction. Similarly, one can also scan +different exact exchange ratio parameters in hybrid-DFT calculations. Here, we will show an example of this fitting for the +Hubbard-U parameter, but the same procedure can be applied to any single-parameter scans of trial wavefunctions. Data for this +system has been collected in DMC using the following sequence of Hubbard-U values on Fe-d orbitals: :math:`0 1 2 3 4 5` eV. Some +non-zero U value often minimizes the DMC energy, but optimized U values have limited transferability across different systems. +Similar to the procedure for performing timestep statistical fitting, the quality of the input statistics must be checked using +``qmca`` utility to determine the reblocking factor and equilibration periods. Assuming that an equilibration period of initial 50 +steps, ``-e 50``, and a reblocking period of 4, ``-b 6``, is sufficient to remove correlations in the statistical local energies, +the ``qmc-fit`` tool can be used in the following way to obtain a quadratic fit of the data: + +:: + + >qmc-fit u -e 50 -b 6 -u "0 1 2 3 4 5" -f quadratic dmc_u_*/dmc.s001.scalar.dat + fit function : quadratic + fitted formula: (-1230.1071 +/- 0.0045) + (-0.0683 +/- 0.0040)*t + (0.00883 +/- 0.00077)*t^2 + root 1 minimum_u : 3.87 +/- 0.14 eV + root 1 minimum_e : -1230.2391 +/- 0.0026 Ha + root 1 curvature : 0.0177 +/- 0.0015 + +Here, ``qmc-fit u`` indicates we are performing a Hubbard-U/exact-exchange ratio fit, +``-u`` option provides a list of Hubbard-U values :math:`0 1 2 3 4 5` corresponding to the auto-sorted +dmc scalar files with wildcard ``dmc_u_*/dmc.s001.scalar.dat``. Here, ``qmc-fit`` command is invoked at a +directory where folders such as ``dmc_u_0_2x2x1, dmc_u_1_2x2x1, dmc_u_2_2x2x1`` reside. +Here, the text output provides the U value (``minimum_u``) and local energies (``minimum_e``) +at the minima of the polynomial which falls within the range of Hubbard-U values provided in the command +line, e.g. from 0 to 5. Therefore, a U value of :math:`3.8(1)` eV minimizes the DMC energy of the system. +The ``curvature`` is printed for informative purposes only, but a curvature with small error bar could +indicate a higher quality polynomial fit. Similar to the timestep fit, a plot of the fit will also +produced as default where the minima of the polynomial is shown as a red dot as in :numref:`fig13`. + +Different fitting functions are supported via the ``-f`` option. +Currently supported options include ``quadratic`` (:math:`a+bt+ct^2`), and +``cubic`` (:math:`a+bt+ct^2+dt^3`) and ``quartic`` (:math:`a+bt+ct^2+dt^3+et^4`). +An example of a cubic fit is given as below: + +:: + + >qmc-fit u -e 50 -b 6 -u "0 1 2 3 4 5" -f cubic dmc_u_*/dmc.s001.scalar.dat + + fit function : cubic + fitted formula: (-1230.1087 +/- 0.0045) + (-0.0608 +/- 0.0073)*t + (0.0047 +/- 0.0033)*t^2 + (0.00055 +/- 0.00041)*t^3 + root 1 minimum_u : 3.85 +/- 0.11 eV + root 1 minimum_e : -1230.2415 +/- 0.0033 Ha + root 1 curvature : 0.0221 +/- 0.0034 + +.. _fig13: +.. figure:: /figs/qmcfit_hubbard_quadratic.png + :width: 400 + :align: center + + Quadratic Hubbard-U fits to DMC data for a 24-atom supercell of monolayer FeCl\ :sub:`2`\ obtained with ``qmc-fit``. DMC local energy minima are indicated by the red data point on the bottom halves of either panel. + .. _qdens: Using the qdens tool to obtain electron densities diff --git a/docs/figs/qmcfit_hubbard_quadratic.png b/docs/figs/qmcfit_hubbard_quadratic.png new file mode 100644 index 0000000000..dcdc859399 Binary files /dev/null and b/docs/figs/qmcfit_hubbard_quadratic.png differ diff --git a/nexus/bin/qmc-fit b/nexus/bin/qmc-fit index 0213f0e251..b89be78943 100755 --- a/nexus/bin/qmc-fit +++ b/nexus/bin/qmc-fit @@ -2,7 +2,7 @@ import os import sys -from optparse import OptionParser +import argparse try: import numpy as np @@ -82,6 +82,10 @@ except: #end try +roots_n = lambda p, x: np.roots(np.polyder(p[::-1], x)) +root_vals_n = lambda p, x: np.polyval(p[::-1], roots_n(p, x)) +moment_n = lambda p, x: np.polyval(np.polyder(p[::-1], x), roots_n(p, x-1)) + all_fit_functions = obj( ts = obj( @@ -104,6 +108,32 @@ all_fit_functions = obj( params = [('intercept',lambda p: p[0])], ), ), + u = obj( + quadratic = obj( + nparam = 3, + function = lambda p,t: p[0]+p[1]*t+p[2]*t*t, + format = '{0} + {1}*t + {2}*t^2', + params = [('minimum_x',lambda p: roots_n(p, 1)), + ('minimum_e',lambda p: root_vals_n(p,1)), + ('curvature',lambda p: moment_n(p, 2))], + ), + cubic = obj( + nparam = 4, + function = lambda p,t: p[0]+p[1]*t+p[2]*t*t+p[3]*t*t*t, + format = '{0} + {1}*t + {2}*t^2 + {3}*t^3', + params = [('minimum_x',lambda p: roots_n(p, 1)[moment_n(p,2) > 0]), + ('minimum_e',lambda p: root_vals_n(p,1)[moment_n(p,2) > 0]), + ('curvature',lambda p: moment_n(p, 2)[moment_n(p,2) > 0])], + ), + quartic = obj( + nparam = 5, + function = lambda p,t: p[0]+p[1]*t+p[2]*t*t+p[3]*t*t*t+p[4]*t*t*t*t, + format = '{0} + {1}*t + {2}*t^2 + {3}*t^3 + {4}*t^4', + params = [('minimum_x',lambda p: roots_n(p, 1)[moment_n(p,2) > 0]), + ('minimum_e',lambda p: root_vals_n(p,1)[moment_n(p,2) > 0]), + ('curvature',lambda p: moment_n(p, 2)[moment_n(p,2) > 0])], + ), + ), ) fit_functions = obj() @@ -132,6 +162,10 @@ def qmcfit(q,E,fname='linear',minimizer=least_squares): #end for # setup initial guess parameters + if fname=='quartic': + pp = np.polyfit(q,E,4) + if fname=='cubic': + pp = np.polyfit(q,E,3) if fname=='quadratic': pp = np.polyfit(q,E,2) else: @@ -300,39 +334,9 @@ def parse_list(opt,name,dtype,len1=False): #end def parse_list -def timestep_fit(): - # read command line inputs - usage = '''usage: %prog ts [options] [scalar files]''' - parser = OptionParser(usage=usage,add_help_option=False,version='%prog {}.{}.{}'.format(*nexus_version)) - - parser.add_option('-f','--fit',dest='fit_function', - default='linear', - help='Fitting function, options are {0} (default=%default).'.format(sorted(fit_functions.keys())) - ) - parser.add_option('-s', '--series_start',dest='series_start', - type="int", default=None, - help='Series number for first DMC run. Use to exclude prior VMC scalar files if they have been provided (default=%default).' - ) - parser.add_option('-t','--timesteps',dest='timesteps', - default=None, - help='Timesteps corresponding to scalar files, excluding any prior to --series_start (default=%default).' - ) - parser.add_option('-e','--equils',dest='equils', - default=None, - help='Equilibration lengths corresponding to scalar files, excluding any prior to --series_start. Can be a single value for all files. If not provided, equilibration periods will be estimated.' - ) - parser.add_option('-b','--reblock_factors',dest='reblock_factors', - default=None, - help='Reblocking factors corresponding to scalar files, excluding any prior to --series_start. Can be a single value for all files. If not provided, reblocking factors will be estimated.' - ) - parser.add_option('--noplot',dest='noplot', - action='store_true',default=False, - help='Do not show plots. (default=%default).' - ) - - options,args = parser.parse_args() - scalar_files = list(sorted(args[1:])) - opt = obj(**options.__dict__) +def timestep_fit(args): + opt = obj(**args.__dict__) + scalar_files = sorted(opt.scalar_files) if len(scalar_files)==0: log('\n'+parser.format_help().strip()+'\n') @@ -400,14 +404,159 @@ def timestep_fit(): #end if #end def timestep_fit +def hubbard_u_fit(args): + opt = obj(**args.__dict__) + scalar_files = sorted(opt.scalar_files) + if opt.fit_function not in fit_functions: + error('invalid fitting function: {0}\nvalid options are: {1}'.format(opt.fit_function,sorted(fit_functions.keys()))) + #end if + # read in scalar energy data + Edata,Emean,Eerror,scalar_files = process_scalar_files( + scalar_files = scalar_files, + series_start = opt.series_start, + equils = opt.equils, + reblock_factors = opt.reblock_factors, + ) + hubbard_u = True + if opt.hubbards is not None: + parse_list(opt,'hubbards',float) + if len(Edata)!=len(opt.hubbards): + error('must provide one hubbard_u value per scalar file\nnumber of hubbard_u provided: {0}\nnumber of scalar files provided: {1}\hubbards provided: {2}\nscalar files provided: {3}'.format(len(opt.hubbards),len(scalar_files),opt.hubbards,scalar_files)) + #end if + x = opt.hubbards + elif opt.exx is not None: + hubbard_u = False + parse_list(opt,'exx',float) + if len(Edata)!=len(opt.exx): + error('must provide one hubbard_u value per scalar file\nnumber of hubbard_u provided: {0}\nnumber of scalar files provided: {1}\hubbards provided: {2}\nscalar files provided: {3}'.format(len(opt.hubbards),len(scalar_files),opt.hubbards,scalar_files)) + #end if + x = opt.exx + else: + log("\n Please provide either EXX or Hubbard_U values") + #end if + + parse_list(opt,'equils',int,len1=True) + parse_list(opt,'reblock_factors',int,len1=True) + + # perform jackknife analysis of the fit + + pf,pmean,perror,auxres = qmcfit(x,Edata,opt.fit_function) + + func_info = fit_functions[opt.fit_function] + pvals = [] + for n in range(len(pmean)): + pvals.append('({0} +/- {1})'.format(*stat_strings(pmean[n],perror[n]))) + #end for + log('\nfit function : '+opt.fit_function) + log('fitted formula: '+func_info.format.format(*pvals)) + for pname,pfunc in func_info.params: + for i in range(len(auxres[pname][0])): + pm,pe = stat_strings(*np.array(auxres[pname])[:,i]) + unit = "" + if pname == "minimum_e": + param = pname + unit = "Ha" + elif pname == "minimum_x": + if hubbard_u: + param = "U at minimum" + unit = "eV" + else: + param = "EXX at minimum" + unit = '' + else: + param = pname + #end if + log('root {0} {1:<14}: {2} +/- {3} {4}'.format(i+1,param,pm,pe, unit)) + #end for + # plot the fit (if available) + + if plots_available and not opt.noplot: + lw = 2 + ms = 10 + ts = x + tsmin = ts.min() + tsmax = ts.max() + tsrange = tsmax-tsmin + + U_min,U_err = auxres.minimum_x + E_min,E_err = auxres.minimum_e + real_U_in_range = np.isreal(U_min) & (U_min < tsmax) & (U_min > tsmin) + U_min = np.real(U_min[real_U_in_range]) + U_err = np.real(U_err[real_U_in_range]) + E_min = np.real(E_min[real_U_in_range]) + tsfit = np.linspace(0,1.1*tsmax,400) + Efit = func_info.function(pmean,tsfit) + plt.figure() + plt.plot(tsfit,Efit,'k-',lw=lw) + plt.errorbar(ts,Emean,Eerror,fmt='b.',ms=ms) + plt.errorbar(U_min,E_min,xerr=U_err,fmt='r.',ms=ms) + plt.xlim([tsmin-0.1*tsrange,tsmax + 0.1*tsrange]) + if hubbard_u: + plt.xlabel('Hubbard U (eV)') + else: + plt.xlabel('EXX ratio') + plt.ylabel('DMC Energy (Ha)') + plt.show() + #end if +#end def hubbard_u_fit + +def parse_args(): + """This utility provides a fit to the one-dimensional parameter scans of QMC + observables. Currently, the functionality in place is to fit linear/quadratic polynomial + fits to the timestep VMC/DMC studies and single parameter optimization of trial wavefunctions + using DMC local energies and quadratic, cubic and quartic fits. + """ + + parser = argparse.ArgumentParser(description=parse_args.__doc__, + formatter_class=argparse.ArgumentDefaultsHelpFormatter) + + parser.add_argument('fit_type', choices=['ts', 'u'], default = 'ts', + help='One dimensional parameter used to fit QMC local energies. Options are ts for timestep and u for hubbard_u parameter fitting' + ) + parser.add_argument('-f','--fit',dest='fit_function',default='linear', + help='Fitting function, options are {0}.'.format(sorted(fit_functions.keys())) + ) + # List of 1-D parameters, mutually exclusive + parameters = parser.add_mutually_exclusive_group(required=True) + parameters.add_argument('-t',dest='timesteps',default=None, + help='Timesteps corresponding to scalar files, excluding any prior to --series_start' + ) + parameters.add_argument('-u',dest='hubbards',default=None, + help='Hubbard U values (eV)' + ) + parameters.add_argument('--exx',dest='exx',default=None, + help='EXX ratios' + ) + + parser.add_argument('-s', '--series_start',dest='series_start',type=int, default=None, + help='Series number for first DMC run. Use to exclude prior VMC scalar files if they have been provided' + ) + parser.add_argument('-e','--equils',dest='equils',default=None, + help='Equilibration lengths corresponding to scalar files, excluding any prior to --series_start. Can be a single value for all files. If not provided, equilibration periods will be estimated.' + ) + parser.add_argument('-b','--reblock_factors',dest='reblock_factors',default=None, + help='Reblocking factors corresponding to scalar files, excluding any prior to --series_start. Can be a single value for all files. If not provided, reblocking factors will be estimated.' + ) + parser.add_argument('--noplot',dest='noplot',action='store_true',default=False, + help='Do not show plots.' + ) + parser.add_argument('scalar_files', nargs='+', + help='Scalar files used in the fit. An explicit list of scalar files with space or a wildcard (e.g. dmc*/dmc.s001.scalar.dat) is acceptable.' + ) + + args = parser.parse_args() + return parser, args if __name__=='__main__': fit_types = sorted(all_fit_functions.keys()) - if len(sys.argv)<2: - error('first argument must be type of fit\ne.g. for a timestep fit, type "qmc-fit ts ..."\nvalid fit types are: {0}'.format(fit_types)) + parser, args = parse_args() + if len(args.scalar_files) == 0: + log('\n'+'Please provide scalar files'+'\n') + parser.print_help() + exit() #end if - fit_type = sys.argv[1] + fit_type = args.fit_type if fit_type in fit_types: fit_functions.clear() fit_functions.transfer_from(all_fit_functions[fit_type]) @@ -416,8 +565,10 @@ if __name__=='__main__': #end if if fit_type=='ts': - timestep_fit() + timestep_fit(args) + elif fit_type == 'u': + hubbard_u_fit(args) else: error('unsupported fit type: {0}'.format(fit_type)) #end if -#end if +#end if \ No newline at end of file diff --git a/nexus/tests/unit/test_qmc_fit.py b/nexus/tests/unit/test_qmc_fit.py index 89d699975a..afda8191d6 100644 --- a/nexus/tests/unit/test_qmc_fit.py +++ b/nexus/tests/unit/test_qmc_fit.py @@ -21,7 +21,7 @@ def test_fit(): dmc_infile = os.path.join(dmc_path,'dmc.in.xml') assert(os.path.exists(dmc_infile)) - command = "{} ts --noplot -e 10 -s 1 -t '0.02 0.01 0.005' {}/*scalar*".format(exe,dmc_path) + command = "{} ts --noplot -e 10 -s 1 -t '0.02 0.01 0.005' -f linear {}/*scalar*".format(exe,dmc_path) out,err,rc = execute(command)