QMCPACK · ye-luo · Mar 2, 2023 · Feb 21, 2023 · Feb 21, 2023 · Feb 21, 2023
diff --git a/docs/analyzing.rst b/docs/analyzing.rst
@@ -1057,15 +1057,13 @@ Production quality checklist
 
 .. _qmcfit:
 
-Using the qmc-fit tool for statistical time step extrapolation and curve fitting
---------------------------------------------------------------------------------
+Using the qmc-fit tool for statistical time step extrapolation, trial wavefunction optimization and curve fitting
+-----------------------------------------------------------------------------------------------------------------
 
-The ``qmc-fit`` tool is used to provide statistical estimates of
-curve-fitting parameters based on QMCPACK data. Although ``qmc-fit``
-will eventually support many types of fitted curves (e.g., Morse
-potential binding curves and various equation-of-state fitting curves),
-it is currently limited to estimating fitting parameters related to time
-step extrapolation.
+The ``qmc-fit`` tool is used to provide statistical estimates of curve-fitting parameters based on QMCPACK data. 
+``qmc-fit`` is currently limited to estimating fitting parameters related to time step extrapolation and trial wavefunction
+optimization (optimal U for DFT+U, EXX fractions), it will eventually support many types of fitted curves (e.g., Morse
+potential binding curves and various equation-of-state fitting curves).
 
 The jackknife statistical technique
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1226,6 +1224,63 @@ estimated value of :math:`-3848.28(7)` instead.
 
   Linear (top) and quadratic (bottom) time step fits to DMC data for a 32-atom supercell of MnO obtained with ``qmc-fit``.  Zero time step estimates are indicated by the red data point on the left side of either panel.
 
+Performing trial wavefunction optimization fitting, e.g., to find optimal DFT+U
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In this section, we use a 24-atom supercell of monolayer FeCl:math:`_{2}` as an example system for wavefunction optimization
+fitting. Using single determinant DFT wavefunctions, a practical method to perform wavefunction optimization is done through
+scanning the Hubbard-U parameter in a DFT+U calculation used to generate the trial wavefunction. Similarly, one can also scan
+different exact exchange ratio parameters in hybrid-DFT calculations. Here, we will show an example of this fitting for the
+Hubbard-U parameter, but the same procedure can be applied to any single-parameter scans of trial wavefunctions. Data for this
+system has been collected in DMC using the following sequence of Hubbard-U values on Fe-d orbitals: :math:`0 1 2 3 4 5` eV. Some
+non-zero U value often minimizes the DMC energy, but optimized U values have limited transferability across different systems.
+Similar to the procedure for performing timestep statistical fitting, the quality of the input statistics must be checked using
+``qmca`` utility to determine the reblocking factor and equilibration periods. Assuming that an equilibration period of initial 50
+steps, ``-e 50``, and a reblocking period of 4, ``-b 6``, is sufficient to remove correlations in the statistical local energies,
+the ``qmc-fit`` tool can be used in the following way to obtain a quadratic fit of the data:
+
+::
+
+  >qmc-fit u -e 50 -b 6 -u "0 1 2 3 4 5" -f quadratic dmc_u_*/dmc.s001.scalar.dat
+  fit function  : quadratic
+  fitted formula: (-1230.1071 +/- 0.0045) + (-0.0683 +/- 0.0040)*t + (0.00883 +/- 0.00077)*t^2
+  root 1 minimum_u     : 3.87 +/- 0.14 eV
+  root 1 minimum_e     : -1230.2391 +/- 0.0026 Ha
+  root 1 curvature     : 0.0177 +/- 0.0015
+
+Here, ``qmc-fit u`` indicates we are performing a Hubbard-U/exact-exchange ratio fit, 
+``-u`` option provides a list of Hubbard-U values :math:`0 1 2 3 4 5` corresponding to the auto-sorted
+dmc scalar files with wildcard ``dmc_u_*/dmc.s001.scalar.dat``. Here, ``qmc-fit`` command is invoked at a 
+directory where folders such as ``dmc_u_0_2x2x1, dmc_u_1_2x2x1, dmc_u_2_2x2x1`` reside. 
+Here, the text output provides the U value (``minimum_u``) and local energies (``minimum_e``) 
+at the minima of the polynomial which falls within the range of Hubbard-U values provided in the command 
+line, e.g. from 0 to 5. Therefore, a U value of :math:`3.8(1)` eV minimizes the DMC energy of the system. 
+The ``curvature`` is printed for informative purposes only, but a curvature with small error bar could 
+indicate a higher quality polynomial fit. Similar to the timestep fit, a plot of the fit will also 
+produced as default where the minima of the polynomial is shown as a red dot as in :numref:`fig13`.
+
+Different fitting functions are supported via the ``-f`` option. 
+Currently supported options include ``quadratic`` (:math:`a+bt+ct^2`), and 
+``cubic`` (:math:`a+bt+ct^2+dt^3`) and ``quartic`` (:math:`a+bt+ct^2+dt^3+et^4`). 
+An example of a cubic fit is given as below:
+
+::
+
+  >qmc-fit u -e 50 -b 6 -u "0 1 2 3 4 5" -f cubic dmc_u_*/dmc.s001.scalar.dat
+
+  fit function  : cubic
+  fitted formula: (-1230.1087 +/- 0.0045) + (-0.0608 +/- 0.0073)*t + (0.0047 +/- 0.0033)*t^2 + (0.00055 +/- 0.00041)*t^3
+  root 1 minimum_u     : 3.85 +/- 0.11 eV
+  root 1 minimum_e     : -1230.2415 +/- 0.0033 Ha
+  root 1 curvature     : 0.0221 +/- 0.0034
+
+.. _fig13:
+.. figure:: /figs/qmcfit_hubbard_quadratic.png
+  :width: 400
+  :align: center
+
+  Quadratic Hubbard-U fits to DMC data for a 24-atom supercell of monolayer FeCl:math:`_{2}` obtained with ``qmc-fit``.  DMC local energy minima are indicated by the red data point on the bottom halves of either panel.
+
 .. _qdens:
 
 Using the qdens tool to obtain electron densities

diff --git a/docs/figs/qmcfit_hubbard_quadratic.png b/docs/figs/qmcfit_hubbard_quadratic.png
diff --git a/nexus/bin/qmc-fit b/nexus/bin/qmc-fit
@@ -82,6 +82,10 @@ except:
 #end try
 
 
+roots_n     = lambda p, x: np.roots(np.polyder(p[::-1], x))
+root_vals_n = lambda p, x: np.polyval(p[::-1], roots_n(p, x))
+moment_n    = lambda p, x: np.polyval(np.polyder(p[::-1], x), roots_n(p, x-1))
+
 
 all_fit_functions = obj(
     ts = obj(
@@ -104,6 +108,32 @@ all_fit_functions = obj(
             params   = [('intercept',lambda p: p[0])],
             ),
         ),
+    u = obj(
+        quadratic = obj(
+            nparam   = 3,
+            function = lambda p,t: p[0]+p[1]*t+p[2]*t*t,
+            format   = '{0} + {1}*t + {2}*t^2',
+            params   = [('minimum_u',lambda p: roots_n(p, 1)),
+                        ('minimum_e',lambda p: root_vals_n(p,1)),
+                        ('curvature',lambda p: moment_n(p, 2))],
+            ),
+        cubic = obj(
+            nparam   = 4,
+            function = lambda p,t: p[0]+p[1]*t+p[2]*t*t+p[3]*t*t*t,
+            format   = '{0} + {1}*t + {2}*t^2 + {3}*t^3',
+            params   = [('minimum_u',lambda p: roots_n(p, 1)[moment_n(p,2) > 0]),
+                        ('minimum_e',lambda p: root_vals_n(p,1)[moment_n(p,2) > 0]),
+                        ('curvature',lambda p: moment_n(p, 2)[moment_n(p,2) > 0])],
+            ),            
+        quartic = obj(
+            nparam   = 5,
+            function = lambda p,t: p[0]+p[1]*t+p[2]*t*t+p[3]*t*t*t+p[4]*t*t*t*t,
+            format   = '{0} + {1}*t + {2}*t^2 + {3}*t^3 + {4}*t^4',
+            params   = [('minimum_u',lambda p: roots_n(p, 1)[moment_n(p,2) > 0]),
+                        ('minimum_e',lambda p: root_vals_n(p,1)[moment_n(p,2) > 0]),
+                        ('curvature',lambda p: moment_n(p, 2)[moment_n(p,2) > 0])],
+            ),            
+        ),        
     )
 
 fit_functions = obj()
@@ -132,6 +162,10 @@ def qmcfit(q,E,fname='linear',minimizer=least_squares):
     #end for
 
     # setup initial guess parameters
+    if fname=='quartic':
+        pp = np.polyfit(q,E,4)
+    if fname=='cubic':
+        pp = np.polyfit(q,E,3)
     if fname=='quadratic':
         pp = np.polyfit(q,E,2)
     else:
@@ -400,7 +434,117 @@ def timestep_fit():
     #end if
 #end def timestep_fit
 
+def hubbard_u_fit():
+       # read command line inputs
+    usage = '''usage: %prog u [options] [scalar files]'''
+    parser = OptionParser(usage=usage,add_help_option=False,version='%prog {}.{}.{}'.format(*nexus_version))
+
+    parser.add_option('-f','--fit',dest='fit_function',
+                          default='quadratic',
+                          help='Fitting function, options are {0} (default=%default).'.format(sorted(fit_functions.keys()))
+                      )
+    parser.add_option('-s', '--series_start',dest='series_start',
+                      type="int", default=None,
+                      help='Series number for first DMC run.  Use to exclude prior VMC scalar files if they have been provided (default=%default).'
+                      )
+    parser.add_option('-u','--hubbard_u',dest='hubbards',
+                          default=None,
+                          help='Hubbard_u values corresponding to scalar files, excluding any prior to --series_start (default=%default).'
+                      )
+    parser.add_option('-e','--equils',dest='equils',
+                          default=None,
+                          help='Equilibration lengths corresponding to scalar files, excluding any prior to --series_start.  Can be a single value for all files.  If not provided, equilibration periods will be estimated.'
+                      )
+    parser.add_option('-b','--reblock_factors',dest='reblock_factors',
+                          default=None,
+                          help='Reblocking factors corresponding to scalar files, excluding any prior to --series_start.  Can be a single value for all files.  If not provided, reblocking factors will be estimated.'
+                      )
+    parser.add_option('--noplot',dest='noplot',
+                      action='store_true',default=False,
+                      help='Do not show plots. (default=%default).'
+                      )
+
+    options,args = parser.parse_args()
+    scalar_files = list(sorted(args[1:]))
+    opt = obj(**options.__dict__)
+    if len(scalar_files)==0:
+        log('\n'+parser.format_help().strip()+'\n')
+        exit()
+    #end if
+
+    if opt.fit_function not in fit_functions:
+        error('invalid fitting function: {0}\nvalid options are: {1}'.format(opt.fit_function,sorted(fit_functions.keys())))
+    #end if
+    if opt.hubbards is None:
+        opt.hubbards = ''
+    #end if
+    parse_list(opt,'hubbards',float)
+    parse_list(opt,'equils',int,len1=True)
+    parse_list(opt,'reblock_factors',int,len1=True)
+
+    # read in scalar energy data
+    Edata,Emean,Eerror,scalar_files = process_scalar_files(
+        scalar_files    = scalar_files,
+        series_start    = opt.series_start,
+        equils          = opt.equils,
+        reblock_factors = opt.reblock_factors,
+        )
+
+    if len(Edata)!=len(opt.hubbards):
+        error('must provide one hubbard_u value per scalar file\nnumber of hubbard_u provided: {0}\nnumber of scalar files provided: {1}\hubbards provided: {2}\nscalar files provided: {3}'.format(len(opt.hubbards),len(scalar_files),opt.hubbards,scalar_files))
+    #end if
+    # perform jackknife analysis of the fit
+
+    pf,pmean,perror,auxres = qmcfit(opt.hubbards,Edata,opt.fit_function) 
+
+    func_info = fit_functions[opt.fit_function]
+    pvals = []
+    for n in range(len(pmean)):
+        pvals.append('({0} +/- {1})'.format(*stat_strings(pmean[n],perror[n])))
+    #end for
+
+    log('\nfit function  : '+opt.fit_function)
+    log('fitted formula: '+func_info.format.format(*pvals))
+    num_roots = len(auxres[pname][0])
+    for pname,pfunc in func_info.params:
+        for i in range(num_roots):
+            pm,pe = stat_strings(*np.array(auxres[pname])[:,i])
+            unit = ""
+            if pname == "minimum_e":
+                unit = "Ha"
+            elif pname == "minimum_u":
+                unit = "eV"
+            #end if
+            log('root {0} {1:<14}: {2} +/- {3} {4}'.format(i+1,pname,pm,pe, unit))
+    #end for
+    # plot the fit (if available)
 
+    if plots_available and not opt.noplot:
+        lw = 2
+        ms = 10
+        ts = opt.hubbards
+        tsmin = ts.min()
+        tsmax = ts.max()
+        tsrange = tsmax-tsmin
+
+        U_min,U_err = auxres.minimum_u
+        E_min,E_err = auxres.minimum_e
+        real_U_in_range = np.isreal(U_min) & (U_min < tsmax) &  (U_min > tsmin)
+        U_min = np.real(U_min[real_U_in_range])
+        U_err = np.real(U_err[real_U_in_range])
+        E_min = np.real(E_min[real_U_in_range])
+        tsfit = np.linspace(0,1.1*tsmax,400)
+        Efit  = func_info.function(pmean,tsfit)
+        plt.figure()
+        plt.plot(tsfit,Efit,'k-',lw=lw)
+        plt.errorbar(ts,Emean,Eerror,fmt='b.',ms=ms)
+        plt.errorbar(U_min,E_min,xerr=U_err,fmt='r.',ms=ms)
+        plt.xlim([tsmin-0.1*tsrange,tsmax + 0.1*tsrange])
+        plt.xlabel('Hubbard U (eV)')
+        plt.ylabel('DMC Energy (Ha)')
+        plt.show()
+    #end if    
+#end def hubbard_u_fit 
 
 if __name__=='__main__':
     fit_types = sorted(all_fit_functions.keys())
@@ -417,7 +561,9 @@ if __name__=='__main__':
 
     if fit_type=='ts':
         timestep_fit()
+    elif fit_type == 'u':
+        hubbard_u_fit()
     else:
         error('unsupported fit type: {0}'.format(fit_type))
     #end if
-#end if
+#end if