Modeltweaks (#34)

* save calcinfo_nmbe
* adjust genopt schema
* version schema
* details on max_nbody
* changelog
* semi-arb fields in manybodyproperties
* opt tests
* opt markers
* final opt marker fix
* add non-opaque labeler/delabeler
* centralize modelchem labeling
* add changelog
* touch ups and test renaming
* fixups
* another check
MolSSI · Jul 21, 2024 · 8e689e6
1 parent ddefb75
commit 8e689e6

Showing 25 changed files with 924 additions and 295 deletions.
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -201,7 +201,7 @@ jobs:
     - name: Build Documentation
       run: |
         python -m pip install . --no-deps
-        PYTHONPATH=docs/extensions mkdocs build
+        QCMANYBODY_MAX_NBODY=5 PYTHONPATH=docs/extensions mkdocs build
         cd docs
 
     - name: GitHub Pages Deploy

diff --git a/README.md b/README.md
@@ -13,7 +13,7 @@ QCManyBody
 [![Conda (channel only)](https://img.shields.io/conda/vn/conda-forge/qcmanybody?color=blue&logo=anaconda&logoColor=white)](https://anaconda.org/conda-forge/qcmanybody)
 ![python](https://img.shields.io/badge/python-3.8+-blue.svg)
 
-QCManyBody is a python package for running quantum chemistry manybody expansions and interaction calculations in a
+QCManyBody is a python package for running quantum chemistry many-body expansions and interaction calculations in a
 package-independent way.
 
 ## Installation

diff --git a/docs/changelog.md b/docs/changelog.md
@@ -19,7 +19,7 @@
 -->
 
 
-## v0.3.0 / 2024-MM-DD (Unreleased)
+## v0.3.0 / 2024-07-21
 
 #### Breaking Changes
 
@@ -38,10 +38,13 @@
 
 #### New Features
 
- * [\#32](https://github.com/MolSSI/QCManyBody/pull/32) QCSchema -- a new function
+ * [\#32](https://github.com/MolSSI/QCManyBody/pull/32) Schema -- a new function
    `ManyBodyResultProperties.to_qcvariables()` returns a translation map to QCVariables keys. @loriab
- * [\#32](https://github.com/MolSSI/QCManyBody/pull/32) QCSchema -- a new function
+ * [\#32](https://github.com/MolSSI/QCManyBody/pull/32) Schema -- a new function
    `qcmanybody.utils.translate_qcvariables(map)` switches between QCVariable and QCSchema keys. @loriab
+ * [\#33](https://github.com/MolSSI/QCManyBody/pull/33) Schema -- `ManyBodySpecification.extras` added. @loriab
+ * [\#34](https://github.com/MolSSI/QCManyBody/pull/34) Schema -- Add schema_version to `AtomicSpecification`,
+   `ManyBodySpecification`, `ManyBodyKeywords`, `ManyBodyInput`, and `ManyBodyResultProperties`. @loriab
 
 #### Enhancements
 
@@ -54,14 +57,23 @@
    is a single large fragment, an error is thrown. @loriab
  * [\#30](https://github.com/MolSSI/QCManyBody/pull/30) Docs -- add end-to-end demos in test_examples. @loriab
  * [\#31](https://github.com/MolSSI/QCManyBody/pull/31) Schema -- add "none" as a bsse_type alias to "nocp". @loriab
-
-#### Bug Fixes
-
-#### Misc.
-
-#### MUST (Unmerged)
-
-#### WIP (Unmerged)
+ * [\#34](https://github.com/MolSSI/QCManyBody/pull/34) Schema -- Allow environment variable `QCMANYBODY_MAX_NBODY` to
+   influence the body-level to which `ManyBodyResultProperties` is defined. @loriab
+ * [\#34](https://github.com/MolSSI/QCManyBody/pull/34) Schema -- added discriminator to input for
+   `GeneralizedOptimizationInput` and `GeneralizedOptimizationResult` models to allow input from dicts (rather than
+   models) in OptKing. Further specialized QCElemental. @loriab
+ * [\#34](https://github.com/MolSSI/QCManyBody/pull/34) Schema -- `ManyBodyResultProperties` is still only explicitly
+   enumerated up to tetramers, but now it allows through higher-body fields when they match a pattern. @loriab
+ * [\#34](https://github.com/MolSSI/QCManyBody/pull/34) Maint -- start testing optimizations through QCEngine. @loriab
+ * [\#34](https://github.com/MolSSI/QCManyBody/pull/34) Util -- add `labeler(..., opaque=False)` option to produce
+   eye-friendly `(1)@(1, 2)` style labels as well as the semi-opaque internal style. Also always convert single ints
+   to tuples now. Function `delabeler` can decode the new style. @loriab
+ * [\#34](https://github.com/MolSSI/QCManyBody/pull/34) Intf -- sort "core" `nbodies_per_mc_level` dictionary so model
+   chemistries are in a predictable 1b, 2b, ..., supersystem order. Check that high-level (different data structure) agrees.
+ * [\#34](https://github.com/MolSSI/QCManyBody/pull/34) Util -- add short ordinal model chemistry level (e.g., §A) to the
+   `format_calc_plan` and `print_nbody_energy` summaries. @loriab
+ * [\#34](https://github.com/MolSSI/QCManyBody/pull/34) Util -- add function `modelchem_labels` to associate n-body
+   level, model chemistry level, one-char ordinal modelchem label, and n-bodies-covered modelchem label. @loriab
 
 
 ## v0.2.1 / 2024-05-14

diff --git a/docs/qcschema.md b/docs/qcschema.md
@@ -48,11 +48,16 @@ $pydantic: qcmanybody.models.manybody_input_pydv1.ManyBodyProtocols
 
     The properties model is generated dynamically based on a constant
     ``MAX_NBODY``. To not overload the docs table, this is set to 5, which
-    covers full calculations on tetramers. To use a larger
-    ``ManyBodyKeywords.max_nbody``, reset this value.
-
-        import qcmanybody as qcmb
-        qcmb.models.MAX_NBODY = 8
+    covers full calculations on tetramers. In practice this isn't a problem for
+    larger clusters because ``cp_corrected_total_energy_through_12_body``, for
+    example, is allowed dynamically for a model instance. Nevertheless, to use a
+    larger ``ManyBodyKeywords.max_nbody``, reset this value *outside* the interpreter.
+
+        python -c "import qcmanybody as qcmb;print(qcmb.models.MAX_NBODY)"
+        #> 5
+        export QCMANYBODY_MAX_NBODY=9  # explicitly enumerates octamer properties
+        python -c "import qcmanybody as qcmb;print(qcmb.models.MAX_NBODY)"
+        #> 9
 
 
 ::: qcmanybody.models.ManyBodyResultProperties

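The docs hunk above says `MAX_NBODY` is now taken from the `QCMANYBODY_MAX_NBODY` environment variable before the interpreter starts, with 5 as the default. A minimal sketch of that pattern follows, assuming the value is read once at import time. The helper name `read_max_nbody` and its validation are hypothetical illustrations, not QCManyBody's actual code.

```python
import os


def read_max_nbody(environ=None, default=5):
    """Return the body level to enumerate (hypothetical helper).

    Reads QCMANYBODY_MAX_NBODY from `environ` (defaults to os.environ)
    and falls back to `default` when the variable is unset.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("QCMANYBODY_MAX_NBODY")
    if raw is None:
        return default
    value = int(raw)  # a non-integer string raises ValueError here
    if value < 1:
        raise ValueError(f"QCMANYBODY_MAX_NBODY must be positive, got {value}")
    return value


# Evaluated once, so later changes to the environment have no effect;
# this is why the docs say to set the variable outside the interpreter.
MAX_NBODY = read_max_nbody()
```

Passing a plain dict as `environ` makes the helper easy to exercise without touching the real environment.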
diff --git a/pyproject.toml b/pyproject.toml
@@ -87,4 +87,8 @@ markers = [
     "cfour: tests using CFOUR software; skip if unavailable",
     "nwchem: tests using classic NWChem software; skip if unavailable",
     "psi4: tests using Psi4 software; skip if unavailable",
+    "geometric: tests using GeomeTRIC software; skip if unavailable",
+    "geometric_genopt: tests using GeomeTRIC software with new additions; skip if unavailable",
+    "optking: tests using OptKing software; skip if unavailable",
+    "optking_genopt: tests using OptKing software with new additions; skip if unavailable",
 ]
diff --git a/qcmanybody/computer.py b/qcmanybody/computer.py
@@ -331,6 +331,9 @@ def set_max_nbody(cls, v: Any, values) -> int:
         nfr = len(values["molecule"].fragments)
         # print(f" {levels_max_nbody=} {nfr=}", end="")
 
+        if len(set(values["levels"].values())) != len(values["levels"]):
+            raise ValueError("Cannot have duplicate model chemistries in levels.")
+
         # ALT if v == -1:
         if v is None:
             v = levels_max_nbody
@@ -424,6 +427,14 @@ def from_manybodyinput(cls, input_model: ManyBodyInput, build_tasks: bool = True
             embedding_charges=computer_model.embedding_charges,
         )
 
+        # check that core and computer storage are consistent in mc ordering and grouping and nbody levels
+        assert (
+            list(computer_model.qcmb_core.nbodies_per_mc_level.values()) == computer_model.nbodies_per_mc_level
+        ), f"CORE {computer_model.qcmb_core.nbodies_per_mc_level.values()} != COMPUTER {computer_model.nbodies_per_mc_level}"
+        assert list(computer_model.qcmb_core.nbodies_per_mc_level.keys()) == list(
+            computer_model.levels.values()
+        ), f"CORE {computer_model.qcmb_core.nbodies_per_mc_level.keys()} != COMPUTER {computer_model.levels.values()}"
+
         if not build_tasks:
             return computer_model
 

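The validator hunk above rejects a `levels` mapping in which the same model chemistry appears under more than one key. A standalone sketch of that check is below. The function name `check_unique_modelchems` is hypothetical, and naming the offending model chemistries in the message is an illustrative extension rather than what the commit does.

```python
from collections import Counter


def check_unique_modelchems(levels):
    """Raise if any model chemistry is assigned to more than one key of `levels`."""
    counts = Counter(levels.values())
    duplicated = sorted(mc for mc, n in counts.items() if n > 1)
    if duplicated:
        raise ValueError(f"Cannot have duplicate model chemistries in levels: {duplicated}")
    return levels


# Each model chemistry appears once, so this passes through unchanged.
check_unique_modelchems({1: "ccsd", 3: "mp2"})
```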
diff --git a/qcmanybody/core.py b/qcmanybody/core.py
@@ -2,6 +2,7 @@
 
 import logging
 import math
+import string
 from collections import Counter, defaultdict
 from typing import Any, Dict, Iterable, Literal, Mapping, Sequence, Set, Tuple, Union
 
@@ -17,6 +18,7 @@
     delabeler,
     find_shape,
     labeler,
+    modelchem_labels,
     print_nbody_energy,
     resize_gradient,
     resize_hessian,
@@ -91,7 +93,23 @@ def __init__(
         for k, v in self.levels_no_ss.items():
             self.nbodies_per_mc_level[v].append(k)
 
-        self.nbodies_per_mc_level = {k: sorted(v) for k, v in self.nbodies_per_mc_level.items()}
+        # order nbodies_per_mc_level keys (modelchems) by the lowest n-body level covered; any
+        #   supersystem key (replaced below) is at the end. Order nbodies within each modelchem.
+        #   Reset mc_levels to match.
+        self.nbodies_per_mc_level = {
+            k: sorted(v)
+            for (k, v) in sorted(self.nbodies_per_mc_level.items(), key=lambda item: sorted(item[1] or [1000])[0])
+        }
+        assert self.mc_levels == set(self.nbodies_per_mc_level.keys())  # remove after some downstream testing
+        self.mc_levels = self.nbodies_per_mc_level.keys()
+
+        for mc, nbs in self.nbodies_per_mc_level.items():
+            if nbs and ((nbs[-1] - nbs[0]) != len(nbs) - 1):
+                raise ValueError(
+                    f"QCManyBody: N-Body levels must be contiguous within a model chemistry spec ({mc}: {nbs}). Use an alternate spec name to accomplish this input."
+                )
+                # TODO - test and reenable if appropriate. my guess is that noncontig nb is fine on the core computing side,
+                #   but trouble for computer and nbodies_per_mc_level inverting and indexing. Safer to deflect for now since input tweak allows the calc.
 
         # Supersystem is always at the end
         if "supersystem" in levels:
@@ -158,17 +176,17 @@ def format_calc_plan(self, sset: str = "all") -> Tuple[str, Dict[str, Dict[int,
         info
             A text summary with per- model chemistry and per- n-body-level job counts.
             ```
-            Model chemistry "c4-ccsd":    22
+            Model chemistry "c4-ccsd" (§A):         22
                  Number of 1-body computations:     16 (nocp: 0, cp: 0, vmfc_compute: 16)
                  Number of 2-body computations:      6 (nocp: 0, cp: 0, vmfc_compute: 6)
 
-            Model chemistry "c4-mp2":    28
+            Model chemistry "c4-mp2" (§B):          28
                  Number of 1-body computations:     12 (nocp: 0, cp: 0, vmfc_compute: 12)
                  Number of 2-body computations:     12 (nocp: 0, cp: 0, vmfc_compute: 12)
                  Number of 3-body computations:      4 (nocp: 0, cp: 0, vmfc_compute: 4)
             ```
         Dict[str, Dict[int, int]]
-            Data structure with outer key mc-label, inner key 1-indexed n-body, value job count.
+            Data structure with outer key mc-label, inner key 1-indexed n-body, and value job count.
         """
         # Rearrange compute_list from key nb having values (species) to compute all of that nb
         #   to key nb having values counting that nb.
@@ -179,10 +197,13 @@ def format_calc_plan(self, sset: str = "all") -> Tuple[str, Dict[str, Dict[int,
                 all_calcs = set().union(*compute_dict[sub].values())
                 compute_list_count[mc][sub] = Counter([len(frag) for (frag, _) in all_calcs])
 
+        mc_labels = modelchem_labels(self.nbodies_per_mc_level, presorted=True)
+        full_to_ordinal_mc_lbl = {v[0]: v[1] for v in mc_labels.values()}
         info = []
         for mc, counter in compute_list_count.items():
             all_counter = counter["all"]
-            info.append(f'    Model chemistry "{mc}" (???):    {sum(all_counter.values())}')
+            mcheader = f'    Model chemistry "{mc}" ({full_to_ordinal_mc_lbl[mc]}):'
+            info.append(f"{mcheader:38} {sum(all_counter.values()):6}")
             for nb, count in sorted(all_counter.items()):
                 other_counts = [f"{sub}: {counter[sub][nb]}" for sub in ["nocp", "cp", "vmfc_compute"]]
                 info.append(f"        Number of {nb}-body computations: {count:6} ({', '.join(other_counts)})")
@@ -556,6 +577,7 @@ def analyze(
                 all_results["energy_body_dict"][bt],
                 f"{bt.formal()} ({bt.abbr()})",
                 self.nfragments,
+                modelchem_labels(self.nbodies_per_mc_level, presorted=True),
                 is_embedded,
                 self.supersystem_ie_only,
                 self.max_nbody if self.has_supersystem else None,

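The `__init__` hunk above orders model chemistries by the lowest n-body level each covers, pushes an empty (supersystem placeholder) entry to the end, and rejects non-contiguous n-body lists. The same logic, lifted into a free function for illustration, might look like the sketch below. The function name `order_nbodies_per_mc_level` and the sample model chemistry names are assumptions, not part of the package.

```python
def order_nbodies_per_mc_level(nbodies_per_mc_level):
    """Sort modelchems by lowest n-body covered and check contiguity."""
    # An empty list sorts as if its lowest level were 1000, i.e. last.
    ordered = {
        mc: sorted(nbs)
        for mc, nbs in sorted(
            nbodies_per_mc_level.items(),
            key=lambda item: sorted(item[1] or [1000])[0],
        )
    }
    for mc, nbs in ordered.items():
        # For a sorted list, contiguous means span == count - 1.
        if nbs and (nbs[-1] - nbs[0]) != len(nbs) - 1:
            raise ValueError(
                f"N-body levels must be contiguous within a model chemistry spec ({mc}: {nbs})."
            )
    return ordered


plan = order_nbodies_per_mc_level({"mp2": [3, 2], "ss-placeholder": [], "ccsd": [1]})
print(list(plan.items()))
```

Because Python dicts preserve insertion order, the returned mapping can be iterated directly in 1b, 2b, ..., supersystem order.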
diff --git a/qcmanybody/models/generalized_optimization.py b/qcmanybody/models/generalized_optimization.py
@@ -11,23 +11,32 @@
 from .manybody_input_pydv1 import ManyBodySpecification
 from .manybody_output_pydv1 import ManyBodyResult
 
-# note that qcel AtomicResult.schema_name needs editing
-ResultTrajectories = Annotated[Union[AtomicResult, ManyBodyResult], Field(discriminator="schema_name")]
+# note that qcel QCInputSpecification and AtomicResult.schema_name need editing
+ResultTrajectories = Annotated[
+    Union[AtomicResult, ManyBodyResult],
+    Field(
+        discriminator="schema_name",
+        description="A result object for a single step in the optimization. Either an ordinary atomic/single-point or a many-body result.",
+    ),
+]
+InputSpecifications = Annotated[
+    Union[QCInputSpecification, ManyBodySpecification],
+    Field(
+        discriminator="schema_name",
+        description="A directive to compute a gradient. Either an ordinary atomic/single-point or a many-body spec.",
+    ),
+]
 
 
 class GeneralizedOptimizationInput(OptimizationInput):
     schema_name: Literal["qcschema_generalizedoptimizationinput"] = "qcschema_generalizedoptimizationinput"
     schema_version: int = 1
-    input_specification: Union[QCInputSpecification, ManyBodySpecification] = Field(
-        ..., description="ordinary or mbe grad spec"
-    )
+    input_specification: InputSpecifications
 
 
 class GeneralizedOptimizationResult(OptimizationResult):
     schema_name: Literal["qcschema_generalizedoptimizationresult"] = "qcschema_generalizedoptimizationresult"
     trajectory: List[ResultTrajectories] = Field(
         ..., description="A list of ordered Result objects for each step in the optimization."
     )
-    input_specification: Union[QCInputSpecification, ManyBodySpecification] = Field(
-        ..., description="ordinary or mbe grad spec"
-    )
+    input_specification: InputSpecifications
diff --git a/qcmanybody/models/manybody_input_pydv1.py b/qcmanybody/models/manybody_input_pydv1.py
@@ -27,6 +27,11 @@ class AtomicSpecification(ProtoModel):
     program: str = Field(..., description="The program for which the Specification is intended.")
 
     schema_name: Literal["qcschema_atomicspecification"] = "qcschema_atomicspecification"
+    schema_version: Literal[1] = Field(
+        1,
+        description="The version number of ``schema_name`` to which this model conforms.",
+    )
+
     driver: DriverEnum = Field(..., description=DriverEnum.__doc__)
     model: Model = Field(..., description=Model.__doc__)
     protocols: AtomicResultProtocols = Field(
@@ -121,6 +126,11 @@ def abbr(self):
 class ManyBodyKeywords(ProtoModel):
     """The many-body-specific keywords for user control."""
 
+    schema_name: Literal["qcschema_manybodykeywords"] = "qcschema_manybodykeywords"
+    schema_version: Literal[1] = Field(
+        1,
+        description="The version number of ``schema_name`` to which this model conforms.",
+    )
     bsse_type: List[BsseEnum] = Field(
         [BsseEnum.cp],
         # definitive description
@@ -205,6 +215,10 @@ class ManyBodySpecification(ProtoModel):
     """Combining the what (ManyBodyKeywords) with the how (AtomicSpecification)."""
 
     schema_name: Literal["qcschema_manybodyspecification"] = "qcschema_manybodyspecification"
+    schema_version: Literal[1] = Field(
+        1,
+        description="The version number of ``schema_name`` to which this model conforms.",
+    )
     # provenance: Provenance = Field(Provenance(**provenance_stamp(__name__)), description=Provenance.__doc__)
     keywords: ManyBodyKeywords = Field(..., description=ManyBodyKeywords.__doc__)
     # program: str = Field(..., description="The program for which the Specification is intended.")  # TODO is qcmanybody
@@ -218,6 +232,10 @@ class ManyBodySpecification(ProtoModel):
         ...,
         description="??? TODO expand to cbs, fd",
     )
+    extras: Dict[str, Any] = Field(
+        {},
+        description="Additional information to bundle with the computation. Use for schema development and scratch space.",
+    )
 
     # v2: @field_validator("specification", mode="before")
     @validator("specification", pre=True)
@@ -235,6 +253,10 @@ class ManyBodyInput(ProtoModel):
     """Combining the what and how (ManyBodySpecification) with the who (Molecule)."""
 
     schema_name: Literal["qcschema_manybodyinput"] = "qcschema_manybodyinput"
+    schema_version: Literal[1] = Field(
+        1,
+        description="The version number of ``schema_name`` to which this model conforms.",
+    )
     # provenance: Provenance = Field(Provenance(**provenance_stamp(__name__)), description=Provenance.__doc__)
     specification: ManyBodySpecification = Field(
         ...,