Merge pull request #2 from MolSSI-MDI/energy-calculation

Adds energy calculation section for comparison of Psi4 and ANI energies
MolSSI-MDI · May 14, 2024 · 7b29983 · 7b29983
2 parents c0ce5a9 + 231c745
commit 7b29983
Show file tree

Hide file tree

Showing 5 changed files with 354 additions and 5 deletions.
diff --git a/driver_tutorial.md b/driver_tutorial.md
@@ -10,13 +10,13 @@ It is also suggested that you have completed the [TorchANI Tutorial](torchani_tu
 
 :::
 
-To complete this tutorial, clone the [starting repository](https://github.com/janash/mdi-ani-workshop.git).
+To complete this tutorial, clone the [starting repository](https://github.com/MolSSI-MDI/mdi-ani-workshop.git).
 
 ````{tab-set} 
 ```{tab-item} shell
 
 :::{code-block} shell  
-git clone https://github.com/janash/mdi-ani-workshop.git
+git clone https://github.com/MolSSI-MDI/mdi-ani-workshop.git
 :::
 
 ```

diff --git a/energy_calculations.md b/energy_calculations.md
@@ -0,0 +1,306 @@
+# Calculating Molecular Energies
+
+Before we start to write a molecular dynamics driver for ANI, we might want to calculate the energies of a few molecules to make sure we get reasonable energy estimates and get a feel for using TorchANI.
+
+In this tutorial, we will calculate the energy of water clusters using both ANI and Psi4. We will compare our energy results and compare the amount of time it takes to calculate the energy of a water cluster using ANI and Psi4.
+The benchmarks we will do as part of this section are basic and are far from thorough.
+However, they are a good starting point for understanding the performance of ANI and Psi4.  
+
+## Water Cluster Configurations
+We have provided a number of XYZ files for water clusters in the `xyz` folder. 
+We will first work with files in the `xyz/scaling` folder to examine calculated energies and scaling behavior. 
+These XYZ files come from a [database of water cluster minima](https://sites.uw.edu/wdbase/database-of-water-clusters/).
+
+
+## Energy Calculation and Benchmarking Script
+
+Open the script `xyz/energy_calculations.py` to see the starting code for this section.
+The starting code includes functions for caluculating the energy of a system using ANI and Psi4.
+
+:::{tab-set}
+
+````{tab-item} xyz/energy_calculations.py
+```python
+"""
+Retrieve xyz from a URL and calculate the energy using ANI
+"""
+
+import glob
+import os 
+import time
+
+import psi4
+import torch
+import torchani
+
+import numpy as np
+import matplotlib.pyplot as plt
+
+import tabulate
+
+
+device = torch.device("cpu")
+model = torchani.models.ANI2x(periodic_table_index=True).to(device)
+
+def elements_to_atomic_numbers(elements):
+    """
+    Convert element symbols to atomic numbers using a predefined dictionary.
+
+    Parameters
+    ----------
+    elements : list of str
+        A list of element symbols.
+
+    Returns
+    -------
+    list of int
+        A list of atomic numbers corresponding to the element symbols.
+    """
+
+    conversion_dict =  {
+    "H": 1,     # Hydrogen
+    "C": 6,    # Carbon
+    "N": 7,    # Nitrogen
+    "O": 8,    # Oxygen
+    "F": 9,    # Fluorine
+    "Cl": 17,   # Chlorine
+    "S": 16    # Sulfur
+    }
+
+    return [ conversion_dict[element] for element in elements ]
+
+def calculate_ANI_energy(xyz_file):
+    """
+    Calculate the ANI2x energy for system of atoms.
+
+    Parameters
+    ----------
+    elements:
+        An array or list containing the elements of the atoms in the system.
+    coordinates:
+        An array containing the 3D coordinates of the atoms.
+    
+    Returns
+    -------
+    tuple(float, float)
+        The energy of the system and the time taken to calculate the energy.
+    """
+
+    coordinates = np.loadtxt(xyz_file, skiprows=2, usecols=(1, 2, 3))
+    elements = np.loadtxt(xyz_file, skiprows=2, usecols=0, dtype=str)
+
+    coords_reshape = coordinates.reshape(1, -1, 3)
+    atomic_numbers = elements_to_atomic_numbers(elements)
+
+    torch_coords = torch.tensor(coords_reshape, requires_grad=True, device=device).float()
+    atomic_numbers_torch = torch.tensor([atomic_numbers], device=device)
+
+    # Calculate the energy
+    start = time.time() 
+    energy = model((atomic_numbers_torch, torch_coords)).energies
+    end = time.time()   
+
+    # Return the energy
+    return energy.item(), end - start   
+
+def calculate_Psi4_energy(xyz_file, name=None):
+    """
+    Calculate the energy of a system using Psi4 and the wb97x/6-31g* level of theory.
+
+    Parameters
+    ----------
+    xyz_file:
+        The path to the xyz file.
+    name:   
+        The name of the output file. If not provided, the name of the xyz file will be used.
+
+    Returns
+    -------
+    tuple(float, float)
+        The energy of the system as calculated by Psi4 using the wb97x/6-31g* level of theory.    
+    """
+
+    with open(xyz_file, 'r') as file:
+        xyz_data = file.read()
+
+    if not name:
+        name = os.path.basename(xyz_file).split(".")[0]
+
+    # set the geometry for Psi4
+    psi4.geometry(xyz_data)
+
+    # set output file for Psi4
+    psi4.set_output_file(F'{name}.dat', False)
+
+    # calculate the energy
+    start = time.time()
+    psi4_energy = psi4.energy("wb97x/6-31g*")
+    end = time.time()   
+
+    return psi4_energy, end - start
+
+
+if __name__ == "__main__":
+
+    xyz_files = glob.glob("scaling/*.xyz")
+
+    energies = []
+    times = []
+    differences = []   
+
+    for xyz in xyz_files:
+        ani_energy, ani_time = calculate_ANI_energy(xyz)
+        psi4_energy, psi4_time = calculate_Psi4_energy(xyz)
+
+        n_water = int(os.path.basename(xyz).split(".")[0].split("_")[1])
+
+        energies.append((n_water, ani_energy, psi4_energy))
+        times.append((n_water, ani_time, psi4_time))
+        differences.append((n_water, ani_energy - psi4_energy))
+
+    # Sort the results
+    energies = sorted(energies)
+    times = sorted(times)
+    differences = sorted(differences)
+
+    # Print energies
+    energy_headers = ["Number of Water Molecules", "ANI Energy (Ha)", "Psi4 Energy (Ha)"]
+    print(tabulate.tabulate(energies, headers=energy_headers, tablefmt="grid"))
+
+    # Print energy differences
+    energy_diff_headers = ["Number of Water Molecules", "Energy Difference (Ha)"]
+    print(tabulate.tabulate(differences, headers=energy_diff_headers, tablefmt="grid"))
+
+    # Print times
+    time_headers = ["Number of Water Molecules", "ANI Time (s)", "Psi4 Time (s)"]
+    print(tabulate.tabulate(times, headers=time_headers, tablefmt="grid"))
+
+    # Write the energies to a file
+    with open("benchmark_energies.txt", "w") as f:
+        f.write("Number of water molecules, ANI Energy (Ha), Psi4 Energy (Ha)\n")
+        for energy in energies:
+            f.write(f"{energy[0]}, {energy[1]}, {energy[2]}\n")
+
+    # Write the timing information to a file
+    with open("benchmark_times.txt", "w") as f:
+        f.write("Number of water molecules, ANI Time (s), Psi4 Time (s)\n")
+        for time in times:
+            f.write(f"{time[0]}, {time[1]}, {time[2]}\n")
+
+    # Create a plot of the energy differences
+    plt.figure()
+    plt.plot([x[0] for x in differences], [x[1] for x in differences], 'o-', label='Energy Difference (ANI - Psi4)')
+    plt.xlabel("Number of Water Molecules")
+    plt.ylabel("Energy Difference (Ha)")
+    plt.legend()
+
+    plt.savefig("energy_differences.png")
+
+    # Create a plot of the timings
+    plt.figure()
+    plt.plot([x[0] for x in times], [x[1] for x in times], 'o-', label='ANI Time')
+    plt.plot([x[0] for x in times], [x[2] for x in times], 's-', label='Psi4 Time')
+    plt.xlabel("Number of Water Molecules")
+    plt.ylabel("Time (s)")
+    plt.legend()
+
+    plt.savefig("timing.png") 
+```
+````
+:::
+
+Run this script by opening the MDI Mechanic container in interactive mode.
+
+:::{tab-set}
+
+````{tab-item} ANI Energy Calculation
+```bash
+mdi interactive
+```
+````
+:::
+
+Next, `cd` to the `xyz` directory and run the script.
+
+:::{tab-set}
+
+````{tab-item} ANI Energy Calculation
+```bash
+cd xyz
+python energy_calculations.py
+```
+````
+:::
+
+While the script is running, you can examine the code to see how the energy calculations are done using ANI and Psi4.
+
+## Exercises
+
+### Exercise 1: Quantifying Scaling Behavior
+Using the timings we obtained from the energy calculation (written to `benchmark_times.txt`), calculate the scaling behavior of ANI and Psi4.
+The scaling behavior, also called the computational complexity, of an algorithm is the relationship between the size of the input and the time it takes to run the algorithm. 
+This is also sometimes called "big O" notation, with the "O" standing for "order of magnitude".
+
+To calculate the computational complexity, we can fit a curve to the timings and determine the order of the curve.
+
+```{math}
+T(N) = aN^b
+```
+
+Where {math}`T(N)` is the time taken to calculate the energy of a system with {math}`N` water molecules, {math}`a` is a constant, and {math}`b` is the order of the curve. The {math}`b` value is the scaling behavior of the algorithm.
+
+When performing this fit, consider that you can linearize the equation by taking the logarithm of both sides:
+
+```{math}
+\log(T(N)) = \log(a) + b \log(N)
+```
+
+For performing this task, you might consider the [Numpy Polyfit function](https://numpy.org/doc/stable/reference/generated/numpy.polyfit.html), the [Scipy Curve Fit function](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html), or another fitting method of your choice.
+
+**Hint** - You should not be re-running the energy calculations to complete this exercise. You can use the timings written to `benchmark_times.txt`.
+
+**Hint 2** - Depending on you you choose to complete the problem, you may need to install additional Python packages. You can do this in `mdimechanic interactive` mode using `pip install`, or by modifying `mdimechanic.yaml` to include the packages you need.
+
+### Exercise 2: Water Potential Energy Surface
+Using the starting configuration for the water dimer (`xyz/water_dimer.xyz`), calculate the potential energy surface of the water dimer using ANI and Psi4. You should move one of the water molecules closer to or further from the other water molecule and calculate the energy at each step. Use the oxygen-oxygen distance as the reaction coordinate.
+
+You can use this z-matrix template for the water dimer:
+
+```
+water_dimer = """
+O1
+H2 1 1.0
+H3 1 1.0 2 104.52
+x4 2 1.0 1 90.0 3 180.0
+--
+O5 2 **R** 4 90.0 1 180.0
+H6 5 1.0 2 120.0 4 90.0
+H7 5 1.0 2 120.0 4 -90.0
+"""
+```
+
+Where `**R**` is the oxygen-oxygen distance.
+This can be made into a Psi4 geometry object by replacing `**R**` with the desired separation distance.
+
+You can use this as starting code:
+
+```python
+rvals = [1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 3.0, 3.5]
+energies = []
+for r in rvals:
+    # Build a new molecule at each separation
+    mol = psi4.geometry(water_dimer.replace('**R**', str(r)))
+```
+
+A Psi4 molecule can be saved to an xyz file using `mol.save_xyz_file("water_dimer_2.xyz", True)`. 
+You can then use this xyz file to calculate the energy using ANI.
+
+## Exercise 3: Evaluating the Accuracy of ANI for Benzene
+The directory `xyz/benzene` contains a number of XYZ files with different configurations of benzene.
+Write a script to calculate the energy of each benzene configuration using ANI and Psi4. 
+Does ANI provide the correct relative energies for the different benzene configurations?
+
+
+
+
+
diff --git a/index.rst b/index.rst
@@ -31,7 +31,8 @@ This tutorial will walk you through creating an MDI Driver that uses [TorchANI](
    :titlesonly:
 
    setup
-   torchani_tutorial
+   mdimechanic
+   energy_calculations
    driver_tutorial
 
 
diff --git a/setup.md b/setup.md
@@ -1 +1,44 @@
-# Set Up
+# Set Up
+
+## GitHub Codespaces (Recommended)
+We recommend using the [GitHub Codespaces](https://codespaces.new/MolSSI-MDI/mdi-ani-workshop) we have prepared for this workshop.
+GitHub Codespaces are cloud-based development environments that are created directly from a GitHub repository.
+This Codespace will have all of the necessary software installed and configured for you to complete the tutorial.
+
+To use GitHub Codespaces, you must first [sign up for a GitHub account](https://github.com/signup) if you do not have one. 
+Then you can click the button below to open the code space.
+
+[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/MolSSI-MDI/mdi-ani-workshop)
+
+## Installing the Software Locally
+You may also choose to install software and run the tutorial locally. 
+In order to do so, you will need to have [Docker](https://docs.docker.com/get-docker/) and Python installed on your machine.
+We recommend using the [conda package manager](https://docs.conda.io/en/latest/miniconda.html) for Python installation and environment management. 
+If you'd like to know more detailed instructions as well as MolSSI's recommendations for computer set-up, see [these instructions](https://education.molssi.org/python-package-best-practices/setup.html) from our Best Practices workshop.
+
+After ensuring you have the necessary software installed, create an environment for this project then install mdimechanic using pip:
+
+:::{tab-set}
+
+````{tab-item} bash
+
+```bash
+conda create -n mdi-ani-workshop python=3.11
+pip install mdimechanic
+```
+
+````
+:::
+
+To complete this tutorial, clone the [starting repository](https://github.com/MolSSI-MDI/mdi-ani-workshop.git).
+
+````{tab-set} 
+```{tab-item} shell
+
+:::{code-block} shell  
+git clone https://github.com/MolSSI-MDI/mdi-ani-workshop.git
+:::
+
+```
+````
+
diff --git a/torchani_tutorial.md b/torchani_tutorial.md