up docs

m3g · Apr 1, 2022 · d903a75 · d903a75
1 parent 95b8153
commit d903a75
Showing 1 changed file with 145 additions and 11 deletions.
diff --git a/docs/src/advanced.md b/docs/src/advanced.md
@@ -1,21 +1,155 @@
 # Advanced usage
 
-### Advanced, non-allocating call
+## Array preallocation
 
-If this function will be called from within a loop where the coordinates of the molecules involved change, a faster and non-allocating call can be used, by exposing the interface of `CellListMap`. For example:
+Thea advanced usage of this package exposes the interface of `CellListMap`, such that it is possible to iterate through sets varying coordinates and perhaps box types without need for reallocating lists of neighbors every time.  
 
+Basically, preallocation is needed for:
+
+- The cell lists.
+- The resulting lists of minimum-distances. 
+
+And for parallel runs (probably the most common ones):
+
+- Auxiliary arrays for storing threaded versions of the cell lists.
+- Auxiliary arrays for storing threaded versions of the resulting lists of minimum-distances. 
+
+## Index of molecules
+
+Additionally, the low level interface allows the definition of more general groups of particles, in the sense that "molecule" can have different number of atoms in the same set. Therefore, one needs to provide *a function* that returns the index of the molecule of each atom, given the index of the atom. 
+
+Briefly, if a set of atoms belong to molecules of the same number of atoms, one can compute the index of each molecule using
+```julia
+mol_index(i,n) = div((i - 1), n) + 1
+```
+where `i` is the atom index in the array of coordinates, and `n` is the number of atoms per molecule. This is the default assumed in the basic interface, and can be called with:
 ```julia-repl
-julia> function iterate_lists(nsteps, list, water, protein, box, cl, aux)
-           inds(i) = mol_index(i,3)
-           for i in 1:nsteps
-               # water and protein coordinates, and box, could change here
-               cl = UpdateCellList!(water, protein, box, cl, aux)
-               minimum_distances!(inds, list, box, cl)
-               # list was updated
-           end
-        end
+julia> using StaticArrays
+
+julia> x = rand(SVector{3,Float64},9); # 3 water molecules
+
+julia> mol_index(2,3) # second atom belongs to first molecule
+1
+
+julia> mol_index(4,3) # fourth atom belongs to second molecule
+2
+```
+
+Typically, as we will show, this function will be used for setting up molecule indexes.
+
+However, more general indexing can be used. For instance, let us suppose that the 9 atoms of the `x` array of coordinates above belong to `2` molecules, with `4` and `5` indexes each. Then, we could define, for example:
+
+```julia-repl
+julia> my_mol_index(i) = i <= 4 ? 1 : 2
+my_mol_index (generic function with 1 method)
+
+julia> my_mol_index(4)
+1
+
+julia> my_mol_index(5)
+2
+```
+
+Since the function can close-over an array of molecular indexes, the definition can be completely general, that is:
+
+```julia-repl
+julia> molecular_indexes = [ 1, 3, 3, 2, 2, 1, 3, 1, 2 ];
+
+julia> my_mol_index(i) = molecular_indexes[i]
+my_mol_index (generic function with 1 method)
+
+julia> my_mol_index(1)
+1
 
+julia> my_mol_index(5)
+2
+```
+
+In summary, this function that given the index of the atom returns the index of the corresponding molecule must be provided in the advanced interface, and typically will be just a closure around the number of atoms per molecule, using the already available `mol_index` function. 
+
+### Example
+
+The example below illustrates the usage of the interface described above: 
+
+```julia
+using MolecularMinimumDistances
+using CellListMap # required
+using PDBTools # example
+
+function iterate_lists(nsteps, water, protein)
+    # Load example coordinates
+    # Build initial system and cell lists
+    box = Box([84.48, 84.48, 84.48], 12.)
+    cl = CellList(protein,water,box)
+    # Function that given the index of the atom, returns the index of the molecule. Here, all molecules are similar, and we use the standard `mol_index` function:
+    water_index(i) = mol_index(i,3)
+    # Initialize output array; i -> mol_index(i,3) returns the molecule index for each water molecule
+    list = init_list(water, water_index)
+    # Initalize auxiliary arrays for multi-threading: The number of batches of multi-threading is obtained from a property of the cell-lists (see the CellListMap documentation for additional information and options).
+    aux_cl = CellListMap.AuxThreaded(cl)
+    list_threaded = [ copy(list) for _ in 1:nbatches(cl) ]
+    # Now we are ready to iterate over many possible different sets of coordinates. Here the coordinates or the box won't change, but they could:
+    for i in 1:nsteps
+        # box = Box(...) the box could be redefined here.
+        # Update the cell lists: water and protein coordinates, and box, could change here. 
+        cl = UpdateCellList!(water, protein, box, cl, aux_cl)
+        # Update the `list` array:
+        minimum_distances!( water_index, list, box, cl, list_threaded = list_threaded) 
+        # Perform whatever futher analysis using the `list` of minimum distances.
+    end
+end
+
+# This function is the driver fro the above example:
+function run(nsteps)
+    system = MolecularMinimumDistances.download_example()
+    water = coor(system, "water") # water coordinates
+    protein = coor(system, "protein") # protein coordinates
+    iterate_lists(nsteps, water, protein)
+end
+```
+
+Running the above example shows that allocations are small for each iteration:
+```julia-repl
+julia> using BenchmarkTools
+
+julia> system = MolecularMinimumDistances.download_example();
+
+julia> water = coor(system, "water"); # water coordinates
+
+julia> protein = coor(system, "protein"); # protein coordinates
+
+julia> @btime iterate_lists(10, $water, $protein)
+  1.113 s (21471 allocations: 36.42 MiB)
+
+julia> @btime iterate_lists(100, $water, $protein)
+  13.568 s (37946 allocations: 38.36 MiB)
+```
+
+In fact, if we opted to run the calculation in serial, with:
+
+```julia
+function iterate_lists_serial(nsteps, water, protein)
+    box = Box([84.48, 84.48, 84.48], 12.)
+    cl = CellList(protein,water,box, parallel=false)
+    water_index(i) = mol_index(i,3)
+    list = init_list(water, water_index)
+    for i in 1:nsteps
+        cl = UpdateCellList!(water, protein, box, cl, parallel=false)
+        minimum_distances!( water_index, list, box, cl, parallel=false) 
+        # Perform whatever futher analysis using the `list` of minimum distances.
+    end
+end
+```
+
+We can see that the updating of the cell lists and the computation of the minimum-distance lists are completely allocation free, such that the loop is allocation free:
+
+```
+julia> @btime iterate_lists_serial(10, $water, $protein)
+  5.038 s (1925 allocations: 14.43 MiB)
 
+julia> @btime iterate_lists_serial(20, $water, $protein)
+  11.613 s (1955 allocations: 14.43 MiB)
 ```
 
+Thus, using a parallelization scheme at a upper level can be also an alternative.