Merge remote-tracking branch 'origin/develop' into pgrete/pmd-output
pgrete committed Sep 2, 2024
2 parents 6811594 + ec61c9c commit c0d7f11
Showing 51 changed files with 1,276 additions and 669 deletions.
29 changes: 29 additions & 0 deletions CHANGELOG.md
@@ -2,7 +2,30 @@

## Current develop

### Changed (changing behavior/API/variables/...)


### Fixed (not changing behavior/API/variables/...)


### Infrastructure (changes irrelevant to downstream codes)


### Removed (removing behavior/API/variables/...)


### Incompatibilities (i.e. breaking changes)


## Release 24.08
Date: 2024-08-30

### Added (new features/APIs/variables/...)
- [[PR 1151]](https://github.com/parthenon-hpc-lab/parthenon/pull/1151) Add time offset `c` to LowStorageIntegrator
- [[PR 1147]](https://github.com/parthenon-hpc-lab/parthenon/pull/1147) Add `par_reduce_inner` functions
- [[PR 1159]](https://github.com/parthenon-hpc-lab/parthenon/pull/1159) Add additional timestep controllers in parthenon/time.
- [[PR 1148]](https://github.com/parthenon-hpc-lab/parthenon/pull/1148) Add `GetPackDimension` to `StateDescriptor` for calculating pack sizes before `Mesh` initialization
- [[PR 1143]](https://github.com/parthenon-hpc-lab/parthenon/pull/1143) Add tensor indices to VariableState, add radiation constant to constants, add TypeLists, allow for arbitrary containers for solvers
- [[PR 1140]](https://github.com/parthenon-hpc-lab/parthenon/pull/1140) Allow for relative convergence tolerance in BiCGSTAB solver.
- [[PR 1047]](https://github.com/parthenon-hpc-lab/parthenon/pull/1047) General three- and four-valent 2D forests w/ arbitrary orientations.
- [[PR 1130]](https://github.com/parthenon-hpc-lab/parthenon/pull/1130) Enable `parthenon::par_reduce` for MD loops with Kokkos 1D Range
@@ -28,12 +51,17 @@
- [[PR 1019]](https://github.com/parthenon-hpc-lab/parthenon/pull/1019) Enable output for non-cell-centered variables

### Changed (changing behavior/API/variables/...)
- [[PR 1153]](https://github.com/parthenon-hpc-lab/parthenon/pull/1153) Allow base grid with fewer blocks than ranks before initial AMR
- [[PR 1105]](https://github.com/parthenon-hpc-lab/parthenon/pull/1105) Refactor parameter input for linear solvers
- [[PR 1078]](https://github.com/parthenon-hpc-lab/parthenon/pull/1078) Add reduction fallback in 1D. Add IndexRange overload for 1D par loops
- [[PR 1024]](https://github.com/parthenon-hpc-lab/parthenon/pull/1024) Add .outN. to history output filenames
- [[PR 1004]](https://github.com/parthenon-hpc-lab/parthenon/pull/1004) Allow parameter modification from an input file for restarts

### Fixed (not changing behavior/API/variables/...)
- [[PR 1145]](https://github.com/parthenon-hpc-lab/parthenon/pull/1145) Fix remaining swarm D->H->D copies
- [[PR 1150]](https://github.com/parthenon-hpc-lab/parthenon/pull/1150) Reduce memory consumption for buffer pool
- [[PR 1146]](https://github.com/parthenon-hpc-lab/parthenon/pull/1146) Fix an issue outputting >4GB single variables per rank
- [[PR 1152]](https://github.com/parthenon-hpc-lab/parthenon/pull/1152) Fix memory leak in task graph outputs related to `abi::__cxa_demangle`
- [[PR 1146]](https://github.com/parthenon-hpc-lab/parthenon/pull/1146) Fix an issue outputting >4GB single variables per rank
- [[PR 1144]](https://github.com/parthenon-hpc-lab/parthenon/pull/1144) Fix some restarts w/non-CC fields
- [[PR 1132]](https://github.com/parthenon-hpc-lab/parthenon/pull/1132) Fix regional dependencies for iterative task lists and make solvers work for arbitrary MeshData partitioning
@@ -82,6 +110,7 @@
- [[PR 1108]](https://github.com/parthenon-hpc-lab/parthenon/pull/1108) Remove NaN payload tags infrastructure

### Incompatibilities (i.e. breaking changes)
- [[PR 1135]](https://github.com/parthenon-hpc-lab/parthenon/pull/1135) Drivers now correctly return DriverStatus::timeout on hitting walltime limit
- [[PR 1128]](https://github.com/parthenon-hpc-lab/parthenon/pull/1128) Add cycle and nbtotal to hst
- [[PR 1108]](https://github.com/parthenon-hpc-lab/parthenon/pull/1108) Remove NaN payload tags infrastructure
- [[PR 1026]](https://github.com/parthenon-hpc-lab/parthenon/pull/1026) Particle BCs without relocatable device code
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -20,7 +20,7 @@ cmake_minimum_required(VERSION 3.16)
# Imports machine-specific configuration
include(cmake/MachineCfg.cmake)

project(parthenon VERSION 24.03 LANGUAGES C CXX)
project(parthenon VERSION 24.08 LANGUAGES C CXX)

if (${CMAKE_VERSION} VERSION_GREATER_EQUAL 3.19.0)
cmake_policy(SET CMP0110 NEW)
6 changes: 0 additions & 6 deletions doc/sphinx/src/concepts_lite.rst

This file was deleted.

4 changes: 3 additions & 1 deletion doc/sphinx/src/driver.rst
@@ -32,7 +32,9 @@ The ``EvolutionDriver`` class derives from ``Driver``, defining the
loop, including periodic outputs. It has a single pure virtual member
function called ``Step`` which a derived class must define and which
will be called during each pass of the loop above.
will be called during each pass of the loop above. The
``SetGlobalTimeStep`` and ``OutputCycleDiagnostics`` functions have
default implementations, but can be overridden for flexibility.

MultiStageDriver
----------------
25 changes: 18 additions & 7 deletions doc/sphinx/src/inputs.rst
@@ -21,8 +21,8 @@ General parthenon options such as problem name and parameter handling.
+---------------------+---------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Default | Type | Description |
+=====================+=========+=========+========================================================================================================================================================================================================+
|| name || none || string || Name of this problem or initialization, prefixed to output files. |
|| archive_parameters || false || string || Produce a parameter file containing all parameters known to Parthenon. Set to `true` for an output file named `parthinput.archive`. Set to `timestamp` for a file with a name containing a timestamp. |
|| name || none || string || Name of this problem or initialization, prefixed to output files. |
+---------------------+---------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+


@@ -34,12 +34,23 @@ Options related to time-stepping and printing of diagnostic data.
+------------------------------+---------+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Default | Type | Description |
+==============================+=========+========+=======================================================================================================================================================================+
|| tlim || none || float || Stop criterion on simulation time. |
|| nlim || -1 || int || Stop criterion on total number of steps taken. Ignored if < 0. |
|| perf_cycle_offset || 0 || int || Skip the first N cycles when calculating the final performance (e.g., zone-cycles/wall_second). This hides the initialization overhead in Parthenon. |
|| dt_ceil || none || Real || The maximum allowed timestep. |
|| dt_factor || 2.0 || Real || The maximum allowed relative increase of the timestep over the previous value. |
|| dt_floor || none || Real || The minimum allowed timestep. |
|| dt_force || none || Real || Force the timestep to this value, ignoring all other conditions. |
|| dt_init || none || Real || The maximum allowed timestep during the first cycle. |
|| dt_init_force || none || bool || If set to true, force the first cycle's timestep to the value given by dt_init. |
|| dt_min || none || Real || If the timestep falls below dt_min for dt_min_cycle_limit cycles, Parthenon fatals. |
|| dt_min_cycle_limit || 10 || int || The maximum number of cycles the timestep can be below dt_min. |
|| dt_max || none || Real || If the timestep rises above dt_max for dt_max_cycle_limit cycles, Parthenon fatals. |
|| dt_max_cycle_limit || 1 || int || The maximum number of cycles the timestep can be above dt_max. |
|| dt_user || none || Real || Set a global timestep limit. |
|| ncrecv_bdry_buf_timeout_sec || -1.0 || Real || Timeout in seconds for the `ReceiveBoundaryBuffers` tasks. Disabled (negative) by default. Typically not needed in production runs. Useful for debugging MPI calls. |
|| ncycle_out || 1 || int || Number of cycles between short diagnostic output to standard out containing, e.g., current time, dt, zone-update/wsec. Default: 1 (i.e., every cycle). |
|| ncycle_out_mesh || 0 || int || Number of cycles between printing the mesh structure to standard out. Use a negative number to also print every time the mesh was modified. Default: 0 (i.e., off). |
|| ncrecv_bdry_buf_timeout_sec || -1.0 || Real || Timeout in seconds for the `ReceiveBoundaryBuffers` tasks. Disabled (negative) by default. Typically not needed in production runs. Useful for debugging MPI calls. |
|| nlim || -1 || int || Stop criterion on total number of steps taken. Ignored if < 0. |
|| perf_cycle_offset || 0 || int || Skip the first N cycles when calculating the final performance (e.g., zone-cycles/wall_second). This hides the initialization overhead in Parthenon. |
|| tlim || none || Real || Stop criterion on simulation time. |
+------------------------------+---------+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
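
The interplay of the timestep options in the table can be sketched as follows. This is only an illustration: the struct, the function names, and above all the order in which the limits are applied are assumptions, not Parthenon's implementation. Options whose default is ``none`` are modelled as empty optionals.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// Hypothetical mirror of a subset of the table; not a Parthenon type.
struct TimeLimits {
  std::optional<double> dt_ceil, dt_floor, dt_force;
  double dt_factor = 2.0;  // default taken from the table
};

// One possible order of application (an assumption): dt_force wins outright,
// otherwise cap the growth, then apply the ceiling, then the floor.
inline double LimitTimestep(double dt_proposed, double dt_prev,
                            const TimeLimits &lim) {
  if (lim.dt_force) return *lim.dt_force;
  double dt = std::min(dt_proposed, lim.dt_factor * dt_prev);
  if (lim.dt_ceil) dt = std::min(dt, *lim.dt_ceil);
  if (lim.dt_floor) dt = std::max(dt, *lim.dt_floor);
  return dt;
}

// Hypothetical watchdog for dt_min / dt_min_cycle_limit: counts consecutive
// cycles with dt < dt_min and reports when the limit has been exceeded.
struct MinDtWatch {
  double dt_min;
  int cycle_limit = 10;  // default taken from the table
  int count = 0;

  bool Update(double dt) {
    count = (dt < dt_min) ? count + 1 : 0;
    return count > cycle_limit;  // true means "abort" in this sketch
  }
};
```

In this sketch a caller would run ``LimitTimestep`` once per cycle and pass the result to ``MinDtWatch::Update``; whether the real code counts cycles the same way is not stated in the table.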


@@ -64,9 +75,9 @@ See the :ref:`sparse impl` documentation for details.
+--------------------+---------+--------+----------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Default | Type | Description |
+====================+=========+========+==============================================================================================================================================+
|| enable_sparse || `true` || bool || If set to false, sparse variables will always be allocated, see also :ref:`sparse run-time` |
|| alloc_threshold || 1e-12 || float || Global (for all sparse variables) threshold to trigger allocation of a variable if cells in the receiving ghost cells are above this value. |
|| dealloc_threshold || 1e-14 || float || Global (for all sparse variables) threshold to trigger deallocation if all active cells of a variable in a block are below this value. |
|| dealloc_count || 5 || int || Only deallocate a sparse variable once the `dealloc_threshold` has been met in this number of consecutive cycles. |
|| dealloc_threshold || 1e-14 || float || Global (for all sparse variables) threshold to trigger deallocation if all active cells of a variable in a block are below this value. |
|| enable_sparse || `true` || bool || If set to false, sparse variables will always be allocated, see also :ref:`sparse run-time` |
+--------------------+---------+--------+----------------------------------------------------------------------------------------------------------------------------------------------+

20 changes: 14 additions & 6 deletions doc/sphinx/src/integrators.rst
@@ -32,15 +32,17 @@ described in `Ketchson (2010)`_. These integrators are of the classic
.. math::
u^{(0)} &= u^n \\
u^{(i)} &= \sum_{k=0}^{i-1} (\alpha_{i,k} u^{(k)} + \Delta t \beta_{i, k} F(u^{(k)})\\
u^{(i)} &= \sum_{k=0}^{i-1} (\alpha_{i,k} u^{(k)} + \Delta t \beta_{i, k} F(t^n+c_k \Delta t, u^{(k)}))\\
u^{n+1} &= u^{(m)}
where superscripts in parentheses mean subcycles in a Runge-Kutta
integration and :math:`F` is the right-hand-side of ODE system. The
integration and :math:`F` is the right-hand-side of ODE system. Note
that the time dependence of :math:`F` is explicitly included in the above
formulation. The
difference between these low-storage methods and the classic SSPK
methods is that the low-storage methods typically have sparse
:math:`\alpha` and :math:`\beta` matrices, which are replaced by
diagonal termes, named :math:`\gamma_0` and :math:`\gamma_1`
diagonal terms, named :math:`\gamma_0` and :math:`\gamma_1`
respectively.

These methods can be generalized to support more general methods with
@@ -52,11 +54,17 @@ The full update then takes the form:
.. math::
u^{(1)} &:= u^{(1)} + \delta_s u^{(0)} \\
u^{(0)} &:= \gamma_{s0} u^{(0)} + \gamma_{s1} u^{(1)} + \beta_{s,s-1} \Delta t F(u^{(0)})
u^{(0)} &:= \gamma_{s0} u^{(0)} + \gamma_{s1} u^{(1)} + \beta_{s,s-1} \Delta t F(t^n+c_s\Delta t, u^{(0)})
where here :math:`u^{(0)}` and :math:`u^{(1)}` are the two storage
buffers required to compute the update for a given Runge-Kutta stage
:math:`s`.
:math:`s`. While the :math:`\delta`, :math:`\beta`, :math:`\gamma_0` and :math:`\gamma_1`
associated with a particular scheme are published in the literature, :math:`c` is not.
Instead, :math:`c` is computed following the procedure outlined in
`Ketchson (2010)`_ for obtaining the Butcher coefficients from their low-storage
counterparts.
A Mathematica notebook to calculate :math:`c` is provided
`here <https://github.com/parthenon-hpc-lab/parthenon/blob/develop/scripts/mathematica/sparse_integrators.nb>`__.
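
The two-register update with an explicit time offset can be sketched for a scalar ODE as below. The container and function names are hypothetical and do not reproduce the ``LowStorageIntegrator`` interface. The coefficients are those of Heun's method written in two-register form, for which the time offsets are 0 and 1.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical coefficient container; field names follow the arrays named in
// the text (delta, beta, gam0, gam1, c).
template <std::size_t NStages>
struct LowStorageCoeffs {
  std::array<double, NStages> delta, beta, gam0, gam1, c;
};

// Two-stage, second-order scheme (Heun's method) in two-register form.
inline LowStorageCoeffs<2> MakeRK2() {
  LowStorageCoeffs<2> k;
  k.delta = {1.0, 0.0};
  k.beta = {1.0, 0.5};
  k.gam0 = {0.0, 0.5};
  k.gam1 = {1.0, 0.5};
  k.c = {0.0, 1.0};
  return k;
}

// Advance a scalar u from t to t + dt, where F(t, u) is the right-hand side.
template <std::size_t NStages, class RHS>
double Step(const LowStorageCoeffs<NStages> &k, RHS &&F, double t, double dt,
            double u) {
  double u0 = u;    // register holding the current stage state
  double u1 = 0.0;  // auxiliary register, zero at the start of a step
  for (std::size_t s = 0; s < NStages; ++s) {
    u1 += k.delta[s] * u0;
    // Evaluate the right-hand side at the offset time before overwriting u0.
    const double f = F(t + k.c[s] * dt, u0);
    u0 = k.gam0[s] * u0 + k.gam1[s] * u1 + k.beta[s] * dt * f;
  }
  return u0;
}
```

For the purely time-dependent right-hand side ``F(t, u) = t`` with ``t = 0``, ``dt = 1`` and ``u = 0``, this step returns 0.5, which matches the exact integral. Evaluating ``F`` at ``t`` in both stages would return 0 instead, which is why the offset matters for time-dependent sources.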

.. _Ketchson (2010): https://doi.org/10.1016/j.jcp.2009.11.006

@@ -65,7 +73,7 @@ buffers required to compute the update for a given Runge-Kutta stage
.. _Athena++ paper: https://doi.org/10.3847/1538-4365/ab929b

The ``LowStorageIntegrator`` contains arrays for ``delta``, ``beta``,
``gam0``, and ``gam1``. Available integration methods are:
``gam0``, ``gam1``, and ``c``. Available integration methods are:

* ``RK1``, which is simply forward Euler.

48 changes: 48 additions & 0 deletions doc/sphinx/src/utilities.rst
@@ -0,0 +1,48 @@
``TypeList``s
=============
Provides a wrapper class around a variadic pack of types to simplify
performing compile time operations on the pack. There are templated
types defined giving the type at a particular index in the pack, types
for extracting sub-``TypeList``s of the original type list, and ``constexpr``
functions for getting the index of the first instance of a type in the
pack. Additionally it provides a capability for iterating an ``auto`` lambda
over the type list, which can be useful for calling a ``static`` function
that each type in the list defines. *In the future, it
would be nice to have the capability to make a unique type list from
another type list (i.e. one that contains only a single instance of each type
in the original type list).*

``TypeList``s have many applications and are commonly found in many
codebases, but in Parthenon one of the main use cases is for storing
lists of types associated with field variables that are used in type
based ``SparsePack``s.
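
The operations described above (type at an index, index of the first instance of a type, iterating a lambda over the list) can be sketched in a few lines of C++17. All names here are hypothetical and chosen for illustration; they are not Parthenon's ``TypeList`` interface, and the ``density`` and ``velocity`` types merely stand in for field variable types.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <type_traits>
#include <vector>

// Hypothetical minimal type list.
template <class... Ts>
struct TypeList {
  static constexpr std::size_t n_types = sizeof...(Ts);
  // Type stored at position I in the pack.
  template <std::size_t I>
  using type = std::tuple_element_t<I, std::tuple<Ts...>>;
};

// Index of the first occurrence of T in the list, or the list size if absent.
template <class T, class... Ts>
constexpr std::size_t IndexOf(TypeList<Ts...>) {
  // Trailing false keeps the array non-empty for an empty list.
  constexpr bool matches[] = {std::is_same_v<T, Ts>..., false};
  for (std::size_t i = 0; i < sizeof...(Ts); ++i) {
    if (matches[i]) return i;
  }
  return sizeof...(Ts);
}

// Empty tag used to pass a type to a generic lambda as a value.
template <class T>
struct TypeTag {
  using type = T;
};

// Call f once per type in the list, in order.
template <class... Ts, class F>
void ForEachType(TypeList<Ts...>, F &&f) {
  (f(TypeTag<Ts>{}), ...);
}

// Illustrative field types, each defining a static function.
struct density {
  static std::string name() { return "density"; }
};
struct velocity {
  static std::string name() { return "velocity"; }
};

// Collect the result of the static function for every type in the list.
template <class... Ts>
std::vector<std::string> CollectNames(TypeList<Ts...> list) {
  std::vector<std::string> names;
  ForEachType(list, [&names](auto tag) {
    using T = typename decltype(tag)::type;
    names.push_back(T::name());
  });
  return names;
}
```

Passing a ``TypeTag`` rather than an instance of each type means the types in the list never need to be constructed, which suits tag-like variable types.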
Robust
======
Provides a number of functions for doing operations on floating point
numbers that are bounded, avoid division by zero, etc.
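
As a rough illustration of the kind of helper meant here, the sketch below shows a division that cannot hit a zero denominator and a simple clamp. The names and the choice of offset are assumptions made for this example and are not taken from Parthenon's source.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Small positive number used to nudge denominators away from zero.
template <class T>
constexpr T Tiny() {
  return 10 * std::numeric_limits<T>::min();
}

// +1 for non-negative input, -1 otherwise; never returns 0.
template <class T>
constexpr T Sign(T x) {
  return x >= T(0) ? T(1) : T(-1);
}

// a / b with the denominator shifted away from zero in the direction of its
// sign, so the result stays finite for finite a.
template <class T>
constexpr T SafeRatio(T a, T b) {
  return a / (b + Sign(b) * Tiny<T>());
}

// Clamp x into the closed interval [lo, hi].
template <class T>
constexpr T Bound(T x, T lo, T hi) {
  return x < lo ? lo : (x > hi ? hi : x);
}
```

For denominators of ordinary magnitude the shift is far below rounding precision, so ``SafeRatio(6.0, 3.0)`` still evaluates to exactly 2.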
C++11 Style Concepts Implementation
===================================
*This documentation needs to be written (see issue #695), but there are
extensive comments in src/utils/concepts_lite.hpp and examples of
usage in tst/unit/test_concepts_lite.hpp*
``Indexer``
===========

Provides functionality for iterating over an arbitrary dimensional
hyper-rectangular index space using a flattened loop. Specific
instantiations, e.g. ``Indexer5D``, are provided up to eight
dimensions. Usage:

.. code:: cpp

   Indexer4D idxer({0, 3}, {1, 2}, {0, 5}, {10, 16});
   for (int flat_idx = 0; flat_idx < idxer.size(); ++flat_idx) {
     auto [i, j, k, l] = idxer(flat_idx);
     // Do stuff in the 4D index space...
   }
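
To make the flattening concrete, here is a self-contained stand-in for such an indexer. ``FlatIndexer`` is a hypothetical name, and two conventions are assumed rather than taken from the source: bounds are inclusive, and the last index varies fastest.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical N-dimensional indexer over the box [lower, upper] (inclusive).
template <std::size_t N>
class FlatIndexer {
 public:
  FlatIndexer(const std::array<int, N> &lower, const std::array<int, N> &upper)
      : start_(lower) {
    for (std::size_t d = 0; d < N; ++d) {
      extent_[d] = upper[d] - lower[d] + 1;
    }
  }

  // Total number of points in the index space.
  int size() const {
    int n = 1;
    for (int e : extent_) n *= e;
    return n;
  }

  // Map a flat index back to N indices; the last dimension varies fastest.
  std::array<int, N> operator()(int flat) const {
    std::array<int, N> idx{};
    for (std::size_t d = N; d-- > 0;) {
      idx[d] = start_[d] + flat % extent_[d];
      flat /= extent_[d];
    }
    return idx;
  }

 private:
  std::array<int, N> start_{};
  std::array<int, N> extent_{};
};
```

Because ``operator()`` returns a ``std::array``, the result can be unpacked with structured bindings in the same way as in the usage example, e.g. ``auto [i, j] = idxer(flat_idx);`` for ``FlatIndexer<2>``.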
6 changes: 2 additions & 4 deletions example/particles/parthinput.particles
@@ -24,10 +24,8 @@ refinement = none
nx1 = 16
x1min = -0.5
x1max = 0.5
ix1_bc = user
ox1_bc = user
# ix1_bc = periodic # Optionally use periodic boundary conditions everywhere
# ox1_bc = periodic
ix1_bc = periodic
ox1_bc = periodic

nx2 = 16
x2min = -0.5