Merge branch 'develop' into bprather/driver-overridable
pgrete authored Aug 26, 2024
2 parents 554a087 + 3a969ff commit 334a8b9
Showing 15 changed files with 273 additions and 124 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,7 @@
## Current develop

### Added (new features/APIs/variables/...)
- [[PR 1159]](https://github.com/parthenon-hpc-lab/parthenon/pull/1159) Add additional timestep controllers in parthenon/time.
- [[PR 1148]](https://github.com/parthenon-hpc-lab/parthenon/pull/1148) Add `GetPackDimension` to `StateDescriptor` for calculating pack sizes before `Mesh` initialization
- [[PR 1143]](https://github.com/parthenon-hpc-lab/parthenon/pull/1143) Add tensor indices to VariableState, add radiation constant to constants, add TypeLists, allow for arbitrary containers for solvers
- [[PR 1140]](https://github.com/parthenon-hpc-lab/parthenon/pull/1140) Allow for relative convergence tolerance in BiCGSTAB solver.
@@ -30,12 +31,15 @@
- [[PR 1019]](https://github.com/parthenon-hpc-lab/parthenon/pull/1019) Enable output for non-cell-centered variables

### Changed (changing behavior/API/variables/...)
- [[PR 1153]](https://github.com/parthenon-hpc-lab/parthenon/pull/1153) Allow base grid with fewer blocks than ranks before initial AMR
- [[PR 1105]](https://github.com/parthenon-hpc-lab/parthenon/pull/1105) Refactor parameter input for linear solvers
- [[PR 1078]](https://github.com/parthenon-hpc-lab/parthenon/pull/1078) Add reduction fallback in 1D. Add IndexRange overload for 1D par loops
- [[PR 1024]](https://github.com/parthenon-hpc-lab/parthenon/pull/1024) Add .outN. to history output filenames
- [[PR 1004]](https://github.com/parthenon-hpc-lab/parthenon/pull/1004) Allow parameter modification from an input file for restarts

### Fixed (not changing behavior/API/variables/...)
- [[PR 1150]](https://github.com/parthenon-hpc-lab/parthenon/pull/1150) Reduce memory consumption for buffer pool
- [[PR 1146]](https://github.com/parthenon-hpc-lab/parthenon/pull/1146) Fix an issue outputting >4GB single variables per rank
- [[PR 1152]](https://github.com/parthenon-hpc-lab/parthenon/pull/1152) Fix memory leak in task graph outputs related to `abi::__cxa_demangle`
- [[PR 1146]](https://github.com/parthenon-hpc-lab/parthenon/pull/1146) Fix an issue outputting >4GB single variables per rank
- [[PR 1144]](https://github.com/parthenon-hpc-lab/parthenon/pull/1144) Fix some restarts w/non-CC fields
25 changes: 18 additions & 7 deletions doc/sphinx/src/inputs.rst
@@ -21,8 +21,8 @@ General parthenon options such as problem name and parameter handling.
+---------------------+---------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Default | Type | Description |
+=====================+=========+=========+========================================================================================================================================================================================================+
|| name || none || string || Name of this problem or initialization, prefixed to output files. |
|| archive_parameters || false || string || Produce a parameter file containing all parameters known to Parthenon. Set to `true` for an output file named `parthinput.archive`. Set to `timestamp` for a file with a name containing a timestamp. |
|| name || none || string || Name of this problem or initialization, prefixed to output files. |
+---------------------+---------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+


@@ -34,12 +34,23 @@ Options related to time-stepping and printing of diagnostic data.
+------------------------------+---------+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Default | Type | Description |
+==============================+=========+========+=======================================================================================================================================================================+
|| tlim || none || float || Stop criterion on simulation time. |
|| nlim || -1 || int || Stop criterion on total number of steps taken. Ignored if < 0. |
|| perf_cycle_offset || 0 || int || Skip the first N cycles when calculating the final performance (e.g., zone-cycles/wall_second). Allows hiding the initialization overhead in Parthenon. |
|| dt_ceil || none || Real || The maximum allowed timestep. |
|| dt_factor || 2.0 || Real || The maximum allowed relative increase of the timestep over the previous value. |
|| dt_floor || none || Real || The minimum allowed timestep. |
|| dt_force || none || Real || Force the timestep to this value, ignoring all other conditions. |
|| dt_init || none || Real || The maximum allowed timestep during the first cycle. |
|| dt_init_force || none || bool || If set to true, force the first cycle's timestep to the value given by dt_init. |
|| dt_min || none || Real || If the timestep falls below dt_min for dt_min_cycle_limit cycles, Parthenon fatals. |
|| dt_min_cycle_limit || 10 || int || The maximum number of cycles the timestep can be below dt_min. |
|| dt_max || none || Real || If the timestep rises above dt_max for dt_max_cycle_limit cycles, Parthenon fatals. |
|| dt_max_cycle_limit || 1 || int || The maximum number of cycles the timestep can be above dt_max. |
|| dt_user || none || Real || Set a global timestep limit. |
|| ncrecv_bdry_buf_timeout_sec || -1.0 || Real || Timeout in seconds for the `ReceiveBoundaryBuffers` tasks. Disabled (negative) by default. Typically not needed in production runs. Useful for debugging MPI calls. |
|| ncycle_out || 1 || int || Number of cycles between short diagnostic output to standard out containing, e.g., current time, dt, zone-update/wsec. Default: 1 (i.e., every cycle). |
|| ncycle_out_mesh || 0 || int || Number of cycles between printing the mesh structure to standard out. Use a negative number to also print every time the mesh is modified. Default: 0 (i.e., off). |
|| ncrecv_bdry_buf_timeout_sec || -1.0 || Real || Timeout in seconds for the `ReceiveBoundaryBuffers` tasks. Disabled (negative) by default. Typically not needed in production runs. Useful for debugging MPI calls. |
|| nlim || -1 || int || Stop criterion on total number of steps taken. Ignored if < 0. |
|| perf_cycle_offset || 0 || int || Skip the first N cycles when calculating the final performance (e.g., zone-cycles/wall_second). Allows hiding the initialization overhead in Parthenon. |
|| tlim || none || Real || Stop criterion on simulation time. |
+------------------------------+---------+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
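
The timestep options in the table above can be read as a chain of limits on a proposed timestep. The following is a minimal sketch of one way such limits could be combined; it is not Parthenon's implementation. The struct `DtControls`, the function `LimitTimestep`, and the order in which the limits are applied are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <optional>

// Hypothetical container for a subset of the options in the table above.
// Options whose default is "none" are modeled as empty optionals.
struct DtControls {
  std::optional<double> dt_force;  // overrides every other condition when set
  std::optional<double> dt_ceil;   // maximum allowed timestep
  std::optional<double> dt_floor;  // minimum allowed timestep
  double dt_factor = 2.0;          // maximum relative increase per cycle
};

// Assumed order: growth limit, then ceiling, then floor.
// A set dt_force short-circuits all of them.
inline double LimitTimestep(double dt_proposed, double dt_previous,
                            const DtControls &c) {
  if (c.dt_force) return *c.dt_force;
  double dt = std::min(dt_proposed, c.dt_factor * dt_previous);
  if (c.dt_ceil) dt = std::min(dt, *c.dt_ceil);
  if (c.dt_floor) dt = std::max(dt, *c.dt_floor);
  return dt;
}
```

With default controls and a previous timestep of 1.0, a proposed timestep of 10.0 would be cut to 2.0 by `dt_factor` in this sketch.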


@@ -64,9 +75,9 @@ See the :ref:`sparse impl` documentation for details.
+--------------------+---------+--------+----------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Default | Type | Description |
+====================+=========+========+==============================================================================================================================================+
|| enable_sparse || `true` || bool || If set to false, sparse variables will always be allocated, see also :ref:`sparse run-time` |
|| alloc_threshold || 1e-12 || float || Global (for all sparse variables) threshold to trigger allocation of a variable if cells in the receiving ghost cells are above this value. |
|| dealloc_threshold || 1e-14 || float || Global (for all sparse variables) threshold to trigger deallocation if all active cells of a variable in a block are below this value. |
|| dealloc_count || 5 || int || First deallocate a sparse variable if the `dealloc_threshold` has been met in this number of consecutive cycles. |
|| dealloc_threshold || 1e-14 || float || Global (for all sparse variables) threshold to trigger deallocation if all active cells of a variable in a block are below this value. |
|| enable_sparse || `true` || bool || If set to false, sparse variables will always be allocated, see also :ref:`sparse run-time` |
+--------------------+---------+--------+----------------------------------------------------------------------------------------------------------------------------------------------+
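
The interaction of `dealloc_threshold` and `dealloc_count` described in the table can be sketched as a counter of consecutive cycles. This is an illustrative sketch only: `DeallocTracker` and `Update` are hypothetical names, and whether the real comparison is strict is an assumption.

```cpp
// Hypothetical per-variable, per-block bookkeeping; not Parthenon's data structure.
struct DeallocTracker {
  double dealloc_threshold = 1e-14;  // default from the table above
  int dealloc_count = 5;             // default from the table above
  int consecutive_below = 0;

  // Call once per cycle with the largest absolute value over the active
  // cells of the block. Returns true once the value has stayed below the
  // threshold for dealloc_count consecutive cycles.
  bool Update(double max_abs_value) {
    if (max_abs_value < dealloc_threshold) {
      ++consecutive_below;
    } else {
      consecutive_below = 0;  // a cycle at or above the threshold resets the streak
    }
    return consecutive_below >= dealloc_count;
  }
};
```

Requiring several consecutive cycles avoids repeatedly deallocating and reallocating a variable whose values hover around the threshold.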

2 changes: 1 addition & 1 deletion src/bvals/comms/bnd_info.cpp
@@ -332,7 +332,7 @@ BndInfo BndInfo::GetSetBndInfo(MeshBlock *pmb, const NeighborBlock &nb,
out.buf_allocated = false;
} else {
printf("%i [rank: %i] -> %i [rank: %i] (Set %s) is in state %i.\n", nb.gid, nb.rank,
pmb->gid, Globals::my_rank, v->label().c_str(), buf_state);
pmb->gid, Globals::my_rank, v->label().c_str(), static_cast<int>(buf_state));
PARTHENON_FAIL("Buffer should be in a received state.");
}
return out;
14 changes: 8 additions & 6 deletions src/bvals/comms/boundary_communication.cpp
@@ -79,10 +79,12 @@ TaskStatus SendBoundBufs(std::shared_ptr<MeshData<Real>> &md) {
}
}
// Restrict
auto pmb = md->GetBlockData(0)->GetBlockPointer();
StateDescriptor *resolved_packages = pmb->resolved_packages.get();
refinement::Restrict(resolved_packages, cache.prores_cache, pmb->cellbounds,
pmb->c_cellbounds);
if (md->NumBlocks() > 0) {
auto pmb = md->GetBlockData(0)->GetBlockPointer();
StateDescriptor *resolved_packages = pmb->resolved_packages.get();
refinement::Restrict(resolved_packages, cache.prores_cache, pmb->cellbounds,
pmb->c_cellbounds);
}

// Load buffer data
auto &bnd_info = cache.bnd_info;
@@ -335,7 +337,7 @@ TaskStatus SetBounds(std::shared_ptr<MeshData<Real>> &md) {
#endif
std::for_each(std::begin(cache.buf_vec), std::end(cache.buf_vec),
[](auto pbuf) { pbuf->Stale(); });
if (nbound > 0 && pmesh->multilevel) {
if (nbound > 0 && pmesh->multilevel && md->NumBlocks() > 0) {
// Restrict
auto pmb = md->GetBlockData(0)->GetBlockPointer();
StateDescriptor *resolved_packages = pmb->resolved_packages.get();
@@ -377,7 +379,7 @@ TaskStatus ProlongateBounds(std::shared_ptr<MeshData<Real>> &md) {
}
}

if (nbound > 0 && pmesh->multilevel) {
if (nbound > 0 && pmesh->multilevel && md->NumBlocks() > 0) {
auto pmb = md->GetBlockData(0)->GetBlockPointer();
StateDescriptor *resolved_packages = pmb->resolved_packages.get();
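
The `md->NumBlocks() > 0` guards in this file protect the `GetBlockData(0)` access when a rank owns no blocks, which can happen now that a base grid may have fewer blocks than ranks. A minimal sketch of the same pattern on a plain container follows; `Block` and `FirstBlockGid` are hypothetical names and do not appear in Parthenon.

```cpp
#include <vector>

struct Block {  // hypothetical stand-in for a mesh block
  int gid;
};

// Returns the gid of the first block, or -1 when this rank owns no blocks.
inline int FirstBlockGid(const std::vector<Block> &blocks) {
  if (blocks.empty()) return -1;  // same role as the NumBlocks() > 0 check
  return blocks.front().gid;      // safe: at least one element exists
}
```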

49 changes: 42 additions & 7 deletions src/bvals/comms/build_boundary_buffers.cpp
@@ -16,10 +16,12 @@
//========================================================================================

#include <algorithm>
#include <cstddef>
#include <iostream> // debug
#include <memory>
#include <random>
#include <string>
#include <unordered_map>
#include <vector>

#include "bvals_in_one.hpp"
@@ -44,25 +46,58 @@ template <BoundaryType BTYPE>
void BuildBoundaryBufferSubset(std::shared_ptr<MeshData<Real>> &md,
Mesh::comm_buf_map_t &buf_map) {
Mesh *pmesh = md->GetMeshPointer();
std::unordered_map<int, int>
nbufs; // total (existing and new) number of buffers for given size

ForEachBoundary<BTYPE>(md, [&](auto pmb, sp_mbd_t /*rc*/, nb_t &nb, const sp_cv_t v) {
// Calculate the required size of the buffer for this boundary
int buf_size = GetBufferSize(pmb, nb, v);
// LR: Multigrid logic requires blocks sending messages to themselves (since the same
// block can show up on two multigrid levels). This doesn't require any data
// transfer, so the message size can be zero. It is essentially just a flag to show
// that the block is done being used on one level and can be used on the next level.
if (pmb->gid == nb.gid && nb.offsets.IsCell()) buf_size = 0;

nbufs[buf_size] += 1; // relying on value init of int to 0 for initial entry
});

ForEachBoundary<BTYPE>(md, [&](auto pmb, sp_mbd_t /*rc*/, nb_t &nb, const sp_cv_t v) {
// Calculate the required size of the buffer for this boundary
int buf_size = GetBufferSize(pmb, nb, v);
// See comment above on the same logic.
if (pmb->gid == nb.gid && nb.offsets.IsCell()) buf_size = 0;

// Add a buffer pool if one does not exist for this size
using buf_t = buf_pool_t<Real>::base_t;
if (pmesh->pool_map.count(buf_size) == 0) {
pmesh->pool_map.emplace(std::make_pair(
buf_size, buf_pool_t<Real>([buf_size](buf_pool_t<Real> *pool) {
using buf_t = buf_pool_t<Real>::base_t;
// TODO(LFR): Make nbuf a user settable parameter
const int nbuf = 200;
buf_t chunk("pool buffer", buf_size * nbuf);
// Might be worth discussing what a good default is.
// Using the number of packs, assumes that all blocks in a pack have fairly similar
// buffer configurations, which may or may not be a good approximation.
// An alternative would be "1", which would reduce the memory footprint, but
// increase the number of individual memory allocations.
const int64_t nbuf = pmesh->DefaultNumPartitions();
pmesh->pool_map.emplace(
buf_size, buf_pool_t<Real>([buf_size, nbuf](buf_pool_t<Real> *pool) {
const auto pool_size = nbuf * buf_size;
buf_t chunk("pool buffer", pool_size);
for (int i = 1; i < nbuf; ++i) {
pool->AddFreeObjectToPool(
buf_t(chunk, std::make_pair(i * buf_size, (i + 1) * buf_size)));
}
return buf_t(chunk, std::make_pair(0, buf_size));
})));
}));
}
// Now that the pool is guaranteed to exist we can add free objects of the required
// amount.
auto &pool = pmesh->pool_map.at(buf_size);
const std::int64_t new_buffers_req = nbufs.at(buf_size) - pool.NumBuffersInPool();
if (new_buffers_req > 0) {
const auto pool_size = new_buffers_req * buf_size;
buf_t chunk("pool buffer", pool_size);
for (int i = 0; i < new_buffers_req; ++i) {
pool.AddFreeObjectToPool(
buf_t(chunk, std::make_pair(i * buf_size, (i + 1) * buf_size)));
}
}

const int receiver_rank = nb.rank;
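
The hunk above replaces a fixed pool size of 200 buffers with a two-pass scheme: count the buffers needed per size, then allocate only what is missing. The sketch below shows that idea with standard containers instead of Kokkos views. `SizePool` and `GrowPools` are hypothetical names, and the sketch ignores the free-list handling of the real buffer pool.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical pool: records how many buffers of one size exist and keeps
// the backing chunks alive.
struct SizePool {
  std::int64_t num_buffers = 0;
  std::vector<std::vector<double>> chunks;  // one chunk per allocation
};

// Pass 1 counts how many buffers of each size are required.
// Pass 2 allocates only the shortfall, as a single chunk per size.
inline void GrowPools(const std::vector<int> &required_sizes,
                      std::unordered_map<int, SizePool> &pools) {
  std::unordered_map<int, std::int64_t> nbufs;
  for (int size : required_sizes) nbufs[size] += 1;  // new entries start at 0

  for (const auto &[size, count] : nbufs) {
    SizePool &pool = pools[size];  // creates an empty pool if none exists yet
    const std::int64_t missing = count - pool.num_buffers;
    if (missing > 0) {
      pool.chunks.emplace_back(static_cast<std::size_t>(missing * size));
      pool.num_buffers += missing;
    }
  }
}
```

Counting first means each size gets one allocation sized to the actual demand, so memory is not reserved for buffers that are never used.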
1 change: 1 addition & 0 deletions src/coordinates/uniform_cartesian.hpp
@@ -318,6 +318,7 @@ class UniformCartesian {
const std::array<Real, 3> &GetXmin() const { return xmin_; }
const std::array<int, 3> &GetStartIndex() const { return istart_; }
const char *Name() const { return name_; }
static const char *StaticName() { return name_; }

private:
std::array<int, 3> istart_;
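
A static accessor such as `StaticName()` lets generic code query the name from the type alone, without an instance. The sketch below illustrates this under the assumption that `name_` is a static member, which the diff implies but does not show. `UniformCartesianLike`, its name string, and `CoordinateName` are hypothetical.

```cpp
#include <string>

class UniformCartesianLike {  // hypothetical stand-in, not the Parthenon class
 public:
  const char *Name() const { return name_; }         // needs an instance
  static const char *StaticName() { return name_; }  // needs only the type

 private:
  static constexpr char name_[] = "UniformCartesian";
};

// Generic code can obtain the name from the template parameter directly.
template <typename Coords>
std::string CoordinateName() {
  return Coords::StaticName();
}
```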