
Add a MeshData variant for refinement tagging #1182

Open · wants to merge 20 commits into base: develop

Conversation

@acreyes (Contributor) commented Sep 29, 2024

PR Summary

Introduces a boolean input parameter, parthenon/mesh/CheckRefineMesh, that switches refinement tagging to a loop over MeshData. The default is false, which keeps the original MeshBlockData-based functions. Also adds a package function, CheckRefinementMesh, to do the same.

Uses a ScatterView over the mesh blocks to hold the refinement tags, which get resolved against any package CheckRefinementBlock criteria. Also adds some utilities to ParArrayGeneric to make it easier to get a ScatterView with other reduction Ops and to contribute it back into the ParArray.
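
For reference, a sketch of how the new parameter would appear in an input file; the block/key syntax follows Parthenon's usual input format and is not copied from the PR:

<parthenon/mesh>
CheckRefineMesh = true   # use the MeshData loop for tagging; false keeps the MeshBlockData path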

PR Checklist

  • Code passes cpplint
  • New features are documented.
  • Adds a test for any bugs fixed. Adds tests for new features.
  • Code is formatted
  • Changes are summarized in CHANGELOG.md
  • Change is breaking (API, behavior, ...)
    • Change is additionally added to CHANGELOG.md in the breaking section
    • PR is marked as breaking
    • Short summary of API changes at the top of the PR (plus optionally an automated update/fix script)
  • CI has been triggered on Darwin for performance regression tests.
  • Docs build
  • (@lanl.gov employees) Update copyright on changed files

@Yurlungur (Collaborator) left a comment

Thanks for doing this! This was a little thing that's been on my mind as something we should fix for a long time.

Comment on lines +219 to +235
// utilities for scatter views
template <typename Op = Kokkos::Experimental::ScatterSum>
auto ToScatterView() {
  using view_type = std::remove_cv_t<std::remove_reference_t<Data>>;
  using data_type = typename view_type::data_type;
  using exec_space = typename view_type::execution_space;
  using layout = typename view_type::array_layout;
  return Kokkos::Experimental::ScatterView<data_type, layout, exec_space, Op>(data_);
}

template <class ScatterView_t>
void ContributeScatter(ScatterView_t scatter) {
  static_assert(
      is_specialization_of<ScatterView_t, Kokkos::Experimental::ScatterView>::value,
      "Need to provide a Kokkos::Experimental::ScatterView");
  Kokkos::Experimental::contribute(data_, scatter);
}
Collaborator

Nice. This is an elegant solution.

Collaborator

Indeed.
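
For reference, a minimal usage sketch of these utilities (not from the PR diff; the loop and nblocks are illustrative), mirroring the ScatterMax pattern used for the refinement tags later in the diff:

// delta_levels: a ParArray1D<AmrTag> with one entry per block.
auto scatter_levels = delta_levels.ToScatterView<Kokkos::Experimental::ScatterMax>();
Kokkos::parallel_for(
    "TagBlocks", nblocks, KOKKOS_LAMBDA(const int b) {
      auto levels_access = scatter_levels.access();
      levels_access(b).update(AmrTag::refine);  // duplicates combine via max, so refine wins
    });
delta_levels.ContributeScatter(scatter_levels);  // fold the ScatterView back into the ParArray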

@@ -37,14 +37,24 @@ std::shared_ptr<StateDescriptor> Initialize(ParameterInput *pin);
template <typename T>
TaskStatus Tag(T *rc);

AmrTag CheckAllRefinement(MeshBlockData<Real> *rc);
AmrTag CheckAllRefinement(MeshBlockData<Real> *rc,
                          const AmrTag &level = AmrTag::derefine);
Collaborator

What is the default level tag for?

Contributor Author

It was to avoid breaking anything that calls it the old way, but looking more closely, I think the only such caller is MeshRefinement::CheckRefinementCondition(), which can probably be removed now.

Comment on lines 39 to 41
bool check_refine_mesh =
    pin->GetOrAddBoolean("parthenon/mesh", "CheckRefineMesh", false);
ref->AddParam("check_refine_mesh", check_refine_mesh);
Collaborator

If I understand correctly, this flag is only for the default refinement operators, right? I.e., each package will do its own thing, so it's possible for package (a) to check refinement on the mesh and package (b) to do it per block?

If so, I think I would actually default this to true, assuming we believe the performance characteristics will be favorable. This is only a runtime parameter for the built-in refinement ops, not the custom ones, so I don't think changing the default should break downstream. And it's the more performant/sane option.

Contributor Author

> If I understand correctly, this flag is only for the default refinement operators, right? I.e., each package will do its own thing, so it's possible for package (a) to check refinement on the mesh and package (b) to do it per block?

That's right
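
For context, a sketch (not from the PR diff) of how the stored parameter would typically be read back when choosing between the two tagging paths; the branch bodies are placeholders:

// ref is the StateDescriptor that called AddParam above.
const bool check_refine_mesh = ref->Param<bool>("check_refine_mesh");
if (check_refine_mesh) {
  // loop over MeshData and tag via the ScatterView path
} else {
  // fall back to the per-block CheckAllRefinement path
}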

@Yurlungur added the enhancement (New feature or request), performance, and refactor (An improvement to existing code) labels on Sep 29, 2024
@Yurlungur (Collaborator)

Oh, one other thing: we should enable this in the tests. It seems like fine advection now uses it? That might be good enough, but we might also turn it on for, e.g., calculate pi, but leave advection and sparse advection alone, so we stress both code paths.

@acreyes (Contributor Author) commented Sep 29, 2024

> Oh, one other thing: we should enable this in the tests. It seems like fine advection now uses it? That might be good enough, but we might also turn it on for, e.g., calculate pi, but leave advection and sparse advection alone, so we stress both code paths.

With CheckRefineMesh=true now the default, calculate pi and fine-advection will use it. I think the burgers benchmark is the only other problem that uses the built-in criteria.

@pgrete (Collaborator) left a comment

I'm very happy this is getting added to the code base. One less spot with missing MeshData callbacks.

I do have a couple of comments that might also deserve additional discussion.

@@ -123,6 +123,8 @@ TaskCollection BurgersDriver::MakeTaskCollection(BlockList_t &blocks, const int
  // estimate next time step
  if (stage == integrator->nstages) {
    auto new_dt = tl.AddTask(update, EstimateTimestep<MeshData<Real>>, mc1.get());
    auto tag_refine =
Collaborator

We might need an if (pmesh->adaptive) { here, don't we?

Also, conceptually, we previously checked for refinement after the (physical) boundary conditions were set, which I think is more appropriate.

Independent of this PR, it just occurred to me that we're setting the timestep inside the driver's Step() and only outside of it do the actual refinement. Doesn't this potentially cause issues with violating the CFL condition on fine blocks after a block was refined? Also pinging @jdolence @lroberts36 @Yurlungur @bprather here.
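
A hypothetical sketch of the guard being suggested here; the task dependency and the parthenon::Refinement::Tag spelling follow the driver snippet and header above, but are assumptions rather than the final diff:

if (pmesh->adaptive) {
  auto tag_refine =
      tl.AddTask(new_dt, parthenon::Refinement::Tag<MeshData<Real>>, mc1.get());
}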

Collaborator

I think we do need pmesh->adaptive.

Regarding the time step control... I agree in principle that it could be a problem, but we never encountered any issues, so I wonder if there's something I'm missing there.

Contributor Author

> I think we do need pmesh->adaptive.

Agreed. It's fixed now, and I also moved it next to ApplyBoundaryConditionsMD to keep everything in the same task region and tag after applying the BCs.

> Regarding the time step control... I agree in principle that it could be a problem, but we never encountered any issues, so I wonder if there's something I'm missing there.

I think it's because the refined variable is 10x all the other components, so it dominates the CFL condition. As a little experiment, I offset the spatial profile of the non-refined components and gave one of them a 50x multiplier to try to force the timestep to be overestimated before a refinement. There is definitely some non-monotonicity compared to the uniformly refined case.

[two timestep plots attached]

@@ -94,12 +94,59 @@ std::shared_ptr<StateDescriptor> Initialize(ParameterInput *pin) {
  pkg->AddField<Conserved::divD>(
      Metadata({Metadata::Cell, Metadata::Derived, Metadata::OneCopy}));

  pkg->CheckRefinementBlock = CheckRefinement;
  bool check_refine_mesh =
      pin->GetOrAddBoolean("parthenon/mesh", "CheckRefineMesh", true);
Collaborator

I think we should not mix downstream/example parameters with Parthenon-intrinsic parameters, i.e., CheckRefineMesh should belong in the <Advection> input block (and maybe also get a short comment that this is an example usage for academic/test purposes and that downstream codes typically use only one or the other -- with a strong recommendation for the MeshData variant).

Comment on lines +329 to +331
void CheckRefinement(MeshData<Real> *mc, ParArray1D<AmrTag> &delta_level) const {
  if (CheckRefinementMesh != nullptr) CheckRefinementMesh(mc, delta_level);
}
Collaborator

Two notes on notation/naming (not saying this needs to be changed for the PR to pass, just for reference).

  1. delta_level still goes back to the old Athena++, which didn't use enum class AmrTag; I had to remind myself of that, as I was confused by the delta_level naming. It might be worth renaming it to amr_tags (or similar) at some point.

  2. I assume that you picked mc to mirror the rc used for the blocks. rc goes back to the original "RealContainer" (i.e., a container holding Real data), which has evolved into the (much more powerful/flexible) MeshBlockData. So in other places we've been trying to replace rc with mbd, and equivalently to use md for MeshData.

Contributor Author

This is really good to know. I've been a bit confused by the naming, seeing all these variations. The context makes it easy to remember what is preferred.

Comment on lines +382 to +383
std::function<void(MeshData<Real> *rc, ParArray1D<AmrTag> &delta_level)>
    CheckRefinementMesh = nullptr;
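
For illustration, a minimal sketch of enrolling this callback from a package's Initialize; the lambda body is a placeholder, not part of the PR:

pkg->CheckRefinementMesh = [](MeshData<Real> *md, ParArray1D<AmrTag> &delta_levels) {
  // Fill delta_levels(b) with AmrTag::refine / AmrTag::same / AmrTag::derefine
  // for each block b, e.g. via a ScatterMax ScatterView as in the example package.
};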

Comment on lines +39 to +41
bool check_refine_mesh =
    pin->GetOrAddBoolean("parthenon/mesh", "CheckRefineMesh", true);
ref->AddParam("check_refine_mesh", check_refine_mesh);
Collaborator

Ah, I now see that parthenon/mesh/CheckRefineMesh is also being used outside of the example.
Personally, I'm not a big fan of these switches for callback functions (for most other callback functions we only allow enrolling either the MeshData or the MeshBlockData version, and check whether both are enrolled [and then throw an error]).
I'd be in favor of removing the input file switch and going with that pattern, but more people should weigh in here.

Contributor Author

The main use is for deciding which amr_criteria variant to use, rather than for the callbacks. Maybe there is a better name to avoid the confusion, something like MeshDataCriteria. I was avoiding removing the MeshBlockData derivative criteria, but removing them shouldn't be breaking.

As written, nothing checks whether both callbacks are enrolled, but I can add that as you've described. I only used the switch to easily check that it was giving the same refinement patterns.
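
A sketch of that check, assuming Parthenon's usual PARTHENON_REQUIRE error macro; the message text is illustrative:

PARTHENON_REQUIRE(
    !(pkg->CheckRefinementBlock != nullptr && pkg->CheckRefinementMesh != nullptr),
    "Enroll either the MeshBlockData or the MeshData refinement callback, not both");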

Collaborator

It'd be nice to get additional input here (@lroberts36 @jdolence @bprather @brryan @Yurlungur) on whether this is a pattern we generally want to introduce to the codebase.
I'm in favor of (potentially) breaking backwards compatibility (though I think it should also work without breaking) over introducing a runtime switch that determines the callback function (at the Parthenon level).

Collaborator

I would vote for not introducing a runtime parameter that switches the callback function. I think we should be moving to MeshData-based functions everywhere and encouraging downstream users to do the same, so I would also support removing the MeshBlockData-based criteria (unless this ends up putting a significant burden on some downstream codes?).

Collaborator

Entirely removing the MeshBlockData callback would be breaking. In riot, at least, that would require some effort to conform, though not a lot. I think that would be better in the long term, though, and it wouldn't be too big of a burden, assuming MeshData is always faster.

Contributor Author

The runtime switch isn't for the callbacks, but rather for the refinement criteria. The callbacks are treated the same as the EstimateTimeStep callbacks.

If we don't have the runtime switch, then I think the MeshBlockData-based amr_criteria would need to be removed, unless there is some other way to keep both but not call both in each cycle.

auto ib = md->GetBoundsI(IndexDomain::entire);
auto jb = md->GetBoundsJ(IndexDomain::entire);
auto kb = md->GetBoundsK(IndexDomain::entire);
auto scatter_levels = delta_levels.ToScatterView<Kokkos::Experimental::ScatterMax>();
Collaborator

Here, and for the other calls: We might need a reset() call here, don't we?

Also kokkos/kokkos#6363 is not an issue here because the delta_levels are reset to derefine on any call anyway, aren't they?

Contributor Author

I'm not sure I understand what reset() does. Is it for the values in the ScatterView, without affecting the original view?

> Also kokkos/kokkos#6363 is not an issue here because the delta_levels are reset to derefine on any call anyway, aren't they?

Yes, they are reset at the initial call to CheckAllRefinement and accumulated through all the packages & criteria.

Collaborator

> I'm not sure I understand what reset() does. Is it for the values in the ScatterView, without affecting the original view?

That is a very good question. I just followed the example on p. 120 of file:///home/pgrete/Downloads/KokkosTutorial_ORNL20.pdf when I implemented the ScatterView for the histograms.

I'm going to ask on the Kokkos Slack.

Collaborator

Alright, I got some info:

Stan Moore
  Yesterday at 9:09 PM
Use reset() if you want to zero out all the values in the scatter view, e.g. you used it in one kernel and now want to use it in another. Note that there is also the reset_except() which can preserve the values in the original view, e.g. see https://github.com/lammps/lammps/blob/develop/src/KOKKOS/fix_qeq_reaxff_kokkos.cpp#L875C5-L876C31


Philipp Grete
  Today at 8:42 PM
I see. So just to double check for our use case (which is a view with N elements that by default is set to -1. We need to update that view every cycle, so we currently (re)set/deep_copy -1 to the view at the beginning of each cycle and then use a scatterview to update that view in every cycle. The kernel using the scatterview may be called multiple times, but always for distinct elements of N):
I should call scatterview.reset_except(original_view) so that we keep the -1 and also only call it once after we set the values in the original view (rather than before each kernel launch) so that the data in the scatter view remains consistent across kernel launches. Moreover, we only need to call contribute once after all the kernels ran.
Am I missing sth.?


Stan Moore
  9 minutes ago
> I should call scatterview.reset_except(original_view) so that we keep the -1 and also only call it once after we set the values in the original view (rather than before each kernel launch) so that the data in the scatter view remains consistent across kernel launches.
If the ScatterView persists across cycles, then yes you need to reset it. And yes you need to use reset_except so the -1 values are not overwritten in the original view. If it is reallocated at the beginning of every cycle, the values of the extra copies are already zero-initialized on creation by default, so no reset would be needed.
> Moreover, we only need to call contribute once after all the kernels ran.
Yes the values will persist as long as you don't call reset. (edited) 

So, with the current pattern, it seems that we need a reset_except (and contribute) outside the loop launching the check refinement kernel over MeshData.
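
To make that concrete, a self-contained sketch (not from the PR) of reset_except plus a single contribute, assuming a ScatterView that persists across cycles over an int tag view pre-filled with -1:

#include <Kokkos_Core.hpp>
#include <Kokkos_ScatterView.hpp>

int main(int argc, char *argv[]) {
  Kokkos::initialize(argc, argv);
  {
    using view_t = Kokkos::View<int *>;
    const int nblocks = 8;
    view_t delta_levels("delta_levels", nblocks);

    // Persistent ScatterView, built once; ScatterMax so "refine" always wins.
    Kokkos::Experimental::ScatterView<int *, view_t::array_layout, view_t::execution_space,
                                      Kokkos::Experimental::ScatterMax>
        scatter_levels(delta_levels);

    for (int cycle = 0; cycle < 3; ++cycle) {
      Kokkos::deep_copy(delta_levels, -1);        // tags default to "derefine" (-1) each cycle
      scatter_levels.reset_except(delta_levels);  // reset duplicates, keep the -1s in the original

      // One or more tagging kernels may run before a single contribute.
      Kokkos::parallel_for(
          "tag", nblocks, KOKKOS_LAMBDA(const int b) {
            auto access = scatter_levels.access();
            access(b).update(b % 3 == 0 ? 1 : 0);  // stand-in refinement criterion
          });

      // Fold the duplicates back into the original view once, after all kernels ran.
      Kokkos::Experimental::contribute(delta_levels, scatter_levels);
    }
  }
  Kokkos::finalize();
  return 0;
}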

Contributor Author

awesome, thanks @pgrete!

> So, with the current pattern, it seems that we need a reset_except (and contribute) outside the loop launching the check refinement kernel over MeshData.

As it is now, the ScatterView is created before each kernel and contributed after. I don't think the reset is necessary, since the ScatterViews don't persist between the kernel launches.

Comment on lines +126 to +132
KOKKOS_LAMBDA(parthenon::team_mbr_t team_member, const int b, const int n,
              const int k) {
  typename Kokkos::MinMax<Real>::value_type minmax;
  par_reduce_inner(
      parthenon::inner_loop_pattern_ttr_tag, team_member, jb.s, jb.e, ib.s, ib.e,
      [&](const int j, const int i,
          typename Kokkos::MinMax<Real>::value_type &lminmax) {
Collaborator

Any specific reason for the b,n,k / j,i split?

Contributor Author

The constraint here was that par_for_outer only has overloads up to 3D. #1142 would allow more flexibility.

Collaborator

I see. That makes sense. In that case, I suggest opening an issue (before merging this) to keep track of updating these functions once #1142 is in.

Comment on lines 140 to 145
auto levels_access = scatter_levels.access();
auto flag = AmrTag::same;
if (minmax.max_val > refine_tol && minmax.min_val < derefine_tol)
  flag = AmrTag::refine;
if (minmax.max_val < derefine_tol) flag = AmrTag::derefine;
levels_access(b).update(flag);
Collaborator

Does this work as expected, in particular with the split across k and j/i (and similarly below for the derivatives)?
If we refine just a single cell in a block, this info should persist even if other cells in that block are same or derefine, so currently couldn't there be a clash of info across ks?

Contributor Author

I think so. It should be equivalent to doing

Kokkos::atomic_max(&delta_levels(b), flag);

So the race condition is across ks, and the max ensures that refinement always wins.

Collaborator

Ah, right, yes, I see now. Might be worth adding a comment in that direction (at least to the common/non-example function in src/amr_criteria/refinement_package.cpp).

n5 = dims[0];
n4 = dims[1];
}
const int idx = comp4 + n4 * (comp5 + n5 * comp6);
Collaborator

I'm wondering if we don't have a simpler way to get a flat index (I imagine there must be more places with this use case, but I don't recall any offhand).
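
For illustration only, the indexing above is a standard flattening with comp4 fastest-varying; a hypothetical helper (not existing Parthenon API) would be:

// Matches idx = comp4 + n4 * (comp5 + n5 * comp6) from the snippet above.
KOKKOS_INLINE_FUNCTION
constexpr int FlatIndex(const int comp4, const int comp5, const int comp6,
                        const int n4, const int n5) {
  return comp4 + n4 * (comp5 + n5 * comp6);
}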

Labels: enhancement (New feature or request), performance, refactor (An improvement to existing code)

4 participants