Random AMR failure in ExportCoordinates1D #5797

Closed
nilsvu opened this issue Feb 23, 2024 · 2 comments · Fixed by #5803

@nilsvu
Member

nilsvu commented Feb 23, 2024

This happened here: https://github.com/sxs-collaboration/spectre/actions/runs/8015560168/job/21895920997?pr=5796

Charm++> Running on MPI version: 3.1
Charm++> level of thread support used: -1 (desired: 0)
Charm++> Running in SMP mode: 1 processes, 1 worker threads (PEs) + 1 comm threads per process, 1 PEs total
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v7.0.0
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 hosts (1 sockets x 2 cores x 2 PUs = 4-way SMP)
Charm++> cpu topology info is gathered in 0.001 seconds.

Executing 'ExportCoordinates1D' using 1 processors.
Launch command line: /__w/spectre/spectre/build/bin/ExportCoordinates1D --input-file /__w/spectre/spectre/tests/InputFiles/ExportCoordinates/Input1D.yaml
Charm++ startup time in seconds: 0.018598
Date and time at startup: Fri Feb 23 06:43:58 2024

SpECTRE Build Information:
Version:                      2024.02.05
Compiled on host:             54076d3c0e08
Compiled in directory:        /__w/spectre/spectre/build
Source directory is:          /__w/spectre/spectre
Compiled on git branch:       HEAD
Compiled on git revision:     f246540
Linked on:                    Fri Feb 23 06:30:51 2024
Build type:                   Release

The following options differ from their suggested values:

Option parsing completed.

Allocating Singletons:
Component on node 0, global proc 0, exclusive = false


----- Domain Info -----
Total blocks: 1
Total elements: 2
Total grid points: 6
Number of cores: 1
Number of nodes: 1
Elements per core: (2)
Elements per node: (2)
Grid points per core: (6)
Grid points per node: (6)
-----------------------

Entering phase: Register at time 00:00:00
Entering phase: CheckDomain at time 00:00:00
Number of elements: 2
Number of grid points: 6
Average refinement levels: (1)
Average grid points: (3)

Entering phase: Execute at time 00:00:00
Time: 0, Global inertial minimum grid spacing: 0.25
Entering phase from phase control: EvaluateAmrCriteria at time 00:00:00
Entering phase from phase control: AdjustDomain at time 00:00:00
Entering phase from phase control: CheckDomain at time 00:00:00
Number of elements: 1
Number of grid points: 3
Average refinement levels: (0)
Average grid points: (3)

Entering phase from phase control: Execute at time 00:00:00
Time: 0.1, Global inertial minimum grid spacing: 0.5
Entering phase from phase control: EvaluateAmrCriteria at time 00:00:00
Entering phase from phase control: AdjustDomain at time 00:00:00
Entering phase from phase control: CheckDomain at time 00:00:00
Number of elements: 1
Number of grid points: 3
Average refinement levels: (0)
Average grid points: (3)

Entering phase from phase control: Execute at time 00:00:00
Time: 0.2, Global inertial minimum grid spacing: 0.5
Entering phase from phase control: EvaluateAmrCriteria at time 00:00:00
Entering phase from phase control: AdjustDomain at time 00:00:00
Entering phase from phase control: CheckDomain at time 00:00:00
Number of elements: 1
Number of grid points: 3
Average refinement levels: (0)
Average grid points: (3)

Entering phase from phase control: Execute at time 00:00:00
Time: 0.3, Global inertial minimum grid spacing: 0.5
Entering phase from phase control: EvaluateAmrCriteria at time 00:00:00
Entering phase from phase control: AdjustDomain at time 00:00:00


###############################
The following exceptions were reported during the phase: AdjustDomain
Component: DgElementArray
Array Index: [B0,(L0I0)]
Phase: EvaluateAmrCriteria
Algorithm Step: 0
Message: 
############ ERROR ############
Stack trace:

  0. [error handling]
  1. [error handling]
  2. FixedHashMap<2ul, Direction<1ul>, Neighbors<1ul>, DirectionHash<1ul>, std::equal_to<Direction<1ul> > >::at(Direction<1ul> const&) - Resolve source file and line with: addr2line -fCpe /__w/spectre/spectre/build/lib/libAmr.so 0x1322d
  3. std::deque<ElementId<1ul>, std::allocator<ElementId<1ul> > > amr::ids_of_joining_neighbors<1ul>(Element<1ul> const&, std::array<amr::Flag, 1ul> const&) - Resolve source file and line with: addr2line -fCpe /__w/spectre/spectre/build/lib/libAmr.so 0x10168
  4. _ZN3amr7Actions12AdjustDomain5applyI14DgElementArrayI13MetavariablesILm1ELb0EEN7brigand4listIJN8Parallel12PhaseActionsILNS8_5PhaseE9ENS7_IJN14Initialization7Actions15InitializeItemsIJNSB_12TimeSteppingIS5_11TimeStepperEEN9evolution2dg14Initialization6DomainILm1ELb0EEENS_14Initialization10InitializeI [...] omputeILm1EEESY_S10_EEES5_EEvRN2db7DataBoxIT0_EERNS8_11GlobalCacheIT1_EERKS1Z_IXsrS58_10volume_dimEE - Resolve source file and line with: addr2line -fCpe /__w/spectre/spectre/build/bin/ExportCoordinates1D 0x674d92
  5. void Parallel::DistributedObject<DgElementArray<Metavariables<1ul, false>, brigand::list<Parallel::PhaseActions<(Parallel::Phase)9, brigand::list<Initialization::Actions::InitializeItems<Initialization::TimeStepping<Metavariables<1ul, false>, TimeStepper>, evolution::dg::Initialization::Domain<1ul,  [...] ggers, PhaseControl::Actions::ExecutePhaseChange> > > >::simple_action<amr::Actions::AdjustDomain>() - Resolve source file and line with: addr2line -fCpe /__w/spectre/spectre/build/bin/ExportCoordinates1D 0x674a3d
  6. void CkIndex_AlgorithmArray<DgElementArray<Metavariables<1ul, false>, brigand::list<Parallel::PhaseActions<(Parallel::Phase)9, brigand::list<Initialization::Actions::InitializeItems<Initialization::TimeStepping<Metavariables<1ul, false>, TimeStepper>, evolution::dg::Initialization::Domain<1ul, false [...] eChange> > > >, ElementId<1ul> >::_call_simple_action_void<amr::Actions::AdjustDomain>(void*, void*) - Resolve source file and line with: addr2line -fCpe /__w/spectre/spectre/build/bin/ExportCoordinates1D 0x6749ec
  7. CkLocRec::invokeEntry(CkMigratable*, void*, int, bool) in /work/charm_7_0_0/src/ck-core/cklocation.C:2263
  8. CkArrayBroadcaster::deliver(CkArrayMessage*, ArrayElement*, bool) in /work/charm_7_0_0/src/ck-core/ckarray.C:1371
  9. CkArray::recvBroadcast(CkMessage*) in /work/charm_7_0_0/src/ck-core/ckarray.C:1683
 10. CkDeliverMessageFree in /work/charm_7_0_0/src/ck-core/ck.C:553
 11. _deliverForBocMsg(CkCoreState*, int, envelope*, IrrGroup*) [clone .constprop.0] in /work/charm_7_0_0/src/ck-core/ck.C:1064
 12. _processHandler(void*, CkCoreState*) in /work/charm_7_0_0/src/ck-core/ck.C:1250
 13. CsdScheduleForever in /work/charm_7_0_0/src/conv-core/convcore.C:1943
 14. CsdScheduler in /work/charm_7_0_0/src/conv-core/convcore.C:1888
 15. ConverseRunPE(int) in /work/charm_7_0_0/src/arch/util/machine-common-core.C:1614
 16. call_startfn(void*) in /work/charm_7_0_0/src/arch/util/machine-smp.C:372
 17. start_thread in ./nptl/pthread_create.c:442
 18. /lib/x86_64-linux-gnu/libc.so.6(+0x126a40) [0x7f767067ca40] in ../sysdeps/unix/sysv/linux/x86_64/clone3.S:83

Wall time: 00:00:00
Node: 0 Proc: 0
FixedHashMap::mapped_type &FixedHashMap<2, Direction<1>, Neighbors<1>, DirectionHash<1>>::at(const FixedHashMap::key_type &) [MaxSize = 2, Key = Direction<1>, ValueType = Neighbors<1>, Hash = DirectionHash<1>, KeyEqual = std::equal_to<Direction<1>>] in ../../../../src/DataStructures/FixedHashMap.hpp:489

+0 not in FixedHashMap
############ ERROR ############


Type: std::out_of_range

To determine where an exception is thrown, run gdb and do
catch throw EXCEPTION_TYPE
run
where EXCEPTION_TYPE is the Type of the exception above.
You may have to type `continue` to skip some option parser
exceptions until you get to the one you care about
You may also have to type `up` or `down` to go up and down
the function calls in order to find a useful line number.

Entering phase: PostFailureCleanup at time 00:00:00
PostFailureCleanup phase complete. Aborting.

Done!
Wall time: 00:00:00
Date and time at completion: Fri Feb 23 06:43:59 2024
@nilsvu
Member Author

nilsvu commented Feb 23, 2024

Oh I think this might be my fault actually. In #5781 I kept the max refinement level around, but didn't impose a min refinement level of zero. I just assumed that was built in. I suppose you're going to add that when adding the max refinement level policy @kidder?
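For illustration, here is a minimal sketch of what imposing a minimum refinement level of zero could look like. Both traces fail on element `[B0,(L0I0)]`, which is already at refinement level 0, while the code tries to look up a joining neighbor or a parent for it. That is consistent with a join request reaching a root-level element. The `Flag` enum and the function `enforce_min_refinement_level` below are hypothetical stand-ins written for this example. They are not SpECTRE's actual types or API, and the eventual fix may look different.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for an AMR flag type; not SpECTRE's amr::Flag.
enum class Flag { Join, DoNothing, Split };

// Hypothetical helper: downgrade Join to DoNothing in every dimension where
// the element is already at (or below) the minimum refinement level, since
// such an element has no parent to join into.
template <std::size_t Dim>
std::array<Flag, Dim> enforce_min_refinement_level(
    std::array<Flag, Dim> flags,
    const std::array<std::size_t, Dim>& refinement_levels,
    const std::size_t min_level = 0) {
  for (std::size_t d = 0; d < Dim; ++d) {
    if (flags[d] == Flag::Join && refinement_levels[d] <= min_level) {
      flags[d] = Flag::DoNothing;
    }
  }
  return flags;
}
```

With a guard like this applied before the domain is adjusted, a root-level element that was asked to join would simply be left alone, while elements at level 1 or higher would still be allowed to join.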

@knelli2
Contributor

knelli2 commented Feb 26, 2024

@nilsvu Is this also the same bug?

27/126 Test   #28: InputFiles.ExportCoordinates.Input1D.yaml.execute ............................***Failed    0.98 sec
Charm++> No provisioning arguments specified. Running with a single PE.
         Use +auto-provision to fully subscribe resources or +p1 to silence this message.
Charm++: standalone mode (not using charmrun)
Charm++> Running in Multicore mode: 1 threads (PEs)
The following options differ from their suggested values:
Converse/Charm++ Commit ID: v7.0.0
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 hosts (1 sockets x 2 cores x 2 PUs = 4-way SMP)
Charm++> cpu topology info is gathered in 0.000 seconds.

Executing 'ExportCoordinates1D' using 1 processors.
Launch command line: /__w/spectre/spectre/build/bin/ExportCoordinates1D --input-file /__w/spectre/spectre/tests/InputFiles/ExportCoordinates/Input1D.yaml
Charm++ startup time in seconds: 0.051014
Date and time at startup: Mon Feb 26 19:51:12 2024

SpECTRE Build Information:
Version:                      2024.02.05
Compiled on host:             107dd09770cd
Compiled in directory:        /__w/spectre/spectre/build
Source directory is:          /__w/spectre/spectre
Compiled on git branch:       HEAD
Compiled on git revision:     0e3b118
Linked on:                    Mon Feb 26 18:54:08 2024
Build type:                   Debug


Option parsing completed.

Allocating Singletons:
Component on node 0, global proc 0, exclusive = false


Parallel components:
  Component (Singleton) has a DataBox with 17 items.
  DgElementArray (Array) has a DataBox with 48 items.
  Observer (Group) has a DataBox with 23 items.
  ObserverWriter (Nodegroup) has a DataBox with 29 items.


----- Domain Info -----
Total blocks: 1
Total elements: 2
Total grid points: 6
Number of cores: 1
Number of nodes: 1
Elements per core: (2)
Elements per node: (2)
Grid points per core: (6)
Grid points per node: (6)
-----------------------

Entering phase: Register at time 00:00:00
Entering phase: CheckDomain at time 00:00:00
Number of elements: 2
Number of grid points: 6
Average refinement levels: (1)
Average grid points: (3)

Entering phase: Execute at time 00:00:00
Time: 0, Global inertial minimum grid spacing: 0.25
Entering phase from phase control: EvaluateAmrCriteria at time 00:00:00
Entering phase from phase control: AdjustDomain at time 00:00:00
Entering phase from phase control: CheckDomain at time 00:00:00
Number of elements: 1
Number of grid points: 3
Average refinement levels: (0)
Average grid points: (3)

Entering phase from phase control: Execute at time 00:00:00
Time: 0.1, Global inertial minimum grid spacing: 0.5
Entering phase from phase control: EvaluateAmrCriteria at time 00:00:00
Entering phase from phase control: AdjustDomain at time 00:00:00


###############################
The following exceptions were reported during the phase: AdjustDomain
Component: DgElementArray
Array Index: [B0,(L0I0)]
Phase: EvaluateAmrCriteria
Algorithm Step: 0
Message: 
############ ASSERT FAILED ############
Stack trace:

  0. [error handling]
  1. SegmentId::id_of_parent() const - Resolve source file and line with: addr2line -fCpe /__w/spectre/spectre/build/lib/libAmr.so 0x8d517
  2. ElementId<1ul> amr::id_of_parent<1ul>(ElementId<1ul> const&, std::array<amr::Flag, 1ul> const&) - Resolve source file and line with: addr2line -fCpe /__w/spectre/spectre/build/lib/libAmr.so 0x8eb84
  3. _ZN3amr7Actions12AdjustDomain5applyI14DgElementArrayI13MetavariablesILm1ELb0EEN7brigand4listIJN8Parallel12PhaseActionsILNS8_5PhaseE9ENS7_IJN14Initialization7Actions15InitializeItemsIJNSB_12TimeSteppingIS5_11TimeStepperEEN9evolution2dg14Initialization6DomainILm1ELb0EEENS_14Initialization10InitializeI [...] omputeILm1EEESY_S10_EEES5_EEvRN2db7DataBoxIT0_EERNS8_11GlobalCacheIT1_EERKS1Z_IXsrS58_10volume_dimEE - Resolve source file and line with: addr2line -fCpe /__w/spectre/spectre/build/bin/ExportCoordinates1D 0xfc6730
  4. void Parallel::DistributedObject<DgElementArray<Metavariables<1ul, false>, brigand::list<Parallel::PhaseActions<(Parallel::Phase)9, brigand::list<Initialization::Actions::InitializeItems<Initialization::TimeStepping<Metavariables<1ul, false>, TimeStepper>, evolution::dg::Initialization::Domain<1ul,  [...] ggers, PhaseControl::Actions::ExecutePhaseChange> > > >::simple_action<amr::Actions::AdjustDomain>() - Resolve source file and line with: addr2line -fCpe /__w/spectre/spectre/build/bin/ExportCoordinates1D 0xf9d0b4
  5. void CkIndex_AlgorithmArray<DgElementArray<Metavariables<1ul, false>, brigand::list<Parallel::PhaseActions<(Parallel::Phase)9, brigand::list<Initialization::Actions::InitializeItems<Initialization::TimeStepping<Metavariables<1ul, false>, TimeStepper>, evolution::dg::Initialization::Domain<1ul, false [...] eChange> > > >, ElementId<1ul> >::_call_simple_action_void<amr::Actions::AdjustDomain>(void*, void*) - Resolve source file and line with: addr2line -fCpe /__w/spectre/spectre/build/bin/ExportCoordinates1D 0xf6e7ca
  6. CkLocRec::invokeEntry(CkMigratable*, void*, int, bool) in /work/charm_7_0_0/src/ck-core/cklocation.C:2263
  7. CkArrayBroadcaster::deliver(CkArrayMessage*, ArrayElement*, bool) in /work/charm_7_0_0/src/ck-core/ckarray.C:1371
  8. CkArray::recvBroadcast(CkMessage*) in /work/charm_7_0_0/src/ck-core/ckarray.C:1683
  9. CkDeliverMessageFree in /work/charm_7_0_0/src/ck-core/ck.C:553
 10. _deliverForBocMsg(CkCoreState*, int, envelope*, IrrGroup*) [clone .constprop.0] in /work/charm_7_0_0/src/ck-core/ck.C:1064
 11. _processHandler(void*, CkCoreState*) in /work/charm_7_0_0/src/ck-core/ck.C:1250
 12. CsdScheduleForever in /work/charm_7_0_0/src/conv-core/convcore.C:1943
 13. CsdScheduler in /work/charm_7_0_0/src/conv-core/convcore.C:1888
 14. ConverseRunPE(int) in /work/charm_7_0_0/src/arch/util/machine-common-core.C:1614
 15. ConverseInit in /work/charm_7_0_0/src/arch/util/machine-common-core.C:1530
 16. charm_main in /work/charm_7_0_0/src/ck-core/init.C:1759
 17. __libc_start_call_main in ../sysdeps/nptl/libc_start_call_main.h:58
 18. __libc_start_main in ../csu/libc-start.c:379
 19. _start - Resolve source file and line with: addr2line -fCpe /__w/spectre/spectre/build/bin/ExportCoordinates1D 0xe29945

Wall time: 00:00:00
Node: 0 Proc: 0
SegmentId SegmentId::id_of_parent() const in ../../../../src/Domain/Structure/SegmentId.hpp:113

'0 != refinement_level_' violated!
Cannot call id_of_parent() on root refinement level!
############ ASSERT FAILED ############


Type: SpectreAssert

To determine where an exception is thrown, run gdb and do
catch throw EXCEPTION_TYPE
run
where EXCEPTION_TYPE is the Type of the exception above.
You may have to type `continue` to skip some option parser
exceptions until you get to the one you care about
You may also have to type `up` or `down` to go up and down
the function calls in order to find a useful line number.

Entering phase: PostFailureCleanup at time 00:00:00
PostFailureCleanup phase complete. Aborting.

Done!
Wall time: 00:00:00
Date and time at completion: Mon Feb 26 19:51:13 2024
