
APEX: messed up JSON with APEX_TRACE_EVENT=1 for Google Trace Events #5239

Closed
albestro opened this issue Mar 12, 2021 · 20 comments

@albestro
Contributor

Due to other problems I have with OTF2 (it hangs on exit; I may create a separate issue for that), I'm giving the Google Trace Events format a try with APEX_TRACE_EVENT=1.

I was able to run my code; it exits cleanly and produces the output. Unfortunately, the JSON produced is messed up, as if there were a race condition between multiple ranks writing to the same JSON file.

Eventually I was able to open the JSON after manually fixing the messed-up parts of the file. In my specific runs I could quickly figure out what to remove (clearly losing a few entries), thanks also to the Google tracing tool, which pointed me to the buffer position where it found a grammar problem.

I didn't try to create a minimal reproducible example, since it may not be easy or deterministic; in any case, I can help test with my code, which so far has always generated a malformed JSON.

The HPX team mentioned you, @khuck, as the APEX expert; let me know if I can help you in some way or provide you more information via other specific tests.
(OT: it may be interesting to have a chat/call with you about the general topic of "annotation". I'm open to that, let me know!)

Specifications

  • HPX Version: 1.6.0
  • Platform (compiler, OS): GCC on Linux
@khuck
Contributor

khuck commented Mar 12, 2021

@albestro hey there ... I would be surprised if there are race conditions writing to the stream, but it is certainly possible. A couple of questions:

  • Which version of APEX? I assume 2.3.1, the default for HPX 1.6.0
  • OTF2 hanging on exit: did it write the archive at all? If it hung, it would have hung in the event unification process at the end. There's a new unification process that uses HPX, and in some cases we were seeing hangs, but that was many months ago.
  • Can you share the json file? Each rank writes to its own file, assuming that each rank knows its rank id. How are you running the HPX application? Is it using TCP or MPI for the transport layer?

@msimberg
Contributor

I think the issue is exactly that the ranks don't know what id they are (networking is off in HPX). I know @biddisco worked on that at some point, and possibly even made it work, but given that it's not tested it's probably broken. @biddisco do you remember where the rank was supposed to be set if networking is off?

@khuck in the meantime, if I remember correctly it's possible to set the output filename for OTF2 files with an environment variable. Is it possible to do the same for the google trace event files?

@albestro
Contributor Author

hey @khuck! thanks for your quick reply!

Which version of APEX? I assume 2.3.1, the default for HPX 1.6.0

Correct, I'm using the default selection. From the HPX CMake configure process I see apex : git checkout -q v2.3.1

OTF2 hanging on exit: did it write the archive at all? If it hung, it would have hung in the event unification process at the end. There's a new unification process that uses HPX, and in some cases we were seeing hangs, but that was many months ago.

APEX_OTF2=1 srun -n 2 -c36 /users/ialberto/workspace/dla-future/builds/daint/miniapp/miniapp_reduction_to_band --matrix-size 10240 --block-size 512 --grid-rows 2 --grid-cols 1 --nwarmups=0 --nruns 1 --hpx:use-process-mask
[OTF2] src/otf2_archive_int.c:3945: error: File does already exist: Could not create archive trace directory!
[OTF2] src/otf2_archive_int.c:1108: error: File does already exist: Couldn't create directories on root.
OTF2 Error: 17, File does already exist
[0]
[0] 26.1765s nanGFlop/s (10240, 10240) (512, 512) (2, 1) 18
Closing OTF2 event files...
Closing OTF2 event files...
Writing OTF2 definition files...
Writing OTF2 Global definition file...
Writing OTF2 Node information...
Writing OTF2 Communicators...
Closing the archive...
done.
// here it hangs...

After killing it, the content of the folder is

OTF2_archive/
|-- APEX
|   |-- 0.def
|   |-- 0.evt
|   |-- 1.def
|   |-- 1.evt
|   |-- 10.def
|   |-- 10.evt
|   |-- 11.def
|   |-- 11.evt
|   |-- 12.def
|   |-- 12.evt
|   |-- 13.def
|   |-- 13.evt
|   |-- 14.def
|   |-- 14.evt
|   |-- 15.def
|   |-- 15.evt
|   |-- 16.def
|   |-- 16.evt
|   |-- 17.def
|   |-- 17.evt
|   |-- 18.def
|   |-- 18.evt
|   |-- 2.def
|   |-- 2.evt
|   |-- 3.def
|   |-- 3.evt
|   |-- 4.def
|   |-- 4.evt
|   |-- 5.def
|   |-- 5.evt
|   |-- 6.def
|   |-- 6.evt
|   |-- 7.def
|   |-- 7.evt
|   |-- 8.def
|   |-- 8.evt
|   |-- 9.def
|   `-- 9.evt
|-- APEX.def
`-- APEX.otf2

But if I open it with Vampir, I get an empty trace


Can you share the json file? Each rank writes to its own file, assuming that each rank knows its rank id. How are you running the HPX application? Is it using TCP or MPI for the transport layer?

NOTE: something that may be relevant and that I forgot to specify: I built with HPX_WITH_NETWORKING=off.

I'm running my application with 2 ranks on the same node, and the application itself uses MPI directly.

APEX_TRACE_EVENT=1 srun -n 2 -c36 /users/ialberto/workspace/dla-future/builds/daint/miniapp/miniapp_reduction_to_band --matrix-size 10240 --block-size 512 --grid-rows 2 --grid-cols 1 --nwarmups=0 --nruns 1 --hpx:use-process-mask

Here it is: trace.tar.gz. If you try to load it with the Google tracing tool, it reports the byte offset where it fails. Just for reference, a couple of errors:

line 95424: the entry is truncated (after the "ts" field) and a new one starts:

{"name":"hpx_main","ph":"X","pid":173700657,"tid":19,"ts":1615799668667355.000{"name":"hpx_main","ph":"X","pid":0,"tid":19,"ts":1615799668616451.000000, "dur": 83.561636,"args":{"GUID":14411518807585721556,"Parent GUID":14411518807585721555}},

line 252987: the array is terminated (end of the first rank's data?), but then the data continues, and it starts truncated:

{"name":"APEX MAIN", "ph":"E","pid":0,"tid":0,"ts":1615799684527165.500000}
]
}

x_main","ph":"X","pid":0,"tid":13,"ts":1615799683699657.750000, "dur": 32.267636,"args":{"GUID":2305843009213702076,"Parent GUID":6917529027641089785}},

@khuck
Contributor

khuck commented Mar 15, 2021

Based on what you've said, it sounds like the ranks are writing to the same file.

Which MPI are you using? Which batch system? Without HPX networking, APEX can sometimes guess the ranks based on environment variables:

https://github.com/khuck/xpress-apex/blob/7c511f63ae648b20991be7caec6d346a9dc7f675/src/apex/utils.cpp#L417-L450

If this is a Cray system, I might have to add support for their variable (CRAY_PMI_RANK for example).

@albestro
Contributor Author

Yes, it's Cray MPICH with Slurm

@khuck
Contributor

khuck commented Mar 16, 2021

Slurm should have been caught, but I'll add support for CRAY...

@albestro
Contributor Author

I took a quick look at the snippet of code you mentioned, and I did a quick test.

$ srun -n 2 -o "%t-env.txt" printenv
$ grep "SLURM_PROCID" *-env.txt
0-env.txt:SLURM_PROCID=0
1-env.txt:SLURM_PROCID=1

I got the same results with 2 ranks, whether they were on a single node or on separate nodes. So I would confirm your expectation: if that snippet of code gets called, it should be able to extract the rank from the SLURM environment variables.

@albestro
Contributor Author

albestro commented Mar 17, 2021

Hey, just to be sure...

https://github.com/khuck/xpress-apex/blob/7c511f63ae648b20991be7caec6d346a9dc7f675/src/apex/utils.cpp#L417-L450

Are you sure that you use the return value and not the argument? Because I would say that the argument should be a reference and not a value...

uint64_t test_for_MPI_comm_rank(uint64_t commrank) { // WARNING here commrank is not passed by reference
    // ...

    // Slurm - last resort
    tmpvar = getenv("SLURM_PROCID");
    if (tmpvar != NULL) {
        commrank = atol(tmpvar);
        return commrank;
    }
    return commrank;
}

It seems the function was intended to give back the result both as the return value and as an "out" argument, but in the latter case you don't get it, because commrank is not passed by reference.

EDIT: I just realized that it seems you are using it as a fallback value.
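
For clarity, a minimal sketch (illustration only, not APEX code) of the two conventions being contrasted, i.e. returning the detected rank versus writing it through a reference out-parameter:

#include <cstdint>
#include <cstdlib>

// Variant 1: the fallback-value pattern the quoted function actually uses;
// the caller passes its current guess and must use the return value.
uint64_t rank_or_fallback(uint64_t fallback) {
    if (const char* tmpvar = std::getenv("SLURM_PROCID")) {
        return static_cast<uint64_t>(std::atol(tmpvar));
    }
    return fallback; // nothing found: hand back the caller's guess
}

// Variant 2: a true "out" argument, which requires passing by reference.
bool rank_to_out_param(uint64_t& commrank) {
    if (const char* tmpvar = std::getenv("SLURM_PROCID")) {
        commrank = static_cast<uint64_t>(std::atol(tmpvar));
        return true;
    }
    return false; // commrank is left untouched
}

With the first variant the caller must use the return value; with the second, the detected rank reaches the caller through commrank itself.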

@albestro
Contributor Author

I just copied the function test_for_MPI_comm_rank into my small example, slightly changed it to print out which branch it was falling into, and it returns the correct value:

srun -n 2 a.out
SLURM
1
SLURM
0

So I think we should look in other directions. Do you have any suggestions on how to debug this?

@khuck
Contributor

khuck commented Mar 18, 2021

You could try printing out the trace file name when the file is created:
https://github.com/khuck/xpress-apex/blob/7c511f63ae648b20991be7caec6d346a9dc7f675/src/apex/trace_event_listener.cpp#L324

If you are getting two unique file names, then there is something else going on.
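
For reference, such a one-off debug print could look roughly like this (a hypothetical sketch; the actual variable names around that line in trace_event_listener.cpp may differ):

#include <iostream>
#include <sstream>

// Hypothetical helper: report which trace file a given node/rank is about to open.
void report_trace_filename(int node_id) {
    std::stringstream ss;
    ss << "trace_events." << node_id << ".json";
    std::cerr << "[APEX debug] node " << node_id
              << " opening trace file " << ss.str() << std::endl;
}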

@albestro
Contributor Author

albestro commented Mar 18, 2021

I've just checked, and I get exactly the same filename on all (2) ranks (tested on the same node for ease).

Specifically, saved_node_id == 0 and ss.str() == "trace_events.0.json" on both ranks.

While I wait for your next suggestion on how to keep investigating this, I'll take the chance to ask you one question (I have more on this topic; it would be nice to quickly discuss a couple of things with you. Let me know if you are open to this and, if so, how we can organize it).

So, once this is fixed, when using the Google trace tool we are supposed to open each rank's trace in a separate window, right?

@khuck
Contributor

khuck commented Mar 24, 2021

@albestro It would appear there are two issues here. First, there is a bug in the OTF2 library, which I thought had been reported and fixed. In OTF2, you need to patch the otf2_archive_set_collective_callbacks method in src/otf2_archive_int.c. The status variable is declared without an initialization, which causes the error message from non-zero ranks. That also bypasses some other initialization, which leads to other problems down the line. To fix it, change that line (line 1086 in version 2.2) to:

    OTF2_ErrorCode status = OTF2_SUCCESS;

The second issue is that when HPX is configured without parcel port support, HPX can't tell APEX how many ranks there are or the rank ID of each process. That's OK, as long as the MPI implementation sets some environment variables. I have added additional support for Cray systems, which might take care of the issue with both tracing outputs, OTF2 and Google Trace Events. Please configure HPX with -DHPX_WITH_APEX_TAG=develop and see if that additional support helps.

@albestro
Contributor Author

I did as you suggested, but without luck.

  • Patched OTF2 manually
  • Built HPX 1.6.0 with APEX=develop

Google Trace Events still creates a single file with the usual "output-overlaps" between ranks.

OTF2 still hangs on exit:

Closing OTF2 event files...
Closing OTF2 event files...
Writing OTF2 definition files...
Writing OTF2 definition files...
Writing OTF2 Global definition file...
Writing OTF2 Node information...
Writing OTF2 Communicators...
Closing the archive...
done.
// here it waits forever

If I kill the process, I get

{stack-trace}: 13 frames:
0x2aaab37dd2d0  : /apps/daint/UES/xalt/xalt2/software/xalt/2.8.10/lib64/libpthread.so.0(+0x132d0) [0x2aaab37dd2d0] in /apps/daint/UES/xalt/xalt2/software/xalt/2.8.10/lib64/libpthread.so.0
0x2aaab37d887d  : pthread_cond_wait [0x1fd] in /apps/daint/UES/xalt/xalt2/software/xalt/2.8.10/lib64/libpthread.so.0
0x2aaab16348ac  : boost::asio::detail::scheduler::run(boost::system::error_code&) [0x28c] in /project/csstaff/ialberto/libraries/hpx-1.6.0/build/install_here/lib64/libhpx.so.1
0x2aaab1e9c70a  : hpx::util::io_service_pool::thread_run(unsigned long, hpx::util::barrier*) [0x14a] in /project/csstaff/ialberto/libraries/hpx-1.6.0/build/install_here/lib64/libhpx_core.so
0x2aaab1616e15  : hpx::runtime_distributed::wait() [0x255] in /project/csstaff/ialberto/libraries/hpx-1.6.0/build/install_here/lib64/libhpx.so.1
0x2aaab16123f5  : hpx::runtime_distributed::run(hpx::util::function<int (), false> const&) [0x15] in /project/csstaff/ialberto/libraries/hpx-1.6.0/build/install_here/lib64/libhpx.so.1
0x2aaab16d9c9d  : /project/csstaff/ialberto/libraries/hpx-1.6.0/build/install_here/lib64/libhpx.so.1(+0x2d9c9d) [0x2aaab16d9c9d] in /project/csstaff/ialberto/libraries/hpx-1.6.0/build/install_here/lib64/libhpx.so.1
0x2aaab16d9f0c  : /project/csstaff/ialberto/libraries/hpx-1.6.0/build/install_here/lib64/libhpx.so.1(+0x2d9f0c) [0x2aaab16d9f0c] in /project/csstaff/ialberto/libraries/hpx-1.6.0/build/install_here/lib64/libhpx.so.1
0x2aaab16da7e5  : hpx::detail::run_or_start(hpx::util::function<int (hpx::program_options::variables_map&), false> const&, int, char**, hpx::init_params const&, bool) [0x845] in /project/csstaff/ialberto/libraries/hpx-1.6.0/build/install_here/lib64/libhpx.so.1
0x4178af        : /users/ialberto/workspace/dla-future/builds/daint/miniapp/miniapp_reduction_to_band() [0x4178af] in /users/ialberto/workspace/dla-future/builds/daint/miniapp/miniapp_reduction_to_band
0x2aaab4f7334a  : __libc_start_main [0xea] in /lib64/libc.so.6
0x41817a        : /users/ialberto/workspace/dla-future/builds/daint/miniapp/miniapp_reduction_to_band() [0x41817a] in /users/ialberto/workspace/dla-future/builds/daint/miniapp/miniapp_reduction_to_band
{what}: Interrupt
{stack-trace}: 15 frames:
0x2aaab37dd2d0  : /apps/daint/UES/xalt/xalt2/software/xalt/2.8.10/lib64/libpthread.so.0(+0x132d0) [0x2aaab37dd2d0] in /apps/daint/UES/xalt/xalt2/software/xalt/2.8.10/lib64/libpthread.so.0
0x2aaab427f894  : /opt/cray/pe/lib64/libmpich_gnu_82.so.3(+0x215894) [0x2aaab427f894] in /opt/cray/pe/lib64/libmpich_gnu_82.so.3
0x2aaab4284178  : MPIDI_CH3I_Progress [0x9c8] in /opt/cray/pe/lib64/libmpich_gnu_82.so.3
0x2aaab41cad6d  : MPIC_Wait [0x4d] in /opt/cray/pe/lib64/libmpich_gnu_82.so.3
0x2aaab41cb4ed  : MPIC_Sendrecv [0x1ed] in /opt/cray/pe/lib64/libmpich_gnu_82.so.3
0x2aaab40fee94  : MPIR_Barrier_intra [0xa4] in /opt/cray/pe/lib64/libmpich_gnu_82.so.3
0x2aaab40ff4d2  : MPIR_Barrier [0x12] in /opt/cray/pe/lib64/libmpich_gnu_82.so.3
0x2aaab41de41d  : MPIDI_Cray_shared_mem_coll_opt_cleanup [0x13d] in /opt/cray/pe/lib64/libmpich_gnu_82.so.3
0x2aaab41dbddb  : MPIDI_Cray_coll_finalize [0x1b] in /opt/cray/pe/lib64/libmpich_gnu_82.so.3
0x2aaab426e3cc  : MPID_Finalize [0x2c] in /opt/cray/pe/lib64/libmpich_gnu_82.so.3
0x2aaab417bb8d  : PMPI_Finalize [0x29d] in /opt/cray/pe/lib64/libmpich_gnu_82.so.3
0x4178d5        : /users/ialberto/workspace/dla-future/builds/daint/miniapp/miniapp_reduction_to_band() [0x4178d5] in /users/ialberto/workspace/dla-future/builds/daint/miniapp/miniapp_reduction_to_band
0x2aaab4f7334a  : __libc_start_main [0xea] in /lib64/libc.so.6
0x41817a        : /users/ialberto/workspace/dla-future/builds/daint/miniapp/miniapp_reduction_to_band() [0x41817a] in /users/ialberto/workspace/dla-future/builds/daint/miniapp/miniapp_reduction_to_band

@khuck
Contributor

khuck commented Mar 29, 2021

@albestro this looks more like a shutdown race condition. Did you at least get a valid OTF2 trace? I'll see if I can reproduce the race condition.

Is there any way you can share the application that is causing this deadlock?

@severinstrobl
Contributor

As I'm encountering a very similar issue with APEX/OTF2, here is a small example program that allows me to reliably reproduce it. The OTF2 output deadlocks even earlier here; all I get is:

Closing OTF2 event files...
Writing OTF2 definition files...

The path OTF2_archive/APEX is created, but it contains no files.

@albestro
Contributor Author

albestro commented Apr 1, 2021

@severinstrobl

As I'm encountering a very similar issue with APEX/OTF2, here is a small example program that allows me to reliably reproduce it. The OTF2 output deadlocks even earlier here; all I get is:

Closing OTF2 event files...
Writing OTF2 definition files...

The path OTF2_archive/APEX is created, but it contains no files.

I don't know if it's exactly the same case, because, as you can see in a previous message, I do get some output files.

@khuck

@albestro this looks more like a shutdown race condition. Did you at least get a valid OTF2 trace? I'll see if I can reproduce the race condition.

The output folder is populated with files, but when I open it with Vampir I don't get anything (as in the previous #5239 (comment))

Is there any way you can share the application that is causing this deadlock?

It's a miniapp from my library. I don't have any problem sharing it with you, but it may require some work on your side to prepare the environment (it is Spack-based, so it won't require too much). Before that, I can try to see if I can reproduce it with a simpler example... I'll let you know.

In the meantime, is it worth separating the two issues, "OTF2 hanging" and "Google Trace Events single-file race condition"?

severinstrobl added a commit to severinstrobl/spack that referenced this issue Apr 1, 2021
As pointed out in STEllAR-GROUP/hpx#5239,
there is an issue in OTF2 2.2 where a variable is not properly
initialized. As currently no release of OTF2 is available fixing this,
the patch should be applied.
severinstrobl added a commit to severinstrobl/spack that referenced this issue Apr 1, 2021
As pointed out in STEllAR-GROUP/hpx#5239,
there is an issue in OTF2 <=2.2 where a variable is not properly
initialized. As currently no release of OTF2 is available fixing this,
the patch should be applied.
alalazo pushed a commit to spack/spack that referenced this issue Apr 7, 2021
As pointed out in STEllAR-GROUP/hpx#5239,
there is an issue in OTF2 <=2.2 where a variable is not properly
initialized. As currently no release of OTF2 is available fixing this,
the patch should be applied.
@albestro
Contributor Author

Hey @khuck, yesterday I found the time to give this a look.

I keep thinking that the root of the two problems, i.e. the "OTF2 hang" and the "Google Trace overlap", might be the same. It looks like a race condition in both cases: one ends up with a hang, the other with concurrent writes to the same file. But that's just my speculation, without much knowledge about this stuff.

Yesterday I worked on the OTF2 problem. Curiously enough, after a few tries, I was able to reproduce the problem with a super simple executable:

#include <hpx/hpx.hpp>
#include <hpx/hpx_init.hpp>
#include <mpi.h>

int hpx_main() {
  hpx::make_ready_future<void>().then([](auto&&){
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    hpx::util::annotate_function _(std::to_string(rank));
  });
  return hpx::finalize();
}

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  auto ret_code = hpx::init(argc, argv);
  MPI_Finalize();
  return ret_code;
}

I'm not really sure everything in it is needed to reproduce the issue, but it is simple enough and at the same time looks very similar to my usual run (just as a reminder: HPX_WITH_NETWORKING=OFF and my program uses MPI directly).

The game changer for reproducing the OTF2 hang was the number of ranks. The probability of getting a hang grows with the number of ranks I use (generally 8 is a good number, but with more ranks it was way more frequent: almost every run was hanging).

An important point: I worked with a multi-rank-per-node configuration and have not had the chance to test with multiple nodes yet (I can try). I think this may be quite useful for you to narrow down the problem, especially considering what you said in your previous comments.

I don't know how OTF2 is expected to work (one folder per rank? one folder per node?), but the following stack trace may be insightful for you.

To collect it, I ran a "hanging configuration" with multiple ranks on a single node. Among these ranks, some were waiting on MPI_Finalize (these had printed "done", saying that they finished writing the OTF2 archive), and the remaining ones (not a fixed number across different runs) were the "hanging" ones. I attached to one of them and printed the stack trace of the most interesting thread. Here you are.

(gdb) bt
#0  0x00002aaaaee8add7 in unlink () from /lib64/libc.so.6
#1  0x00002aaaaee0bde9 in remove () from /lib64/libc.so.6
#2  0x00002aaaac065173 in apex::otf2_listener::reduce_metrics (this=0xeeab00) at /users/ialberto/workspace/hpx/apex/src/apex/otf2_listener.cpp:2487
#3  0x00002aaaac060f0a in apex::otf2_listener::on_shutdown (this=0xeeab00, data=...) at /users/ialberto/workspace/hpx/apex/src/apex/otf2_listener.cpp:952
#4  0x00002aaaabfc1c6c in apex::finalize () at /users/ialberto/workspace/hpx/apex/src/apex/apex.cpp:1513
#5  0x00002aaaabfbd214 in apex::finalize_hpx_runtime () at /users/ialberto/workspace/hpx/apex/src/apex/apex.cpp:194
#6  0x00002aaaabfd3e25 in hpx::util::detail::callable_vtable<void ()>::_invoke<void (*)()>(void*) (f=0xf94ba0)
    at /users/ialberto/workspace/hpx/libs/core/functional/include/hpx/functional/detail/vtable/callable_vtable.hpp:93
#7  0x00002aaaab3858f6 in hpx::util::detail::basic_function<void (), false, false>::operator()() const (this=0xf94b90)
    at /users/ialberto/workspace/hpx/libs/core/functional/include/hpx/functional/detail/basic_function.hpp:228
#8  hpx::components::server::runtime_support::call_shutdown_functions (this=0xe5cf20, pre_shutdown=true)
    at /users/ialberto/workspace/hpx/src/runtime/components/server/runtime_support_server.cpp:1117
#9  0x00002aaaab3af44a in hpx::actions::detail::component_invoke<hpx::components::server::runtime_support, void, void (bool), bool>(std::integral_constant<bool, false>, unsigned long, int, void (hpx::components::server::runtime_support::*)(bool), bool&&) (lva=15060768, f=
    (void (hpx::components::server::runtime_support::*)(hpx::components::server::runtime_support * const, bool)) 0x2aaaab385830 <hpx::components::server::runtime_support::call_shutdown_functions(bool)>, vs#0=@0x1a64d1c: true) at /users/ialberto/workspace/hpx/libs/full/actions_base/include/hpx/actions_base/component_action.hpp:54
#10 0x00002aaaab3ae5b3 in hpx::actions::action<void (hpx::components::server::runtime_support::*)(bool), &hpx::components::server::runtime_support::call_shutdown_functions, hpx::components::server::runtime_support::call_shutdown_functions_action>::invoke<bool> (lva=15060768, comptype=0, vs#0=@0x1a64d1c: true)
    at /users/ialberto/workspace/hpx/libs/full/actions_base/include/hpx/actions_base/component_action.hpp:106
#11 0x00002aaaab3b36b7 in hpx::actions::detail::action_invoke<hpx::components::server::runtime_support::call_shutdown_functions_action>::operator()<bool> (this=0x2aaaaac35dd0)
    at /users/ialberto/workspace/hpx/libs/full/actions_base/include/hpx/actions_base/basic_action.hpp:80
#12 hpx::util::detail::invoke_fused_impl<0ul, hpx::actions::detail::action_invoke<hpx::components::server::runtime_support::call_shutdown_functions_action>, hpx::tuple<bool> > (
    f=..., t=...) at /users/ialberto/workspace/hpx/libs/core/functional/include/hpx/functional/invoke_fused.hpp:79
#13 0x00002aaaab3b3397 in hpx::util::functional::invoke_fused::operator()<hpx::actions::detail::action_invoke<hpx::components::server::runtime_support::call_shutdown_functions_action>, hpx::tuple<bool> > (this=0x2aaaaac35def, f=..., t=...) at /users/ialberto/workspace/hpx/libs/core/functional/include/hpx/functional/invoke_fused.hpp:136
#14 0x00002aaaab3b2d8d in hpx::actions::trigger<void, hpx::util::functional::invoke_fused, hpx::actions::detail::action_invoke<hpx::components::server::runtime_support::call_shutdown_functions_action>, hpx::tuple<bool> > (cont=..., f=..., vs#0=..., vs#1=...) at /users/ialberto/workspace/hpx/libs/full/actions/include/hpx/actions/trigger.hpp:45
#15 0x00002aaaab3b1f58 in hpx::traits::action_trigger_continuation<hpx::actions::typed_continuation<void, hpx::util::unused_type>, void>::call<hpx::util::functional::invoke_fused, hpx::actions::detail::action_invoke<hpx::components::server::runtime_support::call_shutdown_functions_action>, hpx::tuple<bool> > (f=..., cont=...)
    at /users/ialberto/workspace/hpx/libs/full/actions/include/hpx/actions/traits/action_trigger_continuation.hpp:28
#16 hpx::actions::detail::continuation_thread_function<hpx::components::server::runtime_support::call_shutdown_functions_action>::operator() (this=0x1a64cb0)
    at /users/ialberto/workspace/hpx/libs/full/actions_base/include/hpx/actions_base/basic_action.hpp:180
#17 0x00002aaaab3b1508 in hpx::util::detail::callable_vtable<std::pair<hpx::threads::thread_schedule_state, hpx::threads::thread_id> (hpx::threads::thread_restart_state)>::_invoke<hpx::actions::detail::continuation_thread_function<hpx::components::server::runtime_support::call_shutdown_functions_action> >(void*, hpx::threads::thread_restart_state&&) (
    f=0x1a64cb0, vs#0=@0x2aaaaac35edc: hpx::threads::thread_restart_state::signaled)
    at /users/ialberto/workspace/hpx/libs/core/functional/include/hpx/functional/detail/vtable/callable_vtable.hpp:93
#18 0x00002aaaac55d9d1 in hpx::util::detail::basic_function<std::pair<hpx::threads::thread_schedule_state, hpx::threads::thread_id> (hpx::threads::thread_restart_state), false, false>::operator()(hpx::threads::thread_restart_state) const (vs#0=hpx::threads::thread_restart_state::signaled, this=0x1a78880)
    at /users/ialberto/workspace/hpx/libs/core/functional/include/hpx/functional/detail/basic_function.hpp:228
#19 hpx::threads::coroutines::detail::coroutine_impl::operator() (this=0x1a78760) at /users/ialberto/workspace/hpx/libs/core/coroutines/src/detail/coroutine_impl.cpp:74
#20 0x00002aaaac55cff2 in hpx::threads::coroutines::detail::lx::trampoline<hpx::threads::coroutines::detail::coroutine_impl> (fun=0x1a78760)
    at /users/ialberto/workspace/hpx/libs/core/coroutines/include/hpx/coroutines/detail/context_linux_x86.hpp:92
#21 0x0000000000000000 in ?? ()
(gdb) f 2
#2  0x00002aaaac065173 in apex::otf2_listener::reduce_metrics (this=0xeeab00) at /users/ialberto/workspace/hpx/apex/src/apex/otf2_listener.cpp:2487
2487                rc = std::remove(my_lock_filename.str().c_str());
(gdb) print my_lock_filename.str().c_str()
$1 = 0x1a9c3a0 "OTF2_archive/.metrics.lock.all"
(gdb)

At the very end of the above gdb session you can see that I selected frame 2 and printed the name of the file it was waiting to unlink (I thought it might be useful for your analysis). I also report the status of the folder below. Just as a side note, if I touch .metrics.lock.all it unblocks and the run exits correctly.

ialberto@nid00008:/scratch/snx3000/ialberto/temp/OTF2_archive> ls -al
total 40
drwxr-xr-x 3 ialberto csstaff 4096 Apr 14 08:31 .
drwxr-xr-x 3 ialberto csstaff 4096 Apr 14 08:28 ..
-rw-r--r-- 1 ialberto csstaff  288 Apr 14 08:31 .metrics.0
-rw-r--r-- 1 ialberto csstaff  316 Apr 14 08:31 .metrics.reduced.0
-rw-r--r-- 1 ialberto csstaff  754 Apr 14 08:30 .regions.0
-rw-r--r-- 1 ialberto csstaff  767 Apr 14 08:30 .regions.reduced.0
-rw-r--r-- 1 ialberto csstaff  154 Apr 14 08:31 .threads.0
-rw-r--r-- 1 ialberto csstaff    0 Apr 14 08:31 .threads.lock.all
drwxr-xr-x 2 ialberto csstaff 4096 Apr 14 08:30 APEX
-rw-r--r-- 1 ialberto csstaff 2257 Apr 14 08:31 APEX.def
-rw-r--r-- 1 ialberto csstaff  229 Apr 14 08:31 APEX.otf2
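
Just to illustrate why touching the lock file unblocks things (this is only my guess at the mechanism, not necessarily what APEX actually does): a file-based rendezvous in which each participant spins on std::remove() until the lock file appears would behave exactly like frame 2 above.

#include <chrono>
#include <cstdio>
#include <string>
#include <thread>

// Hypothetical sketch of a file-based handshake: spin until a peer creates the
// lock file, then consume (remove) it. If the peer never creates the file, this
// loops forever in std::remove()/unlink(), matching the hang seen in frame 2.
void wait_for_and_consume(const std::string& lock_filename) {
    while (std::remove(lock_filename.c_str()) != 0) {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}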

This may be just a symptom and not the root of the problem, but with your knowledge and experience of these libraries you may be able to see the bigger picture.

As always, I'm open to any other test/check you want me to do.

note: I'm using HPX 1.6.0, APEX develop, OTF2 patched

@khuck
Contributor

khuck commented Apr 19, 2021

@albestro Ah, this makes perfect sense! The "file based" method (using the shared filesystem) is the "least good" of the three options, HPX, MPI, or a shared filesystem, for doing the event unification that OTF2 requires. My guess is that the file-based solution isn't working, probably because the application doesn't know how many total ranks there are! Ideally, when you configure APEX with HPX, it uses HPX to do the event unification at the end. However, you aren't using HPX parcel ports, so that is not an option.

Here are some possible solutions, from easiest to hardest:

  1. Could you run an MPI job and see what environment variables (if any) are available on your system that tell you how many MPI ranks are in the job (see the sketch after this list)? I could then add support for that variable here: https://github.com/khuck/xpress-apex/blob/f5a776a2d2a475cd9d64cd3a1d1dfa6fc88de27d/src/apex/utils.cpp#L466

  2. If MPI is used, APEX could use MPI collectives to do the event unification. That works great, and is how OTF2 is designed to work. However, because HPX isn't configured with MPI support, APEX doesn't have paths to the MPI headers or libraries. I could see if it's possible to add support in APEX configuration for MPI without HPX having to know about it.

solution number 1 will be a faster fix than number 2...
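
For illustration, such a probe for the job-wide rank count might look like the sketch below; the variable names are common Slurm/PMI/Open MPI ones, and which of them APEX actually ends up supporting is a separate question:

#include <cstdint>
#include <cstdlib>
#include <initializer_list>

// Illustrative probe for the total number of MPI ranks in a job, based on
// environment variables commonly set by launchers (Slurm, PMI, Open MPI).
uint64_t guess_comm_size(uint64_t fallback) {
    for (const char* var : {"SLURM_NTASKS", "PMI_SIZE", "OMPI_COMM_WORLD_SIZE"}) {
        if (const char* value = std::getenv(var)) {
            return static_cast<uint64_t>(std::atol(value));
        }
    }
    return fallback; // nothing found: keep the previously known value
}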

@khuck
Contributor

khuck commented Apr 19, 2021

I reviewed the Slurm code for detecting environment variables, and I think I found the (or at least 'a') problem. I was using the wrong variable (documentation can be misleading...):
UO-OACISS/apex@e029c42

If you pull the latest APEX develop branch (you'll get some Kokkos changes with it, but they shouldn't affect you), I hope it'll fix the problem.

@albestro
Contributor Author

Aha! Good catch!

I tested my basic example with UO-OACISS/apex@e029c42 and I can see the difference with both OTF2 and the Google Trace Format!

  • OTF2: only the root rank manages the OTF2 archive (no longer all ranks printing "Closing the archive...") and Vampir shows the trace.
  • Google Trace Format: it no longer produces a single "trace_events.0.json", but one output file per rank... so no more race condition!

I'll give it a try with my miniapp as soon as possible, but I think this issue can be closed.

@khuck I'll take the chance to point you to discussion #5263, which I'd like to have with you about guidelines on how best to use APEX.

Thanks for your support and in advance for any comment on the discussion!

mpbelhorn pushed a commit to mpbelhorn/olcf-spack that referenced this issue May 20, 2021
As pointed out in STEllAR-GROUP/hpx#5239,
there is an issue in OTF2 <=2.2 where a variable is not properly
initialized. As currently no release of OTF2 is available fixing this,
the patch should be applied.
@albestro albestro closed this as completed Jun 2, 2021
RikkiButler20 pushed a commit to RikkiButler20/spack that referenced this issue Jun 9, 2021
As pointed out in STEllAR-GROUP/hpx#5239,
there is an issue in OTF2 <=2.2 where a variable is not properly
initialized. As currently no release of OTF2 is available fixing this,
the patch should be applied.