Using APEX with HPX

Kevin Huck edited this page Aug 31, 2020 · 6 revisions

For site-specific instructions, see the separate build instructions for HPX and the Octotiger application on NERSC Cori Phase II (KNL).

Dependencies

There are a number of dependencies for HPX, and a number for APEX:

HPX Dependencies

APEX Dependencies for HPX support

Getting the code

HPX will automatically download APEX as a dependency, so once the above dependencies are installed, download the HPX source code. HPX has many branches and a deep history, so to speed up the clone and save disk space, request a single branch with a shallow history:

git clone --branch stable --depth 1 https://github.com/STEllAR-GROUP/hpx.git

Configuring and building HPX with APEX support

Here's an example of how to build HPX without MPI and with APEX support on an OSX laptop, using Spack to manage dependencies:

#!/bin/zsh -e

. ${HOME}/spack/share/spack/setup-env.sh
spack load cmake
spack load boost
spack load gperftools
spack load hwloc@2.2.0%clang@11.0.3-apple~cairo~cuda~gl~libudev+libxml2~netloc~nvml~pci+shared
spack load otf2@2.2%clang@11.0.3-apple

if [ -d build ] ; then
    rm -rf build
fi
mkdir build
cd build

cwd=`pwd`
boost=`spack location -i boost`
gperftools=`spack location -i gperftools`
hwloc=`spack location -i hwloc@2.2.0%clang@11.0.3-apple~cairo~cuda~gl~libudev+libxml2~netloc~nvml~pci+shared`
otf2=`spack location -i otf2@2.2%clang@11.0.3-apple`

# A comment after a trailing backslash would end the command early,
# so the APEX-related options are described here instead:
#   HPX_WITH_APEX            enables APEX support
#   HPX_WITH_APEX_TAG        optional, only for getting the latest code updates
#   APEX_WITH_ACTIVEHARMONY  optional, used for executing policies for runtime adaptation
#   APEX_WITH_OTF2           optional, used for generating OTF2 traces read by Vampir/Traveler
#   OTF2_ROOT                optional, path to the OTF2 library installation
#   APEX_WITH_PAPI           optional, enables hardware counter support
cmake \
-DCMAKE_CXX_COMPILER=`which g++` \
-DCMAKE_C_COMPILER=`which gcc` \
-DCMAKE_BUILD_TYPE=RelWithDebInfo \
-DBOOST_ROOT=${boost} \
-DTCMALLOC_ROOT=${gperftools} \
-DHPX_WITH_MALLOC=tcmalloc \
-DHWLOC_ROOT=${hwloc} \
-DCMAKE_INSTALL_PREFIX=${cwd}/install \
-DHPX_WITH_THREAD_IDLE_RATES=ON \
-DHPX_WITH_PARCELPORT_MPI=OFF \
-DHPX_WITH_TOOLS=ON \
-DHPX_WITH_TESTS=ON \
-DHPX_WITH_EXAMPLES=ON \
-DHPX_WITH_APEX=TRUE \
-DHPX_WITH_APEX_TAG=develop \
-DAPEX_WITH_ACTIVEHARMONY=FALSE \
-DAPEX_WITH_OTF2=TRUE \
-DOTF2_ROOT=${otf2} \
-DAPEX_WITH_PAPI=FALSE \
..

make -j8 -l8 core tests.examples.quickstart
ctest -V -R tests.examples.quickstart
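
After configuring, it can be useful to confirm that the APEX options actually reached the build. The sketch below reads them back from the CMakeCache.txt that CMake writes into the build directory. The function name show_apex_options is hypothetical and exists only for this illustration.

```shell
# Hypothetical helper: print the APEX-related entries from a CMake cache file.
# Usage: show_apex_options [path/to/CMakeCache.txt]
show_apex_options() {
    cache_file="${1:-CMakeCache.txt}"
    if [ ! -f "$cache_file" ]; then
        echo "no CMake cache found at $cache_file" >&2
        return 1
    fi
    # Cache entries have the form NAME:TYPE=VALUE; keep only the
    # HPX_WITH_APEX* and APEX_WITH_* entries.
    grep -E '^(HPX_WITH_APEX|APEX_WITH_)[A-Z0-9_]*(:[A-Z]+)?=' "$cache_file"
}
```

Run from the build directory, `show_apex_options` with no argument inspects ./CMakeCache.txt; a line such as HPX_WITH_APEX:BOOL=TRUE would indicate that APEX support was requested.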

Running HPX test applications, seeing APEX performance data

HPX has many test / example programs. For brevity, we'll use the fibonacci example. To run the fibonacci program from the build directory:

khuck@Kevins-MacBook-Air build % ./bin/fibonacci                                  
fibonacci(10) == 55
elapsed time: 0.001749 [s]

To run and see an APEX summary of execution, set the APEX_SCREEN_OUTPUT environment variable (or export it in your environment):

khuck@Kevins-MacBook-Air build % APEX_SCREEN_OUTPUT=1 ./bin/fibonacci
fibonacci(10) == 55
elapsed time: 0.002364 [s]

Elapsed time: 0.0306946 seconds
Cores detected: 8
Worker Threads observed: 4
Available CPU time: 0.122778 seconds

Timer                                                : #calls  |    mean  |   total  |  % total  
------------------------------------------------------------------------------------------------
                                           APEX MAIN :        1      0.031      0.031    100.000
           apex::profiler_listener::process_profiles :        1      0.000      0.000      0.079
                                               async :        2      0.000      0.000      0.003
                        async_launch_policy_dispatch :        5      0.000      0.000      0.239
            broadcast_call_shutdown_functions_action :        2      0.000      0.000      0.065
                      call_shutdown_functions_action :        2      0.000      0.000      0.181
                                    fibonacci_action :      174      0.000      0.008      6.767
                              load_components_action :        1      0.026      0.026     21.234
                   primary_namespace_colocate_action :        2      0.000      0.000      0.038
                                          run_helper :        1      0.001      0.001      0.739
                                 shutdown_all_action :        1      0.000      0.000      0.110
                                           APEX Idle :                          0.087     70.544
------------------------------------------------------------------------------------------------
                                        Total timers : 191

The HPX runtime is instrumented with APEX callbacks, so any HPX task is automatically measured. Note that because APEX does not reduce the data to node (process/locality) 0 before exit, the screen report contains only the data from node 0.
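
The two ways of setting APEX_SCREEN_OUTPUT mentioned above (per command, or exported for the whole session) can be sketched as follows. The helper name run_with_apex_summary is hypothetical; it only wraps the per-command form so that the rest of the shell session is unaffected.

```shell
# Hypothetical helper: run one command with the APEX summary enabled,
# without changing the environment of the surrounding shell.
run_with_apex_summary() {
    APEX_SCREEN_OUTPUT=1 "$@"
}

# Example (from the build directory):
#   run_with_apex_summary ./bin/fibonacci
#
# To enable the summary for every later command in the session instead:
#   export APEX_SCREEN_OUTPUT=1
```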

APEX Task graphs

APEX can generate task graphs from HPX. To see them, set the APEX_TASKGRAPH_OUTPUT environment variable when the application is executed, then run dot (from Graphviz) on the resulting task graph file:

khuck@Kevins-MacBook-Air build % APEX_TASKGRAPH_OUTPUT=1 ./bin/fibonacci
fibonacci(10) == 55
elapsed time: 0.002013 [s]
khuck@Kevins-MacBook-Air build % ls
CMakeCache.txt         apex/                  hpx/                   scripts/
CMakeFiles/            arch.c                 init/                  src/
CTestTestfile.cmake    bin/                   lib/                   taskgraph.0.dot
DartConfiguration.tcl  cmake_install.cmake    libs/                  tests/
Makefile               components/            out.bmp                tools/
Testing/               examples/              plugins/               wrap/
khuck@Kevins-MacBook-Air build % dot -Tpdf -O taskgraph.0.dot 
khuck@Kevins-MacBook-Air build % ls
CMakeCache.txt         arch.c                 lib/                   taskgraph.0.dot.pdf
CMakeFiles/            bin/                   libs/                  tests/
CTestTestfile.cmake    cmake_install.cmake    out.bmp                tools/
DartConfiguration.tcl  components/            plugins/               wrap/
Makefile               examples/              scripts/
Testing/               hpx/                   src/
apex/                  init/                  taskgraph.0.dot
khuck@Kevins-MacBook-Air build % open taskgraph.0.dot.pdf

images/taskgraph.0.dot.png
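
The conversion step can be wrapped in a small loop when more than one task graph file is present. This is a sketch under an assumption: that a run with several localities writes one taskgraph.<N>.dot file per locality, which the 0 in the file name suggests but this page does not confirm. The function name render_taskgraphs is hypothetical.

```shell
# Hypothetical helper: render every taskgraph.*.dot file in a directory to PDF.
# Usage: render_taskgraphs [directory]
render_taskgraphs() {
    dir="${1:-.}"
    for dot_file in "$dir"/taskgraph.*.dot; do
        # If the glob matched nothing, the pattern itself is returned; skip it.
        [ -e "$dot_file" ] || continue
        # Same invocation as above: -O writes <input>.pdf next to the input.
        dot -Tpdf -O "$dot_file"
    done
}
```

With the single-process run shown here, `render_taskgraphs .` is equivalent to the one dot command above.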

APEX Scatterplots

APEX can generate scatterplots of a sample (1/100) of the tasks that are executed. The x-axis is the time since the start of the program, and the y-axis is the duration of the task. To see them, set the APEX_TASK_SCATTERPLOT environment variable, then run the APEX post-processing Python script on the output to generate the charts. For this example, we run with a larger fibonacci number to generate more samples, and we run the fibonacci_futures example, which tries different parallel implementations:

khuck@Kevins-MacBook-Air build % APEX_TASK_SCATTERPLOT=1 ./bin/fibonacci_futures --n-value=20
fibonacci_serial(20) == 6765,elapsed time:,45086,[s]
fibonacci_future_one(20) == 6765,elapsed time:,165061319,[s]
fibonacci(20) == 6765,elapsed time:,32537395,[s]
fibonacci_fork(20) == 6765,elapsed time:,20437048,[s]
fibonacci_future(20) == 6765,elapsed time:,65245878,[s]
fibonacci_future_fork(20) == 6765,elapsed time:,49399325,[s]
fibonacci_future_when_all(20) == 6765,elapsed time:,68501537,[s]
fibonacci_future_unwrapped_when_all(20) == 6765,elapsed time:,68637877,[s]
fibonacci_future_all(20) == 6765,elapsed time:,52566179,[s]
fibonacci_future_all_when_all(20) == 6765,elapsed time:,50315426,[s]
khuck@Kevins-MacBook-Air build % ../apex/src/scripts/task_scatterplot.py                     
Parsed 2467 samples
Plotting async_launch_policy_dispatch
Plotting async_launch_policy_dispatch::call
Plotting async
Rendering...
khuck@Kevins-MacBook-Air build % open image.png

images/scatterplot.png

APEX Tracing

APEX can generate an OTF2 trace suitable for visualization with Vampir (a commercial tool) or Traveler. To collect an OTF2 trace, use the APEX_OTF2 environment variable:

khuck@Kevins-MacBook-Air build % APEX_OTF2=1 ./bin/fibonacci                               
Rank 0 of 1.
fibonacci(10) == 55
elapsed time: 0.003572 [s]
Closing OTF2 event files...
Writing OTF2 definition files...
Writing OTF2 Global definition file...
Writing OTF2 Node information...
Writing OTF2 Communicators...
Closing the archive...
done.

To validate the trace, you can use the otf2-print utility that comes with the OTF2 library:

khuck@Kevins-MacBook-Air build % otf2-print -A ./OTF2_archive/APEX.otf2 

=== OTF2-PRINT ===

Content of OTF2 anchor file:
Version                        2.2.0
Chunk size events              1048576
Chunk size definitions         4194304
File substrate                 POSIX
Compression                    NONE
Number of locations            5
Number of global definitions   52
Machine name                   
Creator                        APEX version stable-6cbbe6b878-master
Built on: 09:47:32 Jul 17 2020
C++ Language Standard version : 201402
Clang Compiler version : 4.2.1 Compatible Apple LLVM 11.0.3 (clang-1103.0.32.62)
Description                    
Number of properties           0
Trace identifier               9a80b630b08826d7
Number of snapshots:           0
Number of thumbnails:          0

=== Global Definitions =========================================================

Definition                            ID  Attributes
--------------------------------------------------------------------------------
STRING                                 0  ""
STRING                                 1  "run_helper"
REGION                                 0  Name: "run_helper" <1> (Aka. "" <0>), Descr.: "" <0>, Role: TASK, Paradigm: USER, Flags: NONE, File: "" <0>, Begin: 0, End: 0
STRING                                 2  "load_components_action"
REGION                                 1  Name: "load_components_action" <2> (Aka. "" <0>), Descr.: "" <0>, Role: TASK, Paradigm: USER, Flags: NONE, File: "" <0>, Begin: 0, End: 0
STRING                                 3  "async_launch_policy_dispatch"
REGION                                 2  Name: "async_launch_policy_dispatch" <3> (Aka. "" <0>), Descr.: "" <0>, Role: TASK, Paradigm: USER, Flags: NONE, File: "" <0>, Begin: 0, End: 0
STRING                                 4  "fibonacci_action"
REGION                                 3  Name: "fibonacci_action" <4> (Aka. "" <0>), Descr.: "" <0>, Role: TASK, Paradigm: USER, Flags: NONE, File: "" <0>, Begin: 0, End: 0
STRING                                 5  "apex::profiler_listener::process_profiles"
REGION                                 4  Name: "apex::profiler_listener::process_profiles" <5> (Aka. "" <0>), Descr.: "" <0>, Role: TASK, Paradigm: MEASUREMENT_SYSTEM, Flags: NONE, File: "" <0>, Begin: 0, End: 0
STRING                                 6  "apex::process_profiles"
REGION                                 5  Name: "apex::process_profiles" <6> (Aka. "" <0>), Descr.: "" <0>, Role: TASK, Paradigm: MEASUREMENT_SYSTEM, Flags: NONE, File: "" <0>, Begin: 0, End: 0
STRING                                 7  "shutdown_all_action"
REGION                                 6  Name: "shutdown_all_action" <7> (Aka. "" <0>), Descr.: "" <0>, Role: TASK, Paradigm: USER, Flags: NONE, File: "" <0>, Begin: 0, End: 0
STRING                                 8  "primary_namespace_colocate_action"
REGION                                 7  Name: "primary_namespace_colocate_action" <8> (Aka. "" <0>), Descr.: "" <0>, Role: TASK, Paradigm: USER, Flags: NONE, File: "" <0>, Begin: 0, End: 0
STRING                                 9  "broadcast_call_shutdown_functions_action"
REGION                                 8  Name: "broadcast_call_shutdown_functions_action" <9> (Aka. "" <0>), Descr.: "" <0>, Role: TASK, Paradigm: USER, Flags: NONE, File: "" <0>, Begin: 0, End: 0
STRING                                10  "call_shutdown_functions_action"
REGION                                 9  Name: "call_shutdown_functions_action" <10> (Aka. "" <0>), Descr.: "" <0>, Role: TASK, Paradigm: USER, Flags: NONE, File: "" <0>, Begin: 0, End: 0
STRING                                11  "async"
REGION                                10  Name: "async" <11> (Aka. "" <0>), Descr.: "" <0>, Role: TASK, Paradigm: USER, Flags: NONE, File: "" <0>, Begin: 0, End: 0
STRING                                12  "GUID"
STRING                                13  "Globaly unique identifier"
ATTRIBUTE                              0  Name: "GUID" <12>, Description: "Globaly unique identifier" <13>, Type: UINT64
STRING                                14  "Parent GUID"
STRING                                15  "Globaly unique identifier of the parent task"
ATTRIBUTE                              1  Name: "Parent GUID" <14>, Description: "Globaly unique identifier of the parent task" <15>, Type: UINT64
STRING                                16  "count"
CLOCK_PROPERTIES                          Ticks per Seconds: 1000000000, Global Offset: 0, Length: 37693000
STRING                                17  "node"
STRING                                18  "Kevins-MacBook-Air.local"
SYSTEM_TREE_NODE                       0  Name: "Kevins-MacBook-Air.local" <18>, Class: "node" <17>, Parent: UNDEFINED
STRING                                19  "process 93544"
LOCATION_GROUP                         0  Name: "process 93544" <19>, Type: PROCESS, Parent: "Kevins-MacBook-Air.local" <0>
STRING                                20  "thread 00"
LOCATION                               0  Name: "thread 00" <20>, Type: CPU_THREAD, # Events: 11, Group: "process 93544" <0>
STRING                                21  "thread 01"
LOCATION                               1  Name: "thread 01" <21>, Type: CPU_THREAD, # Events: 11, Group: "process 93544" <0>
STRING                                22  "thread 02"
LOCATION                               2  Name: "thread 02" <22>, Type: CPU_THREAD, # Events: 11, Group: "process 93544" <0>
STRING                                23  "thread 03"
LOCATION                               3  Name: "thread 03" <23>, Type: CPU_THREAD, # Events: 11, Group: "process 93544" <0>
STRING                                24  "thread 04"
LOCATION                               4  Name: "thread 04" <24>, Type: CPU_THREAD, # Events: 11, Group: "process 93544" <0>
STRING                                25  "MPI_COMM_WORLD_LOCATIONS"
GROUP                                  0  Name: "MPI_COMM_WORLD_LOCATIONS" <25>, Type: COMM_LOCATIONS, Paradigm: MPI, Flags: NONE, 1 Member: "thread 00" <0>
STRING                                26  "MPI_COMM_WORLD_GROUP"
GROUP                                  1  Name: "MPI_COMM_WORLD_GROUP" <26>, Type: COMM_GROUP, Paradigm: MPI, Flags: NONE, 1 Member: 0 ("thread 00" <0>)
STRING                                27  "MPI_COMM_WORLD"
COMM                                   0  Name: "MPI_COMM_WORLD" <27>, Group: "MPI_COMM_WORLD_GROUP" <1>, Parent: UNDEFINED
=== Events =====================================================================
Event                               Location            Timestamp  Attributes
--------------------------------------------------------------------------------
ENTER                                      2              2500000  Region: "run_helper" <0>
                                                                   ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 2), ("Parent GUID" <1>; UINT64; 0)
ENTER                                      1              2630000  Region: "load_components_action" <1>
                                                                   ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 2305843009213693954), ("Parent GUID" <1>; UINT64; 2)
LEAVE                                      2              2659000  Region: "run_helper" <0>
                                                                   ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 2), ("Parent GUID" <1>; UINT64; 0)
ENTER                                      2             29096000  Region: "run_helper" <0>
                                                                   ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 2), ("Parent GUID" <1>; UINT64; 0)
LEAVE                                      1             29124000  Region: "load_components_action" <1>
                                                                   ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 2305843009213693954), ("Parent GUID" <1>; UINT64; 2)
LEAVE                                      2             29655000  Region: "run_helper" <0>
                                                                   ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 2), ("Parent GUID" <1>; UINT64; 0)
ENTER                                      3             29666000  Region: "async_launch_policy_dispatch" <2>
                                                                   ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 2305843009213693956), ("Parent GUID" <1>; UINT64; 2)
ENTER                                      1             29759000  Region: "run_helper" <0>
                                                                   ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 2), ("Parent GUID" <1>; UINT64; 0)
LEAVE                                      3             29765000  Region: "async_launch_policy_dispatch" <2>
                                                                   ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 2305843009213693956), ("Parent GUID" <1>; UINT64; 2)
LEAVE                                      1             29772000  Region: "run_helper" <0>
                                                                   ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 2), ("Parent GUID" <1>; UINT64; 0)
ENTER                                      2             29776000  Region: "async_launch_policy_dispatch" <2>
                                                                   ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 9223372036854775811), ("Parent GUID" <1>; UINT64; 2)
ENTER                                      3             29794000  Region: "run_helper" <0>
                                                                   ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 2), ("Parent GUID" <1>; UINT64; 0)
LEAVE                                      2             29797000  Region: "async_launch_policy_dispatch" <2>
                                                                   ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 9223372036854775811), ("Parent GUID" <1>; UINT64; 2)
ENTER                                      1             29882000  Region: "async_launch_policy_dispatch" <2>
                                                                   ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 4611686018427387906), ("Parent GUID" <1>; UINT64; 2)
ENTER                                      4             29890000  Region: "async_launch_policy_dispatch" <2>
                                                                   ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 4611686018427387908), ("Parent GUID" <1>; UINT64; 2)
LEAVE                                      3             29894000  Region: "run_helper" <0>
                                                                   ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 2), ("Parent GUID" <1>; UINT64; 0)
LEAVE                                      4             29909000  Region: "async_launch_policy_dispatch" <2>
                                                                   ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 4611686018427387908), ("Parent GUID" <1>; UINT64; 2)
ENTER                                      2             29912000  Region: "async_launch_policy_dispatch" <2>
                                                                   ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 4611686018427387908), ("Parent GUID" <1>; UINT64; 2)
ENTER                                      4             29939000  Region: "fibonacci_action" <3>
                                                                   ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 2305843009213693958), ("Parent GUID" <1>; UINT64; 4611686018427387908)
LEAVE                                      1             29948000  Region: "async_launch_policy_dispatch" <2>
                                                                   ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 4611686018427387906), ("Parent GUID" <1>; UINT64; 2)
LEAVE                                      2             29951000  Region: "async_launch_policy_dispatch" <2>
                                                                   ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 4611686018427387908), ("Parent GUID" <1>; UINT64; 2)
ENTER                                      3             29953000  Region: "fibonacci_action" <3>
                                                                   ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 9223372036854775813), ("Parent GUID" <1>; UINT64; 4611686018427387906)
ENTER                                      1             29965000  Region: "fibonacci_action" <3>
                                                                   ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 2305843009213693960), ("Parent GUID" <1>; UINT64; 4611686018427387908)
...
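
Beyond reading the listing by eye, the event section can be summarized with standard text tools. The sketch below counts ENTER events per region from otf2-print output on standard input. It assumes the column layout shown above (event type, location, timestamp, then Region: "name") and region names without spaces; the function name count_region_enters is hypothetical.

```shell
# Hypothetical helper: count ENTER events per region in otf2-print output.
# Reads the listing on stdin and prints "<count> <region>" lines,
# most frequent region first.
count_region_enters() {
    # For ENTER lines, field 5 is the quoted region name.
    awk '$1 == "ENTER" { n[$5]++ } END { for (r in n) print n[r], r }' | sort -rn
}

# Example, using the same invocation as above:
#   otf2-print -A ./OTF2_archive/APEX.otf2 | count_region_enters
```

The per-region counts should roughly track the #calls column of the APEX screen report for the same run, which gives a quick consistency check on the trace.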

A view of the trace in Vampir:

images/vampir_fibonacci.png

A view of the trace in Traveler:

images/traveler_fibonacci.png