Add caliper annotations to quest_candidates_example #1419
Conversation
Here's an example of the CUDA-BVH output with caliper report, and an example of the CUDA-Implicit Grid output with caliper report (both outputs collapsed in dropdowns).
@@ -434,6 +446,7 @@ template <typename ExecSpace>
axom::Array<IndexPair> findCandidatesBVH(const HexMesh& insertMesh,
                                         const HexMesh& queryMesh)
{
  AXOM_ANNOTATE_BEGIN("initializing BVH");
Minor: Would it make sense to remove the explicit timers now that we have caliper?
Having both will cause the outer wrapper to include timings for the inner one, and in this case, the caliper timings will include the SLIC formatting and logging times.
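For illustration, a minimal sketch of what that cleanup might look like, assuming the matching AXOM_ANNOTATE_END macro and the example's existing axom::utilities::Timer/SLIC_INFO pattern (the names and messages here are illustrative, not the PR's exact code):

// Current pattern (roughly): an explicit timer wraps the annotated region, so
// the timer and the caliper region overlap, and whichever is outermost also
// measures the SLIC formatting/logging of the other.
axom::utilities::Timer timer(true);
AXOM_ANNOTATE_BEGIN("initializing BVH");
// ... build the BVH ...
AXOM_ANNOTATE_END("initializing BVH");
timer.stop();
SLIC_INFO("0: Initialized BVH in " << timer.elapsedTimeInSec() << " seconds.");

// Suggested simplification: keep only the caliper annotation and read the
// per-region time from the caliper report instead of a hand-rolled timer.
AXOM_ANNOTATE_BEGIN("initializing BVH");
// ... build the BVH ...
AXOM_ANNOTATE_END("initializing BVH");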
@bmhan12 thanks for adding the caliper stuff and showing the performance data. I need to pore over it a bit more. I will let you know if I have questions.
// copy pairs back to host and into return array
AXOM_ANNOTATE_BEGIN("copy pairs to host");
It's surprising that this loop takes so long (around 10 seconds on both platforms that you showed!). It would be interesting to explore where that time is being spent.
The only thing that sticks out to me is candidatePairs.emplace_back(), where we don't reserve the size of candidatePairs ahead of time. Any chance that each write is causing the array to expand by a single element, with a full copy each time (e.g. rather than reserving a buffer that's twice as big as the current one)?
A quick test would be to call candidatePairs.reserve(candidates_v.size()) before that loop and see what that does to the timings (see the sketch below). A different quick test might be to switch candidatePairs to a std::vector instead of axom::Array and see what the performance looks like.
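A rough sketch of that first quick test, under the assumption that the loop appends one IndexPair per device candidate (candidates_v and query_idx are stand-in names for this example's result arrays, not necessarily the PR's identifiers):

// Hypothetical: reserve the full capacity before the per-element appends so
// each emplace_back() no longer has a chance to trigger a reallocation.
candidatePairs.reserve(candidates_v.size());
for(axom::IndexType i = 0; i < candidates_v.size(); ++i)
{
  // construct the (query cell, candidate cell) pair in place
  candidatePairs.emplace_back(query_idx[i], candidates_v[i]);
}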
Looks like that's exactly what's happening:
Lines 1484 to 1490 in 6f5eaa3:

template <typename T, int DIM, MemorySpace SPACE>
template <typename... Args>
inline void Array<T, DIM, SPACE>::emplace_back(Args&&... args)
{
  static_assert(DIM == 1, "emplace_back is only supported for 1D arrays");
  emplace(size(), std::forward<Args>(args)...);
}

Lines 1428 to 1436 in 6f5eaa3:

template <typename T, int DIM, MemorySpace SPACE>
template <typename... Args>
inline void Array<T, DIM, SPACE>::emplace(IndexType pos, Args&&... args)
{
  reserveForInsert(1, pos);
  OpHelper {m_allocator_id, m_executeOnGPU}.emplace(m_data, pos, std::forward<Args>(args)...);
}

Lines 1635 to 1660 in 6f5eaa3:

template <typename T, int DIM, MemorySpace SPACE>
inline T* Array<T, DIM, SPACE>::reserveForInsert(IndexType n, IndexType pos)
{
  assert(n >= 0);
  assert(pos >= 0);
  assert(pos <= m_num_elements);

  if(n == 0)
  {
    return m_data + pos;
  }

  IndexType new_size = m_num_elements + n;
  if(new_size > m_capacity)
  {
    dynamicRealloc(new_size);
  }

  OpHelper {m_allocator_id, m_executeOnGPU}.move(m_data, pos, m_num_elements, pos + n);

  updateNumElements(new_size);
  return m_data + pos;
}
A quick test would be to call candidatePairs.reserve( candidates_v.size() ) before that loop and see what that does to the timings.
A different quick test might be to switch candidatePairs to a std::vector instead of axom::Array and see what the performance looks like.
Surprisingly, the switch to std::vector with reserve() called beforehand gave an order of magnitude improvement in the "copy pairs to host" timing. reserve() with axom::Array did not make any noticeable difference from what I saw.
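For reference, a minimal sketch of the std::vector variant described here, with the same caveat that the index-array names are assumed for illustration and may differ from the example's actual code:

// Hypothetical std::vector replacement for the host-side copy: reserve()
// up front, then append one pair per candidate returned from the query.
std::vector<IndexPair> candidatePairs;
candidatePairs.reserve(candidates_v.size());
for(axom::IndexType i = 0; i < candidates_v.size(); ++i)
{
  candidatePairs.emplace_back(query_idx[i], candidates_v[i]);
}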
CUDA-BVH output with std::vector:
$ lrun -n 1 -g 1 ./examples/quest_candidates_example_ex -i ../../ucart23z.cycle_000000.root -q ../../ucart23z_shifted.cycle_000000.root -p 2 -m bvh --caliper report
[INFO]
Parsed parameters:
* First Blueprint mesh to insert: '../../ucart23z.cycle_000000.root'
* Second Blueprint mesh to query: '../../ucart23z_shifted.cycle_000000.root'
* Verbose logging: false
* Spatial method: 'Bounding Volume Hierarchy (BVH)'
* Resolution: 'Not Applicable'
* Runtime execution policy: 'cuda'
[INFO] Reading Blueprint file to insert: '../../ucart23z.cycle_000000.root'...
[INFO] Mesh bounding box is { min:(-1,-1,-1); max:(1,1,1); range:<2,2,2> }.
[INFO] Reading Blueprint file to query: '../../ucart23z_shifted.cycle_000000.root'...
[INFO] Mesh bounding box is { min:(-0.995,-0.995,-0.995); max:(1.005,1.005,1.005); range:<2,2,2> }.
[INFO] Finished reading in Blueprint files.
[INFO] Running BVH candidates algorithm in execution Space: [CUDA_EXEC]
[INFO] 0: Initialized BVH.
[INFO] 1: Queried candidate bounding boxes.
[INFO] 2: Initialized candidate pairs (on device).
[INFO] 3: Moved candidate pairs to host.
[INFO] Stats for query
-- Number of insert-BVH mesh hexes 8,000,000
-- Number of query mesh hexes 8,000,000
-- Total possible candidates 64,000,000,000,000
-- Candidates from BVH query 63,521,199
[INFO] Mesh had 63,521,199 candidates pairs
Path Min time/rank Max time/rank Avg time/rank Time %
quest candidates example 5.479472 5.479472 5.479472 99.997047
load Blueprint meshes 5.093233 5.093233 5.093233 92.948415
load Blueprint hexahedron mesh 4.996896 4.996896 4.996896 91.190336
find candidates 0.374126 0.374126 0.374126 6.827578
initializing BVH 0.071898 0.071898 0.071898 1.312089
BVH::initialize 0.071845 0.071845 0.071845 1.311119
LinearBVH::buildImpl 0.071836 0.071836 0.071836 1.310968
build_radix_tree 0.049027 0.049027 0.049027 0.894715
RadixTree::allocate 0.019380 0.019380 0.019380 0.353667
transform_boxes 0.001531 0.001531 0.001531 0.027946
reduce_abbs 0.006291 0.006291 0.006291 0.114814
get_mcodes 0.000524 0.000524 0.000524 0.009565
sort_mcodes 0.002679 0.002679 0.002679 0.048886
array_counting 0.000064 0.000064 0.000064 0.001160
raja_stable_sort 0.002605 0.002605 0.002605 0.047547
reorder 0.009178 0.009178 0.009178 0.167485
build_tree 0.000508 0.000508 0.000508 0.009262
propagate_abbs 0.008895 0.008895 0.008895 0.162326
LinearBVH::allocate 0.014821 0.014821 0.014821 0.270475
emit_bvh_parents 0.004830 0.004830 0.004830 0.088151
query candidates 0.056463 0.056463 0.056463 1.030416
BVH::findBoundingBoxes 0.054495 0.054495 0.054495 0.994504
LinearBVH::findCandidatesImpl 0.054346 0.054346 0.054346 0.991788
PASS[1]:count_traversal 0.021871 0.021871 0.021871 0.399136
exclusive_scan 0.000111 0.000111 0.000111 0.002033
allocate_candidates 0.004732 0.004732 0.004732 0.086359
PASS[2]:fill_traversal 0.027616 0.027616 0.027616 0.503972
write candidate pairs 0.012771 0.012771 0.012771 0.233071
copy pairs to host 0.223014 0.223014 0.223014 4.069871
CUDA-Implicit Grid output with std::vector:
$ lrun -n 1 -g 1 ./examples/quest_candidates_example_ex -i ../../ucart23z.cycle_000000.root -q ../../ucart23z_shifted.cycle_000000.root -p 2 -m implicit --caliper report
[INFO]
Parsed parameters:
* First Blueprint mesh to insert: '../../ucart23z.cycle_000000.root'
* Second Blueprint mesh to query: '../../ucart23z_shifted.cycle_000000.root'
* Verbose logging: false
* Spatial method: 'Implicit Grid'
* Resolution: '0'
* Runtime execution policy: 'cuda'
[INFO] Reading Blueprint file to insert: '../../ucart23z.cycle_000000.root'...
[INFO] Mesh bounding box is { min:(-1,-1,-1); max:(1,1,1); range:<2,2,2> }.
[INFO] Reading Blueprint file to query: '../../ucart23z_shifted.cycle_000000.root'...
[INFO] Mesh bounding box is { min:(-0.995,-0.995,-0.995); max:(1.005,1.005,1.005); range:<2,2,2> }.
[INFO] Finished reading in Blueprint files.
[INFO] Running Implicit Grid candidates algorithm in execution Space: [CUDA_EXEC]
[INFO] 0: Initialized Implicit Grid.
[INFO] 1: Queried candidate bounding boxes.
[INFO] 2: Initialized candidate pairs (on device).
[INFO] 3: Moved candidate pairs to host.
[INFO] Stats for query
-- Number of insert mesh hexes 8,000,000
-- Number of query mesh hexes 8,000,000
-- Total possible candidates 64,000,000,000,000
-- Candidates from Implicit Grid query 63,521,199
[INFO] Mesh had 63,521,199 candidates pairs
Path Min time/rank Max time/rank Avg time/rank Time %
quest candidates example 7.952440 7.952440 7.952440 99.997933
load Blueprint meshes 5.054810 5.054810 5.054810 63.561687
load Blueprint hexahedron mesh 4.985796 4.985796 4.985796 62.693869
find candidates 2.884664 2.884664 2.884664 36.273199
initializing implicit grid 0.126207 0.126207 0.126207 1.586988
query candidates 0.912268 0.912268 0.912268 11.471313
write candidate pairs 1.354428 1.354428 1.354428 17.031247
copy pairs to host 0.443690 0.443690 0.443690 5.579181
Linked this finding to related #287
…dup, remove Timer usage
Also, for my own future reference, this is the script I used to collect timing data for the table (script collapsed in dropdown). It collects data for ten runs and dumps them into files labeled by the RAJA policy and spatial index used. The script is pretty rough around the edges and not completely automated: it assumes you have an allocation, have configured and compiled a single Axom build for each system, etc. The data also requires some post-processing; I go through my code editor to grab values and do a quick plug into Excel to calculate averages.
This PR adds caliper annotations to quest_candidates_example.
As part of this, I also re-ran my test scripts using the same setup as before to get the average numbers (in seconds) for the spatial index performance. In addition, I added numbers for rzwhippet with 112 threads.
Notably, the initialization times for both bvh and implicit grid are an order of magnitude faster than before for HIP and CUDA (previous PR #1278 for comparison):
Same testing setup as last time, but with caliper:
time ./examples/quest_candidates_example_ex -i ucart23z.cycle_000000.root -q ucart23z_shifted.cycle_000000.root -p <raja policy number> -m <method, either "bvh" or "implicit"> --caliper report
The runs were launched with flux run -N 1 -g 1, lrun -n 1 -g 1, salloc -N 1 -n 36 for rzgenie, and salloc -N 1 -n 112 for rzwhippet.
ucart23z is an 8,000,000 element mesh, while ucart23z_shifted is the same mesh but shifted slightly.