
Err stats #76

Open · JaeseungYeom wants to merge 6 commits into develop from err_stats
Conversation

JaeseungYeom commented:

Taking advantage of ranks left idle by load imbalance to compute extra physics model outputs, so that those can be compared against the surrogate outputs. Currently, the average and variance of the surrogate output error at the chosen validation points are reported per evaluation iteration.
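
(For readers skimming the thread, here is a minimal sketch of the kind of per-iteration statistic described above. The names physics_out, surrogate_out, and comm are hypothetical; the PR's actual data structures are not shown in this excerpt. The idea is to reduce per-rank error sums across ranks and report the mean and variance on the root.)

#include <mpi.h>
#include <cmath>
#include <cstdio>
#include <vector>

// Illustrative only: reduce per-rank sums of the surrogate error and its
// square, then report the mean and variance of the error on the root.
void report_error_stats(const std::vector<double>& physics_out,
                        const std::vector<double>& surrogate_out,
                        MPI_Comm comm)
{
  // local[0] = sum of errors, local[1] = sum of squared errors,
  // local[2] = number of validation points on this rank.
  double local[3] = {0.0, 0.0, (double)physics_out.size()};
  for (size_t i = 0; i < physics_out.size(); i++) {
    const double err = std::fabs(physics_out[i] - surrogate_out[i]);
    local[0] += err;
    local[1] += err * err;
  }

  double global[3] = {0.0, 0.0, 0.0};
  MPI_Reduce(local, global, 3, MPI_DOUBLE, MPI_SUM, 0, comm);

  int rank;
  MPI_Comm_rank(comm, &rank);
  if (rank == 0 && global[2] > 0.0) {
    const double mean = global[0] / global[2];
    const double var  = global[1] / global[2] - mean * mean;
    std::printf("surrogate error: mean=%g var=%g (n=%g)\n",
                mean, var, global[2]);
  }
}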

@JaeseungYeom force-pushed the err_stats branch 3 times, most recently from 572817a to b2dbb6f on June 27, 2024, 21:26.
@@ -198,7 +211,9 @@ class AMSWorkflow
 #ifdef __ENABLE_MPI__
   comm(MPI_COMM_NULL),
 #endif
-  ePolicy(AMSExecPolicy::AMS_UBALANCED)
+  ePolicy(AMSExecPolicy::AMS_UBALANCED),
Member commented:
@koparasy fix this before we land it to develop.

@koparasy (Member) commented:

@JaeseungYeom Thank you for the PR!

I am afraid this version has too much data movement across ranks. The predicate array, although boolean, can be fairly large in an actual large simulation, and gathering and scattering it across the distributed system will be a bottleneck. How about this solution:

// Assumes comm, rank, root, num_ranks, and num_of_my_physics_evaluations
// are already defined; needs <mpi.h>, <vector>, <algorithm>, <cmath>, <cstdlib>.
std::vector<unsigned> physics_evaluations_per_rank(num_ranks);
// Root gathers only the number of physics evaluations per rank.
MPI_Gather(&num_of_my_physics_evaluations, 1, MPI_UNSIGNED,
           physics_evaluations_per_rank.data(), 1, MPI_UNSIGNED, root, comm);

std::vector<int> additional_evaluations_per_rank(num_ranks, 0);
if (rank == root) {
    unsigned total_evaluations = 0;
    for (unsigned n : physics_evaluations_per_rank)
        total_evaluations += n;
    // Slack: how many extra evaluations are needed so that every rank
    // ends up with the same number of evaluations.
    unsigned load_balance_slack =
        (unsigned)std::ceil((double)total_evaluations / num_ranks) * num_ranks
        - total_evaluations;
    // Randomly select which global indexes we will flip.
    std::vector<unsigned> flipped_indexes(load_balance_slack);
    for (unsigned i = 0; i < load_balance_slack; i++)
        flipped_indexes[i] = std::rand() % total_evaluations;
    std::sort(flipped_indexes.begin(), flipped_indexes.end());
    // Walk the sorted indexes once, advancing through the per-rank counts
    // to find which rank each flipped index falls into.
    int j = 0;
    unsigned running_sum = physics_evaluations_per_rank[0];
    for (unsigned i = 0; i < load_balance_slack; i++) {
        while (flipped_indexes[i] >= running_sum && j < num_ranks - 1)
            running_sum += physics_evaluations_per_rank[++j];
        additional_evaluations_per_rank[j]++;
    }
    // additional_evaluations_per_rank now contains how many extra
    // evaluations need to happen on every rank.
}

int my_evaluations = 0;
MPI_Scatter(additional_evaluations_per_rank.data(), 1, MPI_INT,
            &my_evaluations, 1, MPI_INT, root, comm);
// Now every rank has in my_evaluations the number of additional toggled
// predicates it needs to perform.

// Every rank toggles and operates independently.
...

The algorithm is probably not exactly right, but I tried to use meaningful variable names to the extent possible so you can follow it. The concept, though, is to limit the amount of data you send over the network. In this code I have 1 Gather and 1 Scatter, and the message size increases only linearly with the number of ranks. Your code instead has 1 Gather, 1 Scatter, 1 GatherV, and 1 ScatterV, and the GatherV and ScatterV may end up sending a lot of data, which we cannot sustain in the inner loop.
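
To put rough, illustrative numbers on that (not measured on this PR): with 4,096 ranks each holding a one-million-entry boolean predicate array, a GatherV of the predicates moves on the order of 4 GB to the root per iteration, whereas a Gather of a single unsigned count per rank moves about 16 KB.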

Let me know if I can further help.

Thank you!

@JaeseungYeom (Author) replied:


I got the idea. It is an excellent suggestion. I will implement something along these lines.
