Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use std::sort instead of QSortInt #307

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

fuji8
Copy link

@fuji8 fuji8 commented Jan 21, 2022

I profiled the T1050 with the parameters run by AlphaFold. (Only -cpu is changed). I used Score-P as the profiling tool and got the following results.

pr

From this image, we can see that QSortInt in mergeHitsToQuery is taking a long time. With fast enough storage, my hhblits for this condition is about 2100sec, and QSortInt accounts for about 40%.

Instead of this QsortInt, use std::sort.

I ran hhblits installed by conda and using std::sort under the same conditions as before.
In order to avoid I/O effects, I analyze the difference in execution time between the logs that contain this change, instead of the overall execution time. (From

hh-suite/src/hhblits.cpp

Lines 1028 to 1030 in ac76598

HH_LOG(INFO)
<< "Realigning " << nhits
<< " HMM-HMM alignments using Maximum Accuracy algorithm" << std::endl;
to
HH_LOG(INFO) << "Neutralized His-tag between positions " << imax(i0 - 8, 1) << " and " << i-1 << std::endl;
)

  conda use std::sort
iteration 1 1232(sec) 477(sec)
iteration 2 631(sec) 374(sec)
iteration 3 306(sec) 232(sec)

This reduced the execution time. I also ran it using the parallelization policy, but the results were not significantly different from std::sort.

This change is due to the different stability of sort, so the execution results may not truly match.

@milot-mirdita
Copy link
Member

Cool, thank you!

We have implemented a similar fix in MMseqs2's version of the same code, but haven’t backported it:
https://github.com/soedinglab/MMseqs2/blob/d89fcecf9911a99c45ed81c1c0e5054743debc64/src/alignment/MsaFilter.cpp#L212

Could you repeat the benchmark with a stable sort?

@fuji8
Copy link
Author

fuji8 commented Jan 21, 2022

Thank you for the reply.

I changed the sort to stable_sort and ran it 3 times on hpc in the following environment.

  • cpu: 28 cores
  • RAM: 235GB
  1 2 3
iteration 1 488(sec) 484(sec) 484(sec)
iteration 2 374(sec) 371(sec) 376(sec)
iteration 3 223(sec) 225(sec) 218(sec)

Because of the large memory, the computational complexity is probably Nlog(N).

@martin-steinegger
Copy link
Member

@fuji8 This looks great! Thank you for the PR. Would it be possible to avoid the lambda expression in the sort?

@fuji8
Copy link
Author

fuji8 commented Jan 31, 2022

I apologize for the delay in response.

I rewrote the code to be almost equivalent without using the lambda expression.
I ran it only once, just to be sure.

  no lambda
iteration 1 500(sec)
iteration 2 385(sec)
iteration 3 232(sec)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants