Speed up NAM merging a bit #336
Conversation
merge_hits_into_nams() has a nested for loop that iterates over a vector that contains both forward and reverse hits, but it also has an extra check to ensure that it never merges forward and reverse hits into NAMs. We can instead keep forward and reverse hits separate when we generate them and then work on them separately.
Whether a Hit is a forward or reverse hit is known implicitly because forward and reverse hits are stored in two different unordered_maps.
Wow, neat. I guess removing the additional bool from the otherwise nicely packed 4x32-bit struct helps with memory alignment?
Here’s how accuracy changes on single-end reads.
Average difference: +0.0027
Without the bool, the struct fits into two 64-bit registers, which could be one reason for the speedup (more registers available for other things), and the cache can also hold more Hits. "Memory alignment" is maybe not quite the right term for this, as "unaligned access" has a somewhat different meaning: it refers to reading from a memory location whose address is not a multiple of the word size (for example, reading a 64-bit value from address 0x003). Some CPUs don’t support this at all and would crash; Intel CPUs support it, but are slower at it. Structs in C are laid out in memory such that there is no unaligned memory access. Here, although the bool needs only one byte, the struct is padded with three extra (unused) bytes so that it takes up 20 bytes. Removing the bool attribute thus saves 4 bytes. Sorry if the above was clear already.
Great that the accuracy seems to be stable (or even improves a bit), although it is unclear to me why accuracy would change at all with your changes. Is it something 'random' like a changed traversal order of the NAMs? Or is it something more systematic? Do you see similar speed improvements for other read lengths?
No, it was a good explanation. 'Structure padding to avoid unaligned access' might be a better description then, if I understood you correctly.
It’s the order of the produced NAMs. I get similar variations in accuracy when I shuffle the vector of NAMs (using
I tested 500 bp reads overnight and there was also a speedup, but not as large (more like 1-3%).
Yes!
I see. So I guess this PR is ready to merge then? Approved on my end.
Yes, it’s ready!
`merge_hits_into_nams()` has two nested `for` loops that iterate over a single vector that contains both forward and reverse hits. In the inner loop, there’s a check to ensure that forward and reverse hits are never merged into NAMs. We can instead keep forward and reverse hits in two different hash tables and work on them separately. This reduces the number of comparisons done in the inner loop. Interestingly, that change by itself doesn’t really reduce runtime. But what does reduce runtime is the subsequent commit that makes the `Hit` struct smaller by removing its `is_rc` attribute, which we no longer need.

Measurements on 100 bp single-end reads:
I’ll post how accuracy changes tomorrow (so far, it looks as if there’s nearly no change).