include performance improvements for FastTrack2 (memory + run-time) #84
taranbis
commented
Jan 9, 2021
Hi, I have resolved the conflicts as well; nearly all of them arose because a context was added to the detector initialization.
Hi @taranbis , thanks for this contribution. May I ask you to try to cleanup the commit history a bit in the meantime. As you wrote, the effectiveness of some optimizations depends strongly on the application under test. @mhmdkanj Could you take a look into that and check what is possible with |
6dfe471 to 556b612
Hi @fmoessbauer, I've tidied up the commit history. I will look over it again, as there were many commits. The project compiles and runs flawlessly at each of these commits. One TODO remains for me: checking whether there is a run-time drawback in using a stack of thread numbers instead of linearly increasing thread numbers. If there is none, I will change the first commit to work with a stack of thread numbers; otherwise we will discuss it here.
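The two numbering schemes mentioned in this comment can be compared with a small sketch. This is only an illustration, not DRace's code: `ThreadNumberPool`, `acquire`, and `release` are hypothetical names. A number returned by a finished thread is pushed onto a stack and handed out again before any new number is used; with an empty stack the pool behaves exactly like linearly increasing numbering.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical pool that hands out small thread numbers and recycles them.
class ThreadNumberPool {
 public:
  explicit ThreadNumberPool(std::uint32_t capacity) : capacity_(capacity) {}

  // Returns the most recently released number if one is available,
  // otherwise the next unused number (linear numbering).
  std::uint32_t acquire() {
    if (!free_.empty()) {
      std::uint32_t n = free_.back();
      free_.pop_back();
      return n;
    }
    assert(next_ < capacity_ && "thread number pool exhausted");
    return next_++;
  }

  // Gives a number back once its thread has finished.
  void release(std::uint32_t n) { free_.push_back(n); }

 private:
  std::uint32_t capacity_;
  std::uint32_t next_ = 0;
  std::vector<std::uint32_t> free_;  // used as a stack (LIFO)
};
```

Recycling keeps the numbers small, which matters when only a few bits are available for them. The sketch does not address what happens to stored epochs that still carry a recycled number; a real detector would have to handle that.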
Hi @taranbis , Please check the latest In case you had questions/concerns/bugs with this usage please feel free to contact me! |
Hi @taranbis
First of all, great effort on your additions and changes!
I've left some comments throughout the files:
- The most critical one is a compilation error in fasttrack.h (at least on my side), which needs double-checking. Please refer to the corresponding comment for more info.
- Most of the remaining comments pertain to style changes for consistency with all other files; these should be fast to change.
- A couple of these comments pertain to questions regarding the intended use (in the case where several classes are put together in one header file) and whether this could be further beautified.
Upon making further changes and committing, please make sure of the following:
- The clang style formatting script is executed on the changed file
- The PR is rebased with the new master
After resolving these comments, we will further review the changes as to their implementation and usage.
Best Regards,
Mohamad
Hi @mhmdkanj, I will look over your comments within the next few days. For the moment I just want to talk about the compilation error in The changes cannot be committed from my side. They could be applied as a patch to the phmap implementation. Best greetings,
@taranbis In the course of solving this issue, try to avoid the need to fork the |
Hi @mhmdkanj , well this feature is used to remove data from tracking and I am working with private members of the
of course, I am open to suggestions and the choice is yours in the end. All the best, |
Hi @taranbis , I did already discuss this internally with @mhmdkanj and also back then was aware of the issue.
Anyways, thanks for this well-written contribution and for your interest in speeding up DRace ;) |
Hi @fmoessbauer, hi @mhmdkanj, I didn't get the chance to look over all your proposed changes. However, what I have done is talk to
void clear(std::size_t submap_index) {
  Inner& inner = sets_[submap_index];
  typename Lockable::UniqueLock m(inner);
  inner.set_.clear();
}
Within the last commit I added the new API functionality, and I also changed some other things that failed due to a changed namespace name. It should compile fine now, so you can test in the meantime while I also modify the other things. Best regards!
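To illustrate why a per-submap `clear` is useful, here is a self-contained sketch of a sharded set in which every shard has its own mutex. It is not phmap's implementation: `ShardedSet`, `Shard`, and `index_of` are hypothetical names, and `std::unordered_set` with `std::mutex` stands in for the real containers and lock types.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <unordered_set>

// Hypothetical sharded set: each shard ("submap") is guarded by its own mutex.
template <typename T, std::size_t N = 16>
class ShardedSet {
 public:
  void insert(const T& value) {
    Shard& s = shards_[index_of(value)];
    std::lock_guard<std::mutex> lock(s.mutex);
    s.set.insert(value);
  }

  // Clears a single shard while holding only that shard's lock,
  // so operations on the other shards are not blocked.
  void clear(std::size_t submap_index) {
    assert(submap_index < N);
    Shard& s = shards_[submap_index];
    std::lock_guard<std::mutex> lock(s.mutex);
    s.set.clear();
  }

  // Total element count; locks the shards one at a time.
  std::size_t size() {
    std::size_t total = 0;
    for (Shard& s : shards_) {
      std::lock_guard<std::mutex> lock(s.mutex);
      total += s.set.size();
    }
    return total;
  }

  // Shard that a value is stored in.
  std::size_t index_of(const T& value) const {
    return std::hash<T>{}(value) % N;
  }

 private:
  struct Shard {
    std::mutex mutex;
    std::unordered_set<T> set;
  };
  std::array<Shard, N> shards_;
};
```

Clearing shard by shard keeps each critical section short, which is presumably the motivation for exposing such a function instead of reaching into private members.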
Hi @taranbis,
Excellent work! Now we can just use the upstream version and benefit from (possible) improvements there, instead of maintaining our own fork which would bitrot at some point in time. Thanks also for taking the time to fix the other mentioned points. Is there still something conceptual, where we can help you? Or anything that needs a discussion? All the best, Felix |
Hi, I am sorry for the long-overdue reply, but I have been really busy. First of all, I have walked through all the proposed changes and made all the required ones. However, working through the project again, I realized that something could be improved, namely replacing the pointers in the prefix trees with shared pointers and counting references in order to be able to remove nodes when needed. This might take some time for me to investigate, but I will happily do it to make sure that memory is deallocated properly and that there is a mechanism for future removals. All the best,
Hi Mihai, thanks for taking care of integrating the required changes. I think we can integrate that in the meantime. Then, just create a new PR for that. @mhmdkanj : Could you please give the new version a try? Best, Felix |
Hi @taranbis
Thanks a lot for incorporating these changes! The required code style is thus matched.
Also, it compiles successfully now, and I executed the integration tests locally with fasttrack; all succeeded (except for the DotnetClrMonitor one for some reason; we can check that out later @fmoessbauer, most probably the source is elsewhere).
Please take a look at the couple of comments I posted after the new changes. Don't worry though; they are mostly code-style issues due to the new changes and/or things I forgot to review previously, so they should be fast to incorporate. The most pressing one is rebasing the branch onto the new master so that logging of fasttrack details can be toggled on/off by the user via fasttrack flags rather than being on by default.
Take your time with adjusting these small details, and afterwards we could move to start integrating them.
If you have any inquiries feel free to contact me.
Best Regards,
Mohamad
void read_from_block(std::vector<std::pair<uintptr_t*, uintptr_t*>>* blocks) {
  try {
    std::uniform_int_distribution<int> dist(0, blocks->size() - 1);
Line 41 generates a warning: conversion from 'size_t' to '_Ty', possible loss of data. If this is intended, then it would be safe to ignore. The same applies to line 61.
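If the narrowing is not intended, one possible way to avoid the warning is to let the distribution produce `std::size_t` directly, so that no conversion to `int` takes place. The helper below is a hedged sketch: `random_index` is a hypothetical name and not part of the benchmark, and it assumes the result is only used as an index.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Picks a uniformly distributed index in [0, count - 1].
// The distribution's result type matches the container's size type,
// so the upper bound is passed without a narrowing conversion.
inline std::size_t random_index(std::size_t count, std::mt19937& gen) {
  assert(count > 0 && "count - 1 would wrap around for an empty container");
  std::uniform_int_distribution<std::size_t> dist(0, count - 1);
  return dist(gen);
}
```

With such a helper, the call site could become `random_index(blocks->size(), gen)`. The precondition also makes the empty-container case explicit: `blocks->size() - 1` wraps around to a huge unsigned value when the vector is empty.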
Hi @fmoessbauer, I've mentioned you in three review comments where your input might be needed, whenever you are able. Thanks in advance!
- removed the pointer for the read-shared state from the VarState class and replaced it with a hash map that maps memory addresses to the respective vector clocks in case of a read-shared state
- reduced the read and write epochs of the VarState class from 64 bits (32-bit thread ID + 32-bit clock value) to 32 bits (11-bit thread number + 21-bit clock value)
- removed the size member of the VarState class. Result: sizeof(VarState) = 8
- increased concurrency by using the internal locks of the parallel hash map used to store VarStates
- switched from std::list to std::deque for call stack information for improved performance
- FastTrack tests now work via the general interface (easier to extend)
- functions return_stack_trace() and set_read_write() moved to ThreadState, which decouples them better from the call stack storage implementation: a call stack store only has to implement 4 functions regardless of the implementation. These functions are accessed from the Fasttrack class or from ThreadState
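The epoch layout described in this commit message (11 bits of thread number plus 21 bits of clock in one 32-bit word) can be sketched as below. The bit order, with the thread number in the upper bits, and all names (`make_epoch`, `epoch_thread`, `epoch_clock`, `PackedVarState`) are assumptions made for illustration; the actual DRace code may differ.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kThreadBits = 11;  // up to 2048 thread numbers
constexpr std::uint32_t kClockBits = 21;   // clock values up to 2^21 - 1
constexpr std::uint32_t kThreadMask = (1u << kThreadBits) - 1;
constexpr std::uint32_t kClockMask = (1u << kClockBits) - 1;

// Packs a thread number and a clock value into one 32-bit epoch.
// Values that do not fit are truncated by the masks.
constexpr std::uint32_t make_epoch(std::uint32_t thread_num,
                                   std::uint32_t clock) {
  return ((thread_num & kThreadMask) << kClockBits) | (clock & kClockMask);
}

constexpr std::uint32_t epoch_thread(std::uint32_t epoch) {
  return epoch >> kClockBits;
}

constexpr std::uint32_t epoch_clock(std::uint32_t epoch) {
  return epoch & kClockMask;
}

// Two packed epochs and nothing else give the 8 bytes quoted above.
struct PackedVarState {
  std::uint32_t write_epoch;
  std::uint32_t read_epoch;
};
static_assert(sizeof(PackedVarState) == 8, "two 32-bit epochs occupy 8 bytes");
```

The price of the compact layout is the reduced range: at most 2048 distinct thread numbers and about two million clock ticks per thread before the clock field overflows.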
- this is not an improvement feature but a necessary one, especially useful for long-running applications
- 4 types of removal policies: 1 default, which removes retired memory addresses, and 3 that can be selected: random removal, hash map pruning, and removal by lowest clock
- VarStates can be removed even if they are in a read-shared state
- the benchmark can be modified to test different performance aspects of DRace, such as more memory accessed with fewer threads, or more threads with less memory accessed
- Implementations for storing call stack information only have to implement 4 functions: make_trace(), get_current_element(), insert_function_element() and remove_function_element();
- PrefixTree_StackDepot is the first version of adapting a prefix tree to store call stack information. It uses hash maps instead of constant-size arrays;
- PrefixTreeDepot.h represents the cache-efficient, optimized version of the prefix tree implementation.
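As a rough illustration of the four-function interface, the sketch below stores call stacks in a prefix tree whose nodes keep their children in a hash map. Only the four function names come from the commit message; the signatures, the `Node` layout, and the class name `PrefixTreeSketch` are assumptions and do not reproduce PrefixTree_StackDepot or PrefixTreeDepot.h.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <memory>
#include <unordered_map>

// Hypothetical prefix tree: every node is one call site, and the path
// from the root to a node is one call stack. Shared prefixes are stored once.
class PrefixTreeSketch {
 public:
  struct Node {
    std::uintptr_t pc = 0;
    Node* parent = nullptr;
    std::unordered_map<std::uintptr_t, std::unique_ptr<Node>> children;
  };

  PrefixTreeSketch() = default;
  PrefixTreeSketch(const PrefixTreeSketch&) = delete;
  PrefixTreeSketch& operator=(const PrefixTreeSketch&) = delete;

  // Node describing the current call stack of the thread.
  Node* get_current_element() { return current_; }

  // Function entry: descend into the child for pc, creating it if needed.
  void insert_function_element(std::uintptr_t pc) {
    std::unique_ptr<Node>& child = current_->children[pc];
    if (!child) {
      child = std::make_unique<Node>();
      child->pc = pc;
      child->parent = current_;
    }
    current_ = child.get();
  }

  // Function exit: move back to the caller; the node itself stays in the tree.
  void remove_function_element() {
    assert(current_ != nullptr);
    if (current_->parent != nullptr) current_ = current_->parent;
  }

  // Rebuilds the call stack (outermost call first) that ends at the given node.
  std::deque<std::uintptr_t> make_trace(const Node* node) const {
    std::deque<std::uintptr_t> trace;
    for (; node != nullptr && node->parent != nullptr; node = node->parent) {
      trace.push_front(node->pc);
    }
    return trace;
  }

 private:
  Node root_;
  Node* current_ = &root_;
};
```

A memory access then only needs to remember the pointer returned by `get_current_element()`; the full stack is reconstructed with `make_trace()` only when a race is reported. Nodes are never freed in this sketch.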
Hi @fmoessbauer ,
|
Hi @mhmdkanj, since the newly written fasttrack is responsible for that, I will have a look. I will get back to you by Sunday (sorry, the last few months have been crazy). You can send the logs to my private email address.
Hi @taranbis , Best, |
Hi @mhmdkanj, I looked over the tests. They fail because they use the logging structure. In the last commit, refactor: set default logging to false in fasttrack (6f7d3da), you removed the final_output variable and set log_flag to false. The tests used the logging structure because it was an easy way to avoid depending on the internals of the fasttrack algorithm while still passing info to fasttrack as usual. If you undo your commit, the tests should work fine. I am already thinking of ways to rewrite this; one such way would be the callback. However, it doesn't hold all the info needed to differentiate the cases. Until then, the integration tests pass if you set the log flag to true. I had also added a final_output flag so that the output is not printed when it is not necessary. All the best,
Oh and one more thing. The callback not only does not hold all the required info, it is also only triggered when there's a race. You cannot distinguish the other use cases such as read-exclusive, read-shared-same-epoch etc. |
Hi @mhmdkanj, never mind, I solved it. I added a wrapper on top of fasttrack to work with the testing framework. Logging is now as in your commit (false by default) and the integration tests work. See my commit refactor: add class test wrapper (f381862). Tell me if there is anything else. P.S.: try not to use force pushes :D. All the best,
Hi @taranbis ,
Do these two unit tests pass on your end? |
Hi @mhmdkanj, no, but this is not fasttrack related; it also fails for tsan. This is a test written by @fmoessbauer to check that concurrency works well on the detector side. I might be able to have a look into those, but I guess @fmoessbauer knows better. I will check nonetheless.
@taranbis True, it's probably not a direct consequence but I was wondering if you were experiencing the same things. |
Hi @fmoessbauer , Unit Tests
Most of the time only the last two fail; however, occasionally the others fail as well. Integration Tests
Usually either one or both of these fail. |
thanks for taking care of the integration. Regarding the failed tests:
Cheers! |