include performance improvements for FastTrack2 (memory + run-time) #84

taranbis · 2021-01-09T13:20:07Z

• reduced memory consumption on the VarState class
• improved parallelism on the VarState hash map by removing the Fasttrack lock and introduced locking at sub map level
• introduced 3 memory address removal policies + 1 default to cap memory consumption
• modified removal functions to also remove elements from the hash map used for storing call stack information. This includes replacing make_shared() function in Fasttrack::create_thread()
• changed unit tests to be more general, to work via the general interface using log counters 
• included the implementation of the cache efficient prefix tree (not currently used, only experimental, but maybe will useful later after some further optimizations

taranbis · 2021-01-09T13:30:32Z

Hi,

I solved the conflicts as well, as mainly all them were because a context was added to detector initialization.

fmoessbauer · 2021-01-11T10:04:57Z

Hi @taranbis , thanks for this contribution.
That is a lot of code to review, so it will take some time for us to review.

May I ask you to try to cleanup the commit history a bit in the meantime.
There seems to be a lot of "working-commit" commits.
Maybe you could group that up into features and try one commit per feature.

As you wrote, the effectiveness of some optimizations depends strongly on the application under test.
Currently we do not have a clean interface to pass arguments into the detector (we have to register all options on DRace currently).

@mhmdkanj Could you take a look into that and check what is possible with clipp to do so?
I'd prefer a method on the detector where it registers all options along with a docstring.
Also, we could think of a "detector-options" group, which is first constructed when the detector is known (might be a bit tricky because selecting the detector itself is already an option).
Anyways, maybe we should discuss this in a separate issue.

taranbis · 2021-01-17T13:56:37Z

Hi @fmoessbauer ,

I've arranged the commit history. will look over it again, as there were many commits. The project compiles and runs flawlessly in any of these commits.

As a TODO for me is still to check if there is a run-time drawback in using a stack of thread numbers instead of linearly increasing thread numbers. If there's not I will change the first commit to work with a stack of thread numbers, otherwise we will discuss here.

mhmdkanj · 2021-01-22T15:56:58Z

Hi @taranbis ,

Please check the latest master as from now on you could pass detector-specific options using the CLI, and if you're using the GUI, there as well. In brief, you could do that after specifying the detector flag with -d fasttrack -option1 -option2 - you could see that in the DRace README for the usage.

In case you had questions/concerns/bugs with this usage please feel free to contact me!

mhmdkanj

Hi @taranbis

First of all, great effort on your additions and changes!

I've left some comments throughout the files:

The most critical one being a compilation error in fasttrack.h (at least on my side), which needs a double-checking. Please refer to the corresponding comment for more info.
Most of the remaining comments pertain to style changes in order to be consistent with all other files - should be fast to change.
A couple of these comments pertain to questions regarding the intended use (in the case where several classes are put together in one header file) and whether this could be further beautified.

Upon making further changes and committing, please make sure of the following:

The clang style formatting script is executed on the changed file
The PR is rebased with the new master

After resolving these comments, we will further review the changes as to their implementation and usage.

Best Regards,
Mohamad

standalone/detectors/fasttrack/include/fasttrack.h

standalone/detectors/fasttrack/include/MemoryPool.h

standalone/detectors/fasttrack/include/vectorclock.h

standalone/detectors/fasttrack/src/stacktrace.cpp

standalone/detectors/fasttrack/src/threadstate.cpp

standalone/ft_benchmark/ft_benchmark.cpp

standalone/detectors/fasttrack/include/PrefixTreeDepot.h

taranbis · 2021-01-27T11:56:59Z

Hi @mhmdkanj,

I will look within the next days over your comments. For the moment I just want to talk about the compilation error in fasttrack.h. I changed a file in the phmap submodule, namely, phmap.h to be able to drop an entire sub hash map, as I couldn't work with just the public interface. This is why the function is missing.

The changes cannot be commited from my side. They could be placed as a patch to the phmap implementation.

Best greetings,
Mihai

mhmdkanj · 2021-01-28T10:33:39Z

Hi @mhmdkanj,

I will look within the next days over your comments. For the moment I just want to talk about the compilation error in fasttrack.h. I changed a file in the phmap submodule, namely, phmap.h to be able to drop an entire sub hash map, as I couldn't work with just the public interface. This is why the function is missing.

The changes cannot be commited from my side. They could be placed as a patch to the phmap implementation.

Best greetings,
Mihai

@taranbis In the course of solving this issue, try to avoid the need to fork the phmap repo in order to patch it - but rather come up with an implementation on the DRace side, or if greg7mdp can patch it by opening up an issue there, etc... This can be up for discussion as well.

taranbis · 2021-01-29T09:37:59Z

Hi @mhmdkanj ,

well this feature is used to remove data from tracking and I am working with private members of the phmap class as I am dropping an entire sub hash map of memory addresses. @fmoessbauer knows about this. I don't have a solution to keep the feature will not modifying at least something in the fast parallel hash map implementation.

to patch the hash map implementation.
it could also work if the array of sub hash map in the phmap implementation is declared public, as it is currently private. This is why I had to work create the functions within that class. If it is public, I could move all implementations to DRace side However, it would not make much of a sense to declare the array of sub hash maps as public.
just drop the feature, but I think it is faster than random or lowest clock removals. I could come up with some measurements if dropping this feature comes into discussion.

of course, I am open to suggestions and the choice is yours in the end.

All the best,
Mihai

fmoessbauer · 2021-01-29T11:24:05Z

Hi @taranbis , I did already discuss this internally with @mhmdkanj and also back then was aware of the issue.
Nonetheless, we want to avoid having to maintain a fork of the phmap. By that, we have two options:

upstream the necessary feature. As we have a use-case for that, we could try to get into contact with greg7mdp and discuss options for integration. Maybe just declaring some members protected would help, so that we can inherit from the map.
Use another strategy that does not rely on the internals of the phmap. But according to your results, this was the most promising strategy, was it?

Anyways, thanks for this well-written contribution and for your interest in speeding up DRace ;)

taranbis · 2021-02-02T10:29:42Z

Hi @fmoessbauer , Hi @mhmdkanj,

I didn't get the chance to look over all your proposed changes. However, what I have done is talk to greg7mdp and he added the necessary API to be able to drop a sub hash map, so we can maintain all proposed features:

    void clear(std::size_t submap_index) {
        Inner& inner = sets_[submap_index];
        typename Lockable::UniqueLock m(inner);
        inner.set_.clear();
    }

Within the last commit I added the new API functionality and I also changed some other things that failed due to a changed namespace name. Now it should compile fine, so you can test in the meantime while I modify also the other things

Best regards!

fmoessbauer · 2021-02-02T10:38:44Z

Hi @taranbis,

However, what I have done is talk to greg7mdp and he added the necessary API to be able to drop a sub hash map, so we can maintain all proposed features:

Excellent work! Now we can just use the upstream version and benefit from (possible) improvements there, instead of maintaining our own fork which would bitrot at some point in time.

Thanks also for taking the time to fix the other mentioned points. Is there still something conceptual, where we can help you? Or anything that needs a discussion?

All the best, Felix

taranbis · 2021-03-17T19:40:33Z

Hi,

I am sorry for the long due reply, but I have been really busy. First of all, I have walked though all the proposed changes and did all the required changes. However, working though the project again I realized that something could be improved, namely replacing pointers in prefix trees with shared pointers and also counting references in order to be able to remove them when needed.

However, this might take some time for me to investigate, but I will happily do it to see that memory is ok deallocated and that there is a mechanism for future removals.

All the best,
Mihai

fmoessbauer · 2021-03-18T15:37:20Z

Hi Mihai, thanks for taking care of integrating the required changes.
Just take your time on the shared-ptr stuff.

I think we can integrate that in the meantime. Then, just create a new PR for that.

@mhmdkanj : Could you please give the new version a try?

Best, Felix

mhmdkanj

Hi @taranbis

Thank you lots for incorporating these changes! The required code style is thus matched.

Also, it compiles successfully now & I tried executing the integration tests locally with fasttrack, whereby all had succeeded (except for the DotnetClrMonitor one for some reason; we can check that out later @fmoessbauer, most probably the source is elsewhere).

Please take a look at the couple of comments I posted after the new changes. Don't worry though; they are mostly code-style issues due to the new changes and/or things I forgot to review previously. So, they should be fast to incorporate. The most pressing one is rebasing the branch onto the new master so that logging of fasttrack details could be toggled on/off by the user using fasttrack flags rather than being toggled on by default.

Take your time with adjusting these small details, and afterwards we could move to start integrating them.
If you have any inquiries feel free to contact me.

Best Regards,
Mohamad

mhmdkanj · 2021-03-29T12:42:30Z

standalone/ft_benchmark/ft_benchmark.cpp

+
+void read_from_block(std::vector<std::pair<uintptr_t*, uintptr_t*>>* blocks) {
+  try {
+    std::uniform_int_distribution<int> dist(0, blocks->size() - 1);


Line 41 generates a warning that conversion from 'size_t' to '_Ty', possible loss of data; if this is intended, then it would be safe to ignore. Similarly in line 61.

standalone/ft_benchmark/ft_benchmark.cpp

standalone/detectors/fasttrack/include/SelectAllocator.h

standalone/detectors/fasttrack/include/StackTreeNode.h

standalone/detectors/fasttrack/include/PoolAllocator.h

standalone/detectors/fasttrack/include/PrefixTreeDepot.h

standalone/detectors/fasttrack/include/fasttrack.h

mhmdkanj · 2021-03-30T13:21:39Z

Hi @fmoessbauer , I've mentioned you in three revision comments whereby your input might be needed, whenever you are able. Thanks in advance!

- removed pointer for read-shared state from VarState class and replaced it with a hash map mapping memory addresses to the respective vector clocks in case of a read-shared state - reduce read and write epoch of the VarState class from 64 bits (32 bits thread ID + 32 clock value) to 32 bits (11 bits thread num + 21 bits clock value) - remove size member of the VarState class Result: sizeof(VarState) = 8 - increased concurrency by using internal locks of the parallel hash map used to store VarStates - switched std::list to std::deque for callstack information for improved performance - FastTrack tests now work via the general interface (easier to extend) - Functions return_stack_trace() and set_read_write() moved to ThreadState as by doing so it provides better decoupling with the callstack storage implementation, as storing the call stack will just have to implement 4 functions regardless of the implementation. These functions are accesses from Fasttrack class or from ThreadState

- this is not an improvment feature, but a necessary one, especially useful in the case of long-running applications - 4 types of removal policies, 1 default removing retired memory addresses and 3 that can be selected: random removal, hash map pruning and removal by lowest clock - VarStates can be removed even if they are in a read-shared state

- benchmark can be modified to test different performance aspects of DRace, such as more memory accessed and less threads or more threads and less memory accessed

- Implementations for storing call stack information will just have to implement 4 functions: make_trace(), get_current_element(), insert_function_element() and remove_function_element(); - PrefixTree_StackDepot is the first version of adapting a prefix tree to be used for storing call stack information. It uses hash maps instead of constant size arrays; - PrefixTreeDepot.h represents the cache efficient optimized version for the prefix tree implementation.

mhmdkanj · 2021-06-24T10:54:30Z

Hi @fmoessbauer ,
Current status:

- My revisions above have already been merged
- All Integration tests pass
- 9/42 unit tests fail, all of which pertain to the newly written fasttrack tests; I can send you the log when/if needed

taranbis · 2021-06-24T10:59:01Z

Hi @mhmdkanj,

since the newly written fasttrack is responsible for that, I will have a look. I will get until Sunday back to you (sorry the last few months have been crazy). you can send the logs to my private email address.

mhmdkanj · 2021-06-24T11:06:36Z

Hi @mhmdkanj,

since the newly written fasttrack is responsible for that, I will have a look. I will get until Sunday back to you (sorry the last few months have been crazy). you can send the logs to my private email address.

Hi @taranbis ,
I've sent you the text file to your yahoo mail.

Best,
Mohamad

taranbis · 2021-06-28T21:41:12Z

Hi @mhmdkanj ,

I looked over the tests. So, the reason the tests fail is because they use the logging structured In the last commit: refactor: set default logging to false in fasttrack (6f7d3da), you removed the final_output variable and made log_flag false.

The tests used the logging structure as it was an easy way not to depend on the internals of the fasttrack algorithm and pass info to fasttrack like normally. If you undo your commit, the tests should work fine.

I am already thinking of ways to rewrite this and one such way would be the callback. However, it doesn't hold all the info needed to differentiate the cases.

Until then, the integration tests pass if set the log flag to true. I did also a final_output flag to not print it when it was not necessary.

All the best,
Mihai

taranbis · 2021-06-28T21:43:11Z

Oh and one more thing. The callback not only does not hold all the required info, it is also only triggered when there's a race. You cannot distinguish the other use cases such as read-exclusive, read-shared-same-epoch etc.

taranbis · 2021-06-28T23:01:12Z

Hi @mhmdkanj,

Nevermind. I solved it. I added a wrapper on top of fasttrack to work with the testing framework. Now logging is like in your commit (false by default) and integration tests work. See my commit refactor: add class test wrapper (f381862).

Tell me if there is anything else.

P.S.: try not to use force pushes :D.

All the best,
Mihai

mhmdkanj · 2021-06-29T15:11:34Z

Hi @mhmdkanj,

Nevermind. I solved it. I added a wrapper on top of fasttrack to work with the testing framework. Now logging is like in your commit (false by default) and integration tests work. See my commit refactor: add class test wrapper (f381862).

Tell me if there is anything else.

P.S.: try not to use force pushes :D.

All the best,
Mihai

Hi @taranbis ,
Yup this looks good. However:

97% tests passed, 2 tests failed out of 59

Total Test time (real) =  21.98 sec

The following tests FAILED:
         58 - Interface/DetectorTest.Parallelism/"fasttrack.standalone" (Failed)
         59 - Interface/DetectorTest.Parallelism/"tsan" (Failed)

Do these two unit tests pass on your end?

taranbis · 2021-06-29T15:16:03Z

Hi @mhmdkanj ,

No, but this is not fasttrack related. it also fails for tsan. This is a test written by @fmoessbauer to check that concurrency works well on the detector side. I might be able to have a look into those, but i guess @fmoessbauer knows better. I will check nonetheless.

mhmdkanj · 2021-06-29T15:24:03Z

@taranbis True, it's probably not a direct consequence but I was wondering if you were experiencing the same things.

mhmdkanj · 2021-06-29T15:39:45Z

Hi @fmoessbauer ,
So on my machine, the following tests cause the missing of a 100% pass rate:

Unit Tests

The following tests FAILED:
         45 - Interface/DetectorTest.ForkInitialize/"tsan" (Failed)
         47 - Interface/DetectorTest.Barrier/"tsan" (Failed)
         49 - Interface/DetectorTest.ResetRange/"tsan" (Failed)
         51 - Interface/DetectorTest.RaceInspection/"tsan" (Failed)
         58 - Interface/DetectorTest.Parallelism/"fasttrack.standalone" (Failed)
         59 - Interface/DetectorTest.Parallelism/"tsan" (Failed)

Most of the time only the last two fail; however, occasionally the others fail as well.

Integration Tests

[  FAILED  ] Integration/DR.HowToSolution/0, where GetParam() = "fasttrack"
[  FAILED  ] Integration/DR.NoIoRaces/0, where GetParam() = "fasttrack"

Usually either one or both of these fail.

fmoessbauer · 2021-06-30T16:13:05Z

Hi @taranbis , @mhmdkanj ,

thanks for taking care of the integration. Regarding the failed tests:

Interface/DetectorTest.*/"tsan" the TSAN tests fail in case TSAN is not able to allocate memory at the specified location. That happens from time to time. Then, all TSAN tests fail. There's not much we can do about it.
Interface/DetectorTest.Parallelism/* I will have a look into that one. Maybe there is an issue in the test itself.
Integration/DR.HowToSolution/, Integration/DR.NoIoRaces/0 looks like we get races on IO. Maybe symbol names changed, so they are not correctly intercepted. I will have a look.

Cheers!

taranbis added the performance performance could be better label Jan 9, 2021

taranbis force-pushed the optz_FT/final branch from 6dfe471 to 556b612 Compare January 15, 2021 16:35

mhmdkanj requested changes Jan 27, 2021

View reviewed changes

mhmdkanj requested changes Mar 30, 2021

View reviewed changes

taranbis and others added 8 commits June 23, 2021 13:26

feat: add a benchmark to measure different performance aspects

5c2645c

- benchmark can be modified to test different performance aspects of DRace, such as more memory accessed and less threads or more threads and less memory accessed

docs: add sparsepp and parallel-hashmap to dependencies

eac914c

refactor: add descriptive logging for ft benchmark read

e1ce75b

docs: adjust optz ft comments

a116cb7

refactor: set default logging to false in fasttrack

6f7d3da

mhmdkanj force-pushed the optz_FT/final branch from 7fe27da to 6f7d3da Compare June 23, 2021 12:25

refactor: add class test wrapper

f381862

style: add reuse header for fasttrack wrapper

ba145fe

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

include performance improvements for FastTrack2 (memory + run-time) #84

include performance improvements for FastTrack2 (memory + run-time) #84

taranbis commented Jan 9, 2021

taranbis commented Jan 9, 2021

fmoessbauer commented Jan 11, 2021

taranbis commented Jan 17, 2021

mhmdkanj commented Jan 22, 2021

mhmdkanj left a comment

taranbis commented Jan 27, 2021

mhmdkanj commented Jan 28, 2021

taranbis commented Jan 29, 2021

fmoessbauer commented Jan 29, 2021

taranbis commented Feb 2, 2021

fmoessbauer commented Feb 2, 2021

taranbis commented Mar 17, 2021

fmoessbauer commented Mar 18, 2021

mhmdkanj left a comment

mhmdkanj Mar 29, 2021

mhmdkanj commented Mar 30, 2021

mhmdkanj commented Jun 24, 2021 •

edited

Loading

taranbis commented Jun 24, 2021

mhmdkanj commented Jun 24, 2021 •

edited

Loading

taranbis commented Jun 28, 2021

taranbis commented Jun 28, 2021

taranbis commented Jun 28, 2021

mhmdkanj commented Jun 29, 2021 •

edited

Loading

taranbis commented Jun 29, 2021 •

edited

Loading

mhmdkanj commented Jun 29, 2021

mhmdkanj commented Jun 29, 2021

fmoessbauer commented Jun 30, 2021

include performance improvements for FastTrack2 (memory + run-time) #84

Are you sure you want to change the base?

include performance improvements for FastTrack2 (memory + run-time) #84

Conversation

taranbis commented Jan 9, 2021

taranbis commented Jan 9, 2021

fmoessbauer commented Jan 11, 2021

taranbis commented Jan 17, 2021

mhmdkanj commented Jan 22, 2021

mhmdkanj left a comment

Choose a reason for hiding this comment

taranbis commented Jan 27, 2021

mhmdkanj commented Jan 28, 2021

taranbis commented Jan 29, 2021

fmoessbauer commented Jan 29, 2021

taranbis commented Feb 2, 2021

fmoessbauer commented Feb 2, 2021

taranbis commented Mar 17, 2021

fmoessbauer commented Mar 18, 2021

mhmdkanj left a comment

Choose a reason for hiding this comment

mhmdkanj Mar 29, 2021

Choose a reason for hiding this comment

mhmdkanj commented Mar 30, 2021

mhmdkanj commented Jun 24, 2021 • edited Loading

taranbis commented Jun 24, 2021

mhmdkanj commented Jun 24, 2021 • edited Loading

taranbis commented Jun 28, 2021

taranbis commented Jun 28, 2021

taranbis commented Jun 28, 2021

mhmdkanj commented Jun 29, 2021 • edited Loading

taranbis commented Jun 29, 2021 • edited Loading

mhmdkanj commented Jun 29, 2021

mhmdkanj commented Jun 29, 2021

Unit Tests

Integration Tests

fmoessbauer commented Jun 30, 2021

mhmdkanj commented Jun 24, 2021 •

edited

Loading

mhmdkanj commented Jun 24, 2021 •

edited

Loading

mhmdkanj commented Jun 29, 2021 •

edited

Loading

taranbis commented Jun 29, 2021 •

edited

Loading