
Change hits file instrumentation to be memory mapped #241

Closed


@Cyberboss

Fixes #210

Added benefit: kernel-level cleanup and atomicity when incrementing hit counters.
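
Rough sketch of the mechanism (illustrative names, not the exact code in this PR):

using System.IO;
using System.IO.MemoryMappedFiles;
using System.Threading;

// Illustrative sketch only: one int per instrumented location,
// incremented atomically in place through a memory-mapped view.
static unsafe class MappedHitCounter
{
    static MemoryMappedViewAccessor _accessor;
    static int* _hits;

    public static void Open(string hitsFilePath, int locationCount)
    {
        var mmf = MemoryMappedFile.CreateFromFile(
            hitsFilePath, FileMode.OpenOrCreate, null,
            (long)locationCount * sizeof(int));
        _accessor = mmf.CreateViewAccessor();

        byte* ptr = null;
        _accessor.SafeMemoryMappedViewHandle.AcquirePointer(ref ptr);
        _hits = (int*)ptr; // ignoring the view's PointerOffset for brevity

        // the OS flushes and tears down the mapping even if the process is
        // killed, which is the "kernel level cleanup" mentioned above
    }

    public static void RecordHit(int hitLocationIndex) =>
        Interlocked.Increment(ref _hits[hitLocationIndex]);
}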

@WiredUK

WiredUK commented Nov 16, 2018

Nice to see someone take this on, I'll try this out when I'm back at work on Monday. Thanks!

@Cyberboss
Author

Yeah, not sure why the tests are failing. Works locally

@tonerdo
Collaborator

tonerdo commented Nov 19, 2018

Hi @Cyberboss, pull the latest master and see if all tests still pass locally

@codecov

codecov bot commented Nov 19, 2018

Codecov Report

Merging #241 into master will increase coverage by 0.06%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master     #241      +/-   ##
==========================================
+ Coverage   95.27%   95.33%   +0.06%     
==========================================
  Files          16       16              
  Lines        1693     1695       +2     
==========================================
+ Hits         1613     1616       +3     
+ Misses         80       79       -1
Impacted Files Coverage Δ
...overlet.core/Instrumentation/InstrumenterResult.cs 100% <ø> (ø) ⬆️
src/coverlet.core/Instrumentation/Instrumenter.cs 98.93% <100%> (ø) ⬆️
src/coverlet.core/Helpers/InstrumentationHelper.cs 98.46% <100%> (ø) ⬆️
src/coverlet.core/Coverage.cs 99.49% <100%> (ø) ⬆️
src/coverlet.core/Symbols/CecilSymbolHelper.cs 100% <0%> (+0.46%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a334be0...5ad2da5.

@Cyberboss
Author

Think I nailed it now

@WiredUK

WiredUK commented Nov 20, 2018

> Think I nailed it now

Anecdotally at least, this works for me. Ran it a couple of times and coverage matched perfectly.

@petli
Collaborator

petli commented Nov 20, 2018

I gave this a spin on my code base, and it seems to count hits correctly. But it is significantly slower than version 2.3.2: with this change, two test projects go from about 50s each to 90s and 135s, respectively.

I think this might be related to what @pjanotti discussed in #181 about multithreaded tests. Even if Interlocked is less bad than using mutexes, it still adds synchronization between the CPU cores, which slows down tests. It would be interesting if @pjanotti could see what the performance is on those projects, since they seemed to be even more penalised by synchronization between parallel unit tests than the projects I test here.

This could perhaps be improved by something similar to #181: keeping one hit-count memory-mapped file per thread (in a thread-static variable) and incrementing it without Interlocked. But then the Coverage class must find all these files and add up the hit counts from them.
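
Roughly the shape I have in mind (illustrative names, plain int[]s standing in for the per-thread files):

using System;
using System.Collections.Concurrent;
using System.Linq;

// Illustrative sketch of per-thread counting: each thread owns its own
// array and increments it without Interlocked; the reader sums them all.
static class PerThreadHits
{
    const int LocationCount = 1024; // illustrative size

    static readonly ConcurrentBag<int[]> _allThreadArrays = new ConcurrentBag<int[]>();

    [ThreadStatic] static int[] _hits;

    public static void RecordHit(int index)
    {
        var hits = _hits;
        if (hits == null)
        {
            hits = new int[LocationCount];
            _allThreadArrays.Add(hits);
            _hits = hits;
        }
        hits[index]++; // plain increment: each array has a single writer
    }

    // the Coverage side finds every per-thread array and adds up the counts
    public static long TotalHits(int index) =>
        _allThreadArrays.Sum(a => (long)a[index]);
}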

@pjanotti
Contributor

Hi @petli, yes, I tried an initial perf improvement with Interlocked, and for heavy-concurrency scenarios it was really bad, so I had to fall back on per-thread hit counting. There were some projects in the CoreFx repo that were especially bad (I can't recall exactly which ones, but the names may live in some of my previous comments). I would recommend being careful about adding Interlocked to the normal hit-count path if performance in these concurrent scenarios is relevant. Perhaps an option to use the memory-mapped file?

(/cc @danmosemsft @ViktorHofer)

@petli
Collaborator

petli commented Nov 22, 2018

I realised the performance test in coverlet can easily be adjusted to test the threaded behaviour. This really exercises the code, since the contention gets worse when there are lots of threads all hitting the same memory pages, and here there are twenty tests all looping intensely over the same code:

public class PerformanceTest
{
    [Theory]
    [InlineData(20_000)]
    [InlineData(20_001)]
    [InlineData(20_002)]
    [InlineData(20_003)]
    [InlineData(20_004)]
    [InlineData(20_005)]
    [InlineData(20_006)]
    [InlineData(20_007)]
    [InlineData(20_008)]
    [InlineData(20_009)]
    [InlineData(20_010)]
    [InlineData(20_011)]
    [InlineData(20_012)]
    [InlineData(20_013)]
    [InlineData(20_014)]
    [InlineData(20_015)]
    [InlineData(20_016)]
    [InlineData(20_017)]
    [InlineData(20_018)]
    [InlineData(20_019)]
    public void TestPerformance(int iterations)
    {
        // test body omitted in this excerpt
    }
}

Running those 20 parallel tests on 2.3.2 took 30s on my dev machine; with the Interlocked usage in this PR it takes... *twiddles thumbs* ...16 minutes.

So before this can be merged it needs to be changed to use ThreadStatic to keep track of one memory mapped file per thread, alas.

@Cyberboss
Author

Alright, taken care of

@petli
Collaborator

petli commented Nov 23, 2018

I was wrong above that the test case needed additional InlineData to stress-test the threaded behaviour, since it already creates a lot of tasks with Task.Run(). There's also a subtle bug on my part lurking in the test (which ReSharper highlighted but I didn't notice until now), and it needs to change to this to give more reliable numbers:

        for (var i = 0; i < iterations; i++)
        {
            var j = i;
            tasks.Add(Task.Run(() => big.Do(j)));
        }

(PR to fix this: #248)
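
(For the record, the bug is the classic loop-variable capture: a C# for-loop variable is one variable shared by every iteration, so presumably the loop originally looked roughly like this:)

        for (var i = 0; i < iterations; i++)
        {
            // bug: every lambda captures the same loop variable i, so a task
            // that starts late can read a value of i from a later iteration
            tasks.Add(Task.Run(() => big.Do(i)));
        }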

This led me to an incorrect analysis of the test run I did before the change to thread-local files, the one I wrote about above taking 16 minutes. It does not seem to be a problem with Interlocked after all; hit counting itself is just much slower using memory-mapped files rather than an int[].

Running just a single run of the performance test with 20000 iterations takes 3s with the old int[] counting-then-writing-to-files, and 50 seconds with the pointer incrementing (either the original Interlocked.Increment or the (*hitLocationArrayOffset)++ of the one-file-per-thread change). Using Read+Write instead of pointers takes 1.8 minutes. (The test I ran with 20 InlineData didn't expose any additional contention; that's just 50 seconds * 20 tests = 16 minutes.)
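
For reference, the two mapped-memory access styles compared here look roughly like this (illustrative, not the exact PR code):

using System.IO.MemoryMappedFiles;

static unsafe class HitWriteStyles
{
    // (a) pointer increment straight into the mapped view:
    //     one memory operation per hit
    public static void HitViaPointer(int* hitLocationArrayOffset, int index) =>
        (*(hitLocationArrayOffset + index))++;

    // (b) accessor Read + Write: two bounds-checked accessor calls per hit,
    //     the slowest variant measured above
    public static void HitViaAccessor(MemoryMappedViewAccessor accessor, int index)
    {
        int count = accessor.ReadInt32(index * sizeof(int));
        accessor.Write(index * sizeof(int), count + 1);
    }
}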

@Cyberboss, have you seen this too or is this just something on my machine? Sorry for leading you down the wrong track here...

@Cyberboss
Author

I have yet to look into how to run the performance test, but yeah, my own MMF experiments don't bode well for random-access incrementing like this. A shared-memory implementation (non-file-backed) should be far better; I'll see if I can get that working.
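
Something along these lines (illustrative sketch; note that named maps are Windows-only in .NET):

using System.IO.MemoryMappedFiles;

static class SharedMemoryExample
{
    public static MemoryMappedFile CreateHitsRegion(int locationCount) =>
        // pagefile-backed, no disk file involved; the collecting process
        // attaches to the same region with MemoryMappedFile.OpenExisting.
        MemoryMappedFile.CreateNew("coverlet_hits_example",
            (long)locationCount * sizeof(int));
}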

@petli
Collaborator

petli commented Nov 26, 2018

There's now a description in README.md in the master branch on how to run the performance tests, it was hidden down in a unit test file before:
https://github.com/tonerdo/coverlet/blob/master/README.md#performance-testing

@WiredUK

WiredUK commented Nov 29, 2018

@Cyberboss Any progress on the perf testing? Keen to get this merged so we can fix our CI pipeline.

@WiredUK

WiredUK commented Dec 16, 2018

@Cyberboss Sorry to pester, but I really need to know if you're going to be able to get this one sorted or not. If not, I'm going to switch out Coverlet with another tool.

@Cyberboss
Author

I'll see if I can't find time to readdress this sometime this week, end of year has gotten pretty busy.

@WiredUK

WiredUK commented Dec 17, 2018

> I'll see if I can't find time to readdress this sometime this week, end of year has gotten pretty busy.

Thanks buddy, much appreciated. Same here or I would have looked myself!

@petli
Collaborator

petli commented Dec 20, 2018

@WiredUK @Cyberboss I've played around with this today, but just changing to a pure memory-based file didn't help much. Instead I've tried a hybrid solution where hit counting is done directly in int arrays as before, but passing the counts to the coverage analysis is done via mapped memory. That can hopefully be quicker than writing to a file, and thus finish running in UnloadModule even on loaded build servers before the process is killed.

I have to clean up the code a bit, but will send a PR tomorrow with this.
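
Roughly this shape (illustrative names; the real code is in the PR below):

using System.IO.MemoryMappedFiles;

// Illustrative sketch of the hybrid: fast in-process counting in a plain
// int[], then one bulk copy of the totals into mapped memory at unload.
static class HybridHits
{
    const int LocationCount = 1024; // illustrative size

    static readonly int[] _hits = new int[LocationCount];

    // hot path: plain in-process array increments, as before
    public static void RecordHit(int index) => _hits[index]++;

    // called once from UnloadModule: a single bulk copy of the totals;
    // the accessor's creation and lifetime are handled elsewhere
    public static void Flush(MemoryMappedViewAccessor accessor) =>
        accessor.WriteArray(0, _hits, 0, _hits.Length);
}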

@petli
Collaborator

petli commented Dec 21, 2018

Finally pushed #276.
