Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intermittent coverage failures (some test runs not counted) #91092

Closed
scole66 opened this issue Nov 20, 2021 · 12 comments · Fixed by #111469
Closed

Intermittent coverage failures (some test runs not counted) #91092

scole66 opened this issue Nov 20, 2021 · 12 comments · Fixed by #111469
Labels
A-code-coverage Area: Source-based code coverage (-Cinstrument-coverage) C-bug Category: This is a bug.

Comments

@scole66
Copy link

scole66 commented Nov 20, 2021

I have a large-ish project with ~6000 testcases, and for a long time now, code coverage has been hit-or-miss. The issue is that some of the test cases don't seem to get their data included in the profraw files, and so don't show up as having an effect on coverage. The unrecorded tests seem to be essentially random, but with thousands of tests, single-digit-percentage failures are noticed on every test run.

It's been annoying. So I finally sat down to try and reduce to a simplest error, but it still takes multiple files, so is difficult to include inline in a GitHub issue.

The tree:

.
├── Cargo.lock
├── Cargo.toml
├── check_bug.sh
└── src
    ├── main.rs
    └── statething.rs

When I run a test via:

tst ()
{
    rm -f res-*.profraw;
    RUST_BACKTRACE=1 RUSTFLAGS="-Zinstrument-coverage" LLVM_PROFILE_FILE="res-%m.profraw" cargo test "$@";
    cargo profdata -- merge res-*.profraw --output=res.profdata
}
tst statething

which shows 2 tests run:

    Finished test [unoptimized + debuginfo] target(s) in 0.00s
     Running unittests (target/debug/deps/res-6177536c73daf6cd)

running 2 tests
test statething::tests::state_has_false ... ok
test statething::tests::state_has_true ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

Then report on that particular function via:

report ()
{
    cargo cov -- show --use-color --ignore-filename-regex='/rustc/|/\.cargo/|\.rustup/toolchains' --instr-profile=res.profdata $(objects) --show-line-counts-or-regions -Xdemangler=rustfilt "$@"
}
report --name 5State3has

Most of the time, I see the correct result:

<res::statething::State>::has:
    8|      2|    pub fn has(&self, needle: &str) -> bool {
    9|      2|        self.0 == needle
   10|      2|    }

But sometimes (about 1 in 170 times), I see this:

<res::statething::State>::has:
    8|      1|    pub fn has(&self, needle: &str) -> bool {
    9|      1|        self.0 == needle
   10|      1|    }

It seems to be related to having multiple source files; I could not get similar behavior with only a main.rs. I've seen the problem appear on both MacOS (Mohave) and on Windows (in Windows Subsystem for Linux 2)

The two-source-file tree mentioned up above (including the script I run to repeat the test until a failure happens), is on a bug-report branch here: https://github.com/scole66/rust-e262/tree/reduction-for-bugreport

This really feels like whichever thread is controlling writes to the profraw file is missing messages. Queue overrun maybe? (I haven't looked.)

Meta

rustc --version --verbose:

rustc 1.58.0-nightly (a77da2d45 2021-11-19)
binary: rustc
commit-hash: a77da2d454e6caa227a85b16410b95f93495e7e0
commit-date: 2021-11-19
host: x86_64-unknown-linux-gnu
release: 1.58.0-nightly
LLVM version: 13.0.0
@scole66 scole66 added the C-bug Category: This is a bug. label Nov 20, 2021
@scole66
Copy link
Author

scole66 commented Nov 20, 2021

@rustbot label A-code-coverage

@rustbot rustbot added the A-code-coverage Area: Source-based code coverage (-Cinstrument-coverage) label Nov 20, 2021
@Swatinem
Copy link
Contributor

Swatinem commented Dec 7, 2021

I see you are using the %m placeholder for the profile file, which talks about locking and merging these files at runtime. Maybe adding the %p PID placeholder as well might avoid conflicts here. Can you give that a try?

@scole66
Copy link
Author

scole66 commented Dec 7, 2021

Gave it a try; still exhibits the same behavior.

@scole66
Copy link
Author

scole66 commented Dec 7, 2021

%m is definitely an indication to "please merge"; and %p doesn't do that, but %p is really about processes, not threads. I suspect whatever is doing the merging of thread data is what's got the issue here. (Though I get the same effect when I tell the test runner to just use one thread, as well.)

@scole66
Copy link
Author

scole66 commented May 10, 2022

Just learned that the -j option of cargo test is not the same thing as -- --test-threads=1 (which sends that flag to the test-runner). Adding -- --test-threads=1 to my harness removes the intermittent results.

Not closing this issue, though. Rust is supposed to be "fearless concurrency" so this should work correctly even if the test runner is using multiple threads.

taiki-e added a commit to taiki-e/cargo-llvm-cov that referenced this issue Jun 13, 2022
bors bot added a commit to taiki-e/cargo-llvm-cov that referenced this issue Jun 13, 2022
184: Limit the number of threads to work around rust-lang/rust#91092 r=taiki-e a=taiki-e



Co-authored-by: Taiki Endo <te316e89@gmail.com>
taiki-e added a commit to taiki-e/cargo-llvm-cov that referenced this issue Jun 13, 2022
taiki-e added a commit to taiki-e/cargo-llvm-cov that referenced this issue Jun 13, 2022
bors bot added a commit to taiki-e/cargo-llvm-cov that referenced this issue Jun 13, 2022
184: Limit the number of test threads to work around rust-lang/rust#91092 r=taiki-e a=taiki-e



Co-authored-by: Taiki Endo <te316e89@gmail.com>
dpc added a commit to dpc/fedimint that referenced this issue Apr 15, 2023
According to:

https://github.com/taiki-e/cargo-llvm-cov#known-limitations

it's only defaulting to 1 thread, because of rustc issue:

rust-lang/rust#91092

but it seems the issue is that relatively infrequently some tests
will fail to be reported... which if fine with me if it makes
the CI faster. And they are talking about thousands of tests,
while we probably have <100.
dpc added a commit to dpc/fedimint that referenced this issue Apr 15, 2023
According to:

https://github.com/taiki-e/cargo-llvm-cov#known-limitations

it's only defaulting to 1 thread, because of rustc issue:

rust-lang/rust#91092

but it seems the issue is that relatively infrequently some tests
will fail to be reported... which if fine with me if it makes
the CI faster. And they are talking about thousands of tests,
while we probably have <100.
@Dushistov
Copy link
Contributor

Still reproducible with

rustc --version --verbose                                                                                                                                                                
rustc 1.70.0-beta.1 (1b7dd2252 2023-04-19)                                                                                                                                                 
binary: rustc                                                                                                                                                                              
commit-hash: 1b7dd2252b99671ce5d1cb9664c5f8636329436d                                                                                                                                      
commit-date: 2023-04-19                                                                                                                                                                    
host: x86_64-unknown-linux-gnu                                                                                                                                                             
release: 1.70.0-beta.1                                                                                                                                                                     
LLVM version: 16.0.2       

I used such script to run against https://github.com/scole66/rust-e262/tree/reduction-for-bugreport ,
because of I can not find cargo profdata:

set -euo pipefail                                                                                                                                                                          
                                                                                                                                                                                           
rm -fr target                                                                                                                                                                              
mkdir -p target/coverage                                                                                                                                                                   
                                                                                                                                                                                           
export RUSTFLAGS="-Cinstrument-coverage"                                                                                                                                                   
cargo build                                                                                                                                                                                
export LLVM_PROFILE_FILE="target/coverage/%p-%m.profraw"                                                                                                                                   
                                                                                                                                                                                           
for i in $(seq 1 15000); do                                                                                                                                                                
    rm -f target/coverage/*.profraw target/coverage/cobertura.xml                                                                                                                          
    cargo test > /dev/null 2>&1                                                                                                                                                            
    grcov target/coverage --binary-path target/debug -s . -o target/coverage --keep-only 'src/*' --output-types cobertura                                                                  
    if [ ! -z "$(cat target/coverage/cobertura.xml | grep -E 'number="(8|9|10)' | grep -v 'hits="4"')" ]; then                                                                             
        echo "Found BUG at step $i"                                                                                                                                                        
        cat target/coverage/cobertura.xml                                                                                                                                                  
        exit 1                                                                                                                                                                             
    fi                                                                                                                                                                                     
done                                                                                                                                                                                       

at step 854, coverage <res::statething::State>::has: changed from 4 to 2.

@scole66
Copy link
Author

scole66 commented Apr 20, 2023

I can not find cargo profdata

This is in cargo-binutils.

@Dushistov
Copy link
Contributor

I created bug against upstream llvm/llvm-project#62558

@Dushistov
Copy link
Contributor

Looks like this is not llvm bug. There is bool flag InstrProfOptions::Atomic, that is false by default,
and depend on it llvm generates different code:

void InstrProfiling::lowerIncrement(InstrProfIncrementInst *Inc) {
  auto *Addr = getCounterAddress(Inc);

  IRBuilder<> Builder(Inc);
  if (Options.Atomic || AtomicCounterUpdateAll ||
      (Inc->getIndex()->isZeroValue() && AtomicFirstCounter)) {
    Builder.CreateAtomicRMW(AtomicRMWInst::Add, Addr, Inc->getStep(),
                            MaybeAlign(), AtomicOrdering::Monotonic);
  } else {
    Value *IncStep = Inc->getStep();
    Value *Load = Builder.CreateLoad(IncStep->getType(), Addr, "pgocount");
    auto *Count = Builder.CreateAdd(Load, Inc->getStep());
    auto *Store = Builder.CreateStore(Count, Addr);
    if (isCounterPromotionEnabled())
      PromotionCandidates.emplace_back(cast<Instruction>(Load), Store);
  }
  Inc->eraseFromParent();
}

So if set it to true via clang option -fprofile-update=atomic,
then C++ variant of test is unable to reproduce problem.

So is any way to set -fprofile-update=atomic via rustc command line?

@taiki-e
Copy link
Member

taiki-e commented May 11, 2023

-C llvm-args=--instrprof-atomic-counter-update-all or another atomic-counter flag may work, but I have not tested it yet.

$ rustc -C llvm-args='--help-list-hidden' | rg 'atomic-counter'
  --atomic-counter-update-promoted                                  - Do counter update using atomic fetch add  for promoted counters only
  --gcov-atomic-counter                                             - Make counter updates atomic
  --instrprof-atomic-counter-update-all                             - Make all profile counter updates atomic (for testing only)

@Dushistov
Copy link
Contributor

-C llvm-args=--instrprof-atomic-counter-update-all or another atomic-counter flag may work, but I have not tested it yet.

I can not see how any of this options can reach InstrProfOptions here

InstrProfOptions Options;
.

So I create PR to set Atomic to true by default.

@Dushistov
Copy link
Contributor

Looks like --instrprof-atomic-counter-update-all should also works.

void InstrProfiling::lowerIncrement(InstrProfIncrementInst *Inc) {
  auto *Addr = getCounterAddress(Inc);

  IRBuilder<> Builder(Inc);
  if (Options.Atomic || AtomicCounterUpdateAll ||

where AtomicCounterUpdateAll

cl::opt<bool> AtomicCounterUpdateAll(                                                                                                                                                      
    "instrprof-atomic-counter-update-all",                                                                                                                                                 
    cl::desc("Make all profile counter updates atomic (for testing only)"),                                                                                                                
    cl::init(false));       

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue May 11, 2023
…r=wesleywiser

Fix data race in llvm source code coverage

Fixes rust-lang#91092 .

Before this patch, increment of counters for code coverage looks like this:
```
 movq    .L__profc__RNvCsd6wgJFC5r19_3lib6bugaga+8(%rip), %rax
 addq    $1, %rax
movq    %rax, .L__profc__RNvCsd6wgJFC5r19_3lib6bugaga+8(%rip)
```

after this patch:

```
lock            incq    .L__profc__RNvCs3JgIB2SjHh2_3lib6bugaga+8(%rip)
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-code-coverage Area: Source-based code coverage (-Cinstrument-coverage) C-bug Category: This is a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants