Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Malformed coverage data with a no_mangle function #117788

Closed
fsavy-tehtris opened this issue Nov 10, 2023 · 12 comments · Fixed by #118595
Closed

Malformed coverage data with a no_mangle function #117788

fsavy-tehtris opened this issue Nov 10, 2023 · 12 comments · Fixed by #118595
Labels
A-code-coverage Area: Source-based code coverage (-Cinstrument-coverage) C-bug Category: This is a bug. regression-from-stable-to-beta Performance or correctness regression from stable to beta.

Comments

@fsavy-tehtris
Copy link

Code

I tried to run cargo llvm-cov on this code:

use abi_stable::utils::{ffi_panic_message, PanicInfo};

#[no_mangle]
pub extern "C" fn lib_get_module() {
    ffi_panic_message(&PanicInfo {
        file: "abc",
        line: 0,
    });
}

I expected the tool to work normally, but instead, this happened:

error: Failed to load coverage: 'target/llvm-cov-target/debug/deps/cov_bug_repro-3c4d51dd97237a75': Malformed coverage data

Please note this is the most minimal reproducible example I've been able to find.

Removing either the no_mangle attribute or the ffi_panic_message call does not reproduce the error, but the extern "C" has not effect.
I've tried to remove the abi_stable dependency but putting the required code in the same file or in a different crate (in the same workspace) does not reproduce the error.

Associated abi_stable code

pub mod utils {
    /// Information about a panic, used in `ffi_panic_message`.
    #[derive(Debug, Copy, Clone)]
    pub struct PanicInfo {
        ///
        pub file: &'static str,
        ///
        pub line: u32,
    }

    /// Prints an error message for attempting to panic across the
    /// ffi boundary and aborts the process.
    #[inline(never)]
    #[cold]
    pub fn ffi_panic_message(info: &'static PanicInfo) -> ! {
        eprintln!("\nfile:{}\nline:{}", info.file, info.line);
        eprintln!("Attempted to panic across the ffi boundary.");
        eprintln!("Aborting to handle the panic...\n");
        std::process::exit(1);
    }
}

Version it worked on

It most recently worked on: 1.73.0

Version with regression

rustc --version --verbose:

rustc 1.74.0-beta.5 (efc300e54 2023-11-07)
binary: rustc
commit-hash: efc300e5460fd1ed057b882e9e29adfdd217eeef
commit-date: 2023-11-07
host: x86_64-unknown-linux-gnu
release: 1.74.0-beta.5
LLVM version: 17.0.4

@rustbot modify labels: +regression-from-stable-to-beta -regression-untriaged

@fsavy-tehtris fsavy-tehtris added C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. labels Nov 10, 2023
@rustbot rustbot added I-prioritize Issue: Indicates that prioritization has been requested for this issue. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. regression-from-stable-to-beta Performance or correctness regression from stable to beta. and removed regression-untriaged Untriaged performance or correctness regression. labels Nov 10, 2023
@lqd
Copy link
Member

lqd commented Nov 10, 2023

cc @Zalathar

@saethlin saethlin added A-code-coverage Area: Source-based code coverage (-Cinstrument-coverage) and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Nov 10, 2023
@Zalathar
Copy link
Contributor

I was able to reproduce this on beta+nightly on my machine (macOS aarch64), so I should be able to dig into the mappings and see what LLVM is complaining about.

(Thanks for the detailed report!)

@Zalathar
Copy link
Contributor

@Zalathar
Copy link
Contributor

When the error occurs in the above code, we have startLoc = (159, 59) and endLoc = (1, 4), causing the check to fail.

That seems to correspond to this code:

https://github.com/rodrimati1992/abi_stable_crates/blob/f2485e20058294ef14d414d391d63ddb2a99ea69/abi_stable/src/macros/internal.rs#L137-L161

@Zalathar
Copy link
Contributor

If I print out the raw values being read by readMappingRegionsSubArray, I see:

[RAW] LineStartDelta = 159; ColumnStart = 59; NumLines = 4294967138; ColumnEnd = 4

That NumLines value is 0xFFFFFF62, which is clearly bogus. Probably the result of integer overflow somewhere.

@Zalathar
Copy link
Contributor

Actually I've traced the bogus (1, 4) coordinates to the actual span processed by make_code_region.

It looks like maybe_push_macro_name_span is producing a bogus span that extends into an adjacent file, completely trashing the span coordinates.

At this point I'm guessing the culprit is #116754.

@Zalathar
Copy link
Contributor

Hmm, but I can reproduce this on nightly-2023-10-15, which predates the merge of #116754, so I should look at changes earlier than that.

@Zalathar
Copy link
Contributor

Manually bisected this down to 2023-09-05 (good) to 2023-09-06 (bad).

@Zalathar
Copy link
Contributor

After some more manual bisection, I'm pretty sure this started failing after #115507. That seems consistent with it being a latent coverage bug that has now been made more common by compiler-wide changes to spans.

@Zalathar
Copy link
Contributor

Zalathar commented Nov 12, 2023

Even in prior revisions I can observe the spans created by check_invoked_macro_name_span sometimes being bogus; they're just bogus in a way that tends not to result in malformed mappings and llvm-cov failures.

@lqd
Copy link
Member

lqd commented Nov 12, 2023

(In this situation, cargo-bisect-rustc should work fine to avoid bisecting manually)

Zalathar added a commit to Zalathar/rust that referenced this issue Nov 13, 2023
This test isn't able to reproduce the malformed mappings that cause `llvm-cov`
to fail, but does reproduce the underlying problem in a way that results in
obviously incorrect mappings if the workaround is not present.
Zalathar added a commit to Zalathar/rust that referenced this issue Nov 13, 2023
This test isn't able to reproduce the malformed mappings that cause `llvm-cov`
to fail, but does reproduce the underlying problem in a way that results in
obviously incorrect mappings if the workaround is not present.
Zalathar added a commit to Zalathar/rust that referenced this issue Nov 13, 2023
Without the workaround applied, this test will produce malformed mappings that
cause `llvm-cov` to fail.

(And if it does emit well-formed mappings, they should be obviously incorrect.)
Zalathar added a commit to Zalathar/rust that referenced this issue Nov 13, 2023
Without the workaround applied, this test will produce malformed mappings that
cause `llvm-cov` to fail.

(And if it does emit well-formed mappings, they should be obviously incorrect.)
Mark-Simulacrum pushed a commit to Mark-Simulacrum/rust that referenced this issue Nov 13, 2023
Without the workaround applied, this test will produce malformed mappings that
cause `llvm-cov` to fail.

(And if it does emit well-formed mappings, they should be obviously incorrect.)
bors added a commit to rust-lang-ci/rust that referenced this issue Nov 13, 2023
…vidtwco

coverage: Avoid creating malformed macro name spans

This is a workaround for rust-lang#117788. It detects a particular scenario where we would create malformed coverage spans that might cause `llvm-cov` to immediately exit with an error, preventing the user from processing coverage reports.

The patch has been kept as simple as possible so that it's trivial to backport to beta (or stable) if desired.

---

The `maybe_push_macro_name_span` method is trying to detect macro invocations, so that it can split a span into two parts just after the `!` of the invocation.

Under some circumstances (probably involving nested macros), it gets confused and produces a span that is larger than the original span, and possibly extends outside its enclosing function and even into an adjacent file.

In extreme cases, that can result in malformed coverage mappings that cause `llvm-cov` to fail. For now, we at least want to detect these egregious cases and avoid them, so that coverage reports can still be produced.
@apiraino apiraino removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Nov 13, 2023
@Zalathar
Copy link
Contributor

The fix for this has been cherry-picked into the upcoming stable release of 1.74, and has also been merged on nightly (1.76).

It will probably still be broken in the initial version of the upcoming beta release (1.75); ideally the fix will also be be accepted on the beta branch at some point in the next 5 weeks.

Mark-Simulacrum pushed a commit to Mark-Simulacrum/rust that referenced this issue Nov 18, 2023
Without the workaround applied, this test will produce malformed mappings that
cause `llvm-cov` to fail.

(And if it does emit well-formed mappings, they should be obviously incorrect.)
quininer pushed a commit to quininer/rust that referenced this issue Nov 29, 2023
Without the workaround applied, this test will produce malformed mappings that
cause `llvm-cov` to fail.

(And if it does emit well-formed mappings, they should be obviously incorrect.)
bors added a commit to rust-lang-ci/rust that referenced this issue Dec 6, 2023
coverage: Be more strict about what counts as a "visible macro"

This is a follow-up to the workaround in rust-lang#117827, and I believe it now properly fixes rust-lang#117788.

The old code treats a span as having a “visible macro” if it is part of a macro-expansion, and its parent callsite's context is the same as the body span's context. But if the body span is itself part of an expansion, the macro in question might not actually be visible from the body span. That results in the macro name's length being meaningless as a span offset.

We now only consider spans whose parent callsite is the same as the source callsite, i.e. the parent has no parent.

---

I've also included some related cleanup for the code added by rust-lang#117827. That code was more complicated than normal, because I wanted it to be easy to backport to stable/beta.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-code-coverage Area: Source-based code coverage (-Cinstrument-coverage) C-bug Category: This is a bug. regression-from-stable-to-beta Performance or correctness regression from stable to beta.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants