rust-demangler tool strips crate disambiguators with < 16 digits
Addresses Issue rust-lang#77615.
richkadel committed Oct 8, 2020
1 parent 6f62766 commit 796e6ac
Showing 1 changed file with 62 additions and 4 deletions.
src/tools/rust-demangler/main.rs (66 changes: 62 additions, 4 deletions)
@@ -21,6 +21,41 @@
 //! $ "${TARGET}"/llvm/bin/llvm-cov show --Xdemangler="${TARGET}"/stage0-tools-bin/rust-demangler \
 //! --instr-profile=main.profdata ./main --show-line-counts-or-regions
 //! ```
+//!
+//! Note regarding crate disambiguators:
+//!
+//! Some demangled symbol paths can include "crate disambiguator" suffixes, represented as a large
+//! hexadecimal value enclosed in square brackets and appended to the original crate name. For
+//! example, the `core` crate, here, includes a disambiguator:
+//!
+//! ```rust
+//! <generics::Firework<f64> as core[a7a74cee373f048]::ops::drop::Drop>::drop
+//! ```
+//!
+//! These disambiguators are known to vary depending on environmental circumstances. As a result,
+//! tests that compare results including demangled names can fail across development environments,
+//! particularly with cross-platform testing. Also, the resulting crate paths are not syntactically
+//! valid, and don't match the original source symbol paths, which can impact development tools.
+//!
+//! For these reasons, by default, `rust-demangler` uses a heuristic to remove crate disambiguators
+//! from their original demangled representation before printing them to standard output. If crate
+//! disambiguators are required, add the `-d` (or `--disambiguators`) flag, and the disambiguators
+//! will not be removed.
+//!
+//! Also note that the disambiguators are stripped by a Regex pattern that is tolerant to some
+//! variation in the number of hexadecimal digits. The disambiguators come from a hash value, which
+//! typically generates a 16-digit hex representation on a 64-bit architecture; however, leading
+//! zeros are not included, which can shorten the hex digit length, and a different hash algorithm,
+//! possibly dependent on the architecture, might shorten the length even further. A minimum
+//! length of 5 digits is assumed, which should be more than sufficient to support hex
+//! representations that generate only 8 digits of precision, even in the extremely rare (but not
+//! impossible) case of up to 3 leading zeros.
+//!
+//! Using a minimum number of digits less than 5 risks stripping demangled name components with a
+//! similar pattern. For example, some closures instantiated multiple times include their own
+//! disambiguators, demangled as non-hashed zero-based indexes in square brackets. These
+//! disambiguators seem to have more analytical value (for instance, in coverage analysis), so
+//! they are not removed.
 
 use regex::Regex;
 use rustc_demangle::demangle;
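The stripping heuristic described in the doc comment can be sketched without the `regex` crate. This is a hypothetical, std-only approximation of the pattern `\[[a-f0-9]{5,16}\]::` with `::` as the replacement; the names `disambiguator_len` and `strip_crate_disambiguators` are illustrative assumptions, not the tool's actual code.

```rust
// Hypothetical, std-only approximation of the regex `\[[a-f0-9]{5,16}\]::`
// (replaced with `::`); not the actual rust-demangler implementation.
const MIN_DIGITS: usize = 5;
const MAX_DIGITS: usize = 16;

/// If `s` starts with `[`, 5 to 16 lowercase hex digits, `]`, and `::`,
/// returns the byte length of that whole token.
fn disambiguator_len(s: &str) -> Option<usize> {
    let rest = s.strip_prefix('[')?;
    // Count leading characters in the class `[a-f0-9]` (all ASCII, so bytes == chars).
    let digits = rest
        .bytes()
        .take_while(|b| b.is_ascii_digit() || (b'a'..=b'f').contains(b))
        .count();
    if (MIN_DIGITS..=MAX_DIGITS).contains(&digits) && rest[digits..].starts_with("]::") {
        Some(1 + digits + "]::".len())
    } else {
        None
    }
}

/// Replaces each `[<5-16 hex digits>]::` with `::`, copying everything else unchanged.
fn strip_crate_disambiguators(demangled: &str) -> String {
    let mut out = String::with_capacity(demangled.len());
    let mut i = 0;
    while i < demangled.len() {
        let tail = &demangled[i..];
        if let Some(len) = disambiguator_len(tail) {
            out.push_str("::");
            i += len;
        } else {
            // Copy one character; advancing by its UTF-8 width keeps `i` on a char boundary.
            let ch = tail.chars().next().unwrap();
            out.push(ch);
            i += ch.len_utf8();
        }
    }
    out
}
```

Applied to the doc comment's example, `core[a7a74cee373f048]::ops` (15 digits) becomes `core::ops`, while a short index such as `foo[0]::bar` and an over-long 17-digit token are left untouched, mirroring the regex's `{5,16}` bounds.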
@@ -29,7 +64,25 @@ use std::io::{self, Read, Write};
 const REPLACE_COLONS: &str = "::";
 
 fn main() -> io::Result<()> {
-    let mut strip_crate_disambiguators = Some(Regex::new(r"\[[a-f0-9]{16}\]::").unwrap());
+    // FIXME(richkadel): Issue #77615 discussed updating the `rustc-demangle` library to provide
+    // an option to generate demangled names without including crate disambiguators. If that
+    // happens, update this tool to use that option (if the `-d` flag is not set) instead of
+    // stripping them via the Regex heuristic. Then update the doc comments and help.
+
+    // Strip hashed hexadecimal crate disambiguators. Leading zeros are not included, and hash
+    // lengths can differ across platform/architecture types, so while 16 hex digits are common,
+    // they can also be shorter.
+    //
+    // Also note that a demangled symbol path may include the `[<digits>]` pattern, with zero-based
+    // indexes (such as for closures, and possibly for types defined in anonymous scopes). Preferably
+    // these should not be stripped.
+    //
+    // The minimum length of 5 digits supports the possibility that some target architecture (maybe
+    // a 32-bit or smaller architecture) could generate a hash value with a maximum of 8 digits,
+    // and more than three leading zeros should be extremely unlikely. Conversely, it should be
+    // sufficient to assume the zero-based indexes for closures and anonymous scopes will never
+    // exceed the value 9999.
+    let mut strip_crate_disambiguators = Some(Regex::new(r"\[[a-f0-9]{5,16}\]::").unwrap());
 
     let mut args = std::env::args();
     let progname = args.next().unwrap();
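The `{5,16}` bounds rest on simple length arithmetic that can be checked directly. The helpers below are hypothetical (none of these names exist in `rust-demangler`), and `hex_len` assumes a disambiguator is rendered like Rust's `{:x}` formatting, which omits leading zeros.

```rust
// Hypothetical helpers that check the length arithmetic behind `{5,16}`;
// none of these names exist in rust-demangler.
const MIN_DIGITS: usize = 5;
const MAX_DIGITS: usize = 16;

/// Hex digits in a hash rendered like `{:x}`, which drops leading zeros.
fn hex_len(hash: u64) -> usize {
    format!("{:x}", hash).len()
}

/// Whether a bracketed token of `len` characters falls inside the `{5,16}` bounds.
fn length_matches(len: usize) -> bool {
    (MIN_DIGITS..=MAX_DIGITS).contains(&len)
}

/// A decimal closure/scope index such as `[42]` uses only `0-9`, which are also in
/// `[a-f0-9]`, so it is preserved only while it is too short to match.
fn index_is_preserved(index: u32) -> bool {
    !length_matches(index.to_string().len())
}
```

A 32-bit hash such as `0x0001_2345` (three leading zeros) still renders as 5 digits and is stripped, while a fourth leading zero (`0x0000_1234`, 4 digits) would be missed. On the other side, index `9999` survives but `10000` would be stripped, which is the trade-off the comment accepts.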
@@ -41,14 +94,19 @@ fn main() -> io::Result<()> {
 eprintln!("Usage: {} [-d|--disambiguators]", progname);
 eprintln!();
 eprintln!(
-"This tool converts a list of Rust mangled symbols (one per line) into a\n
+"This tool converts a list of Rust mangled symbols (one per line) into a\n\
 corresponding list of demangled symbols."
 );
 eprintln!();
 eprintln!(
 "With -d (--disambiguators), Rust symbols mangled with the v0 symbol mangler may\n\
-include crate disambiguators (a 16 character hex value in square brackets).\n\
-Crate disambiguators are removed by default."
+include crate disambiguators (a hexadecimal hash value, typically up to 16 digits\n\
+long, enclosed in square brackets)."
 );
 eprintln!();
+eprintln!(
+"By default, crate disambiguators are removed, using a heuristics-based regular\n\
+expression. (See the `rust-demangler` doc comments for more information.)"
+);
+eprintln!();
 std::process::exit(1)
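The first help-text change in this hunk adds a trailing `\` after `\n`. In a Rust string literal, a backslash at the end of a line skips the line break and all leading whitespace on the next line; without it, both become part of the string. The two hypothetical functions below contrast the before and after forms of that literal.

```rust
// Hypothetical illustration of the string-continuation fix; not code from the tool.

/// With `\n\`, the source line break and the next line's indentation are skipped,
/// so the text contains exactly one newline.
fn help_fixed() -> &'static str {
    "This tool converts a list of Rust mangled symbols (one per line) into a\n\
     corresponding list of demangled symbols."
}

/// Without the trailing `\`, the literal also contains the source line break and the
/// next line's indentation, printing a blank line followed by stray leading spaces.
fn help_before_fix() -> &'static str {
    "This tool converts a list of Rust mangled symbols (one per line) into a\n
     corresponding list of demangled symbols."
}
```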
