Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow remapping source path prefixes in debug output #38322

Closed
codyps opened this issue Dec 12, 2016 · 77 comments
Closed

Allow remapping source path prefixes in debug output #38322

codyps opened this issue Dec 12, 2016 · 77 comments
Labels
A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) C-enhancement Category: An issue proposing an enhancement or a PR with one. T-dev-tools Relevant to the dev-tools subteam, which will review and decide on the PR/issue.

Comments

@codyps
Copy link
Contributor

codyps commented Dec 12, 2016

In gcc & clang, the flag -fdebug-prefix-map=old=new allows changing the prefix of source files referred to in debug information.

Related to #34902, this allows one to avoid having the particular source directory that a file was built in affect the output object/executable/library contents.

It also allows source-level debugging to work in cases where the source code is installed by the debug packages to a location that differs from where it was built (debian & OpenEmbedded, at least, take advantage of this).

As an alternate, the program debugedit allows modifying the files after generation to adjust the paths (as far as I can tell, Fedora uses this, and potentially other rpm based distros).

@codyps
Copy link
Contributor Author

codyps commented Dec 12, 2016

cc @michaelwoerister

@michaelwoerister michaelwoerister added A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) T-tools C-enhancement Category: An issue proposing an enhancement or a PR with one. labels Dec 12, 2016
@michaelwoerister
Copy link
Member

Thanks for writing up the issue, @jmesmon!

@rust-lang/tools: Can any of you think of a reason not to add something like this as a -C flag?

@alexcrichton
Copy link
Member

Sounds good to me!

@infinity0
Copy link
Contributor

A -C flag is good, but sometimes we have seen that certain buildsystems or individual projects like to save compiler flags (i.e. -fdebug-prefix-map=<PATH>=.) into other parts of the overall build output, thereby making it depend on the build path, even if rustc itself supports this -C flag. But it's reasonable to assume that compiler flags will affect the output, and save this somewhere else for auditing.

Because of this, we recently submitted some patches to GCC to also support the same behaviour as debug-prefix-map, except via an environment variable that explicitly should not be saved to any build output. The patches are still pending, but I haven't yet received any significant negative comments on it, and I'll be pinging GCC again soon about it. It would be good if rustc could also support this same environment variable in the future.

Actually, I didn't know about debugedit before, thanks for bringing that up! It might allow us to normalise files outside of GCC or independently of any other compiler, I will have to look into that.

@sanxiyn
Copy link
Member

sanxiyn commented Dec 13, 2016

I implemented this and it seems to work. I tested by building src/test/run-make/reproducible-build in two different directories. With -C debug-prefix-map=`pwd`=. the output is reproducible, without it isn't.

(reproducible-build test itself only tests stable symbol naming, not bit-for-bit output. In the past Rust could produce different symbol names between runs(!): see #30330.)

@codyps
Copy link
Contributor Author

codyps commented Dec 13, 2016

Unclear if it's important for rust, but in gcc/clang multiple mappings are supported. This ends up being important in C/C++ due to #include pulling in files from different directories. If something similar can happen in rust allowing multiple maps would be useful there too.

Also, it could be a good idea to avoid the splitting on = rather than copying gcc's interface to allow = to be included in the old path (though including = in a path is unlikely, it'd be a good idea to avoid leaving behind that landmine if possible).

@sanxiyn
Copy link
Member

sanxiyn commented Dec 13, 2016

How do multiple mappings work? Are they applied in command line order? Then is the order significant? I think it must be, since a/b=c a=d applied to a/b results in c, but a=d a/b=c applied to a/b results in d/b.

@codyps
Copy link
Contributor Author

codyps commented Dec 13, 2016

In gcc, the handling of multiple debug_prefix_maps is to search the mappings last-on-cmdline to first-on-cmdline, applying the first prefix that matches. The last-to-first strategy is common in gcc (and other command line tools) as it is intended that later options are able to override earlier ones.

@infinity0
Copy link
Contributor

infinity0 commented Dec 13, 2016

My GCC patches linked above, includes modifying the existing GCC behaviour to split on the final = instead of the initial one. I think that is better as well, I can imagine someone wanting to map a path that contains a =, but less likely to map something to such a path.

(edit: previously mentioned a space character, that was for some other thing that I got confused with)

@sanxiyn
Copy link
Member

sanxiyn commented Dec 14, 2016

Thanks for answers! I will implement multiple mappings, last-to-first order, and splitting on the final = now.

@codyps
Copy link
Contributor Author

codyps commented Dec 14, 2016

I'd recommend avoiding splitting on = entirely. Is the cmdline interface of rustc flexible enough to handle either having an argument take 2 parameters or allow 2 flags to work together to implement the same thing? (for example, -C debug_prefix_old=foo -C debug_prefix_new=bar and enforcing ordering + pairing).

It would be a really good idea to keep all of the escaping/special characters in callers of rustc (shells, etc) just to avoid funny limitations like this (= being special).

@michaelwoerister
Copy link
Member

How about requiring the mapping information to be provided in a file (similar to ld's --version-script for example)? Would that be too clunky?
But it seems to me that this is a feature that's only used in specialized settings, so that would seem fine to me.

@codyps
Copy link
Contributor Author

codyps commented Dec 14, 2016

I'd prefer avoiding needing to use (temporary?) external files to configure this feature. I say temporary because debug src mappings aren't something like a target specification where it is a fixed, predetermined value for all platforms: these are something that depend on the build directory & depend on where the source is being mapped to (which is, in the debug source packaging case) is typically a path under /usr/src/package-name-version, and version is potentially adjusted quite a bit.

And one would need to know the escaping in that file format, so it doesn't simplify things wrt allowing arbitrary paths, it just moves them somewhere else.

@michaelwoerister
Copy link
Member

michaelwoerister commented Jan 18, 2017

So, it seems that discussion on this issue has stalled (here and over in #38348) for two reasons:

  1. It's not clear what the command interface should look like. Passing a map of paths without introducing additional escaping rules is harder than it sounds.
  2. The initial implementation in Implement debuginfo path remapping #38348 revealed that the semantics of remapping are more complicated than it might seem at first. Is remapping based on strings or logical paths? E.g. if I replace /abc/def with xyz, what happens when I encounter /abc/./def or /abc/../abc/def/? What about relative paths?

I want to move forward with this so I propose the following solutions:

  1. If no one has a clear, practical reason otherwise, I say we use the CLI as proposed by @infinity0 and myself: have pairs of -Zdebug-prefix-map-from=<...> and -Zdebug-prefix-map-to=<...>, that are matched up nth from to nth to. It is a bit verbose but doesn't require any additional escaping and can handle paths on all platforms.

  2. The semantics are a bit more complicated but I propose that debuginfo paths are generally normalized to not contain . or .. components and that remapping works on absolute versions of these normalized paths. This gives predictable results. UPDATE: Prefix matching works at directory name level, not at the path-string content (see example below).
    Some examples:

map: /abc/def -> /xyz

Absolute paths containing the prefix:
/abc/def/file1.rs -> /xyz/file1.rs
/abc/def/build/../file1.rs -> /xyz/file1.rs
/abc/def/./file1.rs -> /xyz/file1.rs
/abc/./def/file1.rs -> /xyz/file1.rs  // would not match with gcc
/abc/def/mod1/file1.rs -> /xyz/mod1/file1.rs 
/abc/def/mod1/./file1.rs -> /xyz/mod1/file1.rs 

Absolute paths not containing the prefix:
/std/file1.rs -> /std/file1.rs // no change
/std/./file1.rs -> /std/file1.rs // normalization
/std/build/../file1.rs -> /std/file1.rs // normalization

Relative paths containing the prefix:
(DW_AT_comp_dir=/abc/def/build, path=../file.rs) => /xyz/file1.rs
(DW_AT_comp_dir=/abc, path=./def/file.rs) => /xyz/file1.rs  // would not match with gcc
(DW_AT_comp_dir=/std, path=./mod1/file.rs) => /std/mod1/file1.rs

Mapping happens at the directory name level, no partial names allowed:
/abc/def-2/file1.rs -> /abc/def-2/file1.rs
/abc/def.rs -> /abc/def.rs

The formula that produces these results is:

fn debuginfo_path(p: Path, map: [(Path, Path)]) -> Path {
    let p = normalize(make_absolute(p))
    for (from, to) in map {
        // Exit on the *first* match, order determined by commandline option order
        // UPDATE option order is last to first, i.e. later CLI options overrule earlier ones
        if p.starts_with(from) {
           return p.replace_prefix(from, to)
        }
    }
    // No remapping done, but still normalized and absolute now
    p
}

Note that whether paths are later stored as relative to their DW_AT_comp_dir again is an independent question that I don't want to discuss here.

Thoughts? @jmesmon @infinity0 @sanxiyn @jsgf @rust-lang/tools

@codyps
Copy link
Contributor Author

codyps commented Jan 18, 2017

I don't see an issue with that so long as the "commandline option order" is defined as last-to-first. Doing this bit differently from other command line utilities doesn't buy us anything (unlike having 2 seperate args for from & to).

It also looks like the examples decide to fix the mapping at directory name level, but doesn't appear in the english description. I don't have anything in mind that would break that, but given that allowing matching partials would allow appending '/' to the end to get matching-full-path-elements, I'm not sure such a restriction is a good idea.

@michaelwoerister
Copy link
Member

so long as the "commandline option order" is defined as last-to-first

If there's precedent for that I'm fine with going last-to-first.

@michaelwoerister
Copy link
Member

I don't have anything in mind that would break that, but given that allowing matching partials would allow appending '/' to the end to get matching-full-path-elements, I'm not sure such a restriction is a good idea.

I think it's just simpler to use. You don't have to worry if you need to append a / to avoid accidental renamings.

@codyps
Copy link
Contributor Author

codyps commented Jan 18, 2017

If there's precedent for that I'm fine with going last-to-first.

This is the order used by gcc & clang for all of their "more than 1 & pick 1" options (ignoring special cases): debug-maps (as discussed earlier in this thread), optimization levels, debug info levels, include directories, etc.

@codyps
Copy link
Contributor Author

codyps commented Jan 18, 2017

I think it's just simpler to use. You don't have to worry if you need to append a / to avoid accidental renamings.

Sure, it's simpler. The issue is that it's also less flexible, and it's trivial to get the match-full-paths behavior from match-anything, but going the other direction is impossible. Again, I don't have a use case that would want partial matches, but the ease of supporting both should be considered.

@michaelwoerister
Copy link
Member

This is the order used by gcc & clang

I updated the description above to reflect this.

@michaelwoerister
Copy link
Member

Sure, it's simpler. The issue is that it's also less flexible, and it's trivial to get the match-full-paths behavior from match-anything, but going the other direction is impossible.

Yes, I know that string-based matching is more powerful. My argument is that it gives you more subtle ways to get it wrong without a clear benefit. Is it from=/abc/ to=/xyz/? from=/abc/ to=/xyz? from=/abc to=/xyz? But I don't have a strong preference. If someone says they absolutely want this, I'm fine with implementing it.

@jsgf
Copy link
Contributor

jsgf commented Jan 18, 2017

Several points:

I think splitting the option into two is very unpretty, and somewhat ambiguous - if they're just related by being adjacent, it seems like it raises a lot of questions:

  • What if from and to get separated by other options?
  • What if there isn't the same number of from and to options?
  • Can the from and to be reordered (ie, what if they appear as to from)?

In particular it means that any tools that's parsing/processing the commandline needs to know about this special case in order to avoid breaking it.

Normalizing paths by eliminating .. is dangerous if the path contains symlinks: foo/bar/../blat/lib.rs is not the same as foo/blat/lib.rs if bar if a symlink. I'm happy with matching at the directory component level (though string matching is strictly more general), but I think going beyond that is a bad idea.

Edit: I can't think of a problem with eliminating . though. Perhaps that would be useful.

Proposal:

Retain the -Zdebug-prefix-map=OLD=NEW syntax, but also add -Zdebug-prefix-map-separator=X such that subsequent (left to right on the command line) OLD and NEW mappings can be separated by X. This allows a the mappings to contain any character (but not every character), and the tool generating the command line can select a separator that doesn't break the path.

@infinity0
Copy link
Contributor

infinity0 commented Jan 18, 2017

I think @michaelwoerister's suggestions are cleanest; tools that want to add this value to a rustc flag probably don't want to look inside the value to search it and then select a separator.

I agree that matching only full path components are best and less likely prone to error. I am a little concerned that GCC differs a bit from this, but the rustc code example is indeed very simple and it might be possible to ask GCC to adopt a similar approach - I have to send them a patch anyways, I may add this as well. Even if they don't adopt it, we (in the interests of standardising this behaviour) could define a standard that defines a "minimal" mapping behaviour but leave it open saying "the tool might perform further additional mappings".

I am less sure about normalisation, because it has the potential to mess with the semantics of various fields. For example this:

(DW_AT_comp_dir=/abc, path=./def/file.rs) => /xyz/file1.rs  // would not match with gcc

what would you map DW_AT_comp_dir to? / or .? It's unclear, and it messes with the semantics, which is supposed to be "the working directory of the build command". There could be other fields that depend on the original meaning.

There are two cases:

  • We have a mapping for /abc, in which case both fields are reproducible, no need to normalise.
  • We don't have a mapping for abc, in which case I'd say that there is an issue (could even call it a bug) with the tool that sets these mapping flags - it knowingly set the PWD but gave a mapping only for a specific child of the PWD. In this case, I think it's better to leave the situation unaltered so that it's at least detectable by reproducers, rather than trying to do something fancy.

(edit: "other mappings" -> "further additional mappings")

@codyps
Copy link
Contributor Author

codyps commented Apr 19, 2017

I'd imagine that lto in gcc/clang could mirror (the path mapping issues with) rust's function inlining as in the lto case the functions to be inlined (if the compiler chooses to do inlining) would be from another translation unit.

@jsgf
Copy link
Contributor

jsgf commented Apr 20, 2017

I'm still digesting the above, but it looks good so far.

Wishlist request: an option to make paths/filenames in compiler messages the remapped version, rather than the input version.

@steveklabnik
Copy link
Member

Triage: there's been a ton of discussion on this issue, but I have no idea what the current state of it is. can anyone summarize?

@infinity0
Copy link
Contributor

#41555 was the tracking issue and it's closed now.

We are using --remap-path-prefix in Debian and it works to make ripgrep reproducible.

rustc itself is still not reproducible however, and I don't know why. That is tracked in #34902. There have been quite a few regressions.

@michaelwoerister
Copy link
Member

Yes, it's available on stable so I'm closing this issue.

@danakj
Copy link
Contributor

danakj commented Jul 20, 2021

Hello,

We in Chromium are working on integrating Rust into our distributed build system. We really appreciate the work done here, as reproducible builds are a requirement for such a plan. However, while this allows the binary output of rustc to be reproducible, we have noticed that the (fully resolved) command-line itself ends up not being so. This destroys our ability to cache build objects between bots and/or developers, as any change in the command-line requires a recompilation (such as changing optimization flags).

Clang has resolved this by providing a -fdebug-compilation-dir flag, in addition to the equivalent of rustc's --remap-path-prefix (as -fdebug-prefix-map in Clang).

This problem is discussed, in the context of Clang, in this blog post: https://blog.llvm.org/2019/11/deterministic-builds-with-clang-and-lld.html

We would like to propose a similar flag for rustc. For simplicity we'd propose similar naming and behaviour as Clang's, in order to reuse previous work.

rustc --debug-compilation-dir . would be exactly equivalent to rustc --remap-path-prefix $(pwd)=.

In terms of ordering and presidence, it can follow the exact same rules as multiple --remap-path-prefix arguments.

Would it be best to discuss this here, or to open another issue?

Thanks for looking.

@jsgf
Copy link
Contributor

jsgf commented Jul 20, 2021

@danakj I think its worth filing a new issue for this case.

Does --remove-path-prefix $(cwd)=. affect the generated crate if $(cwd) expands to different things? That would be surprising.

Or is it that your build system is taking the specific command-line into account and that's affecting the input side of your reproducability?

@danakj
Copy link
Contributor

danakj commented Jul 20, 2021

The build output is not affected by different $(cwd) there, as they all map to . but the caching mechanisms need to rebuild the ouput if any inputs change, and one of those inputs is the command line. So if my $(cwd) and your $(cwd) are different, we have different command lines, so different inputs, so we can't share build outputs.

@jsgf
Copy link
Contributor

jsgf commented Jul 20, 2021

So if my $(cwd) and your $(cwd) are different, we have different command lines, so different inputs, so we can't share build outputs.

Sure, but that's a specific property of how your build system computes cache keys right?

Edit: For example if you wrap rustc in a shell script and put the --remap-path-prefix $(cwd)=. there then it would solve this problem. (I'm not arguing that it's a better solution than adding the option, I just want to understand the problem domain.)

Manishearth added a commit to Manishearth/rust that referenced this issue Sep 14, 2021
…haelwoerister

Introduce -Z remap-cwd-prefix switch

This switch remaps any absolute paths rooted under the current
working directory to a new value. This includes remapping the
debug info in `DW_AT_comp_dir` and `DW_AT_decl_file`.

Importantly, this flag does not require passing the current working
directory to the compiler, such that the command line can be
run on any machine (with the same input files) and produce the
same results. This is critical property for debugging compiler
issues that crop up on remote machines.

This is based on adetaylor's rust-lang@dbc4ae7

Major Change Proposal: rust-lang/compiler-team#450
Discussed on rust-lang#38322. Would resolve issue rust-lang#87325.
Manishearth added a commit to Manishearth/rust that referenced this issue Sep 14, 2021
…haelwoerister

Introduce -Z remap-cwd-prefix switch

This switch remaps any absolute paths rooted under the current
working directory to a new value. This includes remapping the
debug info in `DW_AT_comp_dir` and `DW_AT_decl_file`.

Importantly, this flag does not require passing the current working
directory to the compiler, such that the command line can be
run on any machine (with the same input files) and produce the
same results. This is critical property for debugging compiler
issues that crop up on remote machines.

This is based on adetaylor's rust-lang@dbc4ae7

Major Change Proposal: rust-lang/compiler-team#450
Discussed on rust-lang#38322. Would resolve issue rust-lang#87325.
Manishearth added a commit to Manishearth/rust that referenced this issue Sep 14, 2021
…haelwoerister

Introduce -Z remap-cwd-prefix switch

This switch remaps any absolute paths rooted under the current
working directory to a new value. This includes remapping the
debug info in `DW_AT_comp_dir` and `DW_AT_decl_file`.

Importantly, this flag does not require passing the current working
directory to the compiler, such that the command line can be
run on any machine (with the same input files) and produce the
same results. This is critical property for debugging compiler
issues that crop up on remote machines.

This is based on adetaylor's rust-lang@dbc4ae7

Major Change Proposal: rust-lang/compiler-team#450
Discussed on rust-lang#38322. Would resolve issue rust-lang#87325.
Manishearth added a commit to Manishearth/rust that referenced this issue Sep 14, 2021
…haelwoerister

Introduce -Z remap-cwd-prefix switch

This switch remaps any absolute paths rooted under the current
working directory to a new value. This includes remapping the
debug info in `DW_AT_comp_dir` and `DW_AT_decl_file`.

Importantly, this flag does not require passing the current working
directory to the compiler, such that the command line can be
run on any machine (with the same input files) and produce the
same results. This is critical property for debugging compiler
issues that crop up on remote machines.

This is based on adetaylor's rust-lang@dbc4ae7

Major Change Proposal: rust-lang/compiler-team#450
Discussed on rust-lang#38322. Would resolve issue rust-lang#87325.
Manishearth added a commit to Manishearth/rust that referenced this issue Sep 15, 2021
…haelwoerister

Introduce -Z remap-cwd-prefix switch

This switch remaps any absolute paths rooted under the current
working directory to a new value. This includes remapping the
debug info in `DW_AT_comp_dir` and `DW_AT_decl_file`.

Importantly, this flag does not require passing the current working
directory to the compiler, such that the command line can be
run on any machine (with the same input files) and produce the
same results. This is critical property for debugging compiler
issues that crop up on remote machines.

This is based on adetaylor's rust-lang@dbc4ae7

Major Change Proposal: rust-lang/compiler-team#450
Discussed on rust-lang#38322. Would resolve issue rust-lang#87325.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) C-enhancement Category: An issue proposing an enhancement or a PR with one. T-dev-tools Relevant to the dev-tools subteam, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

9 participants