-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite local labels in inline assembly #81570
Rewrite local labels in inline assembly #81570
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @oli-obk (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
I have two concerns about this change.
|
I remember seeing that being done in some of the codebases using GCC inline assembly, but we're still in a position to call it UB or something along those lines. |
That's unfortunate. I couldn't find any information in the llvm docs or in the inline assembly rfc either confirming or denying that it was allowed. Though it makes sense that it's at least legal on llvm's side, since that's the cause of this whole issue anyway, that labels are global by default and llvm sees them as a redefinition. |
I have done some thinking on how to possibly solve the fact that this makes all labels local. The possible solutions that I can think of are:
|
r? @nagisa feel free to pingpong back to me if you don't have capacity for this right now, just seemed like you know more about this than me :) |
I think it is fine to say labels are local to the given We can consider adding something for the global labels when and if somebody comes with well motivated use-cases and, probably, an RFC. |
@Amanieu you mention in this comment that the RFC specified only local labels shall be used, but then suggest localizing the labels in this one. Should that be considered the official position of the WG-inline-asm? If not, I perhaps would be more comfortable with a built-in diagnostic that prevents use of non-local labels in the first place, with suggestions on how the user could adjust their code to use the local labels instead. Denying that kind of code would also prevent accidental lock in into a design decision that we end up regretting later. That would also set us up for denying code that uses LLVM-specific features like… |
Labels defined in You can only define labels with a global scope in
The
I raised this suggestion in the @rust-lang/wg-inline-asm Any objections to adding this feature? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs some codegen tests that check that the correct string is passed to LLVM.
@@ -212,6 +213,10 @@ impl AsmBuilderMethods<'tcx> for Builder<'a, 'll, 'tcx> { | |||
} | |||
} | |||
} else { | |||
if let Some(label) = get_llvm_label_from_str(s) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A template string piece may consist of multiple lines (e.g. raw string literals). These strings could define more than one label.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah thanks, I didn't think of this case
('a'..='z').contains(&c) || ('A'..='Z').contains(&c) || '_' == c || '.' == c | ||
} | ||
|
||
// All subsequent chars are [a-zA-Z0-9_$.@] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer if we disallowed @
and $
in symbol names. We don't need to support all symbol names here, the user is free to shoot themselves in the foot if they choose to use a symbol name with weird characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer if we disallowed @ and $ in symbol names.
That should happen elsewhere, possibly in the same location parsing for formatting specifiers happens. Should also be a concern for a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main concern here is that these may not be valid characters in symbol names for all targets and may have other special meanings on those targets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I agree. I still think this is the the wrong place in the codebase to enforce what characters we allow in asm labels. Probably should happen as a separate PR too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't actually prevent the user from using such labels. We just don't provide automatic rewriting into local labels for them.
Also we're not handling all possible label definitions. For example labels can be defined after a ;
on targets where that is a statement separator.
We're not trying to handle all possible cases here since that would require us to effectively parse the asm, which we really want to leave to LLVM. We're just trying to make the most common types of labels work correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't actually prevent the user from using such labels. We just don't provide automatic rewriting into local labels for them.
Aha, I see the distinction you're trying to make now.
We're just trying to make the most common types of labels work correctly.
That seems like a recipe for a pretty poor experience with a very non-obvious cliff at which things may start failing in non-obvious ways. For instance, user with passing familiarity with Rust's asm!
labels being local could write something like this:
#[inline] // not always
pub fn peach() {
unsafe {
asm!("per$$imon: jmp per$$imon")
}
}
and it might very well work out in basic tests, but start failing once the function is released to the field. If we do go the route of making things automagic, its better they're magic all the time, not only sometimes.
that would require us to effectively parse the asm, which we really want to leave to LLVM
I believe just lexing would be sufficient, but I do agree that we'd rather not do that at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the very least I would like @
to be removed from the set of allowed characters for symbols. @
is used as a symbol modifier in x86 intel syntax: sym@gotpcrel
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to confirm what we want to accept as a valid identifier for the purposes of replacing labels. Denying symbols outright should be properly implemented somewhere else, preferably with a compiler warning or error. This code can reflect what symbols are accepted, but it will ignore other things. For example, @foobar:
will be replaced with @${:private}foobar${:uid}
if we choose to not acknowledge @
in labels.
Is the set of symbols we will accept in a label [a-zA-Z0-9_$.]
?
Additionally, do we want to accept labels that are strings like nagisa commented here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that for label renaming we should only support [a-zA-Z0-9_.]
, and only plain labels not quoted ones.
// Fixes https://github.com/rust-lang/rust/issues/74262 | ||
for label in labels.iter() { | ||
let ranges: Vec<_> = template_str | ||
.match_indices(label) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this checks if the character before the match is a non-identifier character. This means that if you have a label bar
then foobar
will be replaced with foo${:private}bar${:uid}
which is really not what we want.
// Valid llvm symbols: https://llvm.org/docs/AMDGPUOperandSyntax.html#symbols | ||
// First char is [a-zA-Z_.] | ||
fn llvm_ident_start(c: char) -> bool { | ||
('a'..='z').contains(&c) || ('A'..='Z').contains(&c) || '_' == c || '.' == c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
('a'..='z').contains(&c) || ('A'..='Z').contains(&c) || '_' == c || '.' == c | |
matches!(c, 'a'..='z' | 'A'..='Z' | '_' | '.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
matches!
is more concise, thanks
|| '_' == c | ||
|| '$' == c | ||
|| '.' == c | ||
|| '@' == c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A ?
mark is a valid continuation character according to clang: https://rust.godbolt.org/z/EsP4sW, though it does appear like a bug and is only intended to be allowed when emulating the microsoft cl
. Nevertheless it is something that LLVM currently will let through.
return Some(substr); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One way to get more arbitrary stuff into a asm label is to quote it as a string, like this:
void test(void) {
__asm__(".intel_syntax noprefix\n"
"\"01☺☺@banana.?@!\": jmp \"01☺☺@banana.?@!\"");
}
or in Rust
#![feature(asm)]
pub fn test() {
unsafe {
asm!(r#"
"☺helloworld?@!$%^&": jmp "☺helloworld?@!$%^&"
"#);
}
}
So that's something we should handle or forbid too.
|
||
/// Gets a LLVM label from a &str if it exists | ||
fn get_llvm_label_from_str(s: &str) -> Option<&str> { | ||
if let Some(colon_idx) = s.find(':') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use trim_start
to strip leading whitespace from the line.
Forcing all labels within an asm block to be local would certainly improve usability for the common case (so that people don't have to use local labels). However, it is legal in clang and gcc to define a label in an asm block and make that label a global symbol. And I've seen that used, on occasion, in real projects, including the Linux kernel. (For instance, suppose you want a global symbol pointing to a location that can be overwritten as a "probe point".) Rewriting would cause such code to silently "work" while making a global symbol with the wrong name, rather than catching the issue. I agree that such code is not especially safe, if you're not certain that the asm code will appear in the binary exactly once. (In C that's a little easier to guarantee.) It's safer in In the meantime, I personally would prefer to make a best-effort attempt to catch the use of non-local labels in EDIT: I posted this to the PR because I would prefer to just reject non-local labels, which would be a change to this PR. If we decide instead to reject |
Let's move this discussion over to #81088. Comments on the PR should be about reviewing the implementation, not the design or discussing whether we should do this. |
I'm not entirely sure what it would take for this PR to land, myself. It seems pretty clear that we'd want to have all inline assembly labels be local, but on the other hand implementing a inline assembly parser in rustc is also not really something I think we should do. Thinking about the ideal way to solve this, I think we'd want to add support for this kind of interpretation of the inline assembly labels to the backend (i.e. LLVM) itself. Then you wouldn't have to think hard about the corner cases such as label names being almost anything if quoted, or different targets supporting their own special flavour of syntax/identifiers. I'm still not sure about only handling the most common cases, primarily because it might feed into a decision to stabilize the inline assembly in its partially finished state. Once stabilized, I don't think we'll have any leeway in changing the behaviour anymore (except maybe on edition boundaries?) – users would come to rely on the labels being either global or local, and behaviour change in any direction would break compatibility. |
Perhaps we should just lint against labels for now until we have made a decision on how best to support labels in |
Yeah, banning/linting against label definitions that don't follow |
@asquared31415 this is waiting on you to reply to the above reviews. Thanks |
☔ The latest upstream changes (presumably #83864) made this pull request unmergeable. Please resolve the merge conflicts. |
Triage: Seems this is blocked on #81088 |
I think that for the time being we have decided to not pursue this path. I am willing to do the work if this (or a similar) approach is decided on. |
It is possible for certain
asm!
blocks to be duplicated and emitted more than once by the compiler through inlining or other optimizations. If a duplicatedasm!
block has a label, the compiler will error with ainvalid symbol redefinition
error. If the-g
flag is passed to the compiler, the compiler will abort due to a crash with no output. #74262 shows an example of one such error and crash. This bug also occurs if two separateasm!
blocks have the same label, preventing the reuse of simple labels likeloop:
.This PR aims to fix these issues by making all labels in
asm!
blocks local labels using LLVM's${:private}
and${:uid}
as suggested in #81088. Local labels do not suffer from the same crashes and errors that previously occurred.Closes #74262
Closes #81088
Addresses part of this comment in the inline assembly tracking issue