asm! should automatically rewrite labels in assembly code #81088
Comments
Any reason to bother with |
Internally the assembler will turn numbered labels into |
@rustbot claim |
@asquared31415 Thanks for taking up this issue! I would suggest doing the rewriting in the backend in |
I already have an implementation working in the simple cases, and was about to submit a PR when I realized it is going to be a lot more work to fix the edge cases, and that I should claim these issues so someone else doesn't try to work on them too |
Reposting a comment that I left on #81570: Forcing all labels within an asm block to be local would certainly improve usability for the common case (so that people don't have to use local labels). However, it is legal in clang and gcc to define a label in an asm block and make that label a global symbol. And I've seen that used, on occasion, in real projects, including the Linux kernel. (For instance, suppose you want a global symbol pointing to a location that can be overwritten as a "probe point".) Rewriting would cause such code to silently "work" while making a global symbol with the wrong name, rather than catching the issue. I agree that such code is not especially safe, if you're not certain that the asm code will appear in the binary exactly once. (In C that's a little easier to guarantee.) It's safer in In the meantime, I personally would prefer to make a best-effort attempt to catch the use of non-local labels in I would prefer to just reject non-local labels. |
That approach would work, and perhaps be better for allowing future choices. We can always remove an error and make something work later, but if this causes weird behavior, it's a little harder to revert, especially if people rely on it. As discussion in #81570 progressed I have become more concerned about enforcing things that are wrong, or preventing the use of things that should work, so I support making them a compiler error for now and having a proper discussion about what Perhaps it would be a good idea to catch I think that if it is decided that we should error on global labels rather than rewrite, #81570 should be closed no matter the decision on the detection of |
There is a precedent for label rewriting: Clang does this for labels in MS-style inline asm (source), which is what inspired me to suggest this. Clang is a bit smarter than us since it uses LLVM MC to actually parse the asm as it is rewriting, but it also does a lot more than us (e.g. inferring operand & constraints). Also this LLVM support only exists for x86, so it wouldn't be suitable for us. I still think that automatic label rewriting is the best thing we can do at the moment. We should certainly lint against the use of |
Sure, but MS-style inline asm is overall quite a bit more magic than GNU-style inline asm. GNU-style inline asm has always been passed straight through to the assembler, except for making substitutions where substitutions are requested. In the case of GCC, or Clang with While rewriting labels doesn't directly affect that use case, it still violates the principle of transparency. Users expect assembly inside Also, if naked functions are stabilized, defining a global symbol within There's also the "cowboy code" concern mentioned by @joshtriplett. He said that ensuring asm code would appear only once is "a little easier to guarantee", but I don't think that's quite right. In both Rust and GNU C, it is impossible to guarantee inline assembly will be instantiated exactly once (not counting global inline assembly and perhaps naked functions), because the compiler is always allowed to duplicate code within a function. But Linux does it anyway, because that's their attitude toward a lot of compiler things. People might attempt the same in Rust. Personally, I'd prefer if rustc didn't go out of its way to break it; after all, if the compiler does decide to duplicate the asm, the result will be an assembler error, not UB. I suspect others might strongly prefer if rustc did go out of its way to break it. Regardless, it's a problem if it seems to work but does something unexpected due to the rewriting. Finally, this kind of semi-blind scanning of assembly code is impossible to make robust. For example, it's perfectly legal for a label to have the same name as an instruction ( |
I agree mostly with the assessment that the compiler shouldn't explicitly forbid things that may be legal. Especially with such a raw construct like inline assembly, outright forbidding things is probably bad unless it's actually an issue. I definitely think that the compiler should emit a warning for labels that could be bad, and require the user to explicitly disable the warning if that's really what they want. This would still allow the compiler to have issues, but I think that warning the user is desirable. I think that in the case of #79869 and |
Wild idea that I just thought of, what if we added extra syntax to asm!(foo: "jmp {foo}"); This avoids the need to parse the assembly code since labels are defined outside the asm string and references to labels use the placeholder syntax. We could then have a lint which recommends people to use this syntax if we find labels inside the assembly code. |
That solution then prevents the use of multiline strings that could have multiple labels. I do kinda like it, but it also seems bad for the reasons mentioned by comex regarding transparency. Inline assembly is not a Rust construct, and I don't know how much we should extract from the raw inline assembly. |
If an instance of this can't result in anything other than a compilation error then I think it's reasonable to merely document this and leave it as-is. Or is there some worse failure case that I'm not seeing? I'm not necessarily opposed to a warning here, but I would say that warnings must be actionable somehow. The user isn't capable of controlling whether or not the block gets duplicated, right? Therefore a warning here would really just mean "either silence this warning or don't use loop labels at all". And parsing/validating the assembly strings is something that the RFC seems to go to great lengths to avoid, which would make this a bit of an odd case. |
LLVM has been known to crash in some cases: #74262. The tricky part is that you only get a compilation error if the function containing the The reason I would like to have this issue resolved before stabilization is that the choice we make here will have a significant effect on the way people write inline asm. At the moment I'm still tending towards the solution that I came up with in #81088 (comment): asm!(foo: "jmp {foo}"); Labels in inline assembly are fundamentally different from labels in normal assembly because labels are local to the current asm block rather than global. |
Would this be paired with completely banning ordinary labels? After all, if you're going to parse the strings just to warn the user that they shouldn't be using labels, then perhaps an outright ban is better. |
Quick recap of 3 different options:

Option 1: do nothing
The only correct way to write inline assembly is to use GAS-style numbered local labels.
asm!("0: jmp 0b");

Option 2: automatic rewriting of labels
Named labels can be used inside inline assembly and get automatically rewritten to a unique name by the compiler.
asm!("foo: jmp foo");

Option 3: special syntax for labels
We introduce special syntax to asm!(foo: "jmp {foo}");
However there are several downsides:
|
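For readers who want to see Option 1 in running code, here is a minimal sketch of a countdown loop that uses a GAS numeric local label. It assumes an x86_64 target and the stable `std::arch::asm` macro. The function name `count_down` and the plain-Rust fallback for other architectures are illustrative only and do not come from the thread. The label `2` is used instead of `0` because, as far as I know, the Rust documentation advises against labels made up only of the digits 0 and 1 under Intel syntax on x86.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Decrements `n` until it reaches zero and returns the final value.
/// `n` must be at least 1, otherwise the counter wraps around first.
#[cfg(target_arch = "x86_64")]
fn count_down(mut n: u64) -> u64 {
    unsafe {
        asm!(
            // Numeric local label: may be defined many times in one object file,
            // so duplicating this block through inlining is harmless.
            "2:",
            "sub {n}, 1",
            // `2b` means "the nearest label 2 looking backwards".
            "jnz 2b",
            n = inout(reg) n,
            options(nomem, nostack),
        );
    }
    n
}

/// Portable stand-in so the sketch still compiles on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn count_down(mut n: u64) -> u64 {
    while n != 0 {
        n -= 1;
    }
    n
}
```

Because `2:` is a local label, the block can be emitted more than once without producing a duplicate symbol, which is the property the named labels in Options 2 and 3 would have to recover by other means.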
I would much rather be able to use more descriptive labels in inline asm; however, I don't think there's a way to do this without a lot of parsing of inline assembly, which I think should be avoided, and even if that were done, I don't think that it can be done perfectly. With this in mind, I think I am in favor of doing nothing, specifying in documentation that local labels are required and doing no parsing/rewriting. Declare use of other labels to be undefined behavior, because it depends on the exact way the compiler decides to emit things, and ideally get the crash on Windows with |
For Option 3, why not do it like this? asm!("{foo}: jmp {foo}"); Then you still have a single string literal. |
Keep in mind that |
FWIW, I think it's worth considering a hybrid approach: we could go with approach 1 for now, stabilize inline assembly with that approach, and consider whether we also want to support approach 3 in the future along with things like asm goto. |
Doing nothing is the option that breaks the least, and doesn't lock into any irreversible design decisions. Could possibly introduce a Rust lint or warning rather than an error for labels that can currently cause label duplications. |
I'm not sure that's true. People will try to use |
As a possible solution that might not involve parsing asm code, binutils' gas does sort of have a solution:
This isn't directly applicable to inline asm, but inline asm could plausibly be instantiated as a macro and an invocation of the macro. A macro argument or some other such trick could be used to pass the name of the macro or a per-macro unique id in. Option 3 could also IMO be improved a bit to be less magic: instantiate |
How about asm!("{foo}: jmp {foo}", foo = label);
|
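To make the placeholder idea concrete, here is a rough sketch of how label operands could be lowered with plain string substitution. The `foo = label` syntax is only a proposal in this thread. The helper `expand_label_operands` is hypothetical, and the `.Lasm_label_` prefix and `${:uid}` suffix are borrowed from the issue description rather than from any real compiler pass.

```rust
/// Hypothetical lowering step: replaces each `{name}` placeholder that was
/// declared as a label operand with a unique, assembler-local label name.
/// Placeholders for other operands are left untouched.
fn expand_label_operands(template: &str, label_operands: &[&str]) -> String {
    let mut out = template.to_string();
    for name in label_operands {
        // `{{` and `}}` are escaped braces, so this builds the text `{name}`.
        let placeholder = format!("{{{name}}}");
        // Builds `.Lasm_label_<name>${:uid}`; LLVM would expand `${:uid}` later.
        let unique = format!(".Lasm_label_{name}${{:uid}}");
        out = out.replace(&placeholder, &unique);
    }
    out
}
```

Since label names are declared outside the template string, this approach never has to guess which tokens in the assembly text are labels.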
This is how I planned for |
…nieu Lint against named asm labels This adds a deny-by-default lint to prevent the use of named labels in inline `asm!`. Without a solution to rust-lang#81088 about whether the compiler should rewrite named labels or a special syntax for labels, a lint against them should prevent users from writing assembly that could break for internal compiler reasons, such as inlining or anything else that could change the number of actual inline assembly blocks emitted. This does **not** resolve the issue with rewriting labels, that still needs a decision if the compiler should do any more work to try to make them work.
How about |
We already have this with the |
I don't know if there is interest in community votes, but I would like to see any option other than Option 1 (use numeric labels). Assembly code is hard enough to read without being able to use descriptive labels. |
Defining labels in inline assembly is broken because inlining, function cloning and other compiler optimizations may result in an `asm!` block being emitted more than once in the final binary. This can cause duplicate symbol errors or even compiler crashes (#74262). For this reason it is recommended to only use GAS local labels when using `asm!` (#76704).

To improve user experience when using `asm!`, rustc should automatically rewrite label names into a unique name. For example:

rustc should search for lines beginning with `<ident>:` to find label names and then replace all instances of `<ident>` in the assembly code with `.Lasm_label_<ident><unique id>`. The `.L` instructs the assembler that the label is local to the current compilation unit and should never be exported, even as a debug symbol. The unique ID can be generated by LLVM using `${:uid}` and is guaranteed to be unique within the current compilation unit.
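The rewriting described above can be sketched as a plain string transformation. This is an illustration under assumptions, not rustc's implementation: the helper names (`is_ident_char`, `find_labels`, `push_token`, `rewrite_labels`) are made up, and the identifier rules are a simplification of what an assembler accepts.

```rust
/// Characters treated as part of an identifier (simplified assumption).
fn is_ident_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_' || c == '.'
}

/// Collects names defined as `<ident>:` at the start of a line.
/// Numeric labels (`0:`) and names starting with `.` are skipped.
fn find_labels(asm: &str) -> Vec<String> {
    let mut labels: Vec<String> = Vec::new();
    for line in asm.lines() {
        let line = line.trim_start();
        if let Some(colon) = line.find(':') {
            let name = &line[..colon];
            let starts_ok = name
                .chars()
                .next()
                .map_or(false, |c| c.is_ascii_alphabetic() || c == '_');
            let known = labels.iter().any(|l| l.as_str() == name);
            if starts_ok && name.chars().all(is_ident_char) && !known {
                labels.push(name.to_string());
            }
        }
    }
    labels
}

/// Appends `token`, renamed if it matches one of the collected labels.
fn push_token(out: &mut String, token: &str, labels: &[String]) {
    if labels.iter().any(|l| l.as_str() == token) {
        out.push_str(".Lasm_label_");
        out.push_str(token);
        out.push_str("${:uid}"); // expanded by LLVM, per the description above
    } else {
        out.push_str(token);
    }
}

/// Rewrites every whole-token occurrence of a defined label.
fn rewrite_labels(asm: &str) -> String {
    let labels = find_labels(asm);
    let mut out = String::with_capacity(asm.len());
    let mut token = String::new();
    for c in asm.chars() {
        if is_ident_char(c) {
            token.push(c);
        } else {
            push_token(&mut out, &token, &labels);
            token.clear();
            out.push(c);
        }
    }
    push_token(&mut out, &token, &labels);
    out
}
```

With this sketch, `rewrite_labels("foo: jmp foo")` renames both occurrences of `foo`, while `"0: jmp 0b"` passes through unchanged. It also shows the fragility raised in the discussion: for `"nop: nop"` the instruction is renamed along with the label, because the scan cannot tell the two apart.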