Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linker relaxation #157

Closed
rui314 opened this issue Sep 30, 2023 · 19 comments
Closed

Linker relaxation #157

rui314 opened this issue Sep 30, 2023 · 19 comments

Comments

@rui314
Copy link

rui314 commented Sep 30, 2023

Hi, I'm the original author of lld and am currently developing the mold linker. I think I have an idea to improve the control flow integrity mechanism proposed by this extension, so let me explain it below.

Problem

With this extension, the compiler is required to emit a lpad instruction at the beginning of a function if the function's address might be taken. This means the compiler has to emit a lpad at the beginning of every global function conservatively because it doesn't know if the address of the function is taken by another translation unit.

In reality, most global functions are not address-taken. This results in an excessive number of unnecessary gadgets in a program and also causes code bloat. I think that's a real issue, and we should fix it.

Proposal

The linker is in a position to be able to determine if a function's address is taken or not. If no address-taking relocation refers to a function, the function is called only directly, and vice versa. The linker should be able to remove the leading lpad instruction from a function if the function's address is not actually taken.

To make this possible, we need to emit a new marker relocation and a R_RISCV_RELAX relocation at the location of the leading lpad, so that the linker is able to tell if the leading lpad is expected to be removed if not necessary. And then the linker should remove the instruction and shrink the function by 4 bytes if the function's address was not taken.

I believe this mechanism should reduce the number of available gadgets in a program drastically.

What do you guys think?

@Yangff
Copy link

Yangff commented Sep 30, 2023

If it is accessed later by dlopen+dlsym ... say a pluginXX.so?

@rui314
Copy link
Author

rui314 commented Sep 30, 2023

A global function is put into the dynamic symbol table by default only when you are creating a DSO. If you are building an executable, a function is not visible from DSOs by default, and if the function's address was not taken within that executable, we can say that the function is not address-taken for sure.

In addition that, even if you are creating a DSO, you'd usually precisely control which function should be visible from other DSOs. There are many mechanisms for it -- e.g. --dynamic-list, --export-dynamic-symbol-list, etc. If a non-exported function's address was not taken within the DSO, we can say that the function is not address-taken for sure, too.

@ved-rivos
Copy link
Collaborator

I think this is a good idea. This should cause this optimization even if LTO was not used. Right?

@rui314
Copy link
Author

rui314 commented Sep 30, 2023

Right. LTO enables the compiler to do the whole program analysis, but the linker can naturally do the same without LTO in this case.

@rui314
Copy link
Author

rui314 commented Oct 1, 2023

What would be the next step? Should I write up a proposal as a change to the spec?

@ved-rivos
Copy link
Collaborator

I think this would be an addition to the psABI for Zicfilp and not to the ISA spec. @kito-cheng is working on the psABI extensions. For now we are accumulating the psAbi targeted documentation as issues in this repo. (see #151 and #112 ). I think attaching a proposal to this issue would be a good idea. As a next step if we should also include it in the prototype.

@rui314
Copy link
Author

rui314 commented Oct 2, 2023

It's not for RISC-V, but I've already implemented a prototype for x86-64. rui314/mold@17f0d85

@ved-rivos
Copy link
Collaborator

It's not for RISC-V, but I've already implemented a prototype for x86-64.

Thanks for sharing the link! If I understood this prototype right then it seems to be able to determine if the section is address take without needing a relaxation. What makes that possible for x86-64?

@kito-cheng
Copy link
Member

kito-cheng commented Oct 2, 2023

We don't have put anything in the psABI land yet since this spec still changing, and also toolchain prototyping are working on progress, but draft PR is welcome, BTW, I think this may put together with @sorear's linker optimization(optimized the jump target to symbol + 4 for direct jump), since those two optimization may have some interaction.

@rui314
Copy link
Author

rui314 commented Oct 3, 2023

@ved-rivos A new relocation is not absolutely necessary indeed. But by convention, the linker does not remove an instruction or replace it with a compact one unless the instruction is referred to by R_RISCV_RELAX. Another thing to consider is that the linker usually handles section contents as binary blobs; so a marker relocation would help it to identify that the leading 4 bytes is a landing pad instruction that can be removed. A future ISA extension might not want it, so doing it unconditionally without any marker may not be a good idea.

But it all depends on how we want to design it. We can make this relaxation an exception that it doesn't need any relocation, just emit a R_RISCV_RELAX with no corresponding relocation, or emit a pair of relocations. For x86-64, the feature has already been implemented, and we don't have such relocation, so there's no choice other than rewriting an endbr64 with a nop without any hints.

@ved-rivos
Copy link
Collaborator

@rui314 Thanks. It makes sense. I agree that being explicit with the relaxation is better and more robust.

@deepak0414
Copy link
Contributor

@rui314
dumb question (without understanding how much information linker with respect callbacks between objects).
Will this work with callbacks?

A <-- is compiled executable
B <-- Some .so which has exposed a function register_callback(struct foo *arg)

struct foo {
func_ptr fn_callback;
};

A has below in sources

struct foo abc={.fn_callback = A_global_function} <-- A_global_function being inside A
register_callback(&abc)

In this example, A_global_function is global but not taken in A object but is taken from B.

@ved-rivos
Copy link
Collaborator

The A_global_function had its address taken when it was used to initialize abc.fn_callback. This function would not be a candidate to be relaxed.

@deepak0414
Copy link
Contributor

thanks ved

@rui314
Copy link
Author

rui314 commented Dec 1, 2023

Correct. Specifically, if a symbol is referred to by any relocation other than R_RISCV_CALL or R_RISCV_CALL_PLT, the symbol is considered as address-taken. In your case, A_global_function would be referred to by R_RISCV_64 and therefore considered as address-taken.

@ved-rivos
Copy link
Collaborator

@rui314 - could you please provide a draft PR to the psABI repo?

@rui314
Copy link
Author

rui314 commented Jul 8, 2024

But we first need a proof-of-concept implementation, no?

@kito-cheng
Copy link
Member

PoC is requirement when merging psABI PR, but not required during creating PR :)

@ved-rivos
Copy link
Collaborator

@rui314 I will close this issue here and will request you to create a PR for the linker relaxation proposal in the psABI so it can be tracked in the right repo. Thank you so much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants