Tracking issue for the linkage feature #29603

Open
aturon opened this issue Nov 5, 2015 · 23 comments
Labels
A-attributes Area: Attributes (`#[…]`, `#![…]`) A-FFI Area: Foreign function interface (FFI) A-linkage Area: linking into static, shared libraries and binaries B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. S-tracking-perma-unstable Status: The feature will stay unstable indefinitely. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@aturon
Member

aturon commented Nov 5, 2015

Tracks stabilization for the linkage attribute.

@aturon aturon added T-lang Relevant to the language team, which will review and decide on the PR/issue. B-unstable Blocker: Implemented in the nightly compiler and unstable. labels Nov 5, 2015
@aturon
Member Author

aturon commented Nov 5, 2015

cc @alexcrichton

@alexcrichton
Member

Currently this translates to the various linkage models of LLVM. For example, the values of the attribute can be:

match name {                                                         
    "appending" => Some(llvm::AppendingLinkage),                     
    "available_externally" => Some(llvm::AvailableExternallyLinkage),
    "common" => Some(llvm::CommonLinkage),                           
    "extern_weak" => Some(llvm::ExternalWeakLinkage),                
    "external" => Some(llvm::ExternalLinkage),                       
    "internal" => Some(llvm::InternalLinkage),                       
    "linkonce" => Some(llvm::LinkOnceAnyLinkage),                    
    "linkonce_odr" => Some(llvm::LinkOnceODRLinkage),                
    "private" => Some(llvm::PrivateLinkage),                         
    "weak" => Some(llvm::WeakAnyLinkage),                            
    "weak_odr" => Some(llvm::WeakODRLinkage),                        
    _ => None,                                                       
}                                                                    

Some worries about this are:

  • These are very LLVM-specific, and it's unclear how applicable they are to other backends.
  • Beyond external or weak, I've never seen a reason to use the other attributes.
  • These linkage methods are platform-specific and aren't guaranteed to work everywhere.

That being said, it's the only way to do weak symbols on Linux and it's also convenient for exporting a known symbol without worrying about privacy on the Rust side of things. I would personally want to reduce the set of accepted linkage types and then state "well of course linkage is platform-specific!"

@aturon
Member Author

aturon commented Nov 5, 2015

cc #29629

@mahkoh
Contributor

mahkoh commented Jan 11, 2016

This feature is fundamentally broken. Consider the following code:

/* C side */
static unsigned long B = 0;
unsigned long *A = &B;
void f(void) { }

// Rust side
#![feature(linkage)]

#[link_name = "c_part"]
extern {
    #[linkage = "extern_weak"] static A: *const usize;
    fn f();
}

fn main() {
    unsafe {
        f();
        println!("{:x} @ {:x} @ {:p}", *(*A as *const usize), *A, A);
    }
}

This prints 0 @ 5617765adaa8 @ 0x5617765ad890, meaning that the A seen by the Rust code contains the address of A and not A itself. It's also easy to get LLVM to abort with this.
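
The extra level of indirection described above can be modelled in stable Rust without #[linkage]. This is only a sketch: the locals stand in for the C symbols, and the helper name read_b_through is hypothetical.

```rust
/// Hypothetical helper: follows the extra level of indirection.
/// `rust_a` holds the address of the C symbol `A`, which in turn
/// holds the address of `B`.
fn read_b_through(rust_a: *const *const usize) -> usize {
    // SAFETY (assumed for this sketch): the caller passes pointers
    // to values that are still alive.
    unsafe { **rust_a }
}

fn main() {
    let b: usize = 0; // stand-in for C's `B`
    let c_a: *const usize = &b; // stand-in for C's `A = &B`
    let rust_a: *const *const usize = &c_a; // what Rust's `A` ends up holding

    // Same shape as the output above: value of B @ address of B @ address of A.
    println!("{:x} @ {:p} @ {:p}", read_b_through(rust_a), c_a, rust_a);
}
```

Reading B takes two dereferences here, whereas the declared type *const usize suggests one.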

@mahkoh
Contributor

mahkoh commented Jan 11, 2016

This attribute is also used incorrectly here.

@nrc nrc added the T-tools label Aug 17, 2016
@Mark-Simulacrum Mark-Simulacrum added T-dev-tools Relevant to the dev-tools subteam, which will review and decide on the PR/issue. and removed T-tools labels May 24, 2017
@brson
Contributor

brson commented Jun 1, 2017

#18804 needs to be resolved before stabilization.

@Mark-Simulacrum Mark-Simulacrum removed the T-dev-tools Relevant to the dev-tools subteam, which will review and decide on the PR/issue. label Jun 1, 2017
@Mark-Simulacrum Mark-Simulacrum added the C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. label Jul 22, 2017
@nagisa
Member

nagisa commented Jan 6, 2018

I think we should at least work on stabilising the weak and extern_weak linkage. Both are very useful and fairly widespread linkage options that seem to be supported (unlike many other options currently exposed) by at least all of the tier 1 targets.

To give an example of a use case for weak linkage in Rust code (I need weak linkage for libloading), one would be global statics that are unique even between multiple versions of the same crate. Consider that, currently, if a binary links log = ^0.3 and log = ^0.4 one way or another, it will end up having two distinct global loggers. This could trivially be resolved with some use of the weak linkage option (as it ensures, at link time, that there is only one instance of a global with the same name).

That being said, #[linkage] should always infect whatever it applies to with unsafety. Consider for example these two crates:

// crate older version
#[linkage="weak"]
static mut FOO: u32 = !0;
// crate newer version
#[linkage="weak"]
static mut FOO: char = 'a';

when linked together, all uses of FOO as seen from the newer-version crate would be UB if the linker happened to choose to link-in the value from the older version. This is functionally a transmute without size checks.
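
A minimal stable-Rust sketch of why that mismatch is a problem. It does not use #[linkage]; the helper name reinterpret_as_char is hypothetical and only checks whether the older crate's u32 bits would be a valid char for the newer crate.

```rust
/// Hypothetical helper: checks whether the bits of the older crate's
/// `u32` initializer would form a valid `char` for the newer crate.
fn reinterpret_as_char(bits: u32) -> Option<char> {
    // `char::from_u32` rejects values that are not Unicode scalar values.
    char::from_u32(bits)
}

fn main() {
    // The older crate initializes FOO to `!0`, i.e. `u32::MAX`.
    let old_value: u32 = !0;
    match reinterpret_as_char(old_value) {
        Some(c) => println!("valid char: {:?}", c),
        None => println!("{:#x} is not a valid char", old_value),
    }
}
```

The checked conversion fails for this value, but a weak static shared between the two crate versions would get no such check.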

@mahkoh
Contributor

mahkoh commented Jan 13, 2018

extern_weak is broken in such a dubious way that even though I pointed out two years ago that it was being used incorrectly in the stdlib, the bug still hasn't been fixed.

~$ cat strong.rs 
extern {
    static __progname: *const u8;
}

fn main() {
    unsafe {
        println!("  __progname:\t\t{:?}", __progname);
        println!(" &__progname:\t\t{:?}", &__progname as *const _);
        if !__progname.is_null() {
            println!(" *__progname:\t\t{:?}", *__progname as char);
        }
    }
}
~$ cat weak.rs 
#![feature(linkage)]

extern {
    #[linkage = "extern_weak"]
    static __progname: *mut *const u8;
}

fn main() {
    unsafe {
        println!("  __progname:\t\t{:?}", __progname);
        println!(" &__progname:\t\t{:?}", &__progname as *const _);
        if !__progname.is_null() {
            println!(" *__progname:\t\t{:?}", *__progname);
            if !(*__progname).is_null() {
                println!("**__progname:\t\t{:?}", **__progname as char);
            }
        }
    }
}
~$ rustc strong.rs 
~$ rustc weak.rs 
~$ ./strong 
  __progname:		0x7ffdfc7438ac
 &__progname:		0x7f10560a5360
 *__progname:		's'
~$ ./weak 
  __progname:		0x7f730b21f360
 &__progname:		0x5574a33bb008
 *__progname:		0x7fffe4ca38b2
**__progname:		'w'

gcc and clang handle this attribute correctly.

PS: The address of __dso_handle isn't actually too significant in __cxa_thread_atexit_impl and &__dso_handle is only a few bytes off.

PPS: Wow, looks like I already explained this above (also two years ago). Maybe the NSA is trying to keep this potential remote code execution unfixed. 🤔

@cramertj
Member

Ping @alexcrichton @nagisa what's the status here? Is there a bug here that can be solved / mentored?

@alexcrichton
Member

@cramertj I personally consider this a perma-unstable issue for now. In general symbol visibility and ABIs are something that historically rustc hasn't done much to specify and has had a lot of freedom over. We relatively regularly tweak ABIs, symbol names, etc. There's a very thin layer at the end (like a C ABI) which is pretty stable but even that gets sketchy sometimes (#[no_mangle] deep in a module hierarchy?).

I think we've benefitted quite greatly from the symbol visibility flexibility we've had historically in terms of compiler refactorings and heading off regressions. It's hard to introduce a regression when you can't rely on the feature in the first place!

Along those lines I think there's definitely some select use cases where using something like #[linkage] is critical, but from what I've seen they tend to be few and far between and somewhat esoteric. A blanket and general #[linkage] attribute I think is way too powerful to solve this use case and it'd be better to poke around at various motivational use cases to see if there's a more narrow solution.

(Plus, #[linkage] as a whole is incredibly platform/LLVM-specific, and I don't think we fully understand all the linkage modes in LLVM and how they apply to all platforms.)

@acmcarther

acmcarther commented Mar 28, 2018

Given that crate owners can't control how many instances of their crate will be included in a given binary, it seems that we really need a mechanism at least for weak linkage in stable Rust.

I got bitten by the fallout from #29603 (comment), where rust-libloading gained a cc compilation step to work around this missing feature.

@mjbshaw
Contributor

mjbshaw commented Apr 25, 2018

I would love some mechanism to merge statics and variables with the same value (and name, possibly).

Consider the following code:

// In my actual code, this is a more complicated proc macro.
macro_rules! special_number {
  ($value: expr) => {
    {
      // In my actual code, these static variables also have
      // the special `export_name` of
      // "\x01L_special_number_<unique_id>", where
      // `<unique_id>` is a unique identifier to avoid
      // symbol conflicts.
      #[link_section = ".data,__custom_special_section"]
      static SPECIAL_NUMBER: usize = $value;
      &SPECIAL_NUMBER
    }
  };
}

extern {
  fn consume_special_number(value: &usize);
}

pub fn main() {
  unsafe {
    consume_special_number(special_number!(42));
    consume_special_number(special_number!(42));
    consume_special_number(special_number!(42));
  };
}

This will generate three different static variables. I would love it if I could get Rust to merge these static variables into one single static variable. Using a const doesn't work (it does merge the values, but you can't provide link attributes on a const).

There are two ways the merging could be performed:

  1. By value. Statics with the same value (that opt-in to merging) would be merged into a single static variable.
  2. By name. Statics with the same export_name (that opt-in to merging) would be merged into a single static variable (and wouldn't result in duplicate symbol errors).

I would prefer option 2 (merging by name).

Perhaps this is what the linkonce_odr linkage type is for, but using the same export_name causes

The linkonce_odr and weak_odr linkage types are similar to this, I think, but don't work (in Rust) for merging globals/statics within a single translation unit. Rust could either extend them or introduce a new linkage type that does ODR merging within translation units.

@jethrogb
Contributor

jethrogb commented Nov 8, 2020

@nvzqz probably at least some resolution to #31508

@jonas-schievink jonas-schievink added A-attributes Area: Attributes (`#[…]`, `#![…]`) A-linkage Area: linking into static, shared libraries and binaries labels Nov 25, 2020
@danielkeller

I have a couple of ideas that might help stabilize weak linkage.

First of all, the syntax. The problem with weak linkage is that its semantics are "the address of this variable might be null", which is not allowed in Rust, so the compiler puts the real symbol inside the first level of indirection, which is confusing. I propose, instead of an attribute, a type std::ffi::Weak<T>, which is only allowed at the top level on items in extern blocks, like so:

extern "C" {
    static foo: Weak<fn(usize) -> *const u8>;
}

This would be special-cased by the compiler in the same way it is now, where the symbol foo is of type extern "C" fn(usize) -> *const u8, and the variable foo is a handle to it, but it would make it much more clear what the actual type of the symbol is, since Weak is obviously not a C type. The interface of Weak would be something like

fn as_ref(&self) -> Option<&'static T>;
unsafe fn as_ref_unchecked(&self) -> &'static T;
unsafe fn as_mut(&mut self) -> Option<&'static mut T>;
unsafe fn as_mut_unchecked(&mut self) -> &'static mut T;

Second, the semantics. Instead of "whatever llvm's extern_weak does," Weak should be defined as:
On platforms that support it, an external symbol of type T and the name of the static item is created. If the symbol isn't present at link time or run time, no error is generated. If the program has loaded a dynamic library that defines the symbol, as_ref returns a reference to the symbol, otherwise it returns None. If the platform doesn't support this, a compile-time error is generated.

I think this is the behavior that people actually want, and is supported on Linux, OSX, and Windows. Getting it to work this way requires providing some flags to the linker (-U foo on OSX and /ALTERNATENAME:foo=null_foo on Windows), and it would be much easier and less error-prone for the compiler to do this than the programmer.
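
A rough stable-Rust sketch of the as_ref part of that interface, assuming Weak<T> is represented as a possibly-null address. Nothing here is an existing std API: the struct, its addr field, and the PRESENT static are hypothetical, and no linker is involved.

```rust
/// Hypothetical stand-in for the proposed `Weak<T>`: it simply wraps
/// the possibly-null address of the external symbol.
struct Weak<T: 'static> {
    addr: *const T,
}

impl<T: 'static> Weak<T> {
    /// Returns `Some` if the symbol was resolved, `None` otherwise.
    fn as_ref(&self) -> Option<&'static T> {
        // SAFETY (assumed for this sketch): a non-null `addr` points to
        // a `T` that lives for the rest of the program.
        unsafe { self.addr.as_ref() }
    }
}

// Stand-in for a symbol that some loaded library defines.
static PRESENT: u32 = 7;

fn main() {
    let resolved = Weak { addr: &PRESENT as *const u32 };
    let missing: Weak<u32> = Weak { addr: std::ptr::null() };
    println!("{:?} {:?}", resolved.as_ref(), missing.as_ref());
}
```

In the actual proposal the address would be filled in by the linker or loader rather than by hand.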

@joshtriplett joshtriplett added the S-tracking-perma-unstable Status: The feature will stay unstable indefinitely. label Dec 8, 2021
@joshtriplett
Member

Marking the overall linkage attribute as perma-unstable. We should review individual linkages (notably "weak") for stabilization, which may want to occur as a separate attribute or a value of a separate attribute.

@yodaldevoid
Contributor

Should a separate issue be opened to specifically discuss "weak" linkage (or any other desired linkage attributes), or should discussion of stable linkage attributes continue here?

@comex
Contributor

comex commented Mar 15, 2022

A small rant about weak linkage: It's really two or maybe three separate features stuffed into one. They have the same syntax in GNU extensions to C (__attribute__((weak))), and they use the same bit in ELF files (though not Mach-Os), but their semantics and use cases are different.

  1. A weak reference (LLVM's extern_weak) means "this symbol is allowed to not be defined; treat it as null". If the symbol is defined somewhere else, that definition does not have to be weak. On Darwin, this is sometimes used with OS APIs for backwards compatibility with old OS versions (somewhat outdated reference).

  2. A weak definition (LLVM's weak) means "there can be multiple copies of this symbol at (static) link time". Any given symbol name can have any number of weak definitions plus at most one strong definition. If there's a strong definition it wins (except sometimes it doesn't because static library semantics are weird). If there are only weak definitions, an arbitrary-ish one wins. In C++, inline functions are generated as weak since they're guaranteed to have a unique address. On Windows this is known as COMDAT or selectany.

  3. On Darwin only, a weak definition also sometimes implies "force the dynamic linker to pick one symbol with this name across all libraries in the process", in contrast to the default behavior where symbols with the same name in different dynamic libraries are independent of each other. (On ELF, the behavior is determined by factors including the presence of DF_SYMBOLIC and STB_GNU_UNIQUE, but weakness doesn't affect it.) It's not possible to turn this off using a LLVM linkage value.

I find this situation quite confusing. I'm not suggesting we should try to rewrite the terminology that platforms have established, but I think we can at least clearly differentiate between weak references and definitions. For example, in @danielkeller's suggestion, instead of Weak<T>, we could use WeakImport<T> or maybe WeakRef<T>. The linkage values are already different (weak versus extern_weak) but these could stand to have clearer names. And of course, documentation can help.

@thomcc
Member

thomcc commented Apr 11, 2022

Yeah, I think that we should avoid conflating these if we ever decide to expose this as a feature. Weak imports behave quite well on Darwin (where they're extremely widely used, usually via macros like __OSX_AVAILABLE, which switch to a weak import when your minimum target OS version isn't that recent).

That said, it's not clear what we need to do to make our implementation of them actually work -- #[linkage = "extern_weak"] on macOS does mark the symbol as N_WEAK_REF in the MachO, but macOS's linker still gets upset about it unless you also tell rustc to send the linker -U _symbol_that_may_be_weak (or -undefined dynamic_lookup, but we probably don't want to do that). Otherwise, while it is a weak symbol, it's not allowed to be undefined.

Oddly, everything actually works without it if the Rust code is built as a static library and linked into an XCode build. I haven't looked into it but this implies that... there's a workaround in XCode for the linker's behavior? If so, this would be pretty hairy, to be honest, so frankly I hope it is not what's happening?

@thomcc
Member

thomcc commented Apr 12, 2022

Nope! That was wrong. It turns out that the way it works on Apple platforms is that the symbol must exist at link time (e.g. on the host system) or you get the undefined symbol error. This is orthogonal to it being a weak reference/import, which just indicates whether it's allowed to fail to resolve at run time. I suppose this might be to save people who try to weakly link against _get_entropy rather than _getentropy, and that sort of thing. Tragically, I doubt these are the semantics we'd want for this, since it's too host-specific.

XCode does seem to work around this, using some shenanigans with .tbd and .map files, although

@kupiakos
Contributor

kupiakos commented Aug 25, 2022

For now, can we open up the restrictions of #[linkage] to allow for Option<fn()> as well as *const T/*mut T? That's the natural way to describe a nullable function pointer. Right now, the only way to extern_weak a function is by declaring it as *const whatever and transmute: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=8b9866283aa51360db447f982b9839dd.

Update: the following now works:

extern "C" {
    #[linkage = "extern_weak"]
    static puts: Option<unsafe extern "C" fn(x: *const u8)>;
}

fn main() {
    let str = b"called puts\n\0";
    let p = unsafe { puts }.expect("puts isn't linked");
    unsafe { (p)(str.as_ptr()) }
}
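
As a side note on why Option<fn()> is a natural fit, here is a small sketch that runs on stable Rust without #[linkage]. The alias PutsFn and the helper name are hypothetical; the check relies on Rust's guarantee that Option of a function pointer uses the null value for None.

```rust
use std::mem::size_of;

// Hypothetical alias mirroring the signature used in the example above.
type PutsFn = unsafe extern "C" fn(*const u8);

/// Returns true if wrapping the function pointer in `Option` adds no
/// space, i.e. `None` is represented by the null pointer.
fn option_fn_is_pointer_sized() -> bool {
    size_of::<Option<PutsFn>>() == size_of::<PutsFn>()
}

fn main() {
    println!("{}", option_fn_is_pointer_sized());
}
```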

@mojingran
Contributor

mojingran commented Jun 19, 2023

I propose that the linkage attributes other than "weak" and "extern_weak" should be disabled, since they have no practical uses and, furthermore, some unsupported attributes cause ICEs, such as #109681.
I came up with this idea when I came across #109681 and was trying to solve it. I am wondering whether this is a good idea. If so, I can implement this change.
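
A rough sketch of what such a restriction could look like, written as standalone stable Rust rather than as actual compiler code. The Linkage enum, the parse_linkage function, and the error text are all hypothetical.

```rust
/// Hypothetical reduced set of linkage kinds, per the proposal above.
#[derive(Debug, PartialEq)]
enum Linkage {
    Weak,
    ExternWeak,
}

/// Hypothetical parser: accepts only the two retained names and
/// reports everything else as an error instead of passing it on.
fn parse_linkage(name: &str) -> Result<Linkage, String> {
    match name {
        "weak" => Ok(Linkage::Weak),
        "extern_weak" => Ok(Linkage::ExternWeak),
        other => Err(format!("unsupported linkage `{}`", other)),
    }
}

fn main() {
    for name in ["weak", "extern_weak", "common"].iter() {
        println!("{} -> {:?}", name, parse_linkage(name));
    }
}
```

Rejecting unknown names up front would turn cases like the "common" crash into an ordinary error.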

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Sep 8, 2023
Tests crash from inappropriate use of common linkage

Follows up my proposal under the [tracking issue for the linkage feature](rust-lang#29603 (comment)). Adds test for [issue 109681](rust-lang#109681).
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Sep 9, 2023
Rollup merge of rust-lang#113807 - mojingran:master, r=WaffleLapkin
