Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

musl 1.2.x support #2088

Closed
wants to merge 3 commits into from
Closed

musl 1.2.x support #2088

wants to merge 3 commits into from

Conversation

kaniini
Copy link
Contributor

@kaniini kaniini commented Feb 28, 2021

Switch the CI to use musl 1.2.2 and fix any CI problems.

@rust-highfive
Copy link

r? @Amanieu

(rust-highfive has picked a reviewer for you, use r? to override)

@JohnTitor
Copy link
Member

Note that rust-lang/rust's CI also uses 1.1.24 currently (https://github.com/rust-lang/rust/blob/7e9a36fa8a4ec06daec581e23f390389e05f25e4/src/ci/docker/scripts/musl.sh#L28) and I think we should keep in sync with it.

musl 1.1 maintenance has for all practical purposes ended.  Accordingly, there is
no point in supporting both 1.1 and 1.2 in the libc crate, so follow the
time_t type transition to 64-bit.
- add padding members for musl 1.2
- ensure the padding members have an appropriate type (always c_int)
@kaniini
Copy link
Contributor Author

kaniini commented Feb 28, 2021

@JohnTitor it is a chicken + egg issue. we have to update both. i intend to update both.

@JohnTitor
Copy link
Member

JohnTitor commented Feb 28, 2021

Yeah I just left a comment in case you haven't known that.

@kaniini kaniini marked this pull request as ready for review February 28, 2021 22:03
@ericonr
Copy link

ericonr commented Mar 1, 2021

Switching to musl 1.2.x only works despite of #1848 because musl doesn't support symbol versioning, so the linker allows linking against the time32 compat symbols without issue. I have no idea how to properly solve this.

The switch to 1.2.x will also make rustup provided binaries incompatible with Void Linux (since we haven't updated yet), I think? Or well, potentially incompatible, if new functions start being used.

The ancient glibc used for builds is also unsupported (I think?), but it's used because it allows all sorts of systems to use the "universal" binaries without issue. Because of that, I'm not sure it's already a good time to update the libc used for builds (which, if all goes well in rust-lang/rust#82556, should be different from the runtime libc).

EDIT: that said, I think moving to enable y2038 support is probably more valuable, so I guess we should use this as motivation to do our own update.

@ericonr
Copy link

ericonr commented Mar 1, 2021

Hmm, I believe this also needs to change the function names... The redirects to time64 functions in musl are implemented via headers, but linking to clock_gettime will get you the time32 impl, while the time64 impl is: __clock_gettime64.

@richfelker
Copy link

If Rust code is going to pass the modern time-related data structures with 64-bit fields, it needs to honor (ideally derive somehow rather than hard-coding, but they're not going to change so you could just ship the table) the symbol name redirects from the libc headers. But if it's still passing 32-bit versions, it's fine, and will continue to be fine, to reference the legacy symbol names. As such I don't think anything is broken with regard to targeting musl 1.2.x right now. But there may be problems mixing Rust code with third-party C libraries that have already been built using the new types.

@kaniini
Copy link
Contributor Author

kaniini commented Mar 1, 2021

Hmm, I believe this also needs to change the function names... The redirects to time64 functions in musl are implemented via headers, but linking to clock_gettime will get you the time32 impl, while the time64 impl is: __clock_gettime64.

Yes, the redirection is not yet implemented. That allows for musl 1.1 to still be supported. But Void really should upgrade to 1.2 already.

@ericonr
Copy link

ericonr commented Mar 1, 2021

Yes, the redirection is not yet implemented.

So changing time_t definition, as this PR does, is wrong.

But Void really should upgrade to 1.2 already.

I don't disagree :)

@Amanieu
Copy link
Member

Amanieu commented Mar 1, 2021

Could we keep time_t as 32-bit and expose a separate set of functions which use time64_t?

@ericonr
Copy link

ericonr commented Mar 1, 2021

Could we keep time_t as 32-bit and expose a separate set of functions which use time64_t?

I don't see how this would help the situation any. Pushing the choice of using time64 symbols on programmers will make them almost entirely useless, and people will have to find some way of deciding which symbols to use, at which point it'd just be better if libc could give them the appropriate version directly.

This is somewhat related to #2061... Could the libc crate try to build binaries (for the target, since host information is independent) in build.rs which link to symbols introduced in new versions of a certain library, in order to determine its version? So in this case, something like __clock_gettime64. And then create a file which the rest of the headers use to determine what musl version should be targeted?

Ideally libc would be a wrapper around bindgen, IMHO, but that would require shipping headers for platforms whose information is instead contained in libc in hardcoded form.

@joshtriplett
Copy link
Member

Given that this seems to be a necessarily backwards-incompatible change, would it potentially make sense to coordinate this with a potential switch of the musl target from static to dynamic?

@ericonr
Copy link

ericonr commented Apr 5, 2021

It isn't necessarily backwards incompatible; I believe the BSDs have already requested some mechanism to support ABI incompatible versions of their own platforms. If such a thing was implemented, it'd be possible to carry the two versions "just fine". Given that the rust ecosystem seems to dislike feature tests (which would be the usual way of implementing such a mechanism), it will probably end up being backwards-incompatible.

Anyway, I think both of these incompatible changes are very different in what they touch (default linkage vs minimum supported version), so I don't see much of a reason to coordinate them. Could be convinced differently.

@joshtriplett
Copy link
Member

Then, rather than risking runtime incompatibility and difficult-to-debug brokenness (e.g. a -sys crate that's building Rust bindings to a C library that uses time_t in its interface, such as openssl-sys), I think it'd make sense to just start requiring musl 1.2 at compile-time and runtime. The target tier policy has a provision for how to upgrade the minimum requirements for a target; I think we should use that to bump the baseline requirement for the -musl targets to 1.2.

@ericonr
Copy link

ericonr commented Apr 25, 2021

Requiring musl 1.2 would make sense for (old - the eventual official riscv32 target should be legacy free in this sense) 32bit targets, but not for 64bit ones.

That said, how would you detect the version? Or would it be simply a contract?

@joshtriplett
Copy link
Member

Requiring musl 1.2 would make sense for (old - the eventual official riscv32 target should be legacy free in this sense) 32bit targets, but not for 64bit ones.

That's a good point; the ABI hasn't changed for 64-bit targets, so the baseline wouldn't need to change.

That said, how would you detect the version? Or would it be simply a contract?

We wouldn't need to detect the version. We would bind time-related symbols to the new 64-bit versions that don't exist in older versions of musl, and then we'd fail to link with older versions of musl.

@richfelker
Copy link

Exactly. That seems very reasonable to me, and it automatically does the right thing for 64-bit targets without encoding more than the minimal needed knowledge (the symbol redirections).

BTW is Rust already encoding that sort of redirection for glibc targets, where it's needed to use most of the standard unix and stdio file functions (since the legacy symbols on 32-bit archs have 32-bit off_t)?

@wesleywiser
Copy link
Member

wesleywiser commented Feb 9, 2022

I've read through this thread and I'm trying to get a sense for what needs to be done to move forward the upgrade to musl 1.2.2.

I believe this is the current state of this thread:

  • We will upgrade to musl 1.2.2 and require that version at runtime. Per RFC 2803 this requires approval from the compiler and release teams.

    • This is a breaking change for the i686-unknown-linux-musl target but not the x86_64-unknown-linux-musl target.
  • There was a suggestion to bundle this breaking change with a change for the musl targets from static to dynamic but consensus seems to be that we don't need to do these at the same time.

  • There was a suggestion to expose a time64_t type but consensus seems to be that we should not go down that route.

  • There is the question of handling the redirects musl provides for 32-bit compatibility. Is this still a concern if we've decided to make this a breaking change to the target?

    • Yes, we will change our function definitions to link with the new 64-bit symbols instead of the normal ones.

I'm sure I've missed something so please feel free to correct me. I'd like to help push this along if possible 🙂

@kaniini
Copy link
Contributor Author

kaniini commented Feb 9, 2022

There is the question of handling the redirects musl provides for 32-bit compatibility. Is this still a concern if we've decided to make this a breaking change to the target?

In general, legacy binaries will continue to run as expected. From a source-compilation POV, musl only provides the 64-bit versions by default. So, you should just be able to upgrade and make time_t 64-bit.

@joshtriplett
Copy link
Member

If we currently defaulted to dynamically linking musl, I would want to have more of a transition plan for requiring a newer version.

But since currently people would have to go out of their way to dynamically link musl, it seems unlikely that bumping our requirement forward will cause much grief.

@richfelker
Copy link

If we don't include the trailing padding on little-endian, then the size of timespec will shrink from 24 bytes to 16 bytes. Is there a different way you're thinking of to keep the size of the struct the same?

I'm confused. The size of struct timespec is 16 bytes on all musl archs (with time64). The form is 8 bytes of tv_sec followed by 8 bytes of padding together with tv_nsec. On 64-bit archs there is no padding because tv_nsec (long) is already 8 bytes. On 32-bit archs, tv_nsec (long) is only 4 bytes, and the 4 bytes of padding appear on whichever side makes the location of the significant bits line up with where they would be in a 64-bit word with the arch's endianness. If you're getting it as 24 bytes somehow, then Rust has a wrong definition for the type.

@richfelker
Copy link

  • Of the remaining 48 regressions, 27 of these are because timespec now contains private padding fields on i686-unknown-linux-musl where it previously didn't. As such, a timespec struct can no longer be created using struct literal syntax.

In C, creation of a struct literal timespec is not valid without designated initializers. I don't know what the Rust equivalent is, but it's fundamentally wrong to assume an ordering of the members or absence of any additional members. Code doing so is buggy. The spec is:

The <time.h> header shall declare the timespec structure, which shall include at least the following members:

time_t  tv_sec    Seconds. 
long    tv_nsec   Nanoseconds. 

(Emphasis mine.)

@Amanieu
Copy link
Member

Amanieu commented Nov 18, 2022

In C, creation of a struct literal timespec is not valid without designated initializers. I don't know what the Rust equivalent is, but it's fundamentally wrong to assume an ordering of the members or absence of any additional members. Code doing so is buggy. The spec is:

Rust only has one kind of initialization for structs, which is similar to C designated initializers. Except it requires all fields to be specified, whereas C will zero-initialized any omitted fields.

The "proper" solution would be to zero-initialize the struct and then fill in its fields one-by-one as done here. The downside is that this is much more verbose and requires some unsafe code.

@wesleywiser
Copy link
Member

Sorry, that's my bad. I was looking at x86_64 in godbolt and messed up the code while trying to get it to compile there. I see what you're saying now and that should work fine. I'll try that and see what the crater results look like.

@wesleywiser
Copy link
Member

I see what you're saying now and that should work fine.

So that actually does not work fine. I thought i64 on i686 has 8 byte alignment and therefore the overall structure would have 8 byte alignment, forcing the final, trailing padding bytes, but it only has 4 byte alignment. However, without the explicit trailing padding, timespec only ends up taking 12 bytes instead of 16.

@Amanieu I think that means we have to include the private padding field which will prevent use of the literal struct initialization syntax unless you have another idea how we can force sizeof(timespec) == 16 without private fields.

@Amanieu
Copy link
Member

Amanieu commented Nov 22, 2022

We can add #[repr(align = 8)] which will round the size of the structure up to 16 bytes.

EDIT: Actually that would still be wrong since the alignment of the struct in C is still 4. In that case I think we have no choice but to leave the padding in.

@richfelker
Copy link

Indeed, setting alignment to 8 does not work for archs without a full alignment requirement for 64-bit types, and of course doesn't solve the big endian case where the padding is before tv_nsec not after.

Maybe this is a sign that Rust should have [an extension to do?] default initialization like C does for unmentioned members. I think the core issue here is that the Rust feature is incompatible with structures that may have implementation-defined or internal/private fields that are not part of the API, and thus that application code is not allowed to mention by name.

@wesleywiser
Copy link
Member

wesleywiser commented Nov 22, 2022

We can add #[repr(align = 8)] which will round the size of the structure up to 16 bytes.

EDIT: Actually that would still be wrong since the alignment of the struct in C is still 4. In that case I think we have no choice but to leave the padding in.

Yeah I gave that a try earlier today and the CI flagged the mismatched alignments right away. I don't think we want to do that.

Maybe this is a sign that Rust should have [an extension to do?] default initialization like C does for unmentioned members. I think the core issue here is that the Rust feature is incompatible with structures that may have implementation-defined or internal/private fields that are not part of the API, and thus that application code is not allowed to mention by name.

I think usually what we'd do for that case is something like this:

pub struct Padding<T> {
    _: T // this field is not pub so it can't be used outside of the module
}

pub struct timespec {
    pub tv_sec: time_t,
    pub tv_nsec: ::c_long,
    pub __pad0: Padding<u32>, // this field is pub to allow struct initialization syntax to work
}

pub fn zeroed() -> timespec {
    timespec {
        // 0 init all fields
    }
}

You would then do something like timespec { tv_sec: 1234, tv_nsec: 5678, ..zeroed() } when you want to literally initialize a timespec value. ..zeroed() copies any unmentioned field values from the return value of zeroed().

@Amanieu would there be interest in trying to explore some kind of solution like that (perhaps using Default instead of dedicated zero-initialization functions)? We would still have to help migrate dependent crates to support that but it could be done backwards compatibly.

The other option I see would be to pick some of the most used crates impacted by this and switch them to the MemUninit::zeroed() approach like I did with libstd. That can also be done backwards compatibly.

A second crater run revealed that at least 97% of the 5749 regressions from the initial run are solely due to timespec having private fields. I suspect most of this comes down to a few key crates that are widely used on top of libc like nix. So migrating a few of those to either of those approaches could resolve most of the breakage.

@Amanieu
Copy link
Member

Amanieu commented Nov 22, 2022

I don't mind making the padding fields public to allow struct initialization. For the zeroed function, we would ideally have some variant of a Pod trait to indicate that the type can be safely zero-initialized. However this feels more like a long-term plan for libc 2.0.

In the short term I think it's best to do what you suggested:

The other option I see would be to pick some of the most used crates impacted by this and switch them to the MemUninit::zeroed() approach like I did with libstd. That can also be done backwards compatibly.

Just a slight nit: I prefer mem::zeroed() over MaybeUninit::zeroed().assume_init() since there's no uninitialized memory involved.

@thomcc
Copy link
Member

thomcc commented Nov 22, 2022

..zeroed() copies any unmentioned field values from the return value of zeroed().

Note that they'd still have to be public, and #[non_exhaustive] doesn't help either (due to how FRU syntax is desugared). But we don't strictly have to match names with what musl calls its unmentionable fields and what we call them, so I guess that's fine.

bors bot added a commit to nix-rust/nix that referenced this pull request Nov 28, 2022
1886: Update use of libc::timespec to prepare for future libc version r=asomers a=wesleywiser

In a future release of the `libc` crate, `libc::timespec` will contain private padding fields on `*-linux-musl` targets and so the struct will no longer be able to be created using the literal initialization syntax.

Update places where `libc::timespec` is created to first zero initialize the value and then update the `tv_sec` and `tv_nsec` fields manually. Many of these places are in `const fn`s so a helper function `zero_init_timespec()` is introduced to help with this as `std::mem::MaybeUninit::zeroed()` is not a `const` function.

Some matches on `libc::timespec` are also updated to include a trailing `..` pattern which works when `libc::timespec` has additional, private fields as well as when it does not (like for
`x86_64-unknown-linux-gnu`).

See also rust-lang/libc#2088

Co-authored-by: Wesley Wiser <wesleywiser@microsoft.com>
peyo-hd pushed a commit to peyo-hd/external_rust_crates_libc that referenced this pull request Dec 20, 2022
Musl has use a 64-bit time_t since 1.2.0, but rust still uses the old
32-bit time_t on 32-bit platforms.
rust-lang/libc#2088 will eventually fix this,
but until then use i64 for time_t on musl in our tree.  This will
break binary compatibility with rust crates built against musl outside
our tree, but those shouldn't get mixed with our tree.

Bug: 216192129
Test: m USE_HOST_MUSL=true host-native -k
Change-Id: Iaafd06e180514157015607be50f625cb0661e1b8
@wesleywiser
Copy link
Member

@kaniini Could you pull the changes from my musl-1.2 branch into this PR? I've rebased and verified that a bors run will be green.

After that, I think we're ready to start an FCP! 🙂

@JohnTitor
Copy link
Member

Closing in favor of #3068, thanks for the PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.