Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Niches #3334

Closed
wants to merge 49 commits into from
Closed

Niches #3334

wants to merge 49 commits into from

Conversation

joshtriplett
Copy link
Member

@joshtriplett joshtriplett commented Oct 23, 2022

Add support for niches in a stabilizable way.

Define them as narrowly as possible, using only support that already exists in
the compiler.

Provide an extensive section on future possibilities, covering many of the
additional ideas generated over the years.

Rendered

@joshtriplett joshtriplett added the T-lang Relevant to the language team, which will review and decide on the RFC. label Oct 23, 2022
text/3334-niche.md Outdated Show resolved Hide resolved
text/3334-niche.md Show resolved Hide resolved
text/3334-niche.md Outdated Show resolved Hide resolved
@BoxyUwU

This comment was marked as resolved.

@SimonSapin

This comment was marked as resolved.

@joshtriplett
Copy link
Member Author

What types can the non-ZST field have?

While we may need to narrow it, right now there isn't a specific restriction.

What is “the appropriate unsigned equivalent to use” for negative signed integers? I assume it’s the as conversion with the unsigned integer type of the same width, but the RFC doesn’t say.

The two's-complement representation; added some text to that effect.

What syntax is accepted in parameters of #[niche(…)]?

Unsigned integers. Added text to that effect.

If it’s only (ranges) of integer literals, how are those values mapped to non-integer types? (If any are allowed)

They correspond to the in-memory bit-pattern representation. For instance, a float value maps via the bits of the float representation.

text/3334-niche.md Outdated Show resolved Hide resolved
text/3334-niche.md Outdated Show resolved Hide resolved
text/3334-niche.md Outdated Show resolved Hide resolved
text/3334-niche.md Show resolved Hide resolved
text/3334-niche.md Outdated Show resolved Hide resolved
@joshtriplett
Copy link
Member Author

joshtriplett commented Oct 23, 2022

@BoxyUwU wrote:

The RFC should mention pattern based refinement types (sort of like enum variant types but much more usable) as an alternative way of giving a type niches:

struct Foo(u32 is 0..100); // Foo has niches for all values >= 100
struct Bar(i32 is 0..); // Bar has niches for all values < 0 
struct Baz(i32 is ..0); // Baz has niches for all values >= 0
enum MyEnum {
    A,
    B,
    C,
}
struct Blah(MyEnum is MyEnum::A | MyEnum::B); // Blah has a niche for the value of `MyEnum::C`

Added as an alternative. Let me know if that covers what you would like to see covered.

@joshtriplett
Copy link
Member Author

I think I've addressed all comments up to this point.

text/3334-niche.md Outdated Show resolved Hide resolved
text/3334-niche.md Outdated Show resolved Hide resolved
text/3334-niche.md Outdated Show resolved Hide resolved
text/3334-niche.md Outdated Show resolved Hide resolved
@chorman0773

This comment was marked as resolved.

@ChrisDenton

This comment was marked as resolved.

@thomcc

This comment was marked as resolved.

@JakobDegen
Copy link
Contributor

Cross posting Zulip thread

@oli-obk

This comment was marked as resolved.


Constructing a structure with a niche value, or writing to the non-ZST field of
such a structure, or obtaining a mutable reference to such a field, requires
`unsafe` code. Causing a type with a niche to contain an invalid value (whether
Copy link
Member

@scottmcm scottmcm Oct 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm highly skeptical of this, because it feels like it's repeating the same mistakes that rust ended up having to spend multiple years fixing for repr(packed) (culminating finally in rust-lang/rust#102513).

Namely, you now have something supposedly "typed" as u32, but it's not really a u32 at all, because it has extra constraints. Sure, you can say that you need to be an unsafe block to get a &mut u32 to it, but that doesn't fix the "this is the wrong place to put the proof obligation" problem. Obviously this isn't as bad as mem::uninitialized, but it has the same structural problems: asking someone to say "I promise that nothing after here will ever misuse this" is awkward, compared to only asking "when you construct a value of the type, you have to prove that it's in-range".

This seems awkward for MIRI to check, too. Because if you have a &mut u32, how is it going to know that it's illegal to write a 42_u32 through it? Where does the UB happen? Since we don't have typed memory, it seems like it would be unable to diagnose the write as UB, and would have to wait until something later read it, which seems suboptimal -- especially for types that are Copy and thus might not ever be read!

For example, is this UB?

#[niche(value = 42)]
#[derive(Copy, Clone)]
struct MeaninglessNumber(u64);

unsafe {
    let mut x = Some(MeaninglessNumber(10));
    let r: &mut u64 = &mut x.as_mut().unwrap_unchecked().0;
    *r = 42;
}

It feels to me like it ought to be UB -- conceptually it's setting the value to its niche, which is definitely not okay -- but operationally I don't see where we can diagnose the problem.

I'm very much in favour of letting people control niches, but I think u32-with-a-particular-niche and normal u32 need to be different types.


Edit: This seems to be the root of other comments on the RFC too, such as

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels to me like it ought to be UB

There's a lot of discussion of this in rust-lang/unsafe-code-guidelines#84. It's been a while but I think the consensus is that it's very hard to enforce, we don't gain much from making it UB (we can have validity enforced on typed copy and load), and there are valid use cases for it.

Copy link
Contributor

@JakobDegen JakobDegen Oct 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scottmcm thank you for bringing this up, it's important that this is discussed somewhere. However, there is a nice way to represent this in the opsem. The interesting minirust operation here is a "load" - the operation that turns a list of bytes into a value at the given type. For tuples (and structs), that is currently defined like this:

fn decode(Type::Tuple { fields, size }: Self, bytes: List<AbstractByte>) -> Option<Value> {
    if bytes.len() != size { throw!(); }
    Value::Tuple(
        fields.iter().map(|(offset, ty)| {
            ty.decode(bytes[offset..][..ty.size()])
        }).collect()?,
    )
}

Essentially, this is saying that a value consists of all the values of the fields, and that the load is DB if the loads of all the fields are DB. Under this proposal, I believe we would adjust it to be the following:

fn decode(Type::Tuple { fields, size }: Self, bytes: List<AbstractByte>) -> Option<Value> {
    if bytes.len() != size { throw!(); }
    Value::Tuple(
        fields.iter().map(|(offset, ty, niche)| {
            let field_val = ty.decode(bytes[offset..][..ty.size()])?;
            if Some(niche) = niche {
                if !niche.contains(field_val.unwrap_primitive_int()) {
                    throw_ub!();
                }
            }
        }).collect()?,
    )
}

This additionally checks that if the field has a niche annotation, then the value of that field is inside the niche range (this should also help explain why I feel so strongly about the fields getting niche annotations, not the type).

Of course this only matters when there is a memory operation at type Foo. Your example above does not contain any such memory operation, and so it has no UB. Even something like this would be permitted:

struct Foo(
    #[niche(range = 4..8)]
    u8,
);

let mut x: Foo = Foo(5);
x.0 = 0;
dbg!(x.0);

The memory operations are at type u8, and so there is no UB. If it were instead to say dbg!(x) then there would be a load at type Foo and we would have UB.

As far as I know, these semantics are completely reasonable from an opsem perspective. It of course remains to be answered if they are the semantics we want, but I can't think of any concrete reason we should disfavor them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @thomcc, that's a good read.

I feel like there's a difference between references and pointers, here, though.

For example, if I had to write that as

unsafe {
    let mut x = Some(MeaninglessNumber(10));
    let p = ptr::addr_of_mut!(x.as_mut().unwrap_unchecked().0);
    p.cast::<u64>().write(42);
}

then I agree that it's logical that there's no UB, same as if I'd done just x.as_mut().unwrap_unchecked() as *mut _.

Perhaps it's illogical, but I expect more restrictions from references, and expect to have to fall down to pointers to do squicky things like changing the variant of an option by writing through reference to the field.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is really about this RFC anymore, so moved to Zulip.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just don't think we should be giving out a &mut u64, as this RFC seems like it does, instead of a &mut Not42 .

Just rejecting giving out these references is also an option. But we if allow giving them out, we have to make it unsafe -- I hope that much is uncontroversial. :) (And that's what the current rustc-private niche attribute does.)

FWIW this is not quite as bad as packed since creating an &mut u64 reference that points to 42 is obviously perfectly fine in and of itself. It's only writes to that reference that are subtle, and that's where we hit rust-lang/unsafe-code-guidelines#84.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To the extent this is related to the RFC, a few points:

  1. I do think we should support giving out a &mut FieldType from unsafe code, precisely because this allows calling other functions that mutate a field that were designed to operate on the field type. Otherwise, if you want to mutate the field, you have to extract it, operate on it, and write the result back to it. This seems to me like a reasonable operation for unsafe code to have. This isn't at all like packed, as @RalfJung noted. With packed, you can end up with an unaligned reference; with niches you cannot.

  2. Even if we want to provide types that use const generics to represent value ranges, we need some means of creating those types. I would like to give the ecosystem the tools we're using, not just things we've built with those tools. (See the motivation of the RFC.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a more detailed discussion of mutable references. I'm going to mark this "resolved" for now, but feel free to un-resolve if there's anything missing.

Copy link
Member

@scottmcm scottmcm Oct 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unresolving because to me this isn't so much about about the UCG rules -- I agree it's not against the validity invariants to give &mut to the field here, and it's possible to use that without triggering UB in unsafe code -- but because the field type is still basically a lie.

Completely by happenstance, I just stumbled on an interesting example of how the "u32 but not really a u32" does weird things: rust-lang/rust#103556 (comment).

For the 2 years that git will conveniently trace history (https://github.com/rust-lang/rust/blame/e2267046859c9ceb932abc983561d53a117089f6/library/core/src/num/nonzero.rs#L47) and probably forever, we've been deriving PartialEq for the NonZero types. (EDIT: I traced it further and it's since before derive was called derive, which I think I'm good to call forever.) But what does derive do for a struct? Takes references to the fields and calls PartialEq using those references.

And voila, the type lie immediately impacts things. Rather than telling LLVM we're comparing non-zero types by emitting the !range metadata that would let it take advantage of the rustc_layout_scalar_valid_range_start -- as happens if you write the obvious implementation yourself https://rust.godbolt.org/z/Ye5xr8P8x -- it thinks it's loading ordinary integers and the resulting code has no range metadata. That missing metadata then, presumably, makes it hard for LLVM to optimize == on Option<NonZeroNN>, resulting in the linked PR to use specialization to try to fix it.

So then this very much is liked packed in the sense that if you want your derive macro to handle these types well then you'll have to look for it specifically in the macro and handle those cases differently (https://github.com/rust-lang/rust/pull/98446/files#diff-d1ddb1b94e4a7a467f9a396742ceaa3a94e7cdc96055c36a18b8116614682df1R965-R968).

Is that really worth it to be able to write

#[niche = 4]
struct MyType(u32);

instead of a normal newtype?

#[repr(transparent)]
struct MyType(NicheU32<4>);

Personally, I don't think so.

See also, for example, derive(Deserialize). For the #[niche] that's something that either won't work -- because it needs unsafe to construct -- or needs special handling in the derive macro. Whereas for the newtype it just needs a library implementation for the NicheU32 type, and I'll take normal rust library code over a bespoke path in the proc macro any day.

EDIT: Thanks, @RalfJung, I've corrected my phrasing in the first paragraph.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's sound to give out references here, in the same way that it's sound to give out &mut u32s to f32s

FWIW it is definitely not sound to give out &mut u32 to a NonZeroU32, in the sense defined here.

But it is possible to write UB-free code that takes an &mut u32 to a NonZeroU32.

@Lokathor
Copy link
Contributor

@jhpratt "niche" is a real english word that means "space for a thing", like an alcove in a wall, or a place in an ecosystem for a species. It is used within rust because it was already a real word.

@jhpratt
Copy link
Member

jhpratt commented Oct 27, 2022

@Lokathor I'm a native English speaker 🙂 I've just never heard it in a programming context, so I suspect others may be thrown off by its usage.

@RalfJung
Copy link
Member

Regarding the 'niche' terminology, also see this subthread -- I argue that the concept of a "valid range/set of values" is a lot more common in programming and we should design the interface around that.

@Pointerbender
Copy link

Just dropping a suggestion for the "Future possibilities" section of this RFC: Liquid Types :) (a.k.a. "Logically Qualified Data Types")

Instead of just providing a range of (in)valid values, it would be great if in the future we could also add match guards that can express more than just continuous ranges, similar to the match guards that exist for Rust pattern matching today (although for liquid subtyping, the guards should not be allowed to have any side-effects!). This has the advantage that it would become easier to reason about types and their optimizations, leading to less runtime checks if the compiler has support for this (i.e. better optimized binaries), as well as the simplification of formally verified programs (this could be useful for static analysis tools like Prusti and Cruseot - or even for an interpreter like miri, too). The above linked paper about liquid types claims a significant reduction in manual proof obligation thanks to liquid types (read: this could translate to Rust as having less unsafe in your code), and it was pretty successful for Ada as well, from what I heard (they call it predicated subtypes).

@tialaramex
Copy link

@joshtriplett This RFC feels like it was at least somewhat inspired by our discussion on LWN. I said there that I felt like the very simplest thing, just specifying a single niche, was very valuable. You obviously have a lot more experience of this process than I do, but I urge that this RFC tries to be uncontroversial even if that means it gets less done than a richer RFC might have.

@bestouff
Copy link
Contributor

Small datapoint: I perfectly know about niches, and was more or less expecting what's in the RFC, and even then when I read

#[niche(range = 2..)]
struct Bit(u8);

for a while I thought "but why is that value starting at 2 and called Bit ?" - despite the obviousness of this example.
So I know I'm bikeshedding but find the syntax unintuitive. I'd really prefer something like

#[niche(exclude = 2..)]
struct Bit(u8);

or something as explicit.

Comment on lines +466 to +468
- **`unsafe` fields**: If in the future Rust introduces `unsafe` fields,
declaring a niche could internally mark the field as unsafe, taking advantage
of the same machinery.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Project Safe Transmute is also very interested in unsafe fields! I think the RFC should perhaps make a stronger statement here, as we shouldn't have two mechanisms for doing the same thing for any extended amount of time.

If this RFC stabilizes prior to unsafe fields, then we should require that, on an edition boundary, uses of:

#[niche(value = 42)]
struct MeaninglessNumber(u64);

...are migrated to:

#[niche(value = 42)]
struct MeaninglessNumber(unsafe u64);

(or whatever the syntax happens to be).

Comment on lines +319 to +320
depends on the value of another field). It also would not work as well for
niches on a `union`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also would not work as well for niches on a union.

Can you elaborate on this?

Comment on lines +294 to +296
We could use a syntax based on patterns, such as `struct S(u8 is 0..=32);` or
`struct S(MyEnum is MyEnum::A | MyEnum::B)`. The `niche` attribute could be
forward-compatible with this, by generating the appropriate patterns.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to suggest another alternative: slightly extending the capabilities of enums.

Rust already has a stable mechanism for establishing language-level validity invariants: enums. I can, for instance, already define bool in stable Rust:

#[repr(u8)]
enum Bool {
    False = 0,
    True = 1,
}

...and I can even already define NonZeroU8 in stable Rust:

#[repr(u8)]
enum NonZeroU8 {
    x01=0x01, x02, x03, x04, x05, x06, x07, x08, x09, x0A, x0B, x0C, x0D, x0E, x0F, 
    x10, x11, x12, x13, x14, x15, x16, x17, x18, x19, x1A, x1B, x1C, x1D, x1E, x1F, 
    x20, x21, x22, x23, x24, x25, x26, x27, x28, x29, x2A, x2B, x2C, x2D, x2E, x2F, 
    x30, x31, x32, x33, x34, x35, x36, x37, x38, x39, x3A, x3B, x3C, x3D, x3E, x3F, 
    x40, x41, x42, x43, x44, x45, x46, x47, x48, x49, x4A, x4B, x4C, x4D, x4E, x4F, 
    x50, x51, x52, x53, x54, x55, x56, x57, x58, x59, x5A, x5B, x5C, x5D, x5E, x5F, 
    x60, x61, x62, x63, x64, x65, x66, x67, x68, x69, x6A, x6B, x6C, x6D, x6E, x6F, 
    x70, x71, x72, x73, x74, x75, x76, x77, x78, x79, x7A, x7B, x7C, x7D, x7E, x7F, 
    x80, x81, x82, x83, x84, x85, x86, x87, x88, x89, x8A, x8B, x8C, x8D, x8E, x8F, 
    x90, x91, x92, x93, x94, x95, x96, x97, x98, x99, x9A, x9B, x9C, x9D, x9E, x9F, 
    xA0, xA1, xA2, xA3, xA4, xA5, xA6, xA7, xA8, xA9, xAA, xAB, xAC, xAD, xAE, xAF, 
    xB0, xB1, xB2, xB3, xB4, xB5, xB6, xB7, xB8, xB9, xBA, xBB, xBC, xBD, xBE, xBF, 
    xC0, xC1, xC2, xC3, xC4, xC5, xC6, xC7, xC8, xC9, xCA, xCB, xCC, xCD, xCE, xCF, 
    xD0, xD1, xD2, xD3, xD4, xD5, xD6, xD7, xD8, xD9, xDA, xDB, xDC, xDD, xDE, xDF, 
    xE0, xE1, xE2, xE3, xE4, xE5, xE6, xE7, xE8, xE9, xEA, xEB, xEC, xED, xEE, xEF, 
    xF0, xF1, xF2, xF3, xF4, xF5, xF6, xF7, xF8, xF9, xFA, xFB, xFC, xFD, xFE, xFF, 
}

The above example is ridiculous, but it doesn't need to be. Rather than building out a parallel mechanism for defining validity ranges, I think we should seriously consider allowing ranged explicit discriminants. The above example would, then, collapse into something like:

#[repr(u8)]
enum NonZeroU8 {
    Value = 1..=u8::MAX,
}

In doing this, we avoid the situation of having two parallel language-level mechanisms for defining validity ranges.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this idea, actually -- we can start by requiring a repr and exactly one variant, but there's the obvious future possibility of eventually supporting things like

#[repr(u32)]
enum char {
    _Low = 0..=0xD7FF,
    _High = 0xE000..=0x10FFFF,
}

should we ever start supporting multiple niches in the same type.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already support disjoint regions in enums, we simply don't exploit all the niches. So maybe we could just allow ranges in enums then later add niches where profitable.

Copy link
Contributor

@oli-obk oli-obk Nov 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we save much complexity with adding ranges to enum discriminants vs adding entirely new type syntax. The complexity is not as much in the type, but in the construction of the type. It is not clear to me how we would initialize such enums without transmute.

Also, if we ever want to allow things like

match s {
    x @ 0..=1 => {
        /*logic goes here*/
        match x {
            0 => {}
            1 => {}
            // look no fallback needed
        }
    }
    _ => {}
}

We'll need something like pattern types anyway

@typecasto
Copy link

typecasto commented Nov 7, 2022

Apologies if this is bikeshedding, but value = (value that this type CANNOT be) and range = (range that this type CANNOT be) is a bit confusing to me. When I see this:

#[niche(value = 42)]
struct MeaninglessNumber(u64);

#[niche(range = 2..)]
struct Bit(u8);

I internally read it as "MeaninglessNumber's value can only equal 42" and "the range of Bit is anything above 2".

My suggestion is to swap = for !=:

#[niche(value != 42)]
struct MeaninglessNumber(u64);

#[niche(range != 2..)]
struct Bit(u8);

I think it makes it a bit more understandable, to me it translates as "MeaninglessNumber has a niche: the value cannot be equal to 42" or "Bit has a niche: it is not in the range 2..". My intuition says most people will be thinking of niches not as "places to fit another type's information" but more as "the acceptable values for this type", since the former is more of a compiler optimization, e.g. you rarely have to think about it, whereas the latter affects how you use and interact with the type in practice.

Edit: This RFC doesn't actually impact the type system.

@chorman0773
Copy link

chorman0773 commented Nov 7, 2022 via email

@SOF3
Copy link

SOF3 commented Nov 8, 2022

maybe it would be a fair compromise to just rename to exclude = 42 and exclude_range = 2..?

@bestouff
Copy link
Contributor

bestouff commented Nov 8, 2022

Why not just "exclude" and autodetect if it's a range ?

@typecasto
Copy link

Is there a rationale for sidestepping the type system here? I feel like knowing which values a type can and cannot hold is the type system's domain, so duplicating that responsibility elsewhere in the compiler doesn't seem like the best idea. It also feels like the requirement to use these types in unsafe blocks is a consequence of keeping the type system in the dark, rather than being because niches are inherently unsafe.

To phrase it differently, the type system makes it impossible (in safe rust) to construct an enum with an invalid variant, why not extend that functionality for this purpose, where the type system can enforce it? As it stands, this seems like it'd require an entirely new mechanism, one where all construction is impossible in safe rust.

Compare niches as they are in this RFC to something like the following, just a simple extension of the enum system to use non-ident variants, some syntax for making many variants at once, and some new derives for fallible and infallible conversion. (I'm not suggesting this exact system, just using it as an example.)

#[derive(TryFrom, Into)]
enum Bit as u8 {
    0,
    1
}
#[derive(TryFrom, Into)]
enum MeaninglessNumber as u64 {
    0..42, 43..
}
fn main() -> {
	let a: MeaninglessNumber = 250.into(); // Ok
	let b: MeaninglessNumber = 42.into(); // Error
	match a {
		0..42 => println!("smol"),
		43.. => println!("lorge")
	}
}

Because this is handled by the type system, these values would be checked at compile time, making it impossible (in safe code) to use an invalid variant, and, more importantly, possible to use at all. If you want zero-cost(?) unchecked conversions, std::mem::transmute can do it, but being unsafe is a choice, with a system like this you'd have the option to deal with these types in purely safe rust, and doing so might even help catch some bugs at compile time.

So, why not implement this as an extension to the type system?

@SOF3
Copy link

SOF3 commented Nov 29, 2022

A much simpler API (which probably already exists in the ecosystem, although without niche support) is to use const generics:

struct RangedU32<const MIN: u32, const MAX: u32>(u32);

enum char {
    Low(RangedU32<0, 0xD7FF>),
    High(RangedU32<0xE000, 10FFFF>),
}

@oli-obk
Copy link
Contributor

oli-obk commented Nov 29, 2022

A much simpler API (which probably already exists in the ecosystem, although without niche support) is to use const generics:

I attempted this in rust-lang/rust#103724, but it is a performance regression due to the extra field now existing for NonNull<T> which is inside Box and Vec and thus used a lot. LLVM requires extra work to avoid this extra field, and I haven't found a nonhacky way to avoid that yet.

@CraftSpider
Copy link

I really like the ideas raised of using enums for this, as they encode the concept of a set of valid values. It feels very natural to define niches in terms of enums with valid ranges, and I feel makes the type system very coherent: even builtin types, like u32 or char, fit into this mental model. All types can be imagined as either structs or enums, where structs combine types, and enums encode valid values. Bool is just enum bool { true, false }, and ints are just enum uX { value = 0..=X::MAX }.

(Pointers are kind of like struct Ptr<T> { addr: usize, provenance: CompilerMagic }, meaning this doesn't get rid of all type magic, but I think it still feels much better)

@NobodyXu
Copy link

NobodyXu commented Dec 1, 2022

Is there a rationale for sidestepping the type system here? I feel like knowing which values a type can and cannot hold is the type system's domain, so duplicating that responsibility elsewhere in the compiler doesn't seem like the best idea. It also feels like the requirement to use these types in unsafe blocks is a consequence of keeping the type system in the dark, rather than being because niches are inherently unsafe.

To phrase it differently, the type system makes it impossible (in safe rust) to construct an enum with an invalid variant, why not extend that functionality for this purpose, where the type system can enforce it? As it stands, this seems like it'd require an entirely new mechanism, one where all construction is impossible in safe rust.

Compare niches as they are in this RFC to something like the following, just a simple extension of the enum system to use non-ident variants, some syntax for making many variants at once, and some new derives for fallible and infallible conversion. (I'm not suggesting this exact system, just using it as an example.)

#[derive(TryFrom, Into)]
enum Bit as u8 {
    0,
    1
}
#[derive(TryFrom, Into)]
enum MeaninglessNumber as u64 {
    0..42, 43..
}
fn main() -> {
	let a: MeaninglessNumber = 250.into(); // Ok
	let b: MeaninglessNumber = 42.into(); // Error
	match a {
		0..42 => println!("smol"),
		43.. => println!("lorge")
	}
}

Because this is handled by the type system, these values would be checked at compile time, making it impossible (in safe code) to use an invalid variant, and, more importantly, possible to use at all. If you want zero-cost(?) unchecked conversions, std::mem::transmute can do it, but being unsafe is a choice, with a system like this you'd have the option to deal with these types in purely safe rust, and doing so might even help catch some bugs at compile time.

So, why not implement this as an extension to the type system?

That is only appliable to literals, it is more complex for compile-time constants to be verified.

For the former, maybe we can have something like C++'s operater literal so that we can write sth like 12_NonZeruU8.

And for the later, I would like Rust to support exception on const evaluation so that we can have NonZeroU8::new(12).unwrap().

@mejrs
Copy link

mejrs commented Dec 6, 2022

And for the later, I would like Rust to support exception on const evaluation so that we can have NonZeroU8::new(12).unwrap().

That is already supported. const Option::unwrap is available on nightly, and matching + panic works on stable.

@chorman0773
Copy link

chorman0773 commented Dec 6, 2022 via email

@lylythechosenone
Copy link

Here's something I stumbled upon relating to niches. On x86_64, we have two different kinds of pml2/pml3s, that change depending on the 7th bit. Both variants are the same size, but they have different behavior with regard to flags and bit locations. A simple implementation could look like:

#[repr(transparent)]
pub struct Pml2Entry4KiB {
    data: u64,
}
impl Pml2Entry4KiB {
    // 4KiB code
}

#[repr(transparent)]
pub struct Pml2Entry2MiB {
    data: u64,
}
impl Pml2Entry2MiB {
    // 2MiB code
}

pub enum Pml2Entry {
    Large(Pml2Entry2MiB),
    Small(Pml2Entry4KiB),
}

but obviously, this will not work. It would be nice if one could somehow specify which bit should be the discriminator, that way this could benefit from some abstraction. Currently, I just have to have a single struct with matches for the bit on every function.

This is certainly a very niche niche (ha get it), but I'd love to see it implemented.

@joshtriplett
Copy link
Member Author

@oli-obk successfully convinced me that an approach using const generics, rust-lang/rust#103724 , is better than the attribute approach. And he proved that it's feasible to implement. I'm closing this in favor of rust-lang/rust#103724 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.