Formatting code is in conflict with unnamed_addr #58320

RalfJung · 2019-02-09T13:44:37Z

The code for passing arguments into format strings relies on equality of function pointers. This is in conflict with the fact that we set unnamed_addr on all functions.

Global variables can be marked with unnamed_addr which indicates that the address is not significant, only the content. Constants marked like this can be merged with other constants if they have the same initializer.

rust/src/libcore/fmt/mod.rs

Lines 295 to 301 in 618f5a0

    
           fn as_usize(&self) -> Option<usize> { 
        
               if self.formatter as usize == ArgumentV1::show_usize as usize { 
        
                   Some(unsafe { *(self.value as *const _ as *const usize) }) 
        
               } else { 
        
                   None 
        
               } 
        
           }

It might be possible for a user to write a function that gets merged with ArgumentV1::show_usize, and then cause bugs in the above code.

@oli-obk @eddyb

The text was updated successfully, but these errors were encountered:

oli-obk · 2019-02-10T10:48:23Z

These uses are impossible to abuse for users (this is an Argument internal API that is only used if it's a usize anyway). The invariants that make this possible are very subtle though, and we should totally document this better

@oli-obk

miri: give non-generic functions a stable address This makes Miri correctly handle format string parameters despite rust-lang#58320. Matching Miri PR: rust-lang/miri#626 r? @oli-obk

@oli-obk

miri: give non-generic functions a stable address This makes Miri correctly handle format string parameters despite rust-lang#58320. Matching Miri PR: rust-lang/miri#626 r? @oli-obk

@oli-obk

miri: give non-generic functions a stable address This makes Miri correctly handle format string parameters despite rust-lang#58320. Matching Miri PR: rust-lang/miri#626 r? @oli-obk

RalfJung · 2019-02-11T14:41:08Z

The invariants that make this possible are very subtle though, and we should totally document this better

I wonder which invariant you had in mind here?

oli-obk · 2019-02-11T16:21:52Z

We are only using the as_usize method in getcount, which is only used for the width and the precision fields. These must be usize anyway, so the as_usize function essentially double checks whether we are doing this for usize.

RalfJung · 2019-02-12T10:08:10Z

So you are saying format_args! actually ensures that width/precision fields are always created with from_usize?

oli-obk · 2019-02-12T10:15:24Z

That's my reading of the code, but I might have missed some stuff. The formatting code is very... frameworky

eddyb · 2019-02-13T12:44:18Z

Yeah, from_usize is used by format_args! in all cases as_usize will be called, the behavior of the latter for non-usize formatting arguments is impossible to observe without going through the unstable API.

But maybe we should benchmark (cc @anp) the cost of using an enum that relies on the null pointer optimization, i.e. Option where Some(fn_ptr) indicates formatting-only, while None indicates usize, which can be formatted (or not?), but also extracted.

eddyb · 2020-02-11T18:48:27Z

What if... we actually called the function?! And there was a Formatter field that the fmt::Display impl for usize would fill in if a flag was set, and return without printing anything?

Mark-Simulacrum · 2020-02-11T22:04:58Z

That would increase Formatter size (though it's only ever on the stack), so doesn't matter in practice that much. I would personally prefer not to go with the stashed approach.

We could switch the &'a Void reference to be *const Void and use the value of the pointer instead of reading the usize out from behind that reference.

An alternative is to embed the usize (only used for the count of width/precision) directly into the format argument, instead of indirectly referencing it via index. I think that might not be entirely trivial with today's implementation in the compiler, not sure. I believe this should be possible as the format argument is &usize, so dereferencing it early should be fine. I am not sure if the same argument can be both a width and a formatting argument. I believe the answer is no (otherwise the show_usize argument would conflict).

eddyb · 2020-02-11T22:22:25Z

I am not sure if the same argument can be both a width and a formatting argument.

I think that's ~~true~~ possible though (i.e. using the same argument for both), which is why I went with the show_usize hack.

Mark-Simulacrum · 2020-02-11T23:00:33Z

What's true? That it cannot be the same argument? show_usize implies that it can't, in which case we can do the change that I suggested (simplifying the runtime logic, too).

eddyb · 2020-02-11T23:21:06Z

@Mark-Simulacrum Sorry, no, I mean that the usize argument can also be used for as a regular argument, which is why show_usize's body is Display::fmt(x, f).

Mark-Simulacrum · 2020-02-11T23:22:09Z

Hm, but if you use it for a non-Debug/Display argument (e.g., LowerHex) then you're going to do the wrong thing? So we can't just have one show_usize.... unless we only deduplicate if it's specifically Display/Debug (equivalent for usize)....?

eddyb · 2020-02-11T23:26:45Z

I have no idea how this works, but this:

println!("{0:#0$x}", 20)

prints:

                0x14

Mark-Simulacrum · 2020-02-11T23:32:05Z

That expands to, which has a dedicated from_usize field for the same argument. Seems like we generate a separate arg for it...

    {
        ::std::io::_print(::core::fmt::Arguments::new_v1_formatted(
            &["", "\n"],
            &match (&20,) {
                (arg0,) => [
                    ::core::fmt::ArgumentV1::new(arg0, ::core::fmt::LowerHex::fmt),
                    ::core::fmt::ArgumentV1::from_usize(arg0),
                ],
            },
            &[::core::fmt::rt::v1::Argument {
                position: 0usize,
                format: ::core::fmt::rt::v1::FormatSpec {
                    fill: ' ',
                    align: ::core::fmt::rt::v1::Alignment::Unknown,
                    flags: 4u32,
                    precision: ::core::fmt::rt::v1::Count::Implied,
                    width: ::core::fmt::rt::v1::Count::Param(1usize),
                },
            }],
        ));
    }

The non-LowerHex code also expands to pretty much the same thing, so looks like we don't even deduplicate:

    {
        ::std::io::_print(::core::fmt::Arguments::new_v1_formatted(
            &["", "\n"],
            &match (&20,) {
                (arg0,) => [
                    ::core::fmt::ArgumentV1::new(arg0, ::core::fmt::Display::fmt),
                    ::core::fmt::ArgumentV1::from_usize(arg0),
                ],
            },
            &[::core::fmt::rt::v1::Argument {
                position: 0usize,
                format: ::core::fmt::rt::v1::FormatSpec {
                    fill: ' ',
                    align: ::core::fmt::rt::v1::Alignment::Unknown,
                    flags: 4u32,
                    precision: ::core::fmt::rt::v1::Count::Implied,
                    width: ::core::fmt::rt::v1::Count::Param(1usize),
                },
            }],
        ));
    }

Mark-Simulacrum · 2020-02-11T23:44:34Z

The hard part in changing this is that whereas before we could refer to arg0 twice, since both were within the match and it was guaranteed to be a reference, it'll be more painful to do so now -- in particular, we can only evaluate the expression once still (it could have side effects) so we would need a more complicated desugaring if we want to drop fmt::rt::v1::Count::Param in favor of always specifying a count exactly (if specifying it at all).

Mark-Simulacrum · 2020-02-11T23:49:17Z

{
    let count0: usize;

    ::std::io::_print(::core::fmt::Arguments::new_v1_formatted(
        &["", "\n"],
        &match (&20,) {
            (arg0,) => {
                count0 = *arg0;
                [
                    ::core::fmt::ArgumentV1::new(arg0, ::core::fmt::Display::fmt),
                ]
            },
        },
        &[::core::fmt::rt::v1::Argument {
            position: 0usize,
            format: ::core::fmt::rt::v1::FormatSpec {
                fill: ' ',
                align: ::core::fmt::rt::v1::Alignment::Unknown,
                flags: 4u32,
                precision: ::core::fmt::rt::v1::Count::Implied,
                width: ::core::fmt::rt::v1::Count::Exact(count0),
            },
        }],
    ));
}

I think this desugaring should work though. It preserves the evaluation order and keeps everything consistent from the perspective of the "real" arguments, I believe.

eddyb · 2020-02-12T00:48:46Z

@Mark-Simulacrum The problem with that is that the &[core::fmt::rt::v1::Argument] is really a &'static [_] from promotion (and used to be an explicit static), IIUC, so now you'd have a lot of stuff on the stack instead.

Mark-Simulacrum · 2020-02-12T00:52:49Z

Hm, interesting. We could have a separate (third) array for just the counts, I guess, and store them as &[usize] directly...

eddyb · 2020-02-12T01:09:52Z

I mean, I did consider that, but do you want to take the hit of 16 extra bytes on the stack even when not needed?

However, if we do that thing we've talked about, where the fn pointers are in 'static' memory while the references to the runtime data are separate, we would have to wrap accesses to the runtime data instead of having a safe &[fmt::ArgumentV1], so the runtime data could be a bunch of usizes followed by references, and the only cost would be the count of counts (ugh) in 'static memory.

jonas-schievink added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. C-bug Category: This is a bug. labels Feb 9, 2019

RalfJung mentioned this issue Feb 9, 2019

miri: give non-generic functions a stable address #58324

Merged

Mark-Simulacrum mentioned this issue Feb 11, 2020

Document unsafe blocks in core::fmt #69011

Merged

Mark-Simulacrum mentioned this issue Feb 16, 2020

Miscellaneous cleanup to formatting #69209

Merged

bors closed this as completed in e028f26 Feb 26, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Formatting code is in conflict with unnamed_addr #58320

Formatting code is in conflict with unnamed_addr #58320

RalfJung commented Feb 9, 2019

oli-obk commented Feb 10, 2019 •

edited

Loading

RalfJung commented Feb 11, 2019

oli-obk commented Feb 11, 2019

RalfJung commented Feb 12, 2019

oli-obk commented Feb 12, 2019

eddyb commented Feb 13, 2019

eddyb commented Feb 11, 2020

Mark-Simulacrum commented Feb 11, 2020

eddyb commented Feb 11, 2020 •

edited

Loading

Mark-Simulacrum commented Feb 11, 2020

eddyb commented Feb 11, 2020

Mark-Simulacrum commented Feb 11, 2020

eddyb commented Feb 11, 2020

Mark-Simulacrum commented Feb 11, 2020

Mark-Simulacrum commented Feb 11, 2020

Mark-Simulacrum commented Feb 11, 2020

eddyb commented Feb 12, 2020

Mark-Simulacrum commented Feb 12, 2020

eddyb commented Feb 12, 2020

Formatting code is in conflict with unnamed_addr #58320

Formatting code is in conflict with unnamed_addr #58320

Comments

RalfJung commented Feb 9, 2019

oli-obk commented Feb 10, 2019 • edited Loading

RalfJung commented Feb 11, 2019

oli-obk commented Feb 11, 2019

RalfJung commented Feb 12, 2019

oli-obk commented Feb 12, 2019

eddyb commented Feb 13, 2019

eddyb commented Feb 11, 2020

Mark-Simulacrum commented Feb 11, 2020

eddyb commented Feb 11, 2020 • edited Loading

Mark-Simulacrum commented Feb 11, 2020

eddyb commented Feb 11, 2020

Mark-Simulacrum commented Feb 11, 2020

eddyb commented Feb 11, 2020

Mark-Simulacrum commented Feb 11, 2020

Mark-Simulacrum commented Feb 11, 2020

Mark-Simulacrum commented Feb 11, 2020

eddyb commented Feb 12, 2020

Mark-Simulacrum commented Feb 12, 2020

eddyb commented Feb 12, 2020

oli-obk commented Feb 10, 2019 •

edited

Loading

eddyb commented Feb 11, 2020 •

edited

Loading