Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Formatting code is in conflict with unnamed_addr #58320

Closed
RalfJung opened this issue Feb 9, 2019 · 19 comments · Fixed by #69209
Closed

Formatting code is in conflict with unnamed_addr #58320

RalfJung opened this issue Feb 9, 2019 · 19 comments · Fixed by #69209
Labels
C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@RalfJung
Copy link
Member

RalfJung commented Feb 9, 2019

The code for passing arguments into format strings relies on equality of function pointers. This is in conflict with the fact that we set unnamed_addr on all functions.

Global variables can be marked with unnamed_addr which indicates that the address is not significant, only the content. Constants marked like this can be merged with other constants if they have the same initializer.

rust/src/libcore/fmt/mod.rs

Lines 295 to 301 in 618f5a0

fn as_usize(&self) -> Option<usize> {
if self.formatter as usize == ArgumentV1::show_usize as usize {
Some(unsafe { *(self.value as *const _ as *const usize) })
} else {
None
}
}

It might be possible for a user to write a function that gets merged with ArgumentV1::show_usize, and then cause bugs in the above code.

@oli-obk @eddyb

@jonas-schievink jonas-schievink added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. C-bug Category: This is a bug. labels Feb 9, 2019
@oli-obk
Copy link
Contributor

oli-obk commented Feb 10, 2019

These uses are impossible to abuse for users (this is an Argument internal API that is only used if it's a usize anyway). The invariants that make this possible are very subtle though, and we should totally document this better

pietroalbini added a commit to pietroalbini/rust that referenced this issue Feb 10, 2019
miri: give non-generic functions a stable address

This makes Miri correctly handle format string parameters despite rust-lang#58320.

Matching Miri PR: rust-lang/miri#626

r? @oli-obk
Mark-Simulacrum added a commit to Mark-Simulacrum/rust that referenced this issue Feb 10, 2019
miri: give non-generic functions a stable address

This makes Miri correctly handle format string parameters despite rust-lang#58320.

Matching Miri PR: rust-lang/miri#626

r? @oli-obk
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this issue Feb 10, 2019
miri: give non-generic functions a stable address

This makes Miri correctly handle format string parameters despite rust-lang#58320.

Matching Miri PR: rust-lang/miri#626

r? @oli-obk
@RalfJung
Copy link
Member Author

The invariants that make this possible are very subtle though, and we should totally document this better

I wonder which invariant you had in mind here?

@oli-obk
Copy link
Contributor

oli-obk commented Feb 11, 2019

We are only using the as_usize method in getcount, which is only used for the width and the precision fields. These must be usize anyway, so the as_usize function essentially double checks whether we are doing this for usize.

@RalfJung
Copy link
Member Author

So you are saying format_args! actually ensures that width/precision fields are always created with from_usize?

@oli-obk
Copy link
Contributor

oli-obk commented Feb 12, 2019

That's my reading of the code, but I might have missed some stuff. The formatting code is very... frameworky

@eddyb
Copy link
Member

eddyb commented Feb 13, 2019

Yeah, from_usize is used by format_args! in all cases as_usize will be called, the behavior of the latter for non-usize formatting arguments is impossible to observe without going through the unstable API.

But maybe we should benchmark (cc @anp) the cost of using an enum that relies on the null pointer optimization, i.e. Option where Some(fn_ptr) indicates formatting-only, while None indicates usize, which can be formatted (or not?), but also extracted.

@eddyb
Copy link
Member

eddyb commented Feb 11, 2020

What if... we actually called the function?! And there was a Formatter field that the fmt::Display impl for usize would fill in if a flag was set, and return without printing anything?

@Mark-Simulacrum
Copy link
Member

That would increase Formatter size (though it's only ever on the stack), so doesn't matter in practice that much. I would personally prefer not to go with the stashed approach.

We could switch the &'a Void reference to be *const Void and use the value of the pointer instead of reading the usize out from behind that reference.

An alternative is to embed the usize (only used for the count of width/precision) directly into the format argument, instead of indirectly referencing it via index. I think that might not be entirely trivial with today's implementation in the compiler, not sure. I believe this should be possible as the format argument is &usize, so dereferencing it early should be fine. I am not sure if the same argument can be both a width and a formatting argument. I believe the answer is no (otherwise the show_usize argument would conflict).

@eddyb
Copy link
Member

eddyb commented Feb 11, 2020

I am not sure if the same argument can be both a width and a formatting argument.

I think that's true possible though (i.e. using the same argument for both), which is why I went with the show_usize hack.

@Mark-Simulacrum
Copy link
Member

What's true? That it cannot be the same argument? show_usize implies that it can't, in which case we can do the change that I suggested (simplifying the runtime logic, too).

@eddyb
Copy link
Member

eddyb commented Feb 11, 2020

@Mark-Simulacrum Sorry, no, I mean that the usize argument can also be used for as a regular argument, which is why show_usize's body is Display::fmt(x, f).

@Mark-Simulacrum
Copy link
Member

Hm, but if you use it for a non-Debug/Display argument (e.g., LowerHex) then you're going to do the wrong thing? So we can't just have one show_usize.... unless we only deduplicate if it's specifically Display/Debug (equivalent for usize)....?

@eddyb
Copy link
Member

eddyb commented Feb 11, 2020

I have no idea how this works, but this:

println!("{0:#0$x}", 20)

prints:

                0x14

@Mark-Simulacrum
Copy link
Member

That expands to, which has a dedicated from_usize field for the same argument. Seems like we generate a separate arg for it...

    {
        ::std::io::_print(::core::fmt::Arguments::new_v1_formatted(
            &["", "\n"],
            &match (&20,) {
                (arg0,) => [
                    ::core::fmt::ArgumentV1::new(arg0, ::core::fmt::LowerHex::fmt),
                    ::core::fmt::ArgumentV1::from_usize(arg0),
                ],
            },
            &[::core::fmt::rt::v1::Argument {
                position: 0usize,
                format: ::core::fmt::rt::v1::FormatSpec {
                    fill: ' ',
                    align: ::core::fmt::rt::v1::Alignment::Unknown,
                    flags: 4u32,
                    precision: ::core::fmt::rt::v1::Count::Implied,
                    width: ::core::fmt::rt::v1::Count::Param(1usize),
                },
            }],
        ));
    }

The non-LowerHex code also expands to pretty much the same thing, so looks like we don't even deduplicate:

    {
        ::std::io::_print(::core::fmt::Arguments::new_v1_formatted(
            &["", "\n"],
            &match (&20,) {
                (arg0,) => [
                    ::core::fmt::ArgumentV1::new(arg0, ::core::fmt::Display::fmt),
                    ::core::fmt::ArgumentV1::from_usize(arg0),
                ],
            },
            &[::core::fmt::rt::v1::Argument {
                position: 0usize,
                format: ::core::fmt::rt::v1::FormatSpec {
                    fill: ' ',
                    align: ::core::fmt::rt::v1::Alignment::Unknown,
                    flags: 4u32,
                    precision: ::core::fmt::rt::v1::Count::Implied,
                    width: ::core::fmt::rt::v1::Count::Param(1usize),
                },
            }],
        ));
    }

@Mark-Simulacrum
Copy link
Member

The hard part in changing this is that whereas before we could refer to arg0 twice, since both were within the match and it was guaranteed to be a reference, it'll be more painful to do so now -- in particular, we can only evaluate the expression once still (it could have side effects) so we would need a more complicated desugaring if we want to drop fmt::rt::v1::Count::Param in favor of always specifying a count exactly (if specifying it at all).

@Mark-Simulacrum
Copy link
Member

{
    let count0: usize;

    ::std::io::_print(::core::fmt::Arguments::new_v1_formatted(
        &["", "\n"],
        &match (&20,) {
            (arg0,) => {
                count0 = *arg0;
                [
                    ::core::fmt::ArgumentV1::new(arg0, ::core::fmt::Display::fmt),
                ]
            },
        },
        &[::core::fmt::rt::v1::Argument {
            position: 0usize,
            format: ::core::fmt::rt::v1::FormatSpec {
                fill: ' ',
                align: ::core::fmt::rt::v1::Alignment::Unknown,
                flags: 4u32,
                precision: ::core::fmt::rt::v1::Count::Implied,
                width: ::core::fmt::rt::v1::Count::Exact(count0),
            },
        }],
    ));
}

I think this desugaring should work though. It preserves the evaluation order and keeps everything consistent from the perspective of the "real" arguments, I believe.

@eddyb
Copy link
Member

eddyb commented Feb 12, 2020

@Mark-Simulacrum The problem with that is that the &[core::fmt::rt::v1::Argument] is really a &'static [_] from promotion (and used to be an explicit static), IIUC, so now you'd have a lot of stuff on the stack instead.

@Mark-Simulacrum
Copy link
Member

Hm, interesting. We could have a separate (third) array for just the counts, I guess, and store them as &[usize] directly...

@eddyb
Copy link
Member

eddyb commented Feb 12, 2020

I mean, I did consider that, but do you want to take the hit of 16 extra bytes on the stack even when not needed?

However, if we do that thing we've talked about, where the fn pointers are in 'static' memory while the references to the runtime data are separate, we would have to wrap accesses to the runtime data instead of having a safe &[fmt::ArgumentV1], so the runtime data could be a bunch of usizes followed by references, and the only cost would be the count of counts (ugh) in 'static memory.

@bors bors closed this as completed in e028f26 Feb 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants