Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raw string literal escape depth is technically limited #50111

Closed
varkor opened this issue Apr 20, 2018 · 11 comments
Closed

Raw string literal escape depth is technically limited #50111

varkor opened this issue Apr 20, 2018 · 11 comments
Labels
A-diagnostics Area: Messages for errors, warnings, and lints E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion.

Comments

@varkor
Copy link
Member

varkor commented Apr 20, 2018

This is not entirely serious, because I can't imagine this plausibly happening in real-world code (generated or otherwise), but in Nightly (Stable has a higher threshold):

fn main() {
    r#[... repeat 2^16 times ...]#""#[... repeat 2^16 times ...]#;
}

fails to parse with:

error: expected one of `.`, `;`, `?`, `}`, or an operator, found `#`
 --> src/main.rs:3:65544
  |
3 |     r######...
  | 

This limit is not mentioned in the reference.

I think we should document the maximum number of #s. We could also reduce the number permitted: 255 should be enough for anyone (#49993 (comment)).

Alternatively, we could ignore it as being a non-problem. Feel free just to close the issue 😄

@varkor varkor changed the title Raw string literals escape depth is technically limited Raw string literal escape depth is technically limited Apr 20, 2018
@varkor varkor added A-diagnostics Area: Messages for errors, warnings, and lints A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools labels Apr 20, 2018
@varkor
Copy link
Member Author

varkor commented Apr 20, 2018

On the other hand, this should make for a fairly straightforward beginner's contribution to the compiler: adding a specific error message for too many #s (and possibly reducing the maximum to 255).
(And ideally accompanying the change with a corresponding update to the reference.)

If anyone wants to do this to get started, I'm happy to mentor it (assuming it hasn't been closed 😄).

@varkor varkor added E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. labels Apr 20, 2018
@steveklabnik
Copy link
Member

@rust-lang/lang do you want to make some kind of guarantee here or just leave it open for implementations ?

@dcao
Copy link

dcao commented Apr 24, 2018

This seems like a nice first issue for a beginner like me to contribute to, so I'd be happy to tackle this one (w/ some mentoring help)!

@varkor
Copy link
Member Author

varkor commented Apr 24, 2018

@CMDD: Great!

There are two places you'll need to modify in the parser for a custom error message:

'r' => {
let start_bpos = self.pos;
self.bump();
let mut hash_count: u16 = 0;
while self.ch_is('#') {
self.bump();
hash_count += 1;
}

(Which parses raw string literals.)
And:
fn scan_raw_byte_string(&mut self) -> token::Lit {
let start_bpos = self.pos;
self.bump();
let mut hash_count = 0;
while self.ch_is('#') {
self.bump();
hash_count += 1;
}

(Which parses raw byte string literals.)

For both, hash_count can be u8, and then you'll want to check the hash_count does not exceed the maximum value. If it does, you want to raise an error (fatal_span_ should work — there are examples in the same file) mentioning the limit.

Then you can lower the hash count field in the tokens themselves:

StrRaw(ast::Name, u16), /* raw str delimited by n hash symbols */
ByteStr(ast::Name),
ByteStrRaw(ast::Name, u16), /* raw byte str delimited by n hash symbols */

You might need to update some other variables for this to type check (./x.py check is helpful here, and 4d34bfd is a similar commit).

Finally, make a new UI test (in src/test/ui — there are lots of examples there) testing the new error. Let me know if you need any more pointers!

@eddyb
Copy link
Member

eddyb commented Apr 24, 2018

I don't understand why those use u16, when ast::Name aka Symbol is an u32 - unless the u16 gets reordered first?

@varkor
Copy link
Member Author

varkor commented Apr 24, 2018

@eddyb: I'm not sure about u32 vs u16, but it was recently changed from usize to u16 by this PR for space efficiency.

@eddyb
Copy link
Member

eddyb commented Apr 24, 2018

I think #49993 (comment) confirms my suspicions. I find it odd though, that @nnethercote relied on field reordering instead of putting the u16 field first (although that would break more code).

@nnethercote
Copy link
Contributor

@varkor: how did you find this -- from code inspection, or fuzzing, or something else?

@eddyb: why is it odd that I relied on field ordering? Is it not reliable?

@eddyb
Copy link
Member

eddyb commented Apr 24, 2018

@nnethercote It was just not immediately obvious why u32 "followed" by u16 was useful size-wise, nor is there a comment about the packing being done, in the source.
Also, we have stricter conditions (that I can never remember) for enum field reordering, so if you want to enforce a specific layout, I'd suggest #[repr(u16)] (or u8) and placing the u16s first.

@varkor Also note that there are other limitations in rustc, although most of them are u32 aka 4Gi.

@varkor
Copy link
Member Author

varkor commented Apr 24, 2018

@nnethercote: I think I saw your PR and noticed that the limit was (slightly) more reachable.

@eddyb: yeah, that's not surprising. I decided to report as in the vein of #40161 (in which the examples are also often unrealistic), but it's probably sensible just to ignore these in general.

@steveklabnik steveklabnik removed the A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools label May 28, 2018
@steveklabnik
Copy link
Member

Removing T-doc; this is basically an "implementation defined" kind of thing, and if we were to guarantee any length here, that would go on the reference. Leaving the issue open because it's tracking the diagnostic aspect.

bors added a commit that referenced this issue Jun 15, 2018
Add error message for using >= 65535 hashes for raw string literal escapes

Fixes #50111.
bors added a commit that referenced this issue Jun 15, 2018
Add error message for using >= 65535 hashes for raw string literal escapes

Fixes #50111.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-diagnostics Area: Messages for errors, warnings, and lints E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion.
Projects
None yet
Development

No branches or pull requests

5 participants