GDScript: Add raw string literals (r-strings) #74995

Merged (1 commit) on Sep 20, 2023


dalexeev
Member

@dalexeev dalexeev commented Mar 16, 2023

  • Add support for r-strings to the tokenizer.
    • The actual changes are small; most of the diff is one added indentation level (use "Hide whitespace" when reviewing).
    • The behavior is the same as in Python (it is not well described in the proposal).
  • Add syntax highlighting.
    • Known issue: After a line break, escape sequences are highlighted again (like in regular strings).
    • Bonus: Fixed highlighting of Unicode sequences in regular strings.

Closes godotengine/godot-proposals#5362.
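To make the tokenizer change concrete, here is a minimal Python sketch of how a Python-style r-string can be scanned. It is not the code from this PR: the function name `scan_raw_string` and its return shape are hypothetical, and it only illustrates the rule that a backslash keeps the following character from terminating the literal while both characters stay in the value.

```python
def scan_raw_string(src: str, start: int) -> tuple[str, int]:
    """Scan an r-string whose 'r' prefix is at src[start].

    Returns (value, index just past the closing delimiter).
    Hypothetical helper for illustration, not Godot's tokenizer.
    """
    if src[start:start + 1] != "r":
        raise ValueError("expected 'r' prefix")
    pos = start + 1
    quote = src[pos:pos + 1]
    if quote not in ('"', "'"):
        raise ValueError("expected a quote after 'r'")
    # Use the triple-quoted form when the quote character appears three times.
    delim = quote * 3 if src.startswith(quote * 3, pos) else quote
    pos += len(delim)
    body_start = pos
    while pos < len(src):
        if src[pos] == "\\":
            # The backslash and the next character are kept verbatim,
            # and the next character cannot close the literal.
            pos += 2
            continue
        if src.startswith(delim, pos):
            return src[body_start:pos], pos + len(delim)
        pos += 1
    raise ValueError("unterminated raw string")


value, end = scan_raw_string('r"\\d+" rest', 0)  # source text: r"\d+" rest
print(value, end)
```

The value is a plain slice of the source, which is the whole point of a raw literal: no escape processing happens after the closing delimiter is found.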

@MewPurPur
Contributor

MewPurPur commented Mar 16, 2023

Code looks good to me. Implementation idk, I don't get if it's possible to have a raw string with both " and ' in it. Though I guess that's inherent to the concept.

> Known issue: After a line break, escape sequences are highlighted again (like in regular strings).

I don't think there's a neat way to solve this without making the highlighter remember things between lines. It might be solved by an extra region paintover, as in getting the string delimiters and then getting all slices starting with "r" + str_delim and ending with str_delim (not sure how to do this last part, I don't think we have a version of get_slice() with different starting and ending delimiters). It might be an overall simplification actually, since it removes the need for in_raw_string in the get_line_syntax_highlighting() func.
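One way to picture the "region paintover" idea is to find raw-string regions over the whole text instead of line by line. The sketch below is written in Python purely for illustration; `RAW_STRING` and `raw_string_spans` are hypothetical names, not part of Godot's highlighter, and the pattern ignores cases such as an `r"` that appears inside a comment or a regular string.

```python
import re

# Longest delimiters first so that triple quotes win over single ones.
DELIMITERS = ['"""', "'''", '"', "'"]
QUOTE = "|".join(re.escape(d) for d in DELIMITERS)

RAW_STRING = re.compile(
    r"(?<![A-Za-z0-9_])r(?P<q>" + QUOTE + r")"  # 'r' prefix and opening delimiter
    r"(?:\\.|(?!(?P=q)).)*"                      # backslash pair, or a char that does not start the closer
    r"(?P=q)",                                   # the same delimiter closes the literal
    re.DOTALL,                                   # let '.' cross line breaks
)


def raw_string_spans(source: str) -> list[tuple[int, int]]:
    """Return (start, end) offsets of every raw string literal in source."""
    return [m.span() for m in RAW_STRING.finditer(source)]


source = 'a = r"\\d+"\nb = r"""x\n\\n"""\n'
for start, end in raw_string_spans(source):
    print(repr(source[start:end]))
```

Because each span can cover several lines, a highlighter could repaint those spans as plain string color after the per-line pass, without tracking an "inside raw string" flag between lines.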

@dalexeev
Member Author

> Code looks good to me. Implementation idk, I don't get if it's possible to have a raw string with both " and ' in it. Though I guess that's inherent to the concept.

Yes, you can combine quotes of different types.
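Since the PR says the behavior matches Python, the Python equivalents below sketch a few ways to get both quote characters into one raw string. Whether each form behaves identically in GDScript is an assumption here, not something this thread confirms.

```python
# Triple quotes let both quote characters appear unescaped in the body.
both = r'''a "double" and a 'single' quote'''

# Two raw literals with different delimiters, joined explicitly.
joined = r'"' + r"'"

# A backslash keeps the quote from closing the literal, but stays in the value.
escaped = r"\"'"

print(both)
print(joined)
print(escaped, len(escaped))
```

The last form shows the trade-off: the quote is allowed, but the value gains a backslash.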

> I don't think there's a neat way to solve this without making the highlighter remember things between lines.

I think we can deal with this later; the highlighter has other issues, related to newlines and more. I think the main use case for r-strings is regular expressions.
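The regular expression use case is easy to see in Python, whose r-strings this PR follows. The example uses Python's `re` module rather than Godot's RegEx class, so treat it as an illustration of the literal syntax only.

```python
import re

# Without a raw string every backslash has to be doubled.
escaped_pattern = "\\d+\\.\\d+"

# With a raw string the pattern reads exactly as the regex engine sees it.
raw_pattern = r"\d+\.\d+"

# Both literals produce the same string value.
assert escaped_pattern == raw_pattern

matches = re.findall(raw_pattern, "v4.2 and 10.25")
print(matches)
```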

Contributor

@MewPurPur MewPurPur left a comment


Everything looks good, but what's up with having a bunch of navigation-related files in the changed files? Did you add them by accident?

@AThousandShips
Member

I don't see any navigation files in the changed files, what files are these?

@MewPurPur
Contributor

Oh I'm so sorry, I didn't realize I was looking at something else ;w;

@dalexeev
Member Author

How does it work?

In regular string literals, you can write special characters as escape sequences (\n, \t) and escape quotes, special characters, and the escape character itself (backslash). Thanks to this, you get a cleaner literal (escape sequences instead of literal whitespace, zero-width characters, etc.) and the ability to represent any string.

r-string literals have the advantage that you get the same string that you see in the source code between r" and " (or between r""" and """, same with single quotes). But you lose the ability to represent any string (some strings become unrepresentable, but they are almost never needed).
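The difference between the two kinds of literal can be checked directly in Python, which has the same semantics according to this PR. The Windows-style path is just a hypothetical example value.

```python
# Regular literal: \n and \t are turned into a newline and a tab.
regular = "C:\new\table"

# Raw literal: the value is exactly the text between the quotes.
raw = r"C:\new\table"

print(len(regular), len(raw))

# A regular literal can spell out characters that are invisible in source,
# such as a zero-width space, which a raw literal cannot do.
zero_width = "\u200b"
print(len(zero_width), len(r"\u200b"))
```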

For example, in the literal r"..." you cannot use a single ", since it is the terminator of the string literal. But if it is preceded by a backslash, then it is allowed (r"\""). In other words, "escaping" in r-strings exists, but the sequences are escaped into themselves. (Note that you can almost always use other quotes or triple quotes, although in general you still cannot represent an arbitrary string.)
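The "escaped into themselves" rule can be verified in Python. The helper `is_valid_literal` is a hypothetical name used only for this sketch; it asks the Python compiler whether a piece of source text is a complete expression, and the GDScript tokenizer is assumed, not confirmed, to reject the same inputs.

```python
# The backslash lets the quote through, but both characters stay in the value.
s = r"\""
print(len(s))

# A lone double quote is still reachable with the other delimiter.
lone = r'"'


def is_valid_literal(source_text: str) -> bool:
    """Return True if source_text compiles as a Python expression."""
    try:
        compile(source_text, "<literal>", "eval")
        return True
    except SyntaxError:
        return False


# Source text r"\" : the backslash consumes the closing quote, so it never ends.
print(is_valid_literal('r"\\"'))

# Source text r"\\" : fine, and the value is two backslashes.
print(is_valid_literal('r"\\\\"'))
```

A string that ends in a single backslash is an example of a value that no Python raw literal can represent.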

There is another type of raw string literal: R-strings in C++ or Nowdoc syntax in PHP. You can specify a sequence of characters that does not appear in a string, and terminate the literal with the same sequence (in PHP) or the reverse sequence (C++).

   R"(")"
//  <<^>>

   R"*()")*"
//  <<<^^>>>

   R"**()*")**"
//  <<<<^^^>>>>

But I don't think we need a solution as universal as C++ or PHP. String literals in GDScript are already very similar to Python, perhaps we should just copy them like we did in this PR?
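For comparison, the C++ form shown above can be decoded with a few lines of Python. In C++ the chosen delimiter text is repeated unchanged, so the opening `R"delim(` is mirrored by the closing `)delim"`. The function name `parse_cpp_raw` is hypothetical and the sketch skips the limits the C++ standard places on delimiter length and characters.

```python
def parse_cpp_raw(literal: str) -> str:
    """Return the contents of a C++ raw string literal such as R"**(...)**"."""
    if not literal.startswith('R"'):
        raise ValueError("expected R\" prefix")
    open_paren = literal.find("(", 2)
    if open_paren == -1:
        raise ValueError("missing '(' after the delimiter")
    delim = literal[2:open_paren]
    # The literal ends at the first ')' + delimiter + '"' after the opening.
    terminator = ")" + delim + '"'
    end = literal.find(terminator, open_paren + 1)
    if end == -1:
        raise ValueError("unterminated raw string")
    return literal[open_paren + 1:end]


# The three literals from the comment above.
for text in ['R"(")"', 'R"*()")*"', 'R"**()*")**"']:
    print(text, "->", parse_cpp_raw(text))
```

Each example picks a longer delimiter so that the body can contain what would otherwise be the terminator, which is the flexibility the Python-style r-string gives up.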

@adamscott
Member

> There is another type of raw string literal: R-strings in C++ or Nowdoc syntax in PHP. You can specify a sequence of characters that does not appear in a string, and terminate the literal with the same sequence (in PHP) or the reverse sequence (C++).
>
>    R"(")"
> //  <<^>>
>
>    R"*()")*"
> //  <<<^^>>>
>
>    R"**()*")**"
> //  <<<<^^^>>>>
>
> But I don't think we need a solution as universal as C++ or PHP. String literals in GDScript are already very similar to Python, perhaps we should just copy them like we did in this PR?

Why not both? Lowercase "r" would be the current implementation, and uppercase "R" would be the C++/PHP one?

@dalexeev
Member Author

> Why not both?

This is very rarely required, in my opinion. It is a more complex syntax, while GDScript is a simple language. I think that a language like GDScript should not have many different syntaxes for similar purposes, since that can confuse users.

Member

@adamscott adamscott left a comment


Didn't check the code yet, but the GDScript team approves the PR.

@dalexeev
Member Author

There was a question about whether we should add regular expression literals, like in JavaScript. Answer: RegEx is an optional Godot module, and r-strings can be useful for more than just regular expressions.

Member

@adamscott adamscott left a comment


Approved by the GDScript team. r-strings, as presented, are a good addition to the language.

@akien-mga akien-mga merged commit 21b1326 into godotengine:master Sep 20, 2023
16 checks passed
@dalexeev dalexeev deleted the gds-r-strings branch September 20, 2023 11:13
@akien-mga
Member

Thanks!
