EEP 64: Triple-Quoted Strings #47
Conversation
a77d229
to
2fca160
Compare
eeps/eep-0064.md
Outdated
""" | ||
remove_double_quotes(X) -> | ||
|
||
#### Binary-Strings Errors |
Perhaps worth mentioning that it also errors when there is a """ that is not immediately followed by optional spaces and a newline? Elixir says:
iex(1)> """foo"""
** (SyntaxError) iex:1:1: heredoc allows only optional whitespace followed by a new line after """
Since we are still compatible with existing code, and one can write strings using single quotes, we may have to allow code that for some reason is """foo""", which in Erlang produces "foo". If we use backtick instead of quotes, then your point stands.
Perhaps it is worth pushing a deprecation warning to Erlang/OTP 26.x that warns on triple quotes?
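The compatibility concern in this thread comes from the fact that adjacent string literals are concatenated at compile time, so an empty string next to "foo" next to another empty string is simply "foo". A minimal sketch of that behaviour follows; the module name `adjacent_literals` is hypothetical, and the three literals are separated by spaces only to make the tokens visible.

```erlang
-module(adjacent_literals).
-export([demo/0]).

%% Adjacent string literals form a single literal at compile time.
%% Here: an empty string, "foo", and another empty string.
demo() ->
    "" "foo" "".
```

Calling `adjacent_literals:demo()` returns `"foo"`, which is why a new meaning for three consecutive quotes could change how such existing code is read.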
eeps/eep-0064.md
Outdated
|
||
### Runtime semantics | ||
|
||
Triple-quoted strings should only produce binary-strings. This makes easy to |
I worry this may be potentially confusing. Why is "foo" a list but """foo""" a binary? What if I want to write a long text but as a charlist instead? Here is a potential example. So, even though I would prefer binaries, I believe a more consistent option is to return charlists.
Then, for binaries, there are a few options:
- Write <<"""...""">>
- Build on EEP 63 to introduce u"""..."""
- However, for documentation in particular, the compiler can convert -doc """...""". into binaries, so the distinction may not be terribly important
Completely agreeing with José here. I think triple-quoted strings should produce strings in Erlang unless enclosed in a binary context; this would be more consistent with the existing Erlang syntax.
I agree with José as well, I think it is confusing to use """" as a binary. Another option for unicode binaries could be to use backticks:
io:format(`backtick unicode binary`)
-doc ```
backtick
multiline
unicode
binary
```.
Not the most convenient key on some keyboards, but JavaScript seems to get away with using it so maybe we can as well.
I like using backticks to mean binary. That nicely solves how to write binary literal strings on one line as well. I think we have a winner!
Backticks are interesting because they also exist in Prolog. Erlang's syntax comes from Prolog as we all know (which also has single quote for atoms). In SWI Prolog it seems to be configurable how double-quoted and back-quoted strings work: https://www.swi-prolog.org/pldoc/man?section=string and in GNU Prolog similarly.
But, for simple binary strings, isn't a prefix like b"binary string" simpler and more aligned with Python, C, Rust, etc. for different kinds of strings?
I think b"" is enough for UTF-8 encoded binaries as long as the source code is UTF-8 encoded, which it normally is. (If we then add triple quotes for charlist strings, then b"""...""" seems the natural choice for the binary version.)
(Btw C11 uses u"" for UTF-16 strings, U"" for UTF-32 strings and u8"" for UTF-8, so u"" for UTF-8 would be confusing in this context.)
Yes, but a regexp string is a different animal where I guess you do not want any escape sequences, and therefore cannot use interpolation. Possibly chosen with a prefix, e.g. r"foo\(\.\)+".
I like that Erlang becomes more Perl-like. :-)
Finally, choosing backtick will also depend on the documentation format. If we choose Markdown, the backticks in Markdown are used for code snippets. Which means that, if I want to write
string:to_upper(`hello`)
This will be also problematic with heredocs and fenced code blocks:
(I'm pasting an image because I don't know how to write it in Markdown so it renders correctly. :))
I'm personally a fan of code blocks done with 4 spaces indent but fenced blocks are sometimes useful too.
@wojtekmach Markdown allows any number of backticks, so if you need 3 backticks, just write it within 4 backticks. Perhaps Erlang could allow that trick too:
-doc ````
Some function.
Example:
```
> 1 + 2
3
```
````
(Note that the above example was enclosed in 5 backticks.)
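The "use one more delimiter character than the content contains" trick described above can be sketched as a small helper. This is only an illustration under assumptions: the module name `fence_len` and the function `fence/2` are hypothetical, and the minimum length of three mirrors the triple delimiter discussed in this thread rather than any rule taken from the EEP text.

```erlang
-module(fence_len).
-export([fence/2]).

%% Return a delimiter made of Char that is longer than any run of Char
%% inside Text, and never shorter than three characters.
fence(Text, Char) ->
    lists:duplicate(max(3, longest_run(Text, Char, 0, 0) + 1), Char).

%% Walk the string, tracking the current run and the longest run so far.
longest_run([Char | Rest], Char, Cur, Max) ->
    longest_run(Rest, Char, Cur + 1, max(Max, Cur + 1));
longest_run([_ | Rest], Char, _Cur, Max) ->
    longest_run(Rest, Char, 0, Max);
longest_run([], _Char, _Cur, Max) ->
    Max.
```

For a text that contains a run of three backticks, the helper returns a delimiter of four backticks; for a text without any, it returns three. The same function works for double quotes by passing that character instead.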
I like the backtick idea, but I'd prefer it to be used like in JavaScript, where it permits interpolation, for example:
Bar = `bar`,
<<"foo bar 2">> = `foo ~s{Bar} ~p{1+1}`.
or
-define(BAR, `bar`).
-doc ```
foo ~s{?BAR} ~p{1+1}
```
Or (why not) single backtick as an alternative to triple quotes using the same idea, the last backtick defines the indentation
-doc `
backtick
multiline
unicode
binary
`
% Single line also a valid syntax
-doc `backtick unicode binary`
BTW, triple quotes give more control of the encoding (I don't see any problem writing the below)
<<"""
foo bar
"""/utf8>>
I'm not a fan of triple quoting for this. Providing an end marker avoids having to worry about escaping and whatnot. Something like this in a terminal:
But of course adapted to Erlang syntax. The end marker is whatever you need, it could be |
eeps/eep-0064.md
Outdated
has no indentation | ||
""" | ||
|
||
Equivalent to: <<"This text has no indentation\n">> |
Would it make sense to allow a comment after \?
For example:
"""
foo \ % Maybe I want to break this line to add a comment about it
bar
"""
I think that's ambiguous because \ is no longer at the end. Should \% be allowed then at the end? And, if so, how do you actually know \ is at the end? Couldn't \ also be used to escape space for some reason?
I think that's ambiguous because \ is no longer at the end.
Oh, this is true and makes things tricky.
Should \% be allowed then at the end?
Oh, using \% makes sense to me.
And, if so, how do you actually know \ is at the end?
I think this is easy, I've implemented this now based on your idea of the \% on this branch of a side project called heredocs. That's what the implementation outputs:
"foo bar" = """
foo \% This is a comment and must be ignored
bar\
"""
I don't know if this is a good idea, but it's possible.
My concern is that the last example is still ambiguous with escaping the % character itself. :)
I do not understand why you want a comment in a multi line string.
All in it should be string content, otherwise it isn't very useful, right?
We want it for Markdown documentation, and Markdown uses trailing \ for forcing a line break, which clashes head on with using trailing \ for newline escape.
So I do not know if we want newline escape. Or any escape codes at all, for that matter, since Markdown uses \ for escaping things, just as the regular Erlang string syntax, and these may also collide...
We want it for Markdown documentation, and Markdown uses trailing \ for forcing a line break, which clashes head on with using trailing \ for newline escape.
\ will escape the new line character. To force the line break, you need \\ (because Markdown needs to see the \ itself). So to me it makes logical sense: if you use \\, you are escaping the character that would escape/hide the line break. :)
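The trailing-backslash rule argued for here can be modelled in a few lines. This is a simplified sketch of the idea as discussed at this point in the thread, not something taken from the EEP: the module name `nl_escape` and the function `join/1` are hypothetical, every content line is assumed to end in a newline, and only the last one or two characters of each line are inspected.

```erlang
-module(nl_escape).
-export([join/1]).

%% Join content lines, assuming each line is followed by a newline.
%% A single trailing backslash removes that newline; a trailing double
%% backslash keeps one literal backslash and the newline.
join(Lines) ->
    lists:append([line(Line) || Line <- Lines]).

line(Line) ->
    case lists:reverse(Line) of
        [$\\, $\\ | Rev] ->
            %% Escaped backslash: emit one backslash, keep the newline.
            lists:reverse(Rev, "\\\n");
        [$\\ | Rev] ->
            %% Escaped newline: drop both the backslash and the newline.
            lists:reverse(Rev);
        _ ->
            Line ++ "\n"
    end.
```

With this model, `nl_escape:join(["foo \\", "bar\\"])` gives `"foo bar"`, while a line ending in two backslashes reaches Markdown as a backslash followed by a line break. Longer runs of trailing backslashes are not handled correctly by this sketch.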
Comment inside the string makes the syntax too complex IMO. It's already possible to put two strings side-by-side to concatenate them, e.g.
"foo " % this comment is ignored
"bar" = """
foo \
"""
%% this comment would be ignored too
"""
bar\
""".
We want it for Markdown documentation
Sorry @RaimoNiskanen, I commented without thinking about Markdown.
BTW, we can write comments in Markdown in many different ways.
This would make Erlang syntax context-dependent. You wouldn't be able to parse this anymore with
In general with triple quote syntax, I love it from Elixir. However, having used Rust quite extensively that doesn't have it, regular multi-line strings can serve this use-case relatively decently. With a good-enough integration with a formatter, it's not particularly annoying either. For example see how it's used in Rust Analyzer for testing - https://github.com/rust-lang/rust-analyzer/blob/f8dec25bd70cc3568069daf8c3d5f2a65e3aa4cb/crates/ide/src/move_item.rs#L193. I'd say the much more useful feature is "raw strings" that disable escape sequences.
Taking examples from the EEP, they would need to be written as below:
-doc "
Removes double-quotes (\") from a given string
".
foo() ->
X = "
This is the beginning of the triple-quote text
",
use_x(X).
The un-nesting of string under
This does leave the issue of the extra whitespace at the beginning (and end) though it could be handled with "escaping the newline", and potentially
foo() ->
X = "\
This is the beginning of the triple-quote text\
",
use_x(X).
While this is "uglier" it does avoid introducing a new concept and syntactic form to the language. |
@michalmuskala I haven't felt the need for triple quoted strings in Erlang so far but if the goal is to introduce docstrings then I think the need for them is well justified. Imagine writing examples in the documentation and having to escape every string in the documentation itself. So I think the |
@michalmuskala My concern is to generalize this syntax because of the list of integers. How to deal with this:
1> [92,65,66,67].
"\\ABC"
If I understand you correctly, the compiler would automatically strip the 92.
|
I'm not sure why the parser would have to be context dependent. The scanner, sure, but we already can't parse Erlang with |
If it's only for
... which would need to be handled by the scanner and be equivalent to |
Yes, I fully agree that "raw" strings or strings that disable escaping are important - more than triple-quoted strings. Their utility would be beyond docs - e.g. regex |
@zuiderkwast It's not only for documentation, it's also useful to write better multi-line strings in Erlang code. See the EEP abstract and rationale. Documentation is the main reason, but not the only purpose. |
Documentation is the main reason for this EEP, and Markdown was mentioned many times as something to be compatible with and not clash with, but while reading about Markdown for technical writing I ended up at AsciiDoc:
The license is Apache 2.0. It's simple and powerful. O'Reilly has it in their docs and has a repo with examples of how to, so books are written using it. Looking at the examples the format is compatible with GitHub. I'm not trying to cause any confusion about this, quite the opposite. Maybe it can be a better alternative to Markdown for docs. This link compares it to Markdown and this link talks about Markdown compatibility. |
Yes or no. It should be possible to use Verbatim strings for these test strings. Your editor should allow you to insert CR into the source code, last on the line. In Emacs you use Ctrl-Q to quote the CR ([Enter]). It is only on the last line; where All whitespace ( Also, the string start is Note that |
@williamthome: Regarding trailing backslash and escaping. Just because of all the quirks of escape sequences e.g. by backslashes in different documentation text formats such as Markdown, maybe also AsciiDoc, and whatnot, I think it is simplest to not have any escape character in these Verbatim strings. We'll see what to do about value interpolation one day... |
Improve the use of conditionals and punctuation. Rename to "Verbatim Multi-line Indented Strings" to emphasize the verbatim property and to avoid ambiguity about how to pronounce the abbreviation. Clarify that there are two forms of the empty string.
Thank you for the feedback! I have updated (and renamed, again). I realized that there are two forms of empty string, so I wrote a paragraph about that. |
Alright then it is possible but it is not ideal: the CR is hidden (like the LF), so it's no longer possible to see the difference between a test that tries sending requests with only LF, compared to a test that properly sends CRLF for example. You can see it if you display hidden whitespaces but you won't see it on GitHub or others. Modifying the string to make the LF into CRLF would make things more obvious I think ( |
That is a problem. In Emacs it shows as
There are many suggestions on creative string prefixes appearing.
How to combine them into a reasonable semantics will have to be a future problem. This EEP mentions "specialized strings such as regular expressions, interpolated variables ([PR-7343][]), Unicode binary strings, etc" and I think this can fall under other "specialized strings". Your use case needs escape sequences, so as you do concatenating single quoted strings is maybe not that bad... |
Probably not a good idea anyway. CRLF is increasingly becoming a legacy problem. I wouldn't build a feature around it. Workarounds in the documentation are appreciated though.
The downside is not being able to just copy paste stuff, for example examples from RFCs or output from developer tools. Or copying a test input into |
eeps/eep-0064.md
Outdated
This EEP proposes the introduction of Verbatim Multi-line Indented strings, | ||
*VMI Strings*, and defines their semantics. The main benefit is to allow |
Is the term "VMI Strings" going to be used in docs whenever we describe such strings? I don't love this acronym. I think it's nicer to call them triple-quoted strings.
@essen: I guess the fitting general feature here would be Multi-line Indented string, that has escape sequences. The question is if that merits a prefix...
@zuiderkwast: I just wanted to call them something in the EEP that was not too long since I have to mention them many times. "Triple quoted strings" is a better name, even if they could be "N>=3 quoted strings". "Verbatim strings" is another possibility. But if we later add prefixes to make them more flexible, e.g. lose "verbatim", it is maybe "triple quoted" that is the one property that must remain...
"Triple quoted" or "Multi-line". I'd say we cannot take that away from them either. That they can be indented is just what makes them good for multi-line.
I have polished two small language details. That, I hope, would be the last on this track. Now, in the next commit, I will switch viewpoint and argue for multi-line strings with escape sequences... |
Polish the text and clarify CR LF handling
That did not happen. I renamed them, again, to "triple-quoted strings" and clarified some details. I tried to see that escape sequences would be useful, but ended up back at the view that it will be nice to have them out of the way for any documentation text markup source format |
@RaimoNiskanen Now I see what you mean and I agree with you. This EEP should not include the trailing backslash or any other; this is a job for post-processing or format functions. Triple-quoted strings should only handle multiline strings and permit "quotes inside quotes".
-doc {file, "foo.md"}.
and the
This is the first line.\
This is the second one.
considering the backslash trailing
% this will be the output:
-doc "This is the first line.This is the second one."
% but this is expected
-doc "This is the first line.\\\nThis is the second one."
So, I switch viewpoint and I'm against the trailing backslash that I mentioned before. |
@williamthome quotes cannot be escaped. Nothing can be escaped. |
Yes I'm arriving at the same conclusion. It will be very useful to not have to escape anything, and not just for documentation. Strings can always be post-processed if necessary (even as a parse transform if that's important). |
eeps/eep-0064.md
Outdated
"\nX" = """ | ||
|
||
X | ||
"""" |
"""" | |
""" |
Oops!
eeps/eep-0064.md
Outdated
"\r\nX" = """ | ||
|
||
X | ||
"""" |
"""" | |
""" |
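The diff excerpts above, where an empty line followed by an indented X is expected to equal "\nX", suggest the following reading of the indentation rule: the whitespace in front of the closing delimiter is removed from every content line, and the newline before the closing delimiter is not part of the string. The sketch below models that reading on already-split lines. It is not the scanner implementation; the module name `tq_sketch`, the function `strip/2`, the error tuple, and the treatment of empty lines are assumptions made for illustration.

```erlang
-module(tq_sketch).
-export([strip/2]).

%% Lines:  the content lines between the delimiters, without line ends.
%% Indent: the whitespace found in front of the closing delimiter.
%% Returns {ok, String} or {error, {bad_indentation, Line}}.
strip(Lines, Indent) ->
    strip(Lines, Indent, []).

strip([], _Indent, Acc) ->
    %% Join with "\n"; no newline is added after the last content line.
    {ok, lists:flatten(lists:join("\n", lists:reverse(Acc)))};
strip([Line | Rest], Indent, Acc) ->
    case lists:prefix(Indent, Line) of
        true ->
            Stripped = lists:nthtail(length(Indent), Line),
            strip(Rest, Indent, [Stripped | Acc]);
        false when Line =:= "" ->
            %% Assumption: an empty line need not carry the indentation.
            strip(Rest, Indent, ["" | Acc]);
        false ->
            {error, {bad_indentation, Line}}
    end.
```

For example, `tq_sketch:strip(["", "  X"], "  ")` returns `{ok, "\nX"}`, and a content line that is indented less than the closing delimiter is reported as an error instead of being silently accepted.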
* Verbatim strings: Motivate more why * Comparison with Elixir: Fleshes out the motivations
I added the sections "Verbatim strings" and "Comparison with Elixir" |
For what it's worth, I am very happy with the current proposal. It outlines the problem well, the design decisions and their limitations, plus possible future improvements. 👍 Regarding the name, I would propose indeed verbatim strings, as it clarifies it does not support escape characters. |
I also think "Verbatim Strings" is a better name, today, but given that a future extension might be Sigils, these Triple-quoted Strings could start to support e.g. escape sequences and interpolation... So "triple-quoted" is the thing that they always will be. Except maybe if they become "at-least-triple-quoted", but the name is still sort of valid. "Heredocs" could be a name, especially for a free-to-choose delimiter as at-least-triple-quoted. I actually think the name "heredocs" in Elixir is a bit misleading since the delimiter is fixed to |
I merged this EEP which only means that it is recognized and in status: Draft. I also wrote an implementation of the EEP: erlang/otp#7451 |
This EEP discusses the design of the triple-quoted binary strings.
In the case that document attributes EEP 59 and interpolation strings (EEP 62) are added before this EEP, interpolation attributes are to be disallowed in documentation attributes (this is mentioned in the EEP).
The semantics should feel familiar for Elixir developers, and should be pretty close to Elixir triple-quote semantics.
Feedback is welcome.