EEP 64: Triple-Quoted Strings #47

Merged
merged 10 commits into from
Jun 28, 2023
187 changes: 187 additions & 0 deletions eeps/eep-0064.md
@@ -0,0 +1,187 @@
Author: Kiko Fernandez-Reyes <kiko(at)erlang(dot)org>
Status: Draft
Type: Standards Track
Created: 07-Jun-2023
Erlang-Version: OTP-27
Post-History:
****
EEP XXX: Triple-quoted binary strings
----

Abstract
========

This EEP proposes the introduction of triple-quoted binary-strings syntax and
runtime semantics. The main benefit is to allow multi-line binary strings in an
easy way, similar to other languages, e.g., Elixir.

Rationale
=========

One limitation of Erlang today (June 2023) is the lack of a convenient way to
write multi-line string binaries. That said, the main reason to consider this EEP is
[EEP-59][], where documentation attributes can
benefit from multi-line string binaries: the documentation generates the Docs chunk
format (documentation in its binary format), which saves space and makes the
documentation available to the shell.

Triple-Quoted Binary-String Design Decisions
---------------------------------------------

### Runtime semantics

Triple-quoted strings should only produce binary-strings. This makes it easy to

I worry this may be potentially confusing. Why is "foo" a list but """foo""" a binary? What if I want to write a long text but as a charlist instead? Here is a potential example. So, even though I would prefer binaries, I believe a more consistent option is to return charlists.

Then, for binaries, there are a few options:

  1. Write <<"""...""">>

  2. Build on EEP 63 to introduce u"""..."""

  3. However, for documentation in particular, the compiler can convert -doc """...""". into binaries, so the distinction may not be terribly important

Completely agreeing with José here. I think triple-quoted strings should produce strings in Erlang unless enclosed in a binary context; this would be more consistent with the existing Erlang syntax.

I agree with José as well; I think it is confusing to use """ as a binary. Another option for unicode binaries could be to use backticks:

io:format(`backtick unicode binary`)
-doc ```
       backtick 
       multiline
       unicode
       binary
     ```.

Not the most convenient key on some keyboards, but JavaScript seems to get away with using it, so maybe we can as well.

I like using backticks to mean binary. That nicely solves how to write binary literal strings on one line as well. I think we have a winner!

Backticks are interesting because they also exist in Prolog. Erlang's syntax comes from Prolog as we all know (which also has single quotes for atoms). In SWI Prolog it seems to be configurable how double-quoted and back-quoted strings work: https://www.swi-prolog.org/pldoc/man?section=string and similarly in GNU Prolog.

But, for simple binary strings, isn't a prefix like b"binary string" simpler and more aligned with Python, C, Rust, etc. for different kinds of strings?

I think b"" is enough for UTF-8 encoded binaries as long as the source code is UTF-8 encoded, which it normally is. (If we then add triple quotes for charlist strings, then b"""...""" seems the natural choice for the binary version.)

(Btw C11 uses u"" for UTF-16 strings, U"" for UTF-32 strings and u8"" for UTF-8, so u"" for UTF-8 would be confusing in this context.)

Yes, but a regexp string is a different animal where I guess you do not want any escape sequences, and therefore cannot use interpolation. Possibly chosen with a prefix, e.g. r"foo\(\.\)+".

I like that Erlang becomes more Perl-like. :-)

@wojtekmach (Jun 9, 2023):

Finally, choosing backtick will also depend on the documentation format. If we choose Markdown, the backticks in Markdown are used for code snippets. Which means that, if I want to write string:to_upper(`hello`)

This will be also problematic with heredocs and fenced code blocks:

(image not preserved)

(I'm pasting an image because I don't know how to write it in Markdown so it renders correctly. :))

I'm personally a fan of code blocks done with 4 spaces indent but fenced blocks are sometimes useful too.

@wojtekmach Markdown allows any number of backticks, so if you need 3 backticks, just write it within 4 backticks. Perhaps Erlang could allow that trick too:

-doc ````
Some function.

Example:

```
> 1 + 2
3
```
````

(Note that the above example was enclosed in 5 backticks.)

@williamthome (Jun 11, 2023):

I like the backtick idea, but I'd prefer it to be used like in JavaScript, where it permits interpolation, for example:

Bar = `bar`,
<<"foo bar 2">> = `foo ~s{Bar} ~p{1+1}`.

or

-define(BAR, `bar`).

-doc ```
foo ~s{?BAR} ~p{1+1}
```

Or (why not) single backtick as an alternative to triple quotes using the same idea, the last backtick defines the indentation

-doc `
backtick 
multiline
unicode
binary
`

% Single line also a valid syntax
-doc `backtick unicode binary`

BTW, triple quotes give more control of the encoding (I don't see any problem writing the below)

<<"""
foo bar
"""/utf8>>

avoid typing issues with union types, and makes clear that a triple-quoted string
does not need functions with guards to test whether the produced result is a list or a
binary (this can happen if Erlang adds string interpolation,
[EEP-62][],
[PR-7343][]).

Triple-quoted binary-strings do not allow string interpolation and, if we were
to accept it, only references to statically known values should be allowed, e.g.,
macros. This design decision forbids the creation of dynamic documentation that
generates code at runtime, e.g., doing an IO call to get a value not known
statically and performing an action based on it (notice that this is allowed when
using string interpolation).

### Binary-String Design

To make the usage of binary-string similar to other languages, this EEP proposal
makes the binary-string design decisions similar to the ones for the Elixir language.

#### Binary-String Start and End

Binary-strings must start with triple quotes followed by a newline, and end with
triple quotes on their own line:

*Code Example*

X = """
This is the beginning of the triple-quote text
"""

*Documentation Example*

-doc """
Removes double-quotes (") from a given string
"""
remove_double_quotes(X) ->

#### Binary-Strings Closing of Triple-Quotes

The position of the closing triple quotes determines the indentation that is
stripped from each line of the text:

*1. Code Example*

>> X = """
This text
has no indentation
"""

Equivalent to: <<"This text\nhas no indentation\n">>

*2. Code Example*

>> Y = """
This text
has indentation
"""

Equivalent to: <<" This text\n has indentation\n">>

*3. Code Example*

>> Z = """
  This text has
  two space indentation
"""

Equivalent to: <<"  This text has\n  two space indentation\n">>

*1. Documentation Example - No indentation*

-doc """
Removes double-quotes (") from a given string
"""
remove_double_quotes(X) ->

*2. Documentation Example - No indentation*

-doc """
Removes double-quotes (") from a given string
"""
remove_double_quotes(X) ->

*3. Documentation Example - Indentation*

-doc """
Removes double-quotes (") from a given string
"""
remove_double_quotes(X) ->
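
The indentation rule can be sketched in today's Erlang as an ordinary function. This is only an illustration under stated assumptions, not the implementation proposed by this EEP: the module name `triple_quote_sketch` and the function `dedent/2` are hypothetical, the text is passed in as a list of lines, and the whitespace in front of the closing triple quotes is passed as a separate string. A line that does not start with that whitespace is reported as an error, following the rule in the "Binary-Strings Errors" section. Empty lines are not treated specially here, which a real implementation would probably need to do.

```erlang
-module(triple_quote_sketch).
-export([dedent/2]).

%% Hypothetical helper, not part of OTP.
%% Lines:   the text lines between the opening and closing triple quotes,
%%          without their trailing newlines.
%% Closing: the whitespace that precedes the closing triple quotes.
-spec dedent([string()], string()) ->
          {ok, binary()} | {error, {bad_indentation, pos_integer()}}.
dedent(Lines, Closing) ->
    dedent(Lines, Closing, 1, []).

dedent([], _Closing, _LineNo, Acc) ->
    %% Every line keeps its newline, including the last one.
    {ok, unicode:characters_to_binary(lists:reverse(Acc))};
dedent([Line | Rest], Closing, LineNo, Acc) ->
    case strip_prefix(Line, Closing) of
        {ok, Stripped} ->
            dedent(Rest, Closing, LineNo + 1, [[Stripped, $\n] | Acc]);
        error ->
            %% The line starts to the left of the closing quotes.
            {error, {bad_indentation, LineNo}}
    end.

%% Remove Prefix from the start of Line, if it is there.
strip_prefix(Line, Prefix) ->
    case lists:prefix(Prefix, Line) of
        true -> {ok, lists:nthtail(length(Prefix), Line)};
        false -> error
    end.
```

With this sketch, `triple_quote_sketch:dedent(["This text", "has no indentation"], "")` returns `{ok, <<"This text\nhas no indentation\n">>}`, and `triple_quote_sketch:dedent(["    a", "  b"], "    ")` returns `{error, {bad_indentation, 2}}`.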

#### Binary-Strings Errors

Perhaps worth mentioning it also errors when there is a """ and it is not immediately followed by spaces and a newline? Elixir says:

iex(1)> """foo"""
** (SyntaxError) iex:1:1: heredoc allows only optional whitespace followed by a new line after """

Since we are still compatible with existing code, and one can write strings using single quotes, we may have to allow code that for some reason is """foo""", which in Erlang produces "foo". If we use backtick instead of quotes, then your point stands

Perhaps it is worth pushing a deprecation warning to Erlang/OTP 26.x that warns on triple quotes?


An error should be raised when a line of text starts to the left of the closing
triple quotes. This restriction gives code written using triple quotes a uniform style:

*Code Example*

>> Err = """
This shall be an error
"""

error

*Documentation Example*

-doc """
Removes double-quotes (") from a given string
"""
remove_double_quotes(X) ->

#### Escaping Newlines in Triple-Quotes

A newline can be escaped by ending the line with `\`.

To write `\` inside triple quotes, one needs to use `\\`. [EEP-59][] should
ignore the meaning of `\` inside a code block.

*Code Example*

>> X = """
This text \
has no indentation
"""

Equivalent to: <<"This text has no indentation\n">>
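
The newline escape can be sketched in the same spirit. The module `newline_escape_sketch` and the function `join_lines/1` below are hypothetical names used only for illustration, and the input is assumed to be the already-dedented lines of the text. The sketch treats a line as escaped only when it ends in an odd number of backslashes, so that a trailing `\\` is not mistaken for an escape. It does not translate `\\` into `\` or handle any other escape sequence.

```erlang
-module(newline_escape_sketch).
-export([join_lines/1]).

%% Hypothetical helper, not part of OTP.
%% Lines are the text lines without their trailing newlines.
%% A line ending in an unescaped backslash is joined with the next line;
%% every other line keeps its newline.
-spec join_lines([string()]) -> binary().
join_lines(Lines) ->
    unicode:characters_to_binary([join_line(Line) || Line <- Lines]).

join_line(Line) ->
    case ends_with_escape(Line) of
        true -> lists:droplast(Line);   % drop the backslash, emit no newline
        false -> [Line, $\n]
    end.

%% True when the line ends with an odd number of backslashes, that is,
%% when the final backslash is not itself escaped by a preceding one.
ends_with_escape(Line) ->
    Trailing = lists:takewhile(fun(C) -> C =:= $\\ end, lists:reverse(Line)),
    length(Trailing) rem 2 =:= 1.
```

For the example above, `newline_escape_sketch:join_lines(["This text \\", "has no indentation"])` returns `<<"This text has no indentation\n">>`.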

Would it make sense to allow a comment after \?
For example:

"""
foo \ % Maybe I want to break this line to add a comment about it
bar
"""

I think that's ambiguous because \ is no longer at the end. Should \% be allowed then at the end? And, if so, how do you actually know \ is at the end? Couldn't \ also be used to escape space for some reason?

> I think that's ambiguous because \ is no longer at the end.

Oh, this is true and makes things tricky.

> Should \% be allowed then at the end?

Oh, using \% makes sense to me.

> And, if so, how do you actually know \ is at the end?

I think this is easy, I've implemented this now based on your idea of the \% on this branch of a side project called heredocs. That's what the implementation outputs:

"foo bar" = """
            foo \% This is a comment and must be ignored
            bar\
            """

I don't know if this is a good idea, but it's possible.

My concern is that the last example is still ambiguous with escaping the % character itself. :)

I do not understand why you want a comment in a multi line string.
All in it should be string content, otherwise it isn't very useful, right?

We want it for Markdown documentation, and Markdown uses trailing \ for forcing a line break, which clashes head on with using trailing \ for newline escape.

So I do not know if we want newline escape. Or any escape codes at all, for that matter, since Markdown uses \ for escaping things, just as the regular Erlang string syntax, and these may also collide...

> We want it for Markdown documentation, and Markdown uses trailing \ for forcing a line break, which clashes head on with using trailing \ for newline escape.

\ will escape the new line character. To force the line break, you need \\ (because Markdown needs to see the \ itself). So to me it makes logical sense: if you use \\, you are escaping the character that would escape/hide the line break. :)

Comment inside the string makes the syntax too complex IMO. It's already possible to put two strings side-by-side to concatenate them, e.g.

"foo " % this comment is ignored
"bar" = """
        foo \
        """
        %% this comment would be ignored too
        """
        bar\
        """.

> We want it for Markdown documentation

Sorry @RaimoNiskanen, I commented without thinking about Markdown.
BTW, we can write comments in Markdown in many different ways.


*Documentation Example*

-doc """
Removes double-quotes (") \
from a given string
"""
remove_double_quotes(X) ->

[EEP-59]: https://www.erlang.org/eeps/eep-0059
"EEP 59: Module attributes for documentation"

[EEP-62]: https://www.erlang.org/eeps/eep-0062
"String Interpolation Syntax"

[PR-7343]: https://github.com/erlang/otp/pull/7343
"Feature: String Interpolation"

Copyright
=========

This document is placed in the public domain or under the CC0-1.0-Universal
license, whichever is more permissive.

[EmacsVar]: <> "Local Variables:"
[EmacsVar]: <> "mode: indented-text"
[EmacsVar]: <> "indent-tabs-mode: nil"
[EmacsVar]: <> "sentence-end-double-space: t"
[EmacsVar]: <> "fill-column: 70"
[EmacsVar]: <> "coding: utf-8"
[EmacsVar]: <> "End:"
[VimVar]: <> " vim: set fileencoding=utf-8 expandtab shiftwidth=4 softtabstop=4: "