erlang · RaimoNiskanen · Jun 28, 2023 · Jun 7, 2023 · Jun 16, 2023 · Jun 16, 2023
diff --git a/eeps/eep-0064.md b/eeps/eep-0064.md
@@ -0,0 +1,364 @@
+    Author: Kiko Fernandez-Reyes <kiko(at)erlang(dot)org>,
+        Raimo Niskanen <raimo(at)erlang(dot)org>
+    Status: Draft
+    Type: Standards Track
+    Created: 07-Jun-2023
+    Erlang-Version: OTP-27
+    Post-History:
+****
+EEP 64: Verbatim Multi-line Indented Strings
+----
+
+Abstract
+========
+
+This EEP proposes the introduction of Verbatim Multi-line Indented strings,
+*VMI Strings*, and defines their semantics.  The main benefit is that
+*multi-line* strings can be written in an easy and useful (*indented*)
+way, similar to other languages such as Elixir.
+
+Their first use case is in-module documentation attributes containing
+[Markdown][] or similarly formatted text.  There, *verbatim* text is
+desirable, since every documentation text format has its own notion
+of escape sequences, which would collide with Erlang's escape sequences.
+
+Rationale
+=========
+
+Today (June 2023), writing multi-line strings is awkward and arguably ugly.
+Their content is subject to escape sequence processing, and they have
+no concept of indentation:
+
+    foo() ->
+        case bar() of
+             ok ->
+                 X = "First line
+    Second line with \"\\*not emphasized\\* Markdown\"
+    Third line",
+                 {ok, X}
+        end.
+
+The content's indentation cannot follow that of the surrounding code,
+and the backslash in front of each `*` has to be escaped to get a `\*`
+character sequence into the actual content.  The embedded double
+quotes have to be escaped as well.
+
+In a documentation attribute as suggested in [EEP 59][],
+the indentation problem is not that pronounced because
+the documentation attribute itself is hardly indented.  With the
+VMI Strings proposed here, such an attribute would look like this:
+
+    -doc """
+        First line
+        Second line with "\*not emphasized\* Markdown"
+        Third line
+        """.
+
+The main motivation for this EEP is documentation attributes, where
+not having to worry about escape sequences is its most attractive
+property.  Introducing a new string format, however, also requires
+defining how it shall behave in Erlang code.
+
+A string format that is only allowed in attributes would be very
+strange, and the one suggested in this EEP would be useful in Erlang
+code as well.
+
+Design Decisions
+----------------
+
+An attribute is an Erlang form in the source code
+that consists of a `-` token, an atom, one value term and a full stop
+(dot).  The value term may be enclosed in parentheses (which is not
+very interesting for documentation attributes).  With a normal string,
+a documentation attribute could look like this:
+
+    -doc "  Badly formatted
+    documentation paragraph
+    /-\\
+    \\-/".
+
+A documentation attribute should have a string as its value term,
+and here we want to use the new, more convenient VMI String
+instead of a normal string:
+
+    -doc """
+          Better formatted
+        documentation paragraph
+        /-\
+        \-/
+        """.
+
+### VMI String Scanner Token
+
+A VMI String must be a token that the scanner recognizes as a string,
+which makes it suitable for a documentation attribute value term.
+It starts and ends with three double quotes: `"""`.
+
+Double quotes, `"`, are chosen because normal Erlang strings use them
+and this is just a new variant.  Since double quotes are used,
+a VMI String shall, like a normal string, produce a list of codepoints.
+
+It would be more convenient if a VMI String produced a UTF-8 binary,
+but that would be a surprising feature for double quotes, and
+the documentation build process can work around this by converting
+the codepoint list (`string()`) into the needed binary chunk.
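+
+As a sketch of that conversion step, a codepoint list can be turned
+into a UTF-8 binary with `unicode:characters_to_binary/1`.  The
+module and function names below are hypothetical and not part of
+any existing documentation build tool:
+
+```erlang
+-module(doc_to_binary_sketch).
+-export([to_utf8/1]).
+
+%% Hypothetical helper: convert a codepoint list, such as the value
+%% of a -doc attribute, into a UTF-8 encoded binary.
+to_utf8(String) when is_list(String) ->
+    case unicode:characters_to_binary(String) of
+        Bin when is_binary(Bin) ->
+            Bin;
+        %% {error, ...} or {incomplete, ...} for invalid input
+        Other ->
+            erlang:error({bad_codepoint_list, Other})
+    end.
+```
+
+For example, `to_utf8("Tschüß")` returns
+`<<"Tsch", 16#C3, 16#BC, 16#C3, 16#9F>>`.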
+
+In source code, a VMI String can be used inside a binary,
+so producing a Unicode binary is reasonably straightforward:
+
+    X = <<"""
+        Line 1
+        Line 2
+        """/utf8>>
+
+The extra overhead is not burdensome, since we are targeting
+multi-line strings here.
+
+As a future extension, it has been proposed to use prefixes
+for specialized strings such as regular expressions,
+interpolated variables ([EEP 62][], [PR-7343][]), Unicode binary
+strings, etc.  For example: `X = u"Tschüß"` for a UTF-8 encoded string.
+
+### VMI String Start
+
+After the starting `"""` only white-space is allowed
+up to the end of the line.
+
+As a possible future extension, we might allow a keyword here
+that would not be part of the string content, but could serve as
+a hint for syntax highlighting in the editor:
+
+    -doc """ md
+        Markdown content
+        * Bullet list
+        """.
+
+The scanner does not need any special treatment for the rest
+of the line after the starting `"""`, except that it should not
+search for an ending `"""` there.
+
+A later step, preferably done by the parser, strips the rest of
+the first line, up to and including its newline, from the string.
+
+If any of the stripped characters is not white-space,
+the parser reports a syntax error.
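+
+The start-line rule can be sketched as follows.  The module name and
+the input format (everything the scanner collected after the opening
+`"""`) are assumptions for illustration, and `is_white_space/1` is a
+simplification that only accepts space, tab and CR rather than the
+scanner's full white-space definition:
+
+```erlang
+-module(vmi_start_sketch).
+-export([strip_start_line/1]).
+
+%% Raw: all characters following the opening """ (hypothetical input).
+%% Returns {ok, Body}, where Body starts at the first content line,
+%% or {error, Reason}.
+strip_start_line(Raw) ->
+    %% StartLine is the rest of the first line; Rest begins at its newline.
+    {StartLine, Rest} = lists:splitwith(fun(C) -> C =/= $\n end, Raw),
+    case lists:all(fun is_white_space/1, StartLine) of
+        true when Rest =/= [] ->
+            {ok, tl(Rest)};  % drop the newline itself
+        true ->
+            {error, no_newline_after_start};
+        false ->
+            {error, non_white_space_on_start_line}
+    end.
+
+%% Simplified white-space test (assumption, see above).
+is_white_space(C) ->
+    C =:= $\s orelse C =:= $\t orelse C =:= $\r.
+```
+
+For example, `strip_start_line(" md\n    X\n    ")` returns
+`{error, non_white_space_on_start_line}`, since a keyword such as
+`md` is not allowed on the start line by the rules above.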
+
+### VMI String End
+
+All characters are collected as they are (verbatim) and become
+the VMI String content.
+
+A VMI String ends with a newline followed by optional white-space
+and then `"""`.  This completes the scanner token.
+
+A later step, preferably done by the parser, uses the white-space
+on the ending line as the definition of the string's indentation,
+strips that white-space sequence from every content line,
+and strips the newline preceding the ending line.
+
+If any content line does not start with the defined indentation,
+either because the line is too short or because its prefix differs,
+the parser reports a syntax error.
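+
+A minimal sketch of the indentation check, assuming the content
+lines have already been split at newlines and the ending line's
+white-space is known.  The module and function names are
+hypothetical.  The sketch compares exact prefixes, so no tab vs.
+space normalization is done, and it applies the rule literally to
+every content line:
+
+```erlang
+-module(vmi_indent_sketch).
+-export([strip_indent/2]).
+
+%% Lines: content lines without newlines.  Indent: the white-space
+%% of the ending line.  Returns {ok, StrippedLines} or
+%% {error, {bad_indentation, LineNumber}}.
+strip_indent(Lines, Indent) ->
+    strip_indent(Lines, Indent, 1, []).
+
+strip_indent([], _Indent, _N, Acc) ->
+    {ok, lists:reverse(Acc)};
+strip_indent([Line | Lines], Indent, N, Acc) ->
+    case lists:prefix(Indent, Line) of
+        true ->
+            Stripped = lists:nthtail(length(Indent), Line),
+            strip_indent(Lines, Indent, N + 1, [Stripped | Acc]);
+        false ->
+            %% Too short, or a different prefix (e.g. tab vs. spaces).
+            %% No exception is made here for empty lines.
+            {error, {bad_indentation, N}}
+    end.
+```
+
+For example, `strip_indent(["    X", "\tY"], "    ")` returns
+`{error, {bad_indentation, 2}}`.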
+
+Requiring all lines to start with exactly the same white-space
+characters is a simple way to avoid having to define how indentation
+white-space (tab vs. space) should be normalized, and it also seems
+like a reasonable requirement.
+
+### Leading and trailing newline
+
+The rules above strip one leading and one trailing newline.
+This is a simple convention that also gives control over
+the string's content:
+
+    "\n  X\n" = """
+
+          X
+
+        """,
+
+    "X" = """
+        X
+        """,
+
+    "" = """
+
+         """
+
+Note that the following could be regarded as a syntax error, a too
+short multi-line string, since a newline shall be stripped both from
+the starting line and from the last content line, so the content
+could be seen as less than empty.  But it is more convenient to also
+allow it, as an empty string:
+
+    "" = """
+         """
+
+The definitions of newline and white-space are the same as the
+scanner's current ones, with one exception: when stripping the newline
+from the line preceding the ending line, a CR should also be stripped
+if the line ends in CR LF.
+
+This is a convenience for systems where CR LF is used as newline.
+In most places the scanner treats CR as white-space, but in this
+case it would be inconvenient not to strip the CR.
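+
+The handling of the ending line and of the preceding newline,
+including the CR LF convenience, can be sketched as below.  The input
+is assumed to be the text between the start line's newline and the
+closing `"""`.  The module and function names are hypothetical.
+A body consisting only of the ending line's white-space yields no
+content lines, which matches the empty-string case above:
+
+```erlang
+-module(vmi_body_sketch).
+-export([split_body/1]).
+
+%% Body: characters after the start line's newline, up to but not
+%% including the closing """.  Returns {ContentLines, Indentation}.
+split_body(Body) ->
+    Lines = split_lines(Body, [], []),
+    %% The last line is the ending line; its white-space is the indentation.
+    {Content, [Indentation]} = lists:split(length(Lines) - 1, Lines),
+    {drop_last_cr(Content), Indentation}.
+
+%% Split at LF on plain code points, avoiding the grapheme-cluster
+%% semantics of the string module (where CR LF is a single cluster).
+split_lines([], Cur, Acc) ->
+    lists:reverse([lists:reverse(Cur) | Acc]);
+split_lines([$\n | Rest], Cur, Acc) ->
+    split_lines(Rest, [], [lists:reverse(Cur) | Acc]);
+split_lines([C | Rest], Cur, Acc) ->
+    split_lines(Rest, [C | Cur], Acc).
+
+%% Only the line preceding the ending line loses a trailing CR.
+drop_last_cr([]) ->
+    [];
+drop_last_cr(Lines) ->
+    {Init, [Last]} = lists:split(length(Lines) - 1, Lines),
+    case lists:reverse(Last) of
+        [$\r | RevRest] -> Init ++ [lists:reverse(RevRest)];
+        _ -> Lines
+    end.
+```
+
+For example, `split_body("    X\r\n    ")` returns
+`{["    X"], "    "}`, and `split_body("    ")` returns `{[], "    "}`.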
+
+### Indentation
+
+The rules above allow the content to be indented to match
+the surrounding code.  The ending line determines the indentation.
+
+    "This text\nhas no indentation" = """
+        This text
+        has no indentation
+        """,
+
+    "    This text\n    has indentation" = """
+            This text
+            has indentation
+        """,
+
+    "  This text\nhas an indented first line" =
+        """
+          This text
+        has an indented first line
+        """,
+
+    """
+    This is a syntax error (incorrect indentation)
+        """,
+
+    """ This is a syntax error
+    (non-white-space on start line)
+    """,
+
+    """
+    This will probably be a syntax error
+    since no ending line can be found"""
+
+### Backwards incompatibility
+
+This is valid today:
+
+    X = """
+        X
+        """
+
+It is equivalent to:
+
+    X = "" "
+        X
+        " ""
+
+Which is equivalent to:
+
+    X = "
+        X
+        "
+
+Which is equivalent to:
+
+    X = "\n    X\n    "
+
+But with the suggested VMI Strings the first
+code snippet would instead be equivalent to:
+
+    X = "X"
+
+Also, this is valid today:
+
+    X = """ xxx
+      X
+        """
+
+But according to this EEP, it would contain two syntax errors:
+
+1. The start line has non-white-space characters after `"""`.
+2. The first content line has incorrect indentation.
+
+There are many other similar constructions that also
+would be syntax errors.
+
+* It is very unlikely that anyone has deliberately
+  used `"""` in source code to mean an empty string
+  concatenated to another string.
+* Most combinations with `"""` that are allowed today
+  will cause syntax errors.  Only a few will subtly change
+  behaviour (string content).
+* Users can simply grep for `"""` in their source code.
+  Producing the same sequence, e.g. through macros, would
+  be harder to find, but then the worst problem would not
+  be new syntax errors (hard to miss), but changed
+  behaviour, and that change would only be
+  a slightly different string content.
+
+Therefore, it seems very unlikely that anyone will encounter
+a real backwards incompatibility problem from the changes
+suggested in this EEP.
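+
+The grep suggested above can also be done from Erlang.  This is a
+hypothetical helper, not an existing OTP function.  It reports every
+line in `.erl` and `.hrl` files under a directory that contains `"""`,
+including occurrences in comments and in longer runs of quotes:
+
+```erlang
+-module(triple_quote_finder).
+-export([find/1]).
+
+%% Returns [{Path, LineNumber}] for each line containing """.
+find(Dir) ->
+    Files = filelib:wildcard("**/*.{erl,hrl}", Dir),
+    lists:flatmap(fun(F) -> find_in_file(filename:join(Dir, F)) end,
+                  Files).
+
+find_in_file(Path) ->
+    {ok, Bin} = file:read_file(Path),
+    Lines = binary:split(Bin, <<"\n">>, [global]),
+    Numbered = lists:zip(lists:seq(1, length(Lines)), Lines),
+    [{Path, N} || {N, Line} <- Numbered,
+                  binary:match(Line, <<"\"\"\"">>) =/= nomatch].
+```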
+
+### Quoting of `"""`
+
+With the rules above, it is not possible to have `"""`
+first on a line in a VMI String.
+
+This would be allowed:
+
+    -doc """
+        A VMI String starts with: """
+        and ends with: """
+        """.
+
+This works as long as `"""` is not first on a line.
+
+In Erlang code, it would be possible to work around this:
+
+    X = """
+        A VMI String starts with:
+        ""
+        """ """
+        "
+        and ends with:
+        ""
+        """ "\""
+
+That is ugly, and it is not possible in a documentation
+attribute where string concatenation isn't allowed.
+
+We can either ignore the problem, since `"""` is only a problem
+when placed first on a line, or we can use the GitHub [Markdown][]
+trick of allowing 3 or more start quote characters with a matching
+number of end quote characters, so that this would be valid:
+
+    X = """"
+        A VMI String starts with:
+        """
+        and ends with:
+        """
+        """"
+
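+A minimal sketch of how the opening and ending delimiters could be
+matched under that scheme.  The module and function names are
+hypothetical, and requiring exactly the same number of quotes at the
+end is an assumption (GitHub Markdown instead accepts a closing fence
+that is at least as long as the opening one):
+
+```erlang
+-module(vmi_delim_sketch).
+-export([start_count/1, is_end_line/2]).
+
+%% Count the double quotes that open a VMI String; at least 3.
+start_count(Chars) ->
+    N = length(lists:takewhile(fun(C) -> C =:= $" end, Chars)),
+    if
+        N >= 3 -> {ok, N};
+        true -> not_vmi_start
+    end.
+
+%% A line ends the string if, after optional white-space, it has
+%% exactly N double quotes (assumption: exact match, not "at least N").
+is_end_line(Line, N) ->
+    Rest = lists:dropwhile(fun(C) -> C =:= $\s orelse C =:= $\t end, Line),
+    {Quotes, _After} = lists:splitwith(fun(C) -> C =:= $" end, Rest),
+    length(Quotes) =:= N.
+```
+
+With a four-quote start, `is_end_line("    \"\"\"", 4)` is `false`,
+so the `"""` lines in the example above stay content, while
+`is_end_line("    \"\"\"\"", 4)` is `true`.
+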
+[EEP 59]: https://www.erlang.org/eeps/eep-0059
+    "EEP 59: Module attributes for documentation"
+
+[EEP 62]: https://www.erlang.org/eeps/eep-0062
+    "String Interpolation Syntax"
+
+[PR-7343]: https://github.com/erlang/otp/pull/7343
+    "Feature: String Interpolation"
+
+[Markdown]: https://github.github.com/gfm/
+    "GitHub Flavored Markdown"
+
+Copyright
+=========
+
+This document is placed in the public domain or under the CC0-1.0-Universal
+license, whichever is more permissive.
+
+[EmacsVar]: <> "Local Variables:"
+[EmacsVar]: <> "mode: indented-text"
+[EmacsVar]: <> "indent-tabs-mode: nil"
+[EmacsVar]: <> "sentence-end-double-space: t"
+[EmacsVar]: <> "fill-column: 70"
+[EmacsVar]: <> "coding: utf-8"
+[EmacsVar]: <> "End:"
+[VimVar]: <> " vim: set fileencoding=utf-8 expandtab shiftwidth=4 softtabstop=4: "