erlang · RaimoNiskanen · Jun 28, 2023 · Jun 7, 2023 · Jun 16, 2023 · Jun 16, 2023
diff --git a/eeps/eep-0064.md b/eeps/eep-0064.md
@@ -0,0 +1,354 @@
+    Author: Kiko Fernandez-Reyes <kiko(at)erlang(dot)org>,
+        Raimo Niskanen <raimo(at)erlang(dot)org>
+    Status: Draft
+    Type: Standards Track
+    Created: 07-Jun-2023
+    Erlang-Version: OTP-27
+    Post-History:
+****
+EEP 64: Multi-line Indented Verbatim Strings
+----
+
+Abstract
+========
+
+This EEP proposes the introduction of Multi-line Indented Verbatim strings,
+*MIV Strings*, and defines their semantics.  The main benefit is to allow
+*multi-line* strings in an easy and useful, *indented*, way,
+similar to other languages, e.g. Elixir.
+
+Their first use case is for in-module documentation attributes
+containing [Markdown][] or similarly formatted text where *verbatim* text
+is desirable since any documentation text format has its own notion
+of escape sequences which will collide with Erlang's escape sequences.
+
+Rationale
+=========
+
+Today (June 2023), writing multi-line strings is awkward and arguably ugly.
+They may contain escape sequences and have no concept of indentation:
+
+    foo () ->
+        case bar() of
+             ok ->
+                 X = "First line
+    Second line with \"\\*not emphasized\\* Markdown\"
+    Third line",
+                 {ok, X}
+        end.
+
+The content's indentation cannot adhere to the surrounding code's
+and the `*` has to be doubly escaped to get a `\*` character
+sequence into the actual content.
+
+In a documentation attribute as suggested in [EEP 59][]
+the indentation problem is not that pronounced because
+the documentation attribute itself is not much indented:
+
+    -doc """
+        First line
+        Second line with "\*not emphasized\* Markdown"
+        Third line
+        """.
+
+The main reason to consider this EEP is for documentation
+attributes, and here not having to worry about escape sequences
+is the most attractive property of this EEP.  But introducing
+a new string format will also require defining how it would behave
+in Erlang code.
+
+A string format allowed only in attributes would be very strange,
+and the one suggested in this EEP would also be useful
+in Erlang code.
+
+Design Decisions
+----------------
+
+An attribute is an Erlang form in the source code
+that consists of a `-` token, an atom, one value term and a full stop
+(dot).  The value term may be enclosed in parentheses
+(which is not very interesting for documentation attributes).
+
+    -doc "  Badly formatted
+    documentation paragraph
+    /-\\
+    \\-/".
+
+A documentation attribute should have a string as its content term,
+and here we want to use our new and more convenient MIV String
+instead of a normal string:
+
+    -doc """
+          Better formatted
+        documentation paragraph
+        /-\
+        \-/
+        """.
+
+### MIV String Scanner Token
+
+A MIV String must be a token that the scanner recognizes as a string,
+which makes it suitable for a documentation attribute value term.
+It starts and ends with three double quotes: `"""`.
+
+Double quotes, `"`, are chosen because normal Erlang strings use them
+and this is just a new variant.  Since double quotes are used
+a MIV String should, like a normal string, produce a list of codepoints.
+
+It would be more convenient if a MIV String produced a UTF-8 binary,
+but that would be surprising when using double quotes, and
+the documentation build process can convert the codepoint list
+into the needed binary chunk.
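+
+A minimal sketch of that conversion, assuming the build step has the
+MIV String as a list of codepoints.  The module and function names
+are hypothetical; `unicode:characters_to_binary/1` is the standard
+library call:
+
+```erlang
+-module(doc_chunk).
+-export([to_utf8/1]).
+
+%% Convert a list of Unicode codepoints into a UTF-8 encoded binary.
+%% Raises an error if the list is not valid Unicode character data.
+to_utf8(Codepoints) when is_list(Codepoints) ->
+    case unicode:characters_to_binary(Codepoints) of
+        Bin when is_binary(Bin) -> Bin;
+        Other -> error({bad_doc_string, Other})
+    end.
+```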
+
+In source code a MIV String is valid in a binary so
+producing a unicode binary is reasonably straightforward:
+
+    X = <<"""
+        Line 1
+        Line 2
+        """/utf8>>
+
+The extra overhead is not excessive since we are targeting
+multi-line strings here.
+
+As a future expansion it has been proposed to use prefixes
+for specialized strings such as regular expressions,
+interpolated variables ([PR-7343][]), unicode binary strings, etc.
+For example: `X = u"Tschüß"` for a UTF-8 encoded string.
+
+### MIV String Start
+
+After the starting `"""` only whitespace is allowed
+up to the end of the line.
+
+As a possible future expansion we might allow a keyword here
+that shouldn't be part of the string content, but could be
+a hint for syntax highlighting in the editor.
+
+    -doc """ md
+        Markdown content
+        * Bullet list
+        """.
+
+The scanner does not need to have any special treatment
+for the string content on the line after the starting
+`"""` except that it should not search for an ending `"""`.
+
+A later step, preferably done by the parser, strips the
+characters up to and including the newline from the
+first line of the string.
+
+If any of these characters is not whitespace,
+the parser reports a syntax error.
+
+### MIV String End
+
+All bytes are collected as they are (verbatim) and become
+the MIV String content.
+
+A MIV String ends with whitespace on a new line followed by `"""`.
+This completes the scanner token.
+
+A later step, preferably done by the parser, uses the whitespace
+on the ending line as the definition of the string's indentation
+and strips that whitespace sequence from every line in the string,
+and strips the newline preceding the ending line.
+
+If any line does not start with the defined indentation,
+either because the line is too short or because its prefix differs,
+the parser reports a syntax error.
+
+Requiring that all lines have exactly the same whitespace
+characters as indentation is a simple way to avoid having
+to define how indentation whitespace (tab vs. space) should be
+normalized, and it also seems like a reasonable requirement.
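+
+The start and end rules above can be sketched as one post-scanning
+step.  This is only an illustration under stated assumptions, not
+the proposed implementation: the module `miv_sketch`, the function
+`strip/1` and the error reasons are hypothetical, the input is the
+raw text between the opening and closing `"""`, only LF newlines are
+handled, and the indentation rule is applied literally, so a
+completely empty content line counts as too short:
+
+```erlang
+-module(miv_sketch).
+-export([strip/1]).
+
+%% Raw is the text between the opening and the closing delimiter,
+%% e.g. "\n    X\n    ".  Returns {ok, String} | {error, Reason}.
+strip(Raw) ->
+    case lines(Raw) of
+        [_Only] ->
+            %% No newline at all, so there is no ending line.
+            {error, no_ending_line};
+        [StartLine | Rest] ->
+            %% The last line is the whitespace before the closing
+            %% delimiter; it defines the indentation.
+            Indent = lists:last(Rest),
+            Content = lists:droplast(Rest),
+            case is_blank(StartLine) of
+                false -> {error, non_whitespace_on_start_line};
+                true -> strip_indent(Indent, Content, [])
+            end
+    end.
+
+%% Remove Indent from every content line, then join with LF.
+%% Joining also drops the newline preceding the ending line.
+strip_indent(_Indent, [], Acc) ->
+    {ok, lists:append(lists:join("\n", lists:reverse(Acc)))};
+strip_indent(Indent, [Line | Lines], Acc) ->
+    case lists:prefix(Indent, Line) of
+        true ->
+            Stripped = lists:nthtail(length(Indent), Line),
+            strip_indent(Indent, Lines, [Stripped | Acc]);
+        false ->
+            {error, {bad_indentation, Line}}
+    end.
+
+%% Split a string into lines at every LF.
+lines(Str) ->
+    case lists:splitwith(fun(C) -> C =/= $\n end, Str) of
+        {Line, []} -> [Line];
+        {Line, [$\n | Rest]} -> [Line | lines(Rest)]
+    end.
+
+%% True if the line contains only space, tab or CR characters.
+is_blank(Line) ->
+    lists:all(fun(C) -> C =:= $\s orelse C =:= $\t orelse C =:= $\r end,
+              Line).
+```
+
+For example, `miv_sketch:strip("\n    X\n    ")` returns `{ok, "X"}`.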
+
+### Leading and trailing newline
+
+The rules above strip one leading and one trailing newline.
+This is a simple convention that also gives control over
+the string's content:
+
+    "\n  X\n" = """
+
+          X
+
+        """,
+
+    "X" = """
+        X
+        """,
+
+    "" = """
+         """
+
+The definition of newline and whitespace is the same as
+the scanner's current one, except that when stripping the newline
+from the line preceding the ending line, a CR should also
+be stripped if the line ends in CR LF.
+
+This is a convenience for systems where CR LF is used as newline.
+In most places in the scanner CR is treated as whitespace,
+but in this case it would be inconvenient to not strip the CR.
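+
+As a small sketch of this special case (the module and function
+names are hypothetical), the last content line could be
+post-processed like this once the LF before the ending line has
+been removed:
+
+```erlang
+-module(miv_crlf).
+-export([drop_trailing_cr/1]).
+
+%% Drop one trailing CR, if present, from the last content line.
+%% Any other line content is returned unchanged.
+drop_trailing_cr(Line) ->
+    case lists:reverse(Line) of
+        [$\r | Rev] -> lists:reverse(Rev);
+        _ -> Line
+    end.
+```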
+
+### Indentation
+
+The rules above facilitate indentation of the content
+to adhere to the surrounding code.  The ending line
+determines the indentation.
+
+    "This text\nhas no indentation" = """
+        This text
+        has no indentation
+        """,
+
+    "    This text\n    has indentation" = """
+            This text
+            has indentation
+        """,
+
+    "  This text\nhas an indented first line" =
+        """
+          This text
+        has an indented first line
+        """,
+
+    """
+    This is a syntax error (incorrect indentation)
+        """,
+
+    """ This is a syntax error
+    (non-whitespace on start line)
+    """,
+
+    """
+    This will probably be a syntax error
+    since no ending line can be found"""
+
+### Backwards incompatibility
+
+This is valid today:
+
+    X = """
+        X
+        """
+
+It is equivalent to:
+
+    X = "" "
+        X
+        " ""
+
+Which is equivalent to:
+
+    X = "
+        X
+        "
+
+Which is equivalent to:
+
+    X = "\n    X\n"
+
+But with the suggested MIV Strings the first
+code snippet would instead be equivalent to:
+
+    X = "X"
+
+Also, this is valid today:
+
+    X = """ xxx
+      X
+        """
+
+But according to this EEP it would be two syntax errors:
+
+1. The start line has got non-whitespace after `"""`.
+2. The first content line has incorrect indentation.
+
+There are many other similar constructions that also
+would be syntax errors.
+
+* It is highly unlikely that anyone has deliberately
+  used `"""` in source code to mean an empty string
+  concatenated to another string.
+* Most combinations with `"""` that are allowed today will
+  cause syntax errors.  Only a few will have a subtly changed
+  behaviour (string content).
+* Users can simply grep for `"""` in their source code.
+  Producing the same sequence e.g. through macros would
+  be harder to find.  New syntax errors are hard to miss,
+  so the worst case is changed behaviour, and that
+  would only be a slightly different string content.
+
+Therefore, it should be very unlikely that anyone
+encounters a real backwards incompatibility problem
+from the suggestions in this EEP.
+
+### Quoting of `"""`
+
+In the rules above there is no possibility to have `"""`
+first on a line in a MIV String.
+
+This would be allowed:
+
+    -doc """
+        A MIV String starts with: """
+        and ends with: """
+        """.
+
+This works as long as `"""` isn't first on a line.
+
+It would be possible to work around this in Erlang code:
+
+    X = """
+        A MIV String starts with:
+        ""
+        """ """
+        "
+        and ends with:
+        ""
+        """ "\""
+
+That is ugly, and it is not possible in a documentation
+attribute where string concatenation isn't allowed.
+
+We can either ignore the problem since it is only
+when placed first on a line that `"""` is a problem,
+or we can use the GitHub [Markdown][] trick to allow
+3 or more start characters and matching end characters
+so this would be valid:
+
+    X = """"
+        A MIV String starts with:
+        """
+        and ends with:
+        """
+        """"
+
+[EEP 59]: https://www.erlang.org/eeps/eep-0059
+    "EEP 59: Module attributes for documentation"
+
+[EEP 62]: https://www.erlang.org/eeps/eep-0062
+    "String Interpolation Syntax"
+
+[PR-7343]: https://github.com/erlang/otp/pull/7343
+    "Feature: String Interpolation"
+
+[Markdown]: https://github.github.com/gfm/
+    "GitHub Flavored Markdown"
+
+Copyright
+=========
+
+This document is placed in the public domain or under the CC0-1.0-Universal
+license, whichever is more permissive.
+
+[EmacsVar]: <> "Local Variables:"
+[EmacsVar]: <> "mode: indented-text"
+[EmacsVar]: <> "indent-tabs-mode: nil"
+[EmacsVar]: <> "sentence-end-double-space: t"
+[EmacsVar]: <> "fill-column: 70"
+[EmacsVar]: <> "coding: utf-8"
+[EmacsVar]: <> "End:"
+[VimVar]: <> " vim: set fileencoding=utf-8 expandtab shiftwidth=4 softtabstop=4: "