From 7023cb12b7f2c589282c854ae0f9170a029b10ea Mon Sep 17 00:00:00 2001
From: Eric McCarthy
Date: Thu, 10 Feb 2022 16:24:01 -0800
Subject: [PATCH 1/3] clarify string descriptions

---
 toml.md | 41 +++++++++++++++++++++++++++--------------
 1 file changed, 27 insertions(+), 14 deletions(-)

diff --git a/toml.md b/toml.md
index 4a8f6ab2..e1c2d204 100644
--- a/toml.md
+++ b/toml.md
@@ -259,12 +259,20 @@ String
 ------
 
 There are four ways to express strings: basic, multi-line basic, literal, and
-multi-line literal. All strings must contain only valid UTF-8 characters.
+multi-line literal.
 
-**Basic strings** are surrounded by quotation marks (`"`). Any Unicode character
-may be used except those that must be escaped: quotation mark, backslash, and
-the control characters other than tab (U+0000 to U+0008, U+000A to U+001F,
-U+007F).
+All strings must contain only valid UTF-8 encoded characters as is the case for
+the TOML document as a whole. Certain control characters are not allowed to
+occur literally in any kind of string: U+0000 to U+0008, U+000B, U+000C, U+000E
+to U+001F, and U+007F. In basic strings and multi-line basic strings, but not in
+literal strings or multi-line literal strings, those control characters can be
+described with escapes as specified below. Additional restrictions are described
+below.
+
+**Basic strings** are surrounded by quotation marks (`"`). In addition to the
+characters disallowed for all strings mentioned above, U+000A (LF) and U+000D
+(CR) may not occur literally in basic strings. Backslash and quotation mark may
+only occur literally if they are part of a valid escape sequence.
 
 ```toml
 str = "I'm a string. \"You can quote me\". Name\tJos\u00E9\nLocation\tSF."
@@ -340,10 +348,10 @@ str3 = """\
        """
 ```
 
-Any Unicode character may be used except those that must be escaped: backslash
-and the control characters other than tab, line feed, and carriage return
-(U+0000 to U+0008, U+000B, U+000C, U+000E to U+001F, U+007F). Carriage returns
-(U+000D) are only allowed as part of a newline sequence.
+In addition to the characters disallowed for all strings mentioned above, U+000D
+(CR) is allowed only as part of a newline sequence U+000D U+000A (CRLF). As
+with basic strings, backslash and quotation mark may only occur literally if
+they are part of a valid escape sequence.
 
 You can write a quotation mark, or two adjacent quotation marks, anywhere inside
 a multi-line basic string. They can also be written just inside the delimiters.
@@ -405,9 +413,12 @@ apos15 = "Here are fifteen apostrophes: '''''''''''''''"
 str = ''''That,' she said, 'is still pointless.''''
 ```
 
-Control characters other than tab are not permitted in a literal string. Thus,
-for binary data, it is recommended that you use Base64 or another suitable ASCII
-or UTF-8 encoding. The handling of that encoding will be application-specific.
+As in all strings, most control characters are not permitted even in a literal
+string or multi-line literal string. Thus, these literal strings are not suited
+for representing blobs of binary data. It is recommended that you use Base64 or
+another suitable ASCII or UTF-8 encoding. The handling of that encoding will be
+application-specific.
+
 
 Integer
 -------
@@ -763,7 +774,8 @@ member_since = 1999-08-04
 
 Dotted keys create and define a table for each key part before the last one. Any
 such table must have all its key/value pairs defined under the current `[table]`
-header, or in the root table if defined before all headers, or in one inline table.
+header, or in the root table if defined before all headers, or in one inline
+table.
 
 ```toml
 fruit.apple.color = "red"
@@ -1008,6 +1020,7 @@ When transferring TOML files over the internet, the appropriate MIME type is
 ABNF Grammar
 ------------
 
-A formal description of TOML's syntax is available, as a separate [ABNF file][abnf].
+A formal description of TOML's syntax is available, as a separate
+[ABNF file][abnf].
 
 [abnf]: ./toml.abnf

From 4d9eba6536048c5d487b168c1e58323f30e3fce9 Mon Sep 17 00:00:00 2001
From: Eric McCarthy
Date: Fri, 11 Feb 2022 23:49:36 -0800
Subject: [PATCH 2/3] make string spec wording more concise while keeping precision

---
 toml.md | 37 +++++++++++++------------------------
 1 file changed, 13 insertions(+), 24 deletions(-)

diff --git a/toml.md b/toml.md
index e1c2d204..9bf6f19e 100644
--- a/toml.md
+++ b/toml.md
@@ -259,20 +259,13 @@ String
 ------
 
 There are four ways to express strings: basic, multi-line basic, literal, and
-multi-line literal.
+multi-line literal. All strings must be encoded as valid UTF-8, and can contain
+any codepoint except control characters other than tab (U+0000 to U+0008, U+000A
+to U+001F, U+007F). Multi-line strings can also contain newlines (U+000A) and
+carriage returns (U+000D).
 
-All strings must contain only valid UTF-8 encoded characters as is the case for
-the TOML document as a whole. Certain control characters are not allowed to
-occur literally in any kind of string: U+0000 to U+0008, U+000B, U+000C, U+000E
-to U+001F, and U+007F. In basic strings and multi-line basic strings, but not in
-literal strings or multi-line literal strings, those control characters can be
-described with escapes as specified below. Additional restrictions are described
-below.
-
-**Basic strings** are surrounded by quotation marks (`"`). In addition to the
-characters disallowed for all strings mentioned above, U+000A (LF) and U+000D
-(CR) may not occur literally in basic strings. Backslash and quotation mark may
-only occur literally if they are part of a valid escape sequence.
+**Basic strings** are surrounded by quotation marks (`"`). Backslash and
+quotation mark may only occur if they are part of a valid escape sequence.
 
 ```toml
 str = "I'm a string. \"You can quote me\". Name\tJos\u00E9\nLocation\tSF."
@@ -305,6 +298,9 @@ like to break up a very long string into multiple lines. TOML makes this easy.
 **Multi-line basic strings** are surrounded by three quotation marks on each
 side and allow newlines. A newline immediately following the opening delimiter
 will be trimmed. All other whitespace and newline characters remain intact.
+Carriage returns (U+000D) are allowed only as part of a newline sequence U+000D
+U+000A (CRLF). Backslash may only occur if it is part of a valid escape
+sequence.
 
 ```toml
 str1 = """
@@ -348,11 +344,6 @@ str3 = """\
        """
 ```
 
-In addition to the characters disallowed for all strings mentioned above, U+000D
-(CR) is allowed only as part of a newline sequence U+000D U+000A (CRLF). As
-with basic strings, backslash and quotation mark may only occur literally if
-they are part of a valid escape sequence.
-
 You can write a quotation mark, or two adjacent quotation marks, anywhere inside
 a multi-line basic string. They can also be written just inside the delimiters.
 
@@ -413,12 +404,10 @@ apos15 = "Here are fifteen apostrophes: '''''''''''''''"
 str = ''''That,' she said, 'is still pointless.''''
 ```
 
-As in all strings, most control characters are not permitted even in a literal
-string or multi-line literal string. Thus, these literal strings are not suited
-for representing blobs of binary data. It is recommended that you use Base64 or
-another suitable ASCII or UTF-8 encoding. The handling of that encoding will be
-application-specific.
-
+Because most control characters are not permitted even in literal and multi-line
+literal strings, these literal strings are not suited for representing blobs of
+binary data. It is recommended that you use Base64 or another suitable ASCII or
+UTF-8 encoding. The handling of that encoding will be application-specific.
 
 Integer
 -------

From fa22931f421b05837f12e67189401279ac6dc866 Mon Sep 17 00:00:00 2001
From: Eric McCarthy
Date: Sun, 13 Feb 2022 19:31:21 -0800
Subject: [PATCH 3/3] reword four ways of strings

---
 toml.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/toml.md b/toml.md
index 9bf6f19e..fb13bf1a 100644
--- a/toml.md
+++ b/toml.md
@@ -259,10 +259,10 @@ String
 ------
 
 There are four ways to express strings: basic, multi-line basic, literal, and
-multi-line literal. All strings must be encoded as valid UTF-8, and can contain
-any codepoint except control characters other than tab (U+0000 to U+0008, U+000A
-to U+001F, U+007F). Multi-line strings can also contain newlines (U+000A) and
-carriage returns (U+000D).
+multi-line literal. Strings can contain any valid Unicode codepoint except the
+following control characters: U+0000 to U+0008, U+000A to U+001F, and
+U+007F. Note that tab (U+0009) is allowed. Multi-line strings can also contain
+newlines (U+000A) and carriage returns (U+000D).
 
 **Basic strings** are surrounded by quotation marks (`"`). Backslash and
 quotation mark may only occur if they are part of a valid escape sequence.
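
The control-character rule that PATCH 3/3 settles on can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not part of the patch or of any TOML parser: `disallowed_positions` is a hypothetical helper, it looks only at the raw characters between a string's delimiters, it ignores escape sequences and quotation marks entirely, and it applies the CRLF-only rule for carriage returns (which PATCH 2/3 states for multi-line basic strings) to every multi-line string for simplicity.

```python
from typing import List


def disallowed_positions(body: str, multiline: bool = False) -> List[int]:
    """Return the indexes of raw characters that the reworded rule forbids.

    Hypothetical helper: `body` is the text between the delimiters, already
    decoded from UTF-8. Escapes and quotation marks are not examined.
    """
    bad: List[int] = []
    for i, ch in enumerate(body):
        cp = ord(ch)
        # U+0000 to U+0008, U+000A to U+001F, and U+007F; tab (U+0009) is allowed.
        is_control = cp <= 0x08 or 0x0A <= cp <= 0x1F or cp == 0x7F
        if not is_control:
            continue
        if multiline and ch == "\n":
            continue  # multi-line strings may contain newlines (U+000A)
        if multiline and ch == "\r" and body[i + 1 : i + 2] == "\n":
            continue  # assumption: CR is accepted only as part of CRLF
        bad.append(i)
    return bad
```

Under these assumptions, `disallowed_positions("a\tb")` returns an empty list because tab is allowed, `disallowed_positions("a\nb")` flags index 1 for a single-line string, and a bare `\r` inside a multi-line body is flagged while `\r\n` is not.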