Will custom type syntax be good for TOML health? #603

Closed
LongTengDao opened this issue Mar 10, 2019 · 59 comments

Comments

@LongTengDao
Contributor

LongTengDao commented Mar 10, 2019

key = (compute) ' 5 * 60 * 60 '

key = (toSecond) ' 5h '

key = (toTable) [
  ['name', 'age', 'sex'],# head
  ['Jack', '10', 'male'],# item 1
  ['Max', '20', 'male'],# item 2
]

key = (toDOM) '''
  <div>
    <span></span>
  </div>
'''

[table] ('and other custom transform type')

I don't mean custom type syntax as a replacement for the standard types. I am just wondering whether exploring de facto standards this way might facilitate the development of standard types, with fewer discussions that are hard to settle, and keep these requirements from becoming dialects that conflict with the spec in the future.

@eksortso
Contributor

It's an intriguing idea (definitely post-1.0). May I make a suggestion, though?

Custom types would need to be expected by parsers. Perhaps it would be better to put the custom type after the key's name, to associate that type with the key?

[server1]
timeout (seconds) = 300
header (toDOM) = '''
  <div>
    <span></span>
  </div>
'''
guys (csv) = [
  ['name', 'age', 'sex'],# head
  ['Jack', '10', 'male'],# item 1
  ['Max', '20', 'male'],# item 2
]

The parser, given the smarts to handle them, would produce converted output and handle specified constraints. For instance, header with the (toDOM) type would validate the embedded string against a particular HTML standard, and guys with (csv) would hold a string in CSV format equivalent to the 2D array.
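
As a rough sketch of what a (csv) extension could do with the 2D array, assuming a hypothetical helper named to_csv that the parser would call on the already-parsed value (the name and the hook are illustrative only; the quoting is left to Python's standard csv module):

```python
import csv
import io


def to_csv(rows):
    """Serialize a 2D array of strings into one CSV-formatted string."""
    buffer = io.StringIO()
    # lineterminator="\n" keeps the output independent of the csv default ("\r\n").
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# The value of `guys (csv)` after ordinary TOML parsing.
guys = to_csv([
    ["name", "age", "sex"],  # head
    ["Jack", "10", "male"],  # item 1
    ["Max", "20", "male"],   # item 2
])
```

The key guys would then hold that single string instead of the nested array.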

One example that intrigues me is the use of units, given this. (See #514.) timeout (seconds) = 3m, for instance, could assign a value of 180 to timeout, which expects seconds but can convert minutes appropriately.

@LongTengDao
Contributor Author

LongTengDao commented Mar 13, 2019

@eksortso Oh, I think that's better and thorough!


Hmm... How would you deal with the types of inline array items? Array items all share the same type, I know, but inner tables/arrays in an inline array could differ...

@eksortso
Contributor

eksortso commented Mar 13, 2019

@LongTengDao Well, we could have both types of syntax. Consider two different things: keys with custom type, and values with custom type.

Here's an example that might please a few people, and for good reason. The guys example above required that all elements be strings. That's a little painful to accept, because ages are numeric. But we can't have numbers in arrays with strings in TOML v0.5.0. But suppose our parser allows an (m) array to be heterogeneous ("m" for "mixed type"). We could use something like guy1 = (m)["Jack", 10, "male"], and then use the following for a table that will be converted to a CSV string:

guys (csv) = [
  ['name', 'age', 'sex'],
  (m)['Jack', 10, 'male'],
  (m)['Max', 20, 'male'],
]

Key types and value types could have different meanings for the same type name, although in practice those meanings would be related. For instance, without using units, we could write (with modified syntax; we could allow value types to go before or after the value, but not both):

timeout (s) = 3 (m)

The (s) here means that timeout expects seconds in context, and the (m) here means the value of 3 has a dimension measured in minutes. The parser with appropriate extension logic will see both types, convert 3 minutes to 180 seconds, and assign 180 to timeout.
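
A minimal sketch of that conversion, assuming a hypothetical extension function apply_unit_tags and a hand-written table of unit factors (neither comes from any existing parser). Here "m" is treated as minutes only because the example above uses it that way:

```python
# Hypothetical lookup table: how many seconds one unit represents.
# "m" follows the example above; "min" is included as the SI-style spelling.
SECONDS_PER_UNIT = {"s": 1, "m": 60, "min": 60, "h": 3600}


def apply_unit_tags(value, value_tag, key_tag):
    """Convert `value`, measured in `value_tag` units, into `key_tag` units."""
    seconds = value * SECONDS_PER_UNIT[value_tag]
    result, remainder = divmod(seconds, SECONDS_PER_UNIT[key_tag])
    if remainder:
        raise ValueError(f"{value} {value_tag} is not a whole number of {key_tag}")
    return result


# timeout (s) = 3 (m)
timeout = apply_unit_tags(3, "m", "s")
```

The key tag names the unit the application expects, the value tag names the unit that was written, and the extension only has to bridge the two.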

All of these uses of custom type would be application specific, but their widespread adoption would suggest updates to the TOML standard in the future.

The type expression in parentheses would not conflict with either the key names or the values. Both would be expressed using traditional TOML syntax, unless that type significantly modifies allowed syntax.

This is far from complete, but it's a start. We just need to remember that for configurations, good documentation and proper templates that include custom key types would need to be written for those special types to carry minimal, obvious meaning to naïve readers.

Update: I realized too late that the SI abbreviation for minutes is "min", not "m". Can't help it, though. My point was that key tags may mean something different from value tags with the same names.

@LongTengDao
Contributor Author

LongTengDao commented Mar 14, 2019

@eksortso

Well, we could have both types of syntax. Consider two different things: keys with custom type, and values with custom type.

This is what I thought too, with some scruples, but:

we could allow value types to go before or after the value, but not both

This is really a good inspiration! It makes things look O&M!


Then what do you think about syntax in ()?

We can simply specify (bare-key-rule), but I think we need more, at least ("single line string just like key"), and (any expression) may be better, to allow custom constructors called with arguments:

guys = (''' csv ( 'name', 'age', 'sex' ) ''') [
  ['Jack', '10', 'male'],
  ['Jack', '10', 'male'],
  ['Jack', '10', 'male'],
]

guys = ( [ 'name', 'age', 'sex' ] ) [
  ['Jack', '10', 'male'],
  ['Jack', '10', 'male'],
  ['Jack', '10', 'male'],
]

How do you feel?

I'm suddenly a little afraid of things going towards:

guys (md) = '''
| name | age | sex |
| ---- | --- | --- |
| 'Jack' | 100 | 'male' |
| 'Jim' | 200 | 'female' |
'''

table (yaml) = '''
a:
  - 1
  - 2
b: null
'''

@pradyunsg
Member

This is definitely a post-1.0 discussion. It's definitely intriguing.

@eksortso
Contributor

eksortso commented Mar 15, 2019

@LongTengDao What syntax would () use? The simplest syntax would allow for a single bare-key-style identifier and that's all. Parentheses surrounding a bare word with no whitespace inside.

;; Part of a naively revised ABNF might look like this.
key = key-name [ ws type ]
type = "(" type-name ")"
type-name = 1*( ALPHA / DIGIT / %x2D / %x5F ) ; A-Z / a-z / 0-9 / - / _

key-name = simple-key / dotted-key

This syntax would allow tables, elements of table arrays, and keys within inline tables to have custom types.
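
To make the grammar concrete, here is a small sketch that mirrors the ABNF with a regular expression. It assumes bare and dotted bare keys only (quoted keys and whitespace around dots are left out), and split_tagged_key is a hypothetical helper name:

```python
import re

# type-name = 1*( ALPHA / DIGIT / "-" / "_" ); bare keys use the same characters.
BARE = r"[A-Za-z0-9_-]+"
TAGGED_KEY = re.compile(
    rf"(?P<name>{BARE}(?:\.{BARE})*)"   # key-name = simple-key / dotted-key
    rf"(?:[ \t]*\((?P<tag>{BARE})\))?"  # [ ws type ], type = "(" type-name ")"
)


def split_tagged_key(text):
    """Split 'name (tag)' into (name, tag); tag is None when no type is given."""
    match = TAGGED_KEY.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid tagged key: {text!r}")
    return match.group("name"), match.group("tag")
```

With this strict form, timeout (seconds) splits into a name and a tag, while a phrase-style tag such as (red bg, white interior) is rejected.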

[dictionary (ordered)]

To be generous, we could permit multiple type phrases with literal-string-like syntax (except no parentheses and no commas), separated with commas and whitespace.

sign-text (red bg, white interior) = "WRONG WAY"

Value type syntax can be done in a similar fashion.

We probably should just stick to keylike strings for custom type names. Also, I don't think parameters are strictly necessary. If they're needed, then tables can be used to provide them. Here's an example of how we can put a CSV table value (i.e. a string) in the key guys, using a specially constructed table and the (csv) type:

[guys (csv)]
header = ['name', 'age', 'sex']
rows = [
  ['Manny', '100', 'male'],
  ['Moe', '100', 'male'],
  ['Jack', '100', 'male'],
]

Stuff like the (yaml) example could always happen, and I look forward to the future's "Obfuscated TOML" contests. But a sensible template maintainer would not use such a key type. Much less likely would we find parser developers willing to implement YAML in TOML! So we shouldn't worry too much about these sorts of things. Any popular custom type extensions would be promoted with good documentation, useful examples, and plenty of obviousness and minimalism.

To this end, perhaps we can use pragmas, or some variant thereof (see #522) to specify which types are to be accepted by the parser. In combination, these things make specifying the behavior of the types more objective. For instance, imagine over time that units of measure are gradually accepted into the standard. Say that there's an external standard called "units-of-measure" that most parsers will acknowledge. We could see documents written like this:

# TOML v1.n
# units-of-measure v0.1
timeout (s) = 3 (min)

Then later, in a nifty parallel universe after unit dimensions are adopted and variant unit type names are included in the spec:

# TOML v2.n
timeout (seconds) = 3 min

Thoughts?

Edit: Fixed the tag for minutes.

@LongTengDao
Contributor Author

sign-text (red bg, white interior) = "WRONG WAY"

@eksortso You always come up with good ideas and beautiful examples! Wow.

[guys (csv)]
header = ['name', 'age', 'sex']
rows = [
  ['Manny', '100', 'male'],
  ['Moe', '100', 'male'],
  ['Jack', '100', 'male'],
]

@ChristianSi
Contributor

In my humble opinion, this is far too complex for TOML, post-1.0 or not. Remember what the 'M' in the name stands for?

@LongTengDao
Contributor Author

In my humble opinion, this is far too complex for TOML, post-1.0 or not. Remember what the 'M' in the name stands for?

@ChristianSi This may depend on whether the complexity is caused by this syntax, or is inherent in actual use. If the latter, then the main purpose of this syntax is precisely to avoid TOML becoming more complex. :)

@pradyunsg
Member

In my humble opinion, this is far too complex for TOML, post-1.0 or not. Remember what the 'M' in the name stands for?

To be clear, I see this as similar, if not equivalent, to YAML's tags, so I am fairly wary of this. I don't want to block any discussion on this, but I do think it'll be a not-so-easy task to convince me on this, FWIW.

@eksortso
Contributor

@pradyunsg, But these sorts of tags (I prefer that name, personally) are not defined the way that YAML's tags are. In all cases covered so far, custom types' usages are all parser-dependent. I'd be fine if that's all they ever are. They serve one specific purpose, defined by the app with the parser, and that's it.

But I hope that during our discourse, you can see value in some of these use cases. I'm pretty pleased with the unit-of-measure tags, and those weren't using types other than TOML's integers. Is it not simpler to say timeout (seconds) = 3 (minutes) to set timeout = 180? Is it not self-explanatory (at least, for those who can read English)? Does it not save time? Does it prevent abuse?

But I would never shove (seconds) and (minutes) into the TOML standard. Just a custom type syntax. Or tag syntax. Let others define what (seconds) and (minutes) mean. All that the TOML project would have to do with popular sets of tags is to refer to them in the wiki, or if they're really popular, register them so they can be used in tag pragmas that willing parsers can rely upon. All the heavy lifting would be done away from the core syntax.

@LongTengDao
Contributor Author

LongTengDao commented Mar 16, 2019

@eksortso Let's discuss some edge cases?

A. How do we tag the array itself, not the item table?

[[ array-of-table (on array of the key) ]] (on first table value)

[[ array-of-table ]] (on second table value)

# just like:

key (on key) = (on value, which same here) 'value'

[ table (on table) ]

[ table ] (on table too, same as above)

# Or opposite for aligning reason:

[[ array-of-table (on 1st table) ]] (on array)

[[ array-of-table (on 2nd table) ]]

[ table (and just allow this) ]

[ table ] (not allow this)

# Or, just give the parser:

[[ array-of-table (tag-a which parser known for table, tag-b which parser known for array) ]]

How do you like this?

I also want to know why you prefer [ key.key (tag) ] to [ key.key ] (tag)?

I think whatever the final choice is, our basis should be consistent: being intuitive comes first, then being reasonable and unified?

B. How do we tag a table that does not appear directly?

a.b (plan a)
a.b (plan b).x = 1
a.b.y.z = 2

Or just forbid it, and require a.b (plan c) = { x = 1, y.z = 2 } instead?

C. How do we tag on the root table?

(initial lone tag before all expression) # Assume there is a [] root table open statement before the tag?

D. What's the order of tag processing?

key (2nd) = 'value' (1st)
array (5th) = (4th) [
    (3rd) 'item' (2nd),
] (1st)
array = [
    (3rd) [
        'item' (2nd),
    ] (1st),
]

Just in the reverse order they occur like above? Or always from inner to outer, like below?

array (4th) = [
    (2nd) 'item' (1st),
] (3rd)
array = [
    (4th) [
        (2nd) 'item' (1st),
    ] (3rd),
]

Or, from inner to outer, but give the same level tags to the parser at the same time which refer to different meanings:

key (key tag) = 'value' (value tag)

array (key tag) = (invalid -- or value tag for long value) [
    (index tag) 'value' (value tag)
    # but what if here needs a pre value tag for long value
] (value tag)

Sample in JS parser:

function tagProcessorForEach(parent, keyOrIndex, keyOrIndexTag, value, valueTag) { }

Or, only these (key tags) are valid:

[ table (tag) ]

key (tag) = 'value'

array (tag) = [
    (tag) 'value' # tag at where index ("key" for array item) should be, which also nice for long value
]

# Or:

[ table ] (tag) # before content start; and "tag" is just like "comment", after the "[]" statement

key = (tag) 'value'

array = (tag) [
    (tag) 'value' # tag at the front of value always. same result, just another explanation of consistency
]

# Perhaps the post-tag is pretty for primitive values
# (including Boolean, Integer, Float, String, Date-time,
# because parentheses are usually post-tagged in natural language),
# but it can be overlooked for longer strings,
# so should we disable post-tags for strings?
# Or limit the length?
# Or is it better to just allow post-tags for Integer?

key = 1 (s)

# Or the easiest way to do is to leave it out,
# because combinatorial usage is a little less O&M?
# After all, in the first explanation, it's almost always the latter.

And then the order is always from inner to outer.

@eksortso
Contributor

@LongTengDao You gave me a lot to think about. Here's my take on the subjects that you raised.

TL;DR

  • Tag key names and values only, but allow a special exception for table arrays' tag syntax.
  • No tagging allowed on the insides of dotted keys.
  • No tagging allowed on the root table.
  • Handle key tags before value tags.
  • Handle tags on collections, like arrays and inline tables, before handling what's inside them.
  • Tags on collections ought to come before, not after, they are specified.
  • For all other value types, put tags before or after the value. But not both.
  • Put tags after keys. (I don't discuss that here, but just remember this.)
  • I prefer tags that look like single bare keys, for a few reasons.
  • A tag-set registry would still be a good idea.

A. How do we tag the array itself, not the item table?

My preference is to bind a key tag to its key's name. The syntax [table-name (with-tag)] does just that.

That said, an exception needs to be made for table arrays. The table array syntax defines the name of the array, but there's no simple way to explicitly separate the array from its elements. The [[]] line starts an element table belonging to the array, and it's expected that the line will appear more than once.

If it's needed, perhaps we allow a tag after the double brackets on the first element table of the array. It would be invalid after any element table line beyond the first one.

# My preference:
[[array-of-tables]] (array-tag) #permitted on 1st [[array-of-tables]]
#...
[[array-of-tables]] #tag not permitted here
#...
[[array-of-tables (table-tag)]] #we can still tag individual elements
#...

This is very much an exception to the norm, as you're about to see.
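
The first-element rule could be checked along these lines. This is only a sketch: it assumes the parser has already reduced each [[...]] line to a pair of array name and trailing array tag, and collect_array_tags is a hypothetical name:

```python
def collect_array_tags(headers):
    """Return {array name: array tag}, rejecting array tags after the first element.

    `headers` lists every [[name]] line in document order as (name, array_tag),
    where array_tag is None when nothing follows the closing brackets.
    """
    seen = set()
    array_tags = {}
    for name, array_tag in headers:
        if array_tag is not None:
            if name in seen:
                raise ValueError(
                    f"array tag on [[{name}]] is only permitted on its first element"
                )
            array_tags[name] = array_tag
        seen.add(name)
    return array_tags
```

Tags written inside the brackets, as in [[array-of-tables (table-tag)]], belong to the individual element and are not covered by this check.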

B. How do we tag a table that does not appear directly?

My preference is, if we can't refer to the key, we can't tag it. So something like a.b.x = 1 would not permit b to be tagged. Something like a.b (plan c) = { x = 1, y.z = 2 } ought to be fine.

By the way, despite my earlier slightly enthusiastic comments, I've come to prefer single bare-key-like tag names. So I'd actually prefer a.b (plan-c) over a.b (plan c).

C. How do we tag on the root table?

My preference: We can't. The root table is never explicitly specified, so per B., it can't be tagged. Besides, the application defines the top level's significance, so that shouldn't change.

D. What's the order of tag processing?

I would prefer key (1st) = 'value' (2nd). The example timeout (seconds) = ... requires timeout to expect either a numeric value representing seconds, or a number tagged with a unit of time measurement. I would rather have that expectation in place before the value is processed. If you can think of examples where it would make more sense to handle the value's tag first, definitely share them.

Collection types may affect how the contents are processed. Consider the example [dictionary (sorted)], which implies that if the keys of dictionary were read into an array, they'd be in order, so they would need to be sorted as they're added. The parser would need to know that ahead of time. Also consider the example of the hypothetical mixed-type array (m)[(and-then-there_s) 'Maude', 58, 'female']. That (m) needs to be recognized first, because for a standard TOML array (as of v0.5.0), the 58 would throw an error.

So I would prefer array (1st) = [ (3rd) 'item', ] (2nd). Or rather, array (1st) = (2nd) ['item' (3rd),] which reads a little more nicely. The same would apply to tables, both standard and inline.

Tag ordering should not change depending on whether a tag appears before or after a value. Please recall: tags can only go on one side of a value. So this is invalid: key = (a) 'item' (b) # INVALID.

In any case, we handle, in the order that they appear, all the tags on each key-value pair or table header as they appear in the document.

[men-with-no-name (1st)]
man-afod = "Joe" (2nd)
man-fafdm (3rd) = "Monco"

[men-with-no-name.man-x (4th)]
who="Blondie" (6th)
whos_who (7th) = (8th) ["OK" (9th), "NG" (10th), "UGH" (11th)]
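
The ordering for a single key-value pair can be sketched as a pre-order walk. The data model is an assumption made for illustration: each value is a (tag, payload) pair, and a list payload stands for an array of such pairs.

```python
def outer_first(key_tag, tagged_value):
    """Return tags in the proposed order: key, then collection, then its items."""
    order = []
    if key_tag is not None:
        order.append(key_tag)

    def visit(node):
        tag, payload = node
        if tag is not None:
            order.append(tag)  # a collection's tag is handled before its contents
        if isinstance(payload, list):
            for item in payload:
                visit(item)

    visit(tagged_value)
    return order


# whos_who (7th) = (8th) ["OK" (9th), "NG" (10th), "UGH" (11th)]
whos_who = ("8th", [("9th", "OK"), ("10th", "NG"), ("11th", "UGH")])
order = outer_first("7th", whos_who)
```

Because a scalar carries at most one tag, it makes no difference to the order whether that tag was written before or after the value.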

I think tags on non-collection values can be permitted to go on either side of the value (only on one side per value though). And I wouldn't want to exclude their use.

reason = (good-point) "Like you said, pre-tags make sense when you're dealing with long strings."

But I do think tags on collection values ought to come before the collection, and that's easy to show why.

reason-with-commentary = [
    "There's value in readability.",
    "We wouldn't want to obscure the tag.",
    # Though the reasons aren't always obvious at first glance
    "But when you're dealing with fine details",
    "which may be altered by the presence of the tag,",
    "It's important to put that up front",
    "because you'd need to go through the entire collection",
    "before you realize that there is a tag",
    "and discover that you have to deal with it in a particular",
    "way.",
    "This isn't for the parser so much as it's for the human beings",
    "who have to read the TOML code, even if they don't need to.",

] (TL_DR)    # IF THAT POST-TAG ISN'T INVALID, IT OUGHTA BE!

Other Stuff

Returning to an aside that I made earlier, I mentioned the idea of a tag-set registry. This would include, among other things, an online reference of the meanings of various related tags, blessed by the TOML community for each set's merits. Such a registry would value obviousness, minimalism, clean syntax, a high degree of usability, and very little screwing with stuff that doesn't need screwed with.

Such a registry would use URLs, and I do advocate for bare-key-like tag names, which would require little conversion if they're typed in blindly by human beings, or by IDEs that are just trying to be useful.

Thoughts on any of these things?

@LongTengDao
Contributor Author

LongTengDao commented Mar 17, 2019

@eksortso Discussing just one feature is already this complex; how difficult it must have been for Tom to invent TOML! XD

  1. Tag key names and values only, but allow a special exception for table arrays' tag syntax.
  2. No tagging allowed on the insides of dotted keys.
  3. No tagging allowed on the root table.
  4. Handle key tags before value tags.
  5. Handle tags on collections, like arrays and inline tables, before handling what's inside them.
  6. Tags on collections ought to come before, not after, they are specified.
  7. For all other value types, put tags before or after the value. But not both.
  8. Put tags after keys. (I don't discuss that here, but just remember this.)
  9. I prefer tags that look like single bare keys, for a few reasons.
  10. A tag-set registry would still be a good idea.

The items I checked look good to me.

4 & 5

Handling tags in forward order may not be possible; I found this when I tried to implement it in my parser (ltd/j-toml/xOptions/tag)... Consider this (I break the lines of the example inline table so it reads clearly):

grand (tag-a) = {
  parent (tag-b) = {
    child (tag-c) = 'value'
  }
}

When processing the tag at each level, an inner level cannot read information from more than one layer out, at least not easily (it would need something like the dom.parentNode API...), but an outer level can easily get information from any depth of inner level if it needs to. So I think inner tags are just preliminary preparation, and tags should be handled from last to first.
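
A sketch of the inner-to-outer order, assuming the parsed document is held as nested dicts whose values are (tag, value) pairs. This layout is invented for the example and is not how j-toml stores anything:

```python
def inner_first(table):
    """Return tags so that every nested tag precedes the tag that encloses it."""
    order = []
    for key, (tag, value) in table.items():
        if isinstance(value, dict):
            order.extend(inner_first(value))  # finish the inner levels first
        if tag is not None:
            order.append(tag)  # by now the whole inner value is available
    return order


# grand (tag-a) = { parent (tag-b) = { child (tag-c) = 'value' } }
doc = {"grand": ("tag-a", {"parent": ("tag-b", {"child": ("tag-c", "value")})})}
order = inner_first(doc)
```

Each processor runs only after its whole value has been built, so it never needs a reference back to its parent.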

1 & 8 & 9

[[ a (do-a-to-first_do-b-to-first) ]] (do-x-to-array)
[[ a (do-c-to-second) ]]
[[a]] (do a to first, do b to first, do x to every item)
[[a]] (do c to second)

I think the latter looks clearer (it avoids overwhelming the conspicuousness of []), which is what you suggested earlier... and the array/table is sent to the processor at the same time; which one is the target depends on the tag content and the parser.

10

Did you mention the idea of a tag-set registry before? Sorry, I didn't see it, and can't find it...

I'm not sure what you mean. If it's used by parsers, I think that's good; if it's used for writing .toml files, I think maybe that's not good... Because tags are a syntax invented to keep custom extensions from conflicting with future standard syntax, if using a tag still requires a spec, it's not a tag any more... Would you explain it more?

@drunkwcodes

drunkwcodes commented Mar 26, 2019

I would like to have a symbol or a term to determine which lines are manually defined in the case of designing a type syntax.

@LongTengDao
Contributor Author

I would like to have a symbol or a term to determine which lines are manually defined in the case of designing a type system.

What do you mean? Currently, all the examples in this issue, use ( ) as the symbol.

@drunkwcodes

drunkwcodes commented Mar 27, 2019

I mean it would be good to see hand-written types are distinguishable from annotation types which are auto generated and be written in place.

In that case, we can simplify the generated types more easily.
The conflicts are also easier to be resolved when doing a revision.

@LongTengDao
Contributor Author

I mean it would be good to see hand-written types are distinguishable from annotation types which are auto generated and be written in place.

In that case, we can simplify the generated types more easily.
The conflicts are also easier to be resolved when doing a revision.

@drunkwcodes Sorry, I think I need some help.

@eksortso Hi, could you understand what these above mean?

@drunkwcodes

drunkwcodes commented Mar 28, 2019

Parentheses have too many useful meanings besides denoting types.

I got an idea. : for hand-written types and :-> for those calculated types.

@LongTengDao
Contributor Author

Parentheses have too many useful meanings besides denoting types.

I got an idea. : for hand-written types and :=> for those calculated types.

Personally, it's hard for me to distinguish between a type and a calculation, as in the example below:

[a] (table)
head = ['name','age','sex']
body = [
  ['Tom', '19', 'male'],
  ['Jack', '20', 'male'],
]

It's a type, and also a calculation.

@drunkwcodes

drunkwcodes commented Mar 29, 2019

It would be something like this after the first pass.

[a] : table  # unnecessary for it's built-in.
head :-> 1×3 str array = ['name','age','sex']
body :-> 2×3 str matrix = [
  ['Tom', '19', 'male'],
  ['Jack', '20', 'male'],
]

It has canonical types to describe the data.
Having a type declaration(in different lines) is even better.

So we know that it's a 2-by-3 string table with one-line header at the first glance.

A delimiter like , for appending units after types sounds sufficient.

@TheElectronWill
Contributor

TheElectronWill commented Mar 29, 2019

Couldn't we just use (customType)? I fail to see the point of using the uncommon :-> symbol.

In TOML, parentheses don't have many meanings, so something like this wouldn't be ambiguous.

[a] # table because of the brackets

head (1x3 string array) = [...]
body (2x3 string matrix) = [...]

@LongTengDao
Contributor Author

LongTengDao commented Mar 29, 2019

@drunkwcodes Hi

The colon is very close to the semantic status of the equal sign, and data file formats generally avoid using both as much as possible, such as YAML and JSON with the colon and INI/TOML with the equal sign.

But it reminded me of TypeScript, which might help #116 (comment):

config.js
config.d.ts

config.toml
config.d.toml

But it also means that the colon gives me the sense of a validator comment expressing "equivalence" rather than an "extra transform", similar to the comments below but with grammatical effect:

[a] # table

head = [ ] # 1×3 str array
body = [ ] # 2×3 str matrix

@drunkwcodes

drunkwcodes commented Mar 29, 2019

Exactly. It's all about readability.
Not like parentheses, which make me wonder how much meaning it would be in a markup language.

@LongTengDao
Contributor Author

LongTengDao commented Mar 29, 2019

@drunkwcodes

Exactly. It's all about readiness.

I think you want to write "readness" which maybe means "readability"?

Not like parentheses, which make me wondering how much meaning it would be in a markup language.

Currently, it's mainly used for exploring new types, which may not be good to add wholesale into the spec.

The four date-time types are examples, which obviously differ from the other types (primitive types and structure types). They are useful, but a time duration is also useful, and there are many types useful in various situations, which are more like syntactic sugar (for { date = 'yyyy-mm-dd', time = 'hh:mm:ss.ddd', offset = 0 } and '5h'). Both supporting them and not supporting them cause problems. So consider these options:

TOML v0.5—without any sugar:

# a moment
attendance = { time = '09:00:00', offset = '+08:00' }

# a duration
rest.from = { time = '12:00:00', offset = '+08:00' }
rest.to = { time = '13:00:00', offset = '+08:00'  }

# a period
work = 28800

TOML v0.5—with custom type syntax:

attendance (moment) = '09:00:00' (+08:00)

rest (duration) = '12:00:00 ~ 13:00:00' (+08:00)

work (period) = '8h' (s)

TOML v50—add all into spec:

attendance = 09:00:00+08:00

rest = 12:00:00+08:00 ~ 13:00:00+08:00

work = 8h0s

@Gin-Quin

Gin-Quin commented Mar 30, 2019

Hello there. Here are my thoughts after all this reading.
I agree with the comment above about bringing complexity to a language which is intended to be simple. I think tags can be an awesome idea, but the usefulness/complexity ratio should be considered.

Type inference

Adding types to a language is something that has been given a lot of thought. The most recent programming languages like Kotlin, Swift, TypeScript, are all typed languages with type inference, and I think there is a good reason why. Types bring stability, clarity. Type inference brings ease of programming for humans.
Writing myNumber = 121 (with type inference) is more readable than myNumber (Integer) = 121, because we are humans behind our computers, and we know what obviously is a number, or a URL, or a date, etc... The present 0.5.0 TOML version uses inference about numbers, dates, strings, and I think that is an awesome feature. (I would think about extending it to URLs)

About the syntax

What about using the same syntax as Swift, TypeScript and Kotlin instead of a C-like syntax?

myAge : Number  =  121
myName : String  =  "Zabu"
myBody : BodyPhysic  =  {
  weight = 12,
  height = 14,
  speed = 37,
  eyes = 'Huge'
}
myFriends: [String]  =  [ "Coco", "Bubu"]

Are explicit types necessary?

Kotlin, Swift, they have type inferences, but also explicit types when it is necessary. But they are programming languages, not configuration/object notation languages. Does TOML need explicit types?
Let's see some concrete examples :

work (period) = '8h' (s)
timeout (s) = 3 (min)

When I look at this code, I have a feeling "That's cool" mixed with another feeling : "That's complicated" :p
There is a left and a right type, which means the TOML parser must know both types and how to convert from one type to another. Plus, OK, that kind of double-type conversion is cool for seconds, and also temperatures (Kelvin/Celsius) and anything like that, but practically, I think it's too much work for just some syntactic sugar.

hoursOfWork = 8
timeout = 180  # seconds

That's not as cool, I agree. The human has to convert minutes to seconds himself. But it works fine. There is no ambiguity, thanks to the key name or the comment. And of course the parsing is a lot easier and faster to do.

There is another issue with that kind of conversion: if you create an object from TOML, and then convert the object back to TOML (with a stringify function), you will lose all your type information.

About that kind of code :

body :-> 2×3 str matrix = [
  ['Tom', '19', 'male'],
  ['Jack', '20', 'male'],
]

I think type inference is the best. For me, the (2×3 str matrix) part - however it is written - is not very useful. Humans know what they see, and so should the parser. Since all array elements must have the same type, it is not so hard to infer array types.

Now, my favorite point :

[guys] : Csv
header = ['name', 'age', 'sex']
rows = [
  ['Manny', '100', 'male'],
  ['Moe', '100', 'male'],
  ['Jack', '100', 'male'],
]

Ok. Here I see true potential for tags.

User-defined classes

In this example, the user has a CSV object that he wants to convert to/from TOML.
Maybe this CSV object has methods, like addRow or something. Maybe it has a special data organization (maybe not header and rows properties).
Plus, not all users would need a CSV converter imported with their TOML parser.
The idea is to use the CSV constructor (defined by the user) to create a CSV object.
Then, instead of having this standard map object as a result :

{
  "guys": {
    "header": ["name", "age", "sex"],
    "rows": [...]
  }
}

we would get a true CSV object by passing the resulting Map object to the CSV constructor.
Then the user could also call CSV methods on the result object :

data = TOML.parse(tomlContent)
// data.guys is now a CSV object, so we can call methods :
data.guys.addRow('Hello', '100',  'Toml')

It can work not only with CSV, but with any objects you use in your project, if you've defined a valid constructor. This constructor just has to be accessible by the parser.
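
A minimal sketch of that flow, under stated assumptions: the Csv class, the construct_tagged function, and the key-to-tag mapping are all hypothetical, and the parser is assumed to hand back the plain data and the tags separately.

```python
class Csv:
    """Hypothetical user-defined class; the parser only ever sees its constructor."""

    def __init__(self, table):
        self.header = list(table["header"])
        self.rows = [list(row) for row in table["rows"]]

    def add_row(self, *cells):
        if len(cells) != len(self.header):
            raise ValueError("row width does not match header")
        self.rows.append(list(cells))


def construct_tagged(parsed, tags, constructors):
    """Pass each tagged top-level value to the constructor registered for its tag."""
    result = {}
    for key, value in parsed.items():
        tag = tags.get(key)
        result[key] = constructors[tag](value) if tag is not None else value
    return result


# Assumed parser output for "[guys] : Csv": plain data plus a key-to-tag mapping.
parsed = {"guys": {"header": ["name", "age", "sex"], "rows": [["Manny", "100", "male"]]}}
data = construct_tagged(parsed, {"guys": "Csv"}, {"Csv": Csv})
data["guys"].add_row("Hello", "100", "Toml")
```

The parser itself stays ignorant of what Csv means; it only looks up the constructor that the user registered under that tag name.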

Another example, with a user who needs to work with Books:

books : [Book] = [
  { title = "Bees are cool", author = "BeeLover", chapters = 121 },
  { title = "Dogs+me", author = "DogLiker", chapters = 2 },
]

or...

books : [Book]

[[books]]
title = "Bees are cool"
author = "BeeLover"
chapters = 121

Advantages of this idea :

  • The identity of objects is preserved through parsing/stringification, and so are the methods
  • Users can use their own classes with TOML files
  • No need for the parser to deal with many tags/types, as it is the user role to provide the constructors
  • Possibility to easily implement plugins to deal with widely-used types

@drunkwcodes

drunkwcodes commented Mar 31, 2019

@Lepzulnag I like this comparison.

TOML is familiar and formal by now, and this is a type syntax which will be superior to those in programming languages right here. But I like it.

I just googled those type syntax. I may be mistaken.

  1. Type annotation mark in Kotlin is as pure as a : as for returned types.

  2. Swift has innovative ? and ! for optional types. (I think it should be just ? in here.)

  3. Typescript is... nothing special.




The unnecessary parts are for having a room to write down the details.




We are going to pick those symbols up, as much formal, informative and readable as possible.

Because it is TOML.

@LongTengDao
Contributor Author

LongTengDao commented Mar 31, 2019

Indeed, "type inference" and "custom type" (or "user-defined classes") are two things.

Whether (*) or :*, this issue is discussing "custom type".

BTW: "type inference" is intended for variable reassigning, computing, and passing in APIs. These only happen in programming languages, because a configuration language is static (and its complexity is exactly the same as type syntax; they are no different to a computer):

let a :string = 'abc'; // Without the actions and possible errors below,
                       // there is no need to hand-write a type,
                       // because type syntax is almost the same as value syntax
a = {}; // reassign error
a = a+1; // compute error
! function (p :boolean) { }(a) // passing error

@Lepzulnag So at present, the problem may be: is this thing, whether we call it "custom type", "tag" or "user-defined classes", available for inline elements? Inline tables and inline arrays are also objects, and even string literals like date-time and url are going to be objects. How do we express them if using :?

[block-table] (classA)
inline-array (classB) = [
  (classC) { }
]
inline-primitive (classD) = '/path/to/something'

Do you want one in below?

[block-table] : classA
inline-array :classB = [
  classC: { }
]
inline-primitive :classD = '/path/to/something'

[block-table] : classA
inline-array = classB: [
  classC: { }
]
inline-primitive = classD: '/path/to/something'

[block-table] : classA
inline-array = classB: [
  classC: { }
]
inline-primitive = '/path/to/something' : classD

And limit that custom type must be returned as an object type by the plugins in parser (which is mainly used for configure format), in exchange for better support of stringification (which still has many other untenable things, like an object is whether inline or not, and how to reserve dot keys)?

@eksortso
Contributor

A few suggestions for alternatives to parentheses were made. I don't like the colon-based syntax; visually, it's too discreet. Because of their applications, tags ought to stand out!

I do like angle brackets, as it turns out. Something like timeout <seconds> = 180 is appealing to read. Plus, using angle brackets in place of parentheses makes the tags look like HTML or XML, which is widely recognized, though the usage is different.

Tags should follow the same format that keys follow. If you can use a tag's name as if it were a key, then it ought to be good. <bareword> tags would be better than <"complicated or ornate⚝"> tags

Thoughts?

@eksortso
Contributor

@LongTengDao, let me split up your last post into a few different posts. A number of things that you mentioned need to be addressed.

With the mentioning of "optional" types, it may be worth revisiting the notion of a null value. Nulls hold no real meaning in configuration; either values are defined or they're not. But in general data description, null means that a value is not defined and that fact is stated explicitly. Since all languages effectively have a "null" object type, it would be painless (post v1.0) to bring null into TOML.

Also, this suggests that tag names ought to allow for a ? suffix, to suggest that the value can possibly be null. It doesn't need to be an explicit requirement for the use of null as a value, but it couldn't hurt.

[bikeshed]
paint <color?> = null

This begs a question of whether tags <color> and <color?> are related. We can leave that question to the application using the tags to answer.

@LongTengDao
Contributor Author

LongTengDao commented Apr 29, 2019

@eksortso

  • spec and parser should not consider how the tags will be explained

Sure. It's designed for custom feature.

  • beyond suggesting fundamental types for well-defined syntaxes

I think the spec should never interfere with the tag feature, to promise that custom tags will never conflict with official features when the spec version is upgraded. Unless the spec states from the beginning which tag formats are reserved. But I still suggest that official features use 5h and not touch <hour> 5.

  • Possibility to easily implement plugins to deal with widely-used types (each such plugin would recommend its own set of tags for use, each defining their own application of TOML with tags)

Yes. In my parser's experimental implementation, it's easy to combine plugins:

const TOML = require('@ltd/j-toml');
const toml_plugin_a = require('...');
const toml_plugin_b = require('...');

const sourceContent = `
    x = <tag-x> 'value'
    y = <tag-y> 'value'
    z = <tag-z> 'value'
`;

const rootTable = TOML.parse(sourceContent, 0.5, '\n', true, {
    mix: true,
    tag ({ table, key, tag }) {
        switch (tag) {
            case 'tag-x':
            case 'tag-y':
                toml_plugin_a({ table, key, tag });
                break;
            case 'tag-z':
                toml_plugin_b({ table, key, tag });
                break;
            default:
                throw Error('Unknown TOML tag: <'+tag+'>.');
        }
    },
});

  • not use :

Personally, I'm both okay, whether use or not. I leave this point to other discussants.

  • <tag> is better than (tag)

Yeah, if we pick only one. () is usually used after words, while in some cases the tag must come before the value (before array items; before long values like long strings, inline arrays, inline tables). Unless we use both of them.

  • <tag> rule is same as bared key

LGTM. In the future, when attributes are necessary, it will be <"a {b:c}"> or <a b="c"> or something else that looks good. But currently, let's discuss how to write multiple tags ((tag-a, tag-b) in the previous discussion).


  • about <tag?>

I can't see why we need this, is there any relation to tag topic? Tags are intended for type conversion, not validator.


One more question.

This rule is more unified (always before value):

key = <tag> [
    <tag> [
        <tag> 'value',
        <tag> 'value',
    ],
]

Do you still think tag after keys is better, when we stop using () (which is usually after things)?

key <tag> = [
    <tag> [
        <tag> 'value',
        <tag> 'value',
    ],
]

key.key<tag> also makes me think of tags for each node (key<tag>.key<tag>), which has no meaning.

@LongTengDao
Contributor Author

LongTengDao commented Apr 30, 2019

Found just now and marked for reference:

Add simple tagging #346

@vagoff Welcome to join the discussion! Time flies and good days come~


Is it possible to append a previously defined value? #612
A reserved syntax for user extension in strings #445

Maybe the custom syntax is also suitable for variable reference requirement:

[tool]
basepath = "~/apps/mytool"
binpath = <mustache> "{{ tool.basepath }}/bin"
[tool]
basepath = "~/apps/mytool"
binpath = <es6> "${ tool.basepath }/bin"

Other related issues collection:

Feature request: Add a duration/timedelta type #514
Artihmetic expression as values #582

@eksortso
Contributor

Pardon me, @LongTengDao, for not responding earlier.

  • beyond suggesting fundamental types for well-defined syntaxes

I think the spec should never interfere with the tag feature, to promise that custom tags will never conflict with official features when the spec version is upgraded. Unless the spec states from the beginning which tag formats are reserved.

Well, the spec would set some expectations for how the parser handles the tags. In line with a minimal approach, a parser can assign a tag to a key or a value on a first pass, and then, later on, apply special typing or other features based on how the tags are interpreted by the parser plugin. The assignment part would be part of the TOML standard. The interpretation goes beyond the standard. (I said "plugin" because your experimental parser uses plugins, but a plugin isn't strictly necessary to offer special functionality.)

But I still suggest that official features use 5h and not touch <hour> 5.

Agreed. That's another issue now.

I'd rather save multi-tagging for later discussion.

  • about <tag?>

I can't see why we need this, is there any relation to tag topic? Tags are intended for type conversion, not validator.

It wasn't my intention to use <tag?> to validate a tagged value. I was saying that a key tag with a question mark at the end of the tag's name could, by convention, suggest that the key can explicitly hold a null value, and in some cases may encourage it.

Some languages have "optional" types that are identical to a simpler type except that they allow null values. A tag application like paint <color?> would mean that a config setting paint would be declared to be an optional Color type. This could be made 100% implicit, and some languages (like SQL, for better or worse) just work that way.

But when someone writes and presents you with a configuration template like paint <color?> =, you can rest assured that you could ignore the question of what color your paint is and just supply a null, knowing the program that would use the filled-out config can handle paint being null.

Do you still think tag after keys is better, when we stop using () (which is usually after things)?

key <tag> = [
    <tag> [
        <tag> 'value',
        <tag> 'value',
    ],
]

My conception of key tags is that they put expectations on the values assigned to those keys. So it's like paint <color?> = tells the user just before the equals sign that, hey, the paint option is gonna be a color, however you define it, or it could be nothing. The phrasing <color?> paint doesn't carry that same semantic weight.

key.key<tag> also makes me think of tags for each node (key<tag>.key<tag>), which has no meaning.

I don't see that at all. Maybe I'm revisiting the same example too much, but bikeshed.paint <color> makes paint colorful, even if the bikeshed isn't.

@jeff-hykin

jeff-hykin commented Aug 18, 2021

I think the conversation on this topic is going well.

Just some feedback from an outsider to confirm some things and point out some other things (hopefully this will be helpful).

  • As a user, I just want to attach tags to a value.

    • Nothing complicated like typecasting, validation, or string interpolation, just something like day = '10/15' #birthday #mm-dd #annual. I'm interested in all those other usecases, but a simple tagging system alone would be a huge step forwards.
  • In terms of an MVP for average usecases I see

    • Forward/backward compatibility => critical
    • multi-tags => not critical to MVP (but critical later)
    • inline tagging => (very nice but) not critical MVP
    • method for user applications to read/handle tags => critical MVP
    • maintaining round-trip editability => critical MVP
  • For the user application side:

    • I don't see much discussion on how stringify/write/attach-ing tags to a value will work. (Maybe it's obvious/implicit and I'm just unaware)
    • The reading/parsing/plugin example shown by @LongTengDao looks good to me as an MVP
  • For the syntax of the tags

    • Non-programmers use config files too, and parentheses are used for 'tagging' things in English, but angle brackets for type definitions AFAIK are pretty much exclusive to C++-style languages with static typing. Because of that I strongly believe file (relative-path) = 'blah/blah' is significantly more easy to understand than file <relative-path> = 'blah/blah'.
    • For the same reason as above, I think having the parentheses after a value, 10 (seconds) or /wbiqnwl/ (regex) is much more natural than having tags in front of the value (seconds) 10, (regex) /a$;(2!*-_+#?/
    • The (like-this) vs (like this) vs ("Like this") syntax choice doesn't matter to me.
    • However, I strongly agree: please, please, please do not allow (''' csv ( 'name', 'age', 'sex' ) ''') style tags. Just a tag, not a separate unstandardized externally-parsed sub-syntax
  • misc notes:

    • Please do not make key-tags just a secondary value-tag. For future compatibility, please have an MVP that allows for proper key tags. Examples of key tags would be things such as (builtin) or (inherited) or (css-scope) that have meaning specifically for the key and not the value.
    • As a user I desperately need round-trip parsing (which toml currently does great at). If Toml is too complex, it becomes impractical. If applications are required to handle tags in a one-way/destructive way, round trip could be really hard. And having comments stripped from documents (because of no round-trip editing) is a big deal for config files.

@arp242
Contributor

arp242 commented May 18, 2022

There are two things here:

  1. Attach a tag to keys (key (tag) = value)
  2. Attach a tag to values (key = value (tag) or k = (tag) value).

I think the second item makes things too complex, stuff like this:

array (5th) = (4th) [
	(3rd) 'item' (2nd),
] (1st)

Is pretty hard to understand.

Even things like this seems too complex to me:

timeout (s) = 3 (m)

And you can just use a different min tag for the key instead:

timeout (min) = 3

Which I find much more obvious.


Personally I think the key (tag) = value syntax might be a good thing to add, but nothing more. This is:

  1. Still fairly obvious.
  2. Not too complex.
  3. Easy to implement.
  4. Provides enough flexibility for most cases.

Also maybe adding multiple tags might be a good idea:

img1 (base64,png) = '''[..]'''
img2 (base64,jpeg) = '''[..]'''
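As a rough illustration of how little key-side parsing the key (tag) = value form would need, here is a Python sketch. The syntax is only the one proposed in this thread, not part of TOML, and the `KEY_WITH_TAGS` pattern and `split_key_tags` function are hypothetical names; the sketch handles bare keys only.

```python
import re

# Hypothetical pattern: a bare key, then an optional parenthesised,
# comma-separated tag list, e.g. "img1 (base64,png)".
KEY_WITH_TAGS = re.compile(r"^\s*([A-Za-z0-9_-]+)\s*(?:\(([^()]*)\))?\s*$")


def split_key_tags(text):
    """Return (key, tags) for the left-hand side of a key/value pair."""
    match = KEY_WITH_TAGS.match(text)
    if match is None:
        raise ValueError(f"not a tagged bare key: {text!r}")
    key, raw_tags = match.group(1), match.group(2)
    tags = [t.strip() for t in raw_tags.split(",")] if raw_tags else []
    return key, tags
```

Under these assumptions, `split_key_tags("img1 (base64,png)")` yields the key and a list of two tags, and an untagged key yields an empty list. What the tags mean would still be left to the application.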

On the other hand, all of this will be implementation-defined, and an implementation can already do the same with just regular strings:

img = '''base64,png: [..]'''

key = 'compute: 5 * 60 * 60'

timeout = 'min: 3'

Which avoids having to add any syntax. I'm not so sure if the k (t) = v syntax really has much advantage over this, other than looking a bit nicer.

@jeff-hykin

jeff-hykin commented May 19, 2022

Personally I think the key (tag) = value syntax might be a good thing to add, but nothing more. This is:

1. Still fairly obvious.

2. Not too complex.

3. Easy to implement.

4. Provides enough flexibility for most cases.

I agree with all this.

On the other hand, all of this will be implementation-defined, and an implementation can already do the same with just regular strings:

img = '''base64,png: [..]'''

key = 'compute: 5 * 60 * 60'

timeout = 'min: 3'

But this creates a big string escaping problem. What if we want the value to literally be the string 'compute: 5 * 60 * 60' we then have to define an escaping mechanism like 'string: "compute: 5 * 60 * 60"' and now basically all strings that contain colons need to use that syntax: which can be a painfully sharp non-standard edge case.

That problem^ is the main reason I'm advocating for tags. Because without tags there are hard-coded assumptions and painful edge cases, on top of a lack of standards and custom/manual parsing.
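The escaping problem with the 'tag: payload' string convention can be shown with a small Python sketch. The `interpret` function is a hypothetical application-side helper, not something any TOML parser provides.

```python
def interpret(value):
    """Hypothetical application-side convention: 'tag: payload'."""
    tag, sep, payload = value.partition(": ")
    if sep and tag.replace(",", "").isalnum():
        return tag, payload
    return None, value


# Intended use of the convention.
interpret("min: 3")                      # ('min', '3')
# A plain string that merely contains ": " is misread as tagged.
interpret("Note: restart after update")  # ('Note', 'restart after update')
```

The second call is the edge case described above: nothing in the string itself says whether the prefix is a tag or ordinary text, so the application has to invent its own escaping rule.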

@arp242
Contributor

arp242 commented May 19, 2022

But this creates a big string escaping problem. What if we want the value to literally be the string 'compute: 5 * 60 * 60' we then have to define an escaping mechanism like 'string: "compute: 5 * 60 * 60"' and now basically all strings that contain colons need to use that syntax: which can be a painfully sharp non-standard edge case.

Ah yeah, that's a good point; and every application that wants something like this will have to figure out escaping as well.

@tintin10q

tintin10q commented Jan 6, 2023

Nearly all security problems come from user input.

I don't think the ability for the TOML parser to parse further according to tags should be in the specification.

In theory the syntax is obvious and minimal but in real life it won't be. In my opinion, after the parser has parsed the TOML file it should simply return primitive values native to most programming languages to the application. The application can then decide how to interpret and use these values. The application will know what it needs to do. If a TOML parser implementation is allowed to automatically parse input further based on tags, you are taking control away from the main application and tie/lock it to the implementation of the parser. This actually makes everything infinitely more complex and brings potential security vulnerabilities.

With tags, you could trick parsers into parsing complicated things, and every parser's implementation will support different tags. For example, with a #png tag you could force a parser to start parsing a png image, or html with #html, and these are not trivial at all to parse safely. To parse these languages, the TOML parser implementations will likely use other people's parsers. This means that the TOML parser function could actually call many different parsers for different languages, even if the application doesn't expect a #png value but it was supplied anyway. All these other parsers for other languages have to be trusted.

What if the parser implementation decides that if you put a #png tag, it has to read the value as a filename and then read a file from disk? Someone could put in the name of a password file. You could say parser implementers won't do this. But they could if the tags exist, because the spec allows for it. The implementers and users of TOML have to be protected from this. TOML doesn't know enough about where it is used to make this decision, but that doesn't mean people won't attempt to make these decisions.

Tags will have different behaviours between different programming languages and different implementations. A #base64 tag might produce bytes in Python, but in JavaScript it might produce a byte array. One is mutable, one is not. This means you get different behaviour and outcomes for the same tags, and for this to be secure in your application you have to look at the documentation of the parser for what could happen. You really should only have to look at the TOML spec to know what your TOML file will do.

This proposal opens up the ability for the TOML parser to arbitrary call other parsers (or any code) based on what the TOML parser you're using implemented. That is a terrible and dangerous idea.

What all these points have in common is undefined behaviour. Adding tags by definition adds undefined behaviour to the spec because the implementations can do whatever they want when they encounter a tag. Adding undefined behaviour is a really, really bad idea.

@eksortso
Contributor

@tintin10q At the heart of your criticisms is your sentiment, which I find myself agreeing with wholeheartedly:

You really should only have to look at the TOML spec to know what your TOML file will do.

This is appealing because it enforces the principles of obviousness and minimalism. (This actually states your case more strongly than bringing up "undefined behavior," which is arguably bad practice in programming language specs, but TOML is not a programming language. But I digress.)

In our discussions, we talked about arbitrary things that tags could do, which certainly falls outside the scope of TOML and which I must admit I was speculating about at length without security concerns. So I am changing my tone, but I have a different approach now, which I will elaborate on below. But first, let me see if I understand your point of view.

Allowing the possibility for arbitrary behavior in the specification could make it seem like we encourage abuses of the syntax. We certainly don't. However, violations are already possible without changes to the spec, because some parsers read and preserve comments, and some consumers may read those comments and make changes to their configurations. Currently, we do not make explicit that comments ought to be ignored by parsers, because format-preserving parsers and TOML document encoders need that room to maneuver.

I'll be opening a PR which is intended to curb this abuse. But I'll also open a separate issue around the syntax that's been discussed here, because the idea of parenthetical comments may be worth considering. Neither of these things is intended to address notions of type syntax (which in TOML is completely determined by existing value syntaxes), but I will still refer to this issue when I make them since these proposals stem from the exploration conducted here.

@tintin10q

TOML is not a programming language

I agree, TOML is not a programming language and it should not be. It is an input language. However, the parsers are written in programming languages. I believe that if arbitrary tags were added, TOML could have become a programming language because with arbitrary behavior it was essentially undefined what a parser should do when encountering a tag which means it was up to the parser to decide and the parser could decide to do anything. But perhaps this is not the same definition as undefined as on the Wikipedia I linked.

Allowing the possibility for arbitrary behavior in the specification could make it seem like we encourage abuses of the syntax.

You understand my view although I don't think encourage is the right word. Even if you would explicitly discourage abuse in the spec, the ability to do so would be there and that will go wrong at some point with people wanting to do 'clever things' with their parsers and then we get to You really should only have to look at the TOML spec to know what your TOML file will do..

About the comments violations. I agree that somehow preserving the comments is nice. Otherwise they would all be removed from the file when you would read and write back a TOML file. However, this is less of an issue than the arbitrary tags because comments do not have to be parsed any further as they are just strings and should stay strings but clearly defining how parsers should deal with comments further is of course a good idea.

and some consumers may read those comments and make changes to their configurations

With consumers do you mean a parser implementation or an application? I think that if you mean an application then this is not that bad. Although I wouldn't say that it is a good idea, it is still the application making the choices, not the parser. If you do mean the parser, then I would say that parser is just not compliant with the TOML spec and is being too clever.

With something like a #html tag or #base64 tag or #url tag, the parser will probably call other people's parsers to do this. This increases dependencies and the attack surface, as people could also supply tags that the application doesn't actually expect, which could be a security issue. Unless specified in the TOML spec, it should be up to the application how to parse things further, and not up to the parser implementation.

@eksortso
Contributor

@tintin10q You said:

You understand my view although I don't think encourage is the right word. Even if you would explicitly discourage abuse in the spec, the ability to do so would be there and that will go wrong at some point with people wanting to do 'clever things' with their parsers and then we get to You really should only have to look at the TOML spec to know what your TOML file will do..

That ability for abuse is still there, but any such abuse would make the abusing parser non-conformant. That's the most that we can do, really. If such abuse persists, then we could either adopt their changes into the standard or refuse to condone them, making appropriate modifications in either case.

We can make it more difficult for "clever" solutions to take root. If #950 gets merged, for instance, then parsers cannot mess with configurations by looking for and reading comments. So any "clever" solution would have to rely on non-standard syntax (like type tags) or unusual naming conventions or some such voodoo to do clever things, for better or worse.

and some consumers may read those comments and make changes to their configurations

With consumers do you mean a parser implementation or an application?

I meant post-parsing end-user applications when I said "consumers."

Let's stop repeating ourselves. I don't know what will happen with the tag discussions posed here. I had an idea which may be more confusing than it needs to be, but it may serve an important purpose. What if we took the parenthetical syntax, the words in round brackets like (min), and just treated them like inline comments? A line like timeout (min) = 3 would just assign 3 to timeout, but the user would be informed that the number assigned refers to a quantity in minutes.

"Clever" users might be tempted to write timeout (seconds) = 30, then complain when their requests take ten times longer to bail. So the inline-comment idea may be lousy. I think they'd have their purpose (and prevent overly long key names like timeout-minutes). But I think the idea may be worth exploring under a new suggestion issue.

@tintin10q

tintin10q commented Jan 19, 2023

I don't think timeout-minutes is overly long. I think timeout-minutes=3 makes more sense than the alternative that you propose, timeout (min) = 3. The timeout-minutes key is only 4 characters longer, and it is one less concept people have to learn. Also, if you actually put the same unit name in the comment, timeout (minutes) = 3 is longer than just having the unit in the key name. I think it is good practice to put the unit of something in the name whenever possible. This way the application will also know better what the unit is.
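A small Python sketch of that division of labour: the parser hands over a plain number and the application attaches the unit implied by the key name. The key timeout-minutes is taken from the text above; `load_timeout` and its default value are hypothetical.

```python
from datetime import timedelta


def load_timeout(config, default_minutes=5):
    """Read `timeout-minutes` from an already-parsed config table."""
    minutes = config.get("timeout-minutes", default_minutes)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(minutes, bool) or not isinstance(minutes, (int, float)):
        raise TypeError("timeout-minutes must be a number")
    if minutes < 0:
        raise ValueError("timeout-minutes must not be negative")
    return timedelta(minutes=minutes)


timeout = load_timeout({"timeout-minutes": 3})
```

No tag or parser extension is involved; the unit lives in the key name and the interpretation lives in the application.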

That's the most that we can do, really. If such abuse persists, then we could either adopt their changes into the standard or refuse to condone them, making appropriate modifications in either case.

I think the best option is just to ignore non-conformant parsers and potentially learn from the ideas they came up with.

Inline comments by themselves might be a good idea. But I would not use another syntax for it with the (). First, there is already a comment system, so why not just extend that. Second, () are very often used for other things, so I wouldn't use them for comments.

A better way to do inline comments is to just say that comments end when you encounter another #

So like this:

timeout # seconds # = 60
timeout #minutes# = 1

Although this does make parsing harder because now you have to keep track of when you are in a comment. I also think that timeout = 30 # seconds is equally as clear and doesn't require an extension to the language. But timeout-seconds = 30 is still better.

@eksortso
Contributor

Bracketing comments between hash signs is a non-starter because it will break any comment with a # inside it. This is why I proposed a different syntax, and I already acknowledged problems that could arise with that syntax.

I was trying to use a simple example to explain how a template writer could put units as comments after key names. There are more complicated key names than timeout-minutes after all, and my modest proposal (which I've decided not to make a PR for) doesn't prevent users from sticking unit names as suffixes onto key names.
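The breakage with hash-delimited comments mentioned above can be illustrated with a deliberately naive Python sketch. `strip_hash_pairs` is a hypothetical helper that applies the "# ... #" rule to one line; it ignores # inside strings.

```python
def strip_hash_pairs(line):
    """Drop text between pairs of '#', as the '# ... #' proposal implies.

    Naive on purpose: it ignores '#' inside strings.
    """
    parts = line.split("#")
    # Even-indexed parts are outside a '#...#' pair, odd-indexed inside.
    return "".join(parts[0::2])


# Works for the proposed inline form.
strip_hash_pairs("timeout #minutes# = 1")
# An existing end-of-line comment that mentions "#603" leaks text back in.
strip_hash_pairs("timeout = 30 # see issue #603 for details")
```

In the second call the words after "#603" come back as if they were part of the key/value pair, whereas under today's rules everything after the first # on the line is a comment.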

@LongTengDao
Contributor Author

I have a meta question.

size (K) = 1 (M)

What should this get?

  1. 1_000_000_000
  2. 1_000

@jeff-hykin

jeff-hykin commented Jan 22, 2023

I have a meta question.

size (K) = 1 (M)

What should this get?

If this is asking for the output of the toml parser, even assuming tags were implemented, I would expect/hope that the output structure is still { "size": 1 }.

The point, or what I believe makes tags useful, is precisely that they don't change the structure. A number, that happens to be a unit of time, is still structurally a number (not a table, or a list) so if we want to keep the structure, but add the additional info of "minutes", that's where tags become relevant.

As is true for most current yaml parsers of docs with tags, the program still receives the plain/normal structure by default. For compatibility across toml parsers, it wouldn't make sense for toml to interpret the tags and manipulate the structure. If the program wants non-structural information, whether it's tags or comments (for round-trip), it would make sense for that info to be separate.

E.g.

doc = toml.parseDocument("thing.toml")
doc.data # { "size": 1 }
doc.tagForValue([ "size" ]) # "M"
doc.tagForKey(["size"]) # "K"

Without tags, two programs must "just know" timeout is in seconds. Tags don't change the fundamental need of interpretation, both programs still need to "just know" (e.g. coordinate) that "ms" means milliseconds and not microseconds. But, on top of being human-visible, the difference is that it's easier for two programs to coordinate on what a "ms" tag means compared to coordinating on the interpretation of every single timeout, delay, offset, start time, end time, etc.

What should this get?

So, if this is asking for the program output (instead of toml parser output), it's like asking what units the program should get for { timeout = 300 }.

It just doesn't matter, the program could interpret the 300 as an enum value, or as 300 degrees kelvin, or the timeout value could be entirely ignored. Same for the (M) and the (K).
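The side-channel shape described above (plain data by default, tags queryable separately) could look roughly like this in Python. `TaggedDocument` and its methods are hypothetical and mirror the parseDocument example only loosely; the instance is built by hand because no TOML parser accepts this tag syntax.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedDocument:
    """Plain data plus tags kept in separate, path-keyed lookups."""
    data: dict
    key_tags: dict = field(default_factory=dict)
    value_tags: dict = field(default_factory=dict)

    def tag_for_key(self, path):
        return self.key_tags.get(tuple(path))

    def tag_for_value(self, path):
        return self.value_tags.get(tuple(path))


# Hand-built result for `size (K) = 1 (M)`; a real parser would fill this in.
doc = TaggedDocument(
    data={"size": 1},
    key_tags={("size",): "K"},
    value_tags={("size",): "M"},
)
```

A program that ignores tags reads only `doc.data` and sees the same structure it would get without them.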

@jeff-hykin

jeff-hykin commented Jan 22, 2023

I think the real question is do the toml maintainers want to allow non-structural information?

If yes, then a human-readable syntax can be debated (and probably solved), and a write-with-tag method can be devised.

If no, then this issue should just be closed.

@tintin10q

tintin10q commented Jan 23, 2023 via email

@eksortso
Contributor

In my opinion allowing non-structural information is not a good idea and the issue should be closed.

@tintin10q I don't entirely agree with your take on non-structural information; my reasons would take too long to explain succinctly here.

But with all due respect to @LongTengDao who opened this suggestion, we need to start fresh. Let's close this issue, and any of the various topics that we discussed here, if they're worth reintroducing, can be given better focus with new issues.

@pradyunsg
Member

I think the real question is do the toml maintainers want to allow non-structural information?

Based on reviewing the discussion here, I don't think tag-style rich information is a good idea. Quoting from the objectives of the language:

TOML is designed to map unambiguously to a hash table. TOML should be easy to parse into data structures in a wide variety of languages.

Neither of these is feasible with tag information. You need to either (a) modify the serialised data or (b) provide tag-like information via a side-channel. Both of those are no-gos from my perspective.


size (K) = 1 (M)

What should this get?

An error? I think any behaviour other than an error here is going to be non-trivial to explain.


Let's close this issue, and any of the various topics that we discussed here, if they're worth reintroducing, can be given better focus with new issues.

I agree. If someone wants to pick out a specific piece from the discussions here, please open a new issue for that with a specific proposal for what you want to change (or at least specific usecases to focus on) so that we can have a less meandering discussion. :)


As always, thanks for a productive discussion here folks! Even though the conclusion here seems to be "no action, and more discussion", a lot of what has been discussed here is quite useful. :)
