Revisit array of table syntax #309

Closed
maxhaz opened this issue Mar 3, 2015 · 77 comments
Comments

@maxhaz

maxhaz commented Mar 3, 2015

I found the "array of tables" type not especially easy to grasp in TOML, because the syntax for an array of tables differs from the syntax for an array of, say, integers. That is not the case in JSON. So while I find TOML clearer than JSON for tables, I find JSON easier to understand for arrays of tables.

In the end, I was wondering whether the "array of tables" type is absolutely necessary in a config file. My point is that how the data is stored (table of tables vs. array of tables) may be of little interest to an end user who just wants to modify some parameters.

I understand that, compared to a "table of tables", an "array of tables" has:

  • ordered elements
  • unnamed elements.

In actual TOML usage, is there a situation where an "array of tables" is much more practical than a "table of tables"?

To be a little more specific, here is a comparison of the two syntaxes (not exactly equivalent, of course):
Array of table:

[[products]]
    name = "Hammer"
    sku = 738594937

[[products]]

[[products]]
    name = "Nail"
    sku = 284758393

Table of table:

[products.hammer]
    name = "Hammer"
    sku = 738594937

[products.empty]

[products.Nail]
    name = "Nail"
    sku = 284758393
    color = "gray"

IMO, the advantages of using only tables of tables are:

  • a unique syntax (less confusing)
  • each table has a name (less need of referring to the manual)

My question is certainly not whether the array of tables should be removed from the spec, but whether a best practice could encourage tables of tables over arrays of tables.

@skystrife
Contributor

Arrays of tables are most useful when you don't know ahead of time how many things will be present in the array. If you know you're only ever going to have three things, then I absolutely agree with you: you should just represent them as individual tables.

But that's not always the case. Here's a good, real-world use case: I want to allow my users to define a pipeline in their configuration file. I don't know ahead of time how many things could be in the pipeline (it could be 1, 2, 3, ...), but I do know that each thing in the pipeline might have an arbitrarily complex initialization process (so I would like to have a table for each element in the pipeline so as to be able to flexibly specify each element's parameters). Using an array of tables is the most natural thing here, giving you things like

[[analyzers.filter]]
type = "icu-tokenizer"

[[analyzers.filter]]
type = "lowercase"

[[analyzers.filter]]
type = "length"
min = 2
max = 35

This isn't possible with tables of tables (you very well may lose the ordering of the filters, which is really important in a pipeline, depending on your parser's internal storage implementation). I don't think there's a generic guideline against arrays of tables, other than to think about your data types and make your configuration match. Here, what I'm asking people to configure is indeed an ordered list of things, so it makes sense to represent that as an ordered list inside my configuration file.

@Zero3

Zero3 commented Jun 22, 2015

+1 to what @skystrife said. I use TOML for a similar purpose.

However, I also agree with @maxhaz about the table array confusion. I've been playing around with TOML for a while now, and I still find the syntax with the double brackets oddly annoying. I don't have a better proposition right now though (besides merging the concepts of tables and table arrays, but that might prove difficult or require significant tradeoffs).

@maxhaz
Author

maxhaz commented Jun 23, 2015

Thank you for both answers. This usage is quite convenient indeed, I agree.
I now see an array of tables as a way to add an instance of an object (in @skystrife's example, an instance of a filter). Unless I am mistaken, a similar structure is widely used in XML config files.

Then, the available keys in an instance (e.g. type, min, max) could be defined in a doc or a schema (and self-documented by example in the toml file).

@avakar

avakar commented Jun 23, 2015

@skystrife, to play the devil's advocate, I can easily rewrite your file without losing any information while avoiding arrays of tables.

[analyzers.filter.1]
type = "icu-tokenizer"

[analyzers.filter.2]
type = "lowercase"

[analyzers.filter.3]
type = "length"
min = 2
max = 35

Although this requires the user to explicitly number the tables, it also makes it possible to add properties to tables later (which you always could do with all non-array tables) and, if a smart sort was used, to insert tables into the middle of the array (e.g. if you sorted "1_1" in between "1" and "2" or alternatively you could number them in the good old Basic style, "10", "20" and then insert "15").

It's of course not surprising that you can simulate an array with a table (and vice versa), it's just that specifically in TOML, tables can be manipulated more easily than arrays and with more flexibility. And if TOML is to be a minimal format and if json->toml->json need not round-trip (which I think it already doesn't due to null), then I think @maxhaz has a point.

@jodastephen

Arrays of tables look horrible, and would be the main thing pushing me away from using TOML. I think the concept is fine, but the syntax is poor.

Alternative 1:

[analyzers.filter]
  [#]
  type = "icu-tokenizer"

  [#]
  type = "lowercase"

  [#]
  type = "length"
  min = 2
  max = 35

Alternative 2:

[analyzers.filter]
  [#1]
  type = "icu-tokenizer"

  [#2]
  type = "lowercase"

  [#3]
  type = "length"
  min = 2
  max = 35

Alternative 3:

[analyzers.filter]#
type = "icu-tokenizer"

[analyzers.filter]#
type = "lowercase"

[analyzers.filter]#
type = "length"
min = 2
max = 35

(edited to add indentation, which would be optional)

@FranklinYu

FranklinYu commented Apr 2, 2016

@jodastephen Alternative 3 is "comment at the end of a line"; the second is as good, but I prefer the first one because I can add something in between without incrementing every tag after it.

@dkrikun

dkrikun commented Jun 14, 2016

+1 @jodastephen, the syntax for array of tables is indeed counter-intuitive

@yasammez

I like variant 1 best too. Also the possibility mentioned by @FranklinYu to have multi-dimensional arrays of tables, which I will shamelessly copy and paste here:

[nested_array_table]
  [#]
    [##]
    value = 1
    [##]
    value = 0

  [#]
    [##]
    value = 0
    [##]
    value = 1
    comment = "bottom right diagonal element"

@dermariusz

dermariusz commented Jun 26, 2016

Alternative 3 also enables multi-dimensional arrays, though I think alternative 1 is better.

[nested_array_table#]
    [nested_array_table##]
    value = 1
    [nested_array_table##]
    value = 0

[nested_array_table#]
    [nested_array_table##]
    value = 0
    [nested_array_table##]
    value = 1
    comment = "bottom right diagonal element"

Edit: Fixed indentation

@FranklinYu

@MDickie Hmm, you mean that the second [nested_array_table#] should be indented one step further than the first one?

@dermariusz

dermariusz commented Jun 26, 2016

Oh, sorry, I fixed that. It should be indentation-agnostic, so the unfixed version should also work.

@FranklinYu

It's interesting that GitHub currently renders it correctly, since nested_array_table# should not yet be a valid bare table name. For quoted table name in @MDickie's alternative, I guess we can do

[dog."tater.man"#]
    [dog."tater.man"##]
    value = 1
    [dog."tater.man"##]
    value = 0

[dog."tater.man"#]
    [dog."tater.man"##]
    value = 0
    [dog."tater.man"##]
    value = 1
    comment = "bottom right diagonal element"

@BurntSushi
Member

We're on the cusp of 1.0. Arrays of table syntax isn't changing.

@yasammez

So no multidimensional arrays of tables then? This would mean, everything which starts in JSON with

[[{

will still not be representable in TOML, which is kind of a pity.

@dejlek

dejlek commented Jun 27, 2016

I propose you avoid usage of # for this... It is only going to make parsing complicated.

How about you use ? (or *)? Example:

[analyzers.filter.?]
type = "icu-tokenizer"

[analyzers.filter.?]
type = "lowercase"

[analyzers.filter.?]
type = "length"
min = 2
max = 35

@dermariusz

dermariusz commented Jun 27, 2016

A star would be reasonable because Markdown already uses them in lists.

@FranklinYu

FranklinYu commented Jun 27, 2016

@dejlek I guess you mean that the parser needs to distinguish # in an array from # indicating the beginning of a comment? Then I prefer * over ? for the same reason mentioned by @MDickie.

@mojombo
Member

mojombo commented Jan 5, 2017

I'm also not madly in love with the current syntax for complicated scenarios, but it does an admirable job for simple ones. TOML 1.0 is imminent, so things aren't going to change at this point, but we can definitely talk about some changes in this area when it's time to think about 2.0.

@mojombo mojombo closed this as completed Jan 5, 2017
@Zero3

Zero3 commented Jan 10, 2017

@mojombo I have full respect for your decision about this. I think it is a shame that you are closing this issue though, since it has not been solved and you are hiding/losing the useful information posted by the commenters above.

@mojombo
Member

mojombo commented Jan 10, 2017

That's a fair point. I'll reopen and label appropriately.

@mojombo mojombo reopened this Jan 10, 2017
@silasdavis

silasdavis commented Jan 28, 2017

The situation is particularly bad with recursive data structures. Take the following recursive go struct:

SinkConfig struct {
	Transform *TransformConfig
	Sinks     []*SinkConfig
	Output    *OutputConfig
}

Here's a TOML representation of a value in this recursive schema:

[Transform]
  TransformType = ""

[[Sinks]]
  [Sinks.Transform]
    TransformType = ""

  [[Sinks.Sinks]]
    [Sinks.Sinks.Transform]
      TransformType = ""

    [[Sinks.Sinks.Sinks]]
      [Sinks.Sinks.Sinks.Transform]
        TransformType = "Prune"
      [Sinks.Sinks.Sinks.Output]
        OutputType = "Stdout"
    [Sinks.Sinks.Output]
      OutputType = "Stderr"

Beautiful. Here the repetition of the array field name and its ancestors really hurts readability. YAML does slightly better:

sinks:
- transform:
    transformtype: ""
  sinks:
  - transform:
      transformtype: ""
    sinks:
    - transform:
        transformtype: Prune
      output:
        outputtype: Stdout
    output:
      outputtype: Stderr

I understand it is a design aim of TOML to include the full path of keys to a table value, but for an array of tables the same path may appear not only at every element of the same array but also at different locations in the file, in different structures that share the same route. I think either it needs to include a specific index, which is verbose and annoying when editing a file, or we have to lose the context when we enter an array of tables, so that the table naming looks like we started a new root, as if we were in a new TOML file.

This would look something like this (although note these are all 1-element table arrays):

[Transform]
  TransformType = ""

Sinks = [
  [Transform]
    TransformType = ""

  Sinks = [
    [Transform]
      TransformType = ""

    Sinks = [
      [Transform]
        TransformType = "Prune"
      [Output]
        OutputType = "Stdout"
    ]
    [Output]
      OutputType = "Stderr"
  ]
]

There could be a different/better syntax. But I think accepting that elements of a table array are anonymous is a way out of this ugliness for certain cases. Or at least to allow a context-free syntax...

@skystrife
Contributor

Using inline tables almost gets you there:

Sinks = {Transform = {TransformType = ""}, Sinks = [
  {Transform = {TransformType = ""}, Sinks = [
    {Transform = {TransformType = ""}, Sinks = [
      {Transform = {TransformType = "Prune"}, Output = {OutputType = "Stdout"}}
    ], Output = {OutputType = "Stderr"}}
  ]}
]}

but I agree this is overly ugly and is a sort of hacky workaround for the "inline tables must have no newlines" rule. If you relax that and allow multi-line inline tables, you can get the following:

Sinks = {
  Transform = {TransformType = ""},
  Sinks = [{
    Transform = {TransformType = ""},
    Sinks = [{
      Transform = {TransformType = ""},
      Sinks = [{
        Transform = {TransformType = "Prune"},
        Output = {OutputType = "Stdout"}
      }],
      Output = {OutputType = "Stderr"}
    }]
  }]
}

which I think, while still ugly, is at least serviceable.

@mbyio

mbyio commented Jan 29, 2017

So I don't really understand why people are strongly against the array of tables syntax, or why they would prefer to use # symbols. To me, it's simple, easy to read, and easy to write.

While it is unfortunate that it might require some explanation before people know what the double bracket syntax means when reading a config file, reading and understanding the whole TOML spec still only takes 5-10 minutes, which IMO is good enough that it doesn't really need to be immediately understandable. Especially since it's a relatively niche use case which most people can just ignore anyway.

@silasdavis

silasdavis commented Jan 29, 2017

@skystrife you're quite right that does get enough of the way there for me, particularly with relaxed newlines.

@michael-younkin the syntax is not the issue. The issue is that as soon as the key.subkey.subsubkey identifiers become ambiguous as in nested arrays then they lose their value and obfuscate rather than clarify where we are in the structure. I'm not suggesting '#' signs would be any better. And I also don't think that it's a niche use case when TOML is thought of as a JSON/YAML substitute and recursive data structures are frequently used.

@BurntSushi
Member

TOML is thought of as a JSON/YAML substitute and recursive data structures are frequently used.

It's not. TOML is a configuration file format. Sometimes, JSON or YAML are used for configuration files, so there are overlapping use cases. TOML is not a general purpose replacement for JSON or YAML.

@ChristianSi
Contributor

I like @eksortso 's proposal, even if it was based on a misunderstanding. I think that enclosing just those dotted name parts that are actually tables in an additional pair of brackets is easy to grasp and read, and fairly easy to write.

@tw1nk

tw1nk commented Feb 15, 2020

I might be completely off here, but instead of repetition why not have the separator be first characters? At least for me it makes it a bit easier to read, even if I have to backtrack

[analyzers]
    [[.filter]]
        type = "icu-tokenizer"
    [[.filter]]
        type = "lowercase"
    [[.filter]]
        type = "length"
        min = 2
        max = 35

[Transform]
    TransformType = ""
    
[[Sinks]]
    [.Transform]
        TransformType = ""
        
    [[.Sinks]]
        [.Output]
            OutputType = "Stderr"
        [.Transform]
            TransformType = ""
        [[.Sinks]]
            [.Transform]
                TransformType = "Prune"
            [.Output]
                OutputType = "Stdout"              


[[fruit]]  # 1st fruit element
  name = "apple"

  [.physical]  # physical subtable, in 1st fruit element
    color = "red"
    shape = "round"

  [[.variety]]  # 1st variety element, in 1st fruit
    name = "red delicious"

  [[.variety]]  # 2nd variety in 1st fruit
    name = "granny smith"

[[fruit]]  # 2nd fruit
  name = "banana"

  [[.variety]]  # 1st variety in 2nd fruit
    name = "plantain"

[foo]
    [[.bar]]
    a = 1
    [[.bar]]
    b = 2

@tw1nk

tw1nk commented Feb 15, 2020

oh... i see the problem now... :(

it's impossible to know if the variety is a property of fruit or fruit.physical :(

@eksortso
Contributor

@tw1nk That is true. Folks have suggested these sorts of nested dot notations before, but each variant raises this sort of confusion.

But the repetition of keys serves an actual purpose. Over time I've come to accept the idea that, even though deep nesting in TOML is possible, the syntax encourages flattening complex data structures. A relatively flat, hand-written configuration structure makes sense. Deeply nested data types are another story though. So the amount of name repetition is acceptable for practical concerns, even for simple data exchange.

@nixpulvis

nixpulvis commented Mar 16, 2020

Let me see if I'm on the right track here.

For reasons unrelated to this issue, I've decided to set package.autoexamples to false in my Cargo.toml. In Rust's package manager, this means that I now must create an array of tables for each example. So I currently have the following:

[[example]]
name = "gcd"

[[example]]
name = "merge_sort"

[[example]]
name = "quick_sort"

# ...

However, this is quite ugly, as others here have noted. I also understand that TOML has both arrays, and inline tables, which immediately made me think I could implement this in a more natural way already, with something like:

example = [
    { name = "gcd" },
    { name = "merge_sort" },
    { name = "quick_sort" },
    # ...
]

However, there are seemingly two issues, one preventing me from doing this at all, and one minor naming thing:

  1. This doesn't work because I can't seem to assign to the top level (is this a correct understanding of the situation?)
  2. The plurality of the name is now wrong (although depending on convention, this may be a Cargo issue)

I think I'd be perfectly happy with TOML's arrays of tables if I could just use arrays of inline tables like this at the top level. Thoughts?

@marzer
Contributor

marzer commented Mar 16, 2020

@nixpulvis The TOML given in your examples should be valid, and should result in the same data structures. There's only going to be issues assigning if you mix methods, e.g.:

example = [
    { name = "merge_sort" },
    { name = "quick_sort" },
]

[[example]] # boom
name = "gcd"

Mind you, by "your examples should be valid", I mean "should" as per the spec. Some parsers treat inline tables vs. regular tables differently, same for [[array of tables]] vs [{table}, {table}]. Ideally they shouldn't be treated differently (because that runs counter to TOML's unambiguous design), but YMMV. I've no idea how cargo handles these things but it might just be a Quality-Of-Implementation thing.

@nixpulvis

nixpulvis commented Mar 16, 2020

@marzer interesting, I didn't try putting the example literally at the top level. 🤣 When I put it before everything else it works!

The issue is now that when I do example = [...] anywhere after a [...] (as I very much would like to do) it treats it as an entry into that table.

I do not have a solution offhand, but at least in Rust this is very close to what I want.

@marzer
Contributor

marzer commented Mar 16, 2020

Ah, well then what you're experiencing is correct TOML behaviour. [tables] and [[array tables]] effectively delimit sections of the document; whatever appears underneath them is a part of them, except another [table] or [[array table]], which starts a new section. A key = value pair will always be a member of whatever the current 'section' is, so the only way to have stuff be a member of the top level of the document (the 'root' table) is to simply list it before any other table headers.

So really the only solution is to re-structure your document.

@nixpulvis

@marzer global state bites again ☹️

I generally really like TOML, however this is unfortunate. I think I'd personally solve this with commas and semicolons. For example:

[package]
name = "foo"
version = "0.0.1"

would become:

[package]
name = "foo",
version = "0.0.1";

Although, a bad parser may make this confusing to people, I can imagine.

It just really sucks that I'm forced to move my array to the very top of my document, just because I want to change the format I write it in. This is counterintuitive, and forces a poor configuration structure upon me.

@SoniEx2

SoniEx2 commented Mar 16, 2020

I'm half tempted to suggest [] as the "root":

[package]
name = "foo"

[]
example = [
...
]

[thing]
etc

but I think this is invalid toml:

[package]
name = "foo"

[other]
thing = "bar"

[package]
version = "0.0.0"

edit: another option would be to allow all bare keys to go at the end of the TOML, using a separator similar to markdown's hr:

[stuff]
[things]
---
extra = {}

but disallow headers there. and if this is used, you can't have bare keys at the start.

@marzer
Contributor

marzer commented Mar 16, 2020

What? It's not counter-intuitive at all. Things belong to whatever header they appear under, which is how headings generally work in just about any type of document ever.

It's true that if you're going for a more JSON-like representation then it's a bit awkward in TOML, but that's because TOML is meant to be 'flat'. If you fight against that it will get complex, but that's true of all formats- trying to make them something they're not meant to be is asking for trouble. If you think of TOML more like "INI but less shit" you will have an easier time with it.

@nixpulvis

Forcing some keys to be at the top (for stylistic reasons), is very counterintuitive to me. I mean, it makes sense when you think about the details of TOML, but it's not how one would expect a config format to behave in my opinion.

Perhaps a better word would be, counterproductive, or just gross.

@marzer
Contributor

marzer commented Mar 16, 2020

It's not arbitrarily "forcing some keys to be at the top", it's just the top-level keys go literally at the top-level of the document.

@mkerost

mkerost commented Oct 22, 2020

The current syntax is the only thing I find unintuitive about TOML. What about:

[products[]]
    name = "Hammer"
    sku = 738594937

[products[]]

[products[]]
    name = "Nail"
    sku = 284758393

This is similar to array initialization in many languages and would at least give people some hint of what is going on here. PHP looks to be the only language that uses this syntax for appending to arrays.

@cxw42

cxw42 commented Oct 22, 2020

I was thinking about this just the other day myself :) . Along similar lines, but with different syntax, what about a verb-noun structure in section headers? E.g.:

[next products]
name=foo
[next products]
name=bar

resulting in products = [{name => "foo"}, {name => "bar"}].

next ... would add an array element. We could catch errors this way, too. For example:

[product]
foo=bar   # now product is a table {foo=>'bar'}
[next product]
bat=baz   # now product is an array of tables [{foo=>'bar'}, {bat=>'baz'}]
# ... much later
[product]   # fatal error: trying to turn an array back into a single table
            # Issue an error message that says "Please use '[next product]' to add to the 'product' array"

This would also make room in the syntax for future expansion, by expanding the verb set.

(Apologies if someone already suggested this and I missed it in my review of the thread!)

@ChristianSi
Contributor

ChristianSi commented Oct 25, 2020

@mkerost: I don't see that as an improvement. The syntax would be very similar to the current one and it would be harder to remember than the simple rule: "just double the opening and closing bracket". Also, every [products[]] does not initialize an array, but adds a member to it. I don't think that [] is used for that purpose in any (reasonable) programming language – except for PHP, as you say, but

$cart[] = "foo";  // add "foo" to $cart

is strange and doesn't suggest itself as a good and intuitive model to follow. (Struck through in the original comment: "is terrible and certainly not a model to follow!!")

@ChristianSi
Contributor

ChristianSi commented Oct 25, 2020

@cxw42: Your proposal is appreciated, but I'd say it's bad for several reasons. First, it makes arrays of tables look like tables:

[product]  # This seems to be a table
foo="bar"

But later (maybe much later) in the same document:

[next product]  # But now it has been turned into an array. SURPRISE SURPRISE!!!
bat="baz"

Also, TOML is not a programming language and should not look like one. Hence no keywords, please!

Finally, keywords would tie TOML to one specific natural language (English), but it should be language-neutral.

@mkerost

mkerost commented Oct 25, 2020

@ChristianSi : You summed up the downside to this approach, but terrible is in the eye of the beholder. I find double brackets surrounding a key to be "terrible" because there is absolutely no intuition what it means. My proposal, to quote myself, "at least give people some hint of what is going on here".

The current table array syntax and any alternative table array syntax that doesn't use 0,1,2...n labeling is never going to be completely intuitive. The reason is that single-bracket table keys refer to a single thing and can only be defined once, while table array keys refer to multiple things and will be defined identically multiple times. Newcomers will be confused by this syntax and will need to go to the TOML reference guide to understand what is going on and why some table definitions can be defined once while others can be defined multiple times.

The difference here is that, for the current double bracket syntax, this is completely novel looking and there is nothing a programmer has to go off of in its relation to other programming languages to remember what it means. If anything, double brackets looks like a templating/substitution syntax and not related to arrays. So, I'll understand for a minute what it means, but it's likely that I'll come back a week later and have forgotten, because novel patterns are harder to put into long term memory.

With the syntax I proposed, most programmers will understand the syntax has something to do with arrays. You are right that they may be confused when they see this syntax used multiple times ("hey wait, you can only initialize something once..."). But like I said before, table syntax will never be completely intuitive. A person will always need to go to the TOML reference guide to be certain about what the syntax means. At least with my proposal, the syntax conjures association with arrays and offers a foothold into remembering what it means.

I am only offering my outsider thoughts here and don't mean to get in a back and forth. I've put as much as I want into my argument and am quite OK if you think it has major holes or there's just no way to get around the syntax feeling "terrible". If you feel this way, I don't think it is a good use of your time beyond just saying "nope, terrible".

@eksortso
Contributor

@mkerost Thanks for sharing your thoughts on this. I can certainly sympathize with any effort to make table arrays more approachable. I've never had to deal with arrays in PHP, so I'm not allergic to a postfix-[] syntax to introduce an array element, just so you know.

But the use of double brackets, in the context of the rest of TOML, does make sense, and users can differentiate between single and double brackets. So I can no longer recommend making an effort to refine the existing syntax when it already does what it should be doing.

One objection you have is having to go back to the reference. I don't believe that users would go back to the reference to remember what double brackets do if they've seen them before. But they could. There's no shame in looking things up if they're not familiar. Just now I went to the spec and found the first instance of [[. It took me straight to the Array of Tables section. That's the essence of Obviousness. In fact, it's our job to make that spec so clear that once you've looked something up, it sticks. I'll come back to that.

The problem is, when dealing with more complicated concepts, we can only make things so clear. A complex data structure, to a newcomer, would need to be revisited from time to time to be fully understood, no matter what. With repeat exposure and with repeated usage, that complex form becomes commonplace, and the pain goes away. But that pain won't go away any faster if we switched to a different syntax. The current syntax can do this job alright. And if we keep hashing out new syntax to use for this complex concept when there's already sharply defined syntax for it then, well, all we're doing is bikeshedding.

I could be wrong. But arrays of tables can be described to users in a way that they can understand what they do and how they work. Maybe that's where we could use some help. If you've got some ideas for describing table arrays more succinctly in the documentation, we'd love for you to share them with us. An alternate syntax won't help much, but an alternate description sure could.

@pradyunsg
Member

My understanding is that the foundational problem here is that folks are trying to use the array of tables syntax for deeply nested data structures.

This has been discussed in #781, which has also laid out a path forward for addressing this. I don't think we'd be changing the array-of-tables syntax now, but there are ongoing discussions on providing a better way to represent nested data structures.

@brandonchinn178

brandonchinn178 commented Jul 24, 2022

@pradyunsg I apologize in advance for commenting on a closed issue, but one thing I don't think I've seen mentioned in any of these issues yet is specifically arrays-of-table. To avoid commenting on two closed issues, this proposal involves both this issue and #781/#744. Just wanted to have this documented in the discussion; don't necessarily need any followup.

Alias existing table

[foo.bar.baz]
test = "asdf"
[[foo.bar.baz.servers]]
host = "google.com"
[[foo.bar.baz.servers]]
host = "github.com"
[[foo.bar.baz.servers]]
host = "apple.com"

With #516, it'd be a little better:

[foo.bar.baz]
test = "asdf"
servers = [
  {
    host = "google.com",
    ...
  },
  { host = "github.com" },
  { host = "apple.com" },
]

which is fine in this case, but I liked that the original version separated out each array element as an independent section, especially if the tables per array are much bigger. I like how the solution in #744 would keep these as separate sections

[foo.bar.baz]
test = "asdf"
[[*.servers]]
host = "google.com"
[[*.servers]]
host = "github.com"
[[*.servers]]
host = "apple.com"

[[a]] vs [a[]]

I think this discussion is important separate from inline-tables-solving-nested-structures; this discussion is also useful in situations that are not necessarily deeply nested. For example, I'm trying to encode an if statement like:

[[if]]
  condition = "..."
  [if.then]
  result = "..."
  [[if.elif]]
  condition = "..."
  result = "..."
  [[if.elif]]
  condition = "..."
  result = "..."
  [if.else]
  result = "..."

I like that this keeps the then/elif/else blocks separate, but it's confusing that it goes single/double/single brackets in [if.then]/[[if.elif]]/[if.else]. IMO it's also confusing to see [[if]] then [if.then]; it's not intuitive to me personally that the single-bracketed section is nested inside the double-bracketed section. In general, the property denoted by single or double brackets only applies to the last key in the path, but since it wraps the entire path, it makes it seem like the single/double property applies to the whole path

With multiline inline tables:

if = [
  "...",
  {
    condition = "...",
    then = { result = "..." },
    elif = { condition = "...", result = "..." },
    elif = { condition = "...", result = "..." },
    else = { result = "..." },
  },
]

it's a bit better, but if there are more keys than just result, it can get more crowded. I also liked the separate sections, which this eliminates.

With different syntax:

[if[]]
condition = "..."
[if[].then]
result = "..."
[if[].elif[]]
condition = "..."
result = "..."
[if[].elif[]]
condition = "..."
result = "..."
[if[].else]
result = "..."

to me, this syntax is vastly superior, as it's very clear which keys in the path are arrays.

It's also possible to make it backwards-compatible with:

  1. Making intermediate []s in the path optional (e.g. after bar in foo.bar[].baz)
  2. [[a.b.c]] being sugar for [a.b.c[]]

@3ynm

3ynm commented Oct 24, 2022

This would feel intuitive to me:

[[table]]
key = 1
other_key = 'a'

key = 2
other_key = 'a'

key = 3
other_key = 'a'

The good thing is that this syntax isn't a breaking change, but rather an abbreviated form of the same syntax :)

@eksortso
Contributor

eksortso commented Oct 25, 2022

@hacktivista But it is a breaking change. As soon as the key named key shows up a second time, an error for duplicated key definitions would be produced.

If it were adopted though, you'd still have problems. Blank lines outside of strings are ignored in TOML, so you'd be adding significance to how documents are spaced out. You couldn't put blank lines between key/value pairs intended for the same table.

In addition, this sort of syntax would make empty tables on the array impossible to define!

@3ynm

3ynm commented Oct 25, 2022

Duplicated key definitions are currently an error, thus it wouldn't break existing usage.

Indeed you couldn't put blank lines within a single array of tables. That is indeed a breaking change. Never thought of that :(

Regarding empty tables impossible to define, you still would be able, because old syntax would still be available. This is just a "shortcut".

@robertlagrant

to me, this syntax is vastly superior, as it's very clear which keys in the path are arrays.

To resolve this, what do you think of using an array of inline tables? So your syntax

if = [
  "...",
  {
    condition = "...",
    then = { result = "..." },
    elif = { condition = "...", result = "..." },
    elif = { condition = "...", result = "..." },
    else = { result = "..." },
  },
]

becomes

if = [
  "...",
  {
    condition = "...",
    then = { result = "..." },
    elif = [ { condition = "...", result = "..." },
             { condition = "...", result = "..." } ],
    else = { result = "..." },
  },
]

@brandonchinn178

I actually don't remember why I suggested two elif keys in the same table. I probably meant your second snippet. And it'd certainly be nicer than what we have now.

But my comment continued on to say that I would prefer the sections were still kept separate, as it would get crowded in one inline array/table, especially if the table had more keys than just "result".
