Revisit array of table syntax #309
Arrays of tables are most useful when you don't know ahead of time how many things will be present in the array. If you know you're only ever going to have three things, then I absolutely agree with you: you should just represent them as individual tables. But that's not always the case. Here's a good, real-world use case: I want to allow my users to define a pipeline in their configuration file. I don't know ahead of time how many things could be in the pipeline (it could be 1, 2, 3, ...), but I do know that each thing in the pipeline might have an arbitrarily complex initialization process (so I would like to have a table for each element in the pipeline so as to be able to flexibly specify each element's parameters). Using an array of tables is the most natural thing here, giving you things like
[[analyzers.filter]]
type = "icu-tokenizer"
[[analyzers.filter]]
type = "lowercase"
[[analyzers.filter]]
type = "length"
min = 2
max = 35
This isn't possible with tables of tables (you very well may lose the ordering of the filters, which is really important in a pipeline, depending on your parser's internal storage implementation). I don't think there's a generic guideline against arrays of tables, other than to think about your data types and make your configuration match. Here, what I'm asking people to configure is indeed an ordered list of things, so it makes sense to represent that as an ordered list inside my configuration file. |
+1 to what @skystrife said. I use TOML for a similar purpose. However, I also agree with @maxhaz about the table array confusion. I've been playing around with TOML for a while now, and I still find the double-bracket syntax oddly annoying. I don't have a better proposal right now, though (besides merging the concepts of tables and table arrays, but that might prove difficult or require significant tradeoffs). |
Thank you for both answers. This usage is quite convenient indeed, I agree. Then, the available keys in an instance (e.g. |
@skystrife, to play the devil's advocate, I can easily rewrite your file without losing any information while avoiding arrays of tables:
[analyzers.filter.1]
type = "icu-tokenizer"
[analyzers.filter.2]
type = "lowercase"
[analyzers.filter.3]
type = "length"
min = 2
max = 35
Although this requires the user to explicitly number the tables, it also makes it possible to add properties to tables later (which you could always do with any non-array table) and, if a smart sort were used, to insert tables into the middle of the array (e.g. if you sorted "1_1" between "1" and "2", or alternatively numbered them in good old BASIC style, "10", "20", and then inserted "15"). It's of course not surprising that you can simulate an array with a table (and vice versa); it's just that, specifically in TOML, tables can be manipulated more easily and more flexibly than arrays. And if TOML is to be a minimal format, and if json->toml->json need not round-trip (which I think it already doesn't, due to null), then I think @maxhaz has a point. |
Arrays of tables look horrible, and would be the main thing pushing me away from using TOML. I think the concept is fine, but the syntax is poor. Alternative 1:
Alternative 2:
Alternative 3:
(edited to add indentation, which would be optional) |
@jodastephen Alternative 3 is "comment at the end of a line"; the second is as good, but I prefer the first one because I can add something in between without incrementing every tag after it. |
+1 @jodastephen, the syntax for array of tables is indeed counter-intuitive |
I like variant 1 best too. Also the possibility mentioned by @FranklinYu to have multi-dimensional arrays of tables, which I will shamelessly copy and paste here:
[nested_array_table]
[#]
[##]
value = 1
[##]
value = 0
[#]
[##]
value = 0
[##]
value = 1
comment = "bottom right diagonal element" |
However, alternative 3 also enables multi-dimensional arrays, but I think alternative 1 is better.
[nested_array_table#]
[nested_array_table##]
value = 1
[nested_array_table##]
value = 0
[nested_array_table#]
[nested_array_table##]
value = 0
[nested_array_table##]
value = 1
comment = "bottom right diagonal element"
Edit: fixed indentation. |
@MDickie Hmm, you mean that the second |
Oh, sorry, I'll fix that. It should be agnostic of indentation, so the unfixed version should also work. |
It's interesting that GitHub currently renders it correctly, since
[dog."tater.man"#]
[dog."tater.man"##]
value = 1
[dog."tater.man"##]
value = 0
[dog."tater.man"#]
[dog."tater.man"##]
value = 0
[dog."tater.man"##]
value = 1
comment = "bottom right diagonal element" |
We're on the cusp of 1.0. Arrays of table syntax isn't changing. |
So no multidimensional arrays of tables, then? This would mean that everything which starts in JSON with [[{ will still not be representable in TOML, which is kind of a pity. |
I propose you avoid usage of How about you use
|
A star would be reasonable because Markdown already uses them in lists. |
I'm also not madly in love with the current syntax for complicated scenarios, but it does an admirable job for simple ones. TOML 1.0 is imminent, so things aren't going to change at this point, but we can definitely talk about some changes in this area when it's time to think about 2.0. |
@mojombo I have full respect for your decision about this. I think it is a shame that you are closing this issue though, since it has not been solved and you are hiding/losing the useful information posted by the commenters above. |
That's a fair point. I'll reopen and label appropriately. |
The situation is particularly bad with recursive data structures. Take the following recursive Go struct:
type SinkConfig struct {
	Transform *TransformConfig
	Sinks     []*SinkConfig
	Output    *OutputConfig
}
Here's a TOML representation of a value in this recursive schema:
[Transform]
TransformType = ""
[[Sinks]]
[Sinks.Transform]
TransformType = ""
[[Sinks.Sinks]]
[Sinks.Sinks.Transform]
TransformType = ""
[[Sinks.Sinks.Sinks]]
[Sinks.Sinks.Sinks.Transform]
TransformType = "Prune"
[Sinks.Sinks.Sinks.Output]
OutputType = "Stdout"
[Sinks.Sinks.Output]
OutputType = "Stderr"
Beautiful. Here the repetition of the array field name and its ancestors really hurts readability. YAML does slightly better:
sinks:
- transform:
    transformtype: ""
  sinks:
  - transform:
      transformtype: ""
    sinks:
    - transform:
        transformtype: Prune
      output:
        outputtype: Stdout
    output:
      outputtype: Stderr
I understand it is a design aim of TOML to include the full path of keys to a table value, but for arrays of tables the same path may appear not only at every element of the same array but also at different locations in the file, in different structures that share the same route. I think either it needs to include a specific index, which is verbose and annoying when editing the file, or we have to lose the context when we enter an array of tables, so that the table naming looks as if we started a new root, as if we were in a new TOML file. This would look something like this (although note these are all 1-element table arrays):
[Transform]
TransformType = ""
Sinks = [
    [Transform]
    TransformType = ""
    Sinks = [
        [Transform]
        TransformType = ""
        Sinks = [
            [Transform]
            TransformType = "Prune"
            [Output]
            OutputType = "Stdout"
        ]
        [Output]
        OutputType = "Stderr"
    ]
]
There could be a different/better syntax. But I think accepting that elements of a table array are anonymous is a way out of this ugliness for certain cases. Or at least to allow a context-free syntax... |
Using inline tables almost gets you there:
Sinks = {Transform = {TransformType = ""}, Sinks = [
{Transform = {TransformType = ""}, Sinks = [
{Transform = {TransformType = ""}, Sinks = [
{Transform = {TransformType = "Prune"}, Output = {OutputType = "Stdout"}}
], Output = {OutputType = "Stderr"}}
]}
]}
but I agree this is overly ugly and is a sort of hacky workaround for the "inline tables must have no newlines" rule. If you relax that and allow multi-line inline tables, you can get the following:
Sinks = {
Transform = {TransformType = ""},
Sinks = [{
Transform = {TransformType = ""},
Sinks = [{
Transform = {TransformType = ""},
Sinks = [{
Transform = {TransformType = "Prune"},
Output = {OutputType = "Stdout"}
}],
Output = {OutputType = "Stderr"}
}]
}]
}
which I think, while still ugly, is at least serviceable. |
So I don't really understand why people are strongly against the array of tables syntax, or why they would prefer to use # symbols. To me, it's simple, easy to read, and easy to write. While it is unfortunate that it might require some explanation before people know what the double bracket syntax means when reading a config file, reading and understanding the whole TOML spec still only takes 5-10 minutes, which IMO is good enough that it doesn't really need to be immediately understandable. Especially since it's a relatively niche use case which most people can just ignore anyway. |
@skystrife you're quite right that does get enough of the way there for me, particularly with relaxed newlines. @michael-younkin the syntax is not the issue. The issue is that as soon as the |
It's not. TOML is a configuration file format. Sometimes, JSON or YAML are used for configuration files, so there are overlapping use cases. TOML is not a general purpose replacement for JSON or YAML. |
I like @eksortso 's proposal, even if it was based on a misunderstanding. I think that enclosing just those dotted name parts that are actually tables in an additional pair of brackets is easy to grasp and read, and fairly easy to write. |
I might be completely off here, but instead of repetition why not have the separator be first characters? At least for me it makes it a bit easier to read, even if I have to backtrack
|
oh... i see the problem now... :( it's impossible to know if the |
@tw1nk That is true. Folks have suggested these sorts of nested dot notations before, but each variant raises this sort of confusion. But the repetition of keys serves an actual purpose. Over time I've come to accept the idea that, even though deep nesting in TOML is possible, the syntax encourages flattening complex data structures. A relatively flat, hand-written configuration structure makes sense. Deeply nested data types are another story though. So the amount of name repetition is acceptable for practical concerns, even for simple data exchange. |
Let me see if I'm on the right track here. For reasons unrelated to this issue, I've decided to set
[[example]]
name = "gcd"
[[example]]
name = "merge_sort"
[[example]]
name = "quick_sort"
# ...
However, this is quite ugly, as others here have noted. I also understand that TOML has both arrays and inline tables, which immediately made me think I could already implement this in a more natural way, with something like:
example = [
{ name = "gcd" },
{ name = "merge_sort" },
{ name = "quick_sort" },
# ...
]
However, there seem to be two issues, one preventing me from doing this at all, and one minor naming thing:
I think I'd be perfectly happy with TOML's arrays of tables if I could just use arrays of inline tables like this at the top level. Thoughts? |
@nixpulvis The TOML given in your examples should be valid, and should result in the same data structures. You'll only run into issues if you mix methods, e.g.:
example = [
{ name = "merge_sort" },
{ name = "quick_sort" },
]
[[example]] # boom
name = "gcd"
Mind you, by "your examples should be valid", I mean "should" as per the spec. Some parsers treat inline tables vs. regular tables differently, same for |
@marzer interesting, I didn't try putting the The issue is now that when I do I do not have a solution offhand, but at least in Rust this is very close to what I want. |
Ah, well then what you're experiencing is correct TOML behaviour. So really the only solution is to re-structure your document. |
@marzer global state bites again. I generally really like TOML, but this is unfortunate. I think I'd personally solve this with commas and semicolons. For example:
[package]
name = "foo"
version = "0.0.1"
would become:
[package]
name = "foo",
version = "0.0.1";
I can imagine a bad parser making this confusing to people, though. It just really sucks that I'm forced to move my array to the very top of my document just because I want to change the format I write it in. This is counterintuitive, and forces a poor configuration structure upon me. |
I'm half tempted to suggest
[package]
name = "foo"
[]
example = [
...
]
[thing]
etc
but I think this is invalid TOML:
[package]
name = "foo"
[other]
thing = "bar"
[package]
version = "0.0.0"
Edit: another option would be to allow all bare keys to go at the end of the TOML document, using a separator similar to Markdown's horizontal rule:
but disallow headers there. And if this is used, you can't have bare keys at the start. |
What? It's not counter-intuitive at all. Things belong to whatever header they appear under, which is how headings generally work in just about any type of document ever. It's true that if you're going for a more JSON-like representation then it's a bit awkward in TOML, but that's because TOML is meant to be 'flat'. If you fight against that, it will get complex, but that's true of all formats: trying to make them something they're not meant to be is asking for trouble. If you think of TOML more like "INI but less shit" you will have an easier time with it. |
Forcing some keys to be at the top (for stylistic reasons) is very counterintuitive to me. I mean, it makes sense when you think about the details of TOML, but it's not how one would expect a config format to behave, in my opinion. Perhaps a better word would be counterproductive, or just gross. |
It's not arbitrarily "forcing some keys to be at the top", it's just the top-level keys go literally at the top-level of the document. |
The current syntax is the only thing I find unintuitive about TOML. What about:
This is similar to array initialization in many languages and would at least give people some hint of what is going on here. PHP looks to be the only language that uses this syntax for appending to arrays. |
I was thinking about this just the other day myself :) . Along similar lines, but with different syntax, what about a verb-noun structure in section headers? E.g.:
resulting in
This would also make room in the syntax for future expansion, by expanding the verb set. (Apologies if someone already suggested this and I missed it in my review of the thread!) |
@mkerost: I don't see that as an improvement. The syntax would be very similar to the current one and it would be harder to remember than the simple rule: "just double the opening and closing bracket". Also, every $cart[] = "foo"; // add "foo" to $cart
|
@cxw42: Your proposal is appreciated, but I'd say it's bad for several reasons. First, it makes arrays of tables look like tables:
[product] # This seems to be a table
foo="bar"
But later (maybe much later) in the same document:
[next product] # But now it has been turned into an array. SURPRISE SURPRISE!!!
bat="baz"
Also, TOML is not a programming language and should not look like one. Hence no keywords, please! Finally, keywords would tie TOML to one specific natural language (English), but it should be language-neutral.
@ChristianSi: You summed up the downside to this approach, but terrible is in the eye of the beholder. I find double brackets surrounding a key "terrible" because they give absolutely no intuition about what they mean. My proposal, to quote myself, would "at least give people some hint of what is going on here". The current table array syntax, and any alternative table array syntax that doesn't use 0, 1, 2, ..., n labeling, is never going to be completely intuitive. The reason is that single-bracket table keys refer to a single thing and can only be defined once, while table array keys refer to multiple things and will be defined identically multiple times. Every newcomer will look at this syntax in confusion and need to go to the TOML reference guide to understand what is going on and why some tables can be defined only once while others can be defined multiple times. The difference is that the current double-bracket syntax looks completely novel, and a programmer has nothing from other programming languages to relate it to when trying to remember what it means. If anything, double brackets look like templating/substitution syntax, not something related to arrays. So I'll understand for a minute what it means, but it's likely that I'll come back a week later and have forgotten, because novel patterns are harder to commit to long-term memory. With the syntax I proposed, most programmers will understand that it has something to do with arrays. You are right that they may be confused when they see this syntax used multiple times ("hey wait, you can only initialize something once..."). But like I said before, table array syntax will never be completely intuitive. A person will always need to go to the TOML reference guide to be certain about what the syntax means. At least with my proposal, the syntax conjures an association with arrays and offers a foothold for remembering what it means.
I am only offering my outsider thoughts here and don't mean to get in a back and forth. I've put as much as I want into my argument and am quite OK if you think it has major holes or there's just no way to get around the syntax feeling "terrible". If you feel this way, I don't think it is a good use of your time beyond just saying "nope, terrible". |
@mkerost Thanks for sharing your thoughts on this. I can certainly sympathize with any effort to make table arrays more approachable. I've never had to deal with arrays in PHP, so I'm not allergic to a postfix- But the use of double brackets, in the context of the rest of TOML, does make sense, and users can differentiate between single and double brackets. So I can no longer recommend making an effort to refine the existing syntax when it already does what it should be doing. One objection you have is having to go back to the reference. I don't believe that users would go back to the reference to remember what double brackets do if they've seen them before. But they could. There's no shame in looking things up if they're not familiar. Just now I went to the spec and found the first instance of The problem is, when dealing with more complicated concepts, we can only make things so clear. A complex data structure, to a newcomer, would need to be revisited from time to time to be fully understood, no matter what. With repeat exposure and with repeated usage, that complex form becomes commonplace, and the pain goes away. But that pain won't go away any faster if we switch to a different syntax. The current syntax can do this job alright. And if we keep hashing out new syntax for this complex concept when there's already sharply defined syntax for it, then, well, all we're doing is bikeshedding. I could be wrong. But arrays of tables can be described to users in a way that lets them understand what they do and how they work. Maybe that's where we could use some help. If you've got some ideas for describing table arrays more succinctly in the documentation, we'd love for you to share them with us. An alternate syntax won't help much, but an alternate description sure could. |
My understanding is that the foundational problem here is that folks are trying to use the array of tables syntax for deeply nested data structures. This has been discussed in #781, which has also laid out a path forward for addressing this. I don't think we'd be changing the array-of-table syntax now, but there are ongoing discussions on providing a better way to represent nested data structures. |
@pradyunsg I apologize in advance for commenting on a closed issue, but one thing I don't think I've seen mentioned in any of these issues yet concerns arrays of tables specifically. To avoid commenting on two closed issues, this proposal involves both this issue and #781/#744. I just wanted to have this documented in the discussion; no follow-up is necessarily needed.
Alias existing table
[foo.bar.baz]
test = "asdf"
[[foo.bar.baz.servers]]
host = "google.com"
[[foo.bar.baz.servers]]
host = "github.com"
[[foo.bar.baz.servers]]
host = "apple.com"
With #516, it'd be a little better:
[foo.bar.baz]
test = "asdf"
servers = [
{
host = "google.com",
...
},
{ host = "github.com" },
{ host = "apple.com" },
]
which is fine in this case, but I liked that the original version separated out each array element as an independent section, especially if the tables in the array are much bigger. I like how the solution in #744 would keep these as separate sections:
[foo.bar.baz]
test = "asdf"
[[*.servers]]
host = "google.com"
[[*.servers]]
host = "github.com"
[[*.servers]]
host = "apple.com"
|
This would feel intuitive to me:
The good thing is that this syntax isn't a breaking change, more an abbreviated form of the same syntax :) |
@hacktivista But it is a breaking change. As soon as the key named If it were adopted though, you'd still have problems. Blank lines outside of strings are ignored in TOML, so you'd be adding significance to how documents are spaced out. You couldn't put blank lines between key/value pairs intended for the same table. In addition, this sort of syntax would make empty tables on the array impossible to define! |
Duplicated key definitions are currently an error, so it wouldn't break existing usage. Indeed, you couldn't put blank lines within a single array of tables. That is indeed a breaking change. Never thought of that :( Regarding empty tables being impossible to define, you would still be able to, because the old syntax would still be available. This is just a "shortcut". |
To resolve this, what do you think of using an array of inline tables? So your syntax
if = [
"...",
{
condition = "...",
then = { result = "..." },
elif = { condition = "...", result = "..." },
elif = { condition = "...", result = "..." },
else = { result = "..." },
},
]
becomes
if = [
"...",
{
condition = "...",
then = { result = "..." },
elif = [ { condition = "...", result = "..." },
{ condition = "...", result = "..." } ],
else = { result = "..." },
},
] |
I actually don't remember why I suggested two But my comment went on to say that I would prefer the sections to still be kept separate, as it would get crowded in one inline array/table, especially if the table had more keys than just "result". |
I found the type "array of table" not especially easy to grasp in TOML. This is because the syntax of an array of tables is different from that of an array of, say, integers. That is not the case in JSON. So while I found TOML clearer than JSON regarding tables, I found JSON easier to understand regarding arrays of tables.
In the end, I was wondering whether the type "array of table" is absolutely necessary in a config file. My point is that the way the data are stored (table of tables vs. array of tables) might be of little interest to the end user who just wants to modify some parameters.
I understand that, compared to "table of table", "array of table" has:
In actual TOML usage, is there a situation where "array of table" is much more efficient than "table of table"?
To be a little more specific, here is a comparison of both syntaxes (not exactly equivalent, of course):
Array of table:
Table of table:
IMO, the advantages of using only "table of table" are:
My question is certainly not whether the array of tables should be removed from the spec, but whether a best practice could encourage using tables of tables instead of arrays of tables.