Possible positional restrictions on table definitions #446

alexcrichton · 2017-02-03T00:19:21Z

Hello! I'm currently in the process of rewriting a TOML parser and have come
across a curious aspect of TOML. I believe that a document such as this is
valid:

[a]
key = 1

[unrelated-table]
key = 1

[a.b]
key = 1

That is, you can define a table across many portions of a document. There's
certainly nothing wrong with this per-se, but it means that a parser decoding
this document is forced to keep all intermediate tables in-memory or in some
side data structure. For example if you'd like to decode the table a from
this document you're forced to parse unrelated-table and store it in memory
elsewhere to come back to later (if requested).

This is definitely a super niche concern, of course! TOML is intended for
configuration, which should never be something like gigabytes large. On the
other hand though this does make writing a parser a little bit more difficult
in some situations, and to me at least I'd subjectively find such a definition
relatively wonky.

I wonder, what would others think about adding a clause such as this to the
spec?

Although one table may be defined across multiple sections (if its keys are
themselves tables) all sections that define a table must be contiguous. For
example this is invalid:
[a]
key = 1

[unrelated-table]
key = 1

[a.b]
key = 1
Similarly, this is also invalid
[a.sub-table]
key = 1

[a.unrelated-table]
key = 1

[a.sub-table.b]
key = 1

This does make the otherwise very simple spec somewhat more complicated, and also somewhat more arbitrarily. If this were only about parser performance I don't think I'd bother with the suggestion, but subjectively interpreting this as odd prompted me to file this report!

The text was updated successfully, but these errors were encountered:

BurntSushi · 2017-02-03T00:22:57Z

I think probably the biggest concern I'd have with this is that it's a breaking change. It could make an otherwise valid TOML file invalid. :-(

alexcrichton · 2017-02-03T00:25:08Z

Ah yes good point, I forgot to mention that!

I wasn't quite sure whether the 0.4 -> 1.0 transition (assuming that happens) is just a clarification as opposed to a breaking release. If no breaking changes are being made then I think the ship has sailed!

lmna · 2017-02-03T22:18:11Z

Suggested change would make TOML more obvious and friendly to a human reader, and more consistent as well.

"Obvious". Table-followed-by-some-other-completely-unrelated-table is similar to a topic change in a talk: human would naturally conclude that her companion has nothing more to say on the first topic. Thus it is surprising that related parts of the first table are allowed to be scattered all over the document.
"Friendly". How would a human make sure that she has read all contents of a specific table? The easiest way is to find the first entry of the table and then read through the document till unrelated stuff. Unfortunately, this way is unreliable now, thus one has to use search in routine and excessive manner. Users of TOML 1.0 could be free of this annoyance!
"Consistent". Let’s take a look on example of invalid document from the spec:
```
 # Doing so is invalid
 
 [a]
 b = 1
 
 [a]
 c = 2
```
This example could be interpreted like "reopening a table is forbidden". So it would be consistent to forbid "reopening" a table with a sub-table as well.

pradyunsg · 2017-11-17T20:56:11Z

If TOML 1.0 shouldn't be breaking stuff that's already there in the wild, I suggest making this a post-1.0 item.

Personally, I like this idea and getting it in 1.0 might be a good idea too.

alexcrichton · 2017-11-25T14:52:18Z

@pradyunsg yeah I'd be totally fine making this post-1.0, I wouldn't consider this critical at all!

pradyunsg · 2017-11-29T06:23:39Z

@mojombo Your thoughts on this?

mojombo · 2017-12-06T05:33:02Z

I'd very much like to keep 1.0 backwards compatible with 0.4. In the likely event that dotted keys go in, allowing for this type of looseness might actually be desirable in some cases. I think we need more experience with it before we should lock it down, so let's roll with the status quo for now and re-evaluate for later versions.

pradyunsg · 2018-05-11T02:44:46Z

(noting this here so that I don't fail to mention it; this issue can be reopened as a post-1.0 issue)

I can imagine that TOML changing table tracking logic to be something like:

a table can be "open", "closed" or "unexplored".
- valid transitions: "unexplored" -> "open" -> "closed"
- i.e. "closed" a table can not be opened again
BOF opens root table, EOF closes any open tables
- root is the super-table for top level tables
- can open a new table only if it's super-table is open
a.b.c.d opens the table a, a.b, a.b.c (if they aren't already open)
a.b = {c = 1, d = 2} opens a, a.b and then immediately closes a.b because that's defined inline.

The exact same rules apply regardless of the use of dotted keys.

A demo:

# 'root' table is opened
key = "value"
assignment.first.key = "value"   # opens assignment, assignment.first
assignment.second.key = "value"  # closes assignment.first; opens assignment.second
# INVALID here: assignment.first.another = "value"
mytable.key = "value"  # closes assignment.second, assignment; opens mytable
# INVALID here: assignment.second.another = "value"
# closes mytable and 'root' table

eksortso · 2018-05-14T15:49:01Z

@pradyunsg How would "unexplored" table status be used? The only application that I could imagine it being used for are undefined supertables. That is to say, if a.b.c.d is specified before a, a.b and a.b.c are mentioned, then wouldn't it be clearer to say that a and a.b are unexplored, rather than open? (If a.b.c.d is a table, then we'd treat a.b.c the same way that we'd treat a and a.b.)

I do see the value in doing this sort of thing. Since tables can (currently) be specified in almost any order, an unexplored table status acknowledges those tables' existence without applying restrictions on their future application that an open status would suggest.

pradyunsg · 2018-05-15T13:44:22Z

How would "unexplored" table status be used?

Refers to all tables that hasn't been mentioned in the file yet.

That is to say, if a.b.c.d is specified before a, a.b and a.b.c are mentioned, then wouldn't it be clearer to say that a and a.b are unexplored, rather than open?

That'll make assignment.second.another in the example above valid. I'm not sure if that's preferable.

eksortso · 2018-05-15T21:39:07Z

@pradyunsg I don't see how. When assignment.second.key was previously defined, that implicitly opens the table assignment.second. When mytable.key is defined, then implicitly mytable is opened and assignment.second is closed. So assignment.second.another is still invalid there, according to the scoping logic.

I don't know if I've mentioned this before, but I like to think that a table's keys are always defined across a continuous range of lines (not counting parent or child tables). This logic provides for that. If it doesn't become an adopted scoping standard, then I would continue using it as a style preference.

Though I think you're proposing something that is more restrictive. You want it so that you "can open a new table only if its super-table is open." This would only allow subtables to be defined within the scope of their ancestors. I would only apply that rule to inline tables.

pradyunsg · 2018-05-16T08:21:44Z

This is what I get for responding late at night. I meant, "this would allow for an assignment.third.key where assignment.second.another is in the example above" (which might be an incorrect statement too).

That said, I guess we can hold off this discussion until after 1.0 is done with. :)

ChristianSi · 2018-05-24T16:32:35Z

I think it would be a VERY BAD idea to treat this is a "post-1.0" issue. It's a breaking change and hence it should happen as soon as possible – preferably before a new 0.x version is released, and CERTAINLY before 1.0. After 1.0 is released, more and more files might rely no such such restrictions being present, and there will be a great of deal of hesitation (and for good reasons!) about making any non-essential breaking changes.

If we want to insert such restrictions (and I DO think it's a good idea), we should do so SOON. Currently dotted keys are quite new and no tagged TOML version has them, so breakage will be low. But the longer we wait, the harder it becomes.

sgarciac · 2018-05-24T21:03:04Z

I think there are legitimate reasons to want to order tables in such way: you might want to prioritize access to some keys that belong to different tables by keeping them on top of the TOML file.

For example, think of the following configuration file for an hypothetical application that tests a web app:

[api_server]
hostname = 'api.dev.whatever.com'

[frontend_server]
hostname = 'www.dev.whatever.com'

[api_server.details]
auth_path = '.....'
ping_path = '.....'
# potentially very long list of details

[frontend_server.details]
homepage_path = '.....'
login_path = '.....'
# etc...

If I want to reuse the same file to test different deployment environments by changing only the hostname of the servers, it makes sense to keep that information on top of the file, separated from the rest of the server's configurations.

guai · 2018-05-25T17:46:48Z

I'd already create an issue about I could not rewrite an existing config into toml without reordering
I had toplevel declarations at the bottom which is also forbidden in toml right now
That's bad, my users got used to that order, switching to another syntax was not a problem at all, but reordering was.
It also is a step toward toml to be a drop-in replacement for json and yaml and a possibility to stream toml docs
So I totally pro this change

pradyunsg · 2018-06-04T05:12:45Z

@ChristianSi These restrictions would be backwards incompatible. The intention of TOML 1.0 being backwards compatible with 0.4 is noted in the README. This is why I labelled this as a post 1.0.

Personally, it's the sort of thing where if we make this change, it'd mandate a 0.5 release (because this is the sort of change you want feedback on) and then a 1.0. I do think that if we don't do a 0.5 (which is what I'd prefer), 1.0 should mention that "hey there'll be restrictions on these in the future", possibly with some guarantee about what would stay valid.

ChristianSi · 2018-06-05T11:20:31Z

@pradyunsg You're right, the proposed restrictions, when applied to [tablename]-tables ([]-tables) would break 0.4 files and so cannot go into 1.0 My intuition tells me that they won't go into 2.0 either, because the advantage they offer is just too small to justify breakage. Also, @sgarciac has given a reasonable example of why these restrictions might actually be bad idea.

What we could introduce now without introducing breakage are restrictions on the order of dotted keys, if we want to have those. And my intuition again tells me that we should introduce those now if we want to have them, for the same reason as given above.

So what we should decide now is whether to enforce such restrictions, say (informally): "All dotted keys that define a subtable must be placed together" – that would amount to what you have proposed above, but only for dotted keys, not for []-tables. Personally, I'm pretty undecided on whether we need such a restriction or whether we should just consider it a best practice – but I think anyone who wants it should speak out in favor of it now, since after the next tagged version is out, this too would become a breaking change and hence becomes considerably more unlikely to happen at all.

eksortso · 2018-06-05T21:37:16Z

All of the keys defining a table should be kept in one range of lines, in principle. That's the case with tables defined with header rows. That's the case with inline tables, which in nearly all cases take a single line. And that ought to be the case with dotted keys as well. This principle doesn't apply to subtables and non-root supertables, which can be defined in any order (and dotted key paths allow subtables to be defined in the middle of their parent tables' definitions). But for scalar values, arrays, and inline tables, all of their keys should be kept in one continuous range.

eksortso · 2018-06-05T21:38:04Z

Will restrictions be placed on defining subtables of tables defined using inline table values?

pradyunsg · 2018-06-06T20:17:48Z

What we could introduce now without introducing breakage are restrictions on the order of dotted keys, if we want to have those. And my intuition again tells me that we should introduce those now if we want to have them, for the same reason as given above.

My intuition is quite the opposite -- this would mean there's an inconsistency in the ability to declare using dotted keys vs tables and simple keys.

ChristianSi · 2018-06-07T12:39:03Z

@pradyunsg

this would mean there's an inconsistency in the ability to declare using dotted keys vs tables and simple keys.

Good point. And yet, consider this case:

mainkey1 = '...'
subtable.key1 = '...'
subtable.key2 = '...'
mainkey2 = '...'

That's legal with dotted keys and it will remain legal even with the restrictions you've proposed above. Indeed, it has to remain legal, since otherwise sorting all the keys in a table e.g. alphabetically would become impossible. But It's not possible using tables with simple keys.

Hence I'd say we don't have to worry about consistency all that much, since true consistency is impossible anyway.

pradyunsg · 2018-06-11T05:37:17Z

That's indeed an interesting case and a compelling argument. In that case, let's move the discussion for dotted-keys restrictions back to #499.

LongTengDao · 2019-08-17T05:07:55Z

Generally speaking, keys out of order is not wise.
@sgarciac had given a very good sample for proper use, so it should still be valid.
But in any case, below should be invalid, at least warn not to use in the 1.0 spec (then things like "after 1.0 is released, more and more files might rely no such restrictions being present, and there will be a great of deal of hesitationchanges" maybe won't happen), until the users found a good practice and tell us in the issue:

[[a]]
[b]
[[a]]

septatrix · 2023-09-24T00:38:57Z

I am not even sure what a sensible API could be implemented to allow for a TOML parser which does not keep all tables in memory. Returning a dict, object, or whatever it is called in your language of choice as most implementations do it would not work in this case as then all data would have to be kept in memory regardless. So I can think of two possible APIs:

It would be possible to implement an API which asks for specific keys and only then is the document parsed (e.g. toml.get(file, "foo.bar[1].baz"). This, however, is already possible today. One difference is that this restriction would allow for an early return once the table has been "closed" though doing so would in practice also permit parsing of invalid files such as those with keys defined twice. So no memory improvement and only a slight runtime optimization at the cost of a non-conformant implementation. And if one wants to use a non-conformant implementation they might as well already do so today.
An alternative option is to stream out the configuration though even then I am not sure how exactly this would look - you cannot stream out dictionaries because then you would still have to store the inner dictionaries in memory. So the only option I can think of is streaming out the fully qualified key together with its value (e.g. (foo.bar[1].hello, 0), (foo.bar[1].world, 1), ...). Again, we have no real advantage from "closing" a table early. We could signal this to the application which might perform some tasks earlier but that again is not a memory improvement, only a slight runtime optimization while simultaneously waiving the option to warn of invalid configurations when later on duplicate keys are found.
In some rather low-level languages and embedded use-cases there can be advantages from explicitly closing a dictionary, allowing for the removal of some temporary structures, freeing unused reserved space etc. However, these are quite rare and when you need to parse a config file anywhere close to a size where this would be problematic you already took a wrong turn somewhere else. Furthermore it is not uncommon in these cases to have a known schema which would already allow for optimizations in this direction - and often there are libraries in this area which consciously only implement a subset of specifications which could already be done today without hindering the rest of the ecosystem.

If someone can come up with a reasonable API which show that non-negligible memory or performance gains can be achieved with such a restriction I would be happy to hear about it. Until then this is mostly a subjective discussion about what feels more "human" and "obvious". For this aspect there is already a great argument in #446 (comment) of where out-of-order table definitions are desired and advantageous. So restricting this - at the additional cost of breaking compatibility - seems like a no-go to me.

scooter-dangle mentioned this issue Jun 5, 2017

Indentation does not mix well with directly entering arbitrarily nested tables #466

Closed

ordian mentioned this issue Aug 5, 2017

Enforce constraints toml-rs/toml#19

Closed

a-teammate mentioned this issue Oct 14, 2017

Add milestone 1.0 #482

Closed

26 tasks

pradyunsg mentioned this issue Nov 24, 2017

Allow keys in key-value pairs to be paths #499

Closed

ordian mentioned this issue Dec 5, 2017

Alternative AST design LeopoldArkham/Molten#29

Open

mojombo closed this as completed Dec 6, 2017

pradyunsg reopened this May 22, 2018

pradyunsg added post-1.0 new-syntax and removed new-syntax labels May 22, 2018

ehuss mentioned this issue Jul 27, 2018

How should different table types interact? matklad/tom#19

Open

ehuss mentioned this issue May 6, 2019

TOML 1.0: What all do we want to do? #617

Closed

LongTengDao mentioned this issue May 17, 2019

Shouldn't these revisit cases be invalid? #615

Closed

LongTengDao mentioned this issue Aug 18, 2019

Add advice to not define dotted keys out-of-order #646

Merged

Suhoy95 mentioned this issue Feb 14, 2020

Define appropriate implementation behaviors for extreme values #538

Closed

pradyunsg removed the post-1.0 label Apr 17, 2021

pradyunsg added the discussion label Mar 7, 2022

ChristianSi mentioned this issue Oct 28, 2022

TOML 1.1.0 #928

Open

pradyunsg added the backwards-incompatible label Aug 28, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Possible positional restrictions on table definitions #446

Possible positional restrictions on table definitions #446

alexcrichton commented Feb 3, 2017

BurntSushi commented Feb 3, 2017

alexcrichton commented Feb 3, 2017

lmna commented Feb 3, 2017 •

edited

Loading

pradyunsg commented Nov 17, 2017

alexcrichton commented Nov 25, 2017

pradyunsg commented Nov 29, 2017

mojombo commented Dec 6, 2017

pradyunsg commented May 11, 2018 •

edited

Loading

eksortso commented May 14, 2018

pradyunsg commented May 15, 2018

eksortso commented May 15, 2018

pradyunsg commented May 16, 2018 •

edited

Loading

ChristianSi commented May 24, 2018 •

edited

Loading

sgarciac commented May 24, 2018 •

edited

Loading

guai commented May 25, 2018 •

edited

Loading

pradyunsg commented Jun 4, 2018

ChristianSi commented Jun 5, 2018

eksortso commented Jun 5, 2018

eksortso commented Jun 5, 2018

pradyunsg commented Jun 6, 2018

ChristianSi commented Jun 7, 2018

pradyunsg commented Jun 11, 2018

LongTengDao commented Aug 17, 2019 •

edited

Loading

septatrix commented Sep 24, 2023

Possible positional restrictions on table definitions #446

Possible positional restrictions on table definitions #446

Comments

alexcrichton commented Feb 3, 2017

BurntSushi commented Feb 3, 2017

alexcrichton commented Feb 3, 2017

lmna commented Feb 3, 2017 • edited Loading

pradyunsg commented Nov 17, 2017

alexcrichton commented Nov 25, 2017

pradyunsg commented Nov 29, 2017

mojombo commented Dec 6, 2017

pradyunsg commented May 11, 2018 • edited Loading

eksortso commented May 14, 2018

pradyunsg commented May 15, 2018

eksortso commented May 15, 2018

pradyunsg commented May 16, 2018 • edited Loading

ChristianSi commented May 24, 2018 • edited Loading

sgarciac commented May 24, 2018 • edited Loading

guai commented May 25, 2018 • edited Loading

pradyunsg commented Jun 4, 2018

ChristianSi commented Jun 5, 2018

eksortso commented Jun 5, 2018

eksortso commented Jun 5, 2018

pradyunsg commented Jun 6, 2018

ChristianSi commented Jun 7, 2018

pradyunsg commented Jun 11, 2018

LongTengDao commented Aug 17, 2019 • edited Loading

septatrix commented Sep 24, 2023

lmna commented Feb 3, 2017 •

edited

Loading

pradyunsg commented May 11, 2018 •

edited

Loading

pradyunsg commented May 16, 2018 •

edited

Loading

ChristianSi commented May 24, 2018 •

edited

Loading

sgarciac commented May 24, 2018 •

edited

Loading

guai commented May 25, 2018 •

edited

Loading

LongTengDao commented Aug 17, 2019 •

edited

Loading