Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specify comment preservation behavior in the TOML spec #836

Closed
bradleypeabody opened this issue Jul 15, 2021 · 13 comments
Closed

Specify comment preservation behavior in the TOML spec #836

bradleypeabody opened this issue Jul 15, 2021 · 13 comments
Labels

Comments

@bradleypeabody
Copy link

bradleypeabody commented Jul 15, 2021

An issue that comes up more frequently than I believe is realized is the ability (lack thereof) to preserve comments during edits from automated tools, examples: #284 https://github.com/JuliaLang/TOML.jl/issues/32 . Other configuration languages lack this (YAML, JSON, and to some extent XML), and adding a behavior for this to the specification would give TOML yet another leg up.

The Go programming language is a great example of where I believe they got this right. Its tools can read a source file, make edits, and spit it back out with comments preserved (except where automated modifications removed the corresponding section). I realize it's not a configuration file format, but the use case is pretty much the same - you start by hand editing it, and then later you find that since what you're doing is successful and fantastic you want to make an automated tool to handle various things, and you don't want your comments automatically evaporated when you do this.

Not every single possible comment scenario needs to be preserved exactly. E.g. if an automated tool converts a = 1 # just for now to:

# just for now
a = 1

I think such behaviors are perfectly acceptable and a workable tradeoff for being able to allow automated tools to safely modify TOML files without comment loss.

One of the challenges to different implementations adding this functionality is that there is no definition of how it should be done. I think a simple set of rules could be arrived at, which implementations could refer to and follow if they want to support this feature.

I'm sure a proposal could be worked out with an exact set of such rules if there is agreement/interest.

@arp242
Copy link
Contributor

arp242 commented Jul 16, 2021

I don't disagree that round-tripping comments can be very useful, but I don't really see what the advantage would be to add this to the specification? It's not part of the Go specification either for example: it's just an implementation detail of the Go toolchain.

And for a lot of use case preserving comments isn't important at all.

@pradyunsg
Copy link
Member

pradyunsg commented Jul 16, 2021

I don't think this needs to be added to the specification. FWIW, there is at least https://pypi.org/project/tomlkit/ already, that claims to be a style preserving TOML encoder/decoder (I've never actually used it, to know how good a job it does) -- basically, this is achievable without anything explicit about this in the specification, and as @arp242 has noted already, there's lots of use cases where preserving comments and round-tripping behaviours are not required.

If there's a bunch of such parser libraries and their authors reckon that it would be useful to have a common specification for how style preservation should work with TOML, we can look into this then. Otherwise, we're just specifying things in thin air with no binding effects whatsoever and I'd prefer to defer this till then.

@bradleypeabody
Copy link
Author

Thanks for the feedback. I do agree that adding things to the spec out of thin air doesn't necessarily make sense, but there is also a bit of a chicken and egg problem - if the spec never says anything about it then implementers likely won't think about it either until someone asks for it later. I understand the reasons given against adding it to the spec, but also, even if just to document it, here are some arguments on the other side of it that might help choose a direction for this in the future:

  • The advantages to adding it to the spec is so 1) implementers are aware the concern and 2) implementations can be consistent
  • It does not have to be (and indeed should not be) a required feature for an implementation. Implementations that are focused on just reading configs for example have no need for round-trip comments.
  • If we're making comparisons to Go, while it's true the comment behavior is not in the language specification, it is implemented in the reference implementation of the tools (gofmt) that come with the language. This makes a huge difference. It's also mentioned in many places throughout the documentation. With Go, nobody will forget about it or be unaware that this feature exists and is desirable.
  • Conversely, with TOML, there are many more implementations. Something not included in the spec and at the sole discretion of each implementation is almost guaranteed to be rarely implemented and just generally forgotten about.

Maybe the spec doesn't need something detailed and elaborate, but rather just to mention that implementations that choose to have such behavior can do so and some basic guidance on how that would be done.

Another idea is that perhaps a reference implementation would help? I haven't looked at the C TOML implementations, but maybe something written in C and thus usable from many other languages would help work out the behavior in more detail.

@pradyunsg
Copy link
Member

pradyunsg commented Mar 5, 2022

Closing this since I don't think the specification needs to specify "you can decide whether to preserve comments" or describe the implementation concerns of how parsers can represent the style-preserving structure.

@arizvisa
Copy link

Wish I saw this 2 years ago (when this issue was created), as the lack of comment preservation makes comments in certain files completely useless depending entirely upon whether the application serializes its suggestions back to disk...which many do. Despite TOML thinking that all config files should support comments and is intended to be for humans, TOML has unfortunately become another JSON and is intended to be for just apps. Oh wells...

@marzer
Copy link
Contributor

marzer commented Mar 20, 2023

intended to be for just apps

You have this backwards. Comment preservation is only relevant in TOML serialisation contexts, which are by definition performed by an application. The language has tens, possibly hundreds, of thousands of users at this point, very few of which need this. If you have a decent point to make, then by all means make it, but being salty just for the sake of it isn't going to advance the language in any direction you want.

@bradleypeabody
Copy link
Author

very few of which need this

I wonder if there is a way to gain more objective data on this. I run into this issue (config files that need to be equally editable by humans and code) very frequently, so I would be surprised if this is really that rare. But I don't have any actual data other than my own experience and intuition. (And 9 months later my position is still the same - I believe this is actually really important and I suspect more common than is obvious.) If I do think of a way to get more concrete data on this I will certainly post it here.

@eksortso
Copy link
Contributor

Personally, I'm suspicious of applications that re-write their configuration files. Some just do it to store timestamps of last-run activity. There are better options for this sort of thing; separate text files, SQLite database files, and such. Why roll this sort of thing into your app configuration?

@marzer
Copy link
Contributor

marzer commented Mar 21, 2023

@bradleypeabody

I wonder if there is a way to gain more objective data

I maintain a very popular TOML implementation for C++ and this is not a frequent request.

The people that want it request it very loudly, though.

It's the noisy minority case. Most users just want to read from a config. Far fewer need to write one out.

@arizvisa
Copy link

@bradleypeabody, one example is podman's library (containers/common) which uses XDG_CONFIG_HOME to read its configuration but also exposes an interface that updates the file back on disk. But maybe that gets into semantics over what configuration actually means. There's also a project uses TOML to store reverse-engineering artifacts (essentially comments), but since the output isn't designed for humans (rather git) I guess it doesn't really matter there...since it's being used for data exchange.

@eksortso, 100% agreed, but as you know, you can't convince devers to follow any rules other than abide by the capabilities of their abstractions...and for the capabilities of those abstractions, you only have specifications.

@marzer, no point. i just find it funny that toml self-labels as a format for humans, thinks config files should support comments, and then demotes them to being ignored by implementors (similar to json). two decades ago during the dotnet wars ini-style configuration was being phased out for this new machine-and-human-readable format called xml. back then, xml suggested that comments had value, and many people began doing crazy things like annotations and self-documentation because of the word MAY being written in the first version of the spec. in the end, many lives were lost in those wars...although, not at all related to data exchange. still, here's to the next decade of missed opportunities trying to define what human-friendly software actually means, and to our next generation not repeating this cycle.

@ChristianSi
Copy link
Contributor

@arizvisa, @bradleypeabody: If you think this issue is important, why not bring it up with the implementers of your favorite TOML parser/writer? There is certainly nothing in the spec that prevents them from implementing such behavior.

@arizvisa
Copy link

@ChristianSi, My favorite already does. ;-)

@bradleypeabody
Copy link
Author

Thanks for the feedback. In reply to a few of these points:

I'm suspicious of applications that re-write their configuration files.

If we're talking about regular system daemons and such, I agree. However there are definitely uses cases where it makes sense such as tools that operate before compile time (e.g. code/configuration generators), as well as tools that support UI configuration as well as via config file - in which case having both scenarios use the same TOML file as the source of truth tends to make a lot of sense.

It's the noisy minority case. Most users just want to read from a config. Far fewer need to write one out.

Fair enough, good to know.

If you think this issue is important, why not bring it up with the implementers of your favorite TOML parser/writer? There is certainly nothing in the spec that prevents them from implementing such behavior.

Yeah this is a fair point, and it's actually how this started for me. But then questions immediately came up about the rules for such comment preservation. For example, what about comments not associated with any data (at the very top or bottom of the file, or just in between other lines with no clear association to a non-comment line). Each tool essentially has to figure all of this out on its own and decide on the tradeoffs between how detailed does the comment preservation have to be in order to be workable. Which sort of wraps back around to my original point while writing this: If the spec never says anything about it, then implementations will typically not support it or at best will do it inconsistently.

If I put together a list of some guidelines of what might make sense, would that at least be worth looking at? There is a pretty big gap between preserving comments with byte-for-byte reproduction vs a list of a few rules that would be much easier to implement and result in the desired "won't vaporize your comments" behavior. Which is why I think it's worth exploring and writing up instead of just leaving it open ended.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants