Please consider pushing tomli into stdlib #141

Closed
mgorny opened this issue Nov 13, 2021 · 48 comments
Labels
question Further information is requested

Comments

@mgorny

mgorny commented Nov 13, 2021

Long story short, having a TOML parser in Python stdlib would be great and tomli seems to be the best implementation available right now, so also the best candidate for stdlib. Would you be interested in trying to push it?

The relevant CPython bug is: https://bugs.python.org/issue40059

@hukkin
Owner

hukkin commented Nov 13, 2021

Long story short, having a TOML parser in Python stdlib would be great

Yep, I agree! Also I have no objections to adding Tomli there.

Would you be interested in trying to push it?

I'm not sure

  • what concrete actions I'd need to take
  • if I'm the right person for this
  • if I have the energy

but yeah, generally I don't object to me or someone else pushing this forward.

@hukkin added the "question" (Further information is requested) and "help wanted" (Extra attention is needed) labels on Nov 14, 2021
@merwok

merwok commented Dec 20, 2021

Adding a module usually requires:

  • writing a PEP to propose inclusion
  • module developer agreeing to support it for some time in the standard library (developer usually becomes a python core dev)
  • tests and docs should be integrated (no pytest for tests, sphinx for docs)
  • a backport can be maintained, but it should follow the stdlib version and not diverge (see simplejson-json-simplejson example)
  • the python core team may want to review the API and request changes (see ipaddress module for example)

So the process is not just two minutes, but also not impossible! I would suggest that you start by joining the discuss discussion and state that you’re willing to have your module adopted to the standard library. Other people can help make the case for inclusion (not the whole core team is well versed in packaging matters), and maybe offer to be responsible for maintenance.

(Note that the whole team collectively maintains the whole of Python, but some modules have dedicated maintainers who know the design and details well, and for a new module it’s expected to have one. In this specific case, as a parser for a specified format, it may be less important that you commit to maintain for the long term, and it is possible that someone who is both in pypa and python-dev will offer to be co-maintainer.)

@hukkin
Owner

hukkin commented Dec 20, 2021

Thanks for your insight!

  • writing a PEP to propose inclusion

This seems tedious. Any help much appreciated 😄

  • module developer agreeing to support it for some time in the standard library

I can do this.

  • tests and docs should be integrated (no pytest for tests, sphinx for docs)

I can do this.

  • a backport can be maintained, but it should follow the stdlib version and not diverge

Will do this anyway to support non-EOL Python versions older than the one where the stdlib addition happens.

  • the python core team may want to review the API and request changes

Fair enough.

I would suggest that you start by joining the discuss discussion and state that you’re willing to have your module adopted to the standard library.

I've actually been participating for quite some time already. And have stated I'm willing to have the module adopted to the stdlib.


If I've understood correctly, one additional obstacle is that since moving from BDFL to steering council, the process for adding/removing modules is not standardized yet, see python/steering-council#92

@merwok

merwok commented Dec 20, 2021

I've actually been participating for quite some time already. And have stated I'm willing to have the module adopted to the stdlib.

Ah good! Then I think you could send a call for help to write the PEP and make a fork of cpython with tomli added (or maybe not, there is no guarantee that the addition will be accepted).

If I've understood correctly, one additional obstacle is that since moving from BDFL to steering council, the process for adding/removing modules is not standardized yet

It’s not really an obstacle: there is a discussion in progress about the process, but that doesn’t mean additions are blocked, just that they use ad-hoc processes. zoneinfo and graphlib were added to 3.9!

@johnthagen

If tomli is added to the standard library, I hope that it can be added using the simple name toml for long term consistency.

Packages supporting older Python versions could do something like:

try:
    import toml
except ImportError:
    import tomli as toml

Also note that there are currently plans to get control of the toml package name on PyPI: uiri/toml#361 (comment)

@hauntsaninja
Collaborator

Hello @hukkin ! Happy new year and thank you for tomli and thank you for being so willing to help fill this gap in the Python ecosystem!

Over at https://bugs.python.org/issue40059, Brett indicated that the next step would be making a proposal to the SC.
With that in mind, I wrote up https://gist.github.com/hauntsaninja/9f136a5a60f63d8ca2cdfadb50edba44 as something we could send to python-dev.

I'd love to hear your thoughts :-) Would also appreciate any feedback from @pradyunsg (sorry for the somewhat random tag)!

@pradyunsg

FYI: There's a bunch of discussion in https://discuss.python.org/t/adopting-recommending-a-toml-parser/4068.

@hauntsaninja
Collaborator

hauntsaninja commented Jan 2, 2022

Of course there's a 90 post thread that I've missed :-) Thanks for linking that.

https://discuss.python.org/t/adopting-recommending-a-toml-parser/4068/84 seems to change Brett's stance to "PEP is necessary", so I can help write up something more formal than my current (discursive and informal) proposal.

Not much new in the proposal that's not there in the thread, but might still be a good summary and the anecdotal evidence from grepping open source code might help sell API changes (particularly if we make breaking changes to uiri/toml). Comforting to know that the fallback package names I jotted down have all been suggested :-)

@hukkin
Owner

hukkin commented Jan 2, 2022

Awesome work @hauntsaninja !

I added some notes to the gist's comment section and also pinged @gaborbernat there who showed interest in writing a PEP as well.

@hukkin
Owner

hukkin commented Jan 2, 2022

Also tagging @encukou as they said they'd be able to co-maintain in the stdlib. I think we've got a pretty good team now. 😸

There are a few major decisions where the PEP must take a side. I assume it's best to plan these out either here or in the PEP's PR and return to https://discuss.python.org with the first draft complete?

Here's some decisions that come to mind:

A) What to do with future TOML spec versions?

  1. TOML v1.0.0 forever
  2. Keep up-to-date with the spec unless it makes breaking syntax changes (probably meaning TOML v2.0.0)
  3. Keep up-to-date even if syntax breaking changes occur.

B) Which name?

  1. toml
  2. tomli
  3. tomllib
  4. tomlparser
  5. toml, but under some other namespace. E.g. parser.toml, decoder.toml, formats.toml (didn't check if any of these are available).

Option 1. is special because the name is currently in use by https://github.com/uiri/toml so being able to use it requires that pradyunsg is able to claim the name. It also means we have to make breaking changes to the toml PyPI package and will break projects that use a pinned toml version (thanks encukou for pointing this out).

C) Should the standard library be able to dump(s) TOML?

  1. No.
  2. Yes, but keep it simple. Copy Tomli-W's API as is.
  3. Yes, but keep it even simpler: remove the multiline_strings keyword argument from Tomli-W's API.
  4. Yes, and allow output customization. Add a json-esque overridable TOMLEncoder class.

My personal favorites are probably: A2, B1 (if possible, else B3 or B4), and C1.

It should be noted that if we choose B1 (use toml name) then not adding dump capability means we need to break the existing toml package more.

EDIT: I know that @pradyunsg would want to keep write capability and TOMLEncoder API in the toml PyPI package, so not sure if they are happy to hand over the name to standard library if it ends up being read-only?

@merwok

merwok commented Jan 2, 2022

tomlparser is a nice name for a reading-only module. No expectations of writing and no name conflict.
A namespace was tried for concurrent.futures and it was a mistake (nothing else went there, and adjective for package name is always awkward to me).

It should implement the current spec at v1, could follow minor revisions as bugfixes, and add support for a possible v2 with a separate parser class (or parameters in the existing one if that’s clean and easy). Once in the stdlib, it should keep support for v1 working.

@encukou
Collaborator

encukou commented Jan 2, 2022

PRs in the PEP repo aren't a good place for discussion; it's better to only open one when the draft is ready.

A) I'd say 2, and handle TOML v2.0.0 when/if it comes. (Maybe it'll never come – just like JSON v2.0 seems unlikely. Or it'll be possible to detect – for example, pickle.dump has a version argument but pickle.load doesn't need one.)

C) No. The docs should point to tomli-w, but please keep at least the initial implementation read-only.

B) Using toml would break existing code, so I'm against that.
If toml stays as a third-party library, it can break compatibility without affecting (pinned) consumers, but that's not possible in stdlib. (BTW, I think it wouldn't be bad if tomli+tomli-w were joined and renamed to toml 2.0, and recommended to anyone who needs more than the stdlib. Like how the shorter-named attrs is a more powerful version of stdlib's dataclasses.)

As for the other B) options: tomlparser sounds nice now, but not so nice if a writer eventually does get added. tomllib is better in that regard. Then again, urllib.parse.urlunparse exists and no one is shocked. The name doesn't matter that much.
I agree that the nested namespace is icky, especially since json, pickle, marshal, html etc. can't join.
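
To illustrate the pickle comparison in A): the writer chooses a format version (the argument is spelled protocol in the pickle API), while the reader detects it from the data itself. A minimal sketch with made-up sample data:

```python
import pickle

# Hypothetical sample payload, used only for illustration.
data = {"name": "tomli", "tags": ["toml", "parser"]}

# The writer picks the format version explicitly via `protocol`.
blobs = [
    pickle.dumps(data, protocol=p)
    for p in (0, 2, pickle.HIGHEST_PROTOCOL)
]

# The reader takes no version argument; it detects the protocol from the bytes.
restored = [pickle.loads(blob) for blob in blobs]

print(all(item == data for item in restored))
```

Whether a future TOML version could be detected from the document in a similar way is an open question here, not something the TOML spec promises.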

@hauntsaninja
Collaborator

hauntsaninja commented Jan 2, 2022

I've converted my gist to something more PEP sounding, please see: https://github.com/hauntsaninja/peps/blob/toml-pep/pep-9999.rst
We can collaborate on my fork of python/peps. If that feels inconvenient, we can also move to Google doc.

Note that while I put several things under "rejected ideas", most of them are still on the table. My goal was just phrasing things normatively, not shutting down discussion :-)

My answers to questions A, B, C are in the PEP draft, but are:
A) Same as encukou's
B) Using toml if we have tolerance for breaking changes (in practice I think not too many people are affected if we include a write API). My read of the room is that we don't have tolerance for breaking changes, hence I vote tomllib (reasons outlined in draft).
C) Include a basic write API (C2 or C3). Reasons outlined in draft, but mainly because toml.dumps? is used 30% as much as toml.loads? and that seems a large enough fraction that it's providing real value to users (also if we exclude a write API it'll be a monthly python-ideas thread till its inclusion or the heat death of the universe).

@hukkin
Owner

hukkin commented Jan 2, 2022

(in practice I think not too many people are affected if we include a write API)

FWIW, I think this is incorrect. By using toml name we'll be breaking literally every Python 3 user that uses toml.load regardless of whether write API is included or not. Tomli's load API only supports binary file objects. Nobody uses binary file objects with the toml library though. I know this because unfortunately they are broken in toml library (a TypeError is raised, this is a Python 3 specific bug, they work in Python 2). Packages that use loads not load will still work fine.
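
A minimal sketch of the load/loads difference described above, assuming Python 3.11+ (where the module ships as tomllib) or Tomli 2.x installed as a fallback. The sample document is made up:

```python
import io

try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # assumption: Tomli 2.x is installed

DOC = 'title = "example"\n'  # hypothetical sample document

# loads() takes a str, so callers using loads() are unaffected.
parsed_from_str = tomllib.loads(DOC)

# load() takes a binary file object.
parsed_from_bytes = tomllib.load(io.BytesIO(DOC.encode("utf-8")))

# A text-mode file object is rejected with a TypeError.
try:
    tomllib.load(io.StringIO(DOC))
    text_mode_error = None
except TypeError as exc:
    text_mode_error = exc

print(parsed_from_str, parsed_from_bytes, type(text_mode_error).__name__)
```

With a real file, the equivalent is opening it with open(path, "rb") rather than the text-mode default.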

@hukkin
Owner

hukkin commented Jan 2, 2022

We can collaborate on my fork of python/peps. If that feels inconvenient, we can also move to Google doc.

Yeah your fork works great! Would be great if you opened a PR to e.g. your own fork's main branch for better review experience.

@hauntsaninja
Collaborator

hauntsaninja commented Jan 2, 2022

hauntsaninja/peps#1 is a PR from toml-pep branch to main, for people to review.
https://github.com/hauntsaninja/peps/blob/toml-pep/pep-9999.rst continues to show a rendered version.

Oh oops, totally forgot about the load incompatibilities (I'm sure plenty of people also pass paths to toml.load). Updated the PEP draft in hauntsaninja/peps#2

@johnthagen

By using toml name we'll be breaking literally every Python 3 user that uses toml.load regardless of whether write API is included or not.

Note that @pradyunsg is hoping to get a maintainership transfer for toml, so if that happened, it would theoretically be possible to have a unified/non-breaking API under toml.

@hauntsaninja
Collaborator

hauntsaninja commented Jan 3, 2022

As mentioned in the Packaging discuss thread and the PEP draft, this would still break users who have pinned versions of the current toml and upgrade Python versions.

@encukou
Collaborator

encukou commented Jan 3, 2022

Just to be clear, I will not sponsor write API. In the stdlib there is only one chance to do things right, and there are too many open questions and degrees of freedom around TOML writing.

toml.dumps? is used 30% as much as toml.loads?

That is not a good metric, IMO. One of the reasons to use toml itself is write/roundtrip support.


For other questions my views aren't that strong.

Is the parse_float argument necessary? TOML floats are "IEEE 754 binary64 values", which is float (on all architectures that have reasonable support for binary64). Using Decimal to get extra precision sounds like an extension of the TOML format.

@hukkin
Owner

hukkin commented Jan 3, 2022

Is the parse_float argument necessary? TOML floats are "IEEE 754 binary64 values", which is float (on all architectures that have reasonable support for binary64). Using Decimal to get extra precision sounds like an extension of the TOML format.

I don't think it's absolutely necessary, and agree that it is a bit like an extension to TOML. I'm fine with not adding it to the stdlib.

But my anecdotal experience is that this is extremely valuable to anyone dealing with money, probably science too. The alternative is to use TOML strings for fractional numbers, which isn't that great.

When a user sees a TOML float, they don't think "oh that's an IEEE 754 binary64 value", they think "that's a fractional number". Many TOML-facing users are probably not developers, and especially not aware of what a double-precision float is, so I do know that this solves a real-world problem.
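
A small sketch of what parse_float changes in practice, assuming Python 3.11+ or Tomli 2.x as a fallback. The key name and value are made up for illustration:

```python
from decimal import Decimal

try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # assumption: Tomli 2.x is installed

DOC = "price = 0.1\n"  # hypothetical sample document

# Default behaviour: TOML floats become Python floats (binary64).
as_float = tomllib.loads(DOC)["price"]

# With parse_float, the callable receives the literal text of the number.
as_decimal = tomllib.loads(DOC, parse_float=Decimal)["price"]

# Binary floating point accumulates rounding error; Decimal does not here.
print(as_float * 3 == 0.3)
print(as_decimal * 3 == Decimal("0.3"))
```

The alternative mentioned above, storing the number as a TOML string and converting it by hand, gives the same precision but loses the numeric type in the document.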

@encukou
Collaborator

encukou commented Jan 3, 2022

That sounds reasonable.

But my anecdotal experience is that this is extremely valuable to anyone dealing with money, probably science too.

It would be great to put quotes from such users in the PEP, though.

@pradyunsg

pradyunsg commented Jan 3, 2022

I'm confused: why do people have compatibility concerns w.r.t. toml? The existing API is not too expansive and is a roughly 1:1 map to how json does this.

The only thing that's slightly off is that there's a few too many encoders in the package, which can be dropped and externalized before moving it into the standard library. That is potentially the only disruptive change, and I don't think that's too disruptive.

@gaborbernat

If the stdlib and a 3rd party lib share a name you're bound to get weird bugs when they go out of sync with each other 🤔

@pradyunsg

pradyunsg commented Jan 3, 2022

  1. They won't?
  2. There's precedent for this, as "backports"?

It'll basically be a similar story to how existing backport packages work? There's a PyPI package with the same name for existing standard library packages, like dataclasses, that can be used on older versions of Python that don't have it: https://pypi.org/project/dataclasses/

@layday
Contributor

layday commented Jan 3, 2022

uiri/toml predates the would-be stdlib toml module and if a package which depends on uiri/toml is not updated to handle the stdlib module, an import toml would import "toml" from the stdlib which would have an incompatible API.

@hauntsaninja
Collaborator

hauntsaninja commented Jan 4, 2022

Based on recent discussion, I've made several updates to the PEP draft.
Reviewable: hauntsaninja/peps#1
Rendered: https://github.com/hauntsaninja/peps/blob/toml-pep/pep-9999.rst

The PEP draft continues to suggest tomllib as the package name. I've updated things to help clarify that the issue is breaking users with pinned versions of uiri/toml. I've also tried to describe the differences between uiri/toml and the proposed API in a new Appendix A (mainly based off of a post from @hukkin in another thread).

Just to be clear, I will not sponsor write API.

@encukou I've removed the proposal for adding a write API in the latest draft. Since your chief reservations seem to be around degrees of freedom, I've tried to capture some of the possible design space questions in Appendix B. Please let me know if there's anything important I'm missing.

That is not a good metric, IMO. One of the reasons to use toml itself is write/roundtrip support.

I tried a couple search terms and that number was representative. The latest PEP draft compares number of files containing "toml.load" + "tomli.load" to "toml.dump" + "tomli_w.dump" and finds a similar ratio (~1300 to ~400).
toml has many more hits on https://grep.app/ than tomli + tomli_w (but also many more reverse dependencies). So the combined number is still mostly talking about toml use. I will note that toml doesn't really have roundtrip support.
"tomli_w.dump" to "tomli.load" is about 10-20%: "tomli_w.dump" is only found in 21 files, 9 of which are the tomli-w repo itself.
Let me know if there's a better comparison I can make (or if the comparison isn't informative and should be removed).

Is the parse_float argument necessary?

This is a good question. For now, I've left it in the proposed API, but added a discussion section. I added a TODO in case @hukkin has good examples of user quotes to include, as suggested by encukou.

@gaborbernat

@hauntsaninja thanks a lot for taking the effort of writing this up 👏 Can you please open a PR to https://github.com/python/peps? 😊 We can get a PEP number on the draft version you have, and keep polishing it with follow-up PRs 😊 I think the current format is thorough enough to give a good understanding of what we want to achieve, and it would allow us to see it rendered on https://www.python.org/dev/peps for a better reading experience. (PS. I think for the first round let's not add a write API; we can always follow it up later with another PEP to extend it, and I imagine the read-only API will be less contentious. That way Petr Viktorin can be the sponsor of the PR.)

@encukou
Collaborator

encukou commented Jan 4, 2022

I'm quite busy today, but I'll suggest some edits tomorrow.

@layday
Contributor

layday commented Jan 6, 2022

If we could retroactively mark uiri/toml releases as incompatible with python_version >= 3.y where y is the version that ships with toml in the stdlib, we could, potentially, safely use the "toml" name. As I understand it, that would mean making a post release for every published version of uiri/toml. pip would be unable to find a version which satisfies toml and error. Environments which are informally migrated to the new Python release would still break.

Of course, this would fly in the face of everything we've been saying about python_requires (that it should not be used for upper bounds).

@gaborbernat

gaborbernat commented Jan 6, 2022

If we could retroactively mark uiri/toml releases as incompatible with python_version >= 3.y where y is the version that ships with toml in the stdlib, we could, potentially, safely use the "toml" name.

And mark everything using the toml library on the new Python version broken (if we don't replicate the toml interface and functionality 100%)? Feels to me not worth it. I think the ship has sailed on using toml in the standard library. And I'm happy to sidestep the issue and go with tomllib instead.

@layday
Contributor

layday commented Jan 6, 2022

We could find direct dependents and help them transition - it all depends on how much time and effort we're willing to expend and if we think securing the "toml" name is worth it. uiri/toml is abandoned and, realistically, any project which is not able to switch to tomli or the stdlib backport in a year or two years' time, will itself have been abandoned.

@gaborbernat

We could find direct dependents and help them transition

What about all the enterprise projects that we don't have access to? Surely, these companies will not be happy if we generate countless hours of migration work for adopting a new Python version. In the enterprise world, projects are rarely abandoned in two years' time.

@encukou
Collaborator

encukou commented Jan 6, 2022

IMO, having a good fully-featured TOML library on PyPI as toml, and a limited tomllib in stdlib, is a good outcome to aim for.

I'd like to avoid repeating the situation where people were assuming @hynek's attrs was “legacy” after a small part of it was reimagined as dataclasses in stdlib. I'd like to make things very clear in documentation, but leaving the more desirable name to the community feels like a good thing.

@hukkin removed the "help wanted" (Extra attention is needed) label on Jan 7, 2022
@hauntsaninja
Collaborator

Draft PEP is now visible at https://www.python.org/dev/peps/pep-0680/

@hukkin
Owner

hukkin commented Jan 10, 2022

What's the next step? Should we open a new thread on https://discuss.python.org/c/peps/ ?

EDIT: We now have it https://discuss.python.org/t/pep-680-tomllib-support-for-parsing-toml-in-the-standard-library/13040 !

@gaborbernat

gaborbernat commented Jan 10, 2022 via email

@encukou
Collaborator

encukou commented Jan 26, 2022

What's the status? Is the PEP ready, or is there still something to hash out?

@hukkin
Owner

hukkin commented Jan 26, 2022

The PEP seems ready to me.

The only thing I have any concerns about is parse_float. @encukou pretty much already had me convinced we should remove it. But then Paul Moore spotted that the spec uses "should" rather than "must", meaning that double precision float is more a recommendation than a hard requirement.

So now I think it may still be a justified feature. I made a PR to make the PEP acknowledge the use of the word "should": python/peps#2278

@hauntsaninja
Collaborator

Yup, agreed that PEP seems ready.

My summary of the discuss thread:

@hukkin
Owner

hukkin commented Jan 26, 2022

Thanks @hauntsaninja I think that's a good summary.

I'm curious, how do we proceed when we think the PEP is ready? My understanding is the PEP goes to steering council, but how long should the discuss thread be open before that? What deadlines (for PEP acceptance and merging to CPython) do we have for having a realistic chance of being included in Python 3.11?

@gaborbernat

gaborbernat commented Jan 26, 2022

Usually it's considered done if no feedback/discussion happened in the last week.

@encukou
Collaborator

encukou commented Jan 27, 2022

One more missing thing is a notice to the python-dev list, like https://mail.python.org/archives/list/python-dev@python.org/thread/G4F3ZMCJRWWRSF7O34Z7RPYQQK7QPGB6/
Do you want to send one? (Sorry for not noticing this earlier!)
Then, if you don't anticipate any more changes to the PEP, open the SC request.

The deadline for merging into 3.11 is the first beta, planned for May 6th.

@hukkin
Owner

hukkin commented Jan 27, 2022

Thanks @encukou, let's get the python-dev notice out of our way! And it seems we have plenty of time seeing how far we are already (for the curious, we already have a draft PR to CPython).

Seeing that I chose to not add my email address to the PEP, and that an email to python-dev will inevitably be archived and publicized, would @hauntsaninja be kind enough to send the email? I've drafted one already so you can copy-paste (or feel free to edit):

To:

python-dev@python.org

Subject:

RFC on PEP 680 -- tomllib: Support for Parsing TOML in the Standard Library

Body:

This PEP [1] proposes adding the tomllib module to the standard library for parsing TOML (Tom's Obvious Minimal Language, https://toml.io). The main motivation is to address bootstrapping problems with pyproject.toml based builds. The proposed implementation is the Tomli [2] package.

Since feedback has been positive and there is no obvious pushback in the Discuss thread [3], we wanted to get your comments and suggestions before submitting to the Steering Council.

Thanks,
Taneli Hukkinen
Shantanu Jain

[1] https://www.python.org/dev/peps/pep-0680/
[2] https://github.com/hukkin/tomli
[3] https://discuss.python.org/t/13040

@hukkin
Owner

hukkin commented Feb 8, 2022

I've submitted the PEP for steering council consideration: python/steering-council#104

@hukkin
Owner

hukkin commented Feb 22, 2022

A status update: It seems Python 3.11 will be able to parse TOML! 🎉

SC has accepted the PEP.

I've opened a PR to CPython.
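
For projects that need to support both older and newer Python versions, one possible import pattern is sketched below. It assumes the tomli backport is installed on Python versions before 3.11; the pyproject.toml content is a made-up example:

```python
import sys

if sys.version_info >= (3, 11):
    import tomllib
else:
    import tomli as tomllib  # assumption: the tomli backport is installed

# Hypothetical pyproject.toml content, inlined to keep the sketch self-contained.
PYPROJECT = """\
[build-system]
requires = ["flit_core>=3.2,<4"]
build-backend = "flit_core.buildapi"
"""

config = tomllib.loads(PYPROJECT)
build_backend = config["build-system"]["build-backend"]
requires = config["build-system"]["requires"]

print(build_backend, requires)
```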

@hukkin
Owner

hukkin commented Feb 23, 2022

I'm closing this issue seeing that the decision has been made, and Tomli will be in the standard library.

A massive thank you to everyone in this thread for the help, reviews, comments, support etc.! And an especially huge thank you to @hauntsaninja for writing the PEP draft.

9 participants