From e684d318ef8ebf0a4bce600cfeae887036d89299 Mon Sep 17 00:00:00 2001 From: hauntsaninja <> Date: Sun, 2 Jan 2022 13:42:05 -0600 Subject: [PATCH 01/23] PEP ???: TOML --- pep-9999.rst | 438 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 438 insertions(+) create mode 100644 pep-9999.rst diff --git a/pep-9999.rst b/pep-9999.rst new file mode 100644 index 00000000000..af64b1b9f5c --- /dev/null +++ b/pep-9999.rst @@ -0,0 +1,438 @@ +PEP: 9999 +Title: Support for TOML in the Standard Library +Author: Taneli Hukkinen, Shantanu Jain +Sponsor: TODO +PEP-Delegate: TODO +Discussions-To: TODO +Status: Draft +Type: Standards Track +Content-Type: text/x-rst +Created: 01-Jan-2022 +Python-Version: 3.11 +Post-History: 1900-01-01 + + +Abstract +======== + +This proposes adding a module, ``tomllib``, to the standard library for parsing +and writing TOML. [1]_ + + +Motivation +========== + +The TOML format is the format of choice for Python packaging, as evidenced by +:pep:`517`, :pep:`518` and :pep:`621`. Including TOML support in the standard +library helps avoid bootstrapping problems for Python build tools. Currently +most Python build tools need to vendor a TOML parsing library. + +Python tools are increasingly configurable via TOML, for examples: ``black``, +``mypy``, ``pytest``, ``tox``, ``pylint``, ``isort``. Those that are not, such +as ``flake8``, cite the lack of standard library support as a `main reason why +`_. + +Given the special place TOML already has in the Python ecosystem, it makes sense +for this to be an included battery. + +Finally, TOML as a format is increasingly popular (some reasons for this are +outlined in PEP 518). Hence this is likely to be a generally useful addition, +even looking beyond the needs of Python packaging and Python tooling: various +Python TOML libraries have about 2000 reverse dependencies on PyPI. For +comparison, ``requests`` has about 28k reverse dependencies. 
+ + +Rationale +========= + +This PEP proposes basing the standard library support for TOML on the third party +libraries ``tomli`` [2]_ and ``tomli-w`` [3]_. + +Many projects have recently switched to using ``tomli``, for example, ``pip``, +``build``, ``pytest``, ``mypy``, ``black``, ``flit``, ``coverage``, +``setuptools-scm``, ``cibuildwheel``. + +These libraries are actively maintained and well-tested. ``tomli`` is about 800 +lines of code with 100% test coverage. ``tomli-w`` is about 200 lines of code with +100% test coverage. + + +Specification +============= + +Read API + +.. code-block:: + + def load(fp: SupportsRead[bytes], /, *, parse_float: Callable[[str], Any] = float) -> dict[str, Any]: ... + def loads(s: str, /, *, parse_float: Callable[[str], Any] = float) -> dict[str, Any]: ... + +``tomllib.load`` deserializes a ``.read()``-supporting binary file containing a +TOML document to a Python object. + +``tomllib.loads`` deserializes a str instance containing a TOML document to a +Python object. + +``parse_float`` is a function that takes a string and returns a float, as with ``json.load``. +For example, ``decimal.Decimal`` in cases where precision is important. + +``tomllib.TOMLDecodeError`` is raised in the case of invalid TOML. + +Write API + +.. code-block:: + + def dump(obj: Mapping[str, Any], fp: SupportsWrite[bytes], /, *, multiline_strings: bool = False) -> None: ... + def dumps(obj: Mapping[str, Any], /, *, multiline_strings: bool = False) -> str: ... + + +``tomllib.dump`` serializes ``obj`` as a TOML formatted stream to a +``.write()``-supporting file-like object. + +``tomllib.dumps`` serializes ``obj`` to a TOML formatted str. + + +``multiline_strings`` controls whether strings containing newlines are written +as multiline strings. This defaults to False in case users wish to ensure +preservation of newline byte sequences. 
+ +TODO: describe types supported + + +Maintenance Implications +======================== + +Stability of TOML +----------------- + +The release of TOML v1 in January 2021 indicates stability. Empirically, TOML +has proven to be a stable format even prior to the release of TOML v1. From the +`changelog `_, we +see TOML has had no major changes since April 2020 and has had two releases in +the last five years. + +In the event of changes to the TOML specification, we could treat minor +revisions as bugfixes and update the implementation in place. In the event of +major breaking changes, we should preserve support for TOML v1. + +Maintainability of proposed implementation +------------------------------------------ + +The proposed implementation (``tomli`` and ``tomli-w``) is in pure Python, well +tested and combined weigh under 1000 lines of code. They are both minimalistic, +offering a smaller API surface area than other TOML implementations. + +The author of ``tomli`` is willing to help integrate ``tomli`` into the standard +library and help maintain it, `as per this +`_. + +There is unlikely to be demand for an extension module, since there is +relatively less need for performance in parsing TOML: it's rare for application +bottleneck to be reading configuration. Users with extreme performance needs can +use a third party library (as is already often the case with JSON, despite a +stdlib extension module). + +TOML support a slippery slope for other things +---------------------------------------------- + +As discussed in motivations, TOML holds a special place in the Python ecosystem. +This chief reason to include TOML in the standard library does not apply to +other formats, such as YAML or MessagePack. + +In addition, the simplicity of TOML can help serve as a dividing line, for +example, YAML is large and complicated. + + +Backwards Compatibility +======================= + +This will have no backwards compatibility issues as it will create a new API. 
+ +Note that a current open issue is whether to use the ``toml`` name for the +package instead of ``tomllib``, in which case there will be backwards +compatibility implications for users who have pinned versions of the current +``toml`` PyPI package. + + +Security Implications +===================== + +Errors in the implementation could cause potential security issues. However, the +implementation will be in pure Python, which reduces surface area of attack. + + +How to Teach This +================= + +The API of ``tomllib`` mimics that of other well-established file format libraries, +such as ``json`` and ``pickle``. + + +Reference Implementation +======================== + +Link to any existing implementation and details about its state, e.g. proof-of-concept. + +https://github.com/hukkin/tomli + +https://github.com/hukkin/tomli-w + + +Rejected Ideas +============== + +Roundtripping style +------------------- + +In general, ``tomllib.dumps(tomllib.loads(x))`` may not equal ``x``, since we +make no effort to preserve comments, whitespace or other stylistic choices. + +Style preservation would allow tools to losslessly edit TOML files. Since TOML +is intended as human-readable and human-editable configuration, it's important +to preserve human markup. + +However, only a relatively small fraction of use cases require losslessly +editing TOML, as judged by reverse dependencies the style preserving ``tomlkit`` +library compared to that of other third party toml libraries. In particular, we +don't need it for the core Python packaging use cases or for tools that merely +need to read configuration. + +Since this would make both the implementation and the API more complex, it seems +better to relegate this additional functionality to third party libraries. + +Basing on another TOML implementation +------------------------------------- + +Potential alternatives include: + +* ``tomlkit``. + ``tomlkit`` is well established, actively maintained and supports TOML v1. 
+ An important difference is that ``tomlkit`` supports style roundtripping. As a + result, it has a more complex API and implementation (about 5x as much code as + ``tomli``). The author does not believe that ``tomlkit`` is a good choice for + the standard library. + +* ``toml``. + ``toml`` is a widely used library. However, it is not actively maintained and + does not support TOML v1. Its API is more complex than that of ``tomli``. + It has some very limited ability and mostly unused ability to preserve style + through an undocumented decoder API. It has the ability to customise output + style through a complicated encoder API. + For more details on API differences, refer to this `discuss thread + `_. + +* ``pytomlpp``. + ``pytomlpp`` is a Python wrapper for the C++ project ``toml++``. Pure Python + libraries are easier to maintain than extension modules. + +* ``rtoml``. + ``rtoml`` is a Python wrapper for the Rust project ``toml-rs`` and hence has + similar shortcomings to ``pytomlpp``. In addition, it does not support TOML v1. + +* Writing from scratch. + It's unclear what we would get from this: ``tomli`` meets our needs and the + author is willing to help with its inclusion in the standard library. + +Only including an API for reading TOML +-------------------------------------- + +There are several reasons to not include an API for writing TOML: + +The ability to write TOML is not needed for the use cases that motivate this +PEP: for core Python packaging use cases or for tools that need to read +configuration. + +As discussed in the previous section, use cases that involve editing TOML (as +opposed to writing brand new TOML) are better served by a style preserving +library. + +Values in TOML can be represented in multiple ways. 
To the extent that users +want control over how the output TOML ends up being formatted (how to format +strings, when to inline arrays or tables, how much to indent, whether to reorder +contents, etc), they will not be served well by the proposed API. + +The standard library does not need to do everything and if we feel that most +users are better served by more powerful third party write APIs, exclusion is +acceptable (and could be revisited later). + +However, users will likely expect a write API to be available for consistency. +Empirically, writing TOML seems useful, e.g. ``toml.dump`` is used about 30% as +often as ``toml.load`` based on https://grep.app + +Even a simple API is capable of serving common use cases, such as testing code +that loads TOML or writing simple or boilerplate TOML. +TODO: about 1/5 uses of ``toml.dump[s]`` are in tests, estimate other simple use cases + +If we keep feature set narrow, a write API shouldn't be too much additional +burden. The proposed implementation is about 200 lines of code. + +Finally, an open issue is whether we're able to re-use the ``toml`` package name. +If so, having a basic write API will minimise disruption for most affected +users. + + +Assorted API details +-------------------- + +Controlling the type of mappings returned by ``tomllib.load[s]`` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +This would work similarly to common uses for the ``object_hook`` argument in +``json.load[s]``. + +Such an argument is not necessary for the core use cases outlined in the +motivation section. The absence of this can be pretty easily worked around using +a wrapper class or transformer function. Finally, support could be added later +in a backward compatible way. + +The ``toml`` library on PyPI supports this feature using the ``_dict`` argument. 
We were +able to find several uses of this on https://grep.app, however, almost all of +them were passing ``_dict=OrderedDict`` which should no longer be necessary +since Python 3.7. There were two instances of legitimate use: in one case, a +custom class was passed for friendlier KeyErrors, in another case, several +lookup and mutation methods to the custom class (e.g. to help resolve dotted +keys). + +Types accepted by the first argument of ``tomllib.load`` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The ``toml`` library on PyPI allows passing path-like objects (and lists of +path-like objects, reading the first path that exists). Doing this would be +inconsistent with ``json.load``, ``pickle.load``, etc. If we agree consistency +with other stdlib modules is desirable, this is somewhat out of scope for this +PEP. This can easily and perhaps more explicitly be worked around in user code. + +The proposed API takes a ``SupportsRead[bytes]``, while ``toml.load`` takes a +``SupportsRead[str]`` and ``json.load`` takes ``SupportsRead[str | bytes]``. +While slightly opinionated, this was a recent change in ``tomli`` v1.2 to a) +ensure utf-8 is the encoding used, b) avoid incorrectly parsing single carriage +returns as valid TOML due to universal newlines. + +Allowing users more control over formatting ``tomllib.dump[s]`` output +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +As mentioned, TOML values can be represented in multiple ways, so inevitably, +people will have strong opinions over how to do so. + +The ``toml`` library on PyPI supports this using custom subclasses of +``toml.TomlEncoder``. There are a handful of instances of this that can be found +on https://grep.app. However, the API to do this is not particularly clean. 
+ +A non-exhaustive list of potential options users may want control over: + +* How to format strings +* When to inline arrays or tables +* How much to indent +* Whether to reorder contents +* Whether to use dotted keys + +In several cases, users could enforce TOML formatting by using an autoformatter +of their choice at a later point. + +We acknowledge that supporting ``multiline_strings`` is something of an +exception to this, if controversial we can err on the side of simplicity and +remove it. + +Allowing users more control over ``tomllib.dump[s]`` serialisation +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +It could be useful to add the equivalent of the ``default`` argument in ``json.dump`` +to allow users to specify how custom types should be serialised. + +The ``toml`` library on PyPI supports this using custom subclasses of +``toml.TomlEncoder``. However, we could find two instances of using +``toml.TomlEncoder`` to accomplish this kind of thing on https://grep.app, one +of which was to add support to ``toml`` for dumping ``decimal.Decimal``. + +TOML is used more for configuration than serialisation of arbitrary data, so +users are perhaps less likely to require custom serialisation than with say +JSON. Support for this could be added in a backward compatible way. + +TODO: talk about output validation for ``dump[s]`` + +Open Issues +=========== + +Package name +------------ + +Ideally, we would be able to use the ``toml`` package name. The ``toml`` package +on PyPI is both widely used and not actively maintained. + +If the maintainer of ``toml`` resurfaced and was willing to give up the ``toml`` +name on PyPI, we could repurpose the PyPI package as a stdlib backport. This +would still potentially be breaking for users who have pinned current versions +of the ``toml`` package and have upgraded Python versions. However, based on +https://grep.app, relatively few users would be affected as long as we +include a basic write API. 
+ +This PEP proposes ``tomllib``. This mirrors ``plistlib`` (another file format +module in the standard library), as well as several others such as ``pathlib``, +``graphlib``, etc. + +Other bikesheds include: + +* ``tomlparser``. This mirrors ``configparser``, but is perhaps slightly less + appropriate if we include a write API. +* ``tomli``. This assumes we use ``tomli`` as the basis for implementation. +* ``toml``, but under some namespace, such as ``parser.toml`` or + ``decoder.toml``. However, this is sort of awkward, especially since existing + libraries like ``json``, ``pickle``, ``marshal``, ``html`` etc. will not be + included in the namespace. + +Only including an API for reading TOML +-------------------------------------- + +Currently discussed in rejected ideas but a major open issue. + + +TODO: Random things +=================== + +Previous discussion: + +* https://bugs.python.org/issue40059 +* https://mail.python.org/archives/list/python-ideas@python.org/thread/IWJ3I32A4TY6CIVQ6ONPEBPWP4TOV2V7/ +* https://mail.python.org/pipermail/python-dev/2019-May/157405.html +* https://github.com/hukkin/tomli/issues/141 +* https://discuss.python.org/t/adopting-recommending-a-toml-parser/4068/84 + +Useful https://grep.app searches (note, ignore vendored): + +* toml.load[s] usage https://grep.app/search?q=toml.load&filter[lang][0]=Python +* toml.dump[s] usage https://grep.app/search?q=toml.dump&filter[lang][0]=Python +* TomlEncoder subclasses https://grep.app/search?q=TomlEncoder%29%3A&filter[lang][0]=Python + + +References +========== + +.. [1] + TOML: Tom's Obvious Minimal Language + https://toml.io/en/ + +.. [2] + tomli + https://github.com/hukkin/tomli + +.. [3] + tomli-w + https://github.com/hukkin/tomli-w + + +Copyright +========= + +This document is placed in the public domain or under the +CC0-1.0-Universal license, whichever is more permissive. + + + +.. 
+ Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: From 17e4714e6c36aaba4542415966b9848dafe7dc5f Mon Sep 17 00:00:00 2001 From: hauntsaninja Date: Sun, 2 Jan 2022 17:02:59 -0600 Subject: [PATCH 02/23] Fix discussion of breaking `toml` users --- pep-9999.rst | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/pep-9999.rst b/pep-9999.rst index af64b1b9f5c..667c99bc211 100644 --- a/pep-9999.rst +++ b/pep-9999.rst @@ -218,7 +218,7 @@ Potential alternatives include: It has some very limited ability and mostly unused ability to preserve style through an undocumented decoder API. It has the ability to customise output style through a complicated encoder API. - For more details on API differences, refer to this `discuss thread + For more details on API differences, refer to this `discuss post `_. * ``pytomlpp``. @@ -267,8 +267,7 @@ If we keep feature set narrow, a write API shouldn't be too much additional burden. The proposed implementation is about 200 lines of code. Finally, an open issue is whether we're able to re-use the ``toml`` package name. -If so, having a basic write API will minimise disruption for most affected -users. +If so, having a basic write API could reduce disruption for affected users. Assorted API details @@ -304,7 +303,7 @@ PEP. This can easily and perhaps more explicitly be worked around in user code. The proposed API takes a ``SupportsRead[bytes]``, while ``toml.load`` takes a ``SupportsRead[str]`` and ``json.load`` takes ``SupportsRead[str | bytes]``. -While slightly opinionated, this was a recent change in ``tomli`` v1.2 to a) +While slightly opinionated, this was changed in ``tomli`` v1.2 to a) ensure utf-8 is the encoding used, b) avoid incorrectly parsing single carriage returns as valid TOML due to universal newlines. @@ -360,11 +359,14 @@ Ideally, we would be able to use the ``toml`` package name. 
The ``toml`` package on PyPI is both widely used and not actively maintained. If the maintainer of ``toml`` resurfaced and was willing to give up the ``toml`` -name on PyPI, we could repurpose the PyPI package as a stdlib backport. This -would still potentially be breaking for users who have pinned current versions -of the ``toml`` package and have upgraded Python versions. However, based on -https://grep.app, relatively few users would be affected as long as we -include a basic write API. +name on PyPI, we could repurpose the PyPI package as a stdlib backport. However, +this would still be breaking for users who have pinned current versions of the +``toml`` package and have upgraded Python versions. The two API +incompatibilities that most users of current ``toml`` would run into are a) +different acceptable types to the first argument of ``toml.load``, b) use of +``dump[s]`` if we choose not to include a write API. +For more details on API differences, refer to this `discuss post +`_. This PEP proposes ``tomllib``. This mirrors ``plistlib`` (another file format module in the standard library), as well as several others such as ``pathlib``, From ca4c15c803342fcc20abd5d42a63479cb0b74150 Mon Sep 17 00:00:00 2001 From: hauntsaninja <> Date: Sun, 2 Jan 2022 17:35:33 -0600 Subject: [PATCH 03/23] Fix typo --- pep-9999.rst | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/pep-9999.rst b/pep-9999.rst index 667c99bc211..fdfaaf2dd72 100644 --- a/pep-9999.rst +++ b/pep-9999.rst @@ -284,12 +284,12 @@ motivation section. The absence of this can be pretty easily worked around using a wrapper class or transformer function. Finally, support could be added later in a backward compatible way. -The ``toml`` library on PyPI supports this feature using the ``_dict`` argument. We were -able to find several uses of this on https://grep.app, however, almost all of -them were passing ``_dict=OrderedDict`` which should no longer be necessary -since Python 3.7. 
There were two instances of legitimate use: in one case, a -custom class was passed for friendlier KeyErrors, in another case, several -lookup and mutation methods to the custom class (e.g. to help resolve dotted +The ``toml`` library on PyPI supports this feature using the ``_dict`` argument. +There are several uses of this on https://grep.app, however, almost all of them +were passing ``_dict=OrderedDict``, which should no longer be necessary post +Python 3.7. There were two instances of legitimate use: in one case, a custom +class was passed for friendlier KeyErrors, in another case, the custom class had +several additional lookup and mutation methods (e.g. to help resolve dotted keys). Types accepted by the first argument of ``tomllib.load`` From 961cb7899e03a4ce1320a85d05f4e1a7feed4e5c Mon Sep 17 00:00:00 2001 From: hauntsaninja <> Date: Sun, 2 Jan 2022 17:35:41 -0600 Subject: [PATCH 04/23] `decoder` is not available --- pep-9999.rst | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/pep-9999.rst b/pep-9999.rst index fdfaaf2dd72..d9a03b0ff10 100644 --- a/pep-9999.rst +++ b/pep-9999.rst @@ -377,10 +377,9 @@ Other bikesheds include: * ``tomlparser``. This mirrors ``configparser``, but is perhaps slightly less appropriate if we include a write API. * ``tomli``. This assumes we use ``tomli`` as the basis for implementation. -* ``toml``, but under some namespace, such as ``parser.toml`` or - ``decoder.toml``. However, this is sort of awkward, especially since existing - libraries like ``json``, ``pickle``, ``marshal``, ``html`` etc. will not be - included in the namespace. +* ``toml``, but under some namespace, such as ``parser.toml``. However, this is + sort of awkward, especially since existing libraries like ``json``, ``pickle``, + ``marshal``, ``html`` etc. will not be included in the namespace. 
Only including an API for reading TOML -------------------------------------- From 956d3bb43f0b8822ad64ad02f46f3b8b51a2f0a0 Mon Sep 17 00:00:00 2001 From: hauntsaninja <> Date: Sun, 2 Jan 2022 19:04:04 -0600 Subject: [PATCH 05/23] Remove opinionated use of "opinionated" --- pep-9999.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/pep-9999.rst b/pep-9999.rst index d9a03b0ff10..0e053d912e8 100644 --- a/pep-9999.rst +++ b/pep-9999.rst @@ -303,9 +303,9 @@ PEP. This can easily and perhaps more explicitly be worked around in user code. The proposed API takes a ``SupportsRead[bytes]``, while ``toml.load`` takes a ``SupportsRead[str]`` and ``json.load`` takes ``SupportsRead[str | bytes]``. -While slightly opinionated, this was changed in ``tomli`` v1.2 to a) -ensure utf-8 is the encoding used, b) avoid incorrectly parsing single carriage -returns as valid TOML due to universal newlines. +Using ``SupportsRead[bytes]`` allows us to a) ensure utf-8 is the encoding used, +b) avoid incorrectly parsing single carriage returns as valid TOML due to +universal newlines. Allowing users more control over formatting ``tomllib.dump[s]`` output ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ From 7dc32cdfe1486028e5a234baa4df083cd3c94ee6 Mon Sep 17 00:00:00 2001 From: hauntsaninja <> Date: Sun, 2 Jan 2022 19:06:41 -0600 Subject: [PATCH 06/23] Call out TOMLDecodeError in major incompatibilities --- pep-9999.rst | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/pep-9999.rst b/pep-9999.rst index 0e053d912e8..2851a52776c 100644 --- a/pep-9999.rst +++ b/pep-9999.rst @@ -361,12 +361,14 @@ on PyPI is both widely used and not actively maintained. If the maintainer of ``toml`` resurfaced and was willing to give up the ``toml`` name on PyPI, we could repurpose the PyPI package as a stdlib backport. 
However, this would still be breaking for users who have pinned current versions of the -``toml`` package and have upgraded Python versions. The two API -incompatibilities that most users of current ``toml`` would run into are a) -different acceptable types to the first argument of ``toml.load``, b) use of -``dump[s]`` if we choose not to include a write API. -For more details on API differences, refer to this `discuss post -`_. +``toml`` package and have upgraded Python versions. + +The two API incompatibilities that most users of current ``toml`` would run into +are a) different acceptable types to the first argument of ``toml.load``, b) use +of ``dump[s]`` if we choose not to include a write API, c) +``toml.TomlDecodeError`` vs the PEP 8 compliant ``toml.TOMLDecodeError``. There +are other, comparatively minor API differences; if interested, refer to this +`discuss post `_. This PEP proposes ``tomllib``. This mirrors ``plistlib`` (another file format module in the standard library), as well as several others such as ``pathlib``, From d4626ae1bbec3a1f7f146a009aec519fcb11dcf6 Mon Sep 17 00:00:00 2001 From: hauntsaninja <> Date: Sun, 2 Jan 2022 19:08:03 -0600 Subject: [PATCH 07/23] Fix typo, clarify "comparatively minor" --- pep-9999.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/pep-9999.rst b/pep-9999.rst index 2851a52776c..075b7e412ed 100644 --- a/pep-9999.rst +++ b/pep-9999.rst @@ -363,11 +363,11 @@ name on PyPI, we could repurpose the PyPI package as a stdlib backport. However, this would still be breaking for users who have pinned current versions of the ``toml`` package and have upgraded Python versions. 
-The two API incompatibilities that most users of current ``toml`` would run into +The three API incompatibilities that most users of current ``toml`` would run into are a) different acceptable types to the first argument of ``toml.load``, b) use of ``dump[s]`` if we choose not to include a write API, c) ``toml.TomlDecodeError`` vs the PEP 8 compliant ``toml.TOMLDecodeError``. There -are other, comparatively minor API differences; if interested, refer to this +are other minor or less widely used API differences; if interested, refer to this `discuss post `_. This PEP proposes ``tomllib``. This mirrors ``plistlib`` (another file format From 5605f824220c24dcaa76998f17442c784845f83b Mon Sep 17 00:00:00 2001 From: hauntsaninja <> Date: Mon, 3 Jan 2022 18:26:28 -0600 Subject: [PATCH 08/23] Fix section headings --- pep-9999.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/pep-9999.rst b/pep-9999.rst index 075b7e412ed..a2f994a0a7b 100644 --- a/pep-9999.rst +++ b/pep-9999.rst @@ -293,7 +293,7 @@ several additional lookup and mutation methods (e.g. to help resolve dotted keys). Types accepted by the first argument of ``tomllib.load`` -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The ``toml`` library on PyPI allows passing path-like objects (and lists of path-like objects, reading the first path that exists). Doing this would be @@ -333,7 +333,7 @@ exception to this, if controversial we can err on the side of simplicity and remove it. Allowing users more control over ``tomllib.dump[s]`` serialisation -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ It could be useful to add the equivalent of the ``default`` argument in ``json.dump`` to allow users to specify how custom types should be serialised. 
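For reference, the ``default`` hook mentioned here behaves as follows in ``json.dumps``. This is only a sketch of the existing ``json`` behaviour; an equivalent argument for ``tomllib.dump[s]`` is hypothetical and not part of the proposal, and ``encode_extra`` is a name invented for the example.

```python
import json
from decimal import Decimal


def encode_extra(value):
    """Called by json only for objects it cannot serialise natively."""
    if isinstance(value, Decimal):
        # Emit the exact decimal text as a JSON string.
        return str(value)
    raise TypeError(f"cannot serialise {type(value).__name__}")


text = json.dumps({"price": Decimal("1.10")}, default=encode_extra)
```

Without the hook, the same call raises ``TypeError``, which is the situation a user of a write API without such an argument would have to handle by converting values beforehand.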
From fc8a3ed9220a2b519829a9ede6a9a45678871d0a Mon Sep 17 00:00:00 2001 From: hauntsaninja <> Date: Mon, 3 Jan 2022 19:42:48 -0600 Subject: [PATCH 09/23] Mention encukou in maintainability --- pep-9999.rst | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/pep-9999.rst b/pep-9999.rst index a2f994a0a7b..9869b89693c 100644 --- a/pep-9999.rst +++ b/pep-9999.rst @@ -124,7 +124,10 @@ offering a smaller API surface area than other TOML implementations. The author of ``tomli`` is willing to help integrate ``tomli`` into the standard library and help maintain it, `as per this -`_. +`__. At least +one CPython core dev has indicated potential willingness to maintain it, +`as per this +`__. There is unlikely to be demand for an extension module, since there is relatively less need for performance in parsing TOML: it's rare for application From aeddda019baed3649efcec316db99c3420bb9090 Mon Sep 17 00:00:00 2001 From: hauntsaninja <> Date: Mon, 3 Jan 2022 20:52:03 -0600 Subject: [PATCH 10/23] More details on using the name "toml" and the toml API --- pep-9999.rst | 206 ++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 154 insertions(+), 52 deletions(-) diff --git a/pep-9999.rst b/pep-9999.rst index 9869b89693c..05902a8a914 100644 --- a/pep-9999.rst +++ b/pep-9999.rst @@ -151,10 +151,9 @@ Backwards Compatibility This will have no backwards compatibility issues as it will create a new API. -Note that a current open issue is whether to use the ``toml`` name for the -package instead of ``tomllib``, in which case there will be backwards +Note that we avoid using the ``toml`` name for the module, to avoid backwards compatibility implications for users who have pinned versions of the current -``toml`` PyPI package. +``toml`` PyPI package. For more details, see ``_. Security Implications @@ -216,13 +215,12 @@ Potential alternatives include: the standard library. * ``toml``. - ``toml`` is a widely used library. 
However, it is not actively maintained and - does not support TOML v1. Its API is more complex than that of ``tomli``. - It has some very limited ability and mostly unused ability to preserve style - through an undocumented decoder API. It has the ability to customise output - style through a complicated encoder API. - For more details on API differences, refer to this `discuss post - `_. + ``toml`` is a widely used library. However, it is not actively maintained, + does not support TOML v1 and has several known bugs. Its API is more complex + than that of ``tomli``. It has some very limited and mostly unused ability to + preserve style through an undocumented decoder API. It has the ability to + customise output style through a complicated encoder API. For more details on + API differences, refer to `Appendix A`_. * ``pytomlpp``. ``pytomlpp`` is a Python wrapper for the C++ project ``toml++``. Pure Python @@ -269,13 +267,25 @@ TODO: about 1/5 uses of ``toml.dump[s]`` are in tests, estimate other simple use If we keep feature set narrow, a write API shouldn't be too much additional burden. The proposed implementation is about 200 lines of code. -Finally, an open issue is whether we're able to re-use the ``toml`` package name. -If so, having a basic write API could reduce disruption for affected users. - Assorted API details -------------------- +Types accepted by the first argument of ``tomllib.load`` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The ``toml`` library on PyPI allows passing paths (and lists of path-like +objects, reading the first path that exists). Doing this would be inconsistent +with ``json.load``, ``pickle.load``, etc. If we agree consistency with other +stdlib modules is desirable, allowing paths is somewhat out of scope for this +PEP. This can easily and more explicitly be worked around in user code. 
+ +The proposed API takes a ``SupportsRead[bytes]``, while ``toml.load`` takes a +``SupportsRead[str]`` and ``json.load`` takes ``SupportsRead[str | bytes]``. +Using ``SupportsRead[bytes]`` allows us to a) ensure utf-8 is the encoding used, +b) avoid incorrectly parsing single carriage returns as valid TOML due to +universal newlines. + Controlling the type of mappings returned by ``tomllib.load[s]`` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -295,21 +305,6 @@ class was passed for friendlier KeyErrors, in another case, the custom class had several additional lookup and mutation methods (e.g. to help resolve dotted keys). -Types accepted by the first argument of ``tomllib.load`` -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -The ``toml`` library on PyPI allows passing path-like objects (and lists of -path-like objects, reading the first path that exists). Doing this would be -inconsistent with ``json.load``, ``pickle.load``, etc. If we agree consistency -with other stdlib modules is desirable, this is somewhat out of scope for this -PEP. This can easily and perhaps more explicitly be worked around in user code. - -The proposed API takes a ``SupportsRead[bytes]``, while ``toml.load`` takes a -``SupportsRead[str]`` and ``json.load`` takes ``SupportsRead[str | bytes]``. -Using ``SupportsRead[bytes]`` allows us to a) ensure utf-8 is the encoding used, -b) avoid incorrectly parsing single carriage returns as valid TOML due to -universal newlines. - Allowing users more control over formatting ``tomllib.dump[s]`` output ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -352,26 +347,39 @@ JSON. Support for this could be added in a backward compatible way. TODO: talk about output validation for ``dump[s]`` -Open Issues -=========== +Alternative names for module +---------------------------- + +Ideally, we would be able to use the ``toml`` module name. 
+ +However, the ``toml`` package on PyPI is widely used, so there are backward +compatibility concerns. Since the standard library takes precedence over third +party packages, users who have pinned versions of ``toml`` would be broken when +upgrading Python versions by any API incompatibilities. -Package name ------------- +Note the importance of "pinned". That is, even if we were able to get control +over the ``toml`` PyPI package and repurpose it as a standard library backport, +we would still break users with pinned packages. This is especially unfortunate, +since pinning is a common response to breaking changes. -Ideally, we would be able to use the ``toml`` package name. The ``toml`` package -on PyPI is both widely used and not actively maintained. +There are several API incompatibilities between ``toml`` and the API proposed in +this PEP. Here are the differences that a significant fraction of users are +likely to run into: -If the maintainer of ``toml`` resurfaced and was willing to give up the ``toml`` -name on PyPI, we could repurpose the PyPI package as a stdlib backport. However, -this would still be breaking for users who have pinned current versions of the -``toml`` package and have upgraded Python versions. +* Use of ``toml.dump`` and ``toml.dumps``, since this PEP proposes to not + include an API for writing TOML. +* ``toml.load`` accepts a non-overlapping set of types from the proposed API for + ``tomllib.load``. See `here `_ for the rationale. +* For invalid TOML, ``toml`` raises ``toml.TomlDecodeError`` vs the proposed + :pep:`8` compliant ``tomllib.TOMLDecodeError``. -The three API incompatibilities that most users of current ``toml`` would run into -are a) different acceptable types to the first argument of ``toml.load``, b) use -of ``dump[s]`` if we choose not to include a write API, c) -``toml.TomlDecodeError`` vs the PEP 8 compliant ``toml.TOMLDecodeError``. 
There -are other minor or less widely used API differences; if interested, refer to this -`discuss post `_. +There are other minor or less widely used API differences. If interested, refer +to `Appendix A`_. + +Finally, the ``toml`` package on PyPI is not actively maintained and `we have +been unable to contact the author `, +so action here would likely have to be done without the author's consent. This PEP proposes ``tomllib``. This mirrors ``plistlib`` (another file format module in the standard library), as well as several others such as ``pathlib``, @@ -380,16 +388,11 @@ module in the standard library), as well as several others such as ``pathlib``, Other bikesheds include: * ``tomlparser``. This mirrors ``configparser``, but is perhaps slightly less - appropriate if we include a write API. + appropriate if we include a write API in the future. * ``tomli``. This assumes we use ``tomli`` as the basis for implementation. * ``toml``, but under some namespace, such as ``parser.toml``. However, this is - sort of awkward, especially since existing libraries like ``json``, ``pickle``, - ``marshal``, ``html`` etc. will not be included in the namespace. - -Only including an API for reading TOML --------------------------------------- - -Currently discussed in rejected ideas but a major open issue. + awkward, especially so since existing libraries like ``json``, ``pickle``, + ``marshal``, ``html`` etc. would not be included in the namespace. TODO: Random things @@ -426,6 +429,105 @@ References https://github.com/hukkin/tomli-w +.. _Appendix A: + +Appendix A: Differences between proposed API and ``toml`` +========================================================= + +This appendix covers the differences between the API proposed in this PEP and +that of the third party package ``toml``. 
These differences are relevant to +understanding the amount of breakage we could expect if we used the ``toml`` +name for the standard library module, as well as to better understand the design +space. Note that this list might not be exhaustive. + +#. This PEP currently proposes not to include a write API. That is, there will + be no equivalent of ``toml.dump`` or ``toml.dumps``. + + Discussed at TODO section link. + +#. Different first argument of ``toml.load`` + + ``toml.load`` has the following signature: + + .. code-block:: + + def load( + f: Union[SupportsRead[str], str, bytes, list[PathLike | str | bytes]], + _dict: Type[MutableMapping[str, Any]] = ..., + decoder: TomlDecoder = ..., + ) -> MutableMapping[str, Any]: ... + + This is pretty different from the first argument proposed in this PEP: ``SupportsRead[bytes]``. + + Recapping the reasons for this, previously mentioned at + ``_: + + * Allowing passing of paths (and lists of path-like objects, reading the first + path that exists) is inconsistent with other similar functions in the standard + library. + * Using ``SupportsRead[bytes]`` allows us to a) ensure utf-8 is the encoding used, + b) avoid incorrectly parsing single carriage returns as valid TOML due to + universal newlines. TOML specifies file encoding and valid newline + sequences, and hence is simply stricter format than what text file objects + represent. + +#. ``toml.load[s]`` accepts a ``_dict`` argument + + Discussed at ``_. + + As discussed, almost all usage consists of ``_dict=OrderedDict``, which is + not necessary in Python 3.7 and later. + +#. ``toml.load[s]`` support an undocumented ``decoder`` argument + + It seems the intended use case is for an implementation of comment + preservation. The information recorded is not sufficient to roundtrip the + TOML document preserving style, the implementation has known bugs, the + feature is undocumented and I could only find one instance of its use on + https://grep.app. 
+ + The ``toml.TomlDecoder`` interface exposed is not simple, containing nine methods. + See `here `__. + + Users are probably better served by a more complete implementation of style + preserving parsing and writing. + +#. ``toml.dump[s]`` support an ``encoder`` argument + + Note that we currently propose not to include a write API, however if that + were to change, these differences would likely become relevant. + + This enables two use cases, a) control over how custom types should be + serialised, b) control over how output should be formatted. + + The first use case is reasonable, however, I could only find two instances of + this on https://grep.app. One of these two instances used this ability to add + support for dumping ``decimal.Decimal`` (which a potential standard library + implementation would support out of the box). + + If needed, this use case could be well served by the equivalent of the + ``default`` argument in ``json.dump``. + + The second use case is enabled by allowing users to specify subclasses of + ``toml.TomlEncoder`` and overriding methods to specify parts of the TOML + writing process. The API consists of five methods and exposes a lot of + implementation detail. See `here `__. + + There is some usage of the ``encoder`` API on https://grep.app, however, it + likely accounts for a tiny fraction of overall usage of ``toml``. + +#. Timezones + + ``toml`` uses and exposes custom ``toml.tz.TomlTz`` timezone objects. The + proposed implementation uses ``datetime.timezone`` objects from the standard + library. + +#. Errors + + ``toml`` raises ``TomlDecodeError`` vs the proposed PEP 8 compliant + ``TOMLDecodeError``. 
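To make the naming difference in the last item concrete, here is a minimal sketch of handling invalid input with the proposed exception. ``parse_or_none`` is a hypothetical helper and not part of the proposal. The sketch assumes ``tomllib`` is available as specified in this PEP, and otherwise falls back to the third-party ``tomli`` package.

```python
try:
    import tomllib  # the module proposed in this PEP
except ModuleNotFoundError:
    # Assumption: the third-party tomli package exposes the same names.
    import tomli as tomllib


def parse_or_none(text):
    """Hypothetical helper: return the parsed document, or None if invalid."""
    try:
        return tomllib.loads(text)
    except tomllib.TOMLDecodeError:
        # Raised for any input that is not valid TOML.
        return None


valid = parse_or_none("answer = 42")
invalid = parse_or_none("answer = ")  # a key with no value is invalid TOML
```

Code written against ``toml`` catches ``toml.TomlDecodeError`` instead, so an ``except`` clause like the one above is one of the places that would need updating.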
+ + Copyright ========= From 96aa4fe01caf2d80c7338a46aceeebbe66d7fb72 Mon Sep 17 00:00:00 2001 From: hauntsaninja <> Date: Mon, 3 Jan 2022 22:42:23 -0600 Subject: [PATCH 11/23] Change to propose not including a write API --- pep-9999.rst | 214 +++++++++++++++++++++++++++++---------------------- 1 file changed, 121 insertions(+), 93 deletions(-) diff --git a/pep-9999.rst b/pep-9999.rst index 05902a8a914..92b17b2d7fb 100644 --- a/pep-9999.rst +++ b/pep-9999.rst @@ -1,5 +1,5 @@ PEP: 9999 -Title: Support for TOML in the Standard Library +Title: Support for parsing TOML in the Standard Library Author: Taneli Hukkinen, Shantanu Jain Sponsor: TODO PEP-Delegate: TODO @@ -15,8 +15,8 @@ Post-History: 1900-01-01 Abstract ======== -This proposes adding a module, ``tomllib``, to the standard library for parsing -and writing TOML. [1]_ +This proposes adding a module, ``tomllib``, to the standard library for +parsing TOML. [1]_ Motivation @@ -45,23 +45,21 @@ comparison, ``requests`` has about 28k reverse dependencies. Rationale ========= -This PEP proposes basing the standard library support for TOML on the third party -libraries ``tomli`` [2]_ and ``tomli-w`` [3]_. +This PEP proposes basing the standard library support for reading TOML on the +third party library ``tomli`` [2]_. Many projects have recently switched to using ``tomli``, for example, ``pip``, ``build``, ``pytest``, ``mypy``, ``black``, ``flit``, ``coverage``, ``setuptools-scm``, ``cibuildwheel``. -These libraries are actively maintained and well-tested. ``tomli`` is about 800 -lines of code with 100% test coverage. ``tomli-w`` is about 200 lines of code with -100% test coverage. +``tomli`` is actively maintained and well-tested. ``tomli`` is about 800 +lines of code with 100% test coverage and passes all tests in the official TOML +compliance test suite. Specification ============= -Read API - .. code-block:: def load(fp: SupportsRead[bytes], /, *, parse_float: Callable[[str], Any] = float) -> dict[str, Any]: ... 
@@ -78,25 +76,8 @@ For example, ``decimal.Decimal`` in cases where precision is important. ``tomllib.TOMLDecodeError`` is raised in the case of invalid TOML. -Write API - -.. code-block:: - - def dump(obj: Mapping[str, Any], fp: SupportsWrite[bytes], /, *, multiline_strings: bool = False) -> None: ... - def dumps(obj: Mapping[str, Any], /, *, multiline_strings: bool = False) -> str: ... - - -``tomllib.dumps`` serialize obj as a TOML formatted stream to a -``.write()``-supporting file-like object. - -``tomllib.dump`` serializes an object to a TOML formatted str. - - -``multiline_strings`` controls whether strings containing newlines are written -as multiline strings. This defaults to False in case users wish to ensure -preservation of newline byte sequences. - -TODO: describe types supported +Note that we currently do not propose ``tomllib.dump`` or ``tomllib.dumps`` +functions, see ``_ for details. Maintenance Implications @@ -118,9 +99,9 @@ major breaking changes, we should preserve support for TOML v1. Maintainability of proposed implementation ------------------------------------------ -The proposed implementation (``tomli`` and ``tomli-w``) is in pure Python, well -tested and combined weigh under 1000 lines of code. They are both minimalistic, -offering a smaller API surface area than other TOML implementations. +The proposed implementation (``tomli``) is in pure Python, well tested and +weighs under 1000 lines of code. It is minimalistic, offering a smaller API +surface area than other TOML implementations. The author of ``tomli`` is willing to help integrate ``tomli`` into the standard library and help maintain it, `as per this @@ -186,8 +167,9 @@ Rejected Ideas Roundtripping style ------------------- -In general, ``tomllib.dumps(tomllib.loads(x))`` may not equal ``x``, since we -make no effort to preserve comments, whitespace or other stylistic choices. 
+In general, ``tomllib.dumps(tomllib.loads(x))`` may not equal the same string as +``x``, since we make no effort to preserve comments, whitespace or other +stylistic choices. Style preservation would allow tools to losslessly edit TOML files. Since TOML is intended as human-readable and human-editable configuration, it's important @@ -234,8 +216,8 @@ Potential alternatives include: It's unclear what we would get from this: ``tomli`` meets our needs and the author is willing to help with its inclusion in the standard library. -Only including an API for reading TOML --------------------------------------- +Including an API for writing TOML +--------------------------------- There are several reasons to not include an API for writing TOML: @@ -247,25 +229,33 @@ As discussed in the previous section, use cases that involve editing TOML (as opposed to writing brand new TOML) are better served by a style preserving library. -Values in TOML can be represented in multiple ways. To the extent that users -want control over how the output TOML ends up being formatted (how to format -strings, when to inline arrays or tables, how much to indent, whether to reorder -contents, etc), they will not be served well by the proposed API. +There are several degrees of freedom in how to design a write API. For example, +how much control to allow users over output formatting, over serialization of +custom types, and over input and output validation. While there are reasonable +choices on how to resolve these, the nature of the standard library is such that +one only gets one chance to get things right. See `Appendix B`_. for an overview +of some of the design questions. + +Currently no CPython core developers have expressed willingness to maintain a +write API or sponsor a PEP proposing a write API. Since it is hard to change or +remove something in the standard library, it is safer to err on the side of +exclusion and potentially revisit later. 
+ +That said, here are reasons to include an API for writing TOML: -The standard library does not need to do everything and if we feel that most -users are better served by more powerful third party write APIs, exclusion is -acceptable (and could be revisited later). +Users will likely expect a write API to be available for consistency. -However, users will likely expect a write API to be available for consistency. -Empirically, writing TOML seems useful, e.g. ``toml.dump`` is used about 30% as -often as ``toml.load`` based on https://grep.app +Empirically, writing TOML seems useful. On https://grep.app, there are about +1.3k hits for "toml.load" and "tomli.load", compared to about 400 hits for +"toml.dump" and "tomli_w.dump". Even a simple API is capable of serving common use cases, such as testing code -that loads TOML or writing simple or boilerplate TOML. -TODO: about 1/5 uses of ``toml.dump[s]`` are in tests, estimate other simple use cases +that loads TOML or writing out simple or boilerplate TOML. +TODO: estimate prevalence of simple use cases If we keep feature set narrow, a write API shouldn't be too much additional -burden. The proposed implementation is about 200 lines of code. +burden. The fairly minimal implementation in ``tomli-w`` is about 200 lines +of code. Assorted API details @@ -305,48 +295,6 @@ class was passed for friendlier KeyErrors, in another case, the custom class had several additional lookup and mutation methods (e.g. to help resolve dotted keys). -Allowing users more control over formatting ``tomllib.dump[s]`` output -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -As mentioned, TOML values can be represented in multiple ways, so inevitably, -people will have strong opinions over how to do so. - -The ``toml`` library on PyPI supports this using custom subclasses of -``toml.TomlEncoder``. There are a handful of instances of this that can be found -on https://grep.app. 
However, the API to do this is not particularly clean. - -A non-exhaustive list of potential options users may want control over: - -* How to format strings -* When to inline arrays or tables -* How much to indent -* Whether to reorder contents -* Whether to use dotted keys - -In several cases, users could enforce TOML formatting by using an autoformatter -of their choice at a later point. - -We acknowledge that supporting ``multiline_strings`` is something of an -exception to this, if controversial we can err on the side of simplicity and -remove it. - -Allowing users more control over ``tomllib.dump[s]`` serialisation -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -It could be useful to add the equivalent of the ``default`` argument in ``json.dump`` -to allow users to specify how custom types should be serialised. - -The ``toml`` library on PyPI supports this using custom subclasses of -``toml.TomlEncoder``. However, we could find two instances of using -``toml.TomlEncoder`` to accomplish this kind of thing on https://grep.app, one -of which was to add support to ``toml`` for dumping ``decimal.Decimal``. - -TOML is used more for configuration than serialisation of arbitrary data, so -users are perhaps less likely to require custom serialisation than with say -JSON. Support for this could be added in a backward compatible way. - -TODO: talk about output validation for ``dump[s]`` - Alternative names for module ---------------------------- @@ -443,7 +391,7 @@ space. Note that this list might not be exhaustive. #. This PEP currently proposes not to include a write API. That is, there will be no equivalent of ``toml.dump`` or ``toml.dumps``. - Discussed at TODO section link. + Discussed at ``_. #. Different first argument of ``toml.load`` @@ -498,7 +446,7 @@ space. Note that this list might not be exhaustive. were to change, these differences would likely become relevant. 
This enables two use cases, a) control over how custom types should be - serialised, b) control over how output should be formatted. + serialized, b) control over how output should be formatted. The first use case is reasonable, however, I could only find two instances of this on https://grep.app. One of these two instances used this ability to add @@ -528,6 +476,86 @@ space. Note that this list might not be exhaustive. ``TOMLDecodeError``. +.. _Appendix B: + +Appendix B: Designing a write API +================================= + +This appendix discusses some of the degrees of freedom in the design space of +write APIs. This list is not exhaustive. + + +Providing users control over formatting +--------------------------------------- + +Values in TOML can be represented in multiple ways. This is a feature of TOML: +it allows users to phrase things to maximize subjective readability. +Inevitably, people will have strong opinions over how to do so. + +Here is a non-exhaustive list of potential options users may want control over: + +* How much to indent +* How to format strings (single-line or multiple-line, basic or literal) +* Whether newline sequences should be normalized (perhaps depending on ``os.linesep``) +* When to inline arrays or tables +* Whether to reorder contents +* Whether to use dotted keys + +This isn't hypothetical, there are several instances in open source code of +users attempting to achieve TOML output with a given formatting. + +The ``tomli-w`` library contains only one option to customise output formatting: +controlling whether strings containing newlines are written as multiline +strings. This option is a little tricky (and so defaults to False), +since it loses semantics that guarantee the bytes in newline sequences, for +instance in the case of ``tomli_w.dumps(tomli.loads(r'''s = "\r\n"'''), +multiline_strings=True)`` + +The ``toml`` library supports output formatting using custom subclasses of +``toml.TomlEncoder``. 
However, the API exposes a lot of implementation detail, +essentially allowing users to override parts of the TOML writing process. See +`here +`__. + +The ``tomlkit`` library is fully style preserving and allows users to specify +the exact output they want using an imperative document construction API. + +It remains an option to make output mostly non-customizable, which should +maximize forward compatibility. In addition, in several cases users could +choose to enforce TOML formatting by using an autoformatter of their choice at a +later point. + + +Providing users control over serialization +------------------------------------------ + +It needs to be determined which types can be serialized to TOML out of the box. +For instance, ``tomli-w`` supports dumping ``decimal.Decimal``, while ``toml`` +does not. + +It could be useful to add the equivalent of the ``default`` argument in ``json.dump`` +to allow users to specify how custom types should be serialized. + +The ``toml`` library on PyPI supports this using subclasses of +``toml.TomlEncoder``. However, this functionality seems to be rarely used in +practice. TOML is used more for configuration than serialization of arbitrary +data, so users are perhaps less likely to require custom serialization than with +say JSON. + +It would be easy to add support for this later in a backward compatible way. + + +Providing users control over validation +--------------------------------------- + +TODO + +Should we guarantee that either output TOML is valid or an error is raised? +(``tomli-w`` does not have this guarantee) + +Should we detect circular references?
(``toml`` does, but ``tomli-w`` does not) + + Copyright ========= From 4f3d33359c6a5ef2d3a959f34331c42b8f17a70d Mon Sep 17 00:00:00 2001 From: hauntsaninja <> Date: Mon, 3 Jan 2022 23:07:59 -0600 Subject: [PATCH 12/23] Discuss `parse_float` --- pep-9999.rst | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/pep-9999.rst b/pep-9999.rst index 92b17b2d7fb..f258a131a33 100644 --- a/pep-9999.rst +++ b/pep-9999.rst @@ -295,6 +295,18 @@ class was passed for friendlier KeyErrors, in another case, the custom class had several additional lookup and mutation methods (e.g. to help resolve dotted keys). +Removing support for ``parse_float`` in ``tomllib.load[s]`` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +This option is not strictly necessary, since TOML floats are "IEEE 754 binary64 +values", which is ``float``. Using ``decimal.Decimal`` thus allows users extra +precision not promised by the TOML format. However, in the author of ``tomli``'s +experience, this is useful in scientific and financial applications. Many +TOML-facing users are probably not developers and are not aware of what the +limits of double-precision float. + +TODO: user quotes + Alternative names for module ---------------------------- From cd038699eb2b3a2a349c77cea6a40611b3dd0e08 Mon Sep 17 00:00:00 2001 From: hauntsaninja <> Date: Mon, 3 Jan 2022 23:48:45 -0600 Subject: [PATCH 13/23] Reword discussion of pins --- pep-9999.rst | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/pep-9999.rst b/pep-9999.rst index f258a131a33..880892c737e 100644 --- a/pep-9999.rst +++ b/pep-9999.rst @@ -317,10 +317,12 @@ compatibility concerns. Since the standard library takes precedence over third party packages, users who have pinned versions of ``toml`` would be broken when upgrading Python versions by any API incompatibilities. -Note the importance of "pinned".
That is, even if we were able to get control -over the ``toml`` PyPI package and repurpose it as a standard library backport, -we would still break users with pinned packages. This is especially unfortunate, -since pinning is a common response to breaking changes. +To further clarify, the user pins are the specific concern here. Even if we were +able to get control over the ``toml`` PyPI package and repurpose it as a +standard library backport, we would still break users who have pinned to +versions of the current ``toml`` package. This is unfortunate, since pinning +would likely be a common response to breaking changes introduced by repurposing +the ``toml`` package as an (incompatible) backport. There are several API incompatibilities between ``toml`` and the API proposed in this PEP. Here are the differences that a significant fraction of users are @@ -335,7 +337,7 @@ likely to run into: :pep:`8` compliant ``tomllib.TOMLDecodeError``. There are other minor or less widely used API differences. If interested, refer -to `Appendix A`_. +to `Appendix A`_ for a more complete listing. Finally, the ``toml`` package on PyPI is not actively maintained and `we have been unable to contact the author `, From 0c7c0c25784abbfd153d360e9c0bb732fd2c6832 Mon Sep 17 00:00:00 2001 From: Shantanu <12621235+hauntsaninja@users.noreply.github.com> Date: Tue, 4 Jan 2022 11:54:21 -0600 Subject: [PATCH 14/23] Apply suggestions from code review Co-authored-by: Taneli Hukkinen <3275109+hukkin@users.noreply.github.com> --- pep-9999.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/pep-9999.rst b/pep-9999.rst index 880892c737e..6c3760d9113 100644 --- a/pep-9999.rst +++ b/pep-9999.rst @@ -265,7 +265,7 @@ Types accepted by the first argument of ``tomllib.load`` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The ``toml`` library on PyPI allows passing paths (and lists of path-like -objects, reading the first path that exists). 
Doing this would be inconsistent +objects, merging the documents into a single object). Doing this would be inconsistent with ``json.load``, ``pickle.load``, etc. If we agree consistency with other stdlib modules is desirable, allowing paths is somewhat out of scope for this PEP. This can easily and more explicitly be worked around in user code. @@ -424,8 +424,8 @@ space. Note that this list might not be exhaustive. Recapping the reasons for this, previously mentioned at ``_: - * Allowing passing of paths (and lists of path-like objects, reading the first - path that exists) is inconsistent with other similar functions in the standard + * Allowing passing of paths (and lists of path-like objects, merging the documents + into a single object) is inconsistent with other similar functions in the standard library. * Using ``SupportsRead[bytes]`` allows us to a) ensure utf-8 is the encoding used, b) avoid incorrectly parsing single carriage returns as valid TOML due to From 72b14b88abb4eaa97550fe4345ac2b60fed28b4f Mon Sep 17 00:00:00 2001 From: hauntsaninja <> Date: Tue, 4 Jan 2022 11:56:56 -0600 Subject: [PATCH 15/23] Update test suite language --- pep-9999.rst | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/pep-9999.rst b/pep-9999.rst index 6c3760d9113..bcdea96e86d 100644 --- a/pep-9999.rst +++ b/pep-9999.rst @@ -52,9 +52,11 @@ Many projects have recently switched to using ``tomli``, for example, ``pip``, ``build``, ``pytest``, ``mypy``, ``black``, ``flit``, ``coverage``, ``setuptools-scm``, ``cibuildwheel``. -``tomli`` is actively maintained and well-tested. ``tomli`` is about 800 -lines of code with 100% test coverage and passes all tests in the official TOML -compliance test suite. +``tomli`` is actively maintained and well-tested. ``tomli`` is about 800 lines +of code with 100% test coverage and passes all tests in a test suite `proposed to +be the official TOML compliance test suite `. 
+ +TODO check tomli continues to pass https://github.com/BurntSushi/toml-test Specification From 6942c8f64bd8bd270ccafd283f8aaa4fd50937f5 Mon Sep 17 00:00:00 2001 From: hauntsaninja <> Date: Tue, 4 Jan 2022 12:18:53 -0600 Subject: [PATCH 16/23] Several phrasing nits --- pep-9999.rst | 78 +++++++++++++++++++++++++++++----------------------- 1 file changed, 44 insertions(+), 34 deletions(-) diff --git a/pep-9999.rst b/pep-9999.rst index bcdea96e86d..8abddf24fca 100644 --- a/pep-9999.rst +++ b/pep-9999.rst @@ -53,10 +53,10 @@ Many projects have recently switched to using ``tomli``, for example, ``pip``, ``setuptools-scm``, ``cibuildwheel``. ``tomli`` is actively maintained and well-tested. ``tomli`` is about 800 lines -of code with 100% test coverage and passes all tests in a test suite `proposed to -be the official TOML compliance test suite `. +of code with 100% test coverage and passes all tests in a test suite `proposed as +the official TOML compliance test suite `. -TODO check tomli continues to pass https://github.com/BurntSushi/toml-test +TODO: check tomli continues to pass https://github.com/BurntSushi/toml-test Specification @@ -95,20 +95,20 @@ see TOML has had no major changes since April 2020 and has had two releases in the last five years. In the event of changes to the TOML specification, we could treat minor -revisions as bugfixes and update the implementation in place. In the event of +revisions as bug fixes and update the implementation in place. In the event of major breaking changes, we should preserve support for TOML v1. Maintainability of proposed implementation ------------------------------------------ The proposed implementation (``tomli``) is in pure Python, well tested and -weighs under 1000 lines of code. It is minimalistic, offering a smaller API +weighs under 1000 lines of code. It is minimalist, offering a smaller API surface area than other TOML implementations. 
The author of ``tomli`` is willing to help integrate ``tomli`` into the standard library and help maintain it, `as per this -`__. At least -one CPython core dev has indicated potential willingness to maintain it, +`__. +One CPython core dev has indicated willingness to maintain a read API, `as per this `__. @@ -204,7 +204,7 @@ Potential alternatives include: than that of ``tomli``. It has some very limited and mostly unused ability to preserve style through an undocumented decoder API. It has the ability to customise output style through a complicated encoder API. For more details on - API differences, refer to `Appendix A`_. + API differences to this PEP, refer to `Appendix A`_. * ``pytomlpp``. ``pytomlpp`` is a Python wrapper for the C++ project ``toml++``. Pure Python @@ -239,8 +239,8 @@ one only gets one chance to get things right. See `Appendix B`_. for an overview of some of the design questions. Currently no CPython core developers have expressed willingness to maintain a -write API or sponsor a PEP proposing a write API. Since it is hard to change or -remove something in the standard library, it is safer to err on the side of +write API or sponsor a PEP that includes a write API. Since it is hard to change +or remove something in the standard library, it is safer to err on the side of exclusion and potentially revisit later. That said, here are reasons to include an API for writing TOML: @@ -267,10 +267,11 @@ Types accepted by the first argument of ``tomllib.load`` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The ``toml`` library on PyPI allows passing paths (and lists of path-like -objects, merging the documents into a single object). Doing this would be inconsistent -with ``json.load``, ``pickle.load``, etc. If we agree consistency with other -stdlib modules is desirable, allowing paths is somewhat out of scope for this -PEP. This can easily and more explicitly be worked around in user code. 
+objects, ignoring missing files and merging the documents into a single object). +Doing this would be inconsistent with ``json.load``, ``pickle.load``, etc. If we +agree consistency with other stdlib modules is desirable, allowing paths is +somewhat out of scope for this PEP. This can easily and explicitly be worked +around in user code. The proposed API takes a ``SupportsRead[bytes]``, while ``toml.load`` takes a ``SupportsRead[str]`` and ``json.load`` takes ``SupportsRead[str | bytes]``. @@ -291,11 +292,10 @@ in a backward compatible way. The ``toml`` library on PyPI supports this feature using the ``_dict`` argument. There are several uses of this on https://grep.app, however, almost all of them -were passing ``_dict=OrderedDict``, which should no longer be necessary post -Python 3.7. There were two instances of legitimate use: in one case, a custom -class was passed for friendlier KeyErrors, in another case, the custom class had -several additional lookup and mutation methods (e.g. to help resolve dotted -keys). +were passing ``_dict=OrderedDict``, which should be unnecessary as of Python +3.7. There were two instances of legitimate use: in one case, a custom class was +passed for friendlier KeyErrors, in another case, the custom class had several +additional lookup and mutation methods (e.g. to help resolve dotted keys). Removing support for ``parse_float`` in ``tomllib.load[s]`` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -303,9 +303,9 @@ Removing support for ``parse_float`` in ``tomllib.load[s]`` This option is not strictly necessary, since TOML floats are "IEEE 754 binary64 values", which is ``float``. Using ``decimal.Decimal`` thus allows users extra precision not promised by the TOML format. However, in the author of ``tomli``'s -experience, this is useful in scientific and financial applications. Many -TOML-facing users are probably not developers and are not aware of what the -limits of double-precision float. 
+experience, this is useful in scientific and financial applications. TOML-facing +users may include non-developers who are not aware of the limits of +double-precision float. TODO: user quotes @@ -324,14 +324,14 @@ able to get control over the ``toml`` PyPI package and repurpose it as a standard library backport, we would still break users who have pinned to versions of the current ``toml`` package. This is unfortunate, since pinning would likely be a common response to breaking changes introduced by repurposing -the ``toml`` package as an (incompatible) backport. +the ``toml`` package as a backport (that is incompatible with today's ``toml``). There are several API incompatibilities between ``toml`` and the API proposed in this PEP. Here are the differences that a significant fraction of users are likely to run into: -* Use of ``toml.dump`` and ``toml.dumps``, since this PEP proposes to not - include an API for writing TOML. +* Use of ``toml.dump`` and ``toml.dumps``, since this PEP does not propose + an API for writing TOML. * ``toml.load`` accepts a non-overlapping set of types from the proposed API for ``tomllib.load``. See `here `_ for the rationale. @@ -343,7 +343,7 @@ to `Appendix A`_ for a more complete listing. Finally, the ``toml`` package on PyPI is not actively maintained and `we have been unable to contact the author `, -so action here would likely have to be done without the author's consent. +so action here would likely have to be taken without the author's consent. This PEP proposes ``tomllib``. This mirrors ``plistlib`` (another file format module in the standard library), as well as several others such as ``pathlib``, @@ -409,6 +409,12 @@ space. Note that this list might not be exhaustive. Discussed at ``_. + If we included a write API, it would be relatively simple to convert most + code that uses ``toml`` to use the API proposed in this PEP (acknowledging + that that is very different from a compatible API). 
+ + A significant fraction of ``toml`` users rely on this. + #. Different first argument of ``toml.load`` ``toml.load`` has the following signature: @@ -426,15 +432,24 @@ space. Note that this list might not be exhaustive. Recapping the reasons for this, previously mentioned at ``_: - * Allowing passing of paths (and lists of path-like objects, merging the documents - into a single object) is inconsistent with other similar functions in the standard - library. + * Allowing passing of paths (and lists of path-like objects, ignoring missing + files and merging the documents into a single object) is inconsistent with + other similar functions in the standard library. * Using ``SupportsRead[bytes]`` allows us to a) ensure utf-8 is the encoding used, b) avoid incorrectly parsing single carriage returns as valid TOML due to universal newlines. TOML specifies file encoding and valid newline sequences, and hence is simply stricter format than what text file objects represent. + A significant fraction of ``toml`` users rely on this. + +#. Errors + + ``toml`` raises ``TomlDecodeError`` vs the proposed PEP 8 compliant + ``TOMLDecodeError``. + + A significant fraction of ``toml`` users rely on this. + #. ``toml.load[s]`` accepts a ``_dict`` argument Discussed at ``_. @@ -486,11 +501,6 @@ space. Note that this list might not be exhaustive. proposed implementation uses ``datetime.timezone`` objects from the standard library. -#. Errors - - ``toml`` raises ``TomlDecodeError`` vs the proposed PEP 8 compliant - ``TOMLDecodeError``. - .. 
_Appendix B: From 9d497a22a755479eef02ee406458bf93e5766127 Mon Sep 17 00:00:00 2001 From: Shantanu <12621235+hauntsaninja@users.noreply.github.com> Date: Wed, 5 Jan 2022 01:29:03 -0600 Subject: [PATCH 17/23] Apply suggestions from code review Co-authored-by: Taneli Hukkinen <3275109+hukkin@users.noreply.github.com> --- pep-9999.rst | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/pep-9999.rst b/pep-9999.rst index 8abddf24fca..5e194f5365e 100644 --- a/pep-9999.rst +++ b/pep-9999.rst @@ -54,9 +54,8 @@ Many projects have recently switched to using ``tomli``, for example, ``pip``, ``tomli`` is actively maintained and well-tested. ``tomli`` is about 800 lines of code with 100% test coverage and passes all tests in a test suite `proposed as -the official TOML compliance test suite `. - -TODO: check tomli continues to pass https://github.com/BurntSushi/toml-test +the official TOML compliance test suite `, +as well as `the more established BurntSushi/toml-test suite `. Specification @@ -160,8 +159,6 @@ Link to any existing implementation and details about its state, e.g. 
proof-of-c https://github.com/hukkin/tomli -https://github.com/hukkin/tomli-w - Rejected Ideas ============== From f14654ce0fc15a2cc26c2825907e0a07ad47bb22 Mon Sep 17 00:00:00 2001 From: Petr Viktorin Date: Fri, 7 Jan 2022 03:30:46 +0100 Subject: [PATCH 18/23] Editing for the tomllib PEP (#3) Co-authored-by: Petr Viktorin Co-authored-by: Taneli Hukkinen <3275109+hukkin@users.noreply.github.com> --- pep-9999.rst | 294 +++++++++++++++++---------------------------------- 1 file changed, 99 insertions(+), 195 deletions(-) diff --git a/pep-9999.rst b/pep-9999.rst index 5e194f5365e..4880dd4b4b3 100644 --- a/pep-9999.rst +++ b/pep-9999.rst @@ -1,9 +1,8 @@ PEP: 9999 -Title: Support for parsing TOML in the Standard Library +Title: tomllib: Support for parsing TOML in the Standard Library Author: Taneli Hukkinen, Shantanu Jain -Sponsor: TODO -PEP-Delegate: TODO -Discussions-To: TODO +Sponsor: Petr Viktorin +Discussions-To: https://discuss.python.org/t/adopting-recommending-a-toml-parser/4068 Status: Draft Type: Standards Track Content-Type: text/x-rst @@ -16,7 +15,8 @@ Abstract ======== This proposes adding a module, ``tomllib``, to the standard library for -parsing TOML. [1]_ +parsing TOML (Tom's Obvious Minimal Language, +`https://toml.io `_). Motivation @@ -46,7 +46,8 @@ Rationale ========= This PEP proposes basing the standard library support for reading TOML on the -third party library ``tomli`` [2]_. +third party library ``tomli`` +(`github.com/hukkin/tomli `_). Many projects have recently switched to using ``tomli``, for example, ``pip``, ``build``, ``pytest``, ``mypy``, ``black``, ``flit``, ``coverage``, @@ -54,30 +55,38 @@ Many projects have recently switched to using ``tomli``, for example, ``pip``, ``tomli`` is actively maintained and well-tested. ``tomli`` is about 800 lines of code with 100% test coverage and passes all tests in a test suite `proposed as -the official TOML compliance test suite `, -as well as `the more established BurntSushi/toml-test suite `. 
+the official TOML compliance test suite `_, +as well as `the more established BurntSushi/toml-test suite `_. Specification ============= +A new module ``tomllib`` with the following functions will be added: + .. code-block:: def load(fp: SupportsRead[bytes], /, *, parse_float: Callable[[str], Any] = float) -> dict[str, Any]: ... def loads(s: str, /, *, parse_float: Callable[[str], Any] = float) -> dict[str, Any]: ... -``tomllib.load`` deserializes a ``.read()``-supporting binary file containing a -TOML document to a Python object. +``tomllib.load`` deserializes a binary file containing a +TOML document to a Python dict. +The ``fp`` argument must have a ``read()`` method with the same API as +``io.RawIOBase.read()``. -``tomllib.loads`` deserializes a str instance containing a TOML document to a -Python object. +``tomllib.loads`` deserializes a str instance containing a TOML document +to a Python dict. ``parse_float`` is a function that takes a string and returns a float, as with ``json.load``. -For example, ``decimal.Decimal`` in cases where precision is important. +For example, ``decimal.Decimal`` can be used in cases where precision is important. + +The returned object contains only basic Python objects (``str``, ``int``, ``bool``, ``float``, +``datetime.{datetime,date,time}``, ``list``, ``dict`` with string keys), +and the results of `parse_float`. ``tomllib.TOMLDecodeError`` is raised in the case of invalid TOML. -Note that we currently do not propose ``tomllib.dump`` or ``tomllib.dumps`` +Note that this PEP does not propose ``tomllib.dump`` or ``tomllib.dumps`` functions, see ``_ for details. @@ -105,17 +114,16 @@ weighs under 1000 lines of code. It is minimalist, offering a smaller API surface area than other TOML implementations. The author of ``tomli`` is willing to help integrate ``tomli`` into the standard -library and help maintain it, `as per this +library and help maintain it, `as per this post `__. 
-One CPython core dev has indicated willingness to maintain a read API, -`as per this +Petr Viktorin has indicated willingness to maintain a read API, +`as per this post `__. -There is unlikely to be demand for an extension module, since there is -relatively less need for performance in parsing TOML: it's rare for application -bottleneck to be reading configuration. Users with extreme performance needs can -use a third party library (as is already often the case with JSON, despite a -stdlib extension module). +Rewriting the parser in C is not deemed necessary at this time. It's rare for +TOML parsing to be a bottleneck in applications. Users with higher performance +needs can use a third party library (as is already often the case with JSON, +despite a stdlib extension module). TOML support a slippery slope for other things ---------------------------------------------- @@ -127,13 +135,20 @@ other formats, such as YAML or MessagePack. In addition, the simplicity of TOML can help serve as a dividing line, for example, YAML is large and complicated. +Including an API for writing TOML may, however, be added in a future PEP. + Backwards Compatibility ======================= -This will have no backwards compatibility issues as it will create a new API. +This proposal has no backwards compatibility issues within the stdlib, as it +describes a new module. +Any existing third-party module named ``tomllib`` will break, as +``import tomllib`` will import standard library module. +However, ``tomllib`` is not registered on PyPI, so it is unlikely that such +a module is widely used. -Note that we avoid using the ``toml`` name for the module, to avoid backwards +Note that we avoid using the more straightforward name ``toml``, to avoid backwards compatibility implications for users who have pinned versions of the current ``toml`` PyPI package. For more details, see ``_. 
@@ -141,8 +156,11 @@ compatibility implications for users who have pinned versions of the current Security Implications ===================== -Errors in the implementation could cause potential security issues. However, the -implementation will be in pure Python, which reduces surface area of attack. +Errors in the implementation could cause potential security issues. +The parser's output is limited to simple data types; inability to load +arbitrary classes avoids security issues common in more "powerful" formats like +pickle and YAML. Also, the implementation will be in pure Python, which reduces +security issues endemic to C, such as buffer overflows. How to Teach This @@ -150,6 +168,9 @@ How to Teach This The API of ``tomllib`` mimics that of other well-established file format libraries, such as ``json`` and ``pickle``. +The lack of a ``dump`` function will be explained in the documentation, +with a link to relevant third-party libraries (``tomlkit``, ``pytomlpp``, +``rtoml``, ``tomli-w``). Reference Implementation @@ -163,26 +184,6 @@ https://github.com/hukkin/tomli Rejected Ideas ============== -Roundtripping style -------------------- - -In general, ``tomllib.dumps(tomllib.loads(x))`` may not equal the same string as -``x``, since we make no effort to preserve comments, whitespace or other -stylistic choices. - -Style preservation would allow tools to losslessly edit TOML files. Since TOML -is intended as human-readable and human-editable configuration, it's important -to preserve human markup. - -However, only a relatively small fraction of use cases require losslessly -editing TOML, as judged by reverse dependencies the style preserving ``tomlkit`` -library compared to that of other third party toml libraries. In particular, we -don't need it for the core Python packaging use cases or for tools that merely -need to read configuration. 
- -Since this would make both the implementation and the API more complex, it seems -better to relegate this additional functionality to third party libraries. - Basing on another TOML implementation ------------------------------------- @@ -224,37 +225,26 @@ The ability to write TOML is not needed for the use cases that motivate this PEP: for core Python packaging use cases or for tools that need to read configuration. -As discussed in the previous section, use cases that involve editing TOML (as -opposed to writing brand new TOML) are better served by a style preserving -library. +Use cases that involve editing TOML (as opposed to writing brand new TOML) +are better served by a style preserving library. This requires a parser whose +output includes style-related metadata, making it impractical to output plain +Python types like ``str`` and ``dict``. Designing such an API is complicated. -There are several degrees of freedom in how to design a write API. For example, +But even without considering style preservation, there are too many degrees of +freedom in how to design a write API. For example, how much control to allow users over output formatting, over serialization of custom types, and over input and output validation. While there are reasonable choices on how to resolve these, the nature of the standard library is such that -one only gets one chance to get things right. See `Appendix B`_. for an overview -of some of the design questions. +one only gets one chance to get things right. Currently no CPython core developers have expressed willingness to maintain a write API or sponsor a PEP that includes a write API. Since it is hard to change or remove something in the standard library, it is safer to err on the side of exclusion and potentially revisit later. -That said, here are reasons to include an API for writing TOML: - -Users will likely expect a write API to be available for consistency. - -Empirically, writing TOML seems useful. 
On https://grep.app, there are about -1.3k hits for "toml.load" and "tomli.load", compared to about 400 hits for -"toml.dump" and "tomli_w.dump". - -Even a simple API is capable of serving common use cases, such as testing code -that loads TOML or writing out simple or boilerplate TOML. -TODO: estimate prevalence of simple use cases - -If we keep feature set narrow, a write API shouldn't be too much additional -burden. The fairly minimal implementation in ``tomli-w`` is about 200 lines -of code. +So, writing TOML is left to third-party libraries. +If a good API and relevant use cases for it are found later, it can be added +in a future PEP. Assorted API details @@ -268,37 +258,54 @@ objects, ignoring missing files and merging the documents into a single object). Doing this would be inconsistent with ``json.load``, ``pickle.load``, etc. If we agree consistency with other stdlib modules is desirable, allowing paths is somewhat out of scope for this PEP. This can easily and explicitly be worked -around in user code. +around in user code, or a third-party library. -The proposed API takes a ``SupportsRead[bytes]``, while ``toml.load`` takes a -``SupportsRead[str]`` and ``json.load`` takes ``SupportsRead[str | bytes]``. -Using ``SupportsRead[bytes]`` allows us to a) ensure utf-8 is the encoding used, +The proposed API takes a binary file, while ``toml.load`` takes a +text file and ``json.load`` takes either. +Using a binary file allows us to a) ensure utf-8 is the encoding used, b) avoid incorrectly parsing single carriage returns as valid TOML due to universal newlines. +Type accepted by the first argument of ``tomllib.loads`` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +While ``tomllib.load`` takes a binary file, ``tomllib.loads`` takes +a text string. This may seem inconsistent at first. + +Quoting TOML v1.0.0 specification: + +> A TOML file must be a valid UTF-8 encoded Unicode document. 
+ +``tomllib.loads`` does not intend to load a TOML file, but rather the +document that the file stores. The most natural representation of +a Unicode document in Python is ``str``, not ``bytes``. + +It is possible to add ``bytes`` support in the future if needed, but +we are not aware of any use cases for it. + Controlling the type of mappings returned by ``tomllib.load[s]`` -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +---------------------------------------------------------------- -This would work similarly to common uses for the ``object_hook`` argument in -``json.load[s]``. +The ``toml`` library on PyPI supports a ``_dict`` argument, which works +similarly to the ``object_hook`` argument in ``json.load[s]``. There are +several uses of ``_dict`` found on https://grep.app, however, almost all of them +are passing ``_dict=OrderedDict``, which should be unnecessary as of Python +3.7. We found two instances of legitimate use: in one case, a custom class was +passed for friendlier KeyErrors, in another case, the custom class had several +additional lookup and mutation methods (e.g. to help resolve dotted keys). Such an argument is not necessary for the core use cases outlined in the motivation section. The absence of this can be pretty easily worked around using -a wrapper class or transformer function. Finally, support could be added later -in a backward compatible way. +a wrapper class, transformer function, or a third-party library. Finally, +support could be added later in a backward compatible way. -The ``toml`` library on PyPI supports this feature using the ``_dict`` argument. -There are several uses of this on https://grep.app, however, almost all of them -were passing ``_dict=OrderedDict``, which should be unnecessary as of Python -3.7. There were two instances of legitimate use: in one case, a custom class was -passed for friendlier KeyErrors, in another case, the custom class had several -additional lookup and mutation methods (e.g. 
to help resolve dotted keys). Removing support for ``parse_float`` in ``tomllib.load[s]`` -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +----------------------------------------------------------- This option is not strictly necessary, since TOML floats are "IEEE 754 binary64 -values", which is ``float``. Using ``decimal.Decimal`` thus allows users extra +values", which is ``float`` on most architectures. Using ``decimal.Decimal`` +thus allows users extra precision not promised by the TOML format. However, in the author of ``tomli``'s experience, this is useful in scientific and financial applications. TOML-facing users may include non-developers who are not aware of the limits of @@ -306,6 +313,11 @@ double-precision float. TODO: user quotes +There are also niche architectures where the Python ``float`` is not an IEEE-754 +binary64. The ``parse_float`` argument allows users to achieve correct TOML +semantics even on such architectures. + + Alternative names for module ---------------------------- @@ -324,19 +336,7 @@ would likely be a common response to breaking changes introduced by repurposing the ``toml`` package as a backport (that is incompatible with today's ``toml``). There are several API incompatibilities between ``toml`` and the API proposed in -this PEP. Here are the differences that a significant fraction of users are -likely to run into: - -* Use of ``toml.dump`` and ``toml.dumps``, since this PEP does not propose - an API for writing TOML. -* ``toml.load`` accepts a non-overlapping set of types from the proposed API for - ``tomllib.load``. See `here `_ for the rationale. -* For invalid TOML, ``toml`` raises ``toml.TomlDecodeError`` vs the proposed - :pep:`8` compliant ``tomllib.TOMLDecodeError``. - -There are other minor or less widely used API differences. If interested, refer -to `Appendix A`_ for a more complete listing. +this PEP, listed in `Appendix A`_.
Finally, the ``toml`` package on PyPI is not actively maintained and `we have been unable to contact the author `, @@ -346,12 +346,12 @@ This PEP proposes ``tomllib``. This mirrors ``plistlib`` (another file format module in the standard library), as well as several others such as ``pathlib``, ``graphlib``, etc. -Other bikesheds include: +Other considered names include: * ``tomlparser``. This mirrors ``configparser``, but is perhaps slightly less appropriate if we include a write API in the future. * ``tomli``. This assumes we use ``tomli`` as the basis for implementation. -* ``toml``, but under some namespace, such as ``parser.toml``. However, this is +* ``toml`` under some namespace, such as ``parser.toml``. However, this is awkward, especially so since existing libraries like ``json``, ``pickle``, ``marshal``, ``html`` etc. would not be included in the namespace. @@ -374,22 +374,6 @@ Useful https://grep.app searches (note, ignore vendored): * TomlEncoder subclasses https://grep.app/search?q=TomlEncoder%29%3A&filter[lang][0]=Python -References -========== - -.. [1] - TOML: Tom's Obvious Minimal Language - https://toml.io/en/ - -.. [2] - tomli - https://github.com/hukkin/tomli - -.. [3] - tomli-w - https://github.com/hukkin/tomli-w - - .. _Appendix A: Appendix A: Differences between proposed API and ``toml`` @@ -499,86 +483,6 @@ space. Note that this list might not be exhaustive. library. -.. _Appendix B: - -Appendix B: Designing a write API -================================= - -This appendix discusses some of the degrees of freedom in the design space of -write APIs. This list is not exhaustive. - - -Providing users control over formatting ---------------------------------------- - -Values in TOML can be represented in multiple ways. This is a feature of TOML: -it allows users to phrase things to maximize subjective readability. -Inevitably, people will have strong opinions over how to do so. 
- -Here is a non-exhaustive list of potential options users may want control over: - -* How much to indent -* How to format strings (single-line or multiple-line, basic or literal) -* Whether newline sequences should be normalized (perhaps depending on ``os.linesep``) -* When to inline arrays or tables -* Whether to reorder contents -* Whether to use dotted keys - -This isn't hypothetical, there are several instances in open source code of -users attempting to achieve TOML output with a given formatting. - -The ``tomli-w`` library contains only one option to customise output formatting: -controlling whether strings containing newlines are written as multiline -strings. This option is a little tricky (and so defaults to False), -since it loses semantics that guarantee the bytes in newline sequences, for -instance in the case of ``tomli_w.dumps(tomli.loads(r'''s = "\r\n"'''), -multiline_strings=True)`` - -The ``toml`` library supports output formatting using custom subclasses of -``toml.TomlEncoder``. However, the API exposes a lot of implementation detail, -essentially allowing users to override parts of the TOML writing process. See -`here -`__. - -The ``tomlkit`` library is fully style preserving and allows users to specify -the exact output they want using an imperative document construction API. - -It remains an option to make output mostly non-customizable, which should -maximizes forwards compatibility. In addition, in several cases users could -choose to enforce TOML formatting by using an autoformatter of their choice at a -later point. - - -Providing users control over serialization ------------------------------------------- - -It needs to be determined which types can be serialized to TOML out of the box. -For instance, ``tomli-w`` supports dumping ``decimal.Decimal``, while ``toml`` -does not. - -It could be useful to add the equivalent of the ``default`` argument in ``json.dump`` -to allow users to specify how custom types should be serialized. 
- -The ``toml`` library on PyPI supports this using subclasses of -``toml.TomlEncoder``. However, this functionality seems not often used in -practice. TOML is used more for configuration than serialization of arbitrary -data, so users are perhaps less likely to require custom serialization than with -say JSON. - -It would be easy to add support for this later in a backward compatible way. - - -Providing users control over validation ---------------------------------------- - -TODO - -Should we guarantee that either output TOML is valid or an error is raised? -(``tomli-w`` does not have this guarantee) - -Should we detect circular references? (``toml``does, but ``tomli-w`` does not) - - Copyright ========= From f364cb63743de471c5aeb6b53cfdbac335070d49 Mon Sep 17 00:00:00 2001 From: hauntsaninja <> Date: Thu, 6 Jan 2022 18:48:55 -0800 Subject: [PATCH 19/23] Fix RST markup --- pep-9999.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pep-9999.rst b/pep-9999.rst index 4880dd4b4b3..bf6c2eb9792 100644 --- a/pep-9999.rst +++ b/pep-9999.rst @@ -82,7 +82,7 @@ For example, ``decimal.Decimal`` can be used in cases where precision is importa The returned object contains only basic Python objects (``str``, ``int``, ``bool``, ``float``, ``datetime.{datetime,date,time}``, ``list``, ``dict`` with string keys), -and the results of `parse_float`. +and the results of ``parse_float``. ``tomllib.TOMLDecodeError`` is raised in the case of invalid TOML. From 643f5eaba3f2a2effc59a7e477184cc308119e97 Mon Sep 17 00:00:00 2001 From: hauntsaninja <> Date: Fri, 7 Jan 2022 21:45:04 -0800 Subject: [PATCH 20/23] Edits to parse_float text, elide default in signature --- pep-9999.rst | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/pep-9999.rst b/pep-9999.rst index bf6c2eb9792..6546a7fada3 100644 --- a/pep-9999.rst +++ b/pep-9999.rst @@ -66,8 +66,8 @@ A new module ``tomllib`` with the following functions will be added: .. 
code-block:: - def load(fp: SupportsRead[bytes], /, *, parse_float: Callable[[str], Any] = float) -> dict[str, Any]: ... - def loads(s: str, /, *, parse_float: Callable[[str], Any] = float) -> dict[str, Any]: ... + def load(fp: SupportsRead[bytes], /, *, parse_float: Callable[[str], Any] = ...) -> dict[str, Any]: ... + def loads(s: str, /, *, parse_float: Callable[[str], Any] = ...) -> dict[str, Any]: ... ``tomllib.load`` deserializes a binary file containing a TOML document to a Python dict. @@ -77,8 +77,10 @@ The ``fp`` argument must have a ``read()`` method with the same API as ``tomllib.loads`` deserializes a str instance containing a TOML document to a Python dict. -``parse_float`` is a function that takes a string and returns a float, as with ``json.load``. -For example, ``decimal.Decimal`` can be used in cases where precision is important. +``parse_float`` is a function that takes a string representing a TOML float and +returns a Python object (similar to ``parse_float`` in ``json.load``). For +example, a function returning a ``decimal.Decimal`` in cases where precision is +important. By default, TOML floats are represented as ``float`` type. The returned object contains only basic Python objects (``str``, ``int``, ``bool``, ``float``, ``datetime.{datetime,date,time}``, ``list``, ``dict`` with string keys), From f9744cfaedafa31f531562bf993340e872d62745 Mon Sep 17 00:00:00 2001 From: hauntsaninja <> Date: Fri, 7 Jan 2022 21:49:31 -0800 Subject: [PATCH 21/23] Rewrap text, small fixups --- pep-9999.rst | 94 +++++++++++++++++++++++++--------------------------- 1 file changed, 45 insertions(+), 49 deletions(-) diff --git a/pep-9999.rst b/pep-9999.rst index 6546a7fada3..376858b7d1a 100644 --- a/pep-9999.rst +++ b/pep-9999.rst @@ -54,9 +54,11 @@ Many projects have recently switched to using ``tomli``, for example, ``pip``, ``setuptools-scm``, ``cibuildwheel``. ``tomli`` is actively maintained and well-tested. 
``tomli`` is about 800 lines -of code with 100% test coverage and passes all tests in a test suite `proposed as -the official TOML compliance test suite `_, -as well as `the more established BurntSushi/toml-test suite `_. +of code with 100% test coverage and passes all tests in a test suite `proposed +as the official TOML compliance test suite +`_, as well as `the more +established BurntSushi/toml-test suite +`_. Specification @@ -82,9 +84,9 @@ returns a Python object (similar to ``parse_float`` in ``json.load``). For example, a function returning a ``decimal.Decimal`` in cases where precision is important. By default, TOML floats are represented as ``float`` type. -The returned object contains only basic Python objects (``str``, ``int``, ``bool``, ``float``, -``datetime.{datetime,date,time}``, ``list``, ``dict`` with string keys), -and the results of ``parse_float``. +The returned object contains only basic Python objects (``str``, ``int``, +``bool``, ``float``, ``datetime.{datetime,date,time}``, ``list``, ``dict`` with +string keys), and the results of ``parse_float``. ``tomllib.TOMLDecodeError`` is raised in the case of invalid TOML. @@ -150,9 +152,10 @@ Any existing third-party module named ``tomllib`` will break, as However, ``tomllib`` is not registered on PyPI, so it is unlikely that such a module is widely used. -Note that we avoid using the more straightforward name ``toml``, to avoid backwards -compatibility implications for users who have pinned versions of the current -``toml`` PyPI package. For more details, see ``_. +Note that we avoid using the more straightforward name ``toml``, to avoid +backwards compatibility implications for users who have pinned versions of the +current ``toml`` PyPI package. For more details, see ``_. Security Implications @@ -168,19 +171,16 @@ security issues endemic to C, such as buffer overflows. 
How to Teach This ================= -The API of ``tomllib`` mimics that of other well-established file format libraries, -such as ``json`` and ``pickle``. -The lack of a ``dump`` function will be explained in the documentation, -with a link to relevant third-party libraries (``tomlkit``, ``pytomlpp``, -``rtoml``, ``tomli-w``). +The API of ``tomllib`` mimics that of other well-established file format +libraries, such as ``json`` and ``pickle``. The lack of a ``dump`` function will +be explained in the documentation, with a link to relevant third-party libraries +(``tomlkit``, ``tomli-w``, ``pytomlpp``). Reference Implementation ======================== -Link to any existing implementation and details about its state, e.g. proof-of-concept. - -https://github.com/hukkin/tomli +The proposed implementation can be found at https://github.com/hukkin/tomli Rejected Ideas @@ -192,8 +192,8 @@ Basing on another TOML implementation Potential alternatives include: * ``tomlkit``. - ``tomlkit`` is well established, actively maintained and supports TOML v1. - An important difference is that ``tomlkit`` supports style roundtripping. As a + ``tomlkit`` is well established, actively maintained and supports TOML v1. An + important difference is that ``tomlkit`` supports style roundtripping. As a result, it has a more complex API and implementation (about 5x as much code as ``tomli``). The author does not believe that ``tomlkit`` is a good choice for the standard library. @@ -212,7 +212,8 @@ Potential alternatives include: * ``rtoml``. ``rtoml`` is a Python wrapper for the Rust project ``toml-rs`` and hence has - similar shortcomings to ``pytomlpp``. In addition, it does not support TOML v1. + similar shortcomings to ``pytomlpp``. + In addition, it does not support TOML v1. * Writing from scratch. 
   It's unclear what we would get from this: ``tomli`` meets our needs and the
@@ -227,26 +228,25 @@ The ability to write TOML is not needed for the use cases that motivate this
 PEP: for core Python packaging use cases or for tools that need to read
 configuration.

-Use cases that involve editing TOML (as opposed to writing brand new TOML)
-are better served by a style preserving library. This requires a parser whose
-output includes style-related metadata, making it impractical to output plain
-Python types like ``str`` and ``dict``. Designing such an API is complicated.
+Use cases that involve editing TOML (as opposed to writing brand new TOML) are
+better served by a style preserving library. This requires a parser whose output
+includes style-related metadata, making it impractical to output plain Python
+types like ``str`` and ``dict``. Designing such an API is complicated.

 But even without considering style preservation, there are too many degrees of
-freedom in how to design a write API. For example,
-how much control to allow users over output formatting, over serialization of
-custom types, and over input and output validation. While there are reasonable
-choices on how to resolve these, the nature of the standard library is such that
-one only gets one chance to get things right.
+freedom in how to design a write API. For example, how much control to allow
+users over output formatting, over serialization of custom types, and over input
+and output validation. While there are reasonable choices on how to resolve
+these, the nature of the standard library is such that one only gets one chance
+to get things right.

 Currently no CPython core developers have expressed willingness to maintain a
 write API or sponsor a PEP that includes a write API. Since it is hard to change
 or remove something in the standard library, it is safer to err on the side of
 exclusion and potentially revisit later.

-So, writing TOML is left to third-party libraries.
-If a good API and relevant use cases for it are found later, it can be added
-in a future PEP.
+So, writing TOML is left to third-party libraries. If a good API and relevant
+use cases for it are found later, it can be added in a future PEP.


 Assorted API details
@@ -262,11 +262,10 @@ agree consistency with other stdlib modules is desirable, allowing paths is
 somewhat out of scope for this PEP. This can easily and explicitly be worked
 around in user code, or a third-party library.

-The proposed API takes a binary file, while ``toml.load`` takes a
-text file and ``json.load`` takes either.
-Using a binary file allows us to a) ensure utf-8 is the encoding used,
-b) avoid incorrectly parsing single carriage returns as valid TOML due to
-universal newlines.
+The proposed API takes a binary file, while ``toml.load`` takes a text file and
+``json.load`` takes either. Using a binary file allows us to a) ensure utf-8 is
+the encoding used, b) avoid incorrectly parsing single carriage returns as valid
+TOML due to universal newlines.

 Type accepted by the first argument of ``tomllib.loads``
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -289,11 +288,11 @@ Controlling the type of mappings returned by ``tomllib.load[s]``
 ----------------------------------------------------------------

 The ``toml`` library on PyPI supports a ``_dict`` argument, which works
-similarly to the ``object_hook`` argument in ``json.load[s]``. There are
-several uses of ``_dict`` found on https://grep.app, however, almost all of them
-are passing ``_dict=OrderedDict``, which should be unnecessary as of Python
-3.7. We found two instances of legitimate use: in one case, a custom class was
-passed for friendlier KeyErrors, in another case, the custom class had several
+similarly to the ``object_hook`` argument in ``json.load[s]``.
There are several
+uses of ``_dict`` found on https://grep.app, however, almost all of them are
+passing ``_dict=OrderedDict``, which should be unnecessary as of Python 3.7. We
+found two instances of legitimate use: in one case, a custom class was passed
+for friendlier KeyErrors, in another case, the custom class had several
 additional lookup and mutation methods (e.g. to help resolve dotted keys).

 Such an argument is not necessary for the core use cases outlined in the
@@ -307,13 +306,10 @@ Removing support for ``parse_float`` in ``tomllib.load[s]``

 This option is not strictly necessary, since TOML floats are "IEEE 754 binary64
 values", which is ``float`` on most architectures. Using ``decimal.Decimal``
-thus allows users extra
-precision not promised by the TOML format. However, in the author of ``tomli``'s
-experience, this is useful in scientific and financial applications. TOML-facing
-users may include non-developers who are not aware of the limits of
-double-precision float.
-
-TODO: user quotes
+thus allows users extra precision not promised by the TOML format. However, in
+the author of ``tomli``'s experience, this is useful in scientific and financial
+applications. TOML-facing users may include non-developers who are not aware of
+the limits of double-precision float.

 There are also niche architectures where the Python ``float`` is not a IEEE-754
 binary64. The ``parse_float`` argument allows users to achieve correct TOML

From 9f0d8bb27da0ebbb4d1442f4d229b7e3f43cc839 Mon Sep 17 00:00:00 2001
From: hauntsaninja <>
Date: Fri, 7 Jan 2022 22:06:55 -0800
Subject: [PATCH 22/23] Restore some detail about style preservation

---
 pep-9999.rst | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/pep-9999.rst b/pep-9999.rst
index 376858b7d1a..c675b4bf0aa 100644
--- a/pep-9999.rst
+++ b/pep-9999.rst
@@ -229,9 +229,11 @@ PEP: for core Python packaging use cases or for tools that need to read
 configuration.
 Use cases that involve editing TOML (as opposed to writing brand new TOML) are
-better served by a style preserving library. This requires a parser whose output
-includes style-related metadata, making it impractical to output plain Python
-types like ``str`` and ``dict``. Designing such an API is complicated.
+better served by a style preserving library. TOML is intended as human-readable
+and human-editable configuration, so it's important to preserve human markup,
+such as comments and formatting. This requires a parser whose output includes
+style-related metadata, making it impractical to output plain Python types like
+``str`` and ``dict``. Designing such an API is complicated.

 But even without considering style preservation, there are too many degrees of
 freedom in how to design a write API. For example, how much control to allow

From 58ae5f1d6bfb9880bc030b52e4e550933d40a16e Mon Sep 17 00:00:00 2001
From: hauntsaninja <>
Date: Mon, 10 Jan 2022 12:30:58 -0800
Subject: [PATCH 23/23] use pep 680

---
 pep-9999.rst => pep-0680.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
 rename pep-9999.rst => pep-0680.rst (99%)

diff --git a/pep-9999.rst b/pep-0680.rst
similarity index 99%
rename from pep-9999.rst
rename to pep-0680.rst
index c675b4bf0aa..1dffa4a00df 100644
--- a/pep-9999.rst
+++ b/pep-0680.rst
@@ -1,4 +1,4 @@
-PEP: 9999
+PEP: 680
 Title: tomllib: Support for parsing TOML in the Standard Library
 Author: Taneli Hukkinen, Shantanu Jain
 Sponsor: Petr Viktorin