Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support trailing commas in ion.dump[s]() #370

Open
mavjop opened this issue Sep 24, 2024 · 14 comments
Open

Support trailing commas in ion.dump[s]() #370

mavjop opened this issue Sep 24, 2024 · 14 comments

Comments

@mavjop
Copy link
Contributor

mavjop commented Sep 24, 2024

I have code that I want to read in ion data from a file, make a modification, and then write it back out to a file. The file will be source controlled in a version control system.

I want to include trailing commas, e.g.:

{
    foo: "bar",
    baz: {
        quux: [
            "thing1",
            "thing2",
            "thing3",
        ],
    },
}

That is, a comma after the last item in a list, a comma after the last item in a map, etc.

Ion supports this, which is a huge plus in my book (kudos, Ion!).

This results in diffs that are minimized (if you add or remove a line, there aren't additional lines of diff which only contain addition or removal of a comma), version conflicts are cleaner, and it reduces the likelihood of errors when humans may also edit the files.

I don't think there is any support for asking dump or dumps to include trailing commas (I spent a little time looking for one), so I would love if we could add one. I'm most interested in ion.dumps(), but a flag could be added to any other methods that make sense (dump()? Others?).

- Stephen

P.S. For more on why trailing commas are a good idea, please see On the virtues of the trailing comma.

@rmarrowstone
Copy link
Contributor

rmarrowstone commented Sep 24, 2024

Some notes:

  • We don't have any formatting directives for ion text writer today. Even "pretty printing" (which is what I assume you want), with newlines AFAIK.
  • The pure-python implementation of this would be almost trivial, but we'd need to plumb the flag into the ioncmodule.c so that it worked with the c-extension as well.

@mavjop
Copy link
Contributor Author

mavjop commented Sep 25, 2024

It ... does do pretty printing for me? I'm calling it as:

    ion.dumps(content, binary=False, omit_version_marker=True, indent=u'   ')

I assume the indent=u' ' directs it to do multi-line pretty printing, with a (in our case) three space indent.

I was thinking it could take another optional argument such as trailing_commas=True, defaulted to False for backwards compatibility. I saw that all of the library seemed to be written in C. It's been a few years since I've used C in anger, but I did consider having a look at how one might add code. :)

Is there a concern about maintaining feature parity with ion libraries for other languages?

I agree that it would be relatively easy to implement in pure Python, but potentially a little more tricky in C, though perhaps not horrendously so.

@rmarrowstone
Copy link
Contributor

rmarrowstone commented Sep 25, 2024

oh. you are correct. corrected comment.

I was thinking it could take another optional argument such as trailing_commas=True, defaulted to False for backwards compatibility.

That sounds good to me.

Is there a concern about maintaining feature parity with ion libraries for other languages?

I'd say no. But I will check with the other maintainers.

@popematt
Copy link
Contributor

Is there a concern about maintaining feature parity with ion libraries for other languages?

I'd say no. But I will check with the other maintainers.

API feature parity with other libraries is not a concern. The only real cross-library "parity" concern is that they can all read data written by other implementations. Since trailing commas are allowed by the spec, if any libraries cannot read data with trailing commas, it is a defect in that library and should not block the proposed change here.

TLDR; No concerns—do it!

@mavjop
Copy link
Contributor Author

mavjop commented Oct 9, 2024

I discovered that there is already not feature parity between pure Python and the ion C module. Pretty printing (which trailing commas only makes sense in the context of) is already unsupported. So I'm going to publish a PR without C module support.

I've been trying for some time to run the tests and running into errors no matter how I try. Is there a way to run the tests as they are? With Python 3.12, my default Python, it doesn't find distutils, which I guess is gone from Python 3.12, and when I use pyenv to select 3.8.10 and 3.9.5 like is called for in the ion-python README, when I run tox I get:

...
TypeError: canonicalize_version() got an unexpected keyword argument 'strip_trailing_zero'

and when I run python setup.py test I get:

...
    import setuptools.command.test as orig
ModuleNotFoundError: No module named 'setuptools.command.test'

@mavjop
Copy link
Contributor Author

mavjop commented Oct 10, 2024

New PR: #372

@mavjop
Copy link
Contributor Author

mavjop commented Oct 10, 2024

I discovered that there is already not feature parity between pure Python and the ion C module. Pretty printing (which trailing commas only makes sense in the context of) is already unsupported. So I'm going to publish a PR without C module support.

I've been trying for some time to run the tests and running into errors no matter how I try. Is there a way to run the tests as they are? With Python 3.12, my default Python, it doesn't find distutils, which I guess is gone from Python 3.12, and when I use pyenv to select 3.8.10 and 3.9.5 like is called for in the ion-python README, when I run tox I get:

...
TypeError: canonicalize_version() got an unexpected keyword argument 'strip_trailing_zero'

and when I run python setup.py test I get:

...
    import setuptools.command.test as orig
ModuleNotFoundError: No module named 'setuptools.command.test'

The above errors were solved by:

python -m pip install 'setuptools<71.0.0'

setuptools.command.test was apparently removed in setuptools 72, but 71 also had the canonicalize_version() got an unexpected keyword argument 'strip_trailing_zero' error, so I went back to before 71.0.0 (70.3.0) which did not have the problem.

@rmarrowstone
Copy link
Contributor

Sorry for your troubles, we need to prioritize getting off of our legacy build setup:
#305

@mavjop
Copy link
Contributor Author

mavjop commented Oct 10, 2024

No worries, Rob. :)

@mavjop
Copy link
Contributor Author

mavjop commented Oct 10, 2024

There are 3 unit tests still failing, with all of my changes in #372 ... but they appear to be entirely unrelated to my changes. I'm wondering if a Python / library version difference changed the limit for string parsing of integer values.

The 3 test failures are all because of extremely long integer strings in good/subfieldVarInt.ion.

I've elided some of the enormous integer string in this:

_ test_roundtrips[GOOD_ROUNDTRIP - good/subfieldVarInt_ion (binary) Decimal('0_012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789...012345678')] _

p = <tests.test_vectors._Parameter object at 0x116772bf0>

    @parametrize(*chain(
        _comparison_params(_T.GOOD_ROUNDTRIP, _GOOD_SUBDIR),
        _comparison_params(_T.GOOD_ROUNDTRIP, _GOOD_TIMESTAMP_SUBDIR),
        _comparison_params(_T.GOOD_EQUIVS_TIMESTAMP_INSTANTS_ROUNDTRIP, _EQUIVS_TIMELINE_SUBDIR),
        _comparison_params(_T.GOOD_EQUIVS_ROUNDTRIP, _EQUIVS_SUBDIR),
        _comparison_params(_T.GOOD_EQUIVS_ROUNDTRIP, _EQUIVS_UTF8_SUBDIR),
        _comparison_params(_T.GOOD_NONEQUIVS_ROUNDTRIP, _NONEQUIVS_SUBDIR),
    ))
    def test_roundtrips(p):
        """Tests roundtrips of good files for equality. Pre-parsing the streams is necessary to generate the comparison
        pairs, so there will necessarily be failures in parameter generation for this test if any good files fail to parse.
        """
>       p.test_thunk()

tests/test_vectors.py:401:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/test_vectors.py:343: in roundtrip
    assert ion_equals(value, _roundtrip(value, is_binary))
tests/test_vectors.py:337: in _roundtrip
    roundtripped = load(out)
amazon/ion/simpleion.py:274: in load
    return load_python(fp, catalog=catalog, single_value=single_value, parse_eagerly=parse_eagerly)
amazon/ion/simpleion.py:405: in load_python
    _load(out, reader)
amazon/ion/simpleion.py:472: in _load
    if event.value is None or ion_type is IonType.NULL or event.ion_type.is_container:
amazon/ion/core.py:286: in value
    self.cached_value = self[2]()
amazon/ion/reader_managed.py:111: in value_thunk
    value = ion_event.value
amazon/ion/core.py:286: in value
    self.cached_value = self[2]()
amazon/ion/reader_binary.py:532: in parse_decimal
    return _parse_decimal(BytesIO(data))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

buf = <_io.BytesIO object at 0x11b89c720>

    def _parse_decimal(buf):
        """Parses the remainder of a file-like object as a decimal."""
        exponent = _parse_var_int(buf, signed=True)
        sign_bit, coefficient = _parse_signed_int_components(buf)

        if coefficient == 0:
            # Handle the zero cases--especially negative zero
            value = Decimal((sign_bit, (0,), exponent))
        else:
            coefficient *= sign_bit and -1 or 1
            with localcontext() as context:
                # Adjusting precision for taking into account arbitrarily
                # large/small numbers
>               context.prec = len(str(coefficient))
E               ValueError: Exceeds the limit (4300) for integer string conversion; use sys.set_int_max_str_digits() to increase the limit

@rmarrowstone
Copy link
Contributor

Those tests are pulled in through a submodule: https://github.com/amazon-ion/ion-tests/commits/master/
It looks like there were some recent additions there that are likely the cause.

While fixing it would be good, there should be some skip lists in that test_vectors that you could add that test to for now.

@mavjop
Copy link
Contributor Author

mavjop commented Oct 10, 2024

Issue #229 ("CI/CD failed due to large int<->str conversions") looks like it's still open (since December 2022!) and might be talking about the same issue (i.e. issue is preexisting?).

Or perhaps not, since it's PYPY-specific, which isn't applicable.

@mavjop
Copy link
Contributor Author

mavjop commented Oct 10, 2024

It is in various skip lists, but not in the global one (that applies to the pure Python implementation).

I found the problem!

My Python version is likely a newer update of 3.10 than other devs are using (I'm on 3.10.15).

Python 3.10.7 introduced a breaking change for type conversion, in order to prevent DOS attacks.

So the tests won't fail for Python 3.10 before 3.10.7, and I assume probably won't fail for 3.9.5 or 3.8.10 that are called out in https://github.com/amazon-ion/ion-python/blob/master/README.md.

Verified

I verified with a build/test on Python 3.9.5 that all tests now pass with flying colours.

TL;DR:

The tests failing for me shouldn't actually cause test failures in validation analyzers, if they're not using a newer Python than called for in the README.

@mavjop
Copy link
Contributor Author

mavjop commented Oct 10, 2024

I created a separate PR to update the README about setuptools version needs. #373

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants