Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for down-converting ion to JSON #310

Merged
merged 3 commits into from
Dec 6, 2022

Conversation

nirosys
Copy link
Contributor

@nirosys nirosys commented Dec 1, 2022

Issue #, if available: n/a

Description of changes:
This PR adds functionality to the ion text writer to allow for a basic lossy conversion to JSON encoding. The down-conversion is done in the same manner described in the cookbook, with a couple minor differences that will be corrected soon.

The most noticeable difference is the ranges in which unicode is escaped. Currently, any non-ascii characters are emitted as unicode escape sequences. I'll follow up with a fix for that soon. I have identified the issue, and just need to implement the fix and test it.

An existing issue that needs some consensus is how multiple top-level values are handled. With this commit multiple top-level items are serialized with no delimeter. This is supported by some JSON parsers, such the one used by the tool jq, but may be an issue with others. A simple hacky fix can be added to place all top-level values into a list, but I didn't feel comfortable with the implementation unless it is truly needed. Another option might be to use JSON Lines (website seems to be down).

The changes add a new option to ION_WRITER_OPTIONS called json_downconvert which, when true will format any output written to the writer using the JSON friendly formatting described in the cookbook link (minus the above notes).

An example of converting ion data to json, can be found in test/test_ion_text.cpp in the convert_to_json function.

This PR also extends the CLI to allow the output formats json and json-pretty, which perform the down conversion with pretty printing off, and on, respectively. Examples of using ion process to perform downconversion has also been added to the help.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

This commit adds functionality to the ion text writer to allow for
a basic lossy conversion to JSON encoding.

An existing issue that needs some consensus is how multiple top-level
values are handled. With this commit multiple top-level items are
serialized with no delimeter. This is supported by some JSON parsers,
such the one used by the tool jq, but may be an issue with others. A
simple hacky fix can be added to place all top-level values into a list.
@nirosys
Copy link
Contributor Author

nirosys commented Dec 1, 2022

The appveyor build is failing with:

[00:02:08] [ RUN      ] IonTextDownconvert.IntsFloatsAndDecimals
[00:02:08] C:\projects\ion-c-3akm7\test\test_ion_text.cpp(1408): error:       Expected: IERR_OK
[00:02:08]       Which is: 0
[00:02:08] To be equal to: convert_to_json("1.0e0", json_text, sizeof(json_text))
[00:02:08]       Which is: 7
[00:02:08] [  FAILED  ] IonTextDownconvert.IntsFloatsAndDecimals (2 ms)

The error returned (value: 7) is IERR_UNRECOGNIZED_FLOAT.

MacOS build with gcc-11 failed due to an assertion in the linker.. which is.. neat.

[ 62%] Linking CXX shared library libion_events.dylib
0  0x106059ffa  __assert_rtn + 139
1  0x105e8d28d  mach_o::relocatable::Parser<x86_64>::parse(mach_o::relocatable::ParserOptions const&) + 4989
2  0x105e7df8f  mach_o::relocatable::Parser<x86_64>::parse(unsigned char const*, unsigned long long, char const*, long, ld::File::Ordinal, mach_o::relocatable::ParserOptions const&) + 207
3  0x105ef49d4  ld::tool::InputFiles::makeFile(Options::FileInfo const&, bool) + 2036
4  0x105ef7fa0  ___ZN2ld4tool10InputFilesC2ER7Options_block_invoke + 48
5  0x7ff81a1e234a  _dispatch_client_callout2 + 8
6  0x7ff81a1f4c45  _dispatch_apply_invoke_and_wait + 213
7  0x7ff81a1f4161  _dispatch_apply_with_attr_f + 1178
8  0x7ff81a1f4327  dispatch_apply + 45
9  0x105ef7e2d  ld::tool::InputFiles::InputFiles(Options&) + 669
10  0x105e68d48  main + 840
A linker snapshot was created at:
	/tmp/libion_events.dylib-2022-12-01-112417.ld-snapshot
ld: Assertion failed: (_file->_atomsArrayCount == computedAtomCount && "more atoms allocated than expected"), function parse, file macho_relocatable_file.cpp, line 2061.
collect2: error: ld returned 1 exit status

@nirosys
Copy link
Contributor Author

nirosys commented Dec 1, 2022

MacOS issue was due to a bug in Xcode 14.0.1. I'm running 14.1 and I was unable to reproduce the issue, so I've added a build step for macos builds in the GH Actions workflow to set the default xcode to 14.1 (it is installed in the image, just not used by default). That seems to have fixed the MacOS issue.. now on to the windows issue..

@nirosys
Copy link
Contributor Author

nirosys commented Dec 2, 2022

Reproduced the issue in the windows build, tracked it down to a misplaced #endif. Everything should be green now.

@nirosys nirosys marked this pull request as ready for review December 2, 2022 08:06
@nirosys nirosys requested a review from tgregg December 2, 2022 08:07
Copy link
Contributor

@tgregg tgregg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Minor comments inline.

Regarding how multiple top-level values should be handled, I'm fine with the behavior as-is. Namely, that newlines will separate top-level values when pretty is enabled, but not when pretty is disabled. If users need to use the output with a JSON parser that doesn't support multiple top-level values, they can easily accommodate that manually, e.g. by wrapping everything in a single list.

We could add a lines format option, but I see that as orthogonal to this change, as it could apply to both text Ion and JSON. I'm aware of use cases where "Ion lines" has come in handy.

Comment on lines +733 to +737
if (json_downconvert)
ION_PUT(pwriter->output, '"');
IONCHECK(_ion_writer_text_append_ascii_cstr(pwriter->output, temp));
if (json_downconvert)
ION_PUT(pwriter->output, '"');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We haven't defined a formal style guide for this repo, but let's use braces for every if (for all new code anyway).


if (pwriter->depth == 0 && pwriter->annotation_count == 0 && pstr->value[0] == '$'
&& _ion_symbol_table_parse_version_marker(pstr, NULL, NULL)) {
// The text $ion_<int>_<int> is reserved for the IVMs. This is a no-op.
SUCCEED();
}
else {
char quote = ION_TEXT_WRITER_IS_JSON() ? '"':'\'';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can use down_convert here for consistency.

{
iENTER;
SIZE written;
char quote_char = (pwriter->options.json_downconvert) ? '"' : '\'';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ION_TEXT_WRITER_IS_JSON() for consistency

ION_PUT(poutput, '\'');
if (as_ascii) {
IONCHECK(_ion_writer_text_append_escaped_string(poutput, p_str, '\''));
if (_ion_symbol_needs_quotes(p_str, system_identifiers_need_quotes) || pwriter->options.json_downconvert) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ION_TEXT_WRITER_IS_JSON() for consistency

Comment on lines +1262 to +1265
if (!down_convert)
image = _ion_writer_get_control_escape_string(c);
else
image = _ion_writer_get_control_escape_string_json(c);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a minor preference to swap the bodies and lose the negation. Disregard if this style was chosen deliberately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, that wasn't done for any particular reason, definitely reads better without the negation.

IONJSON_CMP("-inf", "null");
IONJSON_CMP("1.0e0", "1");
IONJSON_CMP("1.5e0", "1.5");
IONJSON_CMP("1e-5", "1e-05");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's... interesting? Is 05 necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exponent of 5 is needed to force the float to get rendered in exponential notation. The literal 05 in the string isn't required by anything, it's just the way ion-c is currently rendering exponents to text.

IONJSON_CMP("1d0", "1");
IONJSON_CMP("1.", "1");
IONJSON_CMP("1d-0", "1");
IONJSON_CMP("0d1", "0E+1");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we're using lowercase e for some, and uppercase E for others?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, the way decimals are rendered differs from how floats are. The decNumber library handles the decimals, while we're using snprintf for floats. I'm adding a task to get those following the same rules to the plan. Negligible priority I'd imagine, but annoying none the less.

@nirosys
Copy link
Contributor Author

nirosys commented Dec 6, 2022

Thank You @tgregg!

I'm going to merge this tonight and follow up with a PR for the changes mentioned above if no one has any arguments against that.

@nirosys nirosys merged commit 460c67f into amazon-ion:master Dec 6, 2022
nirosys added a commit to nirosys/ion-c that referenced this pull request Jan 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants