-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Syntactical Ordering of Object Keys #53
Syntactical Ordering of Object Keys #53
Conversation
Merge master
Merging the latest version
Preserving order of fields: Alternative
This looks reasonable, but could you run benchmarks (I think we have some in the test suite) to see if this has a significant impact on performance? Our Jsonnet compilation at work is moderately slow, so I want to make sure this doesn't negatively impact performance too much. If using |
One more thing is I think we should have a slightly more thorough test suite, e.g. to verify the behavior when:
There are probably more cases you can come up with, but we should err on the side of thoroughness when for this PR, since otherwise I'm sure this functionality will be accidentally broken in future |
Thanks for taking a look, I'll work on the performance tests and increased test coverage and hopefully have updates for you soon! |
…ent preserveOrdering implementation
I've now updated this PR with all the suggested tests and a number of additional tests, it went pretty smoothly other than a brief sidetracking when I accidentally left off a comma between objects and had them combine on me instead, which is syntax I'd forgotten about. It did uncover one additional code change needed, since objectFields and objectFieldsAll did their own sorting of keys, which is done. In addition to adding tests for all methods that look plausible, I searched for places where sorting happens, and did not discover any more that this PR should change the behavior of. I also discovered one (possible?) bug, around manifestXmlJsonml, where the example in the jsonnet docs errors in SJSonnet due to numerical attribute values. I've filed that as #56. I say possible because, while the example works on jsonnet.org, the function is pretty underspecified. I believe the JsonML spec does allow numbers in that particular place, though. |
I hope to have benchmarking results soon. |
…rrently fails when using a key function
…nnet == operation
…son logic in key function branches
@lihaoyi-databricks We've run the benchmarks, and the results suggest any speed differences due to the changes are very very small, so I'm hopeful we'll be able to merge with this approach. Two of us ran the benchmarks, on different but basically identical computers. Once was before a small number of the changes due to tests were in, and the other time was after. In both cases, we ran the version of SjsonnetTestMain at commit 2108b89, which is one commit behind master. The most recent version is unrunnable due to including several kubernetes configs that do not exist in the repository. We ran the tests with The two codebases were modusintegration:master (first run 6366b0a, next run 06da4df) and databricks:master (83cabbe) Results are the number of times a set of tests can be run in 20 seconds. Higher numbers are better. First runsIn the first set of runs we hadn’t yet established a formal procedure, but while no applications were quit, nothing was being done on the computer during the tests. No burn in run was done. Ten runs were recorded for each codebase. The ten test results, in order for modusintegration:master (6366b0a) 2936 The ten test results, in order, for databricks:master 2808 The means are very close. The mean without ordering code is about 85 higher than the mean with. This about about 3/4 the standard deviation of each sample, and the standard deviation is low relative to the values, so not very large. Specifically, a one-sided t-test says there’s a slightly greater than 5% chance they’re both from the same distribution, so the null hypothesis should be rejected and we should assume there’s no difference. Second runsIn the second set of runs, non-involved applications were quit, and nothing was done on the computer during test runs. For each codebase, the benchmark was run first once without recording the result, then ten runs that were recorded. The ten run results, in order, for modusintegration:master (06da4df) 2880 The ten run results, in order, for databricks:master 2990 The means were virtually identical. In fact, the mean for with ordering is a very tiny bit higher. The standard deviation for each is much higher than the mean, though about twice as high for the with ordering runs. In both cases the standard deviation is modest relative to the mean. The t-test probability that they are from same the distribution is very high, so we should again assume there is no difference. |
@fugu13 this looks good to me! Happy to merge as is. Going forward I'll be leaning heavily on your test suite to make sure future changes do not violate any ordering-related properties that are important to you. Hopefully it is enough to catch regressions, otherwise we can add further tests when any issues turn up in future |
Thanks! And yeah, I feel pretty good about the test suite maintaining ordering everywhere that makes sense, but we're also going to be measured about what we tell people is definitely not going to change. Would it be possible to have a released pushed to the maven repos soon with the updates? By the way, thank you so much for creating SJSonnet, it's made integrating JSonnet with the java codebases common in the enterprise service bus world feasible. We've really enjoyed working with the SJSonnet codebase, too, it's worked well for our needs even though it wasn't written with our somewhat different data transformation scenario in mind. |
tagged and uploaded sjsonnet 0.2.4 to maven central |
This adds a flag that makes object key ordering based on JSonnet syntactical ordering, by using ordered maps internally and skipping the sort step when the flag is provided.
I'm part of a team reusing SJSonnet to support data transformation (https://datasonnet.com, and we're still very early days). Our main audience is highly technical business analysts who are experts in the data schemas and formats they're manipulating, but infrequent programmers. The reordering behavior of JSonnet is one of the most confusing things to them about JSonnet programs. Additionally, some of the code we work with taking JSonnet output is ordering-sensitive (unfortunately).
Beyond our use case, I've noticed that requests for an ordering option have popped up several times in general JSonnet discussions. That does not look likely to happen across all implementations, but I think having it available would likely still be of interest to many SJSonnet users.
While this PR goes beyond that, since ordering flows naturally once ordered maps are used, the only two well-defined ordering guarantees this PR attempts to provide when using the flag are:
Since there are many places output can be generated, and in most/all of those cases consistent behavior is important (such as objects rendered during errors vs in output), we've made that a global flag on the evaluator.
We're happy to revise or do other further work as needed if it will help include this capability in SJSonnet.
Thanks for looking,
Russell