Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Roundtrip tests are too strict #33

Open
nlohmann opened this issue Jan 30, 2016 · 6 comments
Open

Roundtrip tests are too strict #33

nlohmann opened this issue Jan 30, 2016 · 6 comments
Assignees

Comments

@nlohmann
Copy link
Contributor

There is an issue in the roundtrip tests: The current tests check that the output of a library exactly match the input. However, the JSON specification has several ways to specify a number. For instance, number 1e3 could also be written as 1E3, 1e+3, 1E+3, or 1000. Therefore, any library which, reading 1e3 would perform any of these outputs should pass the roundtrip test. Requiring the exact way the number is returned is arbitrary and would be similar to requiring that the order of object keys should remain the same or even requiring whitespace to be preserved.

The current tests are in favor of those libraries which (by chance) decide to serialize numbers with letter "e" rather than with letter "E". This affects files roundtrip24.json--roundtrip27.json. When the exponent is changed (e to E) or the optional + is added, the test results differ.

More details can be found in the original discussion at nlohmann/json#187.

@miloyip
Copy link
Owner

miloyip commented Jan 30, 2016

There are some discussion on #22 as well.

The current rountrip test is:

d = Parse(json)
json2 = Serialize(d)
Test(json == json2)

To make it better, what if we relax it in this way:

d = Parse(json);
json2 = Serialize(R)
d2 = Parse(json2);
json3 = Serialize(d2)
Test(json2 == json3)

So that it just make sure that a library will maintain its own format. This is actually need in some situations.

@gregmarr
Copy link

I would say that the equality of the numerical form is more important than equality of the string form for floathing point values.

@miloyip
Copy link
Owner

miloyip commented Jan 31, 2016

@gregmarr A problem is that, when the parser firstly parses the source JSON, the parsed number may already be incorrect. My proposal above will also have this problem.

@miloyip
Copy link
Owner

miloyip commented Jan 31, 2016

@nlohmann Proposal 2: explicitly check the output to handle e/E, and oprtional + in exponent.

@miloyip
Copy link
Owner

miloyip commented Jan 31, 2016

I just checked the result of @nlohmann 's library:

roundtrip24: false
Expect: [5e-324]
Actual: [4.94065645841247e-324]

roundtrip25: false
Expect: [2.225073858507201e-308]
Actual: [2.2250738585072e-308]

roundtrip26: false
Expect: [2.2250738585072014e-308]
Actual: [2.2250738585072e-308]

roundtrip27: false.
Expect: [1.7976931348623157e308]
Actual: [179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.0]

The problem is not related to e/E and optional + in exponent.

These strings are correctly stringified doubles. By "correctness" it means they are convertible back to the exact value of IEEE-754 double, without ambiguity. It is also to be the shortest textual representation.

For example, 2.225073858507201e-308 and 2.2250738585072014e-308 represents two distinct double values 0x000FFFFFFFFFFFFF and 0x0010000000000000. But the library stringify them to a single textual representation 2.2250738585072e-308. Can check these converisons on http://babbage.cs.qc.edu/courses/cs341/IEEE-754.html.

In https://github.com/miloyip/dtoa-benchmark, different conversions routine has been tested.

void dtoa_ostringstream(double value, char* buffer) {
    std::ostringstream oss;
    oss << std::setprecision(17) << value;
    strcpy(buffer, oss.str().c_str());
}

and

void dtoa_sprintf(double value, char* buffer) {
    sprintf(buffer, "%.17g", value);
}

should give "correct" result.

It may be argued that, the JSON spec did not say about subnomal double so it should not be checked as "conformance".

@ColinH
Copy link
Contributor

ColinH commented Feb 13, 2016

Regarding the setup of the roundtrip tests, as Milo wrote it's currently like this:

json_object = parse( original_json )
generated_json = serialize( json_object )
assert( original_json == generated_json)

Or, using roundtrip( x ) as shortcut for serialise( parse( x ) ), we can write this as:

assert( json == roundtrip( json ) )

Which works pretty well, but as noted, this could fail if some library encoded e.g. floating point exponents differently, i.e. in an allowed way that doesn't change the meaning. Milo also suggested the following solution:

assert( roundtrip( json ) == roundtrip( roundtrip( json ) ) )

This fixes the issue, but let's take a step back. A roundtrip test should check that, when some json input is "round tripped" through a library, the resulting json output is in the same equivalence class as the input, assuming we define two json representations as equivalent if they "encode the same data" (without diving into a formal definition...)

In order to check whether two inputs are equivalent, we can use a roundtrip through a reference implementation as a kind of normal-form generator that always yields the same output representation for each equivalence class of json inputs. In other words, we could use the following approach for the roundtrip conformance test:

assert( reference_roundtrip( json ) == reference_roundtrip( testee_roundtrip( json ) ) )

Unlike the other suggestion, this can't be fooled by a "stupid" or "malicious" library that always serialises to null. However it does not test whether a library keeps to its own format, so perhaps both should be implemented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants