Integer conversion to unsigned #178
It is possible that the value was unsigned in the json, and was converted to a negative value for storage in the json object. It would have to be a really big unsigned number with the default storage size being int64_t, but it's still possible. The json object would need to store additional data to know whether it was negative or not originally. |
It would be better to check integer size and signedness in the parser. Successfully parsed JSON should contain valid data. I forgot that NumberIntegerType can be changed, and can even be unsigned, so the condition for the conversion check should be like this:
if(std::is_unsigned<T>::value && std::is_signed<NumberIntegerType>::value && res < 0)
Another possibility is to remove the static_cast and let the user handle conversions, but I'm not sure that it will not break the API. In any case, a silent incorrect conversion is not a good thing. |
I think this also has the possibility to break existing programs, though at runtime rather than compile time. Currently, if someone populates a json object with an unsigned integer value larger than the max size for NumberIntegerType, it will be encoded in the JSON string as signed. Parsing that string and extracting the value as unsigned will currently produce the same result as the original input. This change would make it throw an exception instead. |
So it really ignores integer overflow and does not differentiate between signed and unsigned; at least this behavior should be noted in the documentation. For now I removed the static_cast to get compiler errors and made a validating wrapper, but considering the above, it is possible that the value would in fact be undefined as a result of overflow or signedness conversion. A simple example: some 3rd-party app can send my app JSON with an integer which needs 128 bits; I expect deserialization of this value to fail, not to produce an invalid value. |
@gregmarr, I might be missing something, but why would a large unsigned value in the JSON be stored as a negative number in the object? Wouldn't this (in the parser):

```cpp
// check if conversion loses precision
const auto int_val = static_cast<number_integer_t>(float_val);
if (approx(float_val, static_cast<long double>(int_val)))
{
    // we would not lose precision -> return int
    result.m_type = value_t::number_integer;
    result.m_value = int_val;
}
else
{
    // we would lose precision -> return float
    result.m_type = value_t::number_float;
    result.m_value = static_cast<number_float_t>(float_val);
}
break;
```

My understanding, and I might be wrong, is that the parser casts from a NumberFloatType to a NumberIntegerType and back again and checks if there is a significant difference. If the number was too large for the NumberIntegerType, won't it therefore fail the loss-of-precision test and just be stored as a NumberFloatType, with the sign preserved?

@Furcube, what are you trying to do exactly? Normally it is perfectly legal to assign a negative value to an unsigned variable. For example:

```cpp
unsigned int val = -1;
std::cout << std::hex << val;
```

compiles without error, runs without exception, and simply produces FFFFFFFF, i.e. the unsigned representation of -1. This is because the compiler does an implicit conversion. I might be misunderstanding, however, but if you are trying to do something like this:

```cpp
unsigned int val = doc["val"].get<unsigned int>();
```

then I would expect it to work, because internally the library has a function of the following form (only in a class, and not just returning a literal but the actual value with a static_cast):

```cpp
template<typename T>
T get() { return -1; }
```

which, if called in the form above, would do the same implicit conversion. I apologize in advance if I am way off with this. I haven't been using the library long, so I might be misunderstanding how it works. |
@twelsby I wasn't referring to parsing, I was referring to creating a json object to be formatted as a JSON string:
The static cast causes the large positive number, such as uint64_t(-1), to be stored and then output as a negative number. |
In the beginning a negative number was used as a test for error handling. I use uint64_t for generated IDs, and I've noticed that the logged ID differs from the ID in the JSON. Additionally, it will make it harder to interface with a Java app, where Jackson can use BigInteger for large values.

As for overflow: it is not so bad, as it can be range-checked outside of the library - large numbers will go into NumberFloatType (double), which allows a range check. For example:

```cpp
std::stringstream jstr;
uint64_t expected = std::numeric_limits<uint64_t>::max();
jstr << "{\"max\":" << expected << "}";
json j = json::parse(jstr.str());
uint64_t u_max = j.at("max");
double d_max = j.at("max");
std::cout << "expected: " << expected << std::endl;
std::cout << "json: " << jstr.str() << std::endl;
std::cout << "dump: " << j.dump() << std::endl;
std::cout << "uint64_t max: " << u_max << std::endl;
std::cout << "double max: " << d_max << std::endl;
std::cout << "is more than max(int64_t)? " << (d_max > std::numeric_limits<int64_t>::max()) << std::endl;
```

with gcc 5.3. I can't get uint64_t from it, don't know why, but the range check is working. |
std::numeric_limits<uint64_t>::max() cannot be stored in a double without loss of accuracy; it has too many digits. |
@gregmarr, @Furcube seems to be talking about parsing, or at least that is what his example does. However, I think the behavior you raise is certainly worthy of its own issue. The loss of accuracy you also mention (which will also occur during parsing, because all numbers are parsed initially as doubles, so anything larger than 2^53 will lose accuracy) may or may not be a problem, as this behavior is in compliance with RFC 7159, which allows implementations to set limits on the range and precision of numbers that are accepted, and also recommends supporting double-precision numbers for interoperability. It would be nice to go all the way to 2^64 - 1, but it is not a necessity.

@Furcube, when I run your example it doesn't execute the code path related to your fix but instead passes through the next switch case. With both VS2015 and gcc 4.9.2 compiling for x64, the subsequent cast behaves the same way. The minimal example that replicates this behavior is:

```cpp
uint64_t x = std::numeric_limits<uint64_t>::max();
double y = static_cast<double>(x);
uint64_t z = static_cast<uint64_t>(y);
std::cout << x << std::endl << y << std::endl << z << std::endl;
```

Here y is only an approximation of x, so z does not round-trip back to the original value.

The moral of the story is that if you parse a number that is larger than 2^53, it will be stored as a double with reduced accuracy, and if you then try to extract it as an integer type, the value may differ from the original.

Incidentally, @nlohmann, I notice there have been issues with Visual Studio in the past, but the current version compiles without error or warning in VS2015. C++11 (and C99) support seems to be greatly improved. |
@twelsby Which version of MSVC did you try? Currently MSVC 19.0.23506.0 fails to compile the unit tests, see https://ci.appveyor.com/project/nlohmann/json. |
Here is one possible solution, admittedly a not very elegant brute-force approach:

```cpp
#include <iostream>
#include <cfloat>
#include <cmath>
#include <cstdint>
#include <cstdlib>

// Modify the existing value_t enum
enum class value_t : uint8_t
{
    number_integer,
    number_unsigned, // Add this
    number_float
} m_type;

// Modify the existing json_value union
union json_value
{
    double number_float;
    uint64_t number_unsigned; // Add this
    int64_t number_integer;
} m_value;

const char* m_start;

// Rewrite lexer::get_number()
void get_number()
{
    char* end;

    // Parse it as an integer
    if (*m_start != '-')
    {
        // Unsigned
        m_value.number_unsigned = strtoull(m_start, &end, 0);
        m_type = value_t::number_unsigned;
    }
    else
    {
        // Signed
        m_value.number_integer = strtoll(m_start, &end, 0);
        m_type = value_t::number_integer;
    }

    // Parse it as a double
    double test = strtold(m_start, &end);
    double int_part;
    double frac_part = std::modf(test, &int_part);

    // Test if the double or the integer is the better representation
    if (frac_part != 0 ||
        (m_type == value_t::number_unsigned && int_part != m_value.number_unsigned) ||
        (m_type == value_t::number_integer && int_part != m_value.number_integer))
    {
        m_value.number_float = test;
        m_type = value_t::number_float;
    }
}

template<typename T>
T get_impl(T*)
{
    switch (m_type)
    {
        case value_t::number_integer:
            return static_cast<T>(m_value.number_integer);
        // Add from here ------------------------
        case value_t::number_unsigned:
            return static_cast<T>(m_value.number_unsigned);
        // to here ------------------------------
        case value_t::number_float:
            return static_cast<T>(m_value.number_float);
        default:
            return T();
    }
}

int main()
{
    const char* str[] = {
        "18446744073709551615",
        "9223372036854775807",
        "-9223372036854775808",
        "-123.456e13",
        "123.456E-25",
        "123.456e2"
    };

    for (int i = 0; i < 6; i++)
    {
        m_start = str[i];
        get_number();
        double dout = get_impl(static_cast<double*>(nullptr));
        uint64_t uout = get_impl(static_cast<uint64_t*>(nullptr));
        int64_t iout = get_impl(static_cast<int64_t*>(nullptr));
        std::cout << "In:" << str[i];
        std::cout << " Double:" << dout;
        std::cout << " Unsigned:" << uout;
        std::cout << " Signed:" << iout << std::endl;
    }
    return 0;
}
```

This produces the output I would expect.

The downside of this approach is that it requires parsing the number twice, as an integer and as a double. The upside is that it is no longer necessary to do the precision-loss check, as this is effectively handled by the above code. I haven't looked at the object creation side, but I would think that having the number_unsigned type handled there as well would follow the same pattern. |
@nlohmann, yeah that one. I didn't build the unit tests however and wasn't trying to see if everything worked. Disregard then. |
Extending the enum value_t - doesn't that affect the required storage size? |
It's not a bitfield, so adding another value to the enum doesn't affect the required size. |
I'll try it in the actual code and see how it goes. Although if anyone has a better idea jump in. |
Well, it works well in testing, so I have submitted a pull request. With the existing unit tests, the only errors related to type verification, which were expected. After changing those to accept the new type, it passes. I added extra (equivalent) unit tests for the new type (very tedious) and they also pass, as well as tests that check the behavior in the specific cases discussed here. |
@gregmarr, I understand that. (Sorry for the stupid question...) |
@nlohmann I tend to agree with @gregmarr. Unless something funny is going on, an enum with a uint8_t underlying type stays one byte no matter how many values it has. |
Oh this is so embarrassing... I totally forgot the "8" in uint8_t. |
uint8_t is 8 bits; enums are not bit-based, their values just count 0, 1, 2, 3, ... |
FYI: I'll try to close as many issues as possible until the weekend and publish version 1.0.1 then, before I focus on the 2.0.0 release (I would really like to follow Semantic Versioning and treat the change in the public enum value_t as a breaking change). |
Issue #178 - Extending support to full uint64_t/int64_t range and unsigned type (updated)
Fixed by merging #193. Thanks so much!!! |
It is possible to assign a negative json value to an unsigned variable.
This is a possible patch: