-
-
Notifications
You must be signed in to change notification settings - Fork 6.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
json start/end position implementation #4517
base: develop
Are you sure you want to change the base?
Conversation
Thanks for the effort! However, adding two I am hesitant how to continue here. |
I wonder if this is something that could be done with the data in the custom base class, so that only those that want to opt in to this behavior could enable it. That does make it a custom class, and not |
Taking that advice, I'm gonna add a new class like below to use as a json custom base class
We will use |
include/nlohmann/detail/json_base_class_with_start_end_markers.hpp
Outdated
Show resolved
Hide resolved
include/nlohmann/detail/json_base_class_with_start_end_markers.hpp
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments from my side. I am honestly not 100% sold to the use case this PR is solving. As I stated in #4455 (comment), I think we preserve every relevant part of the raw input except whitespace. Did I miss anything?
…s PR comments and fix CI failures
Whitespace is key to our use case - my colleague will provide a more detailed explanation in #4455 soon. |
include/nlohmann/detail/json_base_class_with_start_end_markers.hpp
Outdated
Show resolved
Hide resolved
The main issue we are trying to solve is when needing to sign the string corresponding to a nested object in a json, example: Even thought this is a somewhat niche scenario, it's mostly exposing information nlohmann has access to and, as an optional feature, I think it can add value to the library, as doing it outside it would require some form of parsing the contents of the string again, doubling the complexity and involving using two separate json parsers where this change allows the use of a single one. |
I think it's going to be difficult to use the macro for I had another thought about these:
What if we created two set functions, and put them in the base class, with no-ops in the main class if the base class isn't used? Then the function can always be called, and you don't need to protect the one-line callers, but you would want to protect the more involved ones because of the extra code involved.
|
@gregmarr I'd considered that, but decided against it to minimize changes to basic_json. Since it's a no-op I'm guessing the compiler might just optimize those away. I will update with that change |
Yeah, since this is header-only, the call itself should definitely get eliminated, and then any computations that are just related to that will hopefully be eliminated too. |
include/nlohmann/json_fwd.hpp
Outdated
@@ -70,6 +71,20 @@ struct ordered_map; | |||
/// @sa https://json.nlohmann.me/api/ordered_json/ | |||
using ordered_json = basic_json<nlohmann::ordered_map>; | |||
|
|||
/// @brief a minimal specialization that uses the base class json_base_class_with_start_end_markers | |||
using json_with_start_end_markers = nlohmann::basic_json < |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope we can find a better name for this feature...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Open to discussion for names :). I'm terrible at coming up with good names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I had a better one, I'd have dropped it here. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the base class, how about json_base_with_positions
? or json_base_positional
?
Then this could be json_with_positions
, or json_positional
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented the macros, so base class is gone. basic_json now exposes two property getters: start_pos()
and end_pos()
, if the DIAGNOSTIC_POSITIONS
macro is set. Otherwise there is no additional memory footprint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this PR, changing the name to JSON_DIAGNOSTIC_POSITION
would be sufficient. As possible next step, we should also pick up the positions to provide better error message since we now know the positions of each part of the JSON.
@gregmarr After some prototyping I have realized that to correctly enable SFINAE in the derived class, we need to add the following:
The methods need the alternate implementation in order to work with SFINAE. The already implemented versions in the base class do not count for the SFINAE pattern's error ignore. This results in a rather ugly The macros that I've added in the recent commit don't work with C++11, however, that can be fixed by adding a new version of the macros that takes no arguments and removes the varargs, which makes the CI checks happy. I will implement that in my next commit instead. |
A very wild idea: why not adding another macro like DIAGNOSTIC_POSITIONS and implement this without inheritance? |
There is no SFINAE for the simple functions with this method.
Since object and array do the same thing, that could also be simplified to reduce the number of functions:
That's certainly a possibility. I thought the base class might be cleaner, but it changes the types, so maybe it isn't. We'd need to encode this define into the intermediate namespace selection so that we don't link against something with the wrong definition, as we do for https://github.com/nlohmann/json/blob/develop/include/nlohmann/detail/abi_macros.hpp#L33-L37 |
This seems feasible, thanks for the note on the intermediate namespace @gregmarr. We are prototyping this locally as an option and will post the next iteration based on our findings. |
Co-authored-by: Sush Shringarputale <sushring@linux.microsoft.com>
@nlohmann pushed a new commit that implements the diagnostic style macro. I personally think this looks cleaner as well, PTAL. |
@@ -26,6 +26,10 @@ | |||
#define JSON_DIAGNOSTICS 0 | |||
#endif | |||
|
|||
#ifndef DIAGNOSTIC_POSITIONS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this define start with JSON_ like the others?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and we should also have a brief first version of a documentation ready like https://raw.githubusercontent.com/nlohmann/json/develop/docs/mkdocs/docs/api/macros/json_diagnostics.md - I can take care about the rest.
|
||
#include <nlohmann/detail/exceptions.hpp> | ||
#include <nlohmann/detail/macro_scope.hpp> | ||
#include <nlohmann/detail/string_concat.hpp> | ||
|
||
#include <nlohmann/detail/input/lexer.hpp> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lost a blank line here.
#if DIAGNOSTIC_POSITIONS | ||
start_position = val.start_position; | ||
end_position = val.end_position; | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blank line after?
@@ -1204,15 +1209,24 @@ class basic_json // NOLINT(cppcoreguidelines-special-member-functions,hicpp-spec | |||
default: | |||
break; | |||
} | |||
#if DIAGNOSTIC_POSITIONS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blank line before?
} | ||
private: | ||
/// the start position of the value | ||
size_t start_position = std::string::npos; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these data members should be next to the other data members, down around 4265, probably after the existing ones so as to not unnecessarily change the order of data members. We also need to make sure that the members are initialized in the constructors in same order, as compilers will warn if they're not in the same order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made a judgement call on the location, thanks for providing the appropriate one. I’ll make sure to run the ci tools locally for the compiler checks
@@ -417,6 +417,9 @@ typename container_input_adapter_factory_impl::container_input_adapter_factory<C | |||
return container_input_adapter_factory_impl::container_input_adapter_factory<ContainerType>::create(container); | |||
} | |||
|
|||
// specialization for std::string | |||
using string_input_adapter = decltype(input_adapter(std::declval<std::string>())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using string_input_adapter_type =
?
@@ -1631,7 +1631,8 @@ TEST_CASE("CBOR") | |||
}; | |||
|
|||
json j; | |||
auto cbp = nlohmann::detail::json_sax_dom_callback_parser<json>(j, callback, true); | |||
auto ia = nlohmann::detail::input_adapter(input); | |||
auto cbp = nlohmann::detail::json_sax_dom_callback_parser<json, decltype(ia)>(j, callback, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string_input_adapter_type
?
@@ -219,7 +219,8 @@ json parser_helper(const std::string& s) | |||
CHECK(j_nothrow == j); | |||
|
|||
json j_sax; | |||
nlohmann::detail::json_sax_dom_parser<json> sdp(j_sax); | |||
auto ia = nlohmann::detail::input_adapter(s); | |||
nlohmann::detail::json_sax_dom_parser<json, decltype(ia)> sdp(j_sax); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string_input_adapter_type
?
|
||
json j_sax; | ||
auto ia = nlohmann::detail::input_adapter(s); | ||
nlohmann::detail::json_sax_dom_parser<json, decltype(ia)> sdp(j_sax); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string_input_adapter_type
?
{ | ||
public: | ||
explicit sax_no_exception(json& j) | ||
: nlohmann::detail::json_sax_dom_parser<json>(j, false) | ||
: nlohmann::detail::json_sax_dom_parser<json, decltype(nlohmann::detail::input_adapter(""))>(j, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string_input_adapter_type
?
The ci_clang_tidy job keeps complaining about the parser callback being passed by value in json_sax, but that’s not really any code we’re modifying in this PR. What’s the suggestion for that? |
I think everywhere else we pass it as non-const parameter, which seems to be OK for Clang-Tidy. |
Abstract
Referring to discussion: 4455, this pull request introduces the implementation to retrieve the start and end positions of nested objects within the JSON during parsing.
Motivation
We have a service implementation with JSON schema where a field within the nested objects contains the hash value for that object. The service verifies the hash value of each of the nested objects before operating on the rest of the data sent.
For example, consider the following JSON:
Here, data_hash contains the hash of the object "details". In order to verify the data hash, we need to be able to retrieve the exact string that parsed out "details" including the spaces and newlines. Currently there is no way to achieve this using nlohmann/json parser.
Changes proposed
size_t start_position
andsize_t end_position
.Memory considerations
We considered storing substrings in the output JSON objects and sub-objects directly as well, however, considering the memory footprint increase that it would create, we opted for the option where only two size_t fields are stored per basic_json created.
Validation
We have added tests to the class_parser test suite that cover the following cases:
Since the change affects the sax_parser, for each of these test cases we validate scenarios where no callback is passed, a callback is passed that accepts all fields, and a callback is passed that filters specific fields.
Pull request checklist
Read the Contribution Guidelines for detailed information.
include/nlohmann
directory, runmake amalgamate
to create the single-header filessingle_include/nlohmann/json.hpp
andsingle_include/nlohmann/json_fwd.hpp
. The whole process is described here.Please don't
#ifdef
s or other means.