Additional information for sax parser via next token start stop #3165

Open: barcode wants to merge 7 commits into nlohmann:develop from barcode:additional_information_for_sax_parser_via_next_token_start_stop


barcode
Contributor

@barcode barcode commented Nov 28, 2021

This PR requires #3110.

This PR lets a SAX parser obtain additional information about the position (line, column, and byte offset) of each token in the parsed input (e.g. a file). This can be used to give feedback to a user.
To use this feature, a SAX parser has to implement `void next_token_start(std::size_t pos)` and `void next_token_end(std::size_t pos)`. These functions are supported for JSON and for the binary formats (CBOR, BSON, MessagePack, UBJSON).
If the input is JSON, the SAX parser can implement `template<class T1, class T2> void next_token_start(const nlohmann::detail::lexer<T1, T2>& lex)` and `template<class T1, class T2> void next_token_end(const nlohmann::detail::lexer<T1, T2>& lex)` to get access to more detailed information (such as line and column).

For examples, see the tests `unit-sax-parser-extended.cpp` and `unit-sax-parser-store-source-location.cpp`. The former is a general test for these functions, while the latter also serves as an example of a SAX parser that stores token start and end information in JSON nodes.
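
As a rough illustration of the byte-position hooks described above, here is a minimal, library-independent sketch. Only the hook names and their `std::size_t` parameter come from this PR description; `SpanRecorder` and `scan_tokens` are hypothetical names, and the toy driver below is not the library's parser. It also assumes that the end position is one past the last byte of the token, which the description does not specify.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical handler: remembers the byte span of every token it is told about.
struct SpanRecorder
{
    std::vector<std::pair<std::size_t, std::size_t>> spans;
    std::size_t pending_start = 0;
    bool in_token = false;

    // Hook names and parameter types as given in the PR description.
    void next_token_start(std::size_t pos)
    {
        pending_start = pos;
        in_token = true;
    }

    void next_token_end(std::size_t pos)
    {
        assert(in_token && pos >= pending_start);
        spans.emplace_back(pending_start, pos);
        in_token = false;
    }
};

// Toy driver (NOT the library's parser): treats whitespace and commas as
// separators and reports each remaining run of characters as one token.
// Assumption: the end position is one past the last byte of the token.
template <class Handler>
void scan_tokens(const std::string& input, Handler& handler)
{
    const auto is_separator = [](char ch) {
        return ch == ',' || std::isspace(static_cast<unsigned char>(ch)) != 0;
    };

    std::size_t i = 0;
    while (i < input.size())
    {
        if (is_separator(input[i]))
        {
            ++i;
            continue;
        }
        handler.next_token_start(i);
        while (i < input.size() && !is_separator(input[i]))
        {
            ++i;
        }
        handler.next_token_end(i);
    }
}
```

Calling `scan_tokens("12, true  null", recorder)` with a `SpanRecorder` leaves three spans in `recorder.spans`: [0, 2), [4, 8) and [10, 14). A real handler would typically attach such spans to the values it builds so that later validation errors can point back into the input.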

This PR is currently a work in progress. It was opened at this stage only to:

  1. document someone is working on this,
  2. enable people interested to give feedback or try it out and
  3. run the CI to detect issues.

TODO


Pull request checklist

Will be filled out when this PR is ready to merge.

  • Changes are described in the pull request, or an existing issue is referenced.
  • The test suite compiles and runs without error.
  • Code coverage is 100%. Test cases can be added by editing the test suite.
  • The source code is amalgamated; that is, after making changes to the sources in the include/nlohmann directory, run make amalgamate to create the single-header file single_include/nlohmann/json.hpp. The whole process is described here.

@coveralls

coveralls commented Nov 28, 2021

Coverage Status

coverage: 100.0%. remained the same when pulling a416868 on barcode:additional_information_for_sax_parser_via_next_token_start_stop into 5d27543 on nlohmann:develop.

@barcode barcode force-pushed the additional_information_for_sax_parser_via_next_token_start_stop branch 2 times, most recently from fffed12 to 94cff5d Compare December 5, 2021 00:28
@barcode
Contributor Author

barcode commented Dec 5, 2021

I will add some documentation, but otherwise my work on this PR is finished.

@stale

stale bot commented Jan 9, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Jan 9, 2022
simark pushed a commit to simark/babeltrace that referenced this pull request Jun 10, 2022
This patch adds the bt2_common::parseJson() functions in
`parse-json.hpp`.

Those functions wrap the file-internal
`bt2_common::internal::JsonParser` class of which an instance can parse
a single JSON value, calling specific methods of a JSON event listener
as it processes. Internally, `bt2_common::internal::JsonParser` uses a
string scanner (`bt2_common::StrScanner`).

In searching for a simple JSON parsing solution, I could not find, as of
this date, any project which satisfies the following requirements out of
the box:

* Is well-known, well documented, and well tested.

* Has an MIT-compatible license.

* Parses both unsigned and signed 64-bit integers (range
  -9,223,372,036,854,775,808 to 18,446,744,073,709,551,615).

* Provides an exact text location (offset, line number, column number)
  on parsing error.

* Provides an exact text location (offset, line number, column number)
  for each parsed value.

I believe the text locations are essential as this JSON parser will be
used to decode CTF2‑SPECRC‑4.0 [1] auxiliary and metadata streams:
because Babeltrace 2 will be a reference implementation of CTF 2, it
makes sense to make an effort to pinpoint the exact location of
syntactic and semantic errors.

More specifically:

* JSON for Modern C++ (by Niels Lohmann) [2] doesn't support text
  location access, although there's a pending pull request (draft as of
  this date) to add such support [3].

* The exceptions of JsonCpp [4] don't contain a text location, only a
  message.

* SimpleJSON [5] doesn't offer text location access and seems to be an
  archived project.

* RapidJSON [6] doesn't offer text location access.

* yajl [7] could offer some form of text location access (offset, at
  least) with yajl_get_bytes_consumed(), remembering the last offset on
  our side, although I don't know how nicely it would play with
  whitespace.

  That being said, regarding integers, the `yajl_callbacks`
  structure [8] only contains a `yajl_integer` function pointer which
  receives a `long long` value (no direct 64-bit unsigned integer
  support). It's possible to set the `yajl_number` callback for any
  number, but the `yajl_double` callback gets disabled in that case, and
  the callback receives a string which needs further parsing on our
  side: this is pretty much what `bt2_common::StrScanner` implements
  anyway.
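
To make the "further parsing on our side" step concrete, here is a minimal sketch of turning the text of a JSON integer into either a signed or an unsigned 64-bit value, covering the range given in the requirements above. It is not the code of `bt2_common::StrScanner`; `Int64Val` and `parse_int64_text` are hypothetical names, the sketch needs C++17, and it makes the simplifying choice of always returning non-negative values as unsigned.

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <string>
#include <variant>

// Hypothetical result type: negative values are signed, all others unsigned.
using Int64Val = std::variant<std::int64_t, std::uint64_t>;

static_assert(sizeof(long long) == 8, "sketch assumes a 64-bit long long");

// Returns std::nullopt when `text` is not a plain decimal integer or does not
// fit in the range -9,223,372,036,854,775,808 to 18,446,744,073,709,551,615.
std::optional<Int64Val> parse_int64_text(const std::string& text)
{
    const bool negative = !text.empty() && text[0] == '-';
    const std::size_t first_digit = negative ? 1 : 0;

    // Reject empty input, a lone '-', leading whitespace and a leading '+',
    // all of which std::strtoll()/std::strtoull() would otherwise tolerate.
    if (first_digit >= text.size() ||
        std::isdigit(static_cast<unsigned char>(text[first_digit])) == 0)
    {
        return std::nullopt;
    }

    errno = 0;
    char* end = nullptr;

    if (negative)
    {
        const long long val = std::strtoll(text.c_str(), &end, 10);
        assert(end != nullptr);
        if (errno == ERANGE || *end != '\0')
        {
            return std::nullopt;
        }
        return Int64Val{static_cast<std::int64_t>(val)};
    }

    const unsigned long long val = std::strtoull(text.c_str(), &end, 10);
    assert(end != nullptr);
    if (errno == ERANGE || *end != '\0')
    {
        return std::nullopt;
    }
    return Int64Val{static_cast<std::uint64_t>(val)};
}
```

With this sketch, `18446744073709551615` and `-9223372036854775808` are accepted, while `18446744073709551616` and `-9223372036854775809` yield `std::nullopt` because the conversion functions report `ERANGE`.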

At this point I stopped searching as I already had a working and tested
string scanner and, as you can see, without comments, `parse-json.hpp`
is only 231 lines of effective code and satisfies all the requirements
above.
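
The text-location requirement (offset, line number, column number) can be illustrated with a small sketch that derives a line and column from a byte offset. This is only an assumption-laden illustration, not the contents of `parse-json.hpp`: `Loc` and `locate` are hypothetical names, lines and columns are 1-based here, and columns count bytes rather than Unicode code points.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical location record: byte offset plus 1-based line and column.
struct Loc
{
    std::size_t offset;
    std::size_t line;
    std::size_t column;
};

// Walks the text up to `offset`, starting a new line after every '\n'.
// Columns count bytes, so a multi-byte UTF-8 character advances the column
// by more than one.
Loc locate(const std::string& text, std::size_t offset)
{
    assert(offset <= text.size());

    Loc loc{offset, 1, 1};
    for (std::size_t i = 0; i < offset; ++i)
    {
        if (text[i] == '\n')
        {
            ++loc.line;
            loc.column = 1;
        }
        else
        {
            ++loc.column;
        }
    }
    return loc;
}
```

For the three-line text `"{\n  \"a\": x\n}"`, the invalid `x` sits at byte offset 9, which `locate()` maps to line 2, column 8. A scanner that tracks the line and column incrementally while reading avoids this rescan; the linear walk is only the simplest way to show the mapping.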

You can test bt2_common::parseJson() with a simple program like this:

    #include <iostream>
    #include <string>

    #include "parse-json.hpp"

    struct Printer
    {
        void onNull(const bt2_common::TextLoc&)
        {
            std::cout << "null\n";
        }

        template <typename ValT>
        void onScalarVal(const ValT& val, const bt2_common::TextLoc&)
        {
            std::cout << val << '\n';
        }

        void onArrayBegin(const bt2_common::TextLoc&)
        {
            std::cout << "[\n";
        }

        void onArrayEnd(const bt2_common::TextLoc&)
        {
            std::cout << "]\n";
        }

        void onObjBegin(const bt2_common::TextLoc&)
        {
            std::cout << "{\n";
        }

        void onObjKey(const std::string& key,
                      const bt2_common::TextLoc&)
        {
            std::cout << key << ": ";
        }

        void onObjEnd(const bt2_common::TextLoc&)
        {
            std::cout << "}\n";
        }
    };

    int main(const int argc, const char * const * const argv)
    {
        if (argc < 2)
        {
            std::cerr << "Usage: test-parse-json JSON\n";
            return 1;
        }

        Printer printer;

        bt2_common::parseJson(argv[1], printer);
    }

Then:

    $ ./test-parse-json 23
    $ ./test-parse-json '"\u03c9 represents angular velocity"'
    $ ./test-parse-json '{"salut": [23, true, 42.4e-9, {"meow": null}]}'
    $ ./test-parse-json 18446744073709551615
    $ ./test-parse-json -9223372036854775808

Also try some parsing errors:

    $ ./test-parse-json '{"salut": [false, 42.4e-9, "meow": null}]}'
    $ ./test-parse-json 18446744073709551616
    $ ./test-parse-json -9223372036854775809
    $ ./test-parse-json '"invalid \u8dkf codepoint"'

[1]: https://diamon.org/ctf/files/CTF2-SPECRC-4.0.html
[2]: https://github.com/nlohmann/json
[3]: nlohmann/json#3165
[4]: https://github.com/open-source-parsers/jsoncpp
[5]: https://github.com/nbsdx/SimpleJSON
[6]: https://rapidjson.org/
[7]: https://github.com/lloyd/yajl
[8]: https://lloyd.github.io/yajl/yajl-2.1.0/structyajl__callbacks.html

Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>
Change-Id: Id32c2b64723ca50b044369c424fe046c0a183cce
Reviewed-on: https://review.lttng.org/c/babeltrace/+/7411
@gregmarr
Contributor

gregmarr commented Jun 18, 2022

@barcode Are you able to add the documentation to finish up this PR? It also needs to be rebased now.

@stale stale bot removed the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Jun 18, 2022
simark pushed a commit to simark/babeltrace that referenced this pull request Jun 29, 2022
simark pushed a commit to simark/babeltrace that referenced this pull request Jun 30, 2022
eepp added a commit to eepp/babeltrace that referenced this pull request Jul 5, 2022
eepp added a commit to eepp/babeltrace that referenced this pull request Jul 7, 2022
eepp added a commit to eepp/babeltrace that referenced this pull request Aug 2, 2022
This patch adds the bt2_common::parseJson() functions in
`parse-json.hpp`.

Those functions wrap the file-internal
`bt2_common::internal::JsonParser` class of which an instance can parse
a single JSON value, calling specific methods of a JSON event listener
as it processes. Internally, `bt2_common::internal::JsonParser` uses a
string scanner (`bt2_common::StrScanner`).

In searching for a simple JSON parsing solution, I could not find, as of
this date, any project which satisfies the following requirements out of
the box:

* Is well-known, well documented, and well tested.

* Has an MIT-compatible license.

* Parses both unsigned and signed 64-bit integers (range
  -9,223,372,036,854,775,808 to 18,446,744,073,709,551,615).

* Provides an exact text location (offset, line number, column number)
  on parsing error (through logging and the message of an error cause).

* Provides an exact text location (offset, line number, column number)
  for each parsed value.

I believe the text locations are essential as this JSON parser will be
used to decode CTF2‑SPECRC‑4.0 [1] auxiliary and metadata streams:
because Babeltrace 2 will be a reference implementation of CTF 2, it
makes sense to make an effort to pinpoint the exact location of
syntactic and semantic errors.

More specifically:

* JSON for Modern C++ (by Niels Lohmann) [2] doesn't support text
  location access, although there's a pending pull request (draft as of
  this date) to add such support [3].

* The exceptions of JsonCpp [4] don't contain a text location, only a
  message.

* SimpleJSON [5] doesn't offer text location access and seems to be an
  archived project.

* RapidJSON [6] doesn't offer text location access.

* yajl [7] could offer some form of text location access (offset, at
  least) with yajl_get_bytes_consumed(), remembering the last offset on
  our side, although I don't know how nicely it would play with
  whitespace.

  That being said, regarding integers, the `yajl_callbacks`
  structure [8] only contains a `yajl_integer` function pointer which
  receives a `long long` value (no direct 64-bit unsigned integer
  support). It's possible to set the `yajl_number` callback for any
  number, but the `yajl_double` callback gets disabled in that case, and
  the callback receives a string which needs further parsing on our
  side: this is pretty much what `bt2_common::StrScanner` implements
  anyway.

At this point I stopped searching as I already had a working and tested
string scanner and, as you can see, without comments, `parse-json.hpp`
is only 231 lines of effective code and satisfies all the requirements
above.

You can test bt2_common::parseJson() with a simple program like this:

    #include <iostream>
    #include <string>

    #include "parse-json.hpp"

    struct Printer
    {
        void onNull(const bt2_common::TextLoc&)
        {
            std::cout << "null\n";
        }

        template <typename ValT>
        void onScalarVal(const ValT& val, const bt2_common::TextLoc&)
        {
            std::cout << val << '\n';
        }

        void onArrayBegin(const bt2_common::TextLoc&)
        {
            std::cout << "[\n";
        }

        void onArrayEnd(const bt2_common::TextLoc&)
        {
            std::cout << "]\n";
        }

        void onObjBegin(const bt2_common::TextLoc&)
        {
            std::cout << "{\n";
        }

        void onObjKey(const std::string& key,
                      const bt2_common::TextLoc&)
        {
            std::cout << key << ": ";
        }

        void onObjEnd(const bt2_common::TextLoc&)
        {
            std::cout << "}\n";
        }
    };

    int main(const int argc, const char * const * const argv)
    {
        if (argc < 2)
        {
            std::cerr << "Usage: test-parse-json JSON\n";
            return 1;
        }

        Printer printer;

        bt2_common::parseJson(argv[1], printer);
    }

Then:

    $ ./test-parse-json 23
    $ ./test-parse-json '"\u03c9 represents angular velocity"'
    $ ./test-parse-json '{"salut": [23, true, 42.4e-9, {"meow": null}]}'
    $ ./test-parse-json 18446744073709551615
    $ ./test-parse-json -9223372036854775808

Also try some parsing errors:

    $ ./test-parse-json '{"salut": [false, 42.4e-9, "meow": null}]}'
    $ ./test-parse-json 18446744073709551616
    $ ./test-parse-json -9223372036854775809
    $ ./test-parse-json '"invalid \u8dkf codepoint"'

[1]: https://diamon.org/ctf/files/CTF2-SPECRC-4.0.html
[2]: https://github.com/nlohmann/json
[3]: nlohmann/json#3165
[4]: https://github.com/open-source-parsers/jsoncpp
[5]: https://github.com/nbsdx/SimpleJSON
[6]: https://rapidjson.org/
[7]: https://github.com/lloyd/yajl
[8]: https://lloyd.github.io/yajl/yajl-2.1.0/structyajl__callbacks.html

Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>
Change-Id: Id32c2b64723ca50b044369c424fe046c0a183cce
Reviewed-on: https://review.lttng.org/c/babeltrace/+/7411
PSRCode pushed a commit to PSRCode/babeltrace that referenced this pull request Aug 12, 2022
This patch adds the bt2_common::parseJson() functions in
`parse-json.hpp`.

Those functions wrap the file-internal
`bt2_common::internal::JsonParser` class of which an instance can parse
a single JSON value, calling specific methods of a JSON event listener
as it processes. Internally, `bt2_common::internal::JsonParser` uses a
string scanner (`bt2_common::StrScanner`).

In searching for a simple JSON parsing solution, I could not find, as of
this date, any project which satisfies the following requirements out of
the box:

* Is well-known, well documented, and well tested.

* Has an MIT-compatible license.

* Parses both unsigned and signed 64-bit integers (range
  -9,223,372,036,854,775,808 to 18,446,744,073,709,551,615).

* Provides an exact text location (offset, line number, column number)
  on parsing error (through logging and the message of an error cause).

* Provides an exact text location (offset, line number, column number)
  for each parsed value.
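
To make the (offset, line number, column number) triple concrete, here is a minimal sketch that derives it from a byte offset by counting newlines. `TextLocation` and `locateOffset()` are hypothetical names for illustration only, not the `bt2_common::TextLoc` API, and I'm assuming zero-based line and column numbers with columns counted in bytes:

```cpp
#include <cstddef>
#include <string>

// Hypothetical location type (NOT bt2_common::TextLoc).
struct TextLocation
{
    std::size_t offset;  // bytes from the beginning of the text
    std::size_t lineNo;  // zero-based line number
    std::size_t colNo;   // zero-based column number, in bytes
};

// Returns the location of `offset` within `text`, clamping `offset`
// to the end of the text.
inline TextLocation locateOffset(const std::string& text, std::size_t offset)
{
    if (offset > text.size()) {
        offset = text.size();
    }

    TextLocation loc {offset, 0, 0};

    for (std::size_t i = 0; i < offset; ++i) {
        if (text[i] == '\n') {
            // A newline starts a new line and resets the column.
            ++loc.lineNo;
            loc.colNo = 0;
        } else {
            ++loc.colNo;
        }
    }

    return loc;
}
```

A real scanner would rather update the line and column counters as it consumes characters than rescan from the beginning for each value.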

I believe the text locations are essential as this JSON parser will be
used to decode CTF2‑SPECRC‑4.0 [1] auxiliary and metadata streams:
because Babeltrace 2 will be a reference implementation of CTF 2, it
makes sense to make an effort to pinpoint the exact location of
syntactic and semantic errors.

More specifically:

* JSON for Modern C++ (by Niels Lohmann) [2] doesn't support text
  location access, although there's a pending pull request (draft as of
  this date) to add such support [3].

* The exceptions of JsonCpp [4] don't contain a text location, only a
  message.

* SimpleJSON [5] doesn't offer text location access and seems to be an
  archived project.

* RapidJSON [6] doesn't offer text location access.

* yajl [7] could offer some form of text location access (offset, at
  least) with yajl_get_bytes_consumed(), remembering the last offset on
  our side, although I don't know how nicely it would play with
  whitespace.

  That being said, regarding integers, the `yajl_callbacks`
  structure [8] only contains a `yajl_integer` function pointer which
  receives a `long long` value (no direct 64-bit unsigned integer
  support). It's possible to set the `yajl_number` callback for any
  number, but the `yajl_double` callback gets disabled in that case, and
  the callback receives a string which needs further parsing on our
  side: this is pretty much what `bt2_common::StrScanner` implements
  anyway.
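
To illustrate the "further parsing" of an integer received as a string, here is a rough sketch covering the whole -9,223,372,036,854,775,808 to 18,446,744,073,709,551,615 range. `ParsedInt` and `parseIntText()` are hypothetical names; this is not how `bt2_common::StrScanner` does it, only one possible approach based on `std::strtoll()` and `std::strtoull()`, assuming a 64-bit `long long`:

```cpp
#include <cctype>
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <string>

static_assert(sizeof(long long) == 8, "this sketch assumes a 64-bit `long long`");

// Hypothetical result type: only one of the two values is meaningful.
struct ParsedInt
{
    bool isSigned;       // true when the text starts with `-`
    std::int64_t sVal;   // valid when `isSigned` is true
    std::uint64_t uVal;  // valid when `isSigned` is false
};

// Parses the integer text `text` into `out`, returning false on
// malformed text, trailing characters, or overflow.
inline bool parseIntText(const std::string& text, ParsedInt& out)
{
    const bool neg = !text.empty() && text[0] == '-';
    const std::size_t firstDigitAt = neg ? 1 : 0;

    // Reject anything strtoll()/strtoull() would silently accept
    // (leading whitespace, `+`, a lone `-`).
    if (text.size() <= firstDigitAt ||
            !std::isdigit(static_cast<unsigned char>(text[firstDigitAt]))) {
        return false;
    }

    char *end = nullptr;

    errno = 0;

    if (neg) {
        const long long val = std::strtoll(text.c_str(), &end, 10);

        out = ParsedInt {true, val, 0};
    } else {
        const unsigned long long val = std::strtoull(text.c_str(), &end, 10);

        out = ParsedInt {false, 0, val};
    }

    // ERANGE means overflow; `end` must reach the end of the text.
    return errno != ERANGE && end == text.c_str() + text.size();
}
```

Negative values go through the signed path and everything else through the unsigned path, so both ends of the range are representable without a wider integer type.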

At this point I stopped searching as I already had a working and tested
string scanner and, as you can see, without comments, `parse-json.hpp`
is only 231 lines of effective code and satisfies all the requirements
above.

You can test bt2_common::parseJson() with a simple program like this:

    #include <iostream>
    #include <string>

    #include "parse-json.hpp"

    struct Printer
    {
        void onNull(const bt2_common::TextLoc&)
        {
            std::cout << "null\n";
        }

        template <typename ValT>
        void onScalarVal(const ValT& val, const bt2_common::TextLoc&)
        {
            std::cout << val << '\n';
        }

        void onArrayBegin(const bt2_common::TextLoc&)
        {
            std::cout << "[\n";
        }

        void onArrayEnd(const bt2_common::TextLoc&)
        {
            std::cout << "]\n";
        }

        void onObjBegin(const bt2_common::TextLoc&)
        {
            std::cout << "{\n";
        }

        void onObjKey(const std::string& key,
                      const bt2_common::TextLoc&)
        {
            std::cout << key << ": ";
        }

        void onObjEnd(const bt2_common::TextLoc&)
        {
            std::cout << "}\n";
        }
    };

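    int main(const int argc, const char * const * const argv)
    {
        // Expect the JSON text as the single command-line argument.
        if (argc < 2) {
            std::cerr << "Usage: test-parse-json JSON-TEXT\n";
            return 1;
        }

        Printer printer;

        bt2_common::parseJson(argv[1], printer);
    }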

Then:

    $ ./test-parse-json 23
    $ ./test-parse-json '"\u03c9 represents angular velocity"'
    $ ./test-parse-json '{"salut": [23, true, 42.4e-9, {"meow": null}]}'
    $ ./test-parse-json 18446744073709551615
    $ ./test-parse-json -9223372036854775808

Also try some parsing errors:

    $ ./test-parse-json '{"salut": [false, 42.4e-9, "meow": null}]}'
    $ ./test-parse-json 18446744073709551616
    $ ./test-parse-json -9223372036854775809
    $ ./test-parse-json '"invalid \u8dkf codepoint"'

[1]: https://diamon.org/ctf/files/CTF2-SPECRC-4.0.html
[2]: https://github.com/nlohmann/json
[3]: nlohmann/json#3165
[4]: https://github.com/open-source-parsers/jsoncpp
[5]: https://github.com/nbsdx/SimpleJSON
[6]: https://rapidjson.org/
[7]: https://github.com/lloyd/yajl
[8]: https://lloyd.github.io/yajl/yajl-2.1.0/structyajl__callbacks.html

Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>
Change-Id: Id32c2b64723ca50b044369c424fe046c0a183cce
Reviewed-on: https://review.lttng.org/c/babeltrace/+/7411
eepp added a commit to eepp/babeltrace that referenced this pull request Aug 26, 2022
@raphael-grimm raphael-grimm force-pushed the additional_information_for_sax_parser_via_next_token_start_stop branch from 94cff5d to f60c41e Compare December 16, 2022 10:10
@github-actions

🔴 Amalgamation check failed! 🔴

The source code has not been amalgamated.

@raphael-grimm

This PR is not dead, it was just sleeping very deeply for a very long time... I finally have some time to work on this again.
(I am the same guy as barcode. Commits / Messages are now from my second account since it is on company time.)
I will update the code and write some documentation. After I am done, I will remove the draft status.

@raphael-grimm raphael-grimm force-pushed the additional_information_for_sax_parser_via_next_token_start_stop branch 2 times, most recently from ab6bb9a to 4617679 Compare December 19, 2022 17:17
@raphael-grimm

I updated the code and added some documentation.

Currently there is one thing I am unhappy about:
When parsing JSON data, it would be nice to have more information than just the current byte offset (e.g., line and column).
Currently I achieve this by passing the lexer to the methods (in case they support this). Then the lexer can be used to get additional information about the current parse.
Primarily lex.get_position() would be of interest, but this way it is possible to add some additional introspection methods to the lexer and have them directly available in the SAX parser.

This adds complexity to the user interface compared with passing nlohmann::detail::position_t directly, and I don’t think this kind of future-proofing justifies it.
I think it would be better to pass nlohmann::detail::position_t directly: this makes the user interface and the documentation easier to understand.
We should also move nlohmann::detail::position_t to nlohmann::position_t so that the nlohmann::detail namespace doesn’t leak into user code.

Before switching to passing nlohmann::detail::position_t, I wanted to ask for your (@gregmarr / @nlohmann) opinion on this matter. Is passing the lexer worth the additional complexity (maybe there is some issue where this would be useful), or do you see any problem with me moving nlohmann::detail::position_t to nlohmann::position_t?
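
To show what the simpler variant could look like from the user's side, here is a rough, self-contained sketch. `Position` is a hypothetical stand-in which only mirrors the three fields of nlohmann::detail::position_t, and `SpanRecorder` is a made-up handler; the callback names come from the PR description, but the exact signatures in the final code may differ:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in mirroring the fields of
// nlohmann::detail::position_t (not the library type itself).
struct Position
{
    std::size_t chars_read_total;         // bytes read so far
    std::size_t chars_read_current_line;  // bytes read on the current line
    std::size_t lines_read;               // completed lines
};

// Hypothetical SAX-side helper which records where each token starts
// and ends. A real SAX handler would also implement the usual event
// methods (null(), boolean(), start_object(), and so on).
struct SpanRecorder
{
    struct Span
    {
        Position start;
        Position end;
    };

    std::vector<Span> spans;

    // Assumed to be called by the parser before it reads a token.
    void next_token_start(const Position& pos)
    {
        pending_ = pos;
    }

    // Assumed to be called by the parser after it read a token.
    void next_token_end(const Position& pos)
    {
        spans.push_back(Span {pending_, pos});
    }

private:
    Position pending_ {0, 0, 0};
};
```

With this shape, the handler only depends on a small plain struct instead of the lexer template, which is the simplification discussed above.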

@barcode
Contributor Author

barcode commented Dec 23, 2022

I switched to using nlohmann::position_t and added some more documentation.

@barcode barcode force-pushed the additional_information_for_sax_parser_via_next_token_start_stop branch 3 times, most recently from 026a36c to 4d97ee5 Compare December 24, 2022 12:05
@barcode barcode force-pushed the additional_information_for_sax_parser_via_next_token_start_stop branch 2 times, most recently from db38dc1 to 7d91147 Compare December 25, 2022 22:59
@barcode barcode force-pushed the additional_information_for_sax_parser_via_next_token_start_stop branch from 7d91147 to dbdee8e Compare January 31, 2023 18:32
@barcode
Contributor Author

barcode commented Feb 1, 2023

Work on this PR is done and it only needs a review.

@barcode barcode force-pushed the additional_information_for_sax_parser_via_next_token_start_stop branch from dfa986c to c8845d3 Compare February 2, 2023 20:23
eepp added a commit to eepp/babeltrace that referenced this pull request May 20, 2023
@raphael-grimm raphael-grimm force-pushed the additional_information_for_sax_parser_via_next_token_start_stop branch from c8845d3 to a416868 Compare August 4, 2023 07:17
simark added a commit to simark/babeltrace that referenced this pull request Nov 28, 2023
simark added a commit to simark/babeltrace that referenced this pull request Dec 10, 2023
This patch adds the bt2c::parseJson() functions in `parse-json.hpp`.

Those functions wrap the file-internal `bt2c::internal::JsonParser`
class of which an instance can parse a single JSON value, calling
specific methods of a JSON event listener as it processes. Internally,
`bt2c::internal::JsonParser` uses a string scanner (`bt2c::StrScanner`).

In searching for a simple JSON parsing solution, I could not find, as of
this date, any project which satisfies the following requirements out of
the box:

* Is well-known, well documented, and well tested.

* Has an MIT-compatible license.

* Parses both unsigned and signed 64-bit integers (range
  -9,223,372,036,854,775,808 to 18,446,744,073,709,551,615).

* Provides an exact text location (offset, line number, column number)
  on parsing error (through logging and the message of an error cause).

* Provides an exact text location (offset, line number, column number)
  for each parsed value.

I believe the text locations are essential as this JSON parser will be
used to decode CTF2‑SPECRC‑4.0 [1] auxiliary and metadata streams:
because Babeltrace 2 will be a reference implementation of CTF 2, it
makes sense to make an effort to pinpoint the exact location of
syntactic and semantic errors.

More specifically:

* JSON for Modern C++ (by Niels Lohmann) [2] doesn't support text
  location access, although there's a pending pull request (draft as of
  this date) to add such support [3].

* The exceptions of JsonCpp [4] don't contain a text location, only a
  message.

* SimpleJSON [5] doesn't offer text location access and seems to be an
  archived project.

* RapidJSON [6] doesn't offer text location access.

* yajl [7] could offer some form of text location access (offset, at
  least) with yajl_get_bytes_consumed(), remembering the last offset on
  our side, although I don't know how nicely it would play with
  whitespace.

  That being said, regarding integers, the `yajl_callbacks`
  structure [8] only contains a `yajl_integer` function pointer which
  receives a `long long` value (no direct 64-bit unsigned integer
  support). It's possible to set the `yajl_number` callback for any
  number, but the `yajl_double` callback gets disabled in that case, and
  the callback receives a string which needs further parsing on our
  side: this is pretty much what `bt2c::StrScanner` implements
  anyway.

At this point I stopped searching as I already had a working and tested
string scanner and, as you can see, without comments, `parse-json.hpp`
is only 231 lines of effective code and satisfies all the requirements
above.

You can test bt2c::parseJson() with a simple program like this:

    #include <iostream>
    #include <string>

    #include "parse-json.hpp"

    struct Printer
    {
        void onNull(const bt2c::TextLoc&)
        {
            std::cout << "null\n";
        }

        template <typename ValT>
        void onScalarVal(const ValT& val, const bt2c::TextLoc&)
        {
            std::cout << val << '\n';
        }

        void onArrayBegin(const bt2c::TextLoc&)
        {
            std::cout << "[\n";
        }

        void onArrayEnd(const bt2c::TextLoc&)
        {
            std::cout << "]\n";
        }

        void onObjBegin(const bt2c::TextLoc&)
        {
            std::cout << "{\n";
        }

        void onObjKey(const std::string& key,
                      const bt2c::TextLoc&)
        {
            std::cout << key << ": ";
        }

        void onObjEnd(const bt2c::TextLoc&)
        {
            std::cout << "}\n";
        }
    };

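    int main(const int argc, const char * const * const argv)
    {
        // Expect the JSON text as the single command-line argument.
        if (argc < 2) {
            std::cerr << "Usage: test-parse-json JSON-TEXT\n";
            return 1;
        }

        Printer printer;

        bt2c::parseJson(argv[1], printer);
    }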

Then:

    $ ./test-parse-json 23
    $ ./test-parse-json '"\u03c9 represents angular velocity"'
    $ ./test-parse-json '{"salut": [23, true, 42.4e-9, {"meow": null}]}'
    $ ./test-parse-json 18446744073709551615
    $ ./test-parse-json -9223372036854775808

Also try some parsing errors:

    $ ./test-parse-json '{"salut": [false, 42.4e-9, "meow": null}]}'
    $ ./test-parse-json 18446744073709551616
    $ ./test-parse-json -9223372036854775809
    $ ./test-parse-json '"invalid \u8dkf codepoint"'

[1]: https://diamon.org/ctf/files/CTF2-SPECRC-4.0.html
[2]: https://github.com/nlohmann/json
[3]: nlohmann/json#3165
[4]: https://github.com/open-source-parsers/jsoncpp
[5]: https://github.com/nbsdx/SimpleJSON
[6]: https://rapidjson.org/
[7]: https://github.com/lloyd/yajl
[8]: https://lloyd.github.io/yajl/yajl-2.1.0/structyajl__callbacks.html

Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>
Change-Id: Id32c2b64723ca50b044369c424fe046c0a183cce
Reviewed-on: https://review.lttng.org/c/babeltrace/+/7411
simark added a commit to simark/babeltrace that referenced this pull request Dec 11, 2023
simark added a commit to simark/babeltrace that referenced this pull request Dec 12, 2023
This patch adds the bt2c::parseJson() functions in `parse-json.hpp`.

Those functions wrap the file-internal `bt2c::internal::JsonParser`
class of which an instance can parse a single JSON value, calling
specific methods of a JSON event listener as it processes. Internally,
`bt2c::internal::JsonParser` uses a string scanner (`bt2c::StrScanner`).

In searching for a simple JSON parsing solution, I could not find, as of
this date, any project which satisfies the following requirements out of
the box:

* Is well-known, well documented, and well tested.

* Has an MIT-compatible license.

* Parses both unsigned and signed 64-bit integers (range
  -9,223,372,036,854,775,808 to 18,446,744,073,709,551,615).

* Provides an exact text location (offset, line number, column number)
  on parsing error (through logging and the message of an error cause).

* Provides an exact text location (offset, line number, column number)
  for each parsed value.

I believe the text locations are essential as this JSON parser will be
used to decode CTF2‑SPECRC‑4.0 [1] auxiliary and metadata streams:
because Babeltrace 2 will be a reference implementation of CTF 2, it
makes sense to make an effort to pinpoint the exact location of
syntactic and semantic errors.

More specifically:

* JSON for Modern C++ (by Niels Lohmann) [2] doesn't support text
  location access, although there's a pending pull request (draft as of
  this date) to add such support [3].

* The exceptions of JsonCpp [4] don't contain a text location, only a
  message.

* SimpleJSON [5] doesn't offer text location access and seems to be an
  archived project.

* RapidJSON [6] doesn't offer text location access.

* yajl [7] could offer some form of text location access (offset, at
  least) with yajl_get_bytes_consumed(), remembering the last offset on
  our side, although I don't know how well it would handle whitespace.

  That being said, regarding integers, the `yajl_callbacks`
  structure [8] only contains a `yajl_integer` function pointer which
  receives a `long long` value (no direct 64-bit unsigned integer
  support). It's possible to set the `yajl_number` callback for any
  number, but the `yajl_double` callback gets disabled in that case, and
  the callback receives a string which needs further parsing on our
  side: this is pretty much what `bt2c::StrScanner` implements
  anyway.
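
To illustrate the "further parsing" that a string-based number callback
would leave to the caller, here is a minimal C++17 sketch covering the
full range stated above (-9,223,372,036,854,775,808 to
18,446,744,073,709,551,615). `JsonInt` and `parseJsonInt()` are
hypothetical names chosen for this example: they are not part of yajl,
and this is not how `bt2c::StrScanner` is actually written.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <string>
#include <variant>

// Hypothetical result type: non-negative values are stored unsigned,
// negative values are stored signed.
using JsonInt = std::variant<std::uint64_t, std::int64_t>;

// Hypothetical helper: converts the text of a JSON integer (optional
// '-' followed by digits) to a 64-bit value, or returns `std::nullopt`
// if the text is malformed or out of range.
std::optional<JsonInt> parseJsonInt(const std::string& text)
{
    const bool negative = !text.empty() && text[0] == '-';
    const std::size_t start = negative ? 1 : 0;

    // At least one digit is required.
    if (start == text.size()) {
        return std::nullopt;
    }

    // JSON forbids leading zeros ("012"), but allows "0" and "-0".
    if (text.size() - start > 1 && text[start] == '0') {
        return std::nullopt;
    }

    constexpr auto maxU64 = std::numeric_limits<std::uint64_t>::max();
    std::uint64_t magnitude = 0;

    for (std::size_t i = start; i < text.size(); ++i) {
        const char c = text[i];

        if (c < '0' || c > '9') {
            return std::nullopt;
        }

        const auto digit = static_cast<std::uint64_t>(c - '0');

        // Reject before `magnitude * 10 + digit` can wrap around.
        if (magnitude > (maxU64 - digit) / 10) {
            return std::nullopt;
        }

        magnitude = magnitude * 10 + digit;
    }

    if (!negative) {
        return JsonInt(std::in_place_type<std::uint64_t>, magnitude);
    }

    // The largest allowed negative magnitude is 2^63.
    constexpr auto minMagnitude =
        static_cast<std::uint64_t>(std::numeric_limits<std::int64_t>::max()) + 1;

    if (magnitude > minMagnitude) {
        return std::nullopt;
    }

    const std::int64_t val = magnitude == minMagnitude ?
        std::numeric_limits<std::int64_t>::min() :
        -static_cast<std::int64_t>(magnitude);

    return JsonInt(std::in_place_type<std::int64_t>, val);
}
```

Accumulating the magnitude as an unsigned value and negating only at the
end keeps both extremes of the range representable without relying on
signed overflow.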

At this point I stopped searching as I already had a working and tested
string scanner and, as you can see, without comments, `parse-json.hpp`
is only 231 lines of effective code and satisfies all the requirements
above.
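
For reference, the text location named in the requirements (offset,
line number, column number) can be derived from a byte offset alone. The
following is only a sketch of that bookkeeping: `TextLocation` and
`locate()` are hypothetical names for this example and are not the
actual `bt2c::TextLoc` API. It assumes 1-based line and column numbers
and counts columns in bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical location type for this example only.
struct TextLocation
{
    std::size_t offset; // bytes from the beginning of the text
    std::size_t lineNo; // 1-based line number
    std::size_t colNo;  // 1-based column number, counted in bytes
};

// Returns the location of the byte at `offset` within `text`, clamping
// `offset` to the end of the text.
TextLocation locate(const std::string& text, std::size_t offset)
{
    if (offset > text.size()) {
        offset = text.size();
    }

    TextLocation loc {offset, 1, 1};

    for (std::size_t i = 0; i < offset; ++i) {
        if (text[i] == '\n') {
            // A newline starts the next line at column 1.
            ++loc.lineNo;
            loc.colNo = 1;
        } else {
            ++loc.colNo;
        }
    }

    return loc;
}
```

A parser that reads its input sequentially would more likely update the
line and column counters as it consumes characters instead of rescanning
from the start for each value.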

You can test bt2c::parseJson() with a simple program like this:

    #include <iostream>
    #include <string>

    #include "parse-json.hpp"

    struct Printer
    {
        void onNull(const bt2c::TextLoc&)
        {
            std::cout << "null\n";
        }

        template <typename ValT>
        void onScalarVal(const ValT& val, const bt2c::TextLoc&)
        {
            std::cout << val << '\n';
        }

        void onArrayBegin(const bt2c::TextLoc&)
        {
            std::cout << "[\n";
        }

        void onArrayEnd(const bt2c::TextLoc&)
        {
            std::cout << "]\n";
        }

        void onObjBegin(const bt2c::TextLoc&)
        {
            std::cout << "{\n";
        }

        void onObjKey(const std::string& key,
                      const bt2c::TextLoc&)
        {
            std::cout << key << ": ";
        }

        void onObjEnd(const bt2c::TextLoc&)
        {
            std::cout << "}\n";
        }
    };

    int main(const int, const char * const * const argv)
    {
        Printer printer;

        bt2c::parseJson(argv[1], printer);
    }

Then:

    $ ./test-parse-json 23
    $ ./test-parse-json '"\u03c9 represents angular velocity"'
    $ ./test-parse-json '{"salut": [23, true, 42.4e-9, {"meow": null}]}'
    $ ./test-parse-json 18446744073709551615
    $ ./test-parse-json -9223372036854775808

Also try some parsing errors:

    $ ./test-parse-json '{"salut": [false, 42.4e-9, "meow": null}]}'
    $ ./test-parse-json 18446744073709551616
    $ ./test-parse-json -9223372036854775809
    $ ./test-parse-json '"invalid \u8dkf codepoint"'

[1]: https://diamon.org/ctf/files/CTF2-SPECRC-4.0.html
[2]: https://github.com/nlohmann/json
[3]: nlohmann/json#3165
[4]: https://github.com/open-source-parsers/jsoncpp
[5]: https://github.com/nbsdx/SimpleJSON
[6]: https://rapidjson.org/
[7]: https://github.com/lloyd/yajl
[8]: https://lloyd.github.io/yajl/yajl-2.1.0/structyajl__callbacks.html

Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>
Change-Id: Id32c2b64723ca50b044369c424fe046c0a183cce
Reviewed-on: https://review.lttng.org/c/babeltrace/+/7411
simark added a commit to simark/babeltrace that referenced this pull request Dec 13, 2023
simark added a commit to simark/babeltrace that referenced this pull request Dec 13, 2023
simark added a commit to simark/babeltrace that referenced this pull request Dec 13, 2023
simark added a commit to simark/babeltrace that referenced this pull request Dec 13, 2023
simark added a commit to simark/babeltrace that referenced this pull request Jan 17, 2024
simark added a commit to simark/babeltrace that referenced this pull request Jan 17, 2024
This patch adds the bt2c::parseJson() functions in `parse-json.hpp`.

Those functions wrap the file-internal `bt2c::internal::JsonParser`
class of which an instance can parse a single JSON value, calling
specific methods of a JSON event listener as it processes. Internally,
`bt2c::internal::JsonParser` uses a string scanner (`bt2c::StrScanner`).

In searching for a simple JSON parsing solution, I could not find, as of
this date, any project which satisfies the following requirements out of
the box:

* Is well-known, well documented, and well tested.

* Has an MIT-compatible license.

* Parses both unsigned and signed 64-bit integers (range
  -9,223,372,036,854,775,808 to 18,446,744,073,709,551,615).

* Provides an exact text location (offset, line number, column number)
  on parsing error (through logging and the message of an error cause).

* Provides an exact text location (offset, line number, column number)
  for each parsed value.

I believe the text locations are essential as this JSON parser will be
used to decode CTF2‑SPECRC‑4.0 [1] auxiliary and metadata streams:
because Babeltrace 2 will be a reference implementation of CTF 2, it
makes sense to make an effort to pinpoint the exact location of
syntactic and semantic errors.

More specifically:

* JSON for Modern C++ (by Niels Lohmann) [2] doesn't support text
  location access, although there's a pending pull request (draft as of
  this date) to add such support [3].

* The exceptions of JsonCpp [4] don't contain a text location, only a
  message.

* SimpleJSON [5] doesn't offer text location access and seems to be an
  archived project.

* RapidJSON [6] doesn't offer text location access.

* yajl [7] could offer some form of text location access (offset, at
  least) with yajl_get_bytes_consumed(), remembering the last offset on
  our side, although I don't know how nicely it would play with
  whitespace.

  That being said, regarding integers, the `yajl_callbacks`
  structure [8] only contains a `yajl_integer` function pointer which
  receives a `long long` value (no direct 64-bit unsigned integer
  support). It's possible to set the `yajl_number` callback for any
  number, but the `yajl_double` callback gets disabled in that case, and
  the callback receives a string which needs further parsing on our
  side: this is pretty much what's implemented in `bt2c::StrScanner`
  anyway.
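
To make the "further parsing" of a number string concrete, here is a minimal sketch of converting the text of an integer token to either an unsigned or a signed 64-bit value, covering the whole range given above. It is not how `bt2c::StrScanner` does it (that code is not shown here): `JsonInt`, `parseAs()`, and `parseJsonInt()` are hypothetical names, and the sketch assumes C++17 (`std::from_chars`) and a token which a lexer already validated as a JSON integer (`std::from_chars` would accept leading zeros, which JSON forbids).

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>
#include <variant>

// Hypothetical result type for this sketch: non-negative values are
// unsigned, negative values are signed.
using JsonInt = std::variant<std::uint64_t, std::int64_t>;

// Converts the whole of `text` to `IntT`, returning `std::nullopt` if
// the value is out of range or if any character is left over.
template <typename IntT>
std::optional<IntT> parseAs(const std::string_view text)
{
    IntT val {};
    const auto first = text.data();
    const auto last = first + text.size();
    const auto res = std::from_chars(first, last, val);

    if (res.ec != std::errc {} || res.ptr != last) {
        return std::nullopt;
    }

    return val;
}

// Chooses the target type from the sign of the token.
std::optional<JsonInt> parseJsonInt(const std::string_view text)
{
    if (!text.empty() && text.front() == '-') {
        if (const auto val = parseAs<std::int64_t>(text)) {
            return JsonInt {*val};
        }

        return std::nullopt;
    }

    if (const auto val = parseAs<std::uint64_t>(text)) {
        return JsonInt {*val};
    }

    return std::nullopt;
}
```

With this sketch, `18446744073709551615` and `-9223372036854775808` both convert, whereas `18446744073709551616` and `-9223372036854775809` yield `std::nullopt`.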

At this point I stopped searching as I already had a working and tested
string scanner and, as you can see, without comments, `parse-json.hpp`
is only 231 lines of effective code and satisfies all the requirements
above.

You can test bt2c::parseJson() with a simple program like this:

    #include <iostream>
    #include <string>

    #include "parse-json.hpp"

    struct Printer
    {
        void onNull(const bt2c::TextLoc&)
        {
            std::cout << "null\n";
        }

        template <typename ValT>
        void onScalarVal(const ValT& val, const bt2c::TextLoc&)
        {
            std::cout << val << '\n';
        }

        void onArrayBegin(const bt2c::TextLoc&)
        {
            std::cout << "[\n";
        }

        void onArrayEnd(const bt2c::TextLoc&)
        {
            std::cout << "]\n";
        }

        void onObjBegin(const bt2c::TextLoc&)
        {
            std::cout << "{\n";
        }

        void onObjKey(const std::string& key,
                      const bt2c::TextLoc&)
        {
            std::cout << key << ": ";
        }

        void onObjEnd(const bt2c::TextLoc&)
        {
            std::cout << "}\n";
        }
    };

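    int main(const int argc, const char * const * const argv)
    {
        // Without this check, a missing argument would make argv[1] null.
        if (argc < 2) {
            std::cerr << "Usage: " << argv[0] << " JSON\n";
            return 1;
        }

        Printer printer;

        bt2c::parseJson(argv[1], printer);
    }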

Then:

    $ ./test-parse-json 23
    $ ./test-parse-json '"\u03c9 represents angular velocity"'
    $ ./test-parse-json '{"salut": [23, true, 42.4e-9, {"meow": null}]}'
    $ ./test-parse-json 18446744073709551615
    $ ./test-parse-json -9223372036854775808

Also try some parsing errors:

    $ ./test-parse-json '{"salut": [false, 42.4e-9, "meow": null}]}'
    $ ./test-parse-json 18446744073709551616
    $ ./test-parse-json -9223372036854775809
    $ ./test-parse-json '"invalid \u8dkf codepoint"'

[1]: https://diamon.org/ctf/files/CTF2-SPECRC-4.0.html
[2]: https://github.com/nlohmann/json
[3]: nlohmann/json#3165
[4]: https://github.com/open-source-parsers/jsoncpp
[5]: https://github.com/nbsdx/SimpleJSON
[6]: https://rapidjson.org/
[7]: https://github.com/lloyd/yajl
[8]: https://lloyd.github.io/yajl/yajl-2.1.0/structyajl__callbacks.html

Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>
Change-Id: Id32c2b64723ca50b044369c424fe046c0a183cce
Reviewed-on: https://review.lttng.org/c/babeltrace/+/7411
simark added a commit to simark/babeltrace that referenced this pull request Mar 26, 2024
simark added a commit to simark/babeltrace that referenced this pull request Apr 17, 2024
simark added a commit to simark/babeltrace that referenced this pull request May 8, 2024
simark added a commit to simark/babeltrace that referenced this pull request May 21, 2024
ebugden pushed a commit to ebugden/babeltrace that referenced this pull request Aug 8, 2024
This patch adds the bt2c::parseJson() functions in `parse-json.hpp`.

Those functions wrap the file-internal `bt2c::internal::JsonParser`
class, an instance of which parses a single JSON value, calling
specific methods of a JSON event listener as it goes. Internally,
`bt2c::internal::JsonParser` uses a string scanner (`bt2c::StrScanner`).

In searching for a simple JSON parsing solution, I could not find, as of
this date, any project which satisfies the following requirements out of
the box:

• Is well known, well documented, and well tested.

• Has an MIT-compatible license.

• Parses both unsigned and signed 64-bit integers (range
  -9,223,372,036,854,775,808 to 18,446,744,073,709,551,615).

• Provides an exact text location (offset, line number, column number)
  on parsing error (through logging and the message of an error cause).

• Provides an exact text location (offset, line number, column number)
  for each parsed value.

I believe the text locations are essential as this JSON parser will be
used to decode CTF2‑SPEC‑2.0 [1] metadata streams: because Babeltrace 2
will be a reference implementation of CTF 2, it makes sense to make an
effort to pinpoint the exact location of syntactic and semantic errors.

More specifically:

• JSON for Modern C++ (by Niels Lohmann) [2] doesn't support text
  location access, although there's a pending pull request (draft as of
  this date) to add such support [3].

• The exceptions of JsonCpp [4] don't contain a text location, only
  a message.

• SimpleJSON [5] doesn't offer text location access and seems to be an
  archived project.

• RapidJSON [6] doesn't offer text location access.

• yajl [7] could offer some form of text location access (offset, at
  least) with yajl_get_bytes_consumed(), remembering the last offset on
  our side, although I don't know how nicely it would play
  with whitespace.

  That being said, regarding integers, the `yajl_callbacks`
  structure [8] only contains a `yajl_integer` function pointer which
  receives a `long long` value (no direct 64-bit unsigned integer
  support). It's possible to set the `yajl_number` callback for any
  number, but the `yajl_double` callback gets disabled in that case, and
  the callback receives a string which needs further parsing on our
  side: this is pretty much what's implemented in `bt2c::StrScanner`
  anyway.

At this point I stopped searching as I already had a working and tested
string scanner and, as you can see, without comments, `parse-json.hpp`
is only 231 lines of effective code and satisfies all the
requirements above.

You can test bt2c::parseJson() with a simple program like this:

    #include <iostream>
    #include <string>

    #include "parse-json.hpp"

    struct Printer
    {
        void onNull(const bt2c::TextLoc&)
        {
            std::cout << "null\n";
        }

        template <typename ValT>
        void onScalarVal(const ValT& val, const bt2c::TextLoc&)
        {
            std::cout << val << '\n';
        }

        void onArrayBegin(const bt2c::TextLoc&)
        {
            std::cout << "[\n";
        }

        void onArrayEnd(const bt2c::TextLoc&)
        {
            std::cout << "]\n";
        }

        void onObjBegin(const bt2c::TextLoc&)
        {
            std::cout << "{\n";
        }

        void onObjKey(const std::string& key,
                      const bt2c::TextLoc&)
        {
            std::cout << key << ": ";
        }

        void onObjEnd(const bt2c::TextLoc&)
        {
            std::cout << "}\n";
        }
    };

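    int main(const int argc, const char * const * const argv)
    {
        // Without this check, a missing argument would make argv[1] null.
        if (argc < 2) {
            std::cerr << "Usage: " << argv[0] << " JSON\n";
            return 1;
        }

        Printer printer;

        bt2c::parseJson(argv[1], printer);
    }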

Then:

    $ ./test-parse-json 23
    $ ./test-parse-json '"\u03c9 represents angular velocity"'
    $ ./test-parse-json '{"salut": [23, true, 42.4e-9, {"meow": null}]}'
    $ ./test-parse-json 18446744073709551615
    $ ./test-parse-json -9223372036854775808

Also try some parsing errors:

    $ ./test-parse-json '{"salut": [false, 42.4e-9, "meow": null}]}'
    $ ./test-parse-json 18446744073709551616
    $ ./test-parse-json -9223372036854775809
    $ ./test-parse-json '"invalid \u8dkf codepoint"'

[1]: https://diamon.org/ctf/CTF2-SPEC-2.0.html
[2]: https://github.com/nlohmann/json
[3]: nlohmann/json#3165
[4]: https://github.com/open-source-parsers/jsoncpp
[5]: https://github.com/nbsdx/SimpleJSON
[6]: https://rapidjson.org/
[7]: https://github.com/lloyd/yajl
[8]: https://lloyd.github.io/yajl/yajl-2.1.0/structyajl__callbacks.html

Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>
Change-Id: Id32c2b64723ca50b044369c424fe046c0a183cce
Reviewed-on: https://review.lttng.org/c/babeltrace/+/7411