improve error handling #1152

maddanio · 2018-07-03T10:15:54Z

we commonly get quite horrible errors from the parser or when converting to user types. also error handling is not even mentioned in the docs.

I would propose making error handling a first-class citizen: reporting lines/columns, giving more detailed info in conversion to user types, etc.

I am willing to help, as this library is currently quite integral to our codebase

theodelrieu · 2018-07-03T10:55:11Z

give more detailed info in conversion to user types

Are you talking about runtime errors here too? Or compile-time?

report lines/columns

There are new exceptions since 3.0; however, they're in the detail namespace, and we should expose them publicly. parse_error shows at which byte the parsing failed. I guess it can be enhanced to contain more information.

maddanio · 2018-07-03T14:06:34Z

i mean run-time, yeah, like
"invalid token on line x col y"
"cannot convert int at line x col y to T"
the latter would require json elements carrying source info, obviously optional, probably as a uint64 with 0 meaning no info or something. but i think it would be worth it. or one could make that optional at compile time
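A minimal sketch of the "one uint64 with 0 meaning no info" idea. The bit split is an assumption for illustration only: the line goes in the upper 32 bits and the column in the lower 32 bits, both counted from 1, so any real location encodes to a nonzero value. `packed_location`, `pack`, `line_of`, `column_of` and `has_location` are hypothetical names, not library API.

```cpp
#include <cstdint>

// Hypothetical compact source info: a single 64-bit word where 0 means "unknown".
// Assumed layout: upper 32 bits = line (1-based), lower 32 bits = column (1-based).
using packed_location = std::uint64_t;

constexpr packed_location no_location = 0;

constexpr packed_location pack(std::uint32_t line, std::uint32_t column) {
    return (static_cast<std::uint64_t>(line) << 32) | column;
}

constexpr std::uint32_t line_of(packed_location loc) {
    return static_cast<std::uint32_t>(loc >> 32);
}

constexpr std::uint32_t column_of(packed_location loc) {
    return static_cast<std::uint32_t>(loc & 0xFFFFFFFFu);
}

// Lines start at 1, so every real location is nonzero.
constexpr bool has_location(packed_location loc) {
    return loc != no_location;
}
```

Because the location is a plain integer, a zero-initialized value naturally means "not produced by the parser".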

maddanio · 2018-07-03T14:08:03Z

so do these exceptions propagate out? thing is, the user really wants line/col info, so it would be nice if the exception would report that too. the parser would simply have to keep track of newlines as it is chugging along. again, i would say well worth it.

maddanio · 2018-07-03T14:10:33Z

but yeah, a first remedy would be exposing them, so we can catch them and retro-provide line information. still think it would be worthwhile collecting line info during parsing. it's a simple matter of counting line breaks and the number of bytes since the last line break... or do you think it would be very complicated?
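The "catch it and retro-provide line information" route can be sketched with nothing but the original input text and the byte offset carried by the exception. This assumes the offset is 1-based, i.e. the first character is byte 1 (check how your version reports `parse_error::byte`); `LineCol` and `line_col_from_byte` are hypothetical helpers, not part of the library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical 1-based line/column pair.
struct LineCol {
    std::size_t line;
    std::size_t column;
};

// Maps a 1-based byte index into `text` to a 1-based line and column.
LineCol line_col_from_byte(const std::string& text, std::size_t byte) {
    // Clamp so an "end of input" offset still maps to the last character.
    const std::size_t target = byte < text.size() ? byte : text.size();
    LineCol pos{1, 1};
    // Walk the characters strictly before the target one.
    for (std::size_t i = 0; i + 1 < target; ++i) {
        if (text[i] == '\n') {
            ++pos.line;
            pos.column = 1;
        } else {
            ++pos.column;
        }
    }
    return pos;
}
```

In a `catch (const json::parse_error& e)` handler one would pass the same input string together with `e.byte`. The cost is a second pass over the input, paid only when an error actually occurs.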

theodelrieu · 2018-07-03T14:11:21Z

Unfortunately, I do know very little about the parsing/lexing aspect of this library. This is a question for the maintainer :)

maddanio · 2018-07-03T14:42:44Z

took a quick look at the parser. so adding line counting would be quite simply done in the lexer. then current_position would return row/col instead of byte count, or both. that could then be put directly into parse_error (which actually seems to already be exposed via basic_json::parse_error). adding source info to the json value is also simply done when constructing it in the parser. the sax interface would simply take source location as an argument for the events. i will think about how the source info can be made optional, probably using a second template argument to basic_json.

maddanio · 2018-07-03T22:12:12Z

ok, that was possible, did it on our branch. i will send a pull request sometime where source locations per json value will be off by default (with no penalty) and easy to opt in to

maddanio · 2018-07-03T22:12:41Z

also tomorrow will push to a public branch and link that here
nightnight

maddanio · 2018-07-03T22:17:08Z

so, why wait:
branch

nlohmann · 2018-07-04T05:43:56Z

This change would (strictly speaking) break compatibility with the 3.x.x versions and should not be added before version 4.0.0. This is not necessarily a bad thing - I just think that if we want to improve the diagnosis information, we may want to think bigger. For instance, we may think about using json itself as payload for the exceptions. Then we are free to choose, for each exception, what data to include. For parse errors, we thereby would not only be able to add line and column information, but also the read and expected tokens, etc. Also other issues like #932 could be addressed this way.
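A rough sketch of the "structured payload" idea. To keep it self-contained, a `std::map<std::string, std::string>` stands in for the json object proposed here; `diagnostic_error` and the key names used with it are hypothetical. Each throw site would decide which keys to fill, e.g. line, column, and the read and expected tokens for a parse error.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical exception carrying free-form diagnostic data.
// std::map stands in for the json payload discussed in the thread.
class diagnostic_error : public std::runtime_error {
public:
    using payload_type = std::map<std::string, std::string>;

    diagnostic_error(const std::string& message, payload_type payload)
        : std::runtime_error(message), payload_(std::move(payload)) {}

    // Callers can inspect individual fields instead of parsing what().
    const payload_type& payload() const noexcept { return payload_; }

private:
    payload_type payload_;
};
```

With a json payload instead of the map, values would keep their types (numbers stay numbers), which is presumably part of the appeal.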

What do you think?

maddanio · 2018-07-04T06:13:13Z

well, it should be doable so that programs usually still compile, but the ABI may break, and the change would be detectable, as some types change.
Hmm, for the parse error you can add all that information, but that's the only one i think. For deserialization i think source location per json entity is enough. what is more interesting is to think not only of the one entity that has an unexpected type, but of the compound entity. but to make that better the whole deserialization has to be changed i think, so that deserializing a complex type is more than a series of deserializations of fields and such. or one just wraps all of the conversions from json to user type in try/catch to enrich the exception on the way up?
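The "wrap each conversion in try/catch and enrich the exception on the way up" option can be sketched as follows. `conversion_error` and `convert_field` are hypothetical; each level of a nested conversion prepends its key, so the error that reaches the caller carries a path such as `foo.baz`.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical conversion error that remembers where in the document it happened.
class conversion_error : public std::runtime_error {
public:
    conversion_error(std::string path, const std::string& reason)
        : std::runtime_error(reason), path_(std::move(path)) {}

    const std::string& path() const noexcept { return path_; }

private:
    std::string path_;
};

// Runs the conversion of one field; on failure, rethrows with `key` prepended to the path.
template <typename F>
auto convert_field(const std::string& key, F&& convert) -> decltype(convert()) {
    try {
        return convert();
    } catch (const conversion_error& e) {
        throw conversion_error(e.path().empty() ? key : key + "." + e.path(), e.what());
    }
}
```

The innermost conversion throws with an empty path, and every enclosing `convert_field` adds one segment, so no global state is needed to track the current position in the tree.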

maddanio · 2018-07-04T06:13:53Z

also for the parser error you can just directly generate all that information at the throw site, i.e. in the parser, as you should have it all right there,...

nlohmann · 2018-07-04T06:43:47Z

So what is your proposal?

nlohmann · 2018-07-18T11:31:06Z

Any ideas on this?

maddanio · 2018-07-24T06:51:15Z

Hmm, i can’t think of info beyond source location that is needed. I don’t think lexer state is needed. If parsing fails then we will not get a result anyway. My suggestion would still be just a source info type which is a template parameter, and the possibility to omit it at no cost (probably using the empty base class optimization). The question then is what to use as the default. Regards, Daniel Oberhoff


nlohmann · 2018-07-24T14:44:34Z

My suggestion would still be just a source info type which is a template parameter. And the possibility to omit it at no cost (probably using empty base class optimization).

I don't understand. Can you make an example?

maddanio · 2018-07-25T14:24:58Z

I am traveling, but maybe I can code it up rough next week sometime on my clone

nlohmann · 2018-08-04T09:12:46Z

Any news on this?

maddanio · 2018-08-04T12:43:00Z

i have a working solution here:
https://github.com/maddanio/json
what is missing is real support for reduced source_location types, like an empty one to opt out and one with only the byte position, which suffices for binary formats and is needed to properly pass the tests relying on byte position only.
it passes most of check-fast, except for:

        SECTION("StandardLayoutType")
        {
            CHECK(std::is_standard_layout<json>::value);
        }

why is this needed? i think this is because of the private base class, but honestly i got dizzy reading the concept...
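One plausible explanation, assuming the location is stored in a non-empty private base as described: a standard-layout class must have all of its non-static data members declared in a single class of its hierarchy, so a base with data members plus a derived class with its own members fails the trait, while an empty base does not. The types below are illustrative stand-ins, not the library's.

```cpp
#include <cstdint>
#include <type_traits>

struct no_location {};              // empty: opting out
struct byte_line_column {           // non-empty: stores a location
    std::uint64_t byte, line, column;
};

// Stand-in for a json value that keeps its location in a private base.
template <typename Location>
class value : private Location {
public:
    int type = 0;
    double payload = 0.0;
};

// Empty base: every data member is declared in `value` itself, so standard layout holds.
static_assert(std::is_standard_layout<value<no_location>>::value,
              "empty base keeps standard layout");
// Data members split between base and derived class: not standard layout.
static_assert(!std::is_standard_layout<value<byte_line_column>>::value,
              "non-empty base breaks standard layout");
```

The private access of the base is not the problem by itself; the split of data members across two classes is.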

maddanio · 2018-08-04T12:44:11Z

btw, the base class is needed to properly support opting out, because only this way will an empty source location storage truly be empty, due to the empty base class optimization (usually in c++ classes cannot be smaller than 1 byte)
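A small demonstration of the size argument, using hypothetical stand-in types. An empty type stored as a data member still occupies at least one byte (plus padding), whereas the same type used as a base class can take no space at all. The exact sizes checked below are typical for common 64-bit compilers rather than guaranteed by the standard.

```cpp
#include <cstdint>

struct no_location {};
struct byte_line_column { std::uint64_t byte, line, column; };

// Location held as a member: even an empty member needs its own address.
template <typename Location>
struct value_with_member {
    Location loc;
    double payload;
};

// Location held as a base: an empty base can be optimized away entirely.
template <typename Location>
struct value_with_base : Location {
    double payload;
};
```

Since C++20, marking the member `[[no_unique_address]]` can achieve the same effect without inheritance, which would also sidestep the base-class complications.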

nlohmann · 2018-08-15T05:24:31Z

I am currently not sure how to proceed here. I would need a concrete example with concrete expected error messages.

maddanio · 2018-08-20T13:18:58Z

example:
[json.exception.type_error.305@byte: 6608 line: 180 column: 39] cannot use operator[] with null
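A sketch of how such a message could be assembled, assuming each value carries a byte/line/column triple. `source_location` and `located_message` are hypothetical names; only the output format is taken from the example message.

```cpp
#include <cstddef>
#include <string>

// Hypothetical location attached to a parsed value.
struct source_location {
    std::size_t byte;
    std::size_t line;
    std::size_t column;
};

// Builds "[json.exception.<name>.<id>@byte: B line: L column: C] <what>".
std::string located_message(const std::string& name, int id,
                            const source_location& loc, const std::string& what) {
    return "[json.exception." + name + "." + std::to_string(id) +
           "@byte: " + std::to_string(loc.byte) +
           " line: " + std::to_string(loc.line) +
           " column: " + std::to_string(loc.column) + "] " + what;
}
```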

maddanio · 2018-08-20T13:20:25Z

i.e. usage errors on a json object now tell me the source location of the object. helps enormously, since json does not have a schema.

maddanio · 2018-08-20T13:21:35Z

very simple in principle. just preserve source location along with the json objects.

maddanio · 2018-08-20T13:21:48Z

oops, wrong button, sorry

maddanio · 2018-08-20T13:22:15Z

basically this serves the use case where json is used as a configuration language and the config file is broken

nlohmann · 2018-08-20T13:57:52Z

This would mean an overhead of a string and at least one integer per JSON value. Right?

maddanio · 2018-08-20T14:02:29Z

3 integers: byte pos, row, column. by the string i suppose you mean the filename. no, that is not saved. the data may be from memory too, i.e. a rest resource loaded from http or something, but location info is still useful since the resource can also be viewed in a browser.
my idea to mitigate this was to make the representation of the source location a template parameter of the json type. by exploiting the empty base class optimization an empty representation would then imply "opting out"

maddanio · 2018-08-20T14:04:05Z

other alternatives are shorter integer types, or omitting any of the values (e.g. row alone could often be enough)
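To put numbers on these options, a sketch of a few possible location layouts with hypothetical names. The sizes are the raw per-value cost before any padding caused by where the location sits inside the value; the empty one only costs nothing when used as an empty base.

```cpp
#include <cstdint>

// Illustrative location layouts, from most to least detailed.
struct full_location { std::uint64_t byte, line, column; };  // 3 x 64-bit
struct compact_location { std::uint32_t line, column; };       // 2 x 32-bit
struct line_only_location { std::uint32_t line; };             // row alone
struct no_location {};                                         // opt out
```

Going from three 64-bit integers to a line-only 32-bit field cuts the overhead from 24 bytes to 4 bytes per value.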

nlohmann · 2018-08-20T14:07:08Z

This would only work if the value is unchanged between parsing and the error message. What would you do if a value is changed after the fact?

maddanio · 2018-08-20T14:12:29Z

change would mean re-assignment i guess? or do you allow in-place changes beyond container modifications? for re-assignments the source location would be overwritten too. for container modifications i would just leave the old info there. this might not be academically correct for some definition of correct, but i would consider this a case of perfect vs good.

nlohmann · 2018-08-20T14:17:01Z

I mean

    json j = parse(...);
    j["foo"] = false;
    std::string s = j["foo"];

The last line would trigger an exception. For this, we would not have source locations, right?

maddanio · 2018-08-20T14:54:44Z

yes, but i think that actually makes sense, because there is none, at least not at runtime

maddanio · 2018-08-20T14:55:36Z

if instead on the last line you were to index with an unknown key, you would in turn get one.
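The semantics described in this exchange (copying a parsed value keeps its location, assigning fresh program data drops it) can be sketched with a hypothetical `located_value`, where line 0 means "no location". A `std::string` stands in for the json payload.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical value that remembers where it was parsed from; line 0 = unknown.
struct located_value {
    std::string text;
    std::size_t line = 0;
    std::size_t column = 0;

    located_value() = default;
    located_value(std::string t, std::size_t l, std::size_t c)
        : text(std::move(t)), line(l), column(c) {}

    bool has_location() const { return line != 0; }

    // Assigning data created at runtime: there is no source position, so drop it.
    // The implicit copy assignment still copies both text and location.
    located_value& operator=(const std::string& new_text) {
        text = new_text;
        line = 0;
        column = 0;
        return *this;
    }
};
```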

efp · 2018-08-21T22:39:39Z

Howdy. I was poking around just for this, wondering if there was a way to customize the error messages, especially for source line info. I was thinking of hacking this into the parse function myself... once the parse is done, I don't see how source locations are valid.

It might be nice if the exception class had a few more fields than 'id' so we could construct our own error messages. For instance, name (ename) and desc (the what_arg) fields. And for parse errors only, perhaps byte, line and column.

I've spent all of 10 minutes looking at this, but I'd say have lexer.get_position() return a struct with byte, line, and column?
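A sketch of this suggestion: expose the pieces of a parse error as fields so a client can build its own message. `parse_error_info`, its members, and `my_message` are hypothetical, not the library's exception types.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical parse error that exposes its parts instead of only a what() string.
struct parse_error_info : std::runtime_error {
    parse_error_info(int id_, std::string name_, std::string desc_,
                     std::size_t byte_, std::size_t line_, std::size_t column_)
        : std::runtime_error(desc_), id(id_), name(std::move(name_)),
          desc(std::move(desc_)), byte(byte_), line(line_), column(column_) {}

    int id;
    std::string name;  // e.g. "parse_error"
    std::string desc;  // the bare description (what_arg)
    std::size_t byte;
    std::size_t line;
    std::size_t column;
};

// A client-side formatter in the familiar "file:line:column: message" style.
std::string my_message(const parse_error_info& e, const std::string& file) {
    return file + ":" + std::to_string(e.line) + ":" + std::to_string(e.column) +
           ": " + e.desc;
}
```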

efp · 2018-08-22T16:53:23Z

P.S. If one forgets a comma between objects, e.g.

{ "foo" : 5 "bar" : 6 }

I get "unexpected string literal; expected '}'" ... but it should be: expected ',' or '}'

maddanio · 2018-08-22T20:24:33Z

Customizing the message would be relatively easy

maddanio · 2018-08-22T20:25:18Z

Also FYI for me it was about errors also in usage, i.e. after parsing

efp · 2018-08-22T21:06:46Z

In the code where I'm using this lib I report errors in usage by the path in the json tree, e.g.

{ "foo" : {"bar" : 1, "baz" : 2}}

might yield something like "error [foo.baz] should be a string"

But this is done with a custom front end class. I don't think that functionality belongs in this library, keeping track of that goes beyond the json spec.

But, I do agree that reporting parse errors by line and column (or at least having this info available to the client) would be quite useful. I've got this working via a 'position' struct.

nlohmann · 2018-08-23T09:12:48Z

@efp Thanks for reporting the issue in #1152 (comment). Adding a column to the parse_error should be easy. I'll follow up on this.

maddanio · 2018-08-23T10:22:52Z

@efp the last case you talked about is exactly why i added source location to the json data. while i agree that validation may not belong in the json code (though since there are things like bson, json schema may also be of benefit?), without the source info a frontend cannot tell the user where the error happened in the config file.

efp · 2018-08-23T16:57:14Z

Here's what I did:

https://github.com/efp/json/tree/parse_error_lines_cols

Let me know if you'd like it as a pull request.

nlohmann · 2018-08-23T18:17:32Z

@efp This looks good! I would be happy to have a PR!

stale · 2018-09-22T18:54:27Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

nlohmann added the "state: please discuss" label Jul 4, 2018

nlohmann mentioned this issue Jul 5, 2018

Add Key name to Exception #932

Closed

nlohmann added the "state: needs more info" label Aug 15, 2018

maddanio closed this as completed Aug 20, 2018

maddanio reopened this Aug 20, 2018

stale bot added the "state: stale" label Sep 22, 2018

stale bot closed this as completed Sep 29, 2018
