
std: Support user-provided jsonParse method. Unify json.Parser and json.parse* #15705

Merged: 12 commits into master from json-dynamic-parse on Jun 19, 2023

Conversation

@thejoshwolfe (Contributor) commented May 14, 2023

This PR combines the dynamic json.Parser and the static json.parse*(). The bundled arena strategy from ParseTree is now available for all parsing via json.parseFrom*(). The old behavior is still available at json.parseFrom*Leaky() (name bikeshedding welcome on that one).

The parseFree() function is gone in favor of deinitting an arena instead. The main motivation for this was to allow customizable jsonParse() methods, and it was too hard to get customizable jsonParseFree() methods to work.

Breaking changes in this specific PR

Uses of parseFromSlice or parseFromTokenStream (introduced in #15602) need to change.

  1. If you're using your own arena allocator, call parse*Leaky instead (see the sketch after this list).
  2. Otherwise, the return value goes into a wrapper object that needs to be deinitted with defer parsed.deinit();.
  3. No more parseFree().
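A minimal sketch of both paths, assuming gpa is a std.mem.Allocator, str is the input slice, and Point is an illustrative struct (none of these names come from the PR):

const Point = struct { x: f64, y: f64 };

// Path 1: you own an arena, so the Leaky variant fits.
var arena = std.heap.ArenaAllocator.init(gpa);
defer arena.deinit();
const p1 = try std.json.parseFromSliceLeaky(Point, arena.allocator(), str, .{});

// Path 2: no arena of your own; the result comes in a wrapper.
const parsed = try std.json.parseFromSlice(Point, gpa, str, .{});
defer parsed.deinit();
const p2 = parsed.value;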

See also below:

Breaking changes since 0.10

json.Parser replaced by json.parseFromSlice(json.Value, ...)

The dynamically typed use case previously supported by:

var parser = json.Parser.init(allocator, copy_strings); // copy_strings: bool
defer parser.deinit();
var tree = try parser.parse(str);
defer tree.deinit();
const root = tree.root;

is now accomplished like this:

const parsed = try json.parseFromSlice(json.Value, allocator, str, .{});
defer parsed.deinit();
const root = parsed.value;

json.parse and json.parseFree replaced by json.parseFromSlice

You no longer get the value directly; it is wrapped in a json.Parsed(T). Instead of calling json.parseFree on the value, write defer parsed.deinit();.
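A minimal before/after sketch with an illustrative Config struct; the 0.10 calls are reconstructed from the old API and may differ in detail:

const Config = struct { name: []const u8, count: u32 };

// 0.10 (reconstructed):
// var stream = json.TokenStream.init(str);
// const config = try json.parse(Config, &stream, .{ .allocator = allocator });
// defer json.parseFree(Config, config, .{ .allocator = allocator });

// Now: everything lives in one wrapper, freed by a single deinit.
const parsed = try json.parseFromSlice(Config, allocator, str, .{});
defer parsed.deinit();
const config = parsed.value;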

Implement jsonParse in your type

The old union parsing behavior can possibly be accomplished by implementing a custom parse function. See tools/spirv/grammar.zig for a string-vs-int implementation of parsing into a union; a minimal sketch of the hook follows.
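A sketch of what such a hook can look like, assuming the helper names (std.json.innerParse, peekNextTokenType) as they appear in std.json around this change; treat every detail here as illustrative, not authoritative:

const std = @import("std");

const StringOrInt = union(enum) {
    string: []const u8,
    integer: i64,

    // When this method exists, std.json calls it instead of its default
    // parsing logic for the type.
    pub fn jsonParse(
        allocator: std.mem.Allocator,
        source: anytype,
        options: std.json.ParseOptions,
    ) !@This() {
        return switch (try source.peekNextTokenType()) {
            .string => .{ .string = try std.json.innerParse([]const u8, allocator, source, options) },
            .number => .{ .integer = try std.json.innerParse(i64, allocator, source, options) },
            else => error.UnexpectedToken,
        };
    }
};

With that in place, json.parseFromSlice(StringOrInt, allocator, str, .{}) routes through the custom method automatically.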

Potential improvements

The jsonParse customization could be better documented and sanity checked. For example, asserting the stack height before and after calling a user-supplied function would be nice.

@thejoshwolfe marked this pull request as draft on May 14, 2023
@thejoshwolfe changed the title from "std: Support user-provided jsonParse method. Unify json." to "std: Support user-provided jsonParse method. Unify json.Parser and json.parse*" on May 14, 2023
@thejoshwolfe (Contributor, Author) commented May 14, 2023

The broken tools/ still need to be updated. Coming soon. EDIT: done.

@thejoshwolfe marked this pull request as ready for review on May 14, 2023
@weejulius commented

Is it possible to keep the function 'json.parseFree', so that the parsed objects can be freed by hand?

@silversquirl (Contributor) commented

Is it possible to keep the function 'json.parseFree', so that the parsed objects can be freed by hand?

Use an arena, as mentioned

@malcolmstill (Contributor) commented

@thejoshwolfe great work on this and your previous PR (streaming support is awesome!).

One place I am using the old union behaviour is here:

You can see, e.g., that the Command type is a union of structs, each of which has a common @"type" field. The @"type" field is defined as a comptime field with a corresponding static value (e.g. "module", "assert_return").

Do you think adding a jsonParse method to the Command and Action unions works with this schema? It seems different from the tools/spirv/grammar.zig example, with the @"type" field being inside each struct.
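For context, a condensed sketch of the schema being described (field names simplified, not copied verbatim from zware):

const Command = union(enum) {
    module: struct {
        comptime @"type": []const u8 = "module",
        line: u64,
        filename: []const u8,
    },
    assert_return: struct {
        comptime @"type": []const u8 = "assert_return",
        line: u64,
        // further fields elided
    },
};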

@thejoshwolfe (Contributor, Author) commented

Do you think adding a jsonParse method to the Command and Action unions works with this schema?

You could do something like this:

// parseFromSliceLeaky returns the value directly (no wrapper), and `type`
// is a keyword in Zig, hence the names below.
const dynamic = try json.parseFromSliceLeaky(json.Value, arena.allocator(), s, .{});
const type_name = dynamic.object.get("type").?.string;
if (mem.eql(u8, type_name, "module")) {
    const module = try json.parseFromSliceLeaky(Module, arena.allocator(), s, .{});
    ...
} else if (mem.eql(...)) {
    ...
}

It would be cool if there were a json.parseFromValue(Module, allocator, value, .{}). That would be an entirely different implementation, but it would fit the use case we're discussing here. That way you could also dip in and out of dynamic typing at any level of the document, not just at the top level as my code above does.

@thejoshwolfe (Contributor, Author) commented

I can also imagine exposing the parsing of a struct's fields without the surrounding .object_begin and .object_end tokens. If the "type": field were guaranteed to be the first field, you could then implement a jsonParse method that checks the type and parses the rest accordingly. I wasn't able to find an example input document in your program @malcolmstill, so I can't evaluate that guarantee.

If the "type": field is not guaranteed to be the first field, then there will always need to be some kind of temporary buffering of fields before we know what they are supposed to mean. Parsing into a Value seems like a decent design for that, but I can see how it would get pretty annoying in application code when there are lots of different types in the union, and some of them contain more unions, like the Action type in your example.

Another, very different, idea is to make a super struct, rather than a union of structs, that has all possible fields, most of them defaulting to null (sketched below). Then, if there's a compelling use case for a more strictly typed union, you could write a reflection function in your application code to copy the super struct into the more specific union.
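A tiny sketch of the super-struct idea; every name here is hypothetical:

const AnyCommand = struct {
    @"type": []const u8,
    line: ?u64 = null,
    filename: ?[]const u8 = null,
    text: ?[]const u8 = null,
    // ...one optional field for every field that appears in any variant
};

After parsing into AnyCommand, a small reflection helper (e.g. an inline for over the union's fields via @typeInfo) could copy the populated fields into the strongly typed union.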

This is all just brainstorming, and I concede that the regression in #15602 does make your use case harder to support. But I don't think the complete program is significantly more complex with these application-side implementation options. In my opinion, the old union parsing code was very complex, and (as mentioned in the previous PR) it simply didn't work in a streaming context.

The worst case scenario is that you could copy the old std.json implementation from before the PR and maintain it in your application. 😅

@thejoshwolfe (Contributor, Author) commented

... you could make a reflection function to copy the super struct into the more specific union in your application code.

This really sounds like some kind of parseFromValue(), doesn't it? Maybe there is a compelling case for implementing that. 🤔

@daurnimator (Contributor) commented

With allocators now required, does json parsing still work at comptime?

Then if the "type": field was guaranteed to be the first field

I don't think we should ever assume or even encourage this: json fields are unordered and anything that depends on order is going to break when someone does some valid reformatting.

@thejoshwolfe (Contributor, Author) commented

I don't think we should ever assume or even encourage this: json fields are unordered and anything that depends on order is going to break when someone does some valid reformatting.

Ah, the ambiguity of the json spec. I generally agree that "json" in its purest, most abstract form has no guarantees of field order, but then it also has no guarantees of field uniqueness, or even any schema for that matter. I'm working on a project that does make assumptions about field order, because I know the code that produces it will always order things in a predictable way: https://clang.llvm.org/doxygen/JSONNodeDumper_8cpp_source.html https://clang.llvm.org/doxygen/JSONNodeDumper_8h_source.html thejoshwolfe/find-unused#1 (search for "inner"). Assuming that the tree-node recursion comes after all other fields of a node makes it possible to process one node at a time before moving on to the next. (As far as I've found, there's no documented schema for clang's -Xast-dump=json data, so the source code and runtime behavior are the best documentation I can get.)

All that to say, I'm pushing back on the concept of "valid json" or "valid reformatting" in general. When json is used for communication between two or more parties, it's up to the parties involved to decide what assumptions and guarantees are to be made about the json, usually including at least field names and types of the values, but it could also include field order, number precision, heck even whitespace usage. Json itself is just a syntax and doesn't assign meaning to anything (unlike protobuf for example).

All that philosophical wandering, to say: the general-purpose json package in the zig std lib should give its users as many options as feasible. Making assumptions about field order is useful in some situations and not in others. This package lets you care about field order or not, according to your use case. It does not let you care about whitespace when parsing, but I think that's a reasonable limitation; we still want to be motivated by real-world use cases, after all.

@thejoshwolfe (Contributor, Author) commented

With allocators now required, does json parsing still work at comptime?

It might work with a fixed buffer allocator. Do you have a link to a use case for this?

@mlugg (Member) commented May 17, 2023

It might work with a fixed buffer allocator. Do you have a link to a use case for this?

Allocators currently can't be used at comptime due to some restrictions of the comptime memory model - see #14931. That probably won't be solved for a while, since it's a pretty hard problem.

The Zig homepage actually uses JSON parsing at comptime as the first example a potential user ever sees of comptime - I suspect this is because JSON parsing is an easily understandable task, but is complex enough that it's "impressive" to see it done at comptime. However, I'm skeptical of whether this is actually useful in practice. If we drop this support, we should make sure someone changes the example on that page!

Note that if you do want to preserve this functionality (due to seeing a convincing use case or otherwise), the new @inComptime() builtin should help you.
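For illustration, a minimal sketch of gating on the new builtin; parseDoc and its comptime branch are hypothetical:

const std = @import("std");

fn parseDoc(comptime T: type, gpa: std.mem.Allocator, doc: []const u8) !std.json.Parsed(T) {
    if (@inComptime()) {
        // An allocator-free strategy would have to live here; until the
        // comptime memory model work lands, there is nothing reasonable
        // to put in this branch, so fail loudly at compile time.
        @compileError("json parsing at comptime is not supported");
    }
    return std.json.parseFromSlice(T, gpa, doc, .{});
}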

@malcolmstill (Contributor) commented May 17, 2023

I wasn't able to find an example input document in your program @malcolmstill , so I can't evaluate that guarantee.

@thejoshwolfe ah, an example would be https://github.com/malcolmstill/zware/blob/master/test/testsuite-generated/br.json, and I think these files happen to consistently be generated with the "type" field first.

jdmichaud added commits to jdmichaud/universal-headers that referenced this pull request May 19, 2023
@thejoshwolfe (Contributor, Author) commented

@malcolmstill take a look at #15981. Does that kind of strategy solve your problem?

@thejoshwolfe force-pushed the json-dynamic-parse branch 2 times, most recently from 0669bef to b934176 on June 17, 2023
@thejoshwolfe merged commit 32cb946 into master on Jun 19, 2023
@thejoshwolfe deleted the json-dynamic-parse branch on June 19, 2023
@malcolmstill (Contributor) commented

@thejoshwolfe that worked a treat, thanks!
