lint: consider the JSON parsing/deserialization design #312
I would also be in favor of trying to use the standard library first, and seeing how far we can take it. I don't mind us writing a bit of verbose code.
@ErikSchierboom updated the top post with some investigation of behavior in edge cases. We probably want to define "valid JSON" in the spec, and perhaps explicitly forbid duplicate keys.
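For context on why duplicate keys might be worth forbidding explicitly: most parsers accept them silently and keep the last value. A sketch in Python (used here only to illustrate JSON semantics, not the Nim implementation; `reject_duplicates` is a hypothetical helper, while `object_pairs_hook` is the stdlib hook for intercepting key/value pairs):

```python
import json

# By default, a duplicate key is accepted silently; the last value wins.
print(json.loads('{"slug": "one", "slug": "two"}'))  # {'slug': 'two'}

# A linter can reject duplicates by inspecting the raw key/value pairs
# before they are collapsed into a dict.
def reject_duplicates(pairs):
    keys = [k for k, _ in pairs]
    if len(keys) != len(set(keys)):
        dupes = sorted({k for k in keys if keys.count(k) > 1})
        raise ValueError(f"duplicate key(s): {dupes}")
    return dict(pairs)

json.loads('{"a": 1, "b": 2}', object_pairs_hook=reject_duplicates)  # ok
try:
    json.loads('{"a": 1, "a": 2}', object_pairs_hook=reject_duplicates)
except ValueError as e:
    print(e)  # duplicate key(s): ['a']
```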
I don't know, both
You mean forking the existing code? Would this be something that you could PR to Nim itself?
I meant that we do some workaround such that
Yes, it's possible to add it to Nim itself. But it wouldn't be available until Nim 1.6.0 anyway, which might take a while. It would be added as an opt-in strict mode. I'd suggest we should do option 2 in the meantime regardless. The main downside is that we wouldn't immediately get upstream bug fixes in the patched files, unless we backport the latest changes manually.
@ee7 I think that sounds like a good plan! 👍
The Nim stdlib JSON modules parse JSON in a lenient way: they silently ignore:

- a comma after the last item in an array or object
- a line comment with `//`
- a potentially-multi-line comment with `/* */`

This is useful for parsing real-world JSON, but is bad for our use case of validating JSON. JSON that contains a trailing comma or such comments is technically invalid [1], and less portable. Most JSON parsers produce an error for a trailing comma, including the Ruby library that Exercism uses. Therefore `configlet lint` should produce an error for a trailing comma, so that a track doesn't merge a PR that contains one.

This commit adds our own copies of the Nim JSON modules to our repo, and patches them to raise an exception for these cases. Each module was copied from the latest version on Nim's `version-1-4` branch. In the future, we will pull in upstream changes to these files when we update the Nim version.

This approach is better than calling something like `jq` because it:

- doesn't require/prefer the user to have an extra program installed
- doesn't special-case running in CI
- doesn't require refactoring our codebase
- is much faster

We use the somewhat-obscure `nimscript.patchFile` [2] mechanism to override the location of the `json` and `parsejson` modules, so that we can keep `std/json` in our import statements. Instead of the `patchFile` approach, it would also be possible to apply some transformation to the AST of the relevant procs at compile-time, but that's less readable and harder to maintain.

Our approach might be obsoleted in the future - for example, if we ever move to some other JSON library, or if the upstream `parseJson` gains e.g. an `allowTrailingCommas = true` argument.

[1] See e.g. https://www.json.org/json-en.html
[2] https://nim-lang.org/docs/nimscript.html#patchFile%2Cstring%2Cstring%2Cstring

Closes: #345
See also: #312
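A minimal sketch of what the `patchFile` override could look like in `config.nims`. The `vendor/patched_stdlib` paths are hypothetical; the real repo layout may differ:

```nim
# config.nims - NimScript build configuration (sketch, not the actual file).
# Redirect imports of the stdlib `json` and `parsejson` modules to our
# patched copies, so that `import std/json` keeps working unchanged.
patchFile("stdlib", "json", "vendor/patched_stdlib/json")
patchFile("stdlib", "parsejson", "vendor/patched_stdlib/parsejson")
```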
In the meantime, we:
However, I'm still undecided about the best overall design for a refactor (not a high priority). For example, we could:
The latter two options have the downside of increasing our dependence on a non-stdlib package. They also give us less control over error messages.
I honestly don't care much about performance, as configlet is already incredibly fast. Robustness/readability/maintainability are much more important for configlet.
So if I'm interpreting this correctly, jsony is different from std/json in that it returns an error message if a type mismatch occurs between the JSON content and the type to deserialize into? And in that case jsony only returns one (the first?) error? If so, I'd be totally fine with that. We'd be able to remove tons of validation code, and type errors should be quite rare.
Would that be a lot of work?
Yes. We can also fail fast for e.g. seeing a slug that is not kebab-case.
Yes. Although I imagine it's technically possible to output all the type mismatches - but probably not worth it.
I was thinking the same. Another advantage is better error messages: we'd find type errors at the time of parsing, and so we still have direct access to line number information.
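The point about having line information at parse time can be illustrated with any parser that reports error locations; here in Python (purely as an analogue to the Nim discussion), where `json.JSONDecodeError` carries `lineno` and `colno`:

```python
import json

# A document with a type-level mistake: a bare identifier instead of a value.
doc = """{
  "slug": "two-fer",
  "difficulty": oops
}"""

try:
    json.loads(doc)
except json.JSONDecodeError as e:
    # The parser knows exactly where the bad value starts.
    print(f"line {e.lineno}, column {e.colno}: {e.msg}")
```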
I'd guess/hope that it wouldn't be too bad. It might even be simpler than having a separate first pass that checks that the JSON is valid and that key name capitalization is correct. We could also try maintaining a patch, if the diff is small (this is what we do with `cligen`).
This is a very important point.
This has worked out well. I've looked at what should be patched:
The first one is still required, as I've just checked it with the latest Ruby version.
It's from For us, one of the main differences is that
(see a897d05cb0b4 for background).
Yes,
OK - thanks.
It's not completely strict, but it's stricter than I thought/remembered. It turns out that the only looseness is "the value of a snake_case JSON key does set the value of a camelCase Nim object field". See the jsony docs, and the relevant jsony code.

I've tried to illustrate below how jsony behaves. Feel free to stare at this:

```nim
import pkg/jsony

type
  ObjA = object
    foo_bar: int
  ObjB = object
    anotherField: int

func init(T: typedesc[ObjA | ObjB], s: string): T =
  fromJson(s, T)

# Summary: jsony is stricter when the object field name is snake_case style.
func main =
  block:
    let t = ObjA.init """{"foo_bar": 1}"""
    doAssert t.foo_bar == 1
  # The value of a camelCase JSON key DOES NOT set the value of a
  # corresponding snake_case field.
  block:
    let t = ObjA.init """{"fooBar": 1}"""
    doAssert t.foo_bar == 0 # The default value.
  # And other capitalization is also not accepted.
  block:
    let t = ObjA.init """{"foo_Bar": 1}"""
    doAssert t.foo_bar == 0
  block:
    let t = ObjA.init """{"foobar": 1}"""
    doAssert t.foo_bar == 0
  # --------------------------------------------------------------------------
  # The value of a snake_case JSON key DOES set the value of a corresponding
  # camelCase field.
  block:
    let t = ObjB.init """{"another_field": 1}"""
    doAssert t.anotherField == 1
  block:
    let t = ObjB.init """{"anotherField": 1}"""
    doAssert t.anotherField == 1
  # But other capitalization is not accepted.
  block:
    let t = ObjB.init """{"another_Field": 1}"""
    doAssert t.anotherField == 0
  block:
    let t = ObjB.init """{"anotherfield": 1}"""
    doAssert t.anotherField == 0

main()
```

Summary: I think that unpatched parsing with jsony alone is sufficient to check JSON key names (and everything except trailing commas), as long as our spec for Exercism JSON files has no uppercase character in any key name, and we do one of these:
I'd suggest the first. Which leaves us doing one of these:
I think 1 is best, but we can do 2, 3, or 4 as a first implementation if it turns out that 1 is difficult. There is some subtlety though: if a user runs …
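The key-name check discussed above can be done independently of any particular Nim parser. A sketch in Python (used only because the property belongs to the JSON documents themselves; `assert_keys_lowercase` is a hypothetical helper):

```python
import json

def assert_keys_lowercase(node) -> None:
    """Recursively check that no JSON object key contains an uppercase char."""
    if isinstance(node, dict):
        for key, value in node.items():
            if any(c.isupper() for c in key):
                raise ValueError(f"key is not lowercase: {key!r}")
            assert_keys_lowercase(value)
    elif isinstance(node, list):
        for item in node:
            assert_keys_lowercase(item)

# A document with only lowercase keys passes silently.
assert_keys_lowercase(json.loads('{"slug": "two-fer", "practices": ["strings"]}'))

# A camelCase key is rejected.
try:
    assert_keys_lowercase(json.loads('{"fooBar": 1}'))
except ValueError as e:
    print(e)  # key is not lowercase: 'fooBar'
```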
Agreed.
I wouldn't mind erroring on a trailing comma for …
@ErikSchierboom do you have a reference link for the "does not support trailing commas"? I did not find mention of trailing commas, so I was not able to figure out whether it is simply not mentioned, not supported, or disallowed. I checked (searched for the word "trailing") v1.0 and v1.1 as it is here, and did not find any mention of this. I might have missed it though.
@kotp That link is a spec for JSON APIs. For the standards for JSON itself, see the railroad diagrams on https://www.json.org.
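The grammar on json.org simply has no production for a comma after the last element, which is why conforming parsers reject it. This is easy to confirm with any strict parser; for example, Python's `json` module (chosen here just as a convenient stand-in for the Ruby parser mentioned earlier in the thread) rejects both trailing commas and comments:

```python
import json

# A trailing comma after the last array item is not valid JSON.
try:
    json.loads('[1, 2, 3,]')
except json.JSONDecodeError as e:
    print(f"trailing comma rejected: {e.msg}")

# Comments are not part of the JSON grammar either.
try:
    json.loads('{"a": 1} // a comment')
except json.JSONDecodeError as e:
    print(f"comment rejected: {e.msg}")

# The same document without the extras parses fine.
print(json.loads('{"a": 1}'))  # {'a': 1}
```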
They're allowed in JavaScript, though. See: |
Thanks @ee7. I saw that as well, though I still cannot find anything about trailing commas being either unsupported or allowed, or even a "should" statement regarding this. The https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Trailing_commas document states that JSON disallows trailing commas, but does not show where that information is made known. So I would vote to not allow them, if it is true that they are disallowed.
Main options:

1. `std/json`
    - 1a. The approach so far: parse into a `JsonNode` and work only with that.
    - 1b. Parse into a `JsonNode`, then unmarshall into some object using `to`.
    - 1c. Plus `std/jsonutils`.
2. `Araq/packedjson` - keeps everything as a string. Lower memory usage than `std/json`, and sometimes faster.
3. `planetis-m/eminim` - deserializes using `std/streams` directly to an `object`. Doesn't fully support object variants, but maybe that isn't a problem for us.
4. `status-im/nim-json-serialization` - deserializes using `nim-faststreams` directly to an `object`. Probably the most mature third-party option. Currently has a large dependency tree, including `chronos` and `bearssl`.
5. `treeform/jsony` - deserializes from `string` directly to an `object`.

(Note that `disruptek/jason` is serialization-only.)

There are also some more obscure ones that I haven't tried, and don't know anything about:

- `gabbhack/deser` and `gabbhack/deser_json`
- `Q-Master/packets`
- `xomachine/NESM`

Some of the above are possibly too lenient or require special handling in some edge cases.

Summary:

- `std/json`
- `std/json` patched
- `packedjson`
- `eminim`
- `json_serialization`
- `jsony`

For example:

- `std/json` permits a trailing comma, and comments with `//` and `/* */`. This is the main reason that it took a while to tick the boxes for "the file must be valid JSON" in lint: implement remaining linting rules #249. But we now have our own patched `std/json` with stricter parsing.
- `configlet lint` must exit with a non-zero exit code for a trailing comma, because the Ruby library that parses it later produces an error for a trailing comma.
- … `null`.

See also: …

I'd suggest that `jsony` or `nim-json-serialization` might be best in the long-term. But maybe it's better to stick with the current approach until we've implemented all the linting rules, and refactor it later.

One advantage of the current approach is that it's more low-level, which might better ensure that we're "checking the JSON file itself" rather than "checking that each value is valid when parsed with library X".