Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support yaml files with multiple documents #222

Open
nikolaik opened this issue Jan 10, 2023 · 7 comments
Open

Support yaml files with multiple documents #222

nikolaik opened this issue Jan 10, 2023 · 7 comments
Labels
enhancement New feature or request needs-investigation Requires study to find an appropriate resolution

Comments

@nikolaik
Copy link
Contributor

nikolaik commented Jan 10, 2023

Is there any interest in adding support for checking yaml files with multiple documents in them? That is:

catalog-info.yaml:

---
apiVersion: backstage.io/v1alpha1
kind: System
metadata:
  name: example-system
spec:
  owner: my-team
---
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: example-service
spec:
  owner:  my-team
  system: example-system
❯ check-jsonschema --schemafile "https://json.schemastore.org/catalog-info.json" -v catalog-info.yaml
Several files failed to parse.
  FailedFileLoadError: Failed to parse catalog-info.yaml
    in "/Users/nikolark/.pyenv/versions/3.10.8/lib/python3.10/site-packages/check_jsonschema/instance_loader.py", line 31
    >>> data: t.Any = self._parsers.parse_file(path, self._default_filetype)

    caused by

    ComposerError: expected a single document in the stream
    in "<byte string>", line 2, column 1:
      apiVersion: backstage.io/v1alpha1
      ^ (line: 2)
  but found another document
    in "<byte string>", line 8, column 1:
      ---
      ^ (line: 8)
      in "/Users/nikolark/.pyenv/versions/3.10.8/lib/python3.10/site-packages/check_jsonschema/parsers/__init__.py", line 89
      >>> return loadfunc(fp)

Ref https://yaml.org/spec/1.2.2/#22-structures

@nikolaik
Copy link
Contributor Author

It seems to run with the following changes #223

@sirosen
Copy link
Member

sirosen commented Jan 10, 2023

Thanks for raising this, and sharing the initial crack at an implementation in #223 !
I need to think about this more to figure out what the right thing to do is.

One thing which we'd need to consider is that a document can contain a top-level array, like

- foo
- bar

and you can have a schema which expects/requires this.
So that use-case needs to be supported and handled separately from loading multiple documents.

I'm also going to have to think about the output, and how it could be clarified in such a case.
We could replace the document name with the name + index, like yolo.yaml:0:$: ErrorString (here's where the formatting with :: happens today).
Or maybe it should be the line number where the document started, yolo.yaml:8:$: ErrorString... 🤔


I'm receptive to this, especially with someone putting in the effort to show how it could be done (again, thanks!) but need to think harder about it to decide what is right.

@sirosen sirosen added enhancement New feature or request needs-investigation Requires study to find an appropriate resolution labels Jan 20, 2023
@s-weigand
Copy link
Contributor

Any update on this? 😅

Reporting wise IMHO <file_name>:<line_number> is better since it can be used in editors to jump directly to the code line that caused the error.

@sirosen
Copy link
Member

sirosen commented Sep 25, 2023

I haven't been able to make time for it, unfortunately. My time for check-jsonschema has been spent on other issues, but it's on my radar as a "would be nice". At check-jsonschema's currently level of popularity, I'd call this relatively high interest (multiple 👍s + a follow-up comment), which is relevant for my prioritization.

It's good to know that <filename>:<lineno>: <message> is a useful format for a specific purpose. I think we can call that part settled.

The part that's not clear to me, circling back to this thread, is what the CLI interface should be for this to disambiguate the following two files:

# list.yaml
- foo: bar
- bar: baz

vs

# multiobject.yaml
---
foo: bar
---
bar: baz

And how should it behave if you run this?

check-jsonschema --schemafile foo.json list.yaml multiobject.yaml

I think the right thing is that all YAML files be treated as multi-item. Is that all we need? I'm trying to make sure we can describe the behavior well, and it should be easy enough to write code to that spec.

@vorburger
Copy link

I think the right thing is that all YAML files be treated as multi-item. Is that all we need?

Actually, @sirosen I think about this a bit differently.

To me, list.yaml and multiobject.yaml aren't the same.

The former is 1 "object" (?) of "root type" List, with an "element type" of "something than can be foo or bar" (which seems un-usual).

The latter however is 2 (!) "objects" of the same "root type" as what's the "element type" of the former. Does this make sense?

So the way I understand things, a YAML with multiple documents is just a list of N instead of 1 "roots", which one considers to (have to be) "of the same type" for such a validation.

When this will be implemented, I could (try to) validate e.g. this with it.

@sirosen
Copy link
Member

sirosen commented Jan 16, 2024

To me, list.yaml and multiobject.yaml aren't the same.

The former is 1 "object" (?) of "root type" List, with an "element type" of "something than can be foo or bar" (which seems un-usual).

The latter however is 2 (!) "objects" of the same "root type" as what's the "element type" of the former. Does this make sense?

Yes, that matches my understanding as well. One file contains a single document, a list, while the other is a file containing two documents, each of which is an object.

I haven't looked at implementing this in a while, but IIRC the ruamel.yaml interface for loading multiple documents makes these hard to distinguish. As long as we don't "flatten" those two files to be identical once loaded, we can implement some change -- probably without issue.

@znd4
Copy link

znd4 commented Jul 23, 2024

A kludge:

yq ea -j '[.]' document-stream.yaml \
  | check-jsonschema --schemafile schema.json -

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request needs-investigation Requires study to find an appropriate resolution
Projects
None yet
Development

No branches or pull requests

5 participants