Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document 'XCollection' serialization specifics #509

Merged
merged 4 commits into from
Nov 7, 2023
Merged

Document 'XCollection' serialization specifics #509

merged 4 commits into from
Nov 7, 2023

Conversation

goneall
Copy link
Member

@goneall goneall commented Oct 5, 2023

No description provided.

@goneall goneall added the serialization Something about the representation of data in bytes label Oct 5, 2023
@goneall goneall added this to the 3.0-rc2 milestone Oct 5, 2023
@goneall
Copy link
Member Author

goneall commented Oct 9, 2023

Notes that in the proposed writeup "XCollection" will be in the serialized data with ONLY the properties not supported by the serialization format. which implies that all other "XCollection" properties would be serialized.

Since "XCollection" is a subclass of ElementCollection, the following required properties would always be serialized:

  • SPDXID
  • CreationInfo

There are also a number of other optional properties like name and comment which are inherited from Element.

I can see quite a few benefits to having these present in the serialized data, so I personally don't see any issue - but I thought I would point this out to make sure we're on the same page on how the "XCollection" is serialized.

Signed-off-by: Gary O'Neall <gary@sourceauditor.com>
Signed-off-by: Gary O'Neall <gary@sourceauditor.com>
…n team

Signed-off-by: Gary O'Neall <gary@sourceauditor.com>
@davaya
Copy link
Contributor

davaya commented Oct 16, 2023

This PR describes X-Collection / SerializableCollection / LogicalDocument.

PR #500 defines SerializedElements / PhysicalDocument. It is now called PhysicalDocument for clarity, and it is a subclass of Artifact as agreed in the 10/16 serialization meeting.

PR #500 also includes the Payload non-element datatype class that defines serialized data. If there are "quite a few benefits to having these (name, comment) present in the serialized data", they can be added to the Payload class. But for now it includes only NamespaceMap and CreationInfoMap properties necessary to compact and beautify the serialized data. I'd recommend that name, comment, etc be included in the PhysicalDocument element that refers to serialized data rather than directly in the Payload data itself.

serialization/README.md Outdated Show resolved Hide resolved
serialization/README.md Outdated Show resolved Hide resolved

Within the model, we have an "XCollection" which represents the common properties of the collection of elements across all data formats.

We can refer to the actual serialized bytes as and Artifact in the model.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest slight tweak to "We can represent/characterize the actual serialized bytes using an Artifact element in the model."

serialization/README.md Outdated Show resolved Hide resolved

In that case, the "XCollection" will be in the serialized data with ONLY the properties not supported by the serialization format.

There is at most one "XCollection" in the serialization.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest slight tweak:

"There is at most one "XCollection" element defined in any given instance of serialization."

There can be any number of references via id but only a single XCollection defined within a given instance of serialization.

serialization/README.md Outdated Show resolved Hide resolved

The serialized "XCollection" shall not contain properties where that property value can be derived from the the serialized data itself.

The "XCollection" section in the specification markdown file describes how each of the properties are serialized.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest slight tweak:

"The specification markdown files describing rules for each serialization format will contain an "XCollection" section describing how each of the properties of "XCollection" are serialized for the given serialization format."

The RDF graph of an instance of the SPDX model shall contain all Element nodes (i.e. objects that are subclasses of Element) as a list on top-level under the "@graph" key.
This means that all references to Element nodes have to use the URI of the referenced Element.
Inlining/Embedding of Element nodes into other nodes is not allowed.
On the other hand, non-Element nodes (like those of type "ExternalReference" or similar complex data classes) have to be inlined (TODO: we may want to make an exception for CreationInfo, depending on the outcome of the surrounding discussion).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that the consensus result regarding serialized compaction for CreationInfo is that WHEN SERIALIZED each instance of CreationInfo containing different data would be serialized (with its @type and @id) as an object within the '@graph' JSON-LD structure and referenced by any Elements with that same CreationInfo. When deserialized each serialized CreationInfo MUST be expanded back out to all of the elements it applies to such that each element MUST include its own CreationInfo (with separate @id from any other elements CreationInfo).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sbarnum - Let's separate this out into a separate issue we can tackle after merging this. I did not intend to change the treatment of CreationInfo, just copying what was there before in the rdf.md file. I also have some concerns in how it is currently worded.

The RDF graph of an instance of the SPDX model shall contain all Element nodes (i.e. objects that are subclasses of Element) as a list on top-level under the "@graph" key.
This means that all references to Element nodes have to use the URI of the referenced Element.
Inlining/Embedding of Element nodes into other nodes is not allowed.
On the other hand, non-Element nodes (like those of type "ExternalReference" or similar complex data classes) have to be inlined (TODO: we may want to make an exception for CreationInfo, depending on the outcome of the surrounding discussion).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also needs calling out explicitly here that each embedded non-Element node MUST have an explicit id. This is true for ALL serialization formats.
The serialized embedding within the json-ld object structure is a method of compaction to make the content easier to read (and process by some parsers) but it does not inherently represent the native deserialized logical content which is a graph where EVERY class instance (Element nodes and non-Element nodes) are separate nodes in the graph connected using the globally unique id of each node which MUST be present on all nodes.

There is a possible serialization compaction approach where we COULD (though I would not recommend it) assert that in situations where a producer is minting new elements to be directly serialized without creation anywhere internally first which would require ids already be assigned they could use blank node ids (like has been discussed for CreationInfo) within a specific serialization instance. These blank nodes would then be REQUIRED to be skolemized to create full globally unique ids (like done for CreationInfo) upon any deserialization. This could potentially save some space within the serialized content but it would likely be minimal and the extra complexity in handling and potential for misinterpretation makes this approach a questionable option to pursue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above - i just copy/pasted what was there before.

serialization/json-ld.md Outdated Show resolved Hide resolved

## XCollection

The following XCollection properties are mapped to JSON-LD syntax specifications.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest small tweak:

"The following XCollection properties are mapped to native JSON-LD mechanisms defined within the JSON-LD syntax specifications."

Also renames XCollection to SpdxDocument

Signed-off-by: Gary O'Neall <gary@sourceauditor.com>
@goneall goneall marked this pull request as ready for review November 2, 2023 16:00
Copy link
Collaborator

@sbarnum sbarnum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR looks good to me.
It accurately covers the consensus results of serialization WG discussions on these topics and aligns with the rest of the model.

There is one issue identified in comments regarding like 23 of the json-ld.md file that needs to be raised and addressed to clarify serialization rules for non-Element classes.

@goneall
Copy link
Member Author

goneall commented Nov 7, 2023

Discussed on 7 November 2023 call - merging

@goneall goneall merged commit c80aab9 into main Nov 7, 2023
1 check passed
@bact bact deleted the serjsonld branch August 28, 2024 11:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
serialization Something about the representation of data in bytes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants