Must an InvestigativeAction always have a ProvenanceRecord among its outputs? #136

Open
ajnelson-nist opened this issue Nov 8, 2023 · 1 comment

@ajnelson-nist
Member

The CASE Ontology Committee had discussed in the past whether InvestigativeActions would always have a ProvenanceRecord among their outputs. I recall, informally, we had said yes, this is a requirement. However, we had not encoded this in SHACL or OWL.

I have found a few logistical issues with requiring a ProvenanceRecord as output. While I don't think these are necessarily counter-arguments, they seem to need clarification if we move towards encoding the generated-ProvenanceRecord expectation.

  1. A ProvenanceRecord must have at least one member. This is a requirement inherited from uco-core:ContextualCompilation, ProvenanceRecord's superclass. As I understand the expectation that CASE had not formally encoded, that ProvenanceRecord should have members that are either (1) inputs to the InvestigativeAction, or (2) other results of the InvestigativeAction.
  2. I do not think uco-action:subaction was considered as part of the discussion, because it had not been exercised in CASE-Examples or the CASE website. [1] It is not quite clear how that property is supposed to be used with InvestigativeAction, namely whether any sub-action of an InvestigativeAction is also an InvestigativeAction. The answer to that question might complicate requiring a ProvenanceRecord as output.

Let's take this example graph, which renders an action that takes a JPEG file as input and uses a (made-up) tool, "ExampleJpegAnalyzer", to analyze the JPEG's contents in a couple of ways. The tool unconditionally calls multiple independent, tool-internal functions as part of its execution: look_up_location, ocr, and others. The ocr function yields a file. This graph omits some triples for the sake of discussion.

kb:tool-1
	a uco-tool:AnalyticTool ;
	uco-core:name "ExampleJpegAnalyzer" ;
	.

kb:jpeg-i1
	a uco-observable:RasterPicture ;
	.
kb:provenance-record-i1
	a case-investigation:ProvenanceRecord ;
	uco-core:object kb:jpeg-i1 ;
	.

kb:action-1
	a case-investigation:InvestigativeAction ;
	uco-action:instrument kb:tool-1 ;
	uco-action:object
		kb:jpeg-i1 ,
		kb:provenance-record-i1
		;
	uco-action:subaction kb:action-2 ;
	uco-action:result kb:provenance-record-o1 ;
	.

kb:action-2
	a uco-action:Action ;
	uco-core:description "Store any OCR-recognized text in a file." ;
	uco-action:object kb:jpeg-i1 ;
	uco-action:result kb:ocr-text-results-file-1 ;
	.
kb:ocr-text-results-file-1
	a uco-observable:File ;
	.

Question 1: Is kb:action-2 an InvestigativeAction? If so, and if all InvestigativeActions need to generate a ProvenanceRecord, how do the members of its ProvenanceRecord relate to the parent action's ProvenanceRecord?

Question 2: What are the members of the output ProvenanceRecord, kb:provenance-record-o1?

Question 2.1: Is kb:jpeg-i1 a member, recording that it was seen and/or handled?

Question 2.2: Is kb:ocr-text-results-file-1 in kb:provenance-record-o1? Is the answer to this influenced by whether or not kb:action-2 is an InvestigativeAction?

I intend to take responses to these questions and propose OWL and SHACL encodings to capture the consensus.

Footnotes

  1. To date, subaction still has not been exercised in either of those repositories. It is exercised in CASE-Corpora, and a recent update in testing infrastructure triggered a data validation error in a sketch of mine, which led to this Question post.

ajnelson-nist added a commit to casework/CASE-Corpora that referenced this issue Nov 9, 2023
A local-to-CASE-Corpora shape requires `InvestigativeAction`s to have
`ProvenanceRecord`s as outputs.  (See `/shapes/local.ttl`,
`sh-case-corpora-local:InvestigativeAction-shape`.)
That shape used a SHACL mechanism that was not active in this
repository's testing until pySHACL Issue 213 was closed.

This patch adds a `ProvenanceRecord` as output to the last non-subaction
that had the phone as input.  Review of the results of this SPARQL query
indicates the device is not used in further actions, so this
`ProvenanceRecord` has no further impact on the graph today.

```sparql
SELECT ?nAction ?lDescription
WHERE {
  ?nAction
    uco-action:object kb:device-ea732801-7d0e-46ac-a028-69b782c97a46 ;
    .
  OPTIONAL {
    ?nAction
      uco-core:description ?lDescription ;
      .
  }
}
ORDER BY ?nAction
```

There is some open question on how to tie the subactions' outputs to the
parent action's `ProvenanceRecord`; that thread is on CASE Issue 136.

A follow-on patch will regenerate Make-managed files.

References:
* RDFLib/pySHACL#213
* casework/CASE#136

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@sbarnum
Contributor

sbarnum commented Dec 12, 2023

> The CASE Ontology Committee had discussed in the past whether InvestigativeActions would always have a ProvenanceRecord among their outputs. I recall, informally, we had said yes, this is a requirement. However, we had not encoded this in SHACL or OWL.

Part of the design rationale for InvestigativeAction, and the reason it was created as something distinct from a plain Action, was that it is paired with ProvenanceRecord to form the mechanism for tracking the provenance of objects through an investigative process.
It was definitely intended that an InvestigativeAction would always have a ProvenanceRecord among its action:result values. I agree that this was never formally codified in the OWL or SHACL.

> A ProvenanceRecord must have at least one member. This is a requirement inherited from uco-core:ContextualCompilation, ProvenanceRecord's superclass. By my understanding of what CASE had not formally encoded, that ProvenanceRecord should have members that are either (1) inputs to the InvestigativeAction, or (2) other results of the InvestigativeAction.

A ProvenanceRecord would never contain or reference objects that were inputs to the InvestigativeAction that produced the ProvenanceRecord. It would only contain objects resulting from the InvestigativeAction. InvestigativeAction1 could have ProvenanceRecord1 in its results, and objects referenced within ProvenanceRecord1 (that resulted from InvestigativeAction1) could be used as inputs to InvestigativeAction2. This lets you chain an action's inputs to its results, which may in turn be inputs to other actions.
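
The chaining described above could be sketched in Turtle roughly as follows. This is only an illustration under assumptions: every `kb:` node is hypothetical, `http://example.org/kb/` is a placeholder namespace, and the second action is given both the file and its record as inputs to mirror the pattern in the example graph in the original post.

```turtle
@prefix case-investigation: <https://ontology.caseontology.org/case/investigation/> .
@prefix kb: <http://example.org/kb/> .
@prefix uco-action: <https://ontology.unifiedcyberontology.org/uco/action/> .
@prefix uco-core: <https://ontology.unifiedcyberontology.org/uco/core/> .
@prefix uco-observable: <https://ontology.unifiedcyberontology.org/uco/observable/> .

# Hypothetical first action: produces a file and a record of that file.
kb:investigative-action-1
	a case-investigation:InvestigativeAction ;
	uco-action:result
		kb:file-1 ,
		kb:provenance-record-1
		;
	.

kb:file-1
	a uco-observable:File ;
	.

# The record references only a result of the action, not its inputs.
kb:provenance-record-1
	a case-investigation:ProvenanceRecord ;
	uco-core:object kb:file-1 ;
	.

# Hypothetical second action: takes the first action's result as input.
kb:investigative-action-2
	a case-investigation:InvestigativeAction ;
	uco-action:object
		kb:file-1 ,
		kb:provenance-record-1
		;
	uco-action:result
		kb:file-2 ,
		kb:provenance-record-2
		;
	.

kb:file-2
	a uco-observable:File ;
	.

kb:provenance-record-2
	a case-investigation:ProvenanceRecord ;
	uco-core:object kb:file-2 ;
	.
```

Following `uco-action:object` back from `kb:investigative-action-2` to `kb:file-1`, and then `uco-action:result` back to `kb:investigative-action-1`, gives the provenance chain.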

> I do not think uco-action:subaction was considered as part of the discussion, because it had not been exercised in CASE-Examples or the CASE website. [1] It is not quite clear how that property is supposed to be used with InvestigativeAction, namely whether any sub-action of an InvestigativeAction is also an InvestigativeAction. The answer to that question might complicate requiring a ProvenanceRecord as output.

subaction was considered as part of the discussion.
subaction allows complex actions to be described as a single overall Action made up of multiple more atomic subactions.
Consider something like Action="Start the car". This could be thought of as one overall action in some contexts, but it likely consists of multiple subactions such as "insert key", "turn key to auxiliary position", and "turn key to start position". Each of those subactions is an independent action itself that can have performer, object, instrument, result, etc. property values. The overall "Start the car" Action would reference each of the separate subactions using its action:subaction property, and its action:result property could either be a union of the results of the subactions or just a subset of them, depending on context.
All of this would also be true for InvestigativeAction except that the ProvenanceRecord for the overall action should likely always be a full union of the contents of the ProvenanceRecords for all of the subactions.
It should be pretty straightforward.
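
As a rough Turtle sketch of the "Start the car" decomposition (all `kb:` nodes are hypothetical, `http://example.org/kb/` is a placeholder namespace, and performer, object, instrument, and result values are left out for brevity):

```turtle
@prefix kb: <http://example.org/kb/> .
@prefix uco-action: <https://ontology.unifiedcyberontology.org/uco/action/> .
@prefix uco-core: <https://ontology.unifiedcyberontology.org/uco/core/> .

# Overall action, referencing each of its more atomic subactions.
kb:action-start-car
	a uco-action:Action ;
	uco-core:name "Start the car" ;
	uco-action:subaction
		kb:action-insert-key ,
		kb:action-turn-key-auxiliary ,
		kb:action-turn-key-start
		;
	.

# Each subaction is an independent Action in its own right.
kb:action-insert-key
	a uco-action:Action ;
	uco-core:name "Insert key" ;
	.

kb:action-turn-key-auxiliary
	a uco-action:Action ;
	uco-core:name "Turn key to auxiliary position" ;
	.

kb:action-turn-key-start
	a uco-action:Action ;
	uco-core:name "Turn key to start position" ;
	.
```

The comma-separated list does not encode any ordering among the subactions; an order would have to come from other properties, such as start and end times on each action.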

I find it a bit hard to interpret and comment on the example graph, since neither action-1 nor action-2 specifies the core:name of the action (that is, what action is being performed).
Typically an Action that contains subactions would contain more than a single subaction, but I guess that is not a hard requirement. It would just be more common and easier to understand.

> Question 1: Is kb:action-2 an InvestigativeAction? If so, and if all InvestigativeActions need to generate a ProvenanceRecord, how do the members of its ProvenanceRecord relate to the parent action's ProvenanceRecord?

Given that InvestigativeAction is defined as "An investigative action is something that may be done or performed within the context of an investigation, typically to examine or analyze evidence or other data.", it seems logical that any subaction of an InvestigativeAction should likely also be an InvestigativeAction.
The ProvenanceRecords of the subactions would relate to the ProvenanceRecord of the overall action as described in my comment above: the overall ProvenanceRecord would be a union of the contents of the subaction ProvenanceRecords, and likely should include the subaction ProvenanceRecords themselves.

> Question 2: What are the members of the output ProvenanceRecord, kb:provenance-record-o1?

That would completely depend on what the actual actions were here, which is not clearly specified.
In a simple case as shown in the example, the overall action contains only a single subaction (which, as described in the above comment, should likely be an InvestigativeAction with its own ProvenanceRecord) and does nothing else besides that subaction. In that case, the members of kb:provenance-record-o1 would likely be the ProvenanceRecord resulting from action-2 along with its contents.

> Question 2.1: Is kb:jpeg-i1 a member, recording that it was seen and/or handled?

Again, I think that depends on the nature of the actual action in action-1, which is not specified.
If it did not change kb:jpeg-i1 in any way then it should not be included in kb:provenance-record-o1 as it is only an input to action-1 and not an output.
If it did change kb:jpeg-i1 in any way then it should be included in kb:provenance-record-o1 as it is not only an input but also an output.

> Question 2.2: Is kb:ocr-text-results-file-1 in kb:provenance-record-o1? Is the answer to this influenced by whether or not kb:action-2 is an InvestigativeAction?

As described in the above comments, yes it would be in kb:provenance-record-o1 because action-2 should be an InvestigativeAction with its own ProvenanceRecord (let's call it kb:provenance-record-o2). kb:provenance-record-o2 would contain kb:ocr-text-results-file-1 and kb:provenance-record-o1 would contain both kb:provenance-record-o2 and kb:ocr-text-results-file-1.
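
A sketch of that arrangement in Turtle, reusing the identifiers from the example graph in the original post. It rests on several assumptions: kb:action-2 is retyped as an InvestigativeAction, kb:provenance-record-o2 is the hypothetical record named above, `http://example.org/kb/` is a placeholder namespace, and kb:jpeg-i1 is taken to be unchanged by the action and is therefore left out of the output records. Nodes that do not change from the original graph are not repeated.

```turtle
@prefix case-investigation: <https://ontology.caseontology.org/case/investigation/> .
@prefix kb: <http://example.org/kb/> .
@prefix uco-action: <https://ontology.unifiedcyberontology.org/uco/action/> .
@prefix uco-core: <https://ontology.unifiedcyberontology.org/uco/core/> .

kb:action-1
	a case-investigation:InvestigativeAction ;
	uco-action:instrument kb:tool-1 ;
	uco-action:object
		kb:jpeg-i1 ,
		kb:provenance-record-i1
		;
	uco-action:subaction kb:action-2 ;
	uco-action:result kb:provenance-record-o1 ;
	.

# Overall record: the subaction's record plus that record's contents.
kb:provenance-record-o1
	a case-investigation:ProvenanceRecord ;
	uco-core:object
		kb:ocr-text-results-file-1 ,
		kb:provenance-record-o2
		;
	.

# Subaction, here typed as an InvestigativeAction with its own record.
kb:action-2
	a case-investigation:InvestigativeAction ;
	uco-core:description "Store any OCR-recognized text in a file." ;
	uco-action:object kb:jpeg-i1 ;
	uco-action:result
		kb:ocr-text-results-file-1 ,
		kb:provenance-record-o2
		;
	.

kb:provenance-record-o2
	a case-investigation:ProvenanceRecord ;
	uco-core:object kb:ocr-text-results-file-1 ;
	.
```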

@ajnelson-nist ajnelson-nist added this to the CASE 1.x.0 milestone Jan 10, 2024
ajnelson-nist added a commit that referenced this issue Jan 23, 2024
This new shape stemmed from discussion on CASE Issue 136.

As a matter of preserving backwards compatibility, this patch introduces
the shape requiring `ProvenanceRecord`s with a `sh:Warning`-level
severity.  In CASE 2.0.0, this requirement will be strengthened into a
`sh:Violation`.

A separate proposal will be filed with UCO to test the minimum qualified
cardinality OWL structure.  A draft of that syntax review system was
used to test this patch.

This patch adds a version floor for pySHACL to ensure an update in
qualified value shape handling is included, which is necessary for the
new property shape to function when using pySHACL.

Disclaimer:

References:
* RDFLib/pySHACL#213
* #136
* #146

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
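
For readers unfamiliar with qualified value shapes, a warning-level shape of the kind the commit message describes might look roughly like the sketch below. This is not the shape committed to CASE: the shape IRI, the `ex:` namespace, and the message text are hypothetical, and only standard SHACL Core terms are used.

```turtle
@prefix case-investigation: <https://ontology.caseontology.org/case/investigation/> .
@prefix ex: <http://example.org/shapes/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix uco-action: <https://ontology.unifiedcyberontology.org/uco/action/> .

# Hypothetical shape: at least one uco-action:result value of each
# InvestigativeAction should be a ProvenanceRecord.
ex:InvestigativeAction-result-sketch-shape
	a sh:NodeShape ;
	sh:property [
		sh:path uco-action:result ;
		sh:qualifiedMinCount 1 ;
		sh:qualifiedValueShape [
			sh:class case-investigation:ProvenanceRecord ;
		] ;
		# Warning rather than Violation, so existing graphs stay conformant-with-warnings.
		sh:severity sh:Warning ;
		sh:message "An InvestigativeAction is expected to have a ProvenanceRecord among its results."@en ;
	] ;
	sh:targetClass case-investigation:InvestigativeAction ;
	.
```

Using `sh:qualifiedMinCount` with `sh:qualifiedValueShape`, rather than `sh:class` directly on the property shape, lets the action keep other kinds of results (such as files) alongside the ProvenanceRecord.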