Project context [RPCX] #71

Closed
jpullmann opened this issue Jan 18, 2018 · 29 comments
Labels: dcat, due for closing, provenance, requirement, roles, status

Comments

@jpullmann

Project context [RPCX]

Provide a means to define a 'project' as a research, funding or work organization context of a dataset.


Related use cases: Dataset business context [ID49] Modeling funding sources [ID31] 
@dr-shorthair
Contributor

I have a draft proposal for a small vocabulary for Project, as a subclass of prov:Activity.
See
https://dr-shorthair.github.io/ont/project/

This would support linking a dataset to a project using the prov:wasGeneratedBy predicate, as mentioned in #77

@andrea-perego
Contributor

andrea-perego commented Jan 19, 2018

Thanks, @dr-shorthair . I would also like to contribute some work we did to map DataCite to DCAT-AP, which includes the mapping of what in DataCite is called "Funding Reference". The mapping tables are here: https://ec-jrc.github.io/datacite-to-dcat-ap/

I think this is also one of the possible use cases for the use of qualified and non-qualified forms.
The non-qualified form is the basic case where you want to say that a given dataset has been created by a given project. However, if you also need to say that this is done in a given timeframe of the activity of a project you need to add a node in the graph, between "project" and "dataset", where to attach this information.

@dr-shorthair , in my understanding the vocabulary you contributed supports both cases, right?
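
To make the two forms concrete, here is a minimal sketch using the PROV-O terms prov:wasGeneratedBy and prov:qualifiedGeneration. The ex: namespace, the resource names and the date are invented for illustration and are not taken from any real catalogue. Note also that prov:atTime records an instant; describing a full timeframe on the intermediate node would need some other property, which is left out here.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# Non-qualified form: the dataset is linked directly to the project.
ex:dataset-1
  a dcat:Dataset ;
  prov:wasGeneratedBy ex:project-1 ;
.
ex:project-1
  a prov:Activity ;
.

# Qualified form: an intermediate prov:Generation node sits between
# the dataset and the project and carries the extra information.
ex:dataset-2
  a dcat:Dataset ;
  prov:qualifiedGeneration [
    a prov:Generation ;
    prov:activity ex:project-1 ;
    prov:atTime "2017-06-30T00:00:00Z"^^xsd:dateTime ;   # invented date
  ] ;
.
```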

@dr-shorthair
Contributor

Linking through to funding details is part of the proposed Project ontology. Not sure if I got it all right yet, so would be interested in working through other examples.

@dr-shorthair
Contributor

Suggest removing the following labels:
profile, semantics, service, usage_control, version
??

@dr-shorthair
Contributor

dr-shorthair commented Jul 11, 2018

Picking up the second example in #253, which describes a dataset from CSIRO's DAP, the following uses PROV to document the project context for the dataset. The PROV-O property prov:wasGeneratedBy points to dap:P366, which is a prov:Activity. That activity in turn is linked to dap:ATNF through prov:wasInformedBy and used the dap:Parkes-radio-telescope.

dap:atnf-P366-2003SEPT
  rdf:type dcat:Dataset ;
# other properties omitted here
  dcterms:identifier "https://doi.org/10.4225/08/598dc08d07bb7"^^xsd:anyURI ;
  dcterms:relation [ dcterms:identifier "PH0090_0011.sf" ] ;
  dcterms:relation [ dcterms:identifier "PH0090_0021.sf" ] ;
  dcterms:relation [ dcterms:identifier "PH0090_0031.sf" ] ;
  dcterms:title "Parkes observations for project P366 semester 2003SEPT" ;
  dcat:contactPoint dap:MartaBurgay-vcard ;
  dcat:keyword "pulsar" ;
  dcat:landingPage <https://data.csiro.au/dap/landingpage?pid=csiro:P366-2003SEPT> ;
  prov:wasGeneratedBy dap:P366 ;
.
dap:P366
  rdf:type prov:Activity ;
  dcterms:contributor dap:A_Lyne , dap:Andrea_Possenti , dap:B_Joshi , dap:F_Camilo , dap:G_Pearce , dap:M_Kramer , dap:M_McLaughlin , dap:Nichi_D\'Amico , dap:R_Manchester ;
  dcterms:type "Observation" ;
  rdfs:comment "Parkes multibeam high-latitude pulsar survey" ;
  rdfs:label "P366 - Parkes multibeam high-latitude pulsar survey" ;
  prov:used dap:Parkes-radio-telescope ;
  prov:wasAssociatedWith dap:Marta_Burgay ;
  prov:wasInformedBy dap:ATNF ;
.
dap:ATNF
  rdf:type prov:Activity ;
  rdfs:label "Australia Telescope National Facility" ;
  prov:informed dap:P366 ;
.

Note that prov:wasGeneratedBy is axiomatized

prov:wasGeneratedBy
  rdf:type owl:ObjectProperty ;
  rdfs:domain prov:Entity ;
  rdfs:range prov:Activity ;
.

so this entails that dap:atnf-P366-2003SEPT is a prov:Entity.
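
Spelled out as triples, the inference looks roughly like this. The dap: namespace IRI below is a placeholder, since the thread never gives the real one.

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix dap:  <http://example.org/dap/> .   # placeholder namespace

# Asserted in the example above:
dap:atnf-P366-2003SEPT prov:wasGeneratedBy dap:P366 .

# Entailed by rdfs:domain prov:Entity:
dap:atnf-P366-2003SEPT rdf:type prov:Entity .

# Entailed by rdfs:range prov:Activity
# (the example already asserts this explicitly):
dap:P366 rdf:type prov:Activity .
```

A reasoner applying the RDFS domain and range rules would add the last two triples; without one, they would have to be stated explicitly.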

@dr-shorthair
Contributor

dr-shorthair commented Jul 11, 2018

... and not being entirely happy with the limitations of PROV and DC, here is the same activity described using my proposed Project Ontology, which specializes prov:Activity for planned and budgeted activities commonly known as Projects :-)

dap:P366-1
  rdf:type prov:Activity ;
  rdfs:comment "Parkes multibeam high-latitude pulsar survey" ;
  rdfs:label "P366 - Parkes multibeam high-latitude pulsar survey" ;
  proj:hasParticipant dap:A_Lyne , dap:Andrea_Possenti , dap:B_Joshi , dap:F_Camilo , dap:G_Pearce , dap:M_Kramer , dap:M_McLaughlin , dap:Nichi_D\'Amico , dap:R_Manchester ;
  proj:hasPrincipalInvestigator dap:Marta_Burgay ;
  proj:isSubActivityOf dap:ATNF-1 ;
  proj:objective "Observation" ;
  prov:used dap:Parkes-radio-telescope ;
.
dap:ATNF-1
  rdf:type proj:Project ;
  rdf:type prov:Activity ;
  rdfs:label "Australia Telescope National Facility" ;
  proj:hasSubActivity dap:P366-1 ;
.

Also see #77 #76 #128

@davebrowning
Contributor

Mmm. I haven't had a chance to look for something public - not aware of anything off the top of my head. I would have thought others would have come across this.

If I recall correctly, where my colleagues are using this internally, prov:Activity is always bounded (i.e. has a prov:endedAtTime). We took the view that our feeds operate continually rather than continuously (a record at a time, so to speak), so the publication activity was quite granular, but finite. (There are weaknesses in this approach, in my view, but it suited our specific use case.) The business context, on the other hand, isn't finite; rather, it's indefinite, with no known end time.
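
As a rough illustration of the difference (not our actual data; the ex: names and timestamps are made up), a bounded publication activity and an open-ended business context might look like this:

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# Granular, finite activity: publishing a single record.
ex:publish-record-0001
  a prov:Activity ;
  prov:startedAtTime "2018-09-01T10:00:00Z"^^xsd:dateTime ;
  prov:endedAtTime   "2018-09-01T10:00:05Z"^^xsd:dateTime ;
.

# Business context: started at some point, no end time recorded.
ex:business-context
  a prov:Activity ;
  prov:startedAtTime "2015-01-01T00:00:00Z"^^xsd:dateTime ;
.
```

Under the open-world assumption, the missing prov:endedAtTime only says that no end is recorded, not that the activity never ends.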

@larsgsvensson
Contributor

Our use case (might be slightly OT for this thread) is when a human assigns a language code to a document. Is every assignment its own prov:Activity or can we see the continuous assignment of language codes as one prov:Activity going on since we started doing that?

@dr-shorthair
Contributor

dr-shorthair commented Sep 14, 2018

In the real world we have 'projects' (or similar ongoing activities), within which there are more specific or atomic activities. The PROV model focuses on the latter - the atomic events associated with each specific output. Nevertheless there appears to be a quite common requirement to describe the bigger (project) context instead of (or in addition to) the atomic events. It's not that the latter don't exist conceptually of course, just that the appropriate level of detail for the application may not necessarily match the viewpoint that guided the development of the PROV model and OWL implementation.

IMO, however, the PROV model still applies at the coarser level: all the properties of a prov:Activity are relevant to projects or ongoing activities. The key addition needed is an activity-nesting or -composition predicate.
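
A sketch of what such nesting could look like for the language-code case raised earlier in the thread, reusing the proj:hasSubActivity and proj:isSubActivityOf property names from the Project Ontology example above. The proj: namespace IRI is a placeholder to be replaced with the ontology's real one, and all ex: resources are invented.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix proj: <http://example.org/project#> .   # placeholder namespace
@prefix ex:   <http://example.org/> .           # hypothetical namespace

# The ongoing, coarse-grained activity.
ex:language-coding
  a prov:Activity ;
  rdfs:label "Assignment of language codes to documents" ;
  proj:hasSubActivity ex:assign-language-doc-42 ;
.

# One atomic assignment nested inside it.
ex:assign-language-doc-42
  a prov:Activity ;
  rdfs:label "Language code assignment for document 42" ;
  proj:isSubActivityOf ex:language-coding ;
  prov:wasAssociatedWith ex:cataloguer-1 ;
.
```

An application that only cares about the overall process can describe ex:language-coding alone and omit the sub-activities.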

@dr-shorthair
Contributor

dr-shorthair commented Sep 15, 2018

So the specific answer to @larsgsvensson is 'both', conceptually at least. But practically it might be the case that you only want to describe the overall process and not each individual sub-activity.

@dr-shorthair
Contributor

Should we close this issue?
Example here https://w3c.github.io/dxwg/dcat/#examples-dataset-provenance
Normative statements here https://w3c.github.io/dxwg/dcat/#Property:dataset_wasgeneratedby
(from #312, #338).

Or do we still intend to look at the proposed Project Ontology as a potential "Note"?

@davebrowning
Contributor

When we last looked at this I had the impression we could/wanted to do more (for example the continuous publication stuff), but it's also true that we can always do more... I do think we've addressed the requirement, and the other scenarios are tracked by actions here and here.

re: a potential "Note" - I have no strong opinion (though I do think the ontology is useful)

@dr-shorthair
Contributor

So I put the note on the table more or less from the beginning of the DXWG.

Everyone who has looked at it seems to agree that it is useful, though it has attracted no formal reaction, positive or negative. I have not pushed it since we were clearly quite busy enough with the other things on our plate. But it could probably be finished up with 2-3 days work. So the question becomes: is there an appetite in the DXWG for another non-Rec-track deliverable? Should I raise a specific issue for this question, so we can decide one way or the other?

@agbeltran
Member

@andrea-perego - I cannot access the link to the mapping that you referred to here. Is it available somewhere else?

> Thanks, @dr-shorthair . I would also like to contribute some work we did to map DataCite to DCAT-AP, which includes the mapping of what in DataCite is called "Funding Reference". The mapping tables are here: https://webgate.ec.europa.eu/CITnet/stash/projects/ODCKAN/repos/datacite-to-dcat-ap/browse/documentation/Mappings.md
>
> I think this is also one of the possible use cases for the use of qualified and non-qualified forms.
> The non-qualified form is the basic case where you want to say that a given dataset has been created by a given project. However, if you also need to say that this is done in a given timeframe of the activity of a project you need to add a node in the graph, between "project" and "dataset", where to attach this information.
>
> @dr-shorthair , in my understanding the vocabulary you contributed support both cases, right?

@agbeltran
Member

> So I put the note on the table more or less from the beginning of the DXWG.
>
> Everyone who has looked at it seems to agree that it is useful, though it has attracted no formal reaction, positive or negative. I have not pushed it since we were clearly quite busy enough with the other things on our plate. But it could probably be finished up with 2-3 days work. So the question becomes: is there an appetite in the DXWG for another non-Rec-track deliverable? Should I raise a specific issue for this question, so we can decide one way or the other?

Given that projects (and related funding, see #66) are generic topics that go beyond DCAT, I think it is worth considering the Project ontology as another output of the WG, as it can easily be used within DCAT to cover the requirements (this one and #66). Not sure about the process, though, as strictly our purpose is the DCAT revision.

@agbeltran
Member

As this issue needs more discussion, I'm moving it to the next milestone.

@andrea-perego
Contributor

> @andrea-perego - I cannot access the link to the mapping that you referred here., is it available somewhere else?

Sorry for the late reaction, @agbeltran . The work is now on GH (I just updated the link in the original comment):

https://ec-jrc.github.io/datacite-to-dcat-ap/

@davebrowning
Contributor

This issue remains active, since there are a number of things we could do with either the project ontology or in providing additional examples that perhaps go beyond the common meaning of the word project. This would all be valuable but requires resources which won't be available in 3PWD timescales, so I am removing it from this milestone and 'parking' it in 4PWD for now.

@andrea-perego
Contributor

As there has been no further discussion on this issue, I propose to close it.

@andrea-perego added the "due for closing" label Oct 29, 2020
@dr-shorthair
Contributor

Agree

8 participants