Clarify definition of dct:accrualPeriodicity in the context of DCAT #728

Closed
dr-shorthair opened this issue Feb 3, 2019 · 13 comments

Comments

@dr-shorthair
Contributor

The DCMI definition of dct:accrualPeriodicity is [1]

"The frequency with which items are added to a collection"

However, the DCAT definition is [2]

"The frequency at which dataset is published"

which has a rather different meaning. In particular, the rate of addition of data to a time series might be quite different to the frequency of its release or publication.

This needs to be clarified.

[1] http://dublincore.org/documents/dcmi-terms/#terms-accrualPeriodicity
[2] https://w3c.github.io/dxwg/dcat/#Property:dataset_frequency

@dr-shorthair changed the title from "Clairfy definition of dct:accrualPeriodicity in the context of DCAT" to "Clarify definition of dct:accrualPeriodicity in the context of DCAT" on Feb 3, 2019
@dr-shorthair
Contributor Author

This issue was triggered by a comment submitted by Daniel Pop [3]

[3] https://lists.w3.org/Archives/Public/public-dxwg-comments/2019Jan/0013.html

@andrea-perego
Contributor

Concerning specifically Daniel Pop's relevant comment:

  • Whether update frequency can also be applied at the distribution level: this depends very much on how a dataset is updated and how that information is documented. In my experience, this is typically done at the dataset level, but I see no reason why it shouldn't also be done for distributions, especially if they have different update frequencies.

  • Daniel indirectly raises the issue of whether distributions with different update frequencies should lead to the creation of distinct datasets. This is one of the cases where the approach to be chosen depends entirely on the data provider and community practices. Nonetheless, the fact that update frequency in DCAT is at the dataset level does not by itself mean that we need different datasets to address this use case.
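
To make the distribution-level case concrete, here is a minimal sketch of one dataset whose two distributions are refreshed on different schedules. The `ex:` namespace and resource names are hypothetical, and attaching dct:accrualPeriodicity to a dcat:Distribution is only the possibility discussed above; DCAT lists the property for dcat:Dataset, not for distributions.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# Update frequency stated once, for the dataset as a whole.
ex:rainfall
    a dcat:Dataset ;
    dct:accrualPeriodicity <http://purl.org/cld/freq/daily> ;
    dcat:distribution ex:rainfall-csv , ex:rainfall-archive .

# Assumption: each distribution also states its own update frequency.
ex:rainfall-csv
    a dcat:Distribution ;
    dct:accrualPeriodicity <http://purl.org/cld/freq/daily> .

ex:rainfall-archive
    a dcat:Distribution ;
    dct:accrualPeriodicity <http://purl.org/cld/freq/monthly> .
```

Both distributions stay under a single dataset here, which illustrates that differing update frequencies need not force a split into separate datasets.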

@dr-shorthair
Contributor Author

The concern applies primarily to datasets that are time-series, which might be issued on a different schedule to the rate of addition of data to the dataset. For example, a dataset showing rainfall at 5-minute intervals might be formally published and registered just once a day.

The DCAT definition appears to specify the latter, while the DCMI definition encompasses both. I think we need to either

  • make this explicit - add a usage note "In this context the use of dct:accrualPeriodicity concerns the rate at which the dataset as-a-whole is published, which is a narrower usage than permitted by the original DCMI definition. To describe the rate of addition of items within a dcat:Dataset, use XXX instead."
  • provide XXX to describe the internal spacing of items within a dataset (also see Summary statistics [RSS] #84).

Else

  • revert to the DCMI definition, and clarify that this refers to the periodicity of accrual of items within a dataset, and recommend a different way to describe the publication schedule.

@andrea-perego , @kcoyle , @davebrowning do you have backward compatibility evidence here? How is dct:accrualPeriodicity used in practice?

@kcoyle
Contributor

kcoyle commented Feb 15, 2019

Collection in Dublin Core relates to the Collection Application Profile. It clearly comes out of the archival community. I will ask if anyone has examples of use. The wording is specifically about the addition of entries to a collection or a catalog defining a collection, not about publication. Presumably the periodicity of publication could be the same as the periodicity with which items are added to the collection, but obviously they could also differ.

@andrea-perego
Contributor

@dr-shorthair asked:

@andrea-perego , @kcoyle , @davebrowning do you have backward compatibility evidence here? How is dct:accrualPeriodicity used in practice?

Yep. This is definitely used.

According to the report on usage statistics of DCAT-AP in the European Data Portal (EDP) (Oct 2017), dct:accrualPeriodicity is used in 334,497 records, corresponding to 45% of the records of the EDP.

So, there's indeed a backward compatibility issue to be taken into account.

@kcoyle
Contributor

kcoyle commented Feb 15, 2019

The report says: "Use of dct:accrualPeriodicity
We analysed the use of the property dct:accrualPeriodicity which indicated at what frequency a dataset is updated by its owner."

This does seem different from the semantics of the dct definition. I've asked in the DC community if anyone there has examples of use.

Although the definitions differ, I wonder if the end result is not the same, which is "frequency of addition of resources to a catalog."

@dr-shorthair
Contributor Author

Thanks @kcoyle - I thought my concern came down to thinking about whether the 'collection' under consideration is
(a) the Catalog, or
(b) the Dataset
But maybe that's not so clear cut.

And even "at what frequency a dataset is updated by its owner" still does not nail down "at what frequency items (values) are added to (spaced in) a time-series" since a daily dataset update might still include hourly data.

From a user's perspective, these differences really matter: The dataset-update rate is about currency, while the data-rate will probably determine whether the data is fit for purpose.

This is not an esoteric concern. It was prompted by an investigation in a recent metadata workshop, where we were trying to find datasets about related phenomena to do some modeling and analysis. Time-series with daily, seasonal and annual spacing can't be used together for most tasks, but that information is mostly only available by reading the abstract, or inspecting the downloaded data. If it were an explicit dataset statistic then a query could be constructed. That's why I referred over to #84.

But I do think there is some clarification required for dct:accrualPeriodicity.
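
As a rough sketch of what an explicit statement of time-series spacing might look like, the following uses the dcat:temporalResolution property that is proposed later in this thread (#776). The `ex:` namespace and dataset names are hypothetical, and encoding "seasonal" as a three-month duration is an assumption made only for illustration.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# Three time-series about related phenomena, with different spacing.
ex:daily-series
    a dcat:Dataset ;
    dcat:temporalResolution "P1D"^^xsd:duration .   # daily values

ex:seasonal-series
    a dcat:Dataset ;
    dcat:temporalResolution "P3M"^^xsd:duration .   # assumed encoding of "seasonal"

ex:annual-series
    a dcat:Dataset ;
    dcat:temporalResolution "P1Y"^^xsd:duration .   # annual values
```

With the spacing recorded as a property value, a catalog query could select only the datasets whose resolution matches, without anyone having to read abstracts or download the data first.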

@kcoyle
Contributor

kcoyle commented Feb 16, 2019

@dr-shorthair I agree that there is a significant difference between adding items to a set and updating items in a set. The dct term, being from the archival world, is definitely about the former. So I could consider accrualPeriodicity to be about adding to the set, which in DCAT would be the catalog. Updating the dataset itself is closer in my mind to versioning, as the dataset undergoes a change. dqm's "expected update interval" seems closer to the latter. If both are needed then they obviously should be distinguished.

@smrgeoinfo
Contributor

Isn't @dr-shorthair 's question about the temporal resolution of the data-- if the data set is reporting air temperature, are measurements recorded every minute, every hour, every day ... The accessible representation for the dataset might be updated once a day or once a year, but that is orthogonal to the temporal resolution of the data. There is a similar consideration for spatial resolution. These both appear to be different properties than what is intended by accrualPeriodicity.

@kcoyle
Contributor

kcoyle commented Feb 17, 2019

@smrgeoinfo Re-reading his post, I think you are right, so in that case the dct property is definitely of a different nature (although still accruing and still periodic, but different subjects being acted on). The DCAT definition "frequency at which a dataset is published" is something else yet again. We may need a well-done chart to sort out the meanings and their combinations.

@dr-shorthair
Contributor Author

Like I said - I think we probably just need to improve the wording around dct:accrualPeriodicity, or add a usage note, and then take care of the temporal (and spatial?) resolution separately (#84). Then we can cross-reference the two with a note that draws attention to the fact that these requirements are different, so are handled separately.

@andrea-perego
Contributor

@kcoyle said:

@dr-shorthair I agree that there is a significant difference between adding items to a set and updating items in a set. The dct term, being from the archival world, is definitely about the former. So I could consider accrualPeriodicity to be about adding to the set, which in DCAT would be the catalog. Updating the dataset itself is closer in my mind to versioning, as the dataset undergoes a change. dqm's "expected update interval" seems closer to the latter. If both are needed then they obviously should be distinguished.

Not necessarily. If the dataset is updated by adding new data items (e.g., adding new rows in a table), without modifying the existing ones, then we are in the same case as the catalogue, where each dataset can be considered as a data item.

We are talking here about different, not mutually exclusive, ways in which a dataset can be updated, and how they are relevant for users. I think that, in the most general case, users are interested in knowing how frequently a dataset is updated, and they are not very much interested in how this is done. If we identify the need to also clarify whether the update was done by modifying the existing data, adding new data items, or both, fine. But we must not forget the most general use case.

About dct:accrualPeriodicity, I would support including any clarification on its meaning and recommended use, but we need to take into account that, since 2014, all datasets documented with DCAT have been using dct:accrualPeriodicity to cover all the cases of frequency of update of a dataset, so whatever we do must not break backward compatibility.

@dr-shorthair
Contributor Author

dr-shorthair commented Mar 1, 2019

A new property dcat:temporalResolution is added by #776 - this accommodates the rate of addition of items in a time series. So I suggest that the definition of dct:accrualPeriodicity can be clarified by adding a cross-reference to dcat:temporalResolution as addressing a distinct requirement.

e.g.

NOTE: the value of dct:accrualPeriodicity gives the rate at which the dataset-as-a-whole is updated. This may be complemented by dcat:temporalResolution to give the rate at which items are added to a time-series. For example, a 15-minute time-series that is published daily could be described as:

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<>
    a dcat:Dataset ;
    dct:accrualPeriodicity <http://purl.org/cld/freq/daily> ;
    dcat:temporalResolution "PT15M"^^xsd:duration ;
.
