Unrecognised ISO 8601 for BC years #2148

Closed
GreenfishK opened this issue Nov 5, 2022 · 6 comments

@GreenfishK

The following triple cannot be parsed:

from rdflib import Graph
g = Graph()
g.parse(data='<http://dbpedia.org/resource/Cologne> <http://dbpedia.org/ontology/foundingYear> "-0038"^^<http://www.w3.org/2001/XMLSchema#gYear> .')

Error message

Failed to convert Literal lexical form to value. Datatype=http://www.w3.org/2001/XMLSchema#gYear, Converter=<function parse_date at 0x7f84578fd040>
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.8/site-packages/rdflib/term.py", line 2084, in _castLexicalToPython
    return conv_func(lexical)  # type: ignore[arg-type]
  File "/opt/anaconda3/lib/python3.8/site-packages/isodate/isodates.py", line 203, in parse_date
    raise ISO8601Error('Unrecognised ISO 8601 date format: %r' % datestring)
isodate.isoerror.ISO8601Error: Unrecognised ISO 8601 date format: '-0038'
<Graph identifier=N8a4fd05511ab4ad59496949f369abbee (<class 'rdflib.graph.Graph'>)>

According to ISO 8601, BC dates should be written with a leading minus sign, e.g. "-0038", so the input should be valid.
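
The literal in the triple is typed as xsd:gYear, so its lexical form is governed by the XML Schema datatype definition, which also allows a leading minus. As a rough check that does not depend on rdflib or isodate, the lexical pattern from XML Schema 1.1 Part 2 can be tested with the standard library. This is only a sketch: `is_valid_gyear` is a hypothetical helper name, not part of any library mentioned in this thread.

```python
import re

# Lexical pattern for xsd:gYear as described in XML Schema 1.1 Part 2:
# an optional minus sign, at least four year digits, an optional timezone.
GYEAR = re.compile(
    r"-?([1-9][0-9]{3,}|0[0-9]{3})"
    r"(Z|(\+|-)((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?"
)


def is_valid_gyear(lexical: str) -> bool:
    """Return True if the string is a well-formed xsd:gYear lexical form."""
    return GYEAR.fullmatch(lexical) is not None


# Print the verdict for a few candidate strings.
for candidate in ["-0038", "0038", "2022", "-38", "38 BC"]:
    print(candidate, is_valid_gyear(candidate))
```

Under this pattern "-0038" is well-formed, while "-38" is not because a gYear needs at least four year digits.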

@ghost

ghost commented Nov 6, 2022

The exception message identifies the isodate library as the cause:

isodate.isoerror.ISO8601Error: Unrecognised ISO 8601 date format: '-0038'

There has been some discussion of helping with, or taking over, maintenance of isodate, but there is also an alternative: metomi.isodatetime, published and maintained by the UK Meteorological Office:

>>> tp = "-0038"
>>> import metomi.isodatetime.parsers as parse
>>> parse.TimePointParser().parse(tp)
<metomi.isodatetime.data.TimePoint: -003800-01-01T00:00:00Z>

@GreenfishK
Author

I see, thank you!
Is there any alternative for the rdflib parser? How can I parse a whole graph without hitting this issue? Is there a setting or parameter in rdflib to configure this, or do I really need to change the modules locally?

@ghost

ghost commented Nov 7, 2022

The in-code documentation for isodate acknowledges a lack of support for dates before "0001-01-01":

The only limitations it has, are given by the Python datetime.date implementation, which does not support dates before 0001-01-01.
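
The limit described in that quotation can be seen with the standard library alone. The sketch below only illustrates `datetime.date` behaviour; `try_year` is a hypothetical helper introduced here for the demonstration.

```python
import datetime

# The smallest year and the earliest date that the datetime module can represent.
print(datetime.MINYEAR)
print(datetime.date.min)


def try_year(year: int):
    """Return date(year, 1, 1), or the ValueError raised for unsupported years."""
    try:
        return datetime.date(year, 1, 1)
    except ValueError as exc:
        return exc


# Year 1 is representable; year 0 and negative years are rejected.
for year in (1, 0, -38):
    print(year, repr(try_year(year)))
```

Any converter that returns a `datetime.date` therefore has no value to return for a year such as -38.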

At present, Literal values are implemented as Python datatypes ...

>>> Literal("0001", datatype=XSD.gYear)
Literal('0001', datatype=URIRef('http://www.w3.org/2001/XMLSchema#gYear'))
>>> Literal("0001", datatype=XSD.gYear).value
datetime.date(1, 1, 1)

So, given that the basic limitation is Python itself and because integrating/switching to metomi.isodatetime would entail a fundamental change to the way Literal values are handled, it would likely be considered out of scope. After more detailed investigation, it would seem my mentioning it was ill-conceived.

All I can suggest is that if you want to work with this dataset in RDFLib/Python, you pre-filter the statements, remapping any offending negative XSD.gYear instances to XSD.integer.

@GreenfishK
Author

OK, I see, thank you! Then this issue has nothing to do with rdflib and can be closed.

@niklasl
Member

niklasl commented Nov 9, 2022

Wouldn't you say that this is still an issue with RDFLib, as it is not conformant with ISO 8601? The non-conformance comes from its choice of implementation, and isodate should be fixed (or RDFLib should switch to a different underlying implementation). But that does not give RDFLib license to throw errors when parsing valid RDF containing correctly typed and formatted BCE dates.

@GreenfishK
Author

I see your point, and you are right. However, I think I should simply raise this issue again on isodate's GitHub page, because it will probably be solved faster that way; date formats are probably not rdflib's priority.
