Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make serialize() on a CONSTRUCT result act like normal g.serialize() #1338

Closed
nicholascar opened this issue Jun 11, 2021 · 5 comments
Closed
Labels

Comments

@nicholascar
Copy link
Member

Currently (RDFlib 6.0.0a0 AKA master), g.serialize() defaults to turtle format and is decoded, however calling serialize() on a SPARQL CONSTRUCT result (i.e. on a Result object) returns the RDFlib 5.0.0 form of an RDF/XML, UTF-8-encoded result.

The old style serialize() in Result should be updated to match the main Graph() serialize().

@aucampia
Copy link
Member

One slight conundrum here is that the default format was "xml", and this was valid for both tabular and graph results. I'm working on this now and I think the best solution is to default format to None instead, and then if None, and result is graph, use turtle, otherwise use txt.

Feedback on the idea will be welcome, but hopefully I will have a patch soon.

@nicholascar
Copy link
Member Author

nicholascar commented Sep 17, 2021

the best solution is to default format to None instead, and then if None, and result is graph, use turtle, otherwise use txt.

Sounds about right!

And I hope to introduce optional outputs to PANDAS Dataframes across all serialisation options soon as people often want to export results to PANDAS. However such an option will require PANDAS to be imported so has to be one of those optional, additional imports that I don’t quite know how to do but that I’ve seen elsewhere: import rdflib[pandas] or similar.

@aucampia
Copy link
Member

aucampia commented Sep 21, 2021

I have been working on this, but I'm at somewhat of an impasse here, because the serializers are actually quite inconsistent.

Currently the rdf serializers treat stream destinations as IO[str] (jsonld, n3, nquads, nt, rdfxml) while result Result serializers seems like they will work for both IO[str] and IO[bytes], except for txtresults, which will only work for IO[str]. I think the best solution is to make ResultSerializers work for both IO[str] and IO[bytes]. I will use type hints to prohibit the use of encoding when IO[str] is supplied, and mandate encoding when IO[bytes] is supplied.

Any feedback on the matter will be appreciated.

@nicholascar
Copy link
Member Author

I think we are seeing the result of a poor transition from Py2 to Py3 here. Presumably I and others missed serialization options for things other than the RDF serializers, i.e. ResultSerializers, and there may be others.

If any consistency can be achieved here, that would be great. I guess we'll then just have to look for other serializers too, perhaps the extra CVS ones, Trig, TriX etc.

@aucampia
Copy link
Member

aucampia commented Sep 26, 2021

I have done some digging to make sense of the IO types in python, and thought a bit how to actually deal with this situation.

Some related inquiries and issues, mostly related to typing:

I think if there was a clean slate, the best option would have been to only accept BinaryIO like buffers (i.e. io.RawIOBase or io.BufferedIOBase), with optional encoding, and if no encoding is supplied, and if the serializer supports multiple encodings, default to system preferred encoding (similar to what TextIOWrapper does).

However, given that some ResultSerializers work with BinaryIO, and some with TextIO, this will break compatibility, so probably the best compromise to maintain interface is for ResultSerializer.serialize() to accept both BinaryIO with optional encoding, and TextIO without any encoding. And then when ResultSerializer.serialize() defers to Graph.serialize(), just use TextIO.buffer and TextIO.encoding and pass that to Graph.serialize().

Further I will also try and ensure the default encoding is utf-8 throughout all encoders.

For formats that only allow one encoding (turtle) I think we should reconsider the behaviour if unsupported encodings are requested. Either we should always support arbitrary encodings, or always raise an exception if an unsupported encoding is supported. The current approach is to warn if an unsupported encoding is requested, and this may result in unexpected behaviour. I will however defer changing this for now, as it can be dealt with in another PR.

@ghost ghost locked and limited conversation to collaborators Dec 27, 2021
@ghost ghost converted this issue into discussion #1612 Dec 27, 2021

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants