Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quench nt test userwarn #1500

Merged
merged 4 commits into from Dec 15, 2021
Merged

Quench nt test userwarn #1500

merged 4 commits into from Dec 15, 2021

Conversation

ghost
Copy link

@ghost ghost commented Dec 12, 2021

Fed up with seeing Serializer Userwarnings in the test output. Is the latin-1 format testing anything specific? Can it be simply swapped out for utf-8 or am I missing something? (I'm not strong on the parse/serializer side of things).

@aucampia
Copy link
Member

Can it be simply swapped out for utf-8 or am I missing something? (I'm not strong on the parse/serializer side of things).

If we are not specifically testing character sets support and handing of passing invalid char sets then I think valid charsets (e.g. utf-8 in this case ) is more appropriate than charsets that cause warnings (e.g. ascii/1latin-1` in this case). So I think this change makes a lot of sense.

There should be tests to check the handling of passing of invalid charsets but they should be in a separate place and not mixed in with tests for something different, and they could (should?) even go as far as expecting the warnings. And I also think we should rather error out in some of these cases, especially if the internal handling is to just use utf-8 - I guess there are different approaches here but I think erroring out when an unsupported charset is supplied will result in less surprises for users than warning would. Though this is something we can consider as a seperate issue and discuss potentially for future major versions as this will break compatibility and warrant a major version bump.

I made a bunch of round trip tests that I still have to break off from #1418 that cover character sets, and tests for handling of invalid charsets could maybe go into something like that.

@nicholascar
Copy link
Member

nicholascar commented Dec 15, 2021

In all the 1.1 versions of the W3C Specifications we see UTF-8 mandated, e.g.:

So I think we can always use UTF-8. I wondered if there were any good reasons this wasn't already the case but presumably it's just for the historical reasons of different character support in Python < 3 and perhaps in W3C Specs < 1.1.

@nicholascar nicholascar merged commit eb151d3 into RDFLib:master Dec 15, 2021
@ghost ghost deleted the quench-nt-test-userwarn branch December 28, 2021 12:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants