plain text files with Unicode characters are mis-rendered #1275

Closed
gjanee opened this issue May 20, 2021 · 2 comments
Labels
merritt Tech Debt transition activities involved in transition of Dryad architecture to Dryad environment


gjanee commented May 20, 2021

Bug description:

I'll be honest with you, I'm not entirely sure this is a Dryad bug. Still, it's problematic behavior. A plain text file that is part of a deposit (e.g., a README.txt) and is viewed on the Dryad website is not rendered properly by the web browser if the file contains Unicode characters, even if the file is UTF-8 encoded. This appears to be because the MIME type Dryad sends back is just text/plain, not text/plain; charset=UTF-8. I can see that Dryad doesn't necessarily know what the character encoding is, and so can't necessarily claim that the encoding is UTF-8. And I'm a little mystified as to why web browsers (at least, both Safari and Firefox on macOS Catalina) don't assume UTF-8 if no encoding is specified, but rather assume ASCII.
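
For reference, the encoding of a text response is declared through the `charset` parameter of the Content-Type header. The following is a minimal sketch, using only Python's standard library, of how a client might read that parameter. The header values are illustrative and were not captured from Dryad's responses, and `declared_charset` is a hypothetical helper name.

```python
from email.message import Message


def declared_charset(content_type: str):
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and returns None when no
    # charset parameter is present.
    return msg.get_content_charset()


# Bare media type: the client is left to guess the encoding.
print(declared_charset("text/plain"))
# Media type with an explicit charset: no guessing needed.
print(declared_charset("text/plain; charset=UTF-8"))
```

When the first form is received, the browser falls back to its own default or detection logic, which is where the mis-rendering described above can come from.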

Steps to reproduce:

Take a look at 10.25349/D99W4T, say, the README.txt file in the May 17 version. You'll see something like "1A-18 double" where the A has a bar over it and the dash is actually an en dash. It should be "1x18 double" where the x is a Unicode multiplication sign.
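
The symptom can be reproduced outside the browser. The sketch below assumes the fallback encoding was Windows-1252 (`cp1252`); the issue does not say which encoding the browsers actually chose, so treat that as a guess that happens to produce output close to the description (an A with a diacritic followed by a dash). `misread_utf8` is a hypothetical helper name.

```python
def misread_utf8(text: str, fallback: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode the bytes with a single-byte fallback."""
    return text.encode("utf-8").decode(fallback)


# U+00D7 is the multiplication sign; in UTF-8 it is the two bytes C3 97.
original = "1\u00d718 double"
garbled = misread_utf8(original)
print(garbled)

# The damage is reversible as long as the fallback maps every byte involved:
# re-encode with the fallback and decode as UTF-8 to recover the original.
recovered = garbled.encode("cp1252").decode("utf-8")
print(recovered == original)
```

Each byte of the two-byte UTF-8 sequence is shown as its own character, which is why one multiplication sign turns into two unrelated glyphs.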

Expected behavior:

Such files are rendered as being UTF-8 encoded.

Contributor

sfisher commented Mar 13, 2023

This is an interesting issue. The files are served by Merritt from S3 presigned URLs. Maybe there is something they could add for defaulting to UTF-8 when presigning URLs on AWS.
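
One possible shape for this, as a rough sketch only: S3 lets a GetObject request override the Content-Type of the response, and boto3 exposes that override as the `ResponseContentType` parameter when presigning. The helper below only builds the parameters; `presign_params`, the bucket name, and the key are hypothetical, and nothing here reflects how Merritt or Dryad actually generate their URLs. It also assumes every text file is UTF-8, which, as noted in the bug description, the service cannot always know.

```python
import mimetypes


def presign_params(bucket: str, key: str) -> dict:
    """Build GetObject parameters, declaring UTF-8 for text files (an assumption)."""
    params = {"Bucket": bucket, "Key": key}
    guessed, _ = mimetypes.guess_type(key)
    if guessed and guessed.startswith("text/"):
        # Ask S3 to send this Content-Type instead of the stored one.
        params["ResponseContentType"] = f"{guessed}; charset=utf-8"
    return params


params = presign_params("example-bucket", "deposit/README.txt")
print(params)

# With boto3 installed and credentials configured, the parameters could be
# passed along like this (not executed here):
#   boto3.client("s3").generate_presigned_url(
#       "get_object", Params=params, ExpiresIn=3600)
```

Because the override is part of the signed request, it applies per URL and leaves the metadata stored on the object unchanged.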

ryscher added Tech Debt transition activities involved in transition of Dryad architecture to Dryad environment labels Nov 7, 2023
ryscher added this to the Sprint 104 milestone Nov 7, 2023
ryscher modified the milestones: Sprint 104, Sprint 105 Nov 20, 2023
ryscher moved this from Backlog to In progress in Dryad Product Board Nov 22, 2023
Contributor

sfisher commented Dec 4, 2023

It appears that, as part of the transition away from Merritt, these files now open as attachments rather than rendering in-page. I tried Firefox, Chrome and Safari on macOS, and they all simply download the file or, in Safari's case, download it and open it with the system viewer (which I'm guessing is a Mac setting).

I believe they look ok, though maybe I missed something and didn't look at every character.

I added #2993 which relates to this and previews for now.

sfisher moved this from In progress to Review in Dryad Product Board Dec 4, 2023
ryscher closed this as completed Dec 6, 2023
github-project-automation bot moved this from Review to Completed in Dryad Product Board Dec 6, 2023