plain text files with Unicode characters are mis-rendered #1275
Labels
merritt
Tech Debt
transition
Bug description:
I'll be honest with you, I'm not entirely sure this is a Dryad bug. Still, it's problematic behavior. When a plain text file that is part of a deposit (e.g., a README.txt) is viewed on the Dryad website, the web browser does not render it properly if the file contains non-ASCII Unicode characters, even if the file is UTF-8 encoded. This appears to be because the Content-Type Dryad sends back is just text/plain, not text/plain; charset=UTF-8. I can see that Dryad doesn't necessarily know what the character encoding is, and so can't necessarily claim that the encoding is UTF-8. And I'm a little mystified as to why web browsers (at least, both Safari and Firefox on macOS Catalina) don't assume UTF-8 if no encoding is specified, but rather assume ASCII.
Steps to reproduce:
Take a look at 10.25349/D99W4T, say, the README.txt file in the May 17 version. You'll see something like "1A-18 double" where the A has a bar over it and the dash is actually an en dash. It should be "1x18 double" where the x is a Unicode multiplication sign.
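For anyone who wants to reproduce the garbling without the website, here is a minimal Python sketch. It assumes the browser falls back to a single-byte legacy encoding such as Windows-1252 when no charset is declared; that fallback is my guess, not something confirmed from Dryad or the browsers. Under that assumption the two UTF-8 bytes of the multiplication sign come out as an accented A followed by a dash, which resembles what the README shows.

```python
# The intended text: "1x18 double" with U+00D7 MULTIPLICATION SIGN.
intended = "1\u00d718 double"

# What is stored in a UTF-8 encoded file: U+00D7 becomes two bytes, C3 97.
raw = intended.encode("utf-8")

# Assumption: the browser decodes the bytes as Windows-1252 instead of UTF-8.
# Each of the two bytes is then shown as its own character.
misread = raw.decode("cp1252")

# Decoding the same bytes as UTF-8 gives the intended text back.
correct = raw.decode("utf-8")

print(repr(misread))
print(repr(correct))
```

The stored bytes are fine in both cases; only the encoding used to interpret them differs.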
Expected behavior:
Such files are rendered as being UTF-8 encoded.
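One possible way to get there without claiming more than the server knows is to add the charset only when the bytes actually decode as UTF-8. The sketch below is in Python purely for illustration; `content_type_for_text` is a hypothetical helper, not anything in the Dryad codebase, and a real implementation might only want to inspect a prefix of large files.

```python
def content_type_for_text(data: bytes) -> str:
    """Pick a Content-Type value for a plain text file (hypothetical helper).

    If the bytes are valid UTF-8 (pure ASCII included), declare the charset.
    Otherwise make no claim and send a bare text/plain, as happens today.
    """
    try:
        data.decode("utf-8")  # strict decoding: raises on invalid sequences
    except UnicodeDecodeError:
        return "text/plain"
    return "text/plain; charset=utf-8"


# A UTF-8 README containing a multiplication sign gets the charset.
print(content_type_for_text("1\u00d718 double".encode("utf-8")))

# A lone 0xD7 byte (Latin-1 style) is not valid UTF-8, so no charset is claimed.
print(content_type_for_text(b"1\xd718 double"))
```

Validation is not proof that the author intended UTF-8, but text in other encodings that contains non-ASCII bytes rarely passes a strict UTF-8 decode, so false positives should be uncommon.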