Fix encoding in website's home #185
Comments
These are the characters that seem to get messed up (non-breaking space and curly quotes)...
It seems that when README.Rmd gets processed into README.md, those characters are converted into something appropriate. But when that is converted to index.html, they get converted improperly. So that points to the gh-pages process, where a virtual server is spun up and the ...
Locally, I can run |
Thanks a lot! That gives me some ideas for fixing it. You did enough. Right now I build the site on GitHub Actions. I think the action used macOS and I changed it to Ubuntu. I see the problem locally on my Ubuntu machine. So I may fix it quickly by building on macOS, then think about a solution for Ubuntu.
We discussed on slack and these are some notes:
I did a bunch of hunting around and... I'm fairly certain this is caused by a bug in
text <- "<body>brûlée 鬼 test 'stuff' and ‘PACTA’ 2°C €</body>"
f <- tempfile()
utf8 <- enc2utf8(text)
con <- file(f, open = "w+", encoding = "native.enc")
writeLines(utf8, con = con, useBytes = TRUE)
close(con)
readLines(f, encoding = "UTF-8")
#> [1] "<body>brûlée 鬼 test 'stuff' and ‘PACTA’ 2°C €</body>"
xml2::read_html(text)
#> {html_document}
#> <html>
#> [1] <body>brûlée 鬼 test 'stuff' and ‘PACTA’ 2°C €</body>
xml2::read_html(f)
#> {html_document}
#> <html>
#> [1] <body>brûlée 鬼 test 'stuff' and â\u0080\u0098PACTAâ\u0080\u0099 2°C ...
unlink(f)

text <- "<body>brûlée 鬼 test 'stuff' and ‘PACTA’ 2°C €</body>"
xml2::write_html(xml2::read_html(text), 'test.html')
xml2::read_html('test.html')
#> {html_document}
#> <html>
#> [1] <body>brûlée 鬼 test 'stuff' and â\u0080\u0098PACTAâ\u0080\u0099 2°C ...
readLines('test.html', encoding = "UTF-8")
#> [1] "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">"
#> [2] "<html><body>brûlée 鬼 test 'stuff' and ‘PACTA’ 2°C €</body></html>" |
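The `â\u0080\u0098` in the output above is consistent with the three UTF-8 bytes of the left curly quote being decoded one byte at a time as Latin-1. Here is a minimal base-R sketch of that mechanism. It only illustrates the symptom and is not a claim about what xml2 does internally; the variable names are made up for the example.

```r
# Left single curly quote, U+2018, as a UTF-8 string.
quote_utf8 <- enc2utf8("\u2018")

# Its UTF-8 encoding is three bytes: e2 80 98.
quote_bytes <- charToRaw(quote_utf8)

# Treat those same bytes as Latin-1 (one character per byte)
# and convert the result to UTF-8.
mangled <- iconv(quote_utf8, from = "latin1", to = "UTF-8")

# Code points of the mangled string: U+00E2 (â), U+0080, U+0098.
mangled_codepoints <- utf8ToInt(mangled)
```

If this reading is right, any character outside ASCII should be affected in the same way, which would match the mangled curly quotes around PACTA.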
and this is my pretty minimal reprex of this issue...
install.packages('usethis')
install.packages('pkgdown')
usethis::create_package(getwd(), fields = NULL, rstudio = FALSE, open = FALSE)
usethis::use_readme_md(open = FALSE)
write("brûlée 鬼 test 'stuff' and ‘PACTA’ 2°C €", file = "README.md", append = TRUE)
usethis::use_pkgdown()
pkgdown::build_home(preview = TRUE)
On my macOS, the resulting
> pkgdown::build_home(preview = TRUE)
-- Building home ---------------------------------------------------------------
Writing 'authors.html'
UTF-8 decoding error in C:/Users/cjyetman/Documents/test4/README.md at byte offset 390 (fb).
The input must be a UTF-8 encoded text.
Error: pandoc document conversion failed with error 92
Error: [ENOENT] Failed to remove 'C:/Users/cjyetman/AppData/Local/Temp/RtmpSmRuLn/file71036f51e07.html': no such file or directory
🤷‍♂️
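One plausible reading of the `(fb)` in the pandoc error is that README.md was written in a single-byte native encoding on Windows, where `û` is the byte `0xfb`, which is not valid UTF-8 on its own. This is an assumption, not something confirmed in this thread. A small base-R sketch of the difference, using illustrative variable names:

```r
# The word from the reprex, written with escapes so the source stays ASCII.
text <- "br\u00fbl\u00e9e"

# The same word as UTF-8 bytes and as Latin-1 bytes.
utf8_bytes <- charToRaw(enc2utf8(text))
latin1_bytes <- charToRaw(iconv(text, from = "UTF-8", to = "latin1"))

# In Latin-1 the u-circumflex is the single byte 0xfb.
has_fb <- as.raw(0xfb) %in% latin1_bytes

# Only the UTF-8 byte sequence passes a UTF-8 validity check.
utf8_ok <- validUTF8(rawToChar(utf8_bytes))
latin1_ok <- validUTF8(rawToChar(latin1_bytes))
```

If that is what happened, writing the file explicitly as UTF-8 before calling pkgdown would be the thing to try on Windows.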
here's an even more minimal reprex that still mangles the '...' in the example README that it creates when run on RStudio Cloud...
usethis::create_package(getwd(), fields = NULL, rstudio = FALSE, open = FALSE)
usethis::use_readme_md(open = FALSE)
pkgdown::build_home(preview = TRUE)
That's awesome! Thanks CJ! Do you plan to open an issue in xml2? (I reopened this because I closed it unintentionally via a sloppy use of the word "fix" in a commit message.)
If you have the bandwidth to do it, please feel free to use this reprex. Not sure if/when I'll get around to it. Feel like I maxed out my time to screw around with this today for the next week or so. 😉
actually looks like it's a regression in xml2 v1.3.0 and it's already been reported...
Best case scenario ;)
Looks like this is probably fixed in r-lib/xml2@6543857 and is already on CRAN in xml2 v1.3.1
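To see whether a given machine or CI runner still has the release with the regression, something like the following could be run before building the site. It is a sketch only: `is_affected_xml2()` is a hypothetical helper, and treating exactly 1.3.0 as the affected release is an assumption based on the comments above.

```r
# Hypothetical helper: TRUE when `version` is the xml2 release that the
# thread identifies as carrying the regression (assumed: exactly 1.3.0).
is_affected_xml2 <- function(version) {
  as.package_version(version) == as.package_version("1.3.0")
}

# Only query the installed version when xml2 is actually available.
if (requireNamespace("xml2", quietly = TRUE)) {
  installed <- utils::packageVersion("xml2")
  if (is_affected_xml2(installed)) {
    message("xml2 ", installed, " is installed; consider updating it from CRAN.")
  }
}
```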
https://2degreesinvesting.github.io/r2dii.match/
Relates to https://github.com/2DegreesInvesting/r2dii.data/issues/36