Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specify that htmltools::htmlPreserve() should use the pandoc raw attribute #1965

Merged
merged 2 commits into from
Dec 1, 2020

Conversation

jjallaire
Copy link
Member

Rather than using preservation tokens (only when pandoc >= v2.0, which is required for the raw attribute)

Note that this option will have the intended effect only for versions of htmltools >= 0.5.0.9003.

This causes 1 test to fail b/c it's testing for htmlPreserve output that no longer conforms. The test will pass/fail depending on which version of htmltools is installed, so we may want to make it conditional.

cc @jcheng5 @cpsievert

…tribute rather than preservation tokens when pandoc >= v2.0.

Note that this option will have the intended effect only for versions of htmltools >= 0.5.0.9003.

This causes 1 test to fail b/c it's testing for htmlPreserve output that no longer conforms. The test will pass/fail depending on which version of htmltools is installed, so we may want to make it conditional.
Copy link
Member

@yihui yihui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me!

@jjallaire
Copy link
Member Author

Addressed the failing test by skipping it if htmltools >= v0.5.0.9003 (since in that case there is nothing to test b/c we won't be using the traditional token-based htmlPreserve)

@jjallaire
Copy link
Member Author

@kevinushey We now take advantage of a new htmltools option to preserve html using pandoc raw attributes rather than tokens (this among other things includes more of the actual output in keep_md, will be faster for larger documents/widgets, and enables pandoc filters to see the actual content rather than just a preservation token). Here is the code in htmltools: https://github.com/rstudio/htmltools/blob/master/R/tags.R#L1063-L1080

Does this have any implications for the R Notebook in RStudio or does it have a parallel universe that it fully controls?

@cderv
Copy link
Collaborator

cderv commented Nov 27, 2020

First a question: is the aim to replace all occurrences of <!--html_preserve--> ?
We are still using it for shiny prerendered

shiny_prerendered_append_context <- function(con, name, code) {
lines <- c('<!--html_preserve-->',
paste0('<script type="application/shiny-prerendered" ',
'data-context="', name ,'">'),
# escape code, see https://github.com/rstudio/rmarkdown/issues/943
gsub("</", "<\\u002f", code, fixed = TRUE),
'</script>',
'<!--/html_preserve-->')
writeLines(lines, con = con)
}

Maybe we should use htmltools::htmlPreserve here too to take advantage of raw attributes ? Or is this different usage?


On the notebook side, I know we are using htmltools::extractPreserveChunks and htmltools::restorePreserveChunks in notebook_render_html_widget() to modify the preserved content (since #1799). It is used to modify render chunk option hook for htmlwidget.
With new dev version of htmltools, how would that work ? Shouldn't this functions support also extraction when preserved by raw attributes ? Currently it won't and we would get something like that

<!-- rnb-htmlwidget-begin <base64-encoded-meta> -->
```{=html}
<div id="htmlwidget-dd56a28903c66375acb6">
(...)
</div>
<script type=\"application/json\" data-for=\"htmlwidget-dd56a28903c66375acb6\">
(...)
</script>
```
<!-- rnb-htmlwidget-end -->

I think this is why the test is failing because notebook annotations for htmlwidgets are expecting currently htmlwidget output using token-based htmlPreserve, to add some <!-- rnb-htmlwidget- tokens before and after the initial preserved content. With the new behavior, raw attributes are still here - so the test is failing.

Depending on what we need for html_notebook to work (maybe the fix in #1799 was not correct), we can modify the test to handle the new behavior too or we need to adapt notebook_render_html_widget() (or extraction function in htmltools).

@atusy
Copy link
Collaborator

atusy commented Nov 27, 2020

I think this will break some reverse dependencies such as flextable and gt.
They internally suspend htmlPreserve to enable bookdown cross-reference.
However, they will fail with this PR.

Error in extract(input_str) : Invalid nesting of html_preserve directives

https://github.com/rstudio/gt/pull/689/files#diff-0520db502ed64622ee643d0a7069b0ee53dc11dd7e7d74ba4fed48ddf400d040R180-R184

https://github.com/davidgohel/flextable/pull/205/files#diff-d6c591de630faf206d4ebf004627d738a92c808a424f916ea3af4fd84407e9caR15-R17

reprex

---
title: Untitled
output:
  bookdown::html_document2:
    keep_md: true
---

```{r foo}
library(flextable)
ft <- flextable(data.frame(x=1))
set_caption(ft, 'foo')
```

```{r bar}
gt::gt(data.frame(x=1), caption='bar')
```

@jjallaire
Copy link
Member Author

@kevinushey Will understand that notebook case better than I do -- let's wait to hear from him on that.

AFAIK it would be fine for the shiny prerendered code to also call htmltools::htmlPreserve (it's really just doing that get around pandoc's treatment of >= 4-space indented HTML, which ```{=html} will also accomplish just the same).

@jjallaire
Copy link
Member Author

Yes, those hacks are quite unfortunate! Happily there is a way out, in gt we now only do the hack if we know that ```{=html} isn't being used: rstudio/gt@bae32f4

flextable would need a similar conditional check here: https://github.com/davidgohel/flextable/pull/205/files#diff-d6c591de630faf206d4ebf004627d738a92c808a424f916ea3af4fd84407e9caR14

We certainly don't want to cause breakage but using ```{=html} is much cleaner and exposes more of the document to pandoc filters (which currently can't "see" the raw html b/c it's been preserved).

@atusy
Copy link
Collaborator

atusy commented Nov 27, 2020

@jjallaire Great to have a workaround! Thanks! I definitely agree Pandoc's raw attribute is the better solution in general.

@jjallaire
Copy link
Member Author

Interestingly, the original motivation for this was to make it easier for bookdown to cross reference tables expressed as raw html. While it's great that gt and flextable found an ingenious workaround, we definitely don't want to require every R function that emits a table to do the same.

cc @davidgohel

@atusy
Copy link
Collaborator

atusy commented Nov 27, 2020

@jjallaire Yes. I also had the same motivation but gave up when I implemented the cross referencing feature in flextable. Thanks again.

@kevinushey
Copy link
Contributor

kevinushey commented Nov 30, 2020

I'm trying to recall why notebook_render_html_widget() does this dance with the preserve chunk headers in the first place. It looks like it ultimately:

  1. Retrieves the HTML content within an <!--html_preserve--> chunk,
  2. Wraps that HTML output with our own Notebook HTML comment headers + footers,
  3. Re-preserves the newly-annotated Notebook output.

My only guess is that the intention here was to ensure that the R Notebook comment headers + footers could live within the <!--html_preserve--> fence, but that ultimately seems unnecessary? (For what it's worth, I think this is a behavior that is actually changed by #1799, since this old line would've basically "moved" the preserved HTML to include the Notebook comments:

https://github.com/rstudio/rmarkdown/pull/1799/files#diff-67d1332c1085b3582a60665bec781eab9ac30c53c00ac68b4a479ffc58237910L16

Whereas the change now restores the old preserved chunk as-is:

https://github.com/rstudio/rmarkdown/pull/1799/files#diff-67d1332c1085b3582a60665bec781eab9ac30c53c00ac68b4a479ffc58237910R11-R12

All that said ... it looks like this ultimately works regardless of how we try to arrange the HTML comment fences here. In addition, since these hooks just modify the generated Markdown (which is then later processed by Pandoc) it should be totally fine for the new-style Pandoc raw blocks to be left as-is.

tl;dr: I think (but haven't confirmed) we might actually be able to get rid of the whole preserve-unpreserve dance in notebook_render_html_widget(), and instead just explicitly wrap the entire HTML content with our Notebook comments.

@jjallaire
Copy link
Member Author

@kevinushey Thanks for taking a closer look. It sounds like while there may be some related changes to make in notebook_render_html_widget() this PR is fine to merge. I'll do that, LMK if there is follow-up work we should undertake once any other related changes are committed.

@jjallaire jjallaire merged commit d29ba32 into master Dec 1, 2020
@jjallaire jjallaire deleted the feature/html-preserve-raw branch December 1, 2020 11:26
yihui added a commit that referenced this pull request Jan 19, 2021
…en this option has been set, otherwise it is impossible for other packages to turn this option off, e.g., yihui/xaringan#293
jonathan-g added a commit to jonathan-g/rmarkdown that referenced this pull request Jan 25, 2021
Merge branch 'master' of https://github.com/rstudio/rmarkdown

* 'master' of https://github.com/rstudio/rmarkdown: (111 commits)
  get rid of names, otherwise the returned vector will have names like this:
  Only run testrmd if pandoc is available
  Warn that cropping is disabled if pdfcrop or ghostscript are not found. (rstudio#2017)
  Add shiny to remotes (rstudio#2014)
  file.path.ci is too loose with file matching (rstudio#2012)
  improve error messages for the new theme argument behavior
  Add Bootstrap 4 support (rstudio#1706)
  amend rstudio#1965: do not force `options(htmltools.preserve.raw = TRUE)` when this option has been set, otherwise it is impossible for other packages to turn this option off, e.g., yihui/xaringan#293
  htmltools 0.5.1 is on CRAN now
  remove download stats
  use on.exit to remove file (rstudio#2001)
  use only the TinyTeX-1 version in CI (rstudio#1998)
  Cache should work on Windows with recent version of R (rstudio#1997)
  exclude renv folder from render_site() copied resources (rstudio#1996)
  Update CI to last Pandoc 2.11.3.1 (rstudio#1992)
  Revert "return div not nil for non-qualifying latex div"
  Add missing NEWS item for v2.6 release
  return div not nil for non-qualifying latex div
  Add `output_format_filter` function to `default_site_generator()`. (rstudio#1989)
  Improvements to latex-div (rstudio#1984)
  ...

# Conflicts:
#	NEWS.md
#	R/render.R
#	man/render.Rd
jonathan-g added a commit to jonathan-g/rmarkdown that referenced this pull request Jan 25, 2021
* master: (111 commits)
  get rid of names, otherwise the returned vector will have names like this:
  Only run testrmd if pandoc is available
  Warn that cropping is disabled if pdfcrop or ghostscript are not found. (rstudio#2017)
  Add shiny to remotes (rstudio#2014)
  file.path.ci is too loose with file matching (rstudio#2012)
  improve error messages for the new theme argument behavior
  Add Bootstrap 4 support (rstudio#1706)
  amend rstudio#1965: do not force `options(htmltools.preserve.raw = TRUE)` when this option has been set, otherwise it is impossible for other packages to turn this option off, e.g., yihui/xaringan#293
  htmltools 0.5.1 is on CRAN now
  remove download stats
  use on.exit to remove file (rstudio#2001)
  use only the TinyTeX-1 version in CI (rstudio#1998)
  Cache should work on Windows with recent version of R (rstudio#1997)
  exclude renv folder from render_site() copied resources (rstudio#1996)
  Update CI to last Pandoc 2.11.3.1 (rstudio#1992)
  Revert "return div not nil for non-qualifying latex div"
  Add missing NEWS item for v2.6 release
  return div not nil for non-qualifying latex div
  Add `output_format_filter` function to `default_site_generator()`. (rstudio#1989)
  Improvements to latex-div (rstudio#1984)
  ...
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 1, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants