Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Discuss] Migrate to using HTML5 as the renderer #249

Closed
miketheman opened this issue Jul 31, 2022 · 4 comments · Fixed by #253
Closed

[Discuss] Migrate to using HTML5 as the renderer #249

miketheman opened this issue Jul 31, 2022 · 4 comments · Fixed by #253

Comments

@miketheman
Copy link
Member

We've been using docutils' html4css1 renderer since this library was introduced in 0.1.0 9 years ago, f3cde47#diff-dbec05995ebd89ac0dd19fa6b9c4e63006152a3d3ab2fe049711670eca1d3f4eR19

docutils added an HTML5 writer in 0.13.1 https://docutils.sourceforge.io/RELEASE-NOTES.html#release-0-13-1-2016-12-09 and has been improving it over time, and will be the default in docutils 2.0 (no date known yet).

Exploratory work surfaced that the changes are likely to be minimal in the code, and largely updating the test_rst_*.py files with the HTML5-equivalents and add the new ones to clean.py to allow through.


If this is completed, then the updates to the warehouse code would likely only be to add the new attributes to the existing CSS classes to support HTML4 and HTML5 directives indefinitely, as during the upload process the incoming RST is transformed and stored once so the rendered HTML ends up stored in the DB, so we'd want to support any pre-rendered HTML.
(Alternately, we could kick off a background task to re-render everything with the new code, which has other tradeoffs.)
This part of the conversation could move to the warehouse if the HTML5 work is undertaken.

@di
Copy link
Member

di commented Aug 8, 2022

I'd be in favor of this.

PyPI also runs the update_description_html task once every 5 minutes to re-render descriptions that have been rendered with older versions of this library, so kicking off a task to re-render everything would probably be unnecessary.

@dstufft
Copy link
Member

dstufft commented Aug 8, 2022

Just a note that task only re-renders ~1000 descriptions at a time. I still don't think it means that we need kick off a background task specifically for this, but the CSS will need to support both for awhile.

@miketheman
Copy link
Member Author

PyPI also runs the update_description_html task once every 5 minutes to re-render descriptions that have been rendered with older versions of this library, so kicking off a task to re-render everything would probably be unnecessary.

Thanks! I wonder why I didn't find that before 🤔 . Good to know it happens automatically already, so the existing processing power and setup is already in place 🎈

Just a note that task only re-renders ~1000 descriptions at a time. I still don't think it means that we need kick off a background task specifically for this, but the CSS will need to support both for awhile.

Makes total sense. Do we have any metrics today on the duration/completion of the task run, so as to be able to estimate "a while"? Back of napkin math, probably a few weeks, maybe a month or two?

@di
Copy link
Member

di commented Aug 10, 2022

warehouse=> select count(*) from release_descriptions;
  count
---------
 4095411

So at 1000 per 5m or ~250,000 per day, it takes ~16 days to re-render everything.

miketheman added a commit to miketheman/readme_renderer that referenced this issue Aug 16, 2022
Convert the writer class to use HTML5 semantics.
Update allowed tags and attributes for HTML5 to pass the bleach cycle.

Refs: pypi/warehouse#12080
Resolves pypa#249

Signed-off-by: Mike Fiedler <miketheman@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants