#11113: add a new "html5" dictionary containing the named character references defined by the HTML5 standard and the equivalent Unicode character(s) to the html.entities module.
ezio-melotti committed Jun 24, 2012
1 parent b698d8e commit dc44f55
Showing 3 changed files with 2,259 additions and 2 deletions.
21 changes: 19 additions & 2 deletions Doc/library/html.entities.rst
@@ -9,13 +9,25 @@

--------------

-This module defines three dictionaries, ``name2codepoint``, ``codepoint2name``,
-and ``entitydefs``. ``entitydefs`` is used to provide the :attr:`entitydefs`
+This module defines four dictionaries, :data:`html5`,
+:data:`name2codepoint`, :data:`codepoint2name`, and :data:`entitydefs`.
+:data:`entitydefs` is used to provide the :attr:`entitydefs`
attribute of the :class:`html.parser.HTMLParser` class. The definition provided
here contains all the entities defined by XHTML 1.0 that can be handled using
simple textual substitution in the Latin-1 character set (ISO-8859-1).


+.. data:: html5
+
+   A dictionary that maps HTML5 named character references [#]_ to the
+   equivalent Unicode character(s), e.g. ``html5['gt;'] == '>'``.
+   Note that the trailing semicolon is included in the name (e.g. ``'gt;'``),
+   however some of the names are accepted by the standard even without the
+   semicolon: in this case the name is present with and without the ``';'``.
+
+   .. versionadded:: 3.3
+
+
.. data:: entitydefs

   A dictionary mapping XHTML 1.0 entity definitions to their replacement text in
@@ -30,3 +42,8 @@ simple textual substitution in the Latin-1 character set (ISO-8859-1).
.. data:: codepoint2name

   A dictionary that maps Unicode codepoints to HTML entity names.
+
+
+.. rubric:: Footnotes
+
+.. [#] See http://www.w3.org/TR/html5/named-character-references.html
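
The behaviour described for the new ``html5`` dictionary can be checked with a short sketch. It assumes Python 3.3 or later, where this change is available; the ``lookup`` helper is hypothetical and exists only for illustration, it is not part of ``html.entities``.

```python
from html.entities import html5, name2codepoint, codepoint2name


def lookup(name):
    """Hypothetical helper: return the replacement text for a named
    character reference, or None if the name is not in html5."""
    return html5.get(name)


# Keys include the trailing semicolon.
gt_with = lookup('gt;')

# Some names are accepted by the standard without the semicolon,
# so they appear in the dictionary under both spellings.
gt_without = lookup('gt')

# Most names are only present with the semicolon.
rarr_without = lookup('rarr')

# A reference can expand to more than one Unicode character.
multi = lookup('NotEqualTilde;')

print(repr(gt_with), repr(gt_without), rarr_without, len(multi))

# The older dictionaries use names without the semicolon.
print(name2codepoint['gt'], codepoint2name[62])
```

Because ``html5`` values are strings rather than single code points, code that handles the result should not assume a length of one.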