Text shaping (#820)

Co-authored-by: Lucas Cimon <925560+Lucas-C@users.noreply.github.com>
Co-authored-by: Anderson HerzogenrathDaCosta <anderson.costa@cn.ca>
py-pdf · Aug 2, 2023 · b671cb6
1 parent dface79
Showing 71 changed files with 761 additions and 187 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -21,6 +21,7 @@ This can also be enabled programmatically with `warnings.simplefilter('default',
 - [`FPDF.mirror()`](https://pyfpdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.mirror) - New method: [documentation page](https://pyfpdf.github.io/fpdf2/Transformations.html) - Contributed by @sebastiantia
 - [`FPDF.table()`](https://pyfpdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.table): new optional parameters `gutter_height`, `gutter_width` and `wrapmode`. Links can also be added to cells by passing a `link` parameter to [`Row.cell()`](https://pyfpdf.github.io/fpdf2/fpdf/table.html#fpdf.table.Row.cell)
 - [`FPDF.multi_cell()`](https://pyfpdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.multi_cell): has a new optional `center` parameter to position the cell horizontally at the center of the page
+- [`FPDF.set_text_shaping()`](https://pyfpdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.set_text_shaping): new method to perform text shaping using HarfBuzz - [documentation](https://pyfpdf.github.io/fpdf2/TextShaping.html).
 - Added tutorial in Khmer language: [ភាសខ្មែរ](https://pyfpdf.github.io/fpdf2/Tutorial-km.html) - thanks to @kuth-chi
 - Added tutorial in [日本語](https://pyfpdf.github.io/fpdf2/Tutorial-ja.html) - thanks to @alcnaka
 - Better documentation & errors when facing HTML rendering limitations for `<table>` tags: <https://pyfpdf.github.io/fpdf2/HTML.html>

diff --git a/docs/TextShaping.md b/docs/TextShaping.md
@@ -0,0 +1,84 @@
+# Text Shaping #
+
+## What is text shaping? ##
+Text shaping is a fundamental process in typography and computer typesetting that influences the aesthetics and readability of text in various languages and scripts. It involves the transformation of Unicode text into glyphs, which are then positioned for display or print. 
+
+For text in Latin script, text shaping can improve aesthetics by replacing characters that would collide or overlap with a single glyph specially crafted to look harmonious.
+
+![](text-shaping-ligatures.png)
+
+This process is especially important for scripts that require complex layout, such as Arabic or Indic scripts, where characters change shape depending on their context.
+
+There are three primary aspects of text shaping that contribute to the overall appearance of the text: kerning, ligatures, and glyph substitution.
+
+
+### Kerning ###
+Kerning refers to the adjustment of space between individual letter pairs in a font. This process is essential to avoid awkward gaps or overlaps that may occur due to the default spacing of the font. By manually or programmatically modifying the kerning, we can ensure an even and visually pleasing distribution of letters, which significantly improves the readability and aesthetic quality of the text.
+
+![](text-shaping-kerning.png)
+
+
+### Ligatures ###
+Ligatures are special characters created by combining two or more glyphs. They are frequently used to avoid collisions between characters or to follow typographic tradition. For instance, in English typography the most common ligatures are "fi" and "fl", which are often fused into single glyphs to provide a more seamless reading experience.
+
+
+### Glyph Substitution ###
+Glyph substitution is a mechanism that replaces one glyph or a set of glyphs with one or more alternative glyphs. This is a crucial aspect of text shaping, especially for complex scripts where the representation of a character can significantly vary based on its surrounding characters. For example, in Arabic script, a letter can have different forms depending on whether it's at the beginning, middle, or end of a word.
+
+Another common use of glyph substitution is to replace a sequence of characters with a symbol that better represents their meaning in a specialized context (mathematics, programming, etc.).
+
+![](text-shaping-substitution.png)
+
+
+
+
+## Usage ##
+Text shaping is disabled by default to keep backwards compatibility, reduce resource requirements and avoid making uharfbuzz a hard dependency.
+
+If you want to use text shaping, the first step is installing the uharfbuzz package via pip.
+
+```shell
+pip install uharfbuzz
+```
+
+⚠️ Text shaping is *not* available for Type 1 fonts such as the built-in core fonts; it is only applied to fonts loaded with `add_font()`.
+
+### Basic usage ###
+The method `set_text_shaping()` controls text shaping on a document. Its first argument, `use_shaping_engine` (default `True`), enables the shaping mechanism when set to `True` and disables it when set to `False`.
+
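+```python
+from pathlib import Path
+
+from fpdf import FPDF
+
+HERE = Path(__file__).resolve().parent  # folder containing the .ttf file
+
+pdf = FPDF()
+pdf.add_page()
+pdf.add_font(family="ViaodaLibre", fname=HERE / "ViaodaLibre-Regular.ttf")
+pdf.set_font("ViaodaLibre", size=40)
+pdf.set_text_shaping(True)
+pdf.cell(txt="final soft stuff")
+pdf.output("Example.pdf")
+```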
+
+### Features ###
+For most scripts, HarfBuzz enables a default set of OpenType features (such as `kern` and `liga`). To enable or disable a specific feature, pass a dictionary whose keys are four-character OpenType feature tags and whose values are booleans indicating whether each feature should be enabled or disabled.
+
+Example:
+```python
+pdf.set_text_shaping(use_shaping_engine=True, features={"kern": False, "liga": False})
+```
+
+The full list of OpenType feature tags is available in the [OpenType feature tag registry](https://learn.microsoft.com/en-us/typography/opentype/spec/featuretags).
+
+### Additional options ###
+To perform text shaping, HarfBuzz needs some information about the text, such as its language and direction (right-to-left, left-to-right, etc.), in order to apply the correct rules. This information can be guessed from the text being shaped, but you can also set it explicitly to make sure the correct rules are applied.
+
+Examples:
+```python
+pdf.set_text_shaping(use_shaping_engine=True, direction="rtl", script="arab", language="ara")
+```
+```python
+pdf.set_text_shaping(use_shaping_engine=True, direction="ltr", script="latn", language="eng")
+```
+
+Direction can be `ltr` (left to right) or `rtl` (right to left). The `ttb` (top to bottom) and `btt` (bottom to top) directions are not supported by fpdf2 for now.
+
+[Valid OpenType script tags](https://learn.microsoft.com/en-us/typography/opentype/spec/scripttags)
+
+[Valid OpenType language codes](https://learn.microsoft.com/en-us/typography/opentype/spec/languagetags)
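To make the kerning, ligature and feature-toggle ideas from the new TextShaping.md page concrete, here is a deliberately simplified, dependency-free sketch. It is *not* how HarfBuzz or fpdf2 shape text: real shaping works on glyph IDs through the font's GSUB/GPOS tables, whereas this toy uses the Unicode Latin ligature presentation forms (U+FB00 to U+FB04) and a hypothetical hard-coded kerning table. The `features` argument only mimics the shape of the dictionary passed to `set_text_shaping()`; `toy_shape`, `LIGATURES` and `KERNING` are illustrative names.

```python
# Toy illustration only: real shapers (HarfBuzz) work on glyph IDs using the
# font's GSUB (substitution) and GPOS (positioning) tables.

# Latin ligature presentation forms; 3-character sequences are tried first.
LIGATURES = {
    "ffi": "\ufb03",
    "ffl": "\ufb04",
    "ff": "\ufb00",
    "fi": "\ufb01",
    "fl": "\ufb02",
}

# Hypothetical kerning pairs, in font units (1/1000 em); negative = tighter.
KERNING = {("A", "V"): -80, ("V", "A"): -80, ("T", "o"): -60}


def toy_shape(text, features=None):
    """Return a list of (glyph, kerning_adjustment) tuples for `text`.

    `features` mimics set_text_shaping(features=...): {"liga": False} turns
    ligature substitution off, {"kern": False} turns kerning off. Tags that
    are not listed are treated as enabled.
    """
    features = features or {}
    use_liga = features.get("liga", True)
    use_kern = features.get("kern", True)

    # 1. Substitution: greedily replace the longest matching ligature.
    glyphs = []
    i = 0
    while i < len(text):
        if use_liga:
            for length in (3, 2):
                chunk = text[i : i + length]
                if chunk in LIGATURES:
                    glyphs.append(LIGATURES[chunk])
                    i += len(chunk)
                    break
            else:  # no ligature starts at position i
                glyphs.append(text[i])
                i += 1
        else:
            glyphs.append(text[i])
            i += 1

    # 2. Positioning: look up a kerning adjustment for each adjacent pair.
    shaped = []
    for idx, glyph in enumerate(glyphs):
        next_glyph = glyphs[idx + 1] if idx + 1 < len(glyphs) else None
        adjust = KERNING.get((glyph, next_glyph), 0) if use_kern else 0
        shaped.append((glyph, adjust))
    return shaped


print(len(toy_shape("final stuff")))  # 9 glyphs for 11 characters
print(toy_shape("AV"))  # [('A', -80), ('V', 0)]
print(toy_shape("fi", {"liga": False}))  # [('f', 0), ('i', 0)]
```

Disabling `liga` or `kern` in the toy only turns off the corresponding step, which is the same intent as `features={"kern": False, "liga": False}` in the documented API.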
diff --git a/docs/text-shaping-kerning.png b/docs/text-shaping-kerning.png
diff --git a/docs/text-shaping-ligatures.png b/docs/text-shaping-ligatures.png
diff --git a/docs/text-shaping-substitution.png b/docs/text-shaping-substitution.png
diff --git a/fpdf/fonts.py b/fpdf/fonts.py
diff --git a/fpdf/fpdf.py b/fpdf/fpdf.py
@@ -86,10 +86,7 @@ class Image:
 from .svg import Percent, SVGObject
 from .syntax import DestinationXYZ, PDFDate
 from .table import Table
-from .util import (
-    escape_parens,
-    get_scale_factor,
-)
+from .util import get_scale_factor
 
 # Public global variables:
 FPDF_VERSION = "2.7.4"
@@ -572,6 +569,76 @@ def set_display_mode(self, zoom, layout="continuous"):
             raise FPDFException(f"Incorrect zoom display mode: {zoom}")
         self.page_layout = LAYOUT_ALIASES.get(layout, layout)
 
+    # Disabling these checks: uharfbuzz is imported inside the method only to check that it is installed
+    # pylint: disable=import-outside-toplevel, unused-import
+    def set_text_shaping(
+        self,
+        use_shaping_engine: bool = True,
+        features: dict = None,
+        direction: str = None,
+        script: str = None,
+        language: str = None,
+    ):
+        """
+        Enable or disable text shaping engine when rendering text.
+        If direction, script or language are not specified, the shaping engine will try
+        to guess them based on the input text. If features are not specified, the engine's
+        default features are used.
+
+        Args:
+            use_shaping_engine: enable or disable the use of the shaping engine to process the text
+            features: a dictionary mapping 4-character OpenType feature tags to a boolean
+                stating whether each feature should be enabled or disabled
+                example: features={"kern": False, "liga": False}
+            direction: the direction the text should be rendered, either "ltr" (left to right)
+                or "rtl" (right to left).
+            script: a valid OpenType script tag like "arab" or "latn"
+            language: a valid OpenType language tag like "eng" or "fra"
+        """
+        if use_shaping_engine:
+            try:
+                import uharfbuzz
+            except ImportError as exc:
+                raise FPDFException(
+                    "The uharfbuzz package could not be imported, but is required for text shaping. Try: pip install uharfbuzz"
+                ) from exc
+        else:
+            self._text_shaping = None
+            return
+        #
+        # Features must be a dictionary containing OpenType feature tags and a boolean flag
+        # stating whether the feature should be enabled or disabled.
+        #
+        # e.g. features={"liga": True, "kern": False}
+        #
+        # https://harfbuzz.github.io/shaping-opentype-features.html
+        #
+
+        if features and not isinstance(features, dict):
+            raise FPDFException(
+                "Features must be a dictionary. See text shaping documentation"
+            )
+        if not features:
+            features = {}
+
+        # Buffer properties (direction, script and language)
+        # if the properties are not provided, Harfbuzz "guessing" logic is used.
+        # https://harfbuzz.github.io/setting-buffer-properties.html
+        # Valid HarfBuzz directions are ltr (left to right), rtl (right to left),
+        # ttb (top to bottom) or btt (bottom to top)
+
+        if direction and direction not in ("ltr", "rtl"):
+            raise FPDFException(
+                "fpdf2 only accepts ltr (left to right) or rtl (right to left) directions for now."
+            )
+
+        self._text_shaping = {
+            "use_shaping_engine": True,
+            "features": features,
+            "direction": direction,
+            "script": script,
+            "language": language,
+        }
+
     @property
     def page_layout(self):
         return self._page_layout
@@ -2316,20 +2383,10 @@ def text(self, x, y, txt=""):
         if not self.font_family:
             raise FPDFException("No font set, you need to call set_font() beforehand")
         txt = self.normalize_text(txt)
-        if self.is_ttf_font:
-            txt_mapped = ""
-            for char in txt:
-                uni = ord(char)
-                # Instead of adding the actual character to the stream its code is
-                # mapped to a position in the font's subset
-                txt_mapped += chr(self.current_font.subset.pick(uni))
-            txt2 = escape_parens(txt_mapped.encode("utf-16-be").decode("latin-1"))
-        else:
-            txt2 = escape_parens(txt)
         sl = [f"BT {x * self.k:.2f} {(self.h - y) * self.k:.2f} Td"]
         if self.text_mode != TextMode.FILL:
             sl.append(f" {self.text_mode} Tr {self.line_width:.2f} w")
-        sl.append(f"({txt2}) Tj ET")
+        sl.append(f"{self.current_font.encode_text(txt)} ET")
         if (self.underline and txt != "") or self._record_text_quad_points:
             w = self.get_string_width(txt, normalized=True, markdown=False)
             if self.underline and txt != "":
@@ -2870,8 +2927,6 @@ def _render_styled_text_line(
             if self.fill_color != self.text_color:
                 sl.append(self.text_color.serialize().lower())
 
-            # do this once in advance
-            u_space = escape_parens(" ".encode("utf-16-be").decode("latin-1"))
             word_spacing = 0
             if text_line.justify:
                 # Don't rely on align==Align.J here.
@@ -2913,48 +2968,28 @@ def _render_styled_text_line(
                     current_text_mode = frag.text_mode
                     sl.append(f"{frag.text_mode} Tr {frag.line_width:.2f} w")
 
-                if frag.is_ttf_font:
-                    mapped_text = ""
-                    for char in frag.string:
-                        uni = ord(char)
-                        mapped_text += chr(frag.font.subset.pick(uni))
-                    if word_spacing:
-                        # "Tw" only has an effect on the ASCII space character and ignores
-                        # space characters from unicode (TTF) fonts. As a workaround,
-                        # we do word spacing using an adjustment before each space.
-                        # Determine the index of the space character (" ") in the current
-                        # subset and split words whenever this mapping code is found
-                        words = mapped_text.split(chr(frag.font.subset.pick(ord(" "))))
-                        words_strl = []
-                        for word_i, word in enumerate(words):
-                            # pylint: disable=redefined-loop-name
-                            word = escape_parens(
-                                word.encode("utf-16-be").decode("latin-1")
-                            )
-                            if word_i == 0:
-                                words_strl.append(f"({word})")
-                            else:
-                                adj = -(frag_ws * frag.k) * 1000 / frag.font_size_pt
-                                words_strl.append(f"{adj:.3f}({u_space}{word})")
-                        escaped_text = " ".join(words_strl)
-                        sl.append(f"[{escaped_text}] TJ")
-                    else:
-                        escaped_text = escape_parens(
-                            mapped_text.encode("utf-16-be").decode("latin-1")
-                        )
-                        sl.append(f"({escaped_text}) Tj")
-                else:  # core fonts
-                    if frag_ws != current_ws:
-                        sl.append(f"{frag_ws * frag.k:.3f} Tw")
-                        current_ws = frag_ws
-                    escaped_text = escape_parens(frag.string)
-                    sl.append(f"({escaped_text}) Tj")
+                r_text = frag.render_pdf_text(
+                    frag_ws,
+                    current_ws,
+                    word_spacing,
+                    self.x + dx + s_width,
+                    self.y + (0.5 * h + 0.3 * max_font_size),
+                    self.h,
+                )
+                if r_text:
+                    sl.append(r_text)
+
                 frag_width = frag.get_width(
                     initial_cs=i != 0
                 ) + word_spacing * frag.characters.count(" ")
                 if frag.underline:
                     underlines.append(
-                        (self.x + dx + s_width, frag_width, frag.font, frag.font_size)
+                        (
+                            self.x + dx + s_width,
+                            frag_width,
+                            frag.font,
+                            frag.font_size,
+                        )
                     )
                 if frag.link:
                     self.link(
@@ -2964,6 +2999,8 @@ def _render_styled_text_line(
                         h=frag.font_size,
                         link=frag.link,
                     )
+                if not frag.is_ttf_font:
+                    current_ws = frag_ws
                 s_width += frag_width
 
             sl.append("ET")

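The option handling in `set_text_shaping()` boils down to a small normalization step: disabling stores `None`, `features` must be a dict (an empty one when omitted), and `direction` is restricted to `ltr`/`rtl`. The standalone sketch below mirrors those checks so they can be exercised without fpdf2 or uharfbuzz installed. The function name `build_text_shaping_config` is hypothetical; it raises `ValueError` where fpdf2 raises `FPDFException`, and it deliberately skips the uharfbuzz import check.

```python
def build_text_shaping_config(
    use_shaping_engine=True, features=None, direction=None, script=None, language=None
):
    """Return the settings dict stored by set_text_shaping(), or None when disabled.

    Hypothetical standalone mirror of the checks in FPDF.set_text_shaping();
    raises ValueError instead of fpdf2's FPDFException.
    """
    if not use_shaping_engine:
        return None
    if features and not isinstance(features, dict):
        raise ValueError("Features must be a dictionary, e.g. {'kern': False}")
    if not features:
        features = {}
    # HarfBuzz also knows "ttb" and "btt", but fpdf2 only renders horizontal text.
    if direction and direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    return {
        "use_shaping_engine": True,
        "features": features,
        "direction": direction,
        "script": script,  # None lets HarfBuzz guess
        "language": language,  # None lets HarfBuzz guess
    }


print(build_text_shaping_config(False))  # None
print(build_text_shaping_config(features={"liga": False})["features"])  # {'liga': False}
```

Returning a fresh dictionary on every call, as `set_text_shaping()` does, keeps each call's settings independent of earlier ones.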
diff --git a/fpdf/graphics_state.py b/fpdf/graphics_state.py
@@ -44,6 +44,7 @@ def __init__(self, *args, **kwargs):
                 sup_lift=0.4,
                 nom_lift=0.2,
                 denom_lift=0.0,
+                _text_shaping=None,
             ),
         ]
         super().__init__(*args, **kwargs)
@@ -313,6 +314,14 @@ def denom_lift(self, v):
         """
         self.__statestack[-1]["denom_lift"] = float(v)
 
+    @property
+    def _text_shaping(self):
+        return self.__statestack[-1]["_text_shaping"]
+
+    @_text_shaping.setter
+    def _text_shaping(self, v):
+        self.__statestack[-1]["_text_shaping"] = v
+
     def font_face(self):
         """
         Return a `fpdf.fonts.FontFace` instance

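Storing `_text_shaping` in the graphics-state stack means it is scoped like the other properties of `GraphicsStateMixin`. Assuming a new level is pushed by copying the current top entry (as the mixin does when saving state, for example around `FPDF.local_context()`), a setting changed at a nested level is discarded when that level is popped. The class below is a hypothetical, simplified model of that behaviour, not fpdf2's actual implementation; `ToyGraphicsState`, `push` and `pop` are illustrative names.

```python
class ToyGraphicsState:
    """Hypothetical, simplified model of a graphics-state stack."""

    def __init__(self):
        # Bottom level holds the defaults (text shaping disabled).
        self._stack = [{"_text_shaping": None}]

    @property
    def text_shaping(self):
        # Reads always come from the top of the stack.
        return self._stack[-1]["_text_shaping"]

    @text_shaping.setter
    def text_shaping(self, value):
        # Writes only affect the current (top) level.
        self._stack[-1]["_text_shaping"] = value

    def push(self):
        # A new level starts as a shallow copy of the current one.
        self._stack.append(self._stack[-1].copy())

    def pop(self):
        # Dropping the top level restores the previous settings.
        self._stack.pop()


state = ToyGraphicsState()
state.push()
state.text_shaping = {"use_shaping_engine": True, "features": {}}
print(state.text_shaping is not None)  # True
state.pop()
print(state.text_shaping)  # None
```

Because the copy is shallow, replacing the whole value (which is what `set_text_shaping()` does) stays scoped, whereas mutating the stored dictionary in place would also change it at the outer level.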
diff --git a/fpdf/line_break.py b/fpdf/line_break.py
@@ -10,6 +10,8 @@
 
 from .enums import CharVPos, WrapMode
 from .errors import FPDFException
+from .fonts import CoreFont, TTFFont
+from .util import escape_parens
 
 SOFT_HYPHEN = "\u00ad"
 HYPHEN = "\u002d"
@@ -45,7 +47,7 @@ def __repr__(self):
         )
 
     @property
-    def font(self):
+    def font(self) -> Union[CoreFont, TTFFont]:
         return self.graphics_state["current_font"]
 
     @font.setter
@@ -133,6 +135,10 @@ def lift(self):
             lift = 0.0
         return lift * self.graphics_state["font_size_pt"]
 
+    @property
+    def _text_shaping(self):
+        return self.graphics_state["_text_shaping"]
+
     @property
     def string(self):
         return "".join(self.characters)
@@ -170,22 +176,20 @@ def get_width(
 
         if chars is None:
             chars = self.characters[start:end]
-        if self.is_ttf_font:
-            w = sum(self.font.cw[ord(c)] for c in chars)
-        else:
-            w = sum(self.font.cw[c] for c in chars)
+        (char_len, w) = self.font.get_text_width(
+            chars, self.font_size_pt, self._text_shaping
+        )
         char_spacing = self.char_spacing
         if self.font_stretching != 100:
             w *= self.font_stretching * 0.01
             char_spacing *= self.font_stretching * 0.01
-        w *= self.font_size_pt * 0.001
         if self.char_spacing != 0:
             # initial_cs must be False if the fragment is located at the
             # beginning of a text object, because the first char won't get spaced.
             if initial_cs:
-                w += char_spacing * len(chars)
+                w += char_spacing * char_len
             else:
-                w += char_spacing * (len(chars) - 1)
+                w += char_spacing * (char_len - 1)
         return w / self.k
 
     def get_character_width(self, character: str, print_sh=False, initial_cs=True):
@@ -197,6 +201,114 @@ def get_character_width(self, character: str, print_sh=False, initial_cs=True):
             character = HYPHEN
         return self.get_width(chars=character, initial_cs=initial_cs)
 
+    def render_pdf_text(self, frag_ws, current_ws, word_spacing, adjust_x, adjust_y, h):
+        if self.is_ttf_font:
+            if self._text_shaping:
+                return self.render_with_text_shaping(
+                    adjust_x, adjust_y, h, word_spacing, self._text_shaping
+                )
+            return self.render_pdf_text_ttf(frag_ws, word_spacing)
+        return self.render_pdf_text_core(frag_ws, current_ws)
+
+    def render_pdf_text_ttf(self, frag_ws, word_spacing):
+        ret = ""
+        mapped_text = ""
+        for char in self.string:
+            mapped_char = self.font.subset.pick(ord(char))
+            if mapped_char:
+                mapped_text += chr(mapped_char)
+        if word_spacing:
+            # do this once in advance
+            u_space = escape_parens(" ".encode("utf-16-be").decode("latin-1"))
+
+            # According to the PDF reference, word spacing shall be applied to every
+            # occurrence of the single-byte character code 32 in a string when using
+            # a simple font or a composite font that defines code 32 as a single-byte code.
+            # It shall not apply to occurrences of the byte value 32 in multiple-byte codes.
+            # FPDF uses 2 bytes per character (UTF-16-BE encoding) so the "Tw" operator doesn't work
+            # As a workaround, we do word spacing using an adjustment before each space.
+            # Determine the index of the space character (" ") in the current
+            # subset and split words whenever this mapping code is found
+            #
+            words = mapped_text.split(chr(self.font.subset.pick(ord(" "))))
+            words_strl = []
+            for word_i, word in enumerate(words):
+                # pylint: disable=redefined-loop-name
+                word = escape_parens(word.encode("utf-16-be").decode("latin-1"))
+                if word_i == 0:
+                    words_strl.append(f"({word})")
+                else:
+                    adj = -(frag_ws * self.k) * 1000 / self.font_size_pt
+                    words_strl.append(f"{adj:.3f}({u_space}{word})")
+            escaped_text = " ".join(words_strl)
+            ret += f"[{escaped_text}] TJ"
+        else:
+            escaped_text = escape_parens(
+                mapped_text.encode("utf-16-be").decode("latin-1")
+            )
+            ret += f"({escaped_text}) Tj"
+        return ret
+
+    def render_with_text_shaping(
+        self, pos_x, pos_y, h, word_spacing, text_shaping_parms
+    ):
+        ret = ""
+        text = ""
+        space_mapped_code = self.font.subset.pick(ord(" "))
+
+        def adjust_pos(pos):
+            return (
+                pos
+                * self.font.scale
+                * self.font_size_pt
+                * (self.font_stretching / 100)
+                / 1000
+                / self.k
+            )
+
+        char_spacing = self.char_spacing * (self.font_stretching / 100) / self.k
+        for ti in self.font.shape_text(
+            self.string, self.font_size_pt, text_shaping_parms
+        ):
+            if ti["mapped_char"] is None:  # Missing glyph
+                continue
+            char = chr(ti["mapped_char"]).encode("utf-16-be").decode("latin-1")
+            if ti["x_offset"] != 0 or ti["y_offset"] != 0:
+                if text:
+                    ret += f"({text}) Tj "
+                    text = ""
+                offsetx = pos_x + adjust_pos(ti["x_offset"])
+                offsety = pos_y - adjust_pos(ti["y_offset"])
+                ret += (
+                    f"1 0 0 1 {(offsetx) * self.k:.2f} {(h - offsety) * self.k:.2f} Tm "
+                )
+            text += char
+            pos_x += adjust_pos(ti["x_advance"]) + char_spacing
+            pos_y += adjust_pos(ti["y_advance"])
+            if word_spacing and ti["mapped_char"] == space_mapped_code:
+                pos_x += word_spacing
+
+            # A plain advance along x needs no repositioning: the text matrix is only reset
+            # when "force_positioning" is set or extra word spacing follows a space
+            if ti["force_positioning"] or (
+                word_spacing and ti["mapped_char"] == space_mapped_code
+            ):
+                if text:
+                    ret += f"({text}) Tj "
+                    text = ""
+                ret += f"1 0 0 1 {(pos_x) * self.k:.2f} {(h - pos_y) * self.k:.2f} Tm "
+
+        if text:
+            ret += f"({text}) Tj"
+        return ret
+
+    def render_pdf_text_core(self, frag_ws, current_ws):
+        ret = ""
+        if frag_ws != current_ws:
+            ret += f"{frag_ws * self.k:.3f} Tw "
+        escaped_text = escape_parens(self.string)
+        ret += f"({escaped_text}) Tj"
+        return ret
+
 
 class TextLine(NamedTuple):
     fragments: tuple

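`render_pdf_text_ttf()` cannot rely on the `Tw` operator, because PDF word spacing only applies to the single-byte code 32, while fpdf2 writes TrueType text as 2-byte codes. It therefore splits the mapped text at the subset code of the space character and places a negative `TJ` adjustment (expressed in thousandths of a text-space unit) before each space. The sketch below reproduces that idea without fpdf2. `tj_with_word_spacing` and the local `escape_parens` stand-in are hypothetical names, glyph codes are plain integers, and the extra spacing is passed directly in points (fpdf2 computes it as `frag_ws * k`).

```python
def escape_parens(s):
    """Local stand-in: escape characters that are special in PDF literal strings."""
    return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def tj_with_word_spacing(codes, space_code, word_spacing_pt, font_size_pt):
    """Build a TJ operator for 2-byte glyph codes with extra space at each space.

    codes: subset glyph codes (ints below 0xD800, so each encodes to 2 bytes)
    space_code: the subset code that the font assigned to U+0020
    word_spacing_pt: extra width to add at every space, in points
    """

    def literal(chunk):
        # Each code becomes 2 bytes (UTF-16-BE); latin-1 maps bytes 1:1 to chars.
        raw = "".join(chr(c) for c in chunk).encode("utf-16-be").decode("latin-1")
        return f"({escape_parens(raw)})"

    # TJ numbers are thousandths of text space; negative values move right.
    adj = -word_spacing_pt * 1000 / font_size_pt

    # Split the code sequence into words at every space code.
    words, current = [], []
    for code in codes:
        if code == space_code:
            words.append(current)
            current = []
        else:
            current.append(code)
    words.append(current)

    # First word as-is; every later word gets the adjustment, then the space.
    parts = [literal(words[0])]
    for word in words[1:]:
        parts.append(f"{adj:.3f}{literal([space_code] + word)}")
    return f"[{' '.join(parts)}] TJ"


# Codes 1 and 2 stand for two glyphs, 3 for the space; 2pt extra at 10pt size.
print("-200.000(" in tj_with_word_spacing([1, 3, 2], 3, 2, 10))  # True
```

Dividing by the font size is what makes the adjustment font-size independent: the viewer multiplies the `TJ` number by the font size again, so the net horizontal shift is the requested spacing.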
diff --git a/fpdf/output.py b/fpdf/output.py
@@ -22,7 +22,6 @@
 from .syntax import create_list_string as pdf_list
 from .syntax import iobj_ref as pdf_ref
 
-from fontTools import ttLib
 from fontTools import subset as ftsubset
 
 try:
@@ -537,35 +536,14 @@ def _add_fonts(self):
             elif font.type == "TTF":
                 fontname = f"MPDFAA+{font.name}"
 
-                # unicode_char -> new_code_char map for chars embedded in the PDF
-                uni_to_new_code_char = font.subset.dict()
-
-                # why we delete 0-element?
-                del uni_to_new_code_char[0]
-
-                # ---- FONTTOOLS SUBSETTER ----
-                # recalcTimestamp=False means that it doesn't modify the "modified" timestamp in head table
-                # if we leave recalcTimestamp=True the tests will break every time
-                fonttools_font = ttLib.TTFont(
-                    file=font.ttffile, recalcTimestamp=False, fontNumber=0, lazy=True
-                )
-
                 # 1. get all glyphs in PDF
-                cmap = fonttools_font["cmap"].getBestCmap()
-                glyph_names = [
-                    cmap[unicode] for unicode in uni_to_new_code_char if unicode in cmap
-                ]
+                glyph_names = font.subset.get_all_glyph_names()
 
-                missing_glyphs = [
-                    chr(unicode)
-                    for unicode in uni_to_new_code_char
-                    if unicode not in cmap
-                ]
-                if len(missing_glyphs) > 0:
+                if len(font.missing_glyphs) > 0:
                     LOGGER.warning(
                         "Font %s is missing the following glyphs: %s",
                         fontname,
-                        ", ".join(missing_glyphs),
+                        ", ".join(chr(x) for x in font.missing_glyphs),
                     )
 
                 # 2. make a subset
@@ -586,26 +564,23 @@ def _add_fonts(self):
                 ]
                 subsetter = ftsubset.Subsetter(options)
                 subsetter.populate(glyphs=glyph_names)
-                subsetter.subset(fonttools_font)
+                subsetter.subset(font.ttfont)
 
                 # 3. make codeToGlyph
                 # is a map Character_ID -> Glyph_ID
                 # it's used for associating glyphs to new codes
                 # this basically takes the old code of the character
                 # take the glyph associated with it
                 # and then associate to the new code the glyph associated with the old code
-                code_to_glyph = {}
-                for code, new_code_mapped in uni_to_new_code_char.items():
-                    # notdef is associated if no glyph was associated to the old code
-                    # it's not necessary to do this, it seems to be done by default
-                    glyph_name = cmap.get(code, ".notdef")
-                    code_to_glyph[new_code_mapped] = fonttools_font.getGlyphID(
-                        glyph_name
-                    )
+
+                code_to_glyph = {
+                    font.subset._map[glyph]: font.ttfont.getGlyphID(glyph.glyph_name)
+                    for glyph in font.subset._map.keys()
+                }
 
                 # 4. return the ttfile
                 output = BytesIO()
-                fonttools_font.save(output)
+                font.ttfont.save(output)
 
                 output.seek(0)
                 ttfontstream = output.read()
@@ -624,7 +599,7 @@ def _add_fonts(self):
                     subtype="CIDFontType2",
                     base_font=fontname,
                     d_w=font.desc.missing_width,
-                    w=_tt_font_widths(font, max(uni_to_new_code_char)),
+                    w=_tt_font_widths(font),
                 )
                 self._add_pdf_obj(cid_font_obj, "fonts")
                 composite_font_obj.descendant_fonts = PDFArray([cid_font_obj])
@@ -634,17 +609,21 @@ def _add_fonts(self):
                 # character that each used 16-bit code belongs to. It
                 # allows searching the file and copying text from it.
                 bfChar = []
-                uni_to_new_code_char = font.subset.dict()
-                for code, code_mapped in uni_to_new_code_char.items():
-                    if code > 0xFFFF:
+
+                def format_code(unicode):
+                    if unicode > 0xFFFF:
                         # Calculate surrogate pair
-                        code_high = 0xD800 | (code - 0x10000) >> 10
-                        code_low = 0xDC00 | (code & 0x3FF)
-                        bfChar.append(
-                            f"<{code_mapped:04X}> <{code_high:04X}{code_low:04X}>\n"
-                        )
-                    else:
-                        bfChar.append(f"<{code_mapped:04X}> <{code:04X}>\n")
+                        code_high = 0xD800 | (unicode - 0x10000) >> 10
+                        code_low = 0xDC00 | (unicode & 0x3FF)
+                        return f"{code_high:04X}{code_low:04X}"
+                    return f"{unicode:04X}"
+
+                for glyph, code_mapped in font.subset._map.items():
+                    if len(glyph.unicode) == 0:
+                        continue
+                    bfChar.append(
+                        f'<{code_mapped:04X}> <{"".join(format_code(code) for code in glyph.unicode)}>\n'
+                    )
 
                 to_unicode_obj = PDFContentStream(
                     "/CIDInit /ProcSet findresource begin\n"
@@ -699,6 +678,8 @@ def _add_fonts(self):
                 self._add_pdf_obj(font_file_cs_obj, "fonts")
                 font_descriptor_obj.font_file2 = font_file_cs_obj
 
+                font.close()
+
         return font_objs_per_index
 
     def _add_images(self):
@@ -957,48 +938,44 @@ def _log_final_sections_sizes(self):
             LOGGER.debug("- %s: %s", label, _sizeof_fmt(section_size))
 
 
-def _tt_font_widths(font, maxUni):
+def _tt_font_widths(font):
     rangeid = 0
     range_ = {}
     range_interval = {}
     prevcid = -2
     prevwidth = -1
     interval = False
-    startcid = 1
-    cwlen = maxUni + 1
-
-    # for each character
-    subset = font.subset.dict()
-    for cid in range(startcid, cwlen):
-        char_width = font.cw[cid]
-        cid_mapped = subset.get(cid)
-        if cid_mapped is None:
-            continue
+
+    # Glyphs sorted by mapped character id
+    glyphs = dict(sorted(font.subset._map.items(), key=lambda item: item[1]))
+
+    for glyph in glyphs:
+        cid_mapped = glyphs[glyph]
         if cid_mapped == (prevcid + 1):
-            if char_width == prevwidth:
-                if char_width == range_[rangeid][0]:
-                    range_.setdefault(rangeid, []).append(char_width)
+            if glyph.glyph_width == prevwidth:
+                if glyph.glyph_width == range_[rangeid][0]:
+                    range_.setdefault(rangeid, []).append(glyph.glyph_width)
                 else:
                     range_[rangeid].pop()
                     # new range
                     rangeid = prevcid
-                    range_[rangeid] = [prevwidth, char_width]
+                    range_[rangeid] = [prevwidth, glyph.glyph_width]
                 interval = True
                 range_interval[rangeid] = True
             else:
                 if interval:
                     # new range
                     rangeid = cid_mapped
-                    range_[rangeid] = [char_width]
+                    range_[rangeid] = [glyph.glyph_width]
                 else:
-                    range_[rangeid].append(char_width)
+                    range_[rangeid].append(glyph.glyph_width)
                 interval = False
         else:
             rangeid = cid_mapped
-            range_[rangeid] = [char_width]
+            range_[rangeid] = [glyph.glyph_width]
             interval = False
         prevcid = cid_mapped
-        prevwidth = char_width
+        prevwidth = glyph.glyph_width
     prevk = -1
     nextk = -1
     prevint = False

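The rewritten ToUnicode loop in `_add_fonts()` now emits one `bfchar` entry per subset glyph, and each destination may hold several code points (for instance a shaped ligature glyph standing for "fi"). Code points above U+FFFF are written as UTF-16 surrogate pairs. The sketch below reproduces that formatting as standalone functions; `bfchar_entry` is a hypothetical name, and a glyph's code points are passed as a plain tuple instead of fpdf2's glyph objects.

```python
def format_code(unicode):
    """Hex-encode one code point as UTF-16, using a surrogate pair above U+FFFF."""
    if unicode > 0xFFFF:
        high = 0xD800 | (unicode - 0x10000) >> 10  # ">>" binds tighter than "|"
        low = 0xDC00 | (unicode & 0x3FF)
        return f"{high:04X}{low:04X}"
    return f"{unicode:04X}"


def bfchar_entry(code_mapped, unicodes):
    """Return one ToUnicode bfchar line, or None for glyphs without code points."""
    if len(unicodes) == 0:
        return None  # skipped, as in the loop above
    destination = "".join(format_code(code) for code in unicodes)
    return f"<{code_mapped:04X}> <{destination}>\n"


print(bfchar_entry(5, (0x66, 0x69)), end="")  # <0005> <00660069>
print(format_code(0x1F600))  # D83DDE00
```

Mapping one glyph code to a multi-character destination is what lets text extraction and copy-paste recover the original characters from a glyph that replaced several of them.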
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -129,6 +129,7 @@ nav:
   - 'Page breaks':                    'PageBreaks.md'
   - 'Text styling':                   'TextStyling.md'
   - 'Unicode':                        'Unicode.md'
+  - 'Text Shaping':                   'TextShaping.md'
   - 'Emojis, Symbols & Dingbats':     'EmojisSymbolsDingbats.md'
   - 'HTML':                           'HTML.md'
 - 'Graphics Content':

diff --git a/scripts/verapdf-ignore.json b/scripts/verapdf-ignore.json
@@ -7,6 +7,10 @@
         "6.2.3-2": "REASON: fpdf2 does not currently support PDF/A",
         "6.2.3-3": "REASON: fpdf2 does not currently support PDF/A",
         "6.2.3-4": "REASON: fpdf2 does not currently support PDF/A",
+        "6.2.3.2-1": "REASON: fpdf2 does not currently support PDF/A",
+        "6.2.3.3-1": "REASON: fpdf2 does not currently support PDF/A",
+        "6.2.3.3-2": "REASON: fpdf2 does not currently support PDF/A",
+        "6.2.3.3-3": "REASON: fpdf2 does not currently support PDF/A",
         "6.3.4-1": "REASON: fpdf2 still allows using the PostScript standard 14 fonts. Quoting PDF 1.7 spec from 2006: Beginning with PDF 1.5, the special treatment given to the standard 14 fonts is deprecated. All fonts used in a PDF document should be represented using a com- plete font descriptor. For backwards capability, viewer applications must still provide the special treatment identified for the standard 14 fonts.",
         "6.3.5-3": "FIXME: corresponding GitHub issue -> https://github.com/PyFPDF/fpdf2/issues/88",
         "6.4-1": "REASON: enabled by default, can be disabled by setting pdf.allow_images_transparency = False",
@@ -20,9 +24,15 @@
         "6.6.1-1": "REASON: fpdf2 allows to create Launch actions that VeraPDF forbid arbitrarily",
         "6.7.2-1": "REASON: setting XML/XMP metadata is entirely optional with fpdf2",
         "6.7.3-1": "REASON: setting XML/XMP metadata is entirely optional with fpdf2",
+        "6.7.3-2": "REASON: setting XML/XMP metadata is entirely optional with fpdf2",
+        "6.7.3-3": "REASON: setting XML/XMP metadata is entirely optional with fpdf2",
+        "6.7.3-4": "REASON: setting XML/XMP metadata is entirely optional with fpdf2",
+        "6.7.3-5": "REASON: setting XML/XMP metadata is entirely optional with fpdf2",
+        "6.7.3-6": "REASON: setting XML/XMP metadata is entirely optional with fpdf2",
+        "6.7.3-7": "REASON: setting XML/XMP metadata is entirely optional with fpdf2",
         "6.7.11-1": "REASON: up to fpdf2 v2.3.2, test_xmp_metadata included the PDF/A version and conformance level of the file, but then it started to break PDF Checker does-not-conform-to-claimed-pdfa-type rule. PENDING proper support for PDF/A",
         "6.9-2": "REASON: false positive on test/signing/sign_pkcs12.pdf",
         "6.1.11-1": "REASON: /EF is allowed in order for fpdf2 to be able to embed files",
         "6.1.11-2": "REASON: /EmbeddedFiles is allowed in order for fpdf2 to be able to embed files"
     }
-}
+}
diff --git a/test/embed_file_all_optionals.pdf b/test/embed_file_all_optionals.pdf
diff --git a/test/embed_file_self.pdf b/test/embed_file_self.pdf
diff --git a/test/encryption/encrypt_fonts.pdf b/test/encryption/encrypt_fonts.pdf
diff --git a/test/file_attachment_annotation.pdf b/test/file_attachment_annotation.pdf
diff --git a/test/fonts/add_font_unicode.pdf b/test/fonts/add_font_unicode.pdf
diff --git a/test/fonts/charmap_first_999_chars-DejaVuSans-Oblique.pdf b/test/fonts/charmap_first_999_chars-DejaVuSans-Oblique.pdf
diff --git a/test/fonts/charmap_first_999_chars-DejaVuSans.pdf b/test/fonts/charmap_first_999_chars-DejaVuSans.pdf
diff --git a/test/fonts/charmap_first_999_chars-DejaVuSansMono.pdf b/test/fonts/charmap_first_999_chars-DejaVuSansMono.pdf
diff --git a/test/fonts/charmap_first_999_chars-DroidSansFallback.pdf b/test/fonts/charmap_first_999_chars-DroidSansFallback.pdf
diff --git a/test/fonts/charmap_first_999_chars-Quicksand-Regular.pdf b/test/fonts/charmap_first_999_chars-Quicksand-Regular.pdf
diff --git a/test/fonts/charmap_first_999_chars-Roboto-Regular.pdf b/test/fonts/charmap_first_999_chars-Roboto-Regular.pdf
diff --git a/test/fonts/charmap_first_999_chars-TwitterEmoji.pdf b/test/fonts/charmap_first_999_chars-TwitterEmoji.pdf
diff --git a/test/fonts/charmap_first_999_chars-Waree.pdf b/test/fonts/charmap_first_999_chars-Waree.pdf
diff --git a/test/fonts/charmap_first_999_chars-cmss12.pdf b/test/fonts/charmap_first_999_chars-cmss12.pdf
diff --git a/test/fonts/fallback_font.pdf b/test/fonts/fallback_font.pdf
diff --git a/test/fonts/fallback_font_ignore_style.pdf b/test/fonts/fallback_font_ignore_style.pdf
diff --git a/test/fonts/fallback_font_with_overriden_get_fallback_font.pdf b/test/fonts/fallback_font_with_overriden_get_fallback_font.pdf
diff --git a/test/fonts/fonts_emoji_glyph.pdf b/test/fonts/fonts_emoji_glyph.pdf
diff --git a/test/fonts/fonts_otf.pdf b/test/fonts/fonts_otf.pdf
diff --git a/test/fonts/fonts_remap_nb.pdf b/test/fonts/fonts_remap_nb.pdf
diff --git a/test/fonts/fonts_two_mappings.pdf b/test/fonts/fonts_two_mappings.pdf
diff --git a/test/fonts/render_en_dash.pdf b/test/fonts/render_en_dash.pdf
diff --git a/test/fonts/test_font_remap.py b/test/fonts/test_font_remap.py
@@ -1,38 +1,12 @@
 from pathlib import Path
 
 from fpdf import FPDF
-from fpdf.fonts import SubsetMap
+
 from test.conftest import assert_pdf_equal
 
 HERE = Path(__file__).resolve().parent
 
 
-def test_subset_map():
-    subset_map = SubsetMap(range(0, 1024, 2))
-    assert len(subset_map.dict()) == 512
-
-    for i in range(0, 1024, 2):
-        assert i % 2 == 0
-        assert i == subset_map.pick(i)
-
-    for i in range(1023, 512, -2):
-        assert subset_map.pick(i) % 2 == 1
-    assert len(subset_map.dict()) == 512 + 256
-
-    for i in range(1, 1000, 2):
-        assert subset_map.pick(i) % 2 == 1
-
-    assert len(subset_map.dict()) == 1024
-
-    subset_dict = subset_map.dict()
-    for i in subset_dict:
-        for j in subset_dict:
-            if i != j:
-                assert subset_dict[i] != subset_dict[j]
-            else:
-                assert subset_dict[i] == subset_dict[i]
-
-
 def test_emoji_glyph(tmp_path):
     pdf = FPDF()
 

diff --git a/test/fonts/thai_text.pdf b/test/fonts/thai_text.pdf
diff --git a/test/html/html_custom_pre_code_font.pdf b/test/html/html_custom_pre_code_font.pdf
diff --git a/test/html/html_heading_hebrew.pdf b/test/html/html_heading_hebrew.pdf
diff --git a/test/html/issue_156.pdf b/test/html/issue_156.pdf
diff --git a/test/outline/russian_heading.pdf b/test/outline/russian_heading.pdf
diff --git a/test/requirements.txt b/test/requirements.txt
@@ -12,3 +12,4 @@ pytest-cov
 qrcode
 semgrep
 tabula-py
+uharfbuzz
diff --git a/test/table/table_with_ttf_font.pdf b/test/table/table_with_ttf_font.pdf
diff --git a/test/table/table_with_ttf_font_and_headings.pdf b/test/table/table_with_ttf_font_and_headings.pdf
diff --git a/test/text/cell_curfont_leak.pdf b/test/text/cell_curfont_leak.pdf
diff --git a/test/text/cell_markdown_right_aligned.pdf b/test/text/cell_markdown_right_aligned.pdf
diff --git a/test/text/cell_markdown_with_ttf_fonts.pdf b/test/text/cell_markdown_with_ttf_fonts.pdf
diff --git a/test/text/multi_cell_char_spacing.pdf b/test/text/multi_cell_char_spacing.pdf
diff --git a/test/text/multi_cell_font_leakage.pdf b/test/text/multi_cell_font_leakage.pdf
diff --git a/test/text/multi_cell_font_stretching.pdf b/test/text/multi_cell_font_stretching.pdf
diff --git a/test/text/multi_cell_j_paragraphs.pdf b/test/text/multi_cell_j_paragraphs.pdf
diff --git a/test/text/multi_cell_justified_with_unicode_font.pdf b/test/text/multi_cell_justified_with_unicode_font.pdf
diff --git a/test/text/multi_cell_markdown_with_ttf_fonts.pdf b/test/text/multi_cell_markdown_with_ttf_fonts.pdf
diff --git a/test/text/text_positioning.pdf b/test/text/text_positioning.pdf
diff --git a/test/text/varfrags_fonts.pdf b/test/text/varfrags_fonts.pdf
diff --git a/test/text/write_font_stretching.pdf b/test/text/write_font_stretching.pdf
diff --git a/test/text_shaping/Dumbledor3Thin.ttf b/test/text_shaping/Dumbledor3Thin.ttf
diff --git a/test/text_shaping/FiraCode-Regular.ttf b/test/text_shaping/FiraCode-Regular.ttf
diff --git a/test/text_shaping/KFGQPC Uthmanic Script HAFS Regular.otf b/test/text_shaping/KFGQPC Uthmanic Script HAFS Regular.otf
diff --git a/test/text_shaping/Mangal 400.ttf b/test/text_shaping/Mangal 400.ttf
diff --git a/test/text_shaping/SBL_Hbrw.ttf b/test/text_shaping/SBL_Hbrw.ttf
diff --git a/test/text_shaping/ViaodaLibre-Regular.ttf b/test/text_shaping/ViaodaLibre-Regular.ttf
diff --git a/test/text_shaping/__init__.py b/test/text_shaping/__init__.py
diff --git a/test/text_shaping/arabic.pdf b/test/text_shaping/arabic.pdf
diff --git a/test/text_shaping/features.pdf b/test/text_shaping/features.pdf
diff --git a/test/text_shaping/hebrew_diacritics.pdf b/test/text_shaping/hebrew_diacritics.pdf
diff --git a/test/text_shaping/kerning.pdf b/test/text_shaping/kerning.pdf
diff --git a/test/text_shaping/ligatures.pdf b/test/text_shaping/ligatures.pdf
diff --git a/test/text_shaping/multi_cell_markdown_with_styling.pdf b/test/text_shaping/multi_cell_markdown_with_styling.pdf
diff --git a/test/text_shaping/shaping_hindi.pdf b/test/text_shaping/shaping_hindi.pdf
diff --git a/test/text_shaping/test_text_shaping.py b/test/text_shaping/test_text_shaping.py
@@ -0,0 +1,141 @@
+from pathlib import Path
+
+from fpdf import FPDF
+from test.conftest import assert_pdf_equal
+
+HERE = Path(__file__).resolve().parent
+FONTS_DIR = HERE.parent / "fonts"
+
+
+def test_indi_text(tmp_path):
+    # issue #365
+    pdf = FPDF()
+    pdf.add_page()
+    pdf.add_font(family="Mangal", fname=HERE / "Mangal 400.ttf")
+    pdf.set_font("Mangal", size=40)
+    pdf.set_text_shaping(False)
+    pdf.cell(txt="इण्टरनेट पर हिन्दी के साधन", new_x="LEFT", new_y="NEXT")
+    pdf.ln()
+    pdf.set_text_shaping(True)
+    pdf.cell(txt="इण्टरनेट पर हिन्दी के साधन", new_x="LEFT", new_y="NEXT")
+
+    assert_pdf_equal(pdf, HERE / "shaping_hindi.pdf", tmp_path)
+
+
+def test_text_replacement(tmp_path):
+    pdf = FPDF()
+    pdf.add_page()
+    pdf.add_font(family="FiraCode", fname=HERE / "FiraCode-Regular.ttf")
+    pdf.set_font("FiraCode", size=40)
+    pdf.set_text_shaping(False)
+    pdf.cell(txt="http://www 3 >= 2 != 1", new_x="LEFT", new_y="NEXT")
+    pdf.ln()
+    pdf.set_text_shaping(True)
+    pdf.cell(txt="http://www 3 >= 2 != 1", new_x="LEFT", new_y="NEXT")
+
+    assert_pdf_equal(pdf, HERE / "text_replacement.pdf", tmp_path)
+
+
+def test_kerning(tmp_path):
+    # issue #812
+    pdf = FPDF()
+    pdf.add_page()
+    pdf.add_font(family="Dumbledor3Thin", fname=HERE / "Dumbledor3Thin.ttf")
+    pdf.set_font("Dumbledor3Thin", size=40)
+    pdf.set_text_shaping(False)
+    pdf.cell(txt="Ты То Тф Та Тт Ти", new_x="LEFT", new_y="NEXT")
+    pdf.ln()
+    pdf.set_text_shaping(True)
+    pdf.cell(txt="Ты То Тф Та Тт Ти", new_x="LEFT", new_y="NEXT")
+
+    assert_pdf_equal(pdf, HERE / "kerning.pdf", tmp_path)
+
+
+def test_hebrew_diacritics(tmp_path):
+    # issue #549
+    pdf = FPDF()
+    pdf.add_page()
+    pdf.add_font(family="SBL_Hbrw", fname=HERE / "SBL_Hbrw.ttf")
+    pdf.set_font("SBL_Hbrw", size=40)
+    pdf.set_text_shaping(False)
+    pdf.cell(txt="בּ", new_x="LEFT", new_y="NEXT")
+    pdf.ln()
+    pdf.set_text_shaping(True)
+    pdf.cell(txt="בּ", new_x="LEFT", new_y="NEXT")
+
+    assert_pdf_equal(pdf, HERE / "hebrew_diacritics.pdf", tmp_path)
+
+
+def test_ligatures(tmp_path):
+    pdf = FPDF()
+    pdf.add_page()
+    pdf.add_font(family="ViaodaLibre", fname=HERE / "ViaodaLibre-Regular.ttf")
+    pdf.set_font("ViaodaLibre", size=40)
+    pdf.set_text_shaping(False)
+    pdf.cell(txt="final soft stuff", new_x="LEFT", new_y="NEXT")
+    pdf.ln()
+    pdf.set_text_shaping(True)
+    pdf.cell(txt="final soft stuff", new_x="LEFT", new_y="NEXT")
+
+    assert_pdf_equal(pdf, HERE / "ligatures.pdf", tmp_path)
+
+
+def test_arabic_right_to_left(tmp_path):
+    # issue #549
+    pdf = FPDF()
+    pdf.add_page()
+    pdf.add_font(
+        family="KFGQPC", fname=HERE / "KFGQPC Uthmanic Script HAFS Regular.otf"
+    )
+    pdf.set_font("KFGQPC", size=36)
+    pdf.set_text_shaping(False)
+    pdf.cell(txt="مثال على اللغة العربية. محاذاة لليمين.", new_x="LEFT", new_y="NEXT")
+    pdf.ln(36)
+    pdf.set_text_shaping(True)
+    pdf.cell(txt="مثال على اللغة العربية. محاذاة لليمين.", new_x="LEFT", new_y="NEXT")
+
+    assert_pdf_equal(pdf, HERE / "arabic.pdf", tmp_path)
+
+
+def test_multi_cell_markdown_with_shaping(tmp_path):
+    pdf = FPDF()
+    pdf.add_page()
+    pdf.add_font("Roboto", "", FONTS_DIR / "Roboto-Regular.ttf")
+    pdf.add_font("Roboto", "B", FONTS_DIR / "Roboto-Bold.ttf")
+    pdf.add_font("Roboto", "I", FONTS_DIR / "Roboto-Italic.ttf")
+    pdf.set_font("Roboto", size=32)
+    pdf.set_text_shaping(True)
+    text = (  # Some text where styling occurs over line breaks:
+        # pylint: disable=implicit-str-concat
+        "Lorem ipsum dolor, **consectetur adipiscing** elit,"
+        " eiusmod __tempor incididunt__ ut labore et dolore --magna aliqua--."
+    )
+    pdf.multi_cell(
+        w=pdf.epw, txt=text, markdown=True
+    )  # This is tricky to get working well
+    pdf.ln()
+    pdf.multi_cell(w=pdf.epw, txt=text, markdown=True, align="L")
+    assert_pdf_equal(pdf, HERE / "multi_cell_markdown_with_styling.pdf", tmp_path)
+
+
+def test_features(tmp_path):
+    pdf = FPDF()
+    pdf.add_page()
+    pdf.add_font(family="ViaodaLibre", fname=HERE / "ViaodaLibre-Regular.ttf")
+    pdf.set_font("ViaodaLibre", size=40)
+    pdf.set_text_shaping(use_shaping_engine=True)
+    pdf.cell(txt="final soft stuff", new_x="LEFT", new_y="NEXT")
+    pdf.ln()
+    pdf.set_text_shaping(use_shaping_engine=True, features={"liga": False})
+    pdf.cell(txt="final soft stuff", new_x="LEFT", new_y="NEXT")
+    pdf.ln()
+    pdf.set_text_shaping(use_shaping_engine=True, features={"kern": False})
+    pdf.cell(txt="final soft stuff", new_x="LEFT", new_y="NEXT")
+    pdf.ln()
+    pdf.set_text_shaping(
+        use_shaping_engine=True, direction="rtl", script="Latn", language="en-us"
+    )
+    pdf.cell(txt="final soft stuff", new_x="LEFT", new_y="NEXT")
+    pdf.ln()
+
+    assert_pdf_equal(pdf, HERE / "features.pdf", tmp_path)
diff --git a/test/text_shaping/text_replacement.pdf b/test/text_shaping/text_replacement.pdf