Text shaping (#820)
Co-authored-by: Lucas Cimon <925560+Lucas-C@users.noreply.github.com>
Co-authored-by: Anderson HerzogenrathDaCosta <anderson.costa@cn.ca>
3 people authored Aug 2, 2023

1 parent dface79 commit b671cb6
Showing 71 changed files with 761 additions and 187 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -21,6 +21,7 @@ This can also be enabled programmatically with `warnings.simplefilter('default',
- [`FPDF.mirror()`](https://pyfpdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.mirror) - New method: [documentation page](https://pyfpdf.github.io/fpdf2/Transformations.html) - Contributed by @sebastiantia
- [`FPDF.table()`](https://pyfpdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.table): new optional parameters `gutter_height`, `gutter_width` and `wrapmode`. Links can also be added to cells by passing a `link` parameter to [`Row.cell()`](https://pyfpdf.github.io/fpdf2/fpdf/table.html#fpdf.table.Row.cell)
- [`FPDF.multi_cell()`](https://pyfpdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.multi_cell): has a new optional `center` parameter to position the cell horizontally at the center of the page
- [`FPDF.set_text_shaping()`](https://pyfpdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.set_text_shaping): new method to perform text shaping using Harfbuzz - [documentation](https://pyfpdf.github.io/fpdf2/TextShaping.html).
- Added tutorial in Khmer language: [ភាសខ្មែរ](https://pyfpdf.github.io/fpdf2/Tutorial-km.html) - thanks to @kuth-chi
- Added tutorial in [日本語](https://pyfpdf.github.io/fpdf2/Tutorial-ja.html) - thanks to @alcnaka
- Better documentation & errors when facing HTML rendering limitations for `<table>` tags: <https://pyfpdf.github.io/fpdf2/HTML.html>
84 changes: 84 additions & 0 deletions docs/TextShaping.md
@@ -0,0 +1,84 @@
# Text Shaping #

## What is text shaping? ##
Text shaping is a fundamental process in typography and computer typesetting that influences the aesthetics and readability of text in various languages and scripts. It involves the transformation of Unicode text into glyphs, which are then positioned for display or print.

For texts in Latin script, text shaping can improve the aesthetics by replacing characters that would collide or overlap with a single glyph specially crafted to look harmonious.

![](text-shaping-ligatures.png)

This process is especially important for scripts that require complex layout, such as Arabic or Indic scripts, where characters change shape depending on their context.

There are three primary aspects of text shaping that contribute to the overall appearance of the text: kerning, ligatures, and glyph substitution.


### Kerning ###
Kerning refers to the adjustment of space between individual letter pairs in a font. This process is essential to avoid awkward gaps or overlaps that may occur due to the default spacing of the font. By manually or programmatically modifying the kerning, we can ensure an even and visually pleasing distribution of letters, which significantly improves the readability and aesthetic quality of the text.

![](text-shaping-kerning.png)


### Ligatures ###
Ligatures are special glyphs created by combining two or more characters. They are frequently used to avoid collisions between characters or to follow typographic traditions. For instance, in English typography, the most common ligatures are "fi" and "fl", which are often fused into single glyphs to provide a more seamless reading experience.
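
As a small illustration of one glyph standing for several characters, Unicode carries a few legacy ligature code points. The sketch below uses only the Python standard library and involves neither a font nor a shaping engine. The helper name `expand_ligature` is made up for this example, and real fonts usually implement ligatures through OpenType substitution rules rather than through these code points.

```python
import unicodedata


def expand_ligature(char: str) -> str:
    """Return the character sequence that a compatibility ligature stands for."""
    # NFKD applies compatibility decompositions, which split legacy
    # ligature code points such as U+FB01 back into their component letters.
    return unicodedata.normalize("NFKD", char)


fi_ligature = "\ufb01"  # LATIN SMALL LIGATURE FI, a single code point
print(unicodedata.name(fi_ligature))
print(expand_ligature(fi_ligature))
```

A shaping engine performs the reverse operation at the glyph level: it sees the characters "f" and "i" and, if the font provides one, emits a single ligature glyph.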


### Glyph Substitution ###
Glyph substitution is a mechanism that replaces one glyph or a set of glyphs with one or more alternative glyphs. This is a crucial aspect of text shaping, especially for complex scripts where the representation of a character can significantly vary based on its surrounding characters. For example, in Arabic script, a letter can have different forms depending on whether it's at the beginning, middle, or end of a word.
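
The contextual forms mentioned above can be observed without any font, because Unicode keeps legacy "presentation form" code points for Arabic letters. The following sketch is only an illustration using the standard library: the `BEH_FORMS` table and the `base_letter` helper are names chosen for this example, and a shaping engine selects glyphs inside the font rather than swapping code points like this.

```python
import unicodedata

# Presentation forms of ARABIC LETTER BEH (U+0628), taken from the
# "Arabic Presentation Forms-B" Unicode block.
BEH_FORMS = {
    "isolated": "\ufe8f",
    "final": "\ufe90",
    "initial": "\ufe91",
    "medial": "\ufe92",
}


def base_letter(presentation_form: str) -> str:
    """Map a contextual presentation form back to its base letter."""
    # NFKC folds compatibility characters into their canonical equivalents.
    return unicodedata.normalize("NFKC", presentation_form)


for position, form in BEH_FORMS.items():
    print(position, unicodedata.name(form), "->", unicodedata.name(base_letter(form)))
```

All four forms map back to the same letter, which is why the input text only contains the base letter and the choice of form is left to the shaping step.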

Another common use of glyph substitution is to replace a sequence of characters with a symbol that better represents the meaning of those characters in a specialized context (mathematics, programming, etc.).

![](text-shaping-substitution.png)




## Usage ##
Text shaping is disabled by default to keep backwards compatibility, reduce resource requirements and not make uharfbuzz a hard dependency.

If you want to use text shaping, the first step is installing the uharfbuzz package via pip.

```shell
pip install uharfbuzz
```

⚠️ Text shaping is *not* available for Type 1 fonts.
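
Because uharfbuzz is optional, a script may want to check for it before enabling shaping. The sketch below is one possible way to do that with the standard library. The function name `shaping_available` is hypothetical and not part of fpdf2, which, according to the code in this commit, raises an `FPDFException` from `set_text_shaping()` when the import fails.

```python
import importlib.util


def shaping_available() -> bool:
    """Report whether the optional uharfbuzz package can be imported."""
    # find_spec returns None when the top-level module is not installed.
    return importlib.util.find_spec("uharfbuzz") is not None


if shaping_available():
    print("uharfbuzz found: text shaping can be enabled")
else:
    print("uharfbuzz missing: install it with `pip install uharfbuzz`")
```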

### Basic usage ###
The `set_text_shaping()` method controls text shaping for a document. Its main argument, `use_shaping_engine`, can be set to `True` to enable the shaping mechanism or `False` to disable it.

```python
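from pathlib import Path
from fpdf import FPDF

HERE = Path(__file__).resolve().parent  # folder holding the font file

pdf = FPDF()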
pdf.add_page()
pdf.add_font(family="ViaodaLibre", fname=HERE / "ViaodaLibre-Regular.ttf")
pdf.set_font("ViaodaLibre", size=40)
pdf.set_text_shaping(True)
pdf.cell(txt="final soft stuff")
pdf.output("Example.pdf")
```

### Features ###
For most languages, HarfBuzz enables a default set of features. To enable or disable a specific feature, pass a dictionary with the four-character OpenType feature tag as key and a boolean value indicating whether the feature should be enabled or disabled.

Example:
```python
pdf.set_text_shaping(use_shaping_engine=True, features={"kern": False, "liga": False})
```

The full list of OpenType feature codes can be found [here](https://learn.microsoft.com/en-us/typography/opentype/spec/featuretags).
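
The shape of the `features` dictionary can be checked before it is handed to fpdf2. The helper below is a hypothetical sketch, not part of the library: `validate_features` is an invented name, and it is stricter than the check in this commit, which only verifies that `features` is a dictionary.

```python
def validate_features(features: dict) -> dict:
    """Check a features mapping of the shape {"kern": False, "liga": True}."""
    if not isinstance(features, dict):
        raise TypeError("features must be a dictionary")
    for tag, enabled in features.items():
        # OpenType feature tags are exactly four characters long.
        if not isinstance(tag, str) or len(tag) != 4:
            raise ValueError(f"invalid OpenType feature tag: {tag!r}")
        if not isinstance(enabled, bool):
            raise ValueError(f"feature {tag!r} must map to True or False")
    return features


print(validate_features({"kern": False, "liga": False}))
```

The returned dictionary could then be passed as `features=` to `set_text_shaping()`, so that a misspelled tag is caught early instead of being silently ignored.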

### Additional options ###
To perform text shaping, HarfBuzz needs some information, such as the language and the direction (right-to-left, left-to-right, etc.), in order to apply the correct rules. This information can be guessed from the text being shaped, but you can also set it explicitly to make sure the correct rules are applied.

Examples:
```python
pdf.set_text_shaping(use_shaping_engine=True, direction="rtl", script="arab", language="ara")
```
```python
pdf.set_text_shaping(use_shaping_engine=True, direction="ltr", script="latn", language="eng")
```

Direction can be `ltr` (left to right) or `rtl` (right to left). The `ttb` (top to bottom) and `btt` (bottom to top) directions are not supported by fpdf2 for now.
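
To give an idea of what guessing the direction can look like, here is a deliberately simplified heuristic based on the Unicode bidirectional class of the first strongly directional character. It is an assumption-laden sketch and not HarfBuzz's actual guessing logic. The function name `guess_direction` is invented for this example, and it only returns the two directions fpdf2 accepts.

```python
import unicodedata


def guess_direction(text: str) -> str:
    """Return "rtl" or "ltr" based on the first strongly directional character."""
    for char in text:
        bidi_class = unicodedata.bidirectional(char)
        if bidi_class in ("R", "AL"):  # Hebrew, Arabic and similar scripts
            return "rtl"
        if bidi_class == "L":  # Latin, Cyrillic, Devanagari, ...
            return "ltr"
    # No strong character found (digits, punctuation, empty string).
    return "ltr"


for sample in ("final soft stuff", "שלום", "مرحبا"):
    print(sample, guess_direction(sample))
```

When a document mixes scripts, passing `direction`, `script` and `language` explicitly avoids relying on any such guess.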

[Valid OpenType script tags](https://learn.microsoft.com/en-us/typography/opentype/spec/scripttags)

[Valid OpenType language codes](https://learn.microsoft.com/en-us/typography/opentype/spec/languagetags)
Binary file added docs/text-shaping-kerning.png
Binary file added docs/text-shaping-ligatures.png
Binary file added docs/text-shaping-substitution.png
291 changes: 259 additions & 32 deletions fpdf/fonts.py

Large diffs are not rendered by default.

145 changes: 91 additions & 54 deletions fpdf/fpdf.py
@@ -86,10 +86,7 @@ class Image:
from .svg import Percent, SVGObject
from .syntax import DestinationXYZ, PDFDate
from .table import Table
from .util import (
escape_parens,
get_scale_factor,
)
from .util import get_scale_factor

# Public global variables:
FPDF_VERSION = "2.7.4"
@@ -572,6 +569,76 @@ def set_display_mode(self, zoom, layout="continuous"):
raise FPDFException(f"Incorrect zoom display mode: {zoom}")
self.page_layout = LAYOUT_ALIASES.get(layout, layout)

# Disabling this check - importing outside toplevel to check module is present
# pylint: disable=import-outside-toplevel, unused-import
def set_text_shaping(
self,
use_shaping_engine: bool = True,
features: dict = None,
direction: str = None,
script: str = None,
language: str = None,
):
"""
Enable or disable text shaping engine when rendering text.
If features, direction, script or language are not specified the shaping engine will try
to guess the values based on the input text.
Args:
use_shaping_engine: enable or disable the use of the shaping engine to process the text
features: a dictionary containing 4-letter OpenType feature tags and whether each feature
should be enabled or disabled
example: features={"kern": False, "liga": False}
direction: the direction the text should be rendered, either "ltr" (left to right)
or "rtl" (right to left).
script: a valid OpenType script tag like "arab" or "latn"
language: a valid OpenType language tag like "eng" or "fra"
"""
if use_shaping_engine:
try:
import uharfbuzz
except ImportError as exc:
raise FPDFException(
"The uharfbuzz package could not be imported, but is required for text shaping. Try: pip install uharfbuzz"
) from exc
else:
self._text_shaping = None
return
#
# Features must be a dictionary containing OpenType features and a boolean flag
# stating whether the feature should be enabled or disabled.
#
# e.g. features={"liga": True, "kern": False}
#
# https://harfbuzz.github.io/shaping-opentype-features.html
#

if features and not isinstance(features, dict):
raise FPDFException(
"Features must be a dictionary. See text shaping documentation"
)
if not features:
features = {}

# Buffer properties (direction, script and language)
# if the properties are not provided, Harfbuzz "guessing" logic is used.
# https://harfbuzz.github.io/setting-buffer-properties.html
# Valid harfbuzz directions are ltr (left to right), rtl (right to left),
# ttb (top to bottom) or btt (bottom to top)

if direction and direction not in ("ltr", "rtl"):
raise FPDFException(
"FPDF2 only accept ltr (left to right) or rtl (right to left) directions for now."
)

self._text_shaping = {
"use_shaping_engine": True,
"features": features,
"direction": direction,
"script": script,
"language": language,
}

@property
def page_layout(self):
return self._page_layout
@@ -2316,20 +2383,10 @@ def text(self, x, y, txt=""):
if not self.font_family:
raise FPDFException("No font set, you need to call set_font() beforehand")
txt = self.normalize_text(txt)
if self.is_ttf_font:
txt_mapped = ""
for char in txt:
uni = ord(char)
# Instead of adding the actual character to the stream its code is
# mapped to a position in the font's subset
txt_mapped += chr(self.current_font.subset.pick(uni))
txt2 = escape_parens(txt_mapped.encode("utf-16-be").decode("latin-1"))
else:
txt2 = escape_parens(txt)
sl = [f"BT {x * self.k:.2f} {(self.h - y) * self.k:.2f} Td"]
if self.text_mode != TextMode.FILL:
sl.append(f" {self.text_mode} Tr {self.line_width:.2f} w")
sl.append(f"({txt2}) Tj ET")
sl.append(f"{self.current_font.encode_text(txt)} ET")
if (self.underline and txt != "") or self._record_text_quad_points:
w = self.get_string_width(txt, normalized=True, markdown=False)
if self.underline and txt != "":
@@ -2870,8 +2927,6 @@ def _render_styled_text_line(
if self.fill_color != self.text_color:
sl.append(self.text_color.serialize().lower())

# do this once in advance
u_space = escape_parens(" ".encode("utf-16-be").decode("latin-1"))
word_spacing = 0
if text_line.justify:
# Don't rely on align==Align.J here.
@@ -2913,48 +2968,28 @@ def _render_styled_text_line(
current_text_mode = frag.text_mode
sl.append(f"{frag.text_mode} Tr {frag.line_width:.2f} w")

if frag.is_ttf_font:
mapped_text = ""
for char in frag.string:
uni = ord(char)
mapped_text += chr(frag.font.subset.pick(uni))
if word_spacing:
# "Tw" only has an effect on the ASCII space character and ignores
# space characters from unicode (TTF) fonts. As a workaround,
# we do word spacing using an adjustment before each space.
# Determine the index of the space character (" ") in the current
# subset and split words whenever this mapping code is found
words = mapped_text.split(chr(frag.font.subset.pick(ord(" "))))
words_strl = []
for word_i, word in enumerate(words):
# pylint: disable=redefined-loop-name
word = escape_parens(
word.encode("utf-16-be").decode("latin-1")
)
if word_i == 0:
words_strl.append(f"({word})")
else:
adj = -(frag_ws * frag.k) * 1000 / frag.font_size_pt
words_strl.append(f"{adj:.3f}({u_space}{word})")
escaped_text = " ".join(words_strl)
sl.append(f"[{escaped_text}] TJ")
else:
escaped_text = escape_parens(
mapped_text.encode("utf-16-be").decode("latin-1")
)
sl.append(f"({escaped_text}) Tj")
else: # core fonts
if frag_ws != current_ws:
sl.append(f"{frag_ws * frag.k:.3f} Tw")
current_ws = frag_ws
escaped_text = escape_parens(frag.string)
sl.append(f"({escaped_text}) Tj")
r_text = frag.render_pdf_text(
frag_ws,
current_ws,
word_spacing,
self.x + dx + s_width,
self.y + (0.5 * h + 0.3 * max_font_size),
self.h,
)
if r_text:
sl.append(r_text)

frag_width = frag.get_width(
initial_cs=i != 0
) + word_spacing * frag.characters.count(" ")
if frag.underline:
underlines.append(
(self.x + dx + s_width, frag_width, frag.font, frag.font_size)
(
self.x + dx + s_width,
frag_width,
frag.font,
frag.font_size,
)
)
if frag.link:
self.link(
@@ -2964,6 +2999,8 @@ def _render_styled_text_line(
h=frag.font_size,
link=frag.link,
)
if not frag.is_ttf_font:
current_ws = frag_ws
s_width += frag_width

sl.append("ET")
9 changes: 9 additions & 0 deletions fpdf/graphics_state.py
@@ -44,6 +44,7 @@ def __init__(self, *args, **kwargs):
sup_lift=0.4,
nom_lift=0.2,
denom_lift=0.0,
_text_shaping=None,
),
]
super().__init__(*args, **kwargs)
@@ -313,6 +314,14 @@ def denom_lift(self, v):
"""
self.__statestack[-1]["denom_lift"] = float(v)

@property
def _text_shaping(self):
return self.__statestack[-1]["_text_shaping"]

@_text_shaping.setter
def _text_shaping(self, v):
self.__statestack[-1]["_text_shaping"] = v

def font_face(self):
"""
Return a `fpdf.fonts.FontFace` instance
128 changes: 120 additions & 8 deletions fpdf/line_break.py
@@ -10,6 +10,8 @@

from .enums import CharVPos, WrapMode
from .errors import FPDFException
from .fonts import CoreFont, TTFFont
from .util import escape_parens

SOFT_HYPHEN = "\u00ad"
HYPHEN = "\u002d"
@@ -45,7 +47,7 @@ def __repr__(self):
)

@property
def font(self):
def font(self) -> Union[CoreFont, TTFFont]:
return self.graphics_state["current_font"]

@font.setter
@@ -133,6 +135,10 @@ def lift(self):
lift = 0.0
return lift * self.graphics_state["font_size_pt"]

@property
def _text_shaping(self):
return self.graphics_state["_text_shaping"]

@property
def string(self):
return "".join(self.characters)
@@ -170,22 +176,20 @@ def get_width(

if chars is None:
chars = self.characters[start:end]
if self.is_ttf_font:
w = sum(self.font.cw[ord(c)] for c in chars)
else:
w = sum(self.font.cw[c] for c in chars)
(char_len, w) = self.font.get_text_width(
chars, self.font_size_pt, self._text_shaping
)
char_spacing = self.char_spacing
if self.font_stretching != 100:
w *= self.font_stretching * 0.01
char_spacing *= self.font_stretching * 0.01
w *= self.font_size_pt * 0.001
if self.char_spacing != 0:
# initial_cs must be False if the fragment is located at the
# beginning of a text object, because the first char won't get spaced.
if initial_cs:
w += char_spacing * len(chars)
w += char_spacing * char_len
else:
w += char_spacing * (len(chars) - 1)
w += char_spacing * (char_len - 1)
return w / self.k

def get_character_width(self, character: str, print_sh=False, initial_cs=True):
@@ -197,6 +201,114 @@ def get_character_width(self, character: str, print_sh=False, initial_cs=True):
character = HYPHEN
return self.get_width(chars=character, initial_cs=initial_cs)

def render_pdf_text(self, frag_ws, current_ws, word_spacing, adjust_x, adjust_y, h):
if self.is_ttf_font:
if self._text_shaping:
return self.render_with_text_shaping(
adjust_x, adjust_y, h, word_spacing, self._text_shaping
)
return self.render_pdf_text_ttf(frag_ws, word_spacing)
return self.render_pdf_text_core(frag_ws, current_ws)

def render_pdf_text_ttf(self, frag_ws, word_spacing):
ret = ""
mapped_text = ""
for char in self.string:
mapped_char = self.font.subset.pick(ord(char))
if mapped_char:
mapped_text += chr(mapped_char)
if word_spacing:
# do this once in advance
u_space = escape_parens(" ".encode("utf-16-be").decode("latin-1"))

# According to the PDF reference, word spacing shall be applied to every
# occurrence of the single-byte character code 32 in a string when using
# a simple font or a composite font that defines code 32 as a single-byte code.
# It shall not apply to occurrences of the byte value 32 in multiple-byte codes.
# FPDF uses 2 bytes per character (UTF-16-BE encoding) so the "Tw" operator doesn't work
# As a workaround, we do word spacing using an adjustment before each space.
# Determine the index of the space character (" ") in the current
# subset and split words whenever this mapping code is found
#
words = mapped_text.split(chr(self.font.subset.pick(ord(" "))))
words_strl = []
for word_i, word in enumerate(words):
# pylint: disable=redefined-loop-name
word = escape_parens(word.encode("utf-16-be").decode("latin-1"))
if word_i == 0:
words_strl.append(f"({word})")
else:
adj = -(frag_ws * self.k) * 1000 / self.font_size_pt
words_strl.append(f"{adj:.3f}({u_space}{word})")
escaped_text = " ".join(words_strl)
ret += f"[{escaped_text}] TJ"
else:
escaped_text = escape_parens(
mapped_text.encode("utf-16-be").decode("latin-1")
)
ret += f"({escaped_text}) Tj"
return ret

def render_with_text_shaping(
self, pos_x, pos_y, h, word_spacing, text_shaping_parms
):
ret = ""
text = ""
space_mapped_code = self.font.subset.pick(ord(" "))

def adjust_pos(pos):
return (
pos
* self.font.scale
* self.font_size_pt
* (self.font_stretching / 100)
/ 1000
/ self.k
)

char_spacing = self.char_spacing * (self.font_stretching / 100) / self.k
for ti in self.font.shape_text(
self.string, self.font_size_pt, text_shaping_parms
):
if ti["mapped_char"] is None: # Missing glyph
continue
char = chr(ti["mapped_char"]).encode("utf-16-be").decode("latin-1")
if ti["x_offset"] != 0 or ti["y_offset"] != 0:
if text:
ret += f"({text}) Tj "
text = ""
offsetx = pos_x + adjust_pos(ti["x_offset"])
offsety = pos_y - adjust_pos(ti["y_offset"])
ret += (
f"1 0 0 1 {(offsetx) * self.k:.2f} {(h - offsety) * self.k:.2f} Tm "
)
text += char
pos_x += adjust_pos(ti["x_advance"]) + char_spacing
pos_y += adjust_pos(ti["y_advance"])
if word_spacing and ti["mapped_char"] == space_mapped_code:
pos_x += word_spacing

# if only moving "x" we don't need to move the text matrix
if ti["force_positioning"] or (
word_spacing and ti["mapped_char"] == space_mapped_code
):
if text:
ret += f"({text}) Tj "
text = ""
ret += f"1 0 0 1 {(pos_x) * self.k:.2f} {(h - pos_y) * self.k:.2f} Tm "

if text:
ret += f"({text}) Tj"
return ret

def render_pdf_text_core(self, frag_ws, current_ws):
ret = ""
if frag_ws != current_ws:
ret += f"{frag_ws * self.k:.3f} Tw "
escaped_text = escape_parens(self.string)
ret += f"({escaped_text}) Tj"
return ret


class TextLine(NamedTuple):
fragments: tuple
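
The word-spacing workaround in `render_pdf_text_ttf` above boils down to a small unit conversion. The sketch below repeats that arithmetic in isolation, assuming the spacing is given in user units (as the multiplication by `k` suggests). The names `tj_adjustment` and `build_tj_array` are invented for this illustration, and the text is kept as plain ASCII, whereas the real code maps characters to subset codes, encodes them as UTF-16-BE and escapes parentheses.

```python
def tj_adjustment(word_spacing: float, k: float, font_size_pt: float) -> float:
    """Convert a word spacing in user units to a TJ adjustment."""
    # Scale to points, express in thousandths of the font size, and negate:
    # positive TJ numbers pull the next glyph back, negative ones push it on.
    return -(word_spacing * k) * 1000 / font_size_pt


def build_tj_array(words, adjustment: float) -> str:
    """Assemble a TJ operand, inserting the adjustment before each space."""
    parts = []
    for index, word in enumerate(words):
        if index == 0:
            parts.append(f"({word})")
        else:
            parts.append(f"{adjustment:.3f}( {word})")
    return "[" + " ".join(parts) + "] TJ"


k = 72 / 25.4  # points per millimetre, the scale factor for the "mm" unit
adj = tj_adjustment(word_spacing=2, k=k, font_size_pt=12)
print(build_tj_array(["final", "soft", "stuff"], adj))
```

Placing the adjustment in the TJ array rather than using `Tw` is what makes justification work with 2-byte character codes, as the comment in the diff explains.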
107 changes: 42 additions & 65 deletions fpdf/output.py
@@ -22,7 +22,6 @@
from .syntax import create_list_string as pdf_list
from .syntax import iobj_ref as pdf_ref

from fontTools import ttLib
from fontTools import subset as ftsubset

try:
@@ -537,35 +536,14 @@ def _add_fonts(self):
elif font.type == "TTF":
fontname = f"MPDFAA+{font.name}"

# unicode_char -> new_code_char map for chars embedded in the PDF
uni_to_new_code_char = font.subset.dict()

# why we delete 0-element?
del uni_to_new_code_char[0]

# ---- FONTTOOLS SUBSETTER ----
# recalcTimestamp=False means that it doesn't modify the "modified" timestamp in head table
# if we leave recalcTimestamp=True the tests will break every time
fonttools_font = ttLib.TTFont(
file=font.ttffile, recalcTimestamp=False, fontNumber=0, lazy=True
)

# 1. get all glyphs in PDF
cmap = fonttools_font["cmap"].getBestCmap()
glyph_names = [
cmap[unicode] for unicode in uni_to_new_code_char if unicode in cmap
]
glyph_names = font.subset.get_all_glyph_names()

missing_glyphs = [
chr(unicode)
for unicode in uni_to_new_code_char
if unicode not in cmap
]
if len(missing_glyphs) > 0:
if len(font.missing_glyphs) > 0:
LOGGER.warning(
"Font %s is missing the following glyphs: %s",
fontname,
", ".join(missing_glyphs),
", ".join(chr(x) for x in font.missing_glyphs),
)

# 2. make a subset
@@ -586,26 +564,23 @@ def _add_fonts(self):
]
subsetter = ftsubset.Subsetter(options)
subsetter.populate(glyphs=glyph_names)
subsetter.subset(fonttools_font)
subsetter.subset(font.ttfont)

# 3. make codeToGlyph
# is a map Character_ID -> Glyph_ID
# it's used for associating glyphs to new codes
# this basically takes the old code of the character
# take the glyph associated with it
# and then associate to the new code the glyph associated with the old code
code_to_glyph = {}
for code, new_code_mapped in uni_to_new_code_char.items():
# notdef is associated if no glyph was associated to the old code
# it's not necessary to do this, it seems to be done by default
glyph_name = cmap.get(code, ".notdef")
code_to_glyph[new_code_mapped] = fonttools_font.getGlyphID(
glyph_name
)

code_to_glyph = {
font.subset._map[glyph]: font.ttfont.getGlyphID(glyph.glyph_name)
for glyph in font.subset._map.keys()
}

# 4. return the ttfile
output = BytesIO()
fonttools_font.save(output)
font.ttfont.save(output)

output.seek(0)
ttfontstream = output.read()
@@ -624,7 +599,7 @@ def _add_fonts(self):
subtype="CIDFontType2",
base_font=fontname,
d_w=font.desc.missing_width,
w=_tt_font_widths(font, max(uni_to_new_code_char)),
w=_tt_font_widths(font),
)
self._add_pdf_obj(cid_font_obj, "fonts")
composite_font_obj.descendant_fonts = PDFArray([cid_font_obj])
@@ -634,17 +609,21 @@ def _add_fonts(self):
# character that each used 16-bit code belongs to. It
# allows searching the file and copying text from it.
bfChar = []
uni_to_new_code_char = font.subset.dict()
for code, code_mapped in uni_to_new_code_char.items():
if code > 0xFFFF:

def format_code(unicode):
if unicode > 0xFFFF:
# Calculate surrogate pair
code_high = 0xD800 | (code - 0x10000) >> 10
code_low = 0xDC00 | (code & 0x3FF)
bfChar.append(
f"<{code_mapped:04X}> <{code_high:04X}{code_low:04X}>\n"
)
else:
bfChar.append(f"<{code_mapped:04X}> <{code:04X}>\n")
code_high = 0xD800 | (unicode - 0x10000) >> 10
code_low = 0xDC00 | (unicode & 0x3FF)
return f"{code_high:04X}{code_low:04X}"
return f"{unicode:04X}"

for glyph, code_mapped in font.subset._map.items():
if len(glyph.unicode) == 0:
continue
bfChar.append(
f'<{code_mapped:04X}> <{"".join(format_code(code) for code in glyph.unicode)}>\n'
)

to_unicode_obj = PDFContentStream(
"/CIDInit /ProcSet findresource begin\n"
@@ -699,6 +678,8 @@ def _add_fonts(self):
self._add_pdf_obj(font_file_cs_obj, "fonts")
font_descriptor_obj.font_file2 = font_file_cs_obj

font.close()

return font_objs_per_index

def _add_images(self):
@@ -957,48 +938,44 @@ def _log_final_sections_sizes(self):
LOGGER.debug("- %s: %s", label, _sizeof_fmt(section_size))


def _tt_font_widths(font, maxUni):
def _tt_font_widths(font):
rangeid = 0
range_ = {}
range_interval = {}
prevcid = -2
prevwidth = -1
interval = False
startcid = 1
cwlen = maxUni + 1

# for each character
subset = font.subset.dict()
for cid in range(startcid, cwlen):
char_width = font.cw[cid]
cid_mapped = subset.get(cid)
if cid_mapped is None:
continue

# Glyphs sorted by mapped character id
glyphs = dict(sorted(font.subset._map.items(), key=lambda item: item[1]))

for glyph in glyphs:
cid_mapped = glyphs[glyph]
if cid_mapped == (prevcid + 1):
if char_width == prevwidth:
if char_width == range_[rangeid][0]:
range_.setdefault(rangeid, []).append(char_width)
if glyph.glyph_width == prevwidth:
if glyph.glyph_width == range_[rangeid][0]:
range_.setdefault(rangeid, []).append(glyph.glyph_width)
else:
range_[rangeid].pop()
# new range
rangeid = prevcid
range_[rangeid] = [prevwidth, char_width]
range_[rangeid] = [prevwidth, glyph.glyph_width]
interval = True
range_interval[rangeid] = True
else:
if interval:
# new range
rangeid = cid_mapped
range_[rangeid] = [char_width]
range_[rangeid] = [glyph.glyph_width]
else:
range_[rangeid].append(char_width)
range_[rangeid].append(glyph.glyph_width)
interval = False
else:
rangeid = cid_mapped
range_[rangeid] = [char_width]
range_[rangeid] = [glyph.glyph_width]
interval = False
prevcid = cid_mapped
prevwidth = char_width
prevwidth = glyph.glyph_width
prevk = -1
nextk = -1
prevint = False
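
The `format_code` helper introduced in the `_add_fonts` hunk above writes code points beyond U+FFFF as UTF-16 surrogate pairs in the ToUnicode map. The standalone sketch below copies that arithmetic and compares it with Python's own UTF-16-BE codec. The parameter name `code_point` and the `reference_encoding` function are additions made for this illustration.

```python
def format_code(code_point: int) -> str:
    """Hex-encode a code point as UTF-16BE, using a surrogate pair above U+FFFF."""
    if code_point > 0xFFFF:
        # ">>" binds tighter than "|", so the shift is applied first.
        code_high = 0xD800 | (code_point - 0x10000) >> 10
        code_low = 0xDC00 | (code_point & 0x3FF)
        return f"{code_high:04X}{code_low:04X}"
    return f"{code_point:04X}"


def reference_encoding(code_point: int) -> str:
    """Same result obtained from Python's built-in UTF-16BE codec."""
    return chr(code_point).encode("utf-16-be").hex().upper()


for cp in (0x41, 0x0628, 0x1F600):
    print(f"U+{cp:04X}", format_code(cp), reference_encoding(cp))
```

Since a shaped glyph such as a ligature can stand for several code points, the new loop in the diff concatenates one such hex string per code point in `glyph.unicode`.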
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -129,6 +129,7 @@ nav:
- 'Page breaks': 'PageBreaks.md'
- 'Text styling': 'TextStyling.md'
- 'Unicode': 'Unicode.md'
- 'Text Shaping': 'TextShaping.md'
- 'Emojis, Symbols & Dingbats': 'EmojisSymbolsDingbats.md'
- 'HTML': 'HTML.md'
- 'Graphics Content':
12 changes: 11 additions & 1 deletion scripts/verapdf-ignore.json
@@ -7,6 +7,10 @@
"6.2.3-2": "REASON: fpdf2 does not currently support PDF/A",
"6.2.3-3": "REASON: fpdf2 does not currently support PDF/A",
"6.2.3-4": "REASON: fpdf2 does not currently support PDF/A",
"6.2.3.2-1": "REASON: fpdf2 does not currently support PDF/A",
"6.2.3.3-1": "REASON: fpdf2 does not currently support PDF/A",
"6.2.3.3-2": "REASON: fpdf2 does not currently support PDF/A",
"6.2.3.3-3": "REASON: fpdf2 does not currently support PDF/A",
"6.3.4-1": "REASON: fpdf2 still allows using the PostScript standard 14 fonts. Quoting PDF 1.7 spec from 2006: Beginning with PDF 1.5, the special treatment given to the standard 14 fonts is deprecated. All fonts used in a PDF document should be represented using a com- plete font descriptor. For backwards capability, viewer applications must still provide the special treatment identified for the standard 14 fonts.",
"6.3.5-3": "FIXME: corresponding GitHub issue -> https://github.com/PyFPDF/fpdf2/issues/88",
"6.4-1": "REASON: enabled by default, can be disabled by setting pdf.allow_images_transparency = False",
@@ -20,9 +24,15 @@
"6.6.1-1": "REASON: fpdf2 allows to create Launch actions that VeraPDF forbid arbitrarily",
"6.7.2-1": "REASON: setting XML/XMP metadata is entirely optional with fpdf2",
"6.7.3-1": "REASON: setting XML/XMP metadata is entirely optional with fpdf2",
"6.7.3-2": "REASON: setting XML/XMP metadata is entirely optional with fpdf2",
"6.7.3-3": "REASON: setting XML/XMP metadata is entirely optional with fpdf2",
"6.7.3-4": "REASON: setting XML/XMP metadata is entirely optional with fpdf2",
"6.7.3-5": "REASON: setting XML/XMP metadata is entirely optional with fpdf2",
"6.7.3-6": "REASON: setting XML/XMP metadata is entirely optional with fpdf2",
"6.7.3-7": "REASON: setting XML/XMP metadata is entirely optional with fpdf2",
"6.7.11-1": "REASON: up to fpdf2 v2.3.2, test_xmp_metadata included the PDF/A version and conformance level of the file, but then it started to break PDF Checker does-not-conform-to-claimed-pdfa-type rule. PENDING proper support for PDF/A",
"6.9-2": "REASON: false positive on test/signing/sign_pkcs12.pdf",
"6.1.11-1": "REASON: /EF is allowed in order for fpdf2 to be able to embed files",
"6.1.11-2": "REASON: /EmbeddedFiles is allowed in order for fpdf2 to be able to embed files"
}
}
}
Binary file modified test/embed_file_all_optionals.pdf
Binary file not shown.
Binary file modified test/embed_file_self.pdf
Binary file not shown.
Binary file modified test/encryption/encrypt_fonts.pdf
Binary file not shown.
Binary file modified test/file_attachment_annotation.pdf
Binary file not shown.
Binary file modified test/fonts/add_font_unicode.pdf
Binary file not shown.
Binary file modified test/fonts/charmap_first_999_chars-DejaVuSans-Oblique.pdf
Binary file not shown.
Binary file modified test/fonts/charmap_first_999_chars-DejaVuSans.pdf
Binary file not shown.
Binary file modified test/fonts/charmap_first_999_chars-DejaVuSansMono.pdf
Binary file not shown.
Binary file modified test/fonts/charmap_first_999_chars-DroidSansFallback.pdf
Binary file not shown.
Binary file modified test/fonts/charmap_first_999_chars-Quicksand-Regular.pdf
Binary file not shown.
Binary file modified test/fonts/charmap_first_999_chars-Roboto-Regular.pdf
Binary file not shown.
Binary file modified test/fonts/charmap_first_999_chars-TwitterEmoji.pdf
Binary file not shown.
Binary file modified test/fonts/charmap_first_999_chars-Waree.pdf
Binary file not shown.
Binary file modified test/fonts/charmap_first_999_chars-cmss12.pdf
Binary file not shown.
Binary file modified test/fonts/fallback_font.pdf
Binary file not shown.
Binary file modified test/fonts/fallback_font_ignore_style.pdf
Binary file not shown.
Binary file modified test/fonts/fallback_font_with_overriden_get_fallback_font.pdf
Binary file not shown.
Binary file modified test/fonts/fonts_emoji_glyph.pdf
Binary file not shown.
Binary file modified test/fonts/fonts_otf.pdf
Binary file not shown.
Binary file modified test/fonts/fonts_remap_nb.pdf
Binary file not shown.
Binary file modified test/fonts/fonts_two_mappings.pdf
Binary file not shown.
Binary file modified test/fonts/render_en_dash.pdf
Binary file not shown.
28 changes: 1 addition & 27 deletions test/fonts/test_font_remap.py
@@ -1,38 +1,12 @@
from pathlib import Path

from fpdf import FPDF
from fpdf.fonts import SubsetMap

from test.conftest import assert_pdf_equal

HERE = Path(__file__).resolve().parent


def test_subset_map():
subset_map = SubsetMap(range(0, 1024, 2))
assert len(subset_map.dict()) == 512

for i in range(0, 1024, 2):
assert i % 2 == 0
assert i == subset_map.pick(i)

for i in range(1023, 512, -2):
assert subset_map.pick(i) % 2 == 1
assert len(subset_map.dict()) == 512 + 256

for i in range(1, 1000, 2):
assert subset_map.pick(i) % 2 == 1

assert len(subset_map.dict()) == 1024

subset_dict = subset_map.dict()
for i in subset_dict:
for j in subset_dict:
if i != j:
assert subset_dict[i] != subset_dict[j]
else:
assert subset_dict[i] == subset_dict[i]


def test_emoji_glyph(tmp_path):
pdf = FPDF()

Binary file modified test/fonts/thai_text.pdf
Binary file not shown.
Binary file modified test/html/html_custom_pre_code_font.pdf
Binary file not shown.
Binary file modified test/html/html_heading_hebrew.pdf
Binary file not shown.
Binary file modified test/html/issue_156.pdf
Binary file not shown.
Binary file modified test/outline/russian_heading.pdf
Binary file not shown.
1 change: 1 addition & 0 deletions test/requirements.txt
@@ -12,3 +12,4 @@ pytest-cov
qrcode
semgrep
tabula-py
uharfbuzz
Binary file modified test/table/table_with_ttf_font.pdf
Binary file not shown.
Binary file modified test/table/table_with_ttf_font_and_headings.pdf
Binary file not shown.
Binary file modified test/text/cell_curfont_leak.pdf
Binary file not shown.
Binary file modified test/text/cell_markdown_right_aligned.pdf
Binary file not shown.
Binary file modified test/text/cell_markdown_with_ttf_fonts.pdf
Binary file not shown.
Binary file modified test/text/multi_cell_char_spacing.pdf
Binary file not shown.
Binary file modified test/text/multi_cell_font_leakage.pdf
Binary file not shown.
Binary file modified test/text/multi_cell_font_stretching.pdf
Binary file not shown.
Binary file modified test/text/multi_cell_j_paragraphs.pdf
Binary file not shown.
Binary file modified test/text/multi_cell_justified_with_unicode_font.pdf
Binary file not shown.
Binary file modified test/text/multi_cell_markdown_with_ttf_fonts.pdf
Binary file not shown.
Binary file modified test/text/text_positioning.pdf
Binary file not shown.
Binary file modified test/text/varfrags_fonts.pdf
Binary file not shown.
Binary file modified test/text/write_font_stretching.pdf
Binary file not shown.
Binary file added test/text_shaping/Dumbledor3Thin.ttf
Binary file not shown.
Binary file added test/text_shaping/FiraCode-Regular.ttf
Binary file not shown.
Binary file not shown.
Binary file added test/text_shaping/Mangal 400.ttf
Binary file not shown.
Binary file added test/text_shaping/SBL_Hbrw.ttf
Binary file not shown.
Binary file added test/text_shaping/ViaodaLibre-Regular.ttf
Binary file not shown.
Empty file added test/text_shaping/__init__.py
Empty file.
Binary file added test/text_shaping/arabic.pdf
Binary file not shown.
Binary file added test/text_shaping/features.pdf
Binary file not shown.
Binary file added test/text_shaping/hebrew_diacritics.pdf
Binary file not shown.
Binary file added test/text_shaping/kerning.pdf
Binary file not shown.
Binary file added test/text_shaping/ligatures.pdf
Binary file not shown.
Binary file not shown.
Binary file added test/text_shaping/shaping_hindi.pdf
Binary file not shown.
141 changes: 141 additions & 0 deletions test/text_shaping/test_text_shaping.py
@@ -0,0 +1,141 @@
from pathlib import Path

from fpdf import FPDF
from test.conftest import assert_pdf_equal

HERE = Path(__file__).resolve().parent
FONTS_DIR = HERE.parent / "fonts"


def test_indi_text(tmp_path):
    # issue #365
    pdf = FPDF()
    pdf.add_page()
    pdf.add_font(family="Mangal", fname=HERE / "Mangal 400.ttf")
    pdf.set_font("Mangal", size=40)
    pdf.set_text_shaping(False)
    pdf.cell(txt="इण्टरनेट पर हिन्दी के साधन", new_x="LEFT", new_y="NEXT")
    pdf.ln()
    pdf.set_text_shaping(True)
    pdf.cell(txt="इण्टरनेट पर हिन्दी के साधन", new_x="LEFT", new_y="NEXT")

    assert_pdf_equal(pdf, HERE / "shaping_hindi.pdf", tmp_path)


def test_text_replacement(tmp_path):
    pdf = FPDF()
    pdf.add_page()
    pdf.add_font(family="FiraCode", fname=HERE / "FiraCode-Regular.ttf")
    pdf.set_font("FiraCode", size=40)
    pdf.set_text_shaping(False)
    pdf.cell(txt="http://www 3 >= 2 != 1", new_x="LEFT", new_y="NEXT")
    pdf.ln()
    pdf.set_text_shaping(True)
    pdf.cell(txt="http://www 3 >= 2 != 1", new_x="LEFT", new_y="NEXT")

    assert_pdf_equal(pdf, HERE / "text_replacement.pdf", tmp_path)


def test_kerning(tmp_path):
    # issue #812
    pdf = FPDF()
    pdf.add_page()
    pdf.add_font(family="Dumbledor3Thin", fname=HERE / "Dumbledor3Thin.ttf")
    pdf.set_font("Dumbledor3Thin", size=40)
    pdf.set_text_shaping(False)
    pdf.cell(txt="Ты То Тф Та Тт Ти", new_x="LEFT", new_y="NEXT")
    pdf.ln()
    pdf.set_text_shaping(True)
    pdf.cell(txt="Ты То Тф Та Тт Ти", new_x="LEFT", new_y="NEXT")

    assert_pdf_equal(pdf, HERE / "kerning.pdf", tmp_path)


def test_hebrew_diacritics(tmp_path):
    # issue #549
    pdf = FPDF()
    pdf.add_page()
    pdf.add_font(family="SBL_Hbrw", fname=HERE / "SBL_Hbrw.ttf")
    pdf.set_font("SBL_Hbrw", size=40)
    pdf.set_text_shaping(False)
    pdf.cell(txt="בּ", new_x="LEFT", new_y="NEXT")
    pdf.ln()
    pdf.set_text_shaping(True)
    pdf.cell(txt="בּ", new_x="LEFT", new_y="NEXT")

    assert_pdf_equal(pdf, HERE / "hebrew_diacritics.pdf", tmp_path)


def test_ligatures(tmp_path):
    pdf = FPDF()
    pdf.add_page()
    pdf.add_font(family="ViaodaLibre", fname=HERE / "ViaodaLibre-Regular.ttf")
    pdf.set_font("ViaodaLibre", size=40)
    pdf.set_text_shaping(False)
    pdf.cell(txt="final soft stuff", new_x="LEFT", new_y="NEXT")
    pdf.ln()
    pdf.set_text_shaping(True)
    pdf.cell(txt="final soft stuff", new_x="LEFT", new_y="NEXT")

    assert_pdf_equal(pdf, HERE / "ligatures.pdf", tmp_path)


def test_arabic_right_to_left(tmp_path):
    # issue #549
    pdf = FPDF()
    pdf.add_page()
    pdf.add_font(
        family="KFGQPC", fname=HERE / "KFGQPC Uthmanic Script HAFS Regular.otf"
    )
    pdf.set_font("KFGQPC", size=36)
    pdf.set_text_shaping(False)
    pdf.cell(txt="مثال على اللغة العربية. محاذاة لليمين.", new_x="LEFT", new_y="NEXT")
    pdf.ln(36)
    pdf.set_text_shaping(True)
    pdf.cell(txt="مثال على اللغة العربية. محاذاة لليمين.", new_x="LEFT", new_y="NEXT")

    assert_pdf_equal(pdf, HERE / "arabic.pdf", tmp_path)


def test_multi_cell_markdown_with_shaping(tmp_path):
    pdf = FPDF()
    pdf.add_page()
    pdf.add_font("Roboto", "", FONTS_DIR / "Roboto-Regular.ttf")
    pdf.add_font("Roboto", "B", FONTS_DIR / "Roboto-Bold.ttf")
    pdf.add_font("Roboto", "I", FONTS_DIR / "Roboto-Italic.ttf")
    pdf.set_font("Roboto", size=32)
    pdf.set_text_shaping(True)
    text = (  # Some text where styling occurs over line breaks:
        # pylint: disable=implicit-str-concat
        "Lorem ipsum dolor, **consectetur adipiscing** elit,"
        " eiusmod __tempor incididunt__ ut labore et dolore --magna aliqua--."
    )
    pdf.multi_cell(
        w=pdf.epw, txt=text, markdown=True
    )  # This is tricky to get working well
    pdf.ln()
    pdf.multi_cell(w=pdf.epw, txt=text, markdown=True, align="L")
    assert_pdf_equal(pdf, HERE / "multi_cell_markdown_with_styling.pdf", tmp_path)


def test_features(tmp_path):
    pdf = FPDF()
    pdf.add_page()
    pdf.add_font(family="ViaodaLibre", fname=HERE / "ViaodaLibre-Regular.ttf")
    pdf.set_font("ViaodaLibre", size=40)
    # Shaping enabled, no explicit features requested
    pdf.set_text_shaping(use_shaping_engine=True)
    pdf.cell(txt="final soft stuff", new_x="LEFT", new_y="NEXT")
    pdf.ln()
    # Same text with the "liga" (ligatures) feature switched off
    pdf.set_text_shaping(use_shaping_engine=True, features={"liga": False})
    pdf.cell(txt="final soft stuff", new_x="LEFT", new_y="NEXT")
    pdf.ln()
    # Same text with the "kern" (kerning) feature switched off
    pdf.set_text_shaping(use_shaping_engine=True, features={"kern": False})
    pdf.cell(txt="final soft stuff", new_x="LEFT", new_y="NEXT")
    pdf.ln()
    # Explicit direction, script and language: Latin text forced right-to-left
    pdf.set_text_shaping(
        use_shaping_engine=True, direction="rtl", script="Latn", language="en-us"
    )
    pdf.cell(txt="final soft stuff", new_x="LEFT", new_y="NEXT")
    pdf.ln()

    assert_pdf_equal(pdf, HERE / "features.pdf", tmp_path)
Binary file added test/text_shaping/text_replacement.pdf
Binary file not shown.
