Text regions layout management (#897)

* first attempt at text regions
* dynamic line break width
* fix some existing tests
* TextCollectorMixin + Paragraph
* small fixes
* small fixes
* basic TextRegion working
* Regions with Paragraphs, Fragments with align instead of justify
* Columns docu and FPDF integration
* paragraph docs
* formatting
* Delete .TextRegion.md.swo
* Allow initial text argument for text regions
* column bottom balancing
* text regions with ln() and line_height; tuto4 in en+de
* remove instrumentation from tuto4
* html via text regions first round
* write_html via text regions all except tables
* review feedback & additional tests
* more text regions documentation
* formatting
* html table test files
* remove text_column()
* change html.py and tests to text_columns()
* Apply suggestions from code review
  Co-authored-by: Lucas Cimon <925560+Lucas-C@users.noreply.github.com>
* Review feedback and other fixes.
* tuto4 update
* Update tuto4.py

---------
Co-authored-by: Lucas Cimon <925560+Lucas-C@users.noreply.github.com>
py-pdf · Oct 10, 2023 · 26910a6
1 parent 2bda3de
Showing 105 changed files with 1,773 additions and 552 deletions.
diff --git a/.gitignore b/.gitignore
@@ -64,4 +64,5 @@ nosetests.xml
 
 # Vim backup and swap files
 *.*~
+*.swo
 *.swp
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -19,15 +19,20 @@ This can also be enabled programmatically with `warnings.simplefilter('default',
 ## [2.7.6] - Not released yet
 This release is the first performed from the [@py-pdf GitHub org](https://github.com/py-pdf), where `fpdf2` migrated.
 ### Added
+* The new experimental method `text_columns()` renders text within one or several columns, with optional height balancing.
 * [`FPDF.write_html()`](https://py-pdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.write_html) now supports heading colors defined as attributes (_e.g._ `<h2 color="#00ff00">...`)
 * [`FPDF.table()`](https://py-pdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.table): Now supports padding in cells
 * [`FPDF.table()`](https://py-pdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.table): Now supports vertical alignment in cells 
 * [`FPDF.table()`](https://py-pdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.table): Now supports outer border width for rendering the outer border of the table with a different line-width.
 * documentation on how to use `livereload` to enable a "watch" mode with PDF generation: [Combine with livereload](https://py-pdf.github.io/fpdf2/CombineWithLivereload.html)
 ### Changed
+* The formatting output by `write_html()` has changed in some aspects. Vertical spacing around headings and paragraphs may be slightly different, and elements at the top of the page no longer have any extra spacing above them.
 * [`FPDF.table()`](https://py-pdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.table): If the height of a row is governed by an image, then the default vertical alignment of the other cells is "center". This was "top". 
 This change was made for consistency between row-height governed by text or images. The old behaviour can be enforced using the new vertical alignment parameter.  
 ### Fixed
+* In `multi_cell()` and table cells with horizontal padding, the text was not given quite enough space.
+* `write_html()` can now handle formatting tags within paragraphs without adding extra line breaks (except in table cells for now).
+* The font size in HTML `<pre>` and `<code>` tags is no longer fixed at 11 pt, but adapts to the preceding text.
 * [`FPDF.ln()`](https://py-pdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.ln), when called before any text has been written, will now use the current font height instead of doing nothing -  _cf._ issue [#937](https://github.com/py-pdf/fpdf2/issues/937)
 * [`FPDF.image()`](https://py-pdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.image), when provided a `BytesIO` instance, does not close it anymore - _cf._ issue [#881](https://github.com/py-pdf/fpdf2/issues/881)
 * Invalid characters were being generated when a string contains parentheses - _cf._ issue [#884](https://github.com/py-pdf/fpdf2/issues/884)

diff --git a/docs/HTML.md b/docs/HTML.md
@@ -95,10 +95,9 @@ pdf.output("html.pdf")
 
 ## Known limitations
 
-`fpdf2` HTML renderer does not support many configuration of nested tags.
+`fpdf2` HTML renderer does not support some configurations of nested tags.
 For example:
 
-* `<center>` cannot be used as a parent for several elements - _cf._ [issue #640](https://github.com/py-pdf/fpdf2/issues/640)
 * `<table>` cells can contain `<td><b><em>nested tags forming a single text block</em></b></td>`, but **not** `<td><b>arbitrarily</b> nested <em>tags</em></td>` - _cf._ [issue #845](https://github.com/py-pdf/fpdf2/issues/845)
 
 You can also check the currently open GitHub issues with the tag `html`:

diff --git a/docs/Text.md b/docs/Text.md
@@ -2,6 +2,8 @@
 
 There are several ways in fpdf to add text to a PDF document, each of which comes with its own special features and its own set of advantages and disadvantages. You will need to pick the right one for your specific task.
 
+## Simple Text Methods
+
 | method | lines | markdown support | HTML support | accepts new current position | details                                                                                                                                                         |
 | -- | :--: | :--: | :--: | :--: |-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | [`.text()`](#text)  | one | no | no | fixed | Inserts a single-line text string with a precise location on the base line of the font.                                                                         |
@@ -10,6 +12,11 @@ There are several ways in fpdf to add text to a PDF document, each of which come
 | [`.write()`](#write) | several | no | no | auto | Inserts a multi-line text string within the boundaries of the page margins, starting at the current x/y location (typically the end of the last inserted text). |
 | [`.write_html()`](#write_html) | several | no | yes | auto | An extension to `.write()`, with additional parsing of basic HTML tags.                                                                                         
 
+## Flowable Text Regions
+
+Text regions let you insert flowing text into a predefined region on the page. The formatting, and even the font, can be changed within a paragraph, which will still be aligned as one text block.
+The only type of text region implemented so far is [text_columns()](TextColumns.html), which defines one or several columns that can be filled sequentially or height-balanced.
+
 ## Typography and Language Specific Concepts 
 ### Supported Features
 With supporting Unicode fonts, fpdf2 should handle the following text shaping features correctly. More details can be found in [TextShaping](TextShaping.html).

diff --git a/docs/TextColumns.md b/docs/TextColumns.md
@@ -0,0 +1,96 @@
+_New in [:octicons-tag-24: 2.7.6](https://github.com/py-pdf/fpdf2/blob/master/CHANGELOG.md)_
+
+**Notice:** As of fpdf2 release 2.7.6, this is an experimental feature. Both the API and the functionality may change before it is finalized, without prior notice.
+
+
+## Text Columns ##
+
+The `FPDF.text_columns()` method creates columnar layouts with one or several columns. Columns are always of equal width.
+
+Beyond the parameters common to all text regions, the following are available for text columns:
+
+* l_margin (float, optional) - override the current left page margin.
+* r_margin (float, optional) - override the current right page margin.
+* ncols (int, optional) - the number of columns to generate (Default: 1).
+* gutter (float, optional) - the horizontal space between adjacent columns (Default: 10).
+
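The relationship between these parameters can be illustrated with a little arithmetic. The following is a minimal sketch in plain Python, not fpdf2's own code: the helper `column_layout()` and the page dimensions are hypothetical, and it only assumes what the text above states, namely that all columns have equal width and are separated by `gutter`.

```python
def column_layout(page_width, l_margin, r_margin, ncols, gutter):
    """Return the (left, right) x-extents of ncols equal-width columns.

    Hypothetical helper for illustration; not part of the fpdf2 API.
    """
    if ncols < 1:
        raise ValueError("ncols must be at least 1")
    usable = page_width - l_margin - r_margin
    # ncols columns are separated by (ncols - 1) gutters
    col_width = (usable - (ncols - 1) * gutter) / ncols
    if col_width <= 0:
        raise ValueError("margins and gutters leave no room for text")
    columns = []
    for i in range(ncols):
        left = l_margin + i * (col_width + gutter)
        columns.append((left, left + col_width))
    return columns


# Assumed example: a 210 mm wide page, 10 mm margins, three columns, 5 mm gutter
cols = column_layout(page_width=210, l_margin=10, r_margin=10, ncols=3, gutter=5)
print(cols)  # [(10.0, 70.0), (75.0, 135.0), (140.0, 200.0)]
```

Widening the gutter or the margins shrinks every column by the same amount, which is why the number of columns and the gutter should be chosen together.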
+
+#### Single-Column Example ####
+
+In this example, an inserted paragraph formats its content with justified alignment, while the rest of the text uses the default left alignment.
+
+```python
+from fpdf import FPDF
+
+# LOREM_IPSUM is assumed to be a long placeholder string defined by the caller.
+
+pdf = FPDF()
+pdf.add_page()
+pdf.set_font("Times", size=12)
+
+cols = pdf.text_columns()
+with cols:
+    cols.write(text=LOREM_IPSUM[:400])
+    with cols.paragraph(
+            text_align="J",
+            top_margin=pdf.font_size,
+            bottom_margin=pdf.font_size
+            ) as par:
+        par.write(text=LOREM_IPSUM[:400])
+    cols.write(text=LOREM_IPSUM[:400])
+```
+![Single Text Column](tcols-single.png)
+
+#### Multi-Column Example
+
+Here we have a layout with three columns. Note that font type and text size can be varied within a text region, while still maintaining the justified (in this case) horizontal alignment.
+
+```python
+from fpdf import FPDF
+
+pdf = FPDF()
+pdf.add_page()
+pdf.set_font("Helvetica", size=16)
+
+with pdf.text_columns(text_align="J", ncols=3, gutter=5) as cols:
+    cols.write(text=LOREM_IPSUM[:600])
+    pdf.set_font("Times", "", 18)
+    cols.write(text=LOREM_IPSUM[:500])
+    pdf.set_font("Courier", "", 20)
+    cols.write(text=LOREM_IPSUM[:500])
+```
+![Three Text Columns](tcols-three.png)
+
+#### Balanced Columns
+
+Normally the columns will be filled left to right, and if the text ends before the page is full, the rightmost column will be shorter than the others.
+If you prefer all columns on a page to end at the same height, use the `balance=True` argument. A simple algorithm will then attempt to approximately balance their bottoms.
+
+```python
+from fpdf import FPDF
+
+pdf = FPDF()
+pdf.add_page()
+pdf.set_font("Times", size=12)
+
+cols = pdf.text_columns(text_align="J", ncols=3, gutter=5, balance=True)
+# fill columns with balanced text
+with cols:
+    pdf.set_font("Times", "", 14)
+    cols.write(text=LOREM_IPSUM[:300])
+# add an image below
+img_info = pdf.image(".../fpdf2/docs/regular_polygon.png",
+        x=pdf.l_margin, w=pdf.epw)
+# continue multi-column text
+with cols:
+    cols.write(text=LOREM_IPSUM[300:600])
+```
+![Balanced Columns](tcols-balanced.png)
+
+Note that column balancing only works reliably when the font size (specifically the line height) doesn't change. If parts of the text use a larger or smaller font than the rest, the balancing will usually be noticeably off. Contributions for a more refined balancing algorithm are welcome.
+
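To see why a constant line height matters, consider a deliberately naive balancing strategy that simply gives every column the same number of lines. This is an illustrative sketch only and makes no claim about the algorithm fpdf2 actually uses; `balance_lines()` is a hypothetical helper.

```python
import math


def balance_lines(line_heights, ncols):
    """Split lines into consecutive groups of (nearly) equal line count.

    Naive illustration only; not the balancing algorithm used by fpdf2.
    """
    per_col = math.ceil(len(line_heights) / ncols)
    return [
        line_heights[i:i + per_col]
        for i in range(0, len(line_heights), per_col)
    ]


# Nine lines of identical height: every column ends at the same bottom.
uniform_bottoms = [sum(col) for col in balance_lines([12] * 9, ncols=3)]

# Six small lines followed by three large ones: equal counts, unequal bottoms.
mixed_bottoms = [sum(col) for col in balance_lines([12] * 6 + [24] * 3, ncols=3)]

print(uniform_bottoms)  # [36, 36, 36]
print(mixed_bottoms)    # [36, 36, 72]
```

Any strategy that reasons in terms of line counts shows this kind of drift once line heights differ, which is the limitation described above.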
+
+### Possible future extensions
+
+These features are not currently supported, but pull requests implementing them are welcome:
+
+* Columns with differing widths (no balancing possible in this case).
+
diff --git a/docs/TextRegion.md b/docs/TextRegion.md
@@ -0,0 +1,95 @@
+_New in [:octicons-tag-24: 2.7.6](https://github.com/py-pdf/fpdf2/blob/master/CHANGELOG.md)_
+# Text Flow Regions #
+
+**Notice:** As of fpdf2 release 2.7.6, this is an experimental feature. Both the API and the functionality may change before it is finalized, without prior notice.
+
+Text regions are a hierarchy of classes that make it possible to flow text within a given outline. In the simplest case, the outline is just the running text column of a page. But it can also be a sequence of outlines, such as several parallel columns or the cells of a table. Other outlines may be combined by addition or subtraction to create more complex shapes.
+
+There are two general categories of regions. The first defines boundaries for running text that simply continues in the same manner on the next page. This category includes columns and tables. The second category consists of distinct shapes, such as a circle, a rectangle, an individually shaped polygon, or even an image. Shapes may be used individually, in combination, or to modify the outline of a multipage column. Shape regions will typically not cause a page break when they are full. In the future, it may become possible to chain them, so that a new shape continues with the text that didn't fit into the previous one.
+
+The currently implemented text regions are:
+* [Text Columns](TextColumns.html)
+
+Other types like Table cells, shaped regions and combinations are still in the design phase, see [Quo vadis, .write()?](https://github.com/py-pdf/fpdf2/discussions/339).
+
+
+## General Operation ##
+
+Using the different region types and their combinations always follows the same pattern. The main difference from the normal `FPDF.write()` method is that all added text is first buffered, and only gets rendered on the page when the context of the region is closed. This is necessary so that text can be aligned within the given boundaries even if its font, style, or size are arbitrarily varied along the way.
+
+* Create the region instance with an `FPDF` method, for example [text_columns()](https://py-pdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.text_columns).
+<!--
+* future: (_If desired, add or subtract other shapes from it (with geometric regions)_).
+-->
+* Use the `.write()` method of this text region in order to feed text into its buffer.
+* Best practice is to use the region instance as a context manager for filling.
+    * Text will be rendered automatically after closing the context.
+    * When used as a context manager, you can change all text styling parameters within that context, and they will be used by the added text, but won't leak to the surroundings.
+* Alternatively, e.g. for filling a single column of text with the already existing settings, just use the region instance as is. In that case, you'll have to explicitly call the `render()` method after adding the text.
+* Within a region, paragraphs can be inserted. The primary purpose of a paragraph is to apply a different horizontal alignment than the surrounding text. It is also possible to apply margins to the top and bottom of each paragraph.
+
+![](tcols-paragraphs.png)
+
+The graphic shows the relationship of page, text areas and paragraphs (with varying alignment) for the example of a two-column layout.
+
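The buffer-then-render pattern described above can be sketched with a toy class. `ToyRegion` is a hypothetical stand-in written for illustration; it is not fpdf2's implementation and only mimics the behavior stated in the list: `write()` collects text, and rendering happens either when the context closes or on an explicit `render()` call.

```python
class ToyRegion:
    """Hypothetical stand-in for a text region: buffers text, renders later."""

    def __init__(self):
        self._buffer = []
        self.rendered = []  # stands in for "what ended up on the page"

    def write(self, text):
        # Nothing reaches the page yet; the text is only collected.
        self._buffer.append(text)

    def render(self):
        # Process everything collected so far as one block.
        self.rendered.append("".join(self._buffer))
        self._buffer.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.render()  # closing the context triggers rendering
        return False


region = ToyRegion()
with region as r:
    r.write("Hello, ")
    r.write("world")
    pending = list(region.rendered)  # still empty inside the context

# Used without a context manager: render() must be called explicitly.
region.write("second block")
region.render()

print(pending)          # []
print(region.rendered)  # ['Hello, world', 'second block']
```

Because the whole block is known before anything is placed, alignment can take later font or size changes into account, which is the motivation given above for buffering.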
+
+### Text Start Position ###
+
+When rendering, the vertical start position of the text will be at the lowest one out of:
+* the current y position
+* the top of the region (if it has a defined top)
+* the top margin of the page.
+
+The horizontal start position will be either at the current x position, if that lies within the boundaries of the region/column, or at the left edge of the region.
+In both horizontal and vertical positioning, regions with multiple columns may follow additional rules and restrictions.
+
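The start position rules can be expressed as a small function. This is a sketch under stated assumptions rather than fpdf2 code: `start_position()` and its parameters are hypothetical, y coordinates are taken to grow downward from the top of the page (so the "lowest" position is the largest y value), and the region boundaries are treated as inclusive.

```python
def start_position(cur_x, cur_y, region_left, region_right,
                   page_top_margin, region_top=None):
    """Return the (x, y) where rendering would start.

    Hypothetical helper; assumes y grows downward and inclusive boundaries.
    """
    candidates = [cur_y, page_top_margin]
    if region_top is not None:  # only some regions have a defined top
        candidates.append(region_top)
    y = max(candidates)  # lowest position on the page = largest y
    x = cur_x if region_left <= cur_x <= region_right else region_left
    return x, y


# Cursor inside the region and below the top margin: it is used as is.
print(start_position(50, 30, region_left=10, region_right=100,
                     page_top_margin=10))               # (50, 30)

# Cursor left of the region and above its top: both values are clamped.
print(start_position(5, 8, region_left=10, region_right=100,
                     page_top_margin=10, region_top=20))  # (10, 20)
```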
+
+### Interaction between Regions ###
+
+Several region instances can exist at the same time, but only one of them can act as a context manager at any given time. It is not currently possible to activate them recursively (nested), but it is possible to use them intermittently. This will probably most often make sense between a columnar region and a table or a graphic. You may have some running text ending at a given height, then insert a table/graphic, and finally continue the running text at the new height below the table within the existing column(s).
+
+
+### Common parameters ###
+
+All types of text regions have the following constructor parameters in common:
+
+* text (str, optional) - text content to add to the region. This is a convenience parameter for cases when all text is available in one piece, and no partition into paragraphs (possibly with different parameters) is required. (Default: None)
+* text_align (Align/str, optional) - the horizontal alignment of the text in the region. (Default: Align.L)
+* line_height (float, optional) - a factor by which the line spacing differs from the font height. It works similarly to the property of the same name in HTML/CSS. (Default: 1.0)
+* print_sh (bool, optional) - treat a soft-hyphen (\\u00ad) as a printable character, instead of a line breaking opportunity. (Default: False)
+* skip_leading_spaces (bool, optional) - removes all space characters at the beginning of each line. This flag is primarily used by `write_html()`, but may also have other uses. (Default: False)
+* wrapmode (WrapMode, optional) - whether lines may be broken only between words ("WORD") or between any two characters ("CHAR"). (Default: "WORD")
+
+All of those values can be overridden for each individual paragraph.
+
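The override behavior amounts to a simple fallback: a paragraph uses its own value where one is given and the region's value otherwise. The sketch below models that with a dataclass. `RegionSettings` and `paragraph_settings()` are hypothetical names chosen for illustration, and the settings are reduced to plain strings and numbers instead of fpdf2's enums.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class RegionSettings:
    """Hypothetical container for the settings shared by a region."""
    text_align: str = "L"
    line_height: float = 1.0
    skip_leading_spaces: bool = False
    wrapmode: str = "WORD"


def paragraph_settings(region, **overrides):
    """Return the effective settings of a paragraph inside `region`.

    An override of None means "not set", so the region's value is kept.
    """
    valid = {f.name for f in fields(region)}
    unknown = set(overrides) - valid
    if unknown:
        raise TypeError(f"unknown settings: {sorted(unknown)}")
    explicit = {k: v for k, v in overrides.items() if v is not None}
    return replace(region, **explicit)


region = RegionSettings(text_align="J", line_height=1.2)
par = paragraph_settings(region, text_align="R", line_height=None)
print(par.text_align, par.line_height)  # R 1.2
```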
+
+### Common methods ###
+
+* `.paragraph()` [see the paragraph parameters below] - establish a new paragraph in the text. The text added to this paragraph will start on a new line.
+* `.write(text: str, link=None)` - write text to the region. This is only permitted when no explicit paragraph is currently active.
+* `.ln(h: float = None)` - start a new line, moving either by the current font height or by the parameter "h". Only permitted when no explicit paragraph is currently active.
+* `.render()` - if the region is not used as a context manager with "with", this method must be called to actually process the added text.
+
+
+## Paragraphs ##
+
+The primary purpose of paragraphs is to enable variations in horizontal text alignment, while the horizontal extents of the text are managed by the text region. To set the alignment, use the `text_align` argument when creating the paragraph. Valid values are defined in the [`Align enum`](https://py-pdf.github.io/fpdf2/fpdf/enums.html#fpdf.enums.Align).
+
+For more typographical control, you can use the following arguments. Most of these override the settings of the current region when set, and otherwise default to the value set there.
+
+* text_align (Align, optional) - The horizontal alignment of the paragraph.
+* line_height (float, optional) - factor by which the line spacing differs from the font height. (Default: by region)
+* top_margin (float, optional) - how much spacing is added above the paragraph. No spacing will be added at the top of the paragraph if the current y position is at (or above) the top margin of the page. (Default: 0.0)
+* bottom_margin (float, optional) - how much spacing is added below the paragraph. No spacing will be added at the bottom if it would result in overstepping the bottom margin of the page. (Default: 0.0)
+* skip_leading_spaces (bool, optional) - removes all space characters at the beginning of each line.
+* wrapmode (WrapMode, optional) - whether lines may be broken only between words or between any two characters.
+
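The two margin rules can be sketched as follows. This is a simplified model, not fpdf2's implementation: `paragraph_extent()` is a hypothetical helper, y is assumed to grow downward, and the paragraph is assumed to fit on the current page so that page breaks can be ignored.

```python
def paragraph_extent(y, text_height, top_margin, bottom_margin,
                     page_top, page_bottom):
    """Return (start_y, end_y) of a paragraph including its margins.

    Hypothetical helper; ignores page breaks and assumes y grows downward.
    """
    # No top spacing when the paragraph starts at (or above) the top margin.
    start = y if y <= page_top else y + top_margin
    end = start + text_height
    # Bottom spacing is dropped if it would overstep the bottom margin.
    if end + bottom_margin <= page_bottom:
        end += bottom_margin
    return start, end


# At the top of the page: no top spacing, bottom spacing applied.
print(paragraph_extent(10, 50, 5, 5, page_top=10, page_bottom=280))   # (10, 65)
# Mid-page: both spacings applied.
print(paragraph_extent(100, 50, 5, 5, page_top=10, page_bottom=280))  # (105, 160)
# Near the bottom: bottom spacing would overstep the margin and is skipped.
print(paragraph_extent(220, 52, 5, 5, page_top=10, page_bottom=280))  # (225, 277)
```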
+Unlike text regions, paragraphs should always be used as context managers and never be reused. Violating those rules may result in the entered text turning up on the page out of sequence.
+
+
+### Possible future extensions
+
+These features are not currently supported, but pull requests implementing them are welcome:
+
+* per-paragraph indentation
+* first-line indentation
diff --git a/docs/Tutorial-de.md b/docs/Tutorial-de.md
@@ -134,13 +134,9 @@ Alternativ kann man auch mit der rechten Maustaste auf das Dokument klicken und
 
 [Jules Verne Text](https://github.com/py-pdf/fpdf2/raw/master/tutorial/20k_c1.txt)
 
-Der Hauptunterschied zur vorherigen Lektion ist die Verwendung der Methoden 
-[`accept_page_break`](fpdf/fpdf.html#fpdf.fpdf.FPDF.accept_page_break) und `set_col`.
+Der Hauptunterschied zur vorherigen Lektion ist die Verwendung der Methode 
+[`text_columns`](fpdf/fpdf.html#fpdf.fpdf.FPDF.text_columns). Diese sammelt zunächst allen Text, auch in mehreren Teilen, und verteilt ihn anschließend auf die angegebene Anzahl an Spalten. Eventuell notwendige Seitenumbrüche werden dabei automatisch vorgenommen. Beachtenswert dabei ist, dass Schriftstile und andere Eigenschaften frei verändert werden können, während die `TextColumns`-Instanz als Kontextmanager offen ist. Nach dem Schließen des Kontextes werden die Einstellungen wieder auf den vorherigen Stand zurückgesetzt.
 
-Wird [`accept_page_break`](fpdf/fpdf.html#fpdf.fpdf.FPDF.accept_page_break) verwendet, wird die aktuelle Spaltennummer überprüft, sobald 
-die Zelle den zur Auslösung eines Seitenumbruchs festgelegten Abstand zum unteren Seitenrand (Standard 2cm) überschreitet. Ist die Spaltennummer kleiner als 2 (wir haben uns entschieden, die Seite in drei Spalten zu unterteilen), wird die Methode `set_col` aufgerufen. Sie erhöht die Spaltennummer auf die nächsthöhere und setzt die Schreibposition auf den Anfang der nächsten Spalte, damit der Text dort fortgesetzt werden kann.
-
-Sobald det Text der dritten den oben beschriebenen Abstand zum Seitenende erreicht, wird durch die Methode [`accept_page_break`](fpdf/fpdf.html#fpdf.fpdf.FPDF.accept_page_break) ein Seitenumbruch ausgelöst und die aktive Spalte sowie Schreibposition zurückgesetzt.
 
 ## Lektion 5 - Tabellen erstellen ##
 

diff --git a/docs/Tutorial.md b/docs/Tutorial.md
@@ -156,16 +156,9 @@ plug-in, is to right-click and select Document Properties.
 [Jules Verne text](https://github.com/py-pdf/fpdf2/raw/master/tutorial/20k_c1.txt)
 
 The key difference from the previous tutorial is the use of the 
-[accept_page_break](fpdf/fpdf.html#fpdf.fpdf.FPDF.accept_page_break) and the set_col methods.
+[`text_columns`](fpdf/fpdf.html#fpdf.fpdf.FPDF.text_columns) method. 
+It collects all the text, possibly in increments, and distributes it across the requested number of columns, automatically inserting page breaks as necessary. Note that while the `TextColumns` instance is active as a context manager, text styles and other font properties can be changed. Those changes are confined to the context: once it is closed, the previous settings are reinstated.
 
-Using the [accept_page_break](fpdf/fpdf.html#fpdf.fpdf.FPDF.accept_page_break) method, once 
-the cell crosses the bottom limit of the page, it will check the current column number. If it 
-is less than 2 (we chose to divide the page in three columns) it will call the set_col method, 
-increasing the column number and altering the position of the next column so the text may continue there.
-
-Once the bottom limit of the third column is reached, the 
-[accept_page_break](fpdf/fpdf.html#fpdf.fpdf.FPDF.accept_page_break) method will reset and go 
-back to the first column and trigger a page break.
 
 ## Tuto 5 - Creating Tables ##
 

diff --git a/docs/tcols-balanced.png b/docs/tcols-balanced.png
diff --git a/docs/tcols-paragraphs.png b/docs/tcols-paragraphs.png
diff --git a/docs/tcols-single.png b/docs/tcols-single.png
diff --git a/docs/tcols-three.png b/docs/tcols-three.png