Text regions layout management #897

gmischler · 2023-08-16T23:22:38Z

This is the basic start of advanced text layout functionality for fpdf2, as outlined in #339.

Checklist:

The GitHub pipeline is OK (green),
meaning that both pylint (static code analyzer) and black (code formatter) are happy with the changes of this PR.
A unit test is covering the code added / modified by this PR
This PR is ready to be merged
In case of a new feature, docstrings have been added, with also some documentation in the docs/ folder
A mention of the change is present in CHANGELOG.md

Some ripples caused in the rest of the code base:

In order to allow for more dynamic text treatment, Fragments now include the full "align" information instead of just "justify". This means that _render_styled_text_line() has lost its "align" parameter.
_preload_font_styles() will need to accept a "link" argument (it probably needs a different name too, since it does a lot more than loading fonts).
MultiLineBreak() now has a "width" parameter, which can either be a fixed width or a callback function, so that the caller can dynamically specify the text width depending on (among other things) the current line height. Consequently, get_line_of_current_width() has morphed into simply get_line().
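The sketch below illustrates the "fixed width or callback" idea in plain Python. The names `WidthSpec`, `resolve_width()` and `width_beside_obstacle()` are hypothetical and are not part of fpdf2's `MultiLineBreak` API. The point is why a callback needs the line height: whether a line collides with an obstacle, such as an image, depends on how tall that line is.

```python
from typing import Callable, Union

# Hypothetical alias: either a fixed width, or a callback that receives
# the current line height and returns the usable width for that line.
WidthSpec = Union[float, Callable[[float], float]]


def resolve_width(width: WidthSpec, line_height: float) -> float:
    """Return the usable width for a line of the given height."""
    return width(line_height) if callable(width) else float(width)


def width_beside_obstacle(top, full_width, obstacle_top, obstacle_bottom, obstacle_width):
    """Build a callback for a line starting at `top` next to a rectangular obstacle."""

    def width_for(line_height: float) -> float:
        bottom = top + line_height
        # the line collides if its vertical extent intersects the obstacle's
        overlaps = top < obstacle_bottom and bottom > obstacle_top
        return full_width - obstacle_width if overlaps else full_width

    return width_for
```

For example, with `cb = width_beside_obstacle(8.0, 100.0, 10.0, 40.0, 30.0)`, a 5-unit line starting at y=8 reaches into the obstacle and gets the reduced width, while a 1.5-unit line at the same position keeps the full width.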

General remarks:

I haven't yet tested it in combination with text shaping, but I see no reason for a conflict between the two. In fact, the combination will allow multilingual text in one paragraph without the need to split it into parts algorithmically.
I think this is an adequately clean and workable proof of concept now, but much work is left to be done.

Further tasks:

~~Update write_html() to use text regions, in order to allow for smoother formatting.~~ Done, except for tables.
Turn table cells into text regions with the same goal.
~~Use the "text" parameter of table cells for all text regions.~~ Done.
Create "shaped" text regions of various types.
Lots more...

By submitting this pull request, I confirm that my contribution is made under the terms of the GNU LGPL 3.0 license.

codecov-commenter · 2023-08-17T00:15:36Z

Codecov Report

Attention: 26 lines in your changes are missing coverage. Please review.

Comparison is base (30eb1a4) 93.42% compared to head (a343921) 93.56%.
Report is 3 commits behind head on master.

❗ Current head a343921 differs from pull request most recent head a0cbd91. Consider uploading reports for the commit a0cbd91 to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #897      +/-   ##
==========================================
+ Coverage   93.42%   93.56%   +0.14%     
==========================================
  Files          27       28       +1     
  Lines        7851     8209     +358     
  Branches     1433     1500      +67     
==========================================
+ Hits         7335     7681     +346     
- Misses        322      326       +4     
- Partials      194      202       +8

| File | Coverage Δ |
|---|---|
| fpdf/__init__.py | `100.00% <100.00%> (ø)` |
| fpdf/graphics_state.py | `98.78% <100.00%> (+0.01%)` ⬆️ |
| fpdf/line_break.py | `99.12% <100.00%> (+0.09%)` ⬆️ |
| fpdf/fpdf.py | `92.57% <96.66%> (+0.08%)` ⬆️ |
| fpdf/html.py | `95.90% <97.22%> (+2.26%)` ⬆️ |
| fpdf/text_region.py | `92.56% <92.56%> (ø)` |

Lucas-C · 2023-08-18T08:10:12Z

This seems very promising! 😊
I'll try to review this on Sunday or Monday.

docs/TextRegion.md

fpdf/text_region.py

Lucas-C · 2023-08-21T06:50:05Z

Turn table cells into text regions with the same goal.

I think that for text regions to become a successful addition to fpdf2,
they should be "plugged" in as many possible places (when it's meaningful).
And having tables being based on text regions would make a lot of sense to me.

Hence, maybe "plugging" text regions in tables would be a good way to check that the design you crafted for text regions is sound, and integrates well with the existing lib?

Lucas-C · 2023-08-21T06:51:58Z

I think you did excellent work overall designing this feature
and craft an elegant system to layout text regions: good job really! 👍

gmischler · 2023-08-21T18:28:56Z

Hence, maybe "plugging" text regions in tables would be a good way to check that the design you crafted for text regions is sound, and integrates well with the existing lib?

Only the table cells will need to be converted to text regions, the rest of the table classes can largely remain the same.

In that context: all the legacy text generation methods use the parameter spelling "txt", while only table cells currently use "text".
Is that a deliberate distinction?
I'm currently using the latter as well, but I have strong doubts whether that's really a good idea.

Lucas-C · 2023-08-22T05:21:51Z

In that context: all the legacy text generation methods use the parameter spelling "txt", while only table cells currently use "text".
Is that a deliberate distinction?
I'm currently using the latter as well, but I have strong doubts whether that's really a good idea.

Yes, this was reported recently in #853
It was really "deliberate".
Maybe we should have both forms allowed in all methods?

gmischler · 2023-08-22T18:44:02Z

It was really "deliberate".

Why?!?
In any case, we need to define a canonical form and declare the other deprecated.

Maybe we should have both forms allowed in all methods?

I'm not very good with decorators. Can you come up with one that does the conversion if necessary, and prevents the use of both together? Then we don't have to clutter the functional code with this weirdness.

andersonhc · 2023-08-23T00:45:39Z

I'm not very good with decorators. Can you come up with one that does the conversion if necessary, and prevents the use of both together? Then we don't have to clutter the functional code with this weirdness.

I guess it would be something like this:

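```python
import functools
import logging

LOGGER = logging.getLogger(__name__)


def deprecated_txt(func):
    """Accept the deprecated 'txt' keyword argument as an alias for 'text'."""
    @functools.wraps(func)
    def inner(*args, **kwargs):
        if "txt" in kwargs:
            # refuse ambiguous calls that pass both spellings
            if "text" in kwargs:
                raise ValueError("'txt' and 'text' cannot be used together")
            LOGGER.warning("The argument 'txt' has been deprecated. Please use 'text'.")
            kwargs["text"] = kwargs.pop("txt")
        return func(*args, **kwargs)
    return inner
```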

Lucas-C · 2023-08-25T06:59:10Z

It was really "deliberate".
Why?!?
In any case, we need to define a canonical form and declare the other deprecated.

Sorry, I meant NOT deliberate 😅
It was basically a mistake.
But we already have a discrepancy with FPDF.write_html()

I'm not very good with decorators. Can you come up with one that does the conversion if necessary, and prevents the use of both together? Then we don't have to clutter the functional code with this weirdness.

I opened #903 to do that, could you please review it?

gmischler · 2023-10-01T14:45:59Z

Progress report:

I refactored html.py to use text_column().
It took a bit of trickery to use it without any `with` statements, but it works.
HTML2PDF now runs within its own local context, so no local state changes are leaked outside anymore. It also took some trickery to temporarily escape that local context for actually rendering text. write_html() used to add quite a few redundant font and color changes to the PDF files, which is now prevented. The whitespace handling is now also simpler, faster, and more reliable.
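Because an HTML parser reports opening and closing tags through separate callbacks, a single `with` block cannot span them. One common way around this is `contextlib.ExitStack`, entered in the start-tag handler and closed in the matching end-tag handler. This is a sketch of that general pattern under stated assumptions, not the PR's actual code: `StyleState` and its `local_context()` are hypothetical stand-ins loosely modeled on fpdf2's `FPDF.local_context()`.

```python
from contextlib import ExitStack, contextmanager
from html.parser import HTMLParser


class StyleState:
    """Hypothetical stand-in for a document's graphics state (not fpdf2's API)."""

    def __init__(self):
        self.bold = False

    @contextmanager
    def local_context(self, **changes):
        # save the current values, apply the changes, restore them on exit
        saved = {name: getattr(self, name) for name in changes}
        for name, value in changes.items():
            setattr(self, name, value)
        try:
            yield self
        finally:
            for name, value in saved.items():
                setattr(self, name, value)


class BoldTracker(HTMLParser):
    """Enters a local context on <b> and leaves it on </b>, without a `with` block."""

    def __init__(self, state):
        super().__init__()
        self.state = state
        self.stacks = []  # one ExitStack per currently open <b> tag
        self.runs = []  # (text, bold) pairs collected while parsing

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            stack = ExitStack()
            stack.enter_context(self.state.local_context(bold=True))
            self.stacks.append(stack)

    def handle_endtag(self, tag):
        if tag == "b" and self.stacks:
            self.stacks.pop().close()  # restores the state saved on entry

    def handle_data(self, data):
        self.runs.append((data, self.state.bold))
```

Feeding `"plain <b>bold</b> plain"` yields the runs `("plain ", False)`, `("bold", True)`, `(" plain", False)`, and the state is back to non-bold afterwards, so nothing leaks out of the local context.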

Among other things, this change allows formatting changes within paragraphs without any extra line breaks, fixing #151, #640, and #930 (and improving on #91). The other visible change is that the spacing between paragraphs, as well as above and below headings, is slightly different. I had worried about this for a while, but got it surprisingly close in the end. I actually think it is more consistent in relation to font size now. The extra empty line at the top of many pages has also gone.

When consolidating with the recent padding changes, I stumbled over some inconsistencies.
While the docstring says that c_margin is ignored when there is horizontal padding, that wasn't actually the case, and it was always applied. I fixed that, and also changed it to decide about the use of left and right c_margin independently of each other.
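The rule described above, where each side drops `c_margin` only when it has its own horizontal padding, fits in a small helper. `horizontal_insets()` is a hypothetical name, and this is a sketch of the described behavior rather than the code in the PR.

```python
def horizontal_insets(c_margin: float, padding_left: float = 0.0, padding_right: float = 0.0):
    """Return (left, right) text insets for a cell.

    Each side is decided independently: it uses its padding when that
    padding is non-zero, and falls back to c_margin otherwise.
    """
    left = padding_left if padding_left > 0 else c_margin
    right = padding_right if padding_right > 0 else c_margin
    return left, right
```

With `c_margin=1.0` and only a left padding of 2.0, the result is `(2.0, 1.0)`: the left side uses its padding, while the right side still gets `c_margin`.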

Among the more general changes needed to enable this, text color and the total line width are now always taken from the Fragments. The different handling of text color results in a (semantically inconsequential) different sequence of commands in the PDF, making it necessary to replace many test files. MultiLineBreak now also receives a set of clearance margins, so it can apply them without affecting the maximum_width returned with the TextLine.
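Passing clearance margins into the line breaker, so that they shrink the space available for text while the reported maximum width stays the same, can be illustrated with a simplified greedy word wrap. `Line` and `break_words()` are hypothetical, widths are approximated as a fixed width per character (spaces included), and this is not how `MultiLineBreak` actually measures text.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Line:
    """Hypothetical stand-in for a broken line (not fpdf2's TextLine)."""

    words: List[str]
    max_width: float  # the full width, reported unchanged


def break_words(words: Sequence[str], char_width: float, max_width: float,
                margins: Sequence[float] = ()) -> List[Line]:
    """Greedy word wrap that subtracts clearance margins from the usable width only."""
    usable = max_width - sum(margins)
    lines: List[Line] = []
    current: List[str] = []
    used = 0.0
    for word in words:
        word_width = len(word) * char_width
        # every word after the first on a line needs one space (one char wide) before it
        needed = word_width if not current else word_width + char_width
        if current and used + needed > usable:
            lines.append(Line(current, max_width))
            current, used = [word], word_width
        else:
            current.append(word)
            used += needed
    if current:
        lines.append(Line(current, max_width))
    return lines
```

With margins of 1.5 on each side of an 8-unit line, only two of the words `"aa"`, `"bb"`, `"cc"` fit on the first line, yet both resulting lines still report `max_width == 8.0`. Without margins, all three fit on one line.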

gmischler · 2023-10-03T07:58:26Z

Time for a little poll:
I've come to doubt if it really makes sense to have both text_column() and text_columns() next to each other.
After all, if we change the default of "ncols" to 1, then both jobs can be done with the latter (they're really just different front-ends to the same code), and we get a simpler API.
I don't remember exactly why I decided to create both, other than "because I can".
What do you guys think?
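The argument for a single method rests on a one-column layout being just the degenerate case of a multi-column one. A minimal sketch of the column geometry shows this; the `column_extents()` helper and its `gutter` parameter are assumptions for illustration, not fpdf2's code.

```python
from typing import List, Tuple


def column_extents(left: float, right: float, ncols: int = 1,
                   gutter: float = 10.0) -> List[Tuple[float, float]]:
    """Split [left, right] into `ncols` equal columns separated by `gutter`."""
    if ncols < 1:
        raise ValueError("ncols must be at least 1")
    col_width = (right - left - gutter * (ncols - 1)) / ncols
    extents = []
    x = left
    for _ in range(ncols):
        extents.append((x, x + col_width))
        x += col_width + gutter  # skip over the gutter to the next column
    return extents
```

With the default `ncols=1`, the gutter term vanishes and the single column spans the whole region, which is exactly what a separate single-column method would do.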

gmischler · 2023-10-03T08:32:53Z

Ready for real life?

With the HTML milestone reached, I think this would be a good time to merge the current state (after 2.7.6 is released). Since I had to make significant changes to quite a few other parts of the code base, that would also reduce potential conflicts with upcoming changes and fixes by others in those areas.
So review away!

The next milestone will be to turn table cells into text regions. In preparation for that, I'll have to come up with a good way to add images. I think the simplest way is to just treat them like a type of paragraph. With a little luck, that may get finished before releasing 2.7.7 as well.
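Treating an image as just another kind of paragraph mostly means giving both the same interface, so a region can lay them out in sequence. A minimal sketch of that idea with a `typing.Protocol`; `Block`, `TextBlock`, `ImageBlock` and `total_height()` are hypothetical names, and the text height uses a crude fixed character width instead of real font metrics.

```python
import math
from dataclasses import dataclass
from typing import Protocol, Sequence


class Block(Protocol):
    """Anything that can report its height when laid out at a given width."""

    def height(self, width: float) -> float: ...


@dataclass
class TextBlock:
    text: str
    line_height: float
    char_width: float  # crude stand-in for real font metrics

    def height(self, width: float) -> float:
        chars_per_line = max(1, int(width // self.char_width))
        n_lines = max(1, math.ceil(len(self.text) / chars_per_line))
        return n_lines * self.line_height


@dataclass
class ImageBlock:
    img_width: float
    img_height: float

    def height(self, width: float) -> float:
        # scale the image to the column width, keeping its aspect ratio
        return self.img_height * width / self.img_width


def total_height(blocks: Sequence[Block], width: float) -> float:
    """Sum the heights of paragraphs and images stacked in one column."""
    return sum(block.height(width) for block in blocks)
```

A 25-character paragraph in a 10-unit column takes three 5-unit lines (15.0), and a 200×100 image scaled to the same width is 5.0 high, for a total of 20.0.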

Shaped text regions will have to wait until after that.

Lucas-C · 2023-10-07T10:35:27Z

Thank you for the progress report @gmischler!

I've come to doubt if it really makes sense to have both text_column() and text_columns() next to each other.
[...] What do you guys think?

I agree with you, and I don't really see the point of having both methods, we should probably only keep text_columns().

With the HTML milestone reached, I think this would be a good time to merge the current state (after 2.7.6 is released).

Agreed!
You really did an impressive job on this PR 👍

Reviewing 100 file changes is a bit intimidating 😅
I will try to review it all on Monday morning.

docs/TextColumns.md

Co-authored-by: Lucas Cimon <925560+Lucas-C@users.noreply.github.com>

gmischler · 2023-10-10T04:23:27Z

Unless you want to discuss some of my feedback, you can merge this PR whenever you are happy with it @gmischler

Ok, I am happy (and a bit exhausted...)

Lucas-C · 2023-10-10T05:39:03Z

Good job @gmischler!
👍

Lucas-C requested review from andersonhc and Lucas-C August 18, 2023 08:09

Lucas-C reviewed Aug 20, 2023

Lucas-C reviewed Aug 21, 2023

fpdf/text_region.py

Lucas-C mentioned this pull request Aug 25, 2023

Deprecating txt= arg in favor of text= - solves #853 #903

Merged

gmischler mentioned this pull request Sep 8, 2023

Take a long time after choose an unicode font #907

Closed

gmischler force-pushed the TextRegion branch from 3bb29d4 to ac4335c September 17, 2023 20:41

gmischler force-pushed the TextRegion branch from 7c7f77e to 184c809 September 30, 2023 19:54

gmischler mentioned this pull request Sep 30, 2023

ln() does nothing at the beginning of a new file #937

Closed

gmischler force-pushed the TextRegion branch from 184c809 to 2b70b6d October 1, 2023 13:15

gmischler marked this pull request as ready for review October 3, 2023 08:30

gmischler force-pushed the TextRegion branch 2 times, most recently from a1cebdc to 42efea2 October 3, 2023 15:00

gmischler requested a review from Lucas-C October 6, 2023 13:03

gmischler mentioned this pull request Oct 8, 2023

Handle preceding whitespaces in write() for line break #947

Merged

Lucas-C reviewed Oct 9, 2023

docs/TextColumns.md (outdated)

Lucas-C reviewed Oct 9, 2023

docs/TextColumns.md (outdated)

gmischler and others added 19 commits October 10, 2023 00:33

Regions with Paragraphs, Fragments with align instead of justify

97b2bbb

Columns docu and FPDF integration

0e294ba

paragraph docs

1110724

formatting

7ee76c8

Delete .TextRegion.md.swo

70061aa

Allow initial text argument for text regions

d90c63f

column bottom balancing

0887029

text regions with ln() and line_height; tuto4 in en+de

a22427e

remove instrumentation from tuto4

0854e5c

html via text regions first round

a3f510a

write_html via text regions all except tables

98eebd6

review feedback & additional tests

680a94d

more text regions documentation

27ad99d

formatting

3a13b7b

html table test files

80b16de

remove text_column()

81306ec

change html.py and tests to text_columns()

f0de05c

Apply suggestions from code review

94a77c6

Co-authored-by: Lucas Cimon <925560+Lucas-C@users.noreply.github.com>

Review feedback and other fixes.

0e0a1c7

gmischler force-pushed the TextRegion branch from 9175d31 to 0e0a1c7 October 9, 2023 22:37

gmischler added 2 commits October 10, 2023 00:42

tuto4 update

452de79

Update tuto4.py

67b77b3

gmischler merged commit 26910a6 into py-pdf:master Oct 10, 2023
7 of 10 checks passed

This was referenced Oct 10, 2023

Bug: <center> does not support internal HTML tags #640

Closed

Right-aligned HTML paragraph breaks inline font styling #151

Closed

Q: Does fpdf2 support different font size in .multi_cell()? #786

Closed

gmischler deleted the TextRegion branch October 12, 2023 14:59

Lucas-C mentioned this pull request May 24, 2024

write_html: support <sup> & <sup> tags inside <table> #860

Open