Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Text regions layout management #897

Merged
merged 28 commits into from
Oct 10, 2023
Merged

Conversation

gmischler
Copy link
Collaborator

@gmischler gmischler commented Aug 16, 2023

This is the basic start of advanced text layout functionality for fpdf2, as outlined in #339.

Checklist:

  • The GitHub pipeline is OK (green),
    meaning that both pylint (static code analyzer) and black (code formatter) are happy with the changes of this PR.
  • A unit test is covering the code added / modified by this PR
  • This PR is ready to be merged
  • In case of a new feature, docstrings have been added, with also some documentation in the docs/ folder
  • A mention of the change is present in CHANGELOG.md

Some ripples caused in the rest of the code base:

  • In order to allow for more dynamic text treatment, Fragments now include the full "align" information instead of just "justify". This means that _render_styled_text_line() has lost its "align" parameter.
  • _preload_font_styles() will need to accept a "link" argument (it probably needs a different name too, since it does a lot more then loading fonts).
  • MultiLineBreak() now has a "width" parameter, which can either be a fixed width or a callback function, so that the caller can dynamically specify the text width depending on (among other things) the current line height. Consequently, get_line_of_current_width() has morphed into simply get_line().

General remarks:

  • I haven't yet tested it in combination with text shaping, but I see no reason for conflict between the two. In fact, the combination will allow to have multi-lingual text in one paragraph without the need to split the parts algorythmically.
  • I think this is a adequately clean and workable proof of concept now, but much work is left to be done.

Further tasks:

  • Update write_html() to use text regions, in order to allow for smoother formatting. Done, except for tables.
  • Turn table cells into text regions with the same goal.
  • Use the "text" parameter of table cells for all text regions. Done.
  • Create "shaped" text regions of various types.
  • Lots more...

By submitting this pull request, I confirm that my contribution is made under the terms of the GNU LGPL 3.0 license.

@codecov-commenter
Copy link

codecov-commenter commented Aug 17, 2023

Codecov Report

Attention: 26 lines in your changes are missing coverage. Please review.

Comparison is base (30eb1a4) 93.42% compared to head (a343921) 93.56%.
Report is 3 commits behind head on master.

❗ Current head a343921 differs from pull request most recent head a0cbd91. Consider uploading reports for the commit a0cbd91 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #897      +/-   ##
==========================================
+ Coverage   93.42%   93.56%   +0.14%     
==========================================
  Files          27       28       +1     
  Lines        7851     8209     +358     
  Branches     1433     1500      +67     
==========================================
+ Hits         7335     7681     +346     
- Misses        322      326       +4     
- Partials      194      202       +8     
Files Coverage Δ
fpdf/__init__.py 100.00% <100.00%> (ø)
fpdf/graphics_state.py 98.78% <100.00%> (+0.01%) ⬆️
fpdf/line_break.py 99.12% <100.00%> (+0.09%) ⬆️
fpdf/fpdf.py 92.57% <96.66%> (+0.08%) ⬆️
fpdf/html.py 95.90% <97.22%> (+2.26%) ⬆️
fpdf/text_region.py 92.56% <92.56%> (ø)

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Lucas-C Lucas-C requested review from andersonhc and Lucas-C August 18, 2023 08:09
@Lucas-C
Copy link
Member

Lucas-C commented Aug 18, 2023

This seems very promising! 😊
I'll try to review this on Sunday or Monday.

docs/TextRegion.md Show resolved Hide resolved
docs/TextRegion.md Show resolved Hide resolved
docs/TextRegion.md Outdated Show resolved Hide resolved
docs/TextRegion.md Outdated Show resolved Hide resolved
docs/TextRegion.md Outdated Show resolved Hide resolved
fpdf/text_region.py Outdated Show resolved Hide resolved
fpdf/text_region.py Outdated Show resolved Hide resolved
fpdf/text_region.py Outdated Show resolved Hide resolved
fpdf/text_region.py Outdated Show resolved Hide resolved
fpdf/text_region.py Outdated Show resolved Hide resolved
@Lucas-C
Copy link
Member

Lucas-C commented Aug 21, 2023

Turn table cells into text regions with the same goal.

I think that for text regions to become a successfull addition to fpdf2,
they should be "plugged" in as many possible places (when it's meaningful).
And having tables being based on text regions would make a lot of sense to me.

Hence, maybe "plugging" text regions in tables would be a good way to check that the design you crafted for text regions is sound, and integrates well with the existing lib?

@Lucas-C
Copy link
Member

Lucas-C commented Aug 21, 2023

I think you made an excellent work overall to design this feature
and craft an elegant system to layout text regions: good job really! 👍

@gmischler
Copy link
Collaborator Author

Hence, maybe "plugging" text regions in tables would be a good way to check that the design you crafted for text regions is sound, and integrates well with the existing lib?

Only the table cells will need to be converted to text regions, the rest of the table classes can largely remain the same.

In that context: All the legacy text generation methods use the parameter spelling "txt", only table cells currently use "text".
Is that a deliberate distinction?
I'me currently using the latter one as well, but I have strong doubts if that's really a good idea.

@Lucas-C
Copy link
Member

Lucas-C commented Aug 22, 2023

In that context: All the legacy text generation methods use the parameter spelling "txt", only table cells currently use "text".
Is that a deliberate distinction?
I'me currently using the latter one as well, but I have strong doubts if that's really a good idea.

Yes, this was reported recently in #853
It was really "deliberate".
Maybe we should have both forms allowed in all methods?

@gmischler
Copy link
Collaborator Author

It was really "deliberate".

Why?!?
In any case, we need to define a canonical form and declare the other deprecated.

Maybe we should have both forms allowed in all methods?

I'm not very good with decorators. Can you come up with one that does the conversion if necessary, and prevents the use of both together? Then we don't have to clutter the functional code with this weirdness.

@andersonhc
Copy link
Collaborator

I'm not very good with decorators. Can you come up with one that does the conversion if necessary, and prevents the use of both together? Then we don't have to clutter the functional code with this weirdness.

I guess I would be something like this:

def deprecated_txt(func):
    def inner(*args, **kwargs):
        override_kwargs = {}
        for k in kwargs:
            if k == "txt":
                LOGGER.warning("The argument 'txt' has been deprecated. Please use 'text'.")
                override_kwargs["text"] = kwargs[k]
                continue
            override_kwargs[k] = kwargs[k]
        return func(*args, **override_kwargs)
    return inner

@Lucas-C
Copy link
Member

Lucas-C commented Aug 25, 2023

It was really "deliberate".
Why?!?
In any case, we need to define a canonical form and declare the other deprecated.

Sorry, I meant NOT deliberate 😅
It was basically a mistake.
But we already has a discrepancy with FPDF.write_html()

I'm not very good with decorators. Can you come up with one that does the conversion if necessary, and prevents the use of both together? Then we don't have to clutter the functional code with this weirdness.

I opened #903 to do that, could you please review it?

@gmischler
Copy link
Collaborator Author

Progress report:

I refactored html.py to use text_column().
It took a bit of trickery to use it without any with statements, but it works.
HTML2PDF now runs within its own local context, so no local state changes are leaked outside anymore. It also took some trickery to temporarily escape that local context for actually rendering text. write_html() used to add quite a few redundant font and color changes to the PDF files, which is now prevented. The whitespace handling is now also simpler, faster, and more reliable.

Among other things, this change allows formatting changes within paragraphs without any extra line breaks, fixing #151, #640, and #930 (and improving on #91). The other visible change is that the spacing between paragraphs, as well as above and below headings, is slightly different. I had worried about this for a while, but got it surprisingly close in the end. I actually think it is mor more consistent in relation to font size now. The extra empty line at the top of many pages has also gone.

When consolidating with the recent padding changes, I stumbled over some incongruencies.
While the docstring says that c_margin is ignored when there is horizontal padding, that wasn't actually the case, and it was always applied. I fixed that, and also changed it to decide about the use of left and right c_margin independently of each other.

Some of the general changes I had to make to enable this was that text color and the total line width are now always taken from the Fragments. The different handling of text color results in a (semantically inconsequential) different sequence of commands in the PDF, making it necessary to replace many test files. MultiLineBreak now also receives a set of clearance margins, so it can apply them without affecting the maximum_width returned with the TextLine.

@gmischler
Copy link
Collaborator Author

Time for a little poll:
I've come to doubt if it really makes sense to have both text_column() and text_columns() next to each other.
After all, if we change the default of "ncols" to 1, then both jobs can be done with the latter (they're really just different front-ends to the same code), and we get a simpler API.
I don't remember exactly why I decided to create both, other than "because I can".
What do you guys think?

@gmischler gmischler marked this pull request as ready for review October 3, 2023 08:30
@gmischler
Copy link
Collaborator Author

Ready for real life?

With the HTML milestone reached, I think this would be a good time to merge the current state (after 2.7.6 is released). Since I had to make significant changes to quite a few other parts of the code base, that would also reduce potential conflicts with upcoming changes and fixes by others in those areas.
So review away!

The next milestone will be to turn table cells into text regions. In preparation for that, I'll have to come up with a good way to add images. I think the simplest way is to just treat them like a type of paragraph. With a little luck, that may get finished before releasing 2.7.7 as well.

Shaped text regions will have to wait until after that.

@gmischler gmischler force-pushed the TextRegion branch 2 times, most recently from a1cebdc to 42efea2 Compare October 3, 2023 15:00
@gmischler gmischler requested a review from Lucas-C October 6, 2023 13:03
@Lucas-C
Copy link
Member

Lucas-C commented Oct 7, 2023

Thank you for the progress report @gmischler!

I've come to doubt if it really makes sense to have both text_column() and text_columns() next to each other.
[...] What do you guys think?

I agree with you, and I don't really see the point of having both methods, we should probably only keep text_columns().

With the HTML milestone reached, I think this would be a good time to merge the current state (after 2.7.6 is released).

Agreed!
You really did an impressive job on this PR 👍

Reviewing 100 file changes is bit intimidating 😅
I will try to review it all on Monday morning.

docs/TextColumns.md Outdated Show resolved Hide resolved
docs/TextColumns.md Outdated Show resolved Hide resolved
@gmischler
Copy link
Collaborator Author

Unless you want to exchange over some of my feedbacks, you can merge this PR whenever you are happy with it @gmischler

Ok, I am happy (and a bit exhausted...)

@Lucas-C
Copy link
Member

Lucas-C commented Oct 10, 2023

Good job @gmischler!
👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants