ENH: Enable short_caption in to_latex #35668
Method ``_compose_caption_and_label_macro`` unifies the creation of caption and label macros for both the tabular and longtable environments.
Kwarg ``short_caption`` allows one to add a short caption to the LaTeX ``\caption`` macro. The final caption macro would look like this:
```
\caption[short_caption]{caption}
```
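The composition described above can be sketched as a plain function. This is a minimal sketch, assuming a hypothetical `compose_caption_and_label` helper (not the actual pandas method) that receives the caption, the optional short caption, and an optional label as strings and emits one macro per line:

```python
from typing import Optional


def compose_caption_and_label(
    caption: Optional[str],
    short_caption: Optional[str] = None,
    label: Optional[str] = None,
) -> str:
    """Return caption and label macros for a LaTeX table (hypothetical helper)."""
    macros = []
    if caption:
        # LaTeX's optional [short] argument is emitted only when a short caption is given.
        optional_arg = f"[{short_caption}]" if short_caption else ""
        macros.append(f"\\caption{optional_arg}{{{caption}}}")
    if label:
        macros.append(f"\\label{{{label}}}")
    # Assumption: one macro per line; the real tabular/longtable layouts may differ.
    return "\n".join(macros)


print(compose_caption_and_label("Full caption", "Short"))
# \caption[Short]{Full caption}
```

In this sketch a short caption without a full caption produces no `\caption` macro at all, since LaTeX's optional argument only makes sense alongside the mandatory one.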
Thanks @ivanovmg for the PR. It needs a release note (enhancements in 1.2).
pandas/core/generic.py
short_caption : str, optional
    The LaTeX short caption.
    Full caption output would look like this:
    ``\caption[short_caption]{caption}``.
Needs a `versionadded` tag.
Should I add this?
.. versionadded:: 1.2.0
I was not sure about the version.
pandas/core/generic.py
@@ -2966,7 +2967,10 @@ def to_latex(
    The LaTeX caption to be placed inside ``\caption{}`` in the output.

    .. versionadded:: 1.0.0

short_caption : str, optional
Rather than adding another keyword parameter, maybe `caption` could be changed to accept `Optional[Union[str, Tuple[str, str]]]` to allow the short caption to be passed.
I can do this if you think it is the best way.
My concern is that in this case the `caption` kwarg will be too long (another concern is that the subsequent unpacking into `caption` and `short_caption` from either a string or a tuple will look imperfect, but it is manageable).
@simonjayhawkins, can you please confirm that this is the option to proceed with?
I wouldn't bother about the caption kwarg being too long, while I do find that the additional argument might be slightly easier to document. But all in all I have a slight preference for avoiding the new argument (and the new error message) and resorting to the tuple version.
I would actually phrase the docs as something like "a tuple (short_caption, full_caption), which will result in \caption[short_caption]{caption}; if a single string is passed, no short caption will be set".
Should I wait for the decision from the remaining stakeholders?
I don't have a strong preference. It looks like everyone so far has preferred the tuple version, though, so I'd say go ahead with that.
There is some ambiguity as to whether `"xy"` is a single caption or two if you unpack it. Maybe not a big deal, but worth calling out.
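The ambiguity is easy to reproduce in plain Python: tuple unpacking accepts any iterable of length two, so a two-character string silently splits into two one-character captions. In this sketch, `naive_split` and `safe_split` are hypothetical helpers; checking `isinstance(caption, str)` before unpacking is one way to keep `"xy"` as a single caption:

```python
from typing import Optional, Tuple, Union

Caption = Union[str, Tuple[str, str]]


def naive_split(caption: Caption) -> Tuple[Caption, Optional[str]]:
    """Unpack first, fall back to a single caption: the ambiguous approach."""
    try:
        full, short = caption
    except ValueError:
        # Raised for any string or tuple whose length is not exactly 2.
        return caption, None
    return full, short


def safe_split(caption: Caption) -> Tuple[str, Optional[str]]:
    """Treat any str as a single caption before trying to unpack."""
    if isinstance(caption, str):
        return caption, None
    full, short = caption
    return full, short


print(naive_split("xy"))  # ('x', 'y'): two one-letter captions
print(safe_split("xy"))   # ('xy', None)
```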
I made the required changes, including testing corner cases like the "xy" caption.
There was a discussion after the pull request was opened regarding the new kwarg ``short_caption``. It was decided not to introduce the new kwarg into the method ``pd.DataFrame.to_latex()``, but rather to optionally unpack ``caption`` as a tuple ``(caption, short_caption)``. So, if ``caption`` is a ``str``, then ``short_caption`` is ``None`` and the caption macro will look like this:
```
\caption{caption}
```
If ``caption = (long_caption, short_caption)``, then the caption macro will look like this:
```
\caption[short_caption]{long_caption}
```
For some reason, CI / Checks (pull_request) failed the typing validation.
But I never touched `DataFrameFormatter` myself. Can you help?
pandas/core/generic.py
@@ -2887,7 +2887,8 @@ def to_latex(
    multicolumn=None,
    multicolumn_format=None,
    multirow=None,
    caption=None,
    caption: Optional[Union[str, Tuple[str, str]]] = None,
    short_caption=None,
This can be removed from the signature right?
Done, but the build failed because it exceeded the 1-hour time limit.
Restarted.
The failure is in `TestRegistration.test_pandas_plots_register`, which does not seem to be related to these commits.
pandas/core/generic.py
@@ -3123,6 +3124,18 @@ def to_latex(
    if multirow is None:
        multirow = config.get_option("display.latex.multirow")

    if caption:
Instead of doing this here, can you do it in the formatter itself (the constructor is fine)?
Done.
pandas/core/generic.py
@@ -3074,11 +3074,12 @@ def to_latex(
    centered labels (instead of top-aligned) across the contained
    rows, separating groups via clines. The default will be read
    from the pandas config module.
caption : str, optional
    The LaTeX caption to be placed inside ``\caption{}`` in the output.
caption : str, tuple, optional
Use `str or tuple, optional` here (and change the signature as well):
`Optional[Union[str, Tuple[str, str]]]`
Done. I updated the typing of `caption` in `DataFrameFormatter.to_latex()` and `LatexFormatter`.
pandas/io/formats/format.py
@@ -931,6 +931,7 @@ def to_latex(
    multicolumn_format: Optional[str] = None,
    multirow: bool = False,
    caption: Optional[str] = None,
    short_caption: Optional[str] = None,
Same here.
Done.
pandas/io/formats/latex.py
    self._caption, self.short_caption = caption
except ValueError as err:
    msg = "caption must be either str or tuple of two strings"
    raise ValueError(msg) from err
Can you add a test that hits this?
It is there, if you mean raising `ValueError`. Line 561:
# test that wrong number of params is raised
with pytest.raises(ValueError):
    df.to_latex(caption=(the_caption, the_short_caption, "extra_string"))
I updated the test to ensure that the error message matches expectations.
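A sketch of such a message-checking test, assuming a hypothetical module-level `split_caption` helper that wraps the unpacking error (pandas' real helper and test names may differ). `pytest.raises(..., match=...)` applies `re.search` to the string form of the exception, so the expected message is passed through `re.escape`:

```python
import re
from typing import Optional, Tuple, Union

import pytest


def split_caption(caption: Optional[Union[str, Tuple[str, str]]]) -> Tuple[str, str]:
    """Return (full_caption, short_caption); hypothetical helper for this sketch."""
    if caption is None:
        return "", ""
    if isinstance(caption, str):
        return caption, ""
    try:
        full, short = caption
    except ValueError as err:
        msg = "caption must be either a string or a tuple of two strings"
        raise ValueError(msg) from err
    return full, short


def test_caption_with_extra_element_raises() -> None:
    msg = "caption must be either a string or a tuple of two strings"
    with pytest.raises(ValueError, match=re.escape(msg)):
        split_caption(("full caption", "short caption", "extra_string"))
```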
I guess I need to add a release note to the Enhancements section, as @simonjayhawkins highlighted. Would it be v1.2.0?
Correct: the 1.2 whatsnew.
@jreback, @WillAyd, @simonjayhawkins, @toobaz, ping. Is it good to go?
"""
assert result_cl == expected_cl

# test when the caption, the short_caption and the label are provided
Same comment here: modular tests are much easier to debug in case of future issues.
I absolutely agree. I would prefer #36528 to be merged first, and then to rebase on top of it with separate tests.
I split the tests.
"""
assert result_cl == expected_cl

# test when the short_caption is provided alongside caption
Can you split each comment here into a separate test?
Did that.
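A sketch of what one-scenario-per-test can look like, written against a hypothetical `caption_macro` helper that accepts either a string or a `(full_caption, short_caption)` tuple; the test names are illustrative, not the ones in the pandas test suite:

```python
from typing import Tuple, Union


def caption_macro(caption: Union[str, Tuple[str, str]]) -> str:
    """Hypothetical helper: a str or a (full, short) tuple -> \\caption macro."""
    if isinstance(caption, str):
        return f"\\caption{{{caption}}}"
    full, short = caption
    return f"\\caption[{short}]{{{full}}}"


def test_caption_only() -> None:
    assert caption_macro("full caption") == "\\caption{full caption}"


def test_caption_and_short_caption() -> None:
    assert caption_macro(("full caption", "short")) == "\\caption[short]{full caption}"


def test_two_character_string_is_a_single_caption() -> None:
    # "xy" must not be unpacked into full="x", short="y".
    assert caption_macro("xy") == "\\caption{xy}"
```

When one of these fails, the test name alone identifies the scenario, which is the debugging benefit the reviewer points to.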
pandas/io/formats/latex.py
self.multicolumn = multicolumn
self.multicolumn_format = multicolumn_format
self.multirow = multirow
self.caption = caption
self.caption = caption  # type: ignore[assignment]
I am a little confused by this, though: are you sure this is required? `str`
should be fine on the right-hand side of an assignment, based on the comment.
pandas/io/formats/latex.py
@@ -41,6 +41,12 @@ def __init__(
    self.multirow = multirow
    self.clinebuf: List[List[int]] = []
    self.strcols = self._get_strcols()

    # Here is a reason for ignoring typing.
What is the output of `_get_strcols()`?
`_get_strcols()` returns `List[List[str]]`, where each inner list contains the elements of a column.
pandas/io/formats/latex.py
@@ -657,7 +706,29 @@ def _select_builder(self) -> Type[TableBuilderAbstract]:
        return TabularBuilder

    @property
    def column_format(self) -> str:
    def caption(self) -> str:
Is there a reason we use setters/getters rather than just typing the attributes in the class? That seems much simpler.
It seems there is no particular reason, since the class is supposed to be initialized only once. If so, I will turn this into a function that runs at init.
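A minimal sketch of the alternative discussed here: resolve the caption once in the constructor and store plain, typed attributes, so no getter/setter pair (and no `# type: ignore[assignment]`) is needed. The class name `CaptionSettings` and its attributes are hypothetical, not pandas' `LatexFormatter`:

```python
from typing import Optional, Tuple, Union

CaptionArg = Optional[Union[str, Tuple[str, str]]]


class CaptionSettings:
    """Hypothetical holder that resolves the caption once at init time."""

    def __init__(self, caption: CaptionArg = None) -> None:
        full, short = self._split(caption)
        # Each attribute has one declared type, so the assignments type-check cleanly.
        self.caption: str = full
        self.short_caption: str = short

    @staticmethod
    def _split(caption: CaptionArg) -> Tuple[str, str]:
        if not caption:
            return "", ""
        if isinstance(caption, str):
            return caption, ""
        full, short = caption
        return full, short


settings = CaptionSettings(("Full caption", "Short"))
print(settings.caption, "|", settings.short_caption)  # Full caption | Short
```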
The test for longtable with a short caption was changed to reflect recent changes on the master branch. The changes to the longtable environment were introduced recently; see GH pandas-dev#34360.
pandas/io/formats/latex.py
    \caption{caption_string}.
    """
    if self.caption:
        return f"\\caption{self._short_caption_macro}{{{self.caption}}}"
return f"\\caption{self._short_caption_macro}{{{self.caption}}}"
return f"\\caption{self._short_caption or ''}{{{self.caption}}}"
This can be simplified to get rid of the property.
`_short_caption_macro` is either an empty string or `[short caption text]` (in square brackets).
Putting this logic into the f-string seems rather complicated.
IMHO, having a separate property (`_short_caption_macro`) with a clear intent is more readable.
@WillAyd, does my answer make sense?
@jreback, @simonjayhawkins, @toobaz, any plans on merging this?
Or does it require additional work?
@ivanovmg I am looking now; we have 200 open PRs, and these take time.
The way you have it is OK; the f-string is actually pretty hard to grok because of the substitutions. If you can make it simpler, I would take that (e.g. maybe use concatenation).
I switched to string joining (with in-place substitution) and removed the `_short_caption_macro` property, as @WillAyd suggested. Would you consider it more readable?
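A sketch of the concatenation approach, assuming a hypothetical free function `caption_macro(full_caption, short_caption)`. Building the pieces with `str.join` keeps the square brackets around the short caption explicit, which is the detail that substituting `self._short_caption or ''` directly would lose:

```python
def caption_macro(full_caption: str, short_caption: str = "") -> str:
    """Return \\caption[short]{full}, \\caption{full}, or "" (hypothetical helper)."""
    if not full_caption:
        return ""
    parts = ["\\caption"]
    if short_caption:
        # The short caption is LaTeX's optional argument, so it needs brackets.
        parts.append("[" + short_caption + "]")
    parts.append("{" + full_caption + "}")
    return "".join(parts)


print(caption_macro("Full", "Short"))  # \caption[Short]{Full}
```

By contrast, placing `"Short"` directly after `\caption` would yield `\captionShort{Full}`, which LaTeX reads as an undefined control sequence rather than a command with an optional argument.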
pandas/io/formats/latex.py
        return f"\\caption{self._short_caption_macro}{{{self.caption}}}"
    return ""

@property
See above: I think this can be removed altogether.
Small comments; ping on green.
pandas/io/formats/latex.py
        msg = "caption must be either a string or a tuple of two strings"
        raise ValueError(msg) from err
else:
    long_caption = ""
Can you make this a free function?
I moved this function to module level.
@jreback, all checks are green except Windows py38_np18, which failed to create the Anaconda environment.
All checks are good except Windows:
LGTM, ping on green.
print(table)

Usage of keyword ``caption`` is extended.
Besides taking a single string as an argument,
I think we could add a link to `to_latex` here: https://pandas.pydata.org/docs/user_guide/io.html (as a follow-on, since it probably needs a short section as well).
Thanks @ivanovmg.
Enable ``short_caption`` for ``DataFrame.to_latex`` by expanding the meaning of the kwarg ``caption``.
Optionally, ``caption`` can be a ``Tuple[str, str]`` of ``(full_caption, short_caption)``.
The final caption macro would look like this: