Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Enable short_caption in to_latex #35668

Merged
merged 39 commits into from
Oct 17, 2020

Conversation

ivanovmg
Copy link
Member

@ivanovmg ivanovmg commented Aug 11, 2020

Enable short_caption for DataFrame.to_latex by expanding the meaning of kwarg caption.
Optionally caption can be a Tuple[str, str] = full_caption, short_caption.

The final caption macro would look like this:

\caption[short_caption]{full_caption}

Method _compose_caption_and_label_macro unifies
creation of caption and label macros for both tabular
and longtable envs.
Kwarg short_caption allows one to add short caption
to LaTeX \caption macro.

The final caption macro would look like this:
```
\caption[short_caption]{caption}
```
@ivanovmg ivanovmg changed the title Feature/latex shortcaption Enable short_caption in to_latex Aug 11, 2020
Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ivanovmg for the PR. needs a release note (enhancements in 1.2)

short_caption : str, optional
The LaTeX short caption.
Full caption output would look like this:
``\caption[short_caption]{caption}``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs a versionadd tag

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I put this?

.. versionadded:: 1.2.0

I was not sure about the version.

@@ -2966,7 +2967,10 @@ def to_latex(
The LaTeX caption to be placed inside ``\caption{}`` in the output.

.. versionadded:: 1.0.0

short_caption : str, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than another keyword parameter, maybe caption could be changed to accept Optional[Union[str, Tuple[str,str]]] to allow the short caption to be passed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do you this if you think this is the best way.
My concern is that in this case caption kwarg will be too long (another concern is that further unpacking into caption and short_caption from either string or tuple will look imperfect, but it is manageable).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonjayhawkins, can you please confirm that this is the option to proceed with?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i've added the api design and needs discussion labels. we should get some opinions from others before proceeding. Changing the api after the event is not so easy. cc @WillAyd @jreback @toobaz

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't bother about the caption kwarg being too long, while I do find that the additional argument might be slightly easier to document. But all in all I have a slight preference for avoiding the new argument (and the new error message) and resorting to the tuple version.
I would actually phrase the docs as something like "a tuple (short_caption, full_caption), which will result in \caption[short_caption]{caption}; if a single string is passed, no short caption will be set".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I wait for the decision from the remaining stakeholders?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong preference. Looks like everyone so far has preferred the tuple version though so I'd say go ahead with that

There is some ambiguity as to whether "xy" is a single caption or two if you unpack. Maybe not a big deal but worth calling out

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the changes required, including testing corner cases like "xy" caption.

@simonjayhawkins simonjayhawkins added Enhancement IO LaTeX to_latex API Design Needs Discussion Requires discussion from core team before further action labels Aug 11, 2020
There was a discussion after pull request in regards
to the new kwarg ``short_caption``.
It was decided not to introduce the new kwarg into
method ``pd.DataFrame.to_latex()``,
but rather optionally unpack caption into a tuple
(caption, short_caption).

So, if caption = str, then short_caption is None
and caption macros will look like this:
```
\caption{caption}
```

If caption = (long_caption, short_caption),
then caption macros will look like this:
```
\caption[short_caption]{caption}
```
@ivanovmg
Copy link
Member Author

For some reason CI / Checks (pull_request) failed with the typing validation.

Run source activate pandas-dev
mypy --version
mypy 0.730
Performing static analysis using mypy
pandas/core/generic.py:3032: error: Argument 1 to "DataFrameFormatter" has incompatible type "NDFrame"; expected "DataFrame"  [arg-type]
Found 1 error in 1 file (checked 1030 source files)
Performing static analysis using mypy DONE
##[error]Process completed with exit code 1.

But I never touched DataFrameFormatter myself. Can you help?

@@ -2887,7 +2887,8 @@ def to_latex(
multicolumn=None,
multicolumn_format=None,
multirow=None,
caption=None,
caption: Optional[Union[str, Tuple[str, str]]] = None,
short_caption=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be removed from the signature right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did that, but the build failed due to exceeding time of 1 hour.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restarted

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failure of TestRegistration.test_pandas_plots_register, which does not seem to be related to the commits.

@ivanovmg ivanovmg requested a review from WillAyd August 26, 2020 06:47
@ivanovmg
Copy link
Member Author

Please put this PR on hold before the PR related to full refactoring of latex.py is merged #35872 (unless abandoned).
It will be easy to implement the changes of the present PR if #35872 is merged first.

@@ -3123,6 +3124,18 @@ def to_latex(
if multirow is None:
multirow = config.get_option("display.latex.multirow")

if caption:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of doing this here, can you do it in the formatter itself (in the constructor is ok)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -3074,11 +3074,12 @@ def to_latex(
centered labels (instead of top-aligned) across the contained
rows, separating groups via clines. The default will be read
from the pandas config module.
caption : str, optional
The LaTeX caption to be placed inside ``\caption{}`` in the output.
caption : str, tuple, optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

str or tuple, optional (change the signature as well)
Optional[Union[str, Tuple[str, str]]]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I updated typing on caption in DataFrameFormatter.to_latex() and LatexFormatter.

@@ -931,6 +931,7 @@ def to_latex(
multicolumn_format: Optional[str] = None,
multirow: bool = False,
caption: Optional[str] = None,
short_caption: Optional[str] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

self._caption, self.short_caption = caption
except ValueError as err:
msg = "caption must be either str or tuple of two strings"
raise ValueError(msg) from err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test that hits this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is there, if you mean raising ValueError.

String 561.

        # test that wrong number of params is raised
        with pytest.raises(ValueError):
            df.to_latex(caption=(the_caption, the_short_caption, "extra_string"))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the test to ensure that the error message matches expectations.

@ivanovmg
Copy link
Member Author

ivanovmg commented Sep 8, 2020

I guess, I need to add release notes into Enhancements section, as @simonjayhawkins highlighted. Would it be v1.2.0?

@WillAyd
Copy link
Member

WillAyd commented Sep 8, 2020

Correct - 1.2 whatsnew

@ivanovmg
Copy link
Member Author

@jreback, @WillAyd, @simonjayhawkins, @toobaz, ping. Is it good to go?

"""
assert result_cl == expected_cl

# test when the caption, the short_caption and the label are provided
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here - modular tests are much easier to debug in case of future issues

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I absolutely agree. I would prefer this #36528 to be merged first, and then rebase on top of that with the separate tests.

Copy link
Member Author

@ivanovmg ivanovmg Sep 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split tests.

"""
assert result_cl == expected_cl

# test when the short_caption is provided alongside caption
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you split each comment here into a separate test?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did that.

self.multicolumn = multicolumn
self.multicolumn_format = multicolumn_format
self.multirow = multirow
self.caption = caption
self.caption = caption # type: ignore[assignment]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a little confused by this though - are you sure this required? str should be fine on the right hand side of an assignment based off of the comment.

@@ -41,6 +41,12 @@ def __init__(
self.multirow = multirow
self.clinebuf: List[List[int]] = []
self.strcols = self._get_strcols()

# Here is a reason for ignoring typing.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the output of _get_strcols()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_get_strcols() returns List[List[str]], where the inner list contains the elements of the column.

@@ -657,7 +706,29 @@ def _select_builder(self) -> Type[TableBuilderAbstract]:
return TabularBuilder

@property
def column_format(self) -> str:
def caption(self) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason we use setters / getters rather than just typing in teh class? seems much simpler.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that there is no particular reason as the class is supposed to be initialized only once. If this is so, then I will update this into a function, which is run at init.

\caption{caption_string}.
"""
if self.caption:
return f"\\caption{self._short_caption_macro}{{{self.caption}}}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return f"\\caption{self._short_caption_macro}{{{self.caption}}}"
return f"\\caption{self._short_caption or ''}{{{self.caption}}}"

This can be simplified to get rid of the property

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

short_caption_macro is either an empty string or [short caption text] (in square brackets).
Putting this logic into f-string seems to be rather complicated.
IMHO having a separate property (_short_caption_macro) with clear intent is more readable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WillAyd, does my answer make sense?
@jreback, @simonjayhawkins, @toobaz, any plans on merging this?
Or does it require additional work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ivanovmg am looking now, we have 200 open PRs; these take time

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the way you have it is ok; the f-string actually is pretty complicated to grok on the substitutions. if you can make it simpler would take (e.g. use concatenation maybe)

Copy link
Member Author

@ivanovmg ivanovmg Oct 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made string joining (and substitution in place). And removed property _short_caption_macro as @WillAyd suggested. Would you consider it more readable?

return f"\\caption{self._short_caption_macro}{{{self.caption}}}"
return ""

@property
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above - I think can remove this altogether

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comments, ping on green

msg = "caption must be either a string or a tuple of two strings"
raise ValueError(msg) from err
else:
long_caption = ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make this a free function

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this function to module level.

\caption{caption_string}.
"""
if self.caption:
return f"\\caption{self._short_caption_macro}{{{self.caption}}}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the way you have it is ok; the f-string actually is pretty complicated to grok on the substitutions. if you can make it simpler would take (e.g. use concatenation maybe)

@ivanovmg
Copy link
Member Author

ivanovmg commented Oct 7, 2020

@jreback all checks are good except Windows py38_np18, which failed to create anaconda environment.

@ivanovmg ivanovmg requested a review from jreback October 7, 2020 11:08
pandas/io/formats/latex.py Show resolved Hide resolved
@ivanovmg ivanovmg requested a review from jreback October 12, 2020 19:36
@ivanovmg
Copy link
Member Author

All checks are good except Windows:

##[error]We stopped hearing from agent Azure Pipelines 8. 

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm ping on green.

print(table)

Usage of keyword ``caption`` is extended.
Besides taking a single string as an argument,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could add a link to to_latest here: https://pandas.pydata.org/docs/user_guide/io.html (followon as prob need a short section as well).

@jreback jreback merged commit f1b2bb1 into pandas-dev:master Oct 17, 2020
@jreback
Copy link
Contributor

jreback commented Oct 17, 2020

thanks @ivanovmg

JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Oct 26, 2020
kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
@ivanovmg ivanovmg deleted the feature/latex-shortcaption branch November 6, 2020 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Enhancement IO LaTeX to_latex Needs Discussion Requires discussion from core team before further action
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: short caption for LaTeX tables
5 participants