ReST table breaks if package description contains unusual characters #73

pohlt · 2020-07-16T16:55:56Z

Some packages have unusual characters in their description (e.g. poyo). These characters causes ReST tables to break their column alignment.

Filtering the input strings for regular alpha-numerical characters would fix this, I guess.

pohlt · 2020-07-16T17:09:09Z

I just saw that you actually let ptable do the table output. How about switching to ptable2 which is still actively developed and should have a compatible API. They might already have fixed this issue.

raimon49 · 2020-07-18T06:00:48Z

@pohlt Thanks for the report. Probably you are talking about " 🐓 " emoji.

In my Linux environment I got the following output:

$ pip-licenses -d --format=rest
+------+---------+---------+-----------------------------------------+
| Name | Version | License | Description                             |
+------+---------+---------+-----------------------------------------+
| poyo | 0.5.0   | MIT     | A lightweight YAML Parser for Python. ? |
+------+---------+---------+-----------------------------------------+

How is it broken in your environment?

raimon49 · 2020-07-18T06:04:54Z

How about switching to ptable2 which is still actively developed and should have a compatible API. They might already have fixed this issue.

If switching solves this issue, it's an attractive proposition.

However, there are many forked implementations of ptable on PyPI. Making a better choice among them is a difficult decision for me 😔

pohlt · 2020-07-18T18:42:11Z

Under Windows 10 (Python 3.7) I get

+------+---------+---------+-----------------------------------------+
| Name | Version | License | Description                             |
+------+---------+---------+-----------------------------------------+
| poyo | 0.5.0   | MIT     | A lightweight YAML Parser for Python. 🐓 |   <= under Win10, I see a chicken here and the table edge is shifted
+------+---------+---------+-----------------------------------------+

which is one character too much.

I agree: It's difficult to decide which prettytable fork is currently best maintained. PTable hasn't seen a commit for two years now and I'm worried it's basically dead.

I'll check if ptable really causes the problem and if ptable2 would fix it.

pohlt · 2020-07-18T18:50:25Z

There is an open issue kxxoling/PTable#20 which describes the same problem. No reaction for 2.5 years.

raimon49 · 2020-07-20T09:50:35Z

@pohlt Thanks for the working example on Windows 10.

It was reproduced in my macOS environment. Also, I tried installing source code for ptable2 from the latest master branch. This issue has not been resolved.

If this issue can be solved by using ptable2 in the future, it is better to implement a compatibility layer like P-R in the past.

See also) #52

And when I'm sure ptable2 is stable and well maintained, I plan on migrating from PTable.

pohlt · 2020-07-20T11:21:41Z

After some testing and internal discussions, I would now propose to filter the data which is retrieved from pypi (enabled with a flag).

Rationale:

Fixing upstream packages turns out to be tricky without active maintainers.
Handling Unicode (even in Python) is not as straightforward as one would think and seems to depend on the OS.
Filtering would not be a huge burden (I'm quite sure I'm oversimplifying things):
```
"Hello 🐓 World".encode("latin1", errors="ignore").decode("utf-8")
```
latin1 could be a configuration setting.

Would you be willing to consider a PR?

raimon49 · 2020-07-20T11:49:25Z

Patch is welcome!! 😄

pohlt · 2020-07-20T12:24:58Z

Ok, now we just need someone who generates the actual PR! ;-)

I'd like to get your input/guidance before starting. What I plan to do:

Add boolean filter-strings command line option for turning on filtering (default: off)
Add some code to do the actual filtering (for all strings retrieved from pip)
Add string filter-codec command line option for selecting a code page from this list (default: latin1)
Add docs and tests

Does that make sense to you?
And don't get your hopes up: I cannot commit to if and when this is happening. :-)

raimon49 · 2020-07-21T09:39:53Z

@pohlt All right, I'll express my opinion.

Add boolean filter-strings command line option for turning on filtering (default: off)

I think it's fine to adopt it as an opt-out or default behavior.

This is because we already do the equivalent when we load the package license file as a stream.
https://github.com/raimon49/pip-licenses/blob/v-2.2.1/piplicenses.py#L148-L149

Note: However, we have adopted errors="backslashreplace" instead of errors="ignore" and replaced it with code points.

Add some code to do the actual filtering (for all strings retrieved from pip)

Yes, I agree.

Add string filter-codec command line option for selecting a code page from this list (default: latin1)

Add docs and tests

I agree with default: latin1.

Testing and documenting that users can specify the filter-codec command line option is going to be a challenge. If you can, make it a goal.

raimon49 · 2020-08-02T06:32:34Z

This issue has been resolved in pip-licenses 2.3.0 😄

pohlt mentioned this issue Jul 23, 2020

fix: populate tables and test ReST table #74

Merged

raimon49 closed this as completed in b5cf53c Aug 2, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ReST table breaks if package description contains unusual characters #73

ReST table breaks if package description contains unusual characters #73

pohlt commented Jul 16, 2020

pohlt commented Jul 16, 2020

raimon49 commented Jul 18, 2020

raimon49 commented Jul 18, 2020

pohlt commented Jul 18, 2020 •

edited

Loading

pohlt commented Jul 18, 2020

raimon49 commented Jul 20, 2020

pohlt commented Jul 20, 2020 •

edited

Loading

raimon49 commented Jul 20, 2020

pohlt commented Jul 20, 2020

raimon49 commented Jul 21, 2020

raimon49 commented Aug 2, 2020

ReST table breaks if package description contains unusual characters #73

ReST table breaks if package description contains unusual characters #73

Comments

pohlt commented Jul 16, 2020

pohlt commented Jul 16, 2020

raimon49 commented Jul 18, 2020

raimon49 commented Jul 18, 2020

pohlt commented Jul 18, 2020 • edited Loading

pohlt commented Jul 18, 2020

raimon49 commented Jul 20, 2020

pohlt commented Jul 20, 2020 • edited Loading

raimon49 commented Jul 20, 2020

pohlt commented Jul 20, 2020

raimon49 commented Jul 21, 2020

raimon49 commented Aug 2, 2020

pohlt commented Jul 18, 2020 •

edited

Loading

pohlt commented Jul 20, 2020 •

edited

Loading