Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ReST table breaks if package description contains unusual characters #73

Closed
pohlt opened this issue Jul 16, 2020 · 11 comments
Closed

ReST table breaks if package description contains unusual characters #73

pohlt opened this issue Jul 16, 2020 · 11 comments

Comments

@pohlt
Copy link

pohlt commented Jul 16, 2020

Some packages have unusual characters in their description (e.g. poyo). These characters causes ReST tables to break their column alignment.

Filtering the input strings for regular alpha-numerical characters would fix this, I guess.

@pohlt
Copy link
Author

pohlt commented Jul 16, 2020

I just saw that you actually let ptable do the table output. How about switching to ptable2 which is still actively developed and should have a compatible API. They might already have fixed this issue.

@raimon49
Copy link
Owner

@pohlt Thanks for the report. Probably you are talking about " 🐓 " emoji.

In my Linux environment I got the following output:

$ pip-licenses -d --format=rest
+------+---------+---------+-----------------------------------------+
| Name | Version | License | Description                             |
+------+---------+---------+-----------------------------------------+
| poyo | 0.5.0   | MIT     | A lightweight YAML Parser for Python. ? |
+------+---------+---------+-----------------------------------------+

How is it broken in your environment?

@raimon49
Copy link
Owner

How about switching to ptable2 which is still actively developed and should have a compatible API. They might already have fixed this issue.

If switching solves this issue, it's an attractive proposition.

However, there are many forked implementations of ptable on PyPI. Making a better choice among them is a difficult decision for me 😔

@pohlt
Copy link
Author

pohlt commented Jul 18, 2020

Under Windows 10 (Python 3.7) I get

+------+---------+---------+-----------------------------------------+
| Name | Version | License | Description                             |
+------+---------+---------+-----------------------------------------+
| poyo | 0.5.0   | MIT     | A lightweight YAML Parser for Python. 🐓 |   <= under Win10, I see a chicken here and the table edge is shifted
+------+---------+---------+-----------------------------------------+

which is one character too much.

I agree: It's difficult to decide which prettytable fork is currently best maintained. PTable hasn't seen a commit for two years now and I'm worried it's basically dead.

I'll check if ptable really causes the problem and if ptable2 would fix it.

@pohlt
Copy link
Author

pohlt commented Jul 18, 2020

There is an open issue kxxoling/PTable#20 which describes the same problem. No reaction for 2.5 years.

@raimon49
Copy link
Owner

@pohlt Thanks for the working example on Windows 10.

It was reproduced in my macOS environment. Also, I tried installing source code for ptable2 from the latest master branch. This issue has not been resolved.

If this issue can be solved by using ptable2 in the future, it is better to implement a compatibility layer like P-R in the past.

See also) #52

And when I'm sure ptable2 is stable and well maintained, I plan on migrating from PTable.

@pohlt
Copy link
Author

pohlt commented Jul 20, 2020

After some testing and internal discussions, I would now propose to filter the data which is retrieved from pypi (enabled with a flag).

Rationale:

  • Fixing upstream packages turns out to be tricky without active maintainers.
  • Handling Unicode (even in Python) is not as straightforward as one would think and seems to depend on the OS.
  • Filtering would not be a huge burden (I'm quite sure I'm oversimplifying things):
    "Hello 🐓 World".encode("latin1", errors="ignore").decode("utf-8")
    
    latin1 could be a configuration setting.

Would you be willing to consider a PR?

@raimon49
Copy link
Owner

Patch is welcome!! 😄

@pohlt
Copy link
Author

pohlt commented Jul 20, 2020

Ok, now we just need someone who generates the actual PR! ;-)

I'd like to get your input/guidance before starting. What I plan to do:

  • Add boolean filter-strings command line option for turning on filtering (default: off)
  • Add some code to do the actual filtering (for all strings retrieved from pip)
  • Add string filter-codec command line option for selecting a code page from this list (default: latin1)
  • Add docs and tests

Does that make sense to you?
And don't get your hopes up: I cannot commit to if and when this is happening. :-)

@raimon49
Copy link
Owner

@pohlt All right, I'll express my opinion.

  • Add boolean filter-strings command line option for turning on filtering (default: off)

I think it's fine to adopt it as an opt-out or default behavior.

This is because we already do the equivalent when we load the package license file as a stream.
https://github.com/raimon49/pip-licenses/blob/v-2.2.1/piplicenses.py#L148-L149

Note: However, we have adopted errors="backslashreplace" instead of errors="ignore" and replaced it with code points.

  • Add some code to do the actual filtering (for all strings retrieved from pip)

Yes, I agree.

  • Add string filter-codec command line option for selecting a code page from this list (default: latin1)
  • Add docs and tests

I agree with default: latin1.

Testing and documenting that users can specify the filter-codec command line option is going to be a challenge. If you can, make it a goal.

@raimon49
Copy link
Owner

raimon49 commented Aug 2, 2020

This issue has been resolved in pip-licenses 2.3.0 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants