fix: make tangled-up-in-unicode an optional dependency #1070
Hi @fabclmnt, thanks for getting back to me!
So @fabclmnt, with the above provisions, could this be reopened (for discussion)?
@akx the 2GB is indeed a pain and, as I've mentioned before, we are happy to consider any other suggestion to get it sorted, other than the one in this PR. This suggestion does not fit the profile of the users and the use cases for which this package was designed.
@fabclmnt I'm not sure I understand – can you explain what the profile and the use-cases are? For users who require precise knowledge of the distribution of Unicode blocks, categories and scripts in their data, they can install
The point is precisely what you've mentioned: the user would have to know at installation time whether or not they need the Unicode data, which is not the case for our audience. I have specific examples and requests from users who, for the same use case, explore different datasets (and need precise knowledge of the distribution of Unicode blocks), but they only find that out when they use PP for the initial exploration and data understanding.
Reopening this PR. After having a look, this is indeed the best option. Can you please add a note to the documentation regarding this change?
I'll just ask you to change the commit message and switch to the develop branch.
3b7f70d to bdfb907
@fabclmnt 👍 Changed base to
Apart from the location of the check, lgtm.
🚀 looking good to go. Thank you @akx!
Knowing the exact Unicode properties is likely not a priority for all users. Furthermore, unicodedata as of Python 3.10 is based on UCD 13, so the category data in there ought to be fairly recent.
This should probably be documented (as well as the [unicode] extra), but I'm not sure what the best place for it would be.
Refs dylan-profiler/tangled-up-in-unicode#10 – importing tangled will create 2 gigabytes of bytecode files and take a while.