Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: make tangled-up-in-unicode an optional dependency #1070

Merged
merged 1 commit into from
Oct 18, 2022

Conversation

akx
Copy link
Contributor

@akx akx commented Sep 27, 2022

Knowing the exact Unicode properties is likely not a priority for all users. Furthermore, unicodedata as of Python 3.10 is based on UCD 13, so the category data in there ought to be fairly recent.

This should probably be documented (as well as the [unicode] extra), but I'm not sure what the best place for it would be.

Refs dylan-profiler/tangled-up-in-unicode#10 – importing tangled will create 2 gigabytes of
bytecode files and takes a while.

@fabclmnt

This comment was marked as outdated.

@fabclmnt fabclmnt closed this Sep 28, 2022
@akx
Copy link
Contributor Author

akx commented Sep 29, 2022

Hi @fabclmnt, thanks for getting back to me!

  1. If we clearly document that more in-depth character analysis is possible if one installs pandas-profiling[unicode] (at the expense of significant disk space), that should take care of that, I think?
  2. visions hasn't actually required tangled since May 2021, the dependency is there for no reason. I opened the PR Remove tangled_up_in_unicode dependency dylan-profiler/visions#197 in visions prior to opening this just for that. 😄

@akx
Copy link
Contributor Author

akx commented Oct 1, 2022

So @fabclmnt - with the above provisions, could this be reopened (for discussion?)?

@fabclmnt
Copy link
Contributor

fabclmnt commented Oct 2, 2022

@akx the 2GB is a pain indeed and as I've mentioned before, we are happy to consider any other suggestion to have it sorted, different from the one suggested by this PR.

This suggestion does not fit the profile of the user and use-cases for which this package was designed for.

@akx
Copy link
Contributor Author

akx commented Oct 3, 2022

This suggestion does not fit the profile of the user and use-cases for which this package was designed for.

@fabclmnt I'm not sure I understand – can you explain what the profile and the use-cases are? For users who require precise knowledge of the distribution of Unicode blocks, categories and scripts in their data, they can install pandas-profiling[unicode]. For the rest, everything remains the same.

@fabclmnt
Copy link
Contributor

fabclmnt commented Oct 3, 2022

This suggestion does not fit the profile of the user and use-cases for which this package was designed for.

@fabclmnt I'm not sure I understand – can you explain what the profile and the use-cases are? For users who require precise knowledge of the distribution of Unicode blocks, categories and scripts in their data, they can install pandas-profiling[unicode]. For the rest, everything remains the same.

The point is precisely what you've mentioned, the user would have to know ahead that he needs or not to use the unicode at the moment of the installation which is not the case for our audience. I have particular examples and request of users that for the same use case they explore different datasets (with the need for the precise knowledge of the distribution of unicode blocks) but they only know it, whenever they use PP for the initial exploration and data understanding.

@fabclmnt
Copy link
Contributor

fabclmnt commented Oct 7, 2022

Re-openning this PR - after having a look, this is indeed the best option.

Can you please add a note to the documentation regarding this change?

@fabclmnt fabclmnt reopened this Oct 7, 2022
@fabclmnt fabclmnt self-requested a review October 7, 2022 15:31
@fabclmnt fabclmnt self-requested a review October 7, 2022 15:31
Copy link
Contributor

@fabclmnt fabclmnt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll just ask you to change the commit message and change to the develop branch.

@akx akx changed the base branch from master to develop October 8, 2022 15:18
@akx akx force-pushed the detangle branch 2 times, most recently from 3b7f70d to bdfb907 Compare October 8, 2022 15:34
@akx
Copy link
Contributor Author

akx commented Oct 8, 2022

@fabclmnt 👍 Changed base to develop, rebased, and amended readme and docs too :)

@akx akx changed the title Make tangled-up-in-unicode an optional dependency fix: make tangled-up-in-unicode an optional dependency Oct 8, 2022
@sbrugman
Copy link
Collaborator

sbrugman commented Oct 9, 2022

Apart from the location of the check, lgtm.

@fabclmnt fabclmnt self-requested a review October 11, 2022 22:51
Copy link
Contributor

@fabclmnt fabclmnt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 looking good to go. Thank you @akx!

@aquemy
Copy link
Contributor

aquemy commented Oct 18, 2022

Given the fact that we need to release and that the concern raised by @sbrugman is not a blocker or vital, I merge this.

Kudos to @akx !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants