Some Unicode emoji (🪩🫠, maybe others?) are categorized as None, breaking HTML rendering #1325

fizmat opened this issue May 15, 2023 · 5 comments
feature request 💬 Requests for new features


fizmat commented May 15, 2023

Current Behaviour

Rendering a report to HTML fails completely:

Summarize dataset: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 15.10it/s, Completed]
Generate report structure:   0%|                                                                                                                                                    | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/fizmat/Desktop/profiling-bug/", line 4, in <module>
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/ydata-bug/lib/python3.10/site-packages/typeguard/", line 1033, in wrapper
    retval = func(*args, **kwargs)
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/ydata-bug/lib/python3.10/site-packages/ydata_profiling/", line 461, in to_html
    return self.html
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/ydata-bug/lib/python3.10/site-packages/typeguard/", line 1033, in wrapper
    retval = func(*args, **kwargs)
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/ydata-bug/lib/python3.10/site-packages/ydata_profiling/", line 272, in html
    self._html = self._render_html()
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/ydata-bug/lib/python3.10/site-packages/typeguard/", line 1033, in wrapper
    retval = func(*args, **kwargs)
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/ydata-bug/lib/python3.10/site-packages/ydata_profiling/", line 380, in _render_html
    report =
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/ydata-bug/lib/python3.10/site-packages/typeguard/", line 1033, in wrapper
    retval = func(*args, **kwargs)
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/ydata-bug/lib/python3.10/site-packages/ydata_profiling/", line 266, in report
    self._report = get_report_structure(self.config, self.description_set)
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/ydata-bug/lib/python3.10/site-packages/ydata_profiling/report/structure/", line 383, in get_report_structure
    render_variables_section(config, summary),
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/ydata-bug/lib/python3.10/site-packages/ydata_profiling/report/structure/", line 159, in render_variables_section
    template_variables.update(render_map_type(config, template_variables))
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/ydata-bug/lib/python3.10/site-packages/ydata_profiling/report/structure/variables/", line 413, in render_categorical
    overview_table_char, unitab = render_categorical_unicode(config, summary, varid)
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/ydata-bug/lib/python3.10/site-packages/ydata_profiling/report/structure/variables/", line 139, in render_categorical_unicode
    category_alias_name = category_alias_name.replace("_", " ")
AttributeError: 'NoneType' object has no attribute 'replace'

Expected Behaviour

  • The report should render. These emoji should probably be categorized as "Other Symbol", just like 😀 and 🔥.
  • In general, an unexpected emoji in the data should not completely break rendering. Either of the following would work:
    1. unicode_summary_vc() should check if the returned category is None and replace it with a string. For example "None" or "Other Symbol", depending on your design philosophy.
    2. render_categorical_unicode() should work correctly when summary["category_alias_char_counts"] contains a None key instead of a string.
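
Option 1 could look roughly like the sketch below. This is not ydata-profiling's code: `CATEGORY_ALIASES` and `safe_category_alias` are hypothetical names, the mapping is deliberately partial, and the standard-library `unicodedata` module stands in for whichever Unicode backend is actually in use. The only point is that the lookup never hands a None back to the caller.

```python
import unicodedata

# Partial, illustrative mapping from Unicode general-category codes to
# readable aliases. A real table would cover every category code.
CATEGORY_ALIASES = {
    "Lu": "Uppercase_Letter",
    "Ll": "Lowercase_Letter",
    "Nd": "Decimal_Number",
    "So": "Other_Symbol",
    "Cn": "Unassigned",
}


def safe_category_alias(char: str, fallback: str = "Unknown") -> str:
    """Return a readable category alias for `char`, never None.

    Hypothetical helper: codes missing from the table fall back to a
    placeholder string instead of propagating None to the renderer.
    """
    code = unicodedata.category(char)
    return CATEGORY_ALIASES.get(code) or fallback


if __name__ == "__main__":
    for ch in ["a", "🔥", "🪩", "\u0300"]:
        print(repr(ch), safe_category_alias(ch))
```

Which alias 🪩 gets here depends on the Unicode database bundled with the interpreter, so the result for that character may differ between Python versions. In every case it is a string.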

Maybe this is related to #1068 and #1070 and the two supported Unicode dependencies behaving differently?
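
To help check that theory, a small diagnostic like the one below could be run in the affected environment. It uses only the standard library. The module name `tangled_up_in_unicode` is my assumption about what the optional Unicode package installs, and `unicode_diagnostics` is a hypothetical helper, not part of ydata-profiling.

```python
import importlib.util
import unicodedata


def unicode_diagnostics(chars):
    """Collect facts about the Unicode data available in this environment."""
    return {
        # Version of the Unicode database bundled with this Python build.
        "stdlib_unicode_version": unicodedata.unidata_version,
        # Whether the optional package is importable (module name assumed).
        "has_tangled_up_in_unicode": (
            importlib.util.find_spec("tangled_up_in_unicode") is not None
        ),
        # General category the standard library reports for each character.
        "stdlib_categories": {c: unicodedata.category(c) for c in chars},
    }


if __name__ == "__main__":
    print(unicode_diagnostics("😀🔥🪩🫠"))
```

If the standard library reports a different category for 🪩 or 🫠 than for 😀 and 🔥, that would suggest the bundled Unicode data predates those characters.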

Data Description

Originally encountered with, but even a minimal example works.

Code that reproduces the bug

import pandas as pd
from ydata_profiling import ProfileReport
rep = ProfileReport(pd.DataFrame({'a': ['🪩']}))
rep.to_html()  # rendering is the call that fails in the traceback above

pandas-profiling version



MacOS 13.3.1, Google Colab


  • There is not yet another bug report for this issue in the issue tracker
  • The problem is reproducible from this bug report. This guide can help to craft a minimal bug report.
  • The issue has not been resolved by the entries listed under Common Issues.
fabclmnt added feature request 💬 label May 16, 2023
Same here. It happens not only in the HTML output but also in the notebook widget.


fabclmnt commented Jun 3, 2023

@fizmat and @lala7573, have you tried following the instructions to install the unicode tangler?

pip install -U ydata-profiling[unicode]

fayewu commented Jun 5, 2023

same here

@fabclmnt I am also encountering this problem, and that installation did not solve the problem.

As a workaround, one can simply ignore these keys. At this line:

    for category_alias_name, category_alias_counts in sorted(
        summary["category_alias_char_counts"].items(), key=lambda x: -len(x[1])
    ):
        category_alias_name = category_alias_name.replace("_", " ")

Replace it with

    for category_alias_name, category_alias_counts in sorted(
        summary["category_alias_char_counts"].items(), key=lambda x: -len(x[1])
    ):
        if category_alias_name is None:
            continue  # skip characters whose category could not be resolved
        category_alias_name = category_alias_name.replace("_", " ")

Or, to label them instead of skipping them:

    for category_alias_name, category_alias_counts in sorted(
        summary["category_alias_char_counts"].items(), key=lambda x: -len(x[1])
    ):
        if category_alias_name is None:
            category_alias_name = "None"
        category_alias_name = category_alias_name.replace("_", " ")
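
For anyone who wants to try the idea outside the library, here is a self-contained sketch of the same None handling. `normalized_alias_items` and the sample dictionary are made up for illustration. Only the sort key and the `replace("_", " ")` step mirror the loop quoted above.

```python
from typing import Dict, List, Optional, Tuple


def normalized_alias_items(
    alias_char_counts: Dict[Optional[str], Dict[str, int]],
    placeholder: str = "None",
) -> List[Tuple[str, Dict[str, int]]]:
    """Sort categories by number of distinct characters (descending) and
    make every category name a display-ready string.

    A None key, which is what triggers the AttributeError, is replaced by
    `placeholder` before the underscore replacement runs.
    """
    items = sorted(alias_char_counts.items(), key=lambda x: -len(x[1]))
    result = []
    for name, counts in items:
        if name is None:
            name = placeholder
        result.append((name.replace("_", " "), counts))
    return result


if __name__ == "__main__":
    # Made-up sample shaped like summary["category_alias_char_counts"].
    sample = {"Other_Symbol": {"😀": 1, "🔥": 2}, None: {"🪩": 1}}
    for name, counts in normalized_alias_items(sample):
        print(name, counts)
```

Substituting a placeholder keeps the affected characters visible in the report, whereas skipping the key silently drops them. Which of the two is preferable is a design decision for the maintainers.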

desobolevsky commented Jul 30, 2024

Hey everyone! Made a PR #1632 on this matter, since the previous one isn't merged or supported. I'll be happy to update or correct everything to the latest code updates :)

