FIX html escape regex #394

Merged: Wauplin merged 2 commits into main from fix-html-escape-regex on Aug 29, 2023

@Wauplin Wauplin commented Aug 29, 2023

Related to #373, which broke the docs in huggingface_hub (see #373 (comment)). The issue is caused by a docstring that contains both a "<" and a ">" that do not form an HTML tag (e.g. "... <5MB .... .... something -> something else"). This PR fixes it, hopefully without breaking other docs.

In detail:

  • Update _re_lt_html:
    • remove re.DOTALL (unused in the regex)
    • add re.IGNORECASE
    • update from \w+ to [a-z] after the first "<" => the start of a tag must be a letter, not any alphanumeric character. This way we don't capture strings like "<5MB" anymore. This update fixes the huggingface_hub issue.
  • Move _re_lt_html and _re_lcub_svelte regexes outside of convert_md_to_mdx to compile them only once.
  • Add a regression test.
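
To illustrate the change described above, here is a minimal sketch. The patterns `OLD_TAG` and `NEW_TAG` and the helper `escape_lt` are simplified, hypothetical stand-ins, not the actual `_re_lt_html` regex or the `convert_md_to_mdx` logic. The sketch only shows why requiring a letter after "<" stops "<5MB ... ->" from being treated as a tag, and why compiling at module level means the patterns are built once.

```python
import re

# Hypothetical, simplified patterns (NOT the real doc-builder regexes).
# Compiled at module level so they are built once, not on every call.
OLD_TAG = re.compile(r"<(/?\w+[^<>]*)>")                    # tag name may start with a digit
NEW_TAG = re.compile(r"<(/?[a-z][^<>]*)>", re.IGNORECASE)   # tag name must start with a letter


def escape_lt(text: str, tag_re: re.Pattern) -> str:
    """Escape every '<' that is not the start of something tag_re sees as a tag."""
    placeholder = "\x00LT\x00"
    # 1. Protect the '<' of anything that looks like a tag.
    protected = tag_re.sub(lambda m: placeholder + m.group(1) + ">", text)
    # 2. Escape every remaining '<'.
    escaped = protected.replace("<", "&lt;")
    # 3. Restore the protected tags.
    return escaped.replace(placeholder, "<")


doc = "Files <5MB: something -> something else, see <a href='x'>link</a>"

# Old pattern: "<5MB: something ->" is mistaken for a tag, so nothing is escaped.
print(escape_lt(doc, OLD_TAG))
# New pattern: "<5MB" is escaped, while the real <a> tag is left intact.
print(escape_lt(doc, NEW_TAG))
```

With the letter-only start, uppercase tags such as `<DIV>` would no longer match without `re.IGNORECASE`, which is presumably why the flag is added alongside the `[a-z]` change.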

@Wauplin Wauplin requested review from mishig25 and xenova August 29, 2023 13:20
@mishig25 mishig25 left a comment

lgtm!

Wauplin commented Aug 29, 2023

Thanks for the quick review @mishig25 !

@Wauplin Wauplin merged commit c8152d4 into main Aug 29, 2023
@Wauplin Wauplin deleted the fix-html-escape-regex branch August 29, 2023 13:34
mishig25 commented

@Wauplin please let me know if the hub client docs CI passes now 👍

Wauplin commented Aug 29, 2023

@mishig25 Just confirmed in https://github.com/huggingface/huggingface_hub/actions/runs/6013363450/job/16310680128 ! CI is green again 🎉
