fix: add keep_empty to st clean_text
engisalor committed Jul 16, 2024
1 parent 81bf898 commit 398a371
Showing 1 changed file with 2 additions and 2 deletions.
pipeline/stanza/base_pipeline.py: 4 changes (2 additions, 2 deletions)
@@ -74,10 +74,10 @@ def wrap_lines(match, hunspell: bool = False, dictionaries="en_US,es_ES,fr_FR"):
     return match.group()
 
 
-def clean_text(text: str) -> str:
+def clean_text(text: str, keep_empty=False) -> str:
     """Cleans texts to prepare for passing to an NLP pipeline."""
     lines = text.split("\n")
-    lines = [uninorm.normalize_line(x) for x in lines]
+    lines = [uninorm.normalize_line(x, keep_empty=keep_empty) for x in lines]
     return "".join(lines)


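The change only threads a new `keep_empty` argument from `clean_text` through to `uninorm.normalize_line`; what the flag actually does is defined in `uninorm`, which this commit does not show. The sketch below illustrates the pass-through pattern with a hypothetical stand-in for `normalize_line`. It assumes (not confirmed by the source) that `normalize_line` returns each line with a trailing newline, so that `"".join(lines)` reassembles the text, and that `keep_empty=True` keeps blank lines that would otherwise be dropped. The NFC normalization step is likewise an illustrative assumption.

```python
import unicodedata


def normalize_line(line: str, keep_empty: bool = False) -> str:
    """Hypothetical stand-in for uninorm.normalize_line (not its real logic).

    Assumed behavior: NFC-normalize and strip the line; blank lines become ""
    unless keep_empty is True, in which case a bare newline is kept.
    Non-blank lines are returned with a trailing newline.
    """
    stripped = unicodedata.normalize("NFC", line).strip()
    if not stripped:
        # keep_empty decides whether a blank line survives as "\n"
        return "\n" if keep_empty else ""
    return stripped + "\n"


def clean_text(text: str, keep_empty: bool = False) -> str:
    """Mirrors the patched clean_text: forward keep_empty to every line."""
    lines = text.split("\n")
    lines = [normalize_line(x, keep_empty=keep_empty) for x in lines]
    return "".join(lines)
```

Under these assumptions, `clean_text("First.\n\nSecond.")` collapses the blank line and returns `"First.\nSecond.\n"`, while passing `keep_empty=True` returns `"First.\n\nSecond.\n"`, preserving the paragraph break. Defaulting the parameter to `False` keeps existing callers of `clean_text` on their previous behavior.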