Fix locale check when checking for non-printable chars while validating data dirs #4934

Status: Open, wants to merge 1 commit into master

rkjaran (Contributor) commented Sep 9, 2024

As far as I know, locale -a returns results in the form C.utf8, en_US.utf8, etc., not C.UTF-8; at least it does on the machines I have access to. This means that the check for non-printable characters in validate_data_dir.sh fails on any system that doesn't have the en_US.UTF-8 locale installed.

lumpidu commented Sep 10, 2024

Your MR should probably be changed to accept both forms.

For example, on OSX, locale -a always returns names with the .UTF-8 suffix.

On my Ubuntu 20.04 machine, it returns en_US.utf8, but also C.UTF-8.
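
One way to accept both forms, as a rough sketch rather than the change proposed in this PR: match the codeset case-insensitively and make the hyphen optional. The helper name has_utf8_locale is hypothetical, and it reads the locale list from stdin so it can be tried without touching the system configuration.

```shell
# Hypothetical helper (not part of validate_data_dir.sh): succeeds if the
# locale list on stdin contains the given locale with either spelling of
# the codeset, e.g. C.UTF-8 or C.utf8.
has_utf8_locale() {
  # $1 is the locale prefix, e.g. "C" or "en_US".
  # -i ignores case, -E enables the optional hyphen via "-?",
  # and the ^...$ anchors avoid matching longer names.
  grep -qiE "^$1\.utf-?8\$"
}

# Assumed usage:
#   locale -a | has_utf8_locale C && echo "a C UTF-8 locale is installed"
```

Anchoring the pattern and escaping the dot also avoids accidental matches that an unanchored pattern such as "C.UTF-8" could produce, since an unescaped dot matches any character.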

rkjaran force-pushed the fix-validate-data-dir branch from d909b33 to 4e2ec4d on September 10, 2024 11:34
@@ -126,7 +126,7 @@ fi
 num_utts=`cat $tmpdir/utts | wc -l`
 if ! $no_text; then
   if ! $non_print; then
-    if locale -a | grep "C.UTF-8" >/dev/null; then
+    if locale -a | grep "C.utf8\|C.UTF-8" >/dev/null; then
Review comment from a Contributor on the changed line:

Do we know that locales will accept different spellings?
It would be safer to add another clause to the if-statement to set L=C.utf8 if that's what is available.
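
The extra clause suggested here could look roughly like the sketch below, which sets L to whichever spelling locale -a actually reports instead of assuming the two spellings are interchangeable. The function name pick_utf8_locale and the candidate order are assumptions made for illustration, not code from this PR; the list is read from stdin to keep the sketch testable.

```shell
# Hypothetical helper: print the first UTF-8 locale found in the list on
# stdin, using exactly the spelling that was reported there.
pick_utf8_locale() {
  avail=$(cat)
  # Candidate order is an assumption: prefer the C locale, then en_US.
  for cand in C.UTF-8 C.utf8 en_US.UTF-8 en_US.utf8; do
    # -x matches whole lines only, -F treats the name as a fixed string.
    if printf '%s\n' "$avail" | grep -qxF "$cand"; then
      printf '%s\n' "$cand"
      return 0
    fi
  done
  return 1
}

# Assumed usage; the en_US.UTF-8 fallback is a guess, not taken from the diff:
#   L=$(locale -a | pick_utf8_locale) || L=en_US.UTF-8
```

Because the helper echoes the installed name verbatim, the script never has to rely on the system accepting an alternative spelling of the same locale.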

3 participants