Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

t/Tools_Text.t test failing in Sympa 6.2.54 #892

Closed
dpc22 opened this issue Feb 26, 2020 · 3 comments · Fixed by #894
Closed

t/Tools_Text.t test failing in Sympa 6.2.54 #892

dpc22 opened this issue Feb 26, 2020 · 3 comments · Fixed by #894
Labels
bug ready A PR is waiting to be merged. Close to be solved test
Milestone

Comments

@dpc22
Copy link
Contributor

dpc22 commented Feb 26, 2020

Version

6.2.54

Installation method

Building from source

Expected behavior

"make check" should pass all tests.

Actual behavior

One of the tests in t/Tools_Text.t fails unless the Unicode::UTF8 Perl module is installed

That is currently a "recommends" rather than a "requires" in cpanfile.

#   Failed test at t/Tools_Text.t line 52.
#          got: '<U+D800>
# <U+10FFFE>
# <U+110000>
# <U+200000>
# '
#     expected: '���
# �
# ����
# �����
# '
# Looks like you failed 1 test of 13.
t/Tools_Text.t ........... 
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/13 subtests 

Additional information

Appears to be consequence of commit 43d3041

I attach a minimal test script which exhibits the problem on both Ubuntu 18.04 (Perl 5.26.1) and RHEL 7 (Perl 5.16.3)

test.pl.txt

@ikedas ikedas added this to the 6.2.56 milestone Feb 27, 2020
@ikedas ikedas added the ready A PR is waiting to be merged. Close to be solved label Feb 27, 2020
@ikedas
Copy link
Member

ikedas commented Feb 27, 2020

Hi @dpc22
Could you please check the PR above?

@dpc22
Copy link
Contributor Author

dpc22 commented Feb 27, 2020

That looks plausible to me if the test should only be run when the Unicode::UTF8 Perl module is installed.

I infer that the optional modules are attempting to clean up broken UTF-8? It isn't clear to me what happens with Encode::encode_utf8($utext) when they aren't available.

@ikedas
Copy link
Member

ikedas commented Feb 27, 2020

sub canonic_text {
my $text = shift;
return undef unless defined $text;
# Normalize text. See also discussion on
# https://listes.renater.fr/sympa/arc/sympa-developpers/2018-03/thrd1.html
#
# N.B.: Corresponding modules are optional by now, and should be
# mandatory in the future.
my $utext;
if (Encode::is_utf8($text)) {
$utext = $text;
} elsif ($Unicode::UTF8::VERSION) {
no warnings 'utf8';
$utext = Unicode::UTF8::decode_utf8($text);
} else {
$utext = Encode::decode_utf8($text);
}
if ($Unicode::Normalize::VERSION) {
$utext = Unicode::Normalize::normalize('NFC', $utext);
}
# Remove DOS linefeeds (^M) that cause problems with Outlook 98, AOL,
# and EIMS:
$utext =~ s/\r\n|\r/\n/g;
if (Encode::is_utf8($text)) {
return $utext;
} else {
return Encode::encode_utf8($utext);
}
}

As sanytization and normalization by Unicode::UTF8 and Unicode::Normalize handle Unicode text, all inputs of this function have to be converted to Unicode text.
OTOH, Unicode text (utf8 flag is on) and binary text (utf8 flag is off) must not be mixed, or text will be broken. That's why the output should be binary text as the input is. And that's why the last Encode::encode_utf8() is needed to convert output to binary,

ikedas added a commit that referenced this issue Feb 27, 2020
t/Tools_Text.t fails without optional modules (#892)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug ready A PR is waiting to be merged. Close to be solved test
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants