Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PostgreSQL: Issues related to utf8 flag #305

Closed
ldidry opened this issue May 15, 2018 · 20 comments
Closed

PostgreSQL: Issues related to utf8 flag #305

ldidry opened this issue May 15, 2018 · 20 comments
Labels
Milestone

Comments

@ldidry
Copy link
Contributor

ldidry commented May 15, 2018

Disclaimer I provided the problematic mail to some Sympa developers but will not provide it here since it contains personal informations. And I anonymized the informations in this bug report

For the record, here's a message I posted on sympa-fr:
https://listes.renater.fr/sympa/arc/sympa-fr/2017-10/msg00015.html

My configuration:
Debian 9, PostgreSQL 9.6, Sympa 6.2.16

Locales configuration:

# locale
LANG=fr_FR.UTF-8
LANGUAGE=
LC_CTYPE="fr_FR.UTF-8"
LC_NUMERIC="fr_FR.UTF-8"
LC_TIME="fr_FR.UTF-8"
LC_COLLATE="fr_FR.UTF-8"
LC_MONETARY="fr_FR.UTF-8"
LC_MESSAGES="fr_FR.UTF-8"
LC_PAPER="fr_FR.UTF-8"
LC_NAME="fr_FR.UTF-8"
LC_ADDRESS="fr_FR.UTF-8"
LC_TELEPHONE="fr_FR.UTF-8"
LC_MEASUREMENT="fr_FR.UTF-8"
LC_IDENTIFICATION="fr_FR.UTF-8"
LC_ALL=

For the DB:

    Nom    | Propriétaire | Encodage | Collationnement | Type caract.
-----------+--------------+----------+-----------------+-------------
 sympa     | sympa        | UTF8     | fr_FR.UTF-8     | fr_FR.UTF-8

Some characters crash sympa_msg: or . By deleting them, there's
no more crash.

Some characters are badly encoded in the database, even simple é
(which are current in french first name).

Here's a crash log:

DIED: Cannot decode string with wide characters at /usr/lib/x86_64-linux-gnu/perl/5.24/Encode.pm line 243.
  at /usr/lib/x86_64-linux-gnu/perl/5.24/Encode.pm line 243.
         Encode::decode_utf8('XXXXXX Fran\x{fffd}ois') called at /home/sympa/bin/Sympa/DatabaseDriver/PostgreSQL.pm line 93
         
Sympa::DatabaseDriver::PostgreSQL::do_prepared_query(Sympa::DatabaseDriver::PostgreSQL <db_host=localhost;db_name=XXX;db_port=5432;db_user=XXX>, 'INSERT INTO subscriber_table\x{a}                      (user_subs...', 'XXX@XXXX.net', 'XXXX Fran\x{fffd}ois', 'list_name', 'framalistes.org', 'mail', undef, ...) called at /home/sympa/bin/Sympa/List.pm line 3982
         Sympa::List::add_list_member(Sympa::List <list_name@framalistes.org>, HASH(0x558ee2091a70)) called at /home/sympa/bin/Sympa/Request/Handler/add.pm line 76
         
Sympa::Request::Handler::add::_twist(Sympa::Spindle::ProcessMessage=HASH(0x558ee241a6c8), Sympa::Request <action=add;context=list_name@framalistes.org;email=XXX@XXX.net>) called at /home/sympa/bin/Sympa/Spindle.pm line 92
         
Sympa::Spindle::spin(Sympa::Spindle::ProcessMessage=HASH(0x558ee241a6c8)) called at /home/sympa/bin/Sympa/Spindle/DoCommand.pm line 117
         
Sympa::Spindle::DoCommand::_twist(Sympa::Spindle::ProcessIncoming=HASH(0x558ee244b0b0), Sympa::Message <sympa@framalistes.org.1515358089.11027>) called at /home/sympa/bin/Sympa/Spindle.pm line 92
         
Sympa::Spindle::spin(Sympa::Spindle::ProcessIncoming=HASH(0x558ee244b0b0)) called at /home/sympa/bin/sympa_msg.pl line 240

I digged a little with a small perl script. This one don't work:

#!/usr/bin/perl
use strict;
use warnings;
use 5.10.0;
use Encode qw(decode_utf8);

open my $fh, '<', 'sympa@framalistes.org.1515533067.24289';
while(defined(my $foo = <$fh>)) {
     chomp $foo;
     $foo = decode_utf8($foo);
     say $foo if $foo =~ m/XXX.jean-rene/;
}
% ./test.pl
QUIET ADD list_name XXX.jean-rene@XXX.fr Jean-Ren�

But it works with binmode:

#!/usr/bin/perl
use strict;
use warnings;
use 5.10.0;

open my $fh, '<', 'sympa@framalistes.org.1515533067.24289';
while(defined(my $foo = <$fh>)) {
     binmode(STDIN,  ":utf8");
     chomp $foo;
     say $foo if $foo =~ m/XXX.jean-rene/;
}
% ./test.pl
QUIET ADD list_name XXX.jean-rene@XXX.fr Jean-René

That's all that I have and found.

@ldidry ldidry added the bug label May 15, 2018
@ikedas
Copy link
Member

ikedas commented May 16, 2018

Hi @ldidry,
Additionally, can you show us this information?

  • Result of sympa_wizard.pl --check (Note: run this command as non-root).

@ldidry
Copy link
Contributor Author

ldidry commented May 16, 2018

Ok

Vérification de la version de l'interpréteur Perl :

Votre version de Perl est suffisante (5.024001 >= 5.008)

Vérification des modules OBLIGATOIRES :

module Perl             depuis CPAN             STATUS
-----------             ---------               ------
Archive::Zip            Archive-Zip             OK (1.59   >= 1.05)
CGI                     CGI                     OK (4.35   >= 3.51)
Class::Singleton        Class-Singleton         OK (1.5    >= 1.03)
DBD::Pg                 DBD-Pg                  OK (3.5.3  >= 2.00)
DBI                     DBI                     OK (1.636  >= 1.48)
DateTime::Format::Mail  DateTime-Format-Mail    OK (0.403  >= 0.28)
DateTime::TimeZone      DateTime-TimeZone       OK (2.13   >= 0.59)
Digest::MD5             Digest-MD5              OK (2.54   >= 2.00)
Encode                  Encode                  OK (2.80_01 >= 1.0)
File::Copy::Recursive   File-Copy-Recursive     OK (0.38   >= 0.36)
File::NFSLock           File-NFSLock            OK (1.27   >= 1.0)
File::Path              File-Path               OK (2.1201 >= 2.08)
HTML::FormatText        HTML-Format             OK (2.12   >= 1.0)
HTML::StripScripts::ParserHTML-StripScripts-ParserOK (1.03   >= 1.03)
HTML::TreeBuilder       HTML-Tree               OK (5.03   >= 1.0)
IO::File                IO                      OK (1.16   >= 1.10)
IO::Scalar              IO-stringy              OK (2.111  >= 1.0)
LWP::UserAgent          libwww-perl             OK (6.15   >= 1.0)
List::Util::XS          Scalar-List-Utils       OK (1.49   >= 1.20)
Locale::Messages        libintl-perl            OK (1.26   >= 1.20)
MHonArc::UTF8           MHonArc                 OK (2.6.19 >= 2.6.18)
MIME::Base64            MIME-Base64             OK (3.15   >= 3.03)
MIME::Charset           MIME-Charset            OK (1.012  >= 1.011.3)
MIME::EncWords          MIME-EncWords           OK (1.014.3 >= 1.014)
MIME::Lite::HTML        MIME-Lite-HTML          OK (1.24   >= 1.23)
MIME::Tools             MIME-tools              OK (5.508  >= 5.423)
Mail::Address           MailTools               OK (2.18   >= 1.70)
Net::CIDR               Net-CIDR                OK (0.18   >= 0.16)
Sys::Syslog             Sys-Syslog              OK (0.33_01 >= 0.03)
Template                Template-Toolkit        OK (2.24   >= 2.21)
Term::ProgressBar       Term-ProgressBar        OK (2.18   >= 2.09)
Text::LineFold          Unicode-LineBreak       OK (2016.00702 >= 2011.05)
Time::HiRes             Time-HiRes              OK (1.9733 >= 1.29)
URI::Escape             URI                     OK (3.31   >= 3.28)
XML::LibXML             XML-LibXML              OK (2.0128 >= 1.70)

Vérification des modules OPTIONNELS :

module Perl             depuis CPAN             STATUS
-----------             ---------               ------
AuthCAS                 AuthCAS                 n'a pas été trouvé sur le sytème.
Setting FTP Passive mode
## Vous devez avoir les privilèges root pour installer le module AuthCAS. ##
## Appuyer sur Entrée pour continuer la vérfification des modules. ##

CGI::Fast               CGI-Fast                OK (2.12   >= 1.08)
Crypt::CipherSaber      Crypt-CipherSaber       OK (0.61   >= 0.50)
Crypt::OpenSSL::X509    Crypt-OpenSSL-X509      OK (1.807  >= 1.800.1)
Crypt::SMIME            Crypt-SMIME             OK (0.19   >= 0.15)
DBD::CSV                DBD-CSV                 version trop ancienne (0.49 < 1.0)
[Wed May 16 09:26:30 2018] sympa_wizard.pl: Use of uninitialized value in printf at ./sympa_wizard.pl line 572, <STDIN> line 2.
>>>>>>> Vous devez mettre à jour "DBD-CSV" vers la version "". <<<<<<
## Vous devez avoir les privilèges root pour installer le module DBD::CSV. ##
## Appuyer sur Entrée pour continuer la vérfification des modules. ##

DBD::ODBC               DBD-ODBC                n'a pas été trouvé sur le sytème.
## Vous devez avoir les privilèges root pour installer le module DBD::ODBC. ##
## Appuyer sur Entrée pour continuer la vérfification des modules. ##

DBD::Oracle             DBD-Oracle              n'a pas été trouvé sur le sytème.
## Vous devez avoir les privilèges root pour installer le module DBD::Oracle. ##
## Appuyer sur Entrée pour continuer la vérfification des modules. ##

DBD::SQLite             DBD-SQLite              OK (1.54   >= 1.31)
DBD::Sybase             DBD-Sybase              n'a pas été trouvé sur le sytème.
## Vous devez avoir les privilèges root pour installer le module DBD::Sybase. ##
## Appuyer sur Entrée pour continuer la vérfification des modules. ##

DBD::mysql              DBD-mysql               OK (4.041  >= 4.008)
Data::Password          Data-Password           OK (1.12   >= 1.07)
Encode::Locale          Encode-Locale           OK (1.05   >= 1.02)
FCGI                    CGI-Fast                OK (0.78   >= 0.67)
IO::Socket::SSL         IO-Socket-SSL           OK (2.044  >= 0.90)
Mail::DKIM::Verifier    Mail-DKIM               OK (0.4    >= 0.37)
Net::DNS                Net-DNS                 OK (1.07   >= 0.65)
Net::LDAP               perl-ldap               OK (0.65   >= 0.40)
Net::SMTP               libnet                  OK (3.08_01 >= 1.0)
SOAP::Lite              SOAP-Lite               OK (1.20   >= 0.712)

@ikedas
Copy link
Member

ikedas commented May 16, 2018

@ldidry,
According to log above, error is caused by a line including e-mail *@*.net and name François, but I couldn't find such line in the message you sent. Could you please confirm the message you sent is correct?

@ldidry
Copy link
Contributor Author

ldidry commented May 16, 2018

Well, the crash log was not related to that message, but it was the same kind of message (same sender, same commands).

This user usually send a lot of subscription for a lot of lists in a row (imagine ±500 subscriptions for ±20 lists, in 20 mails), I just picked one crash log and one mail while debugging its mails.

@ikedas
Copy link
Member

ikedas commented May 16, 2018

I want to see the message just related to log (or log related to message), unless such messages by that user always crashes Sympa. In such cases the problem can be related to the content of messages.

@ldidry
Copy link
Contributor Author

ldidry commented May 16, 2018

Well… characters like or (I'm not sure for ç, but I remember that I clean the mails from all characters other than some common accent like é or è) in mails from this user make Sympa crash. By removing them by editing the mail, it works, sympa_msg doesn't crash.

I just find something weird that give a clue for the source of badly encoded gecos in DB: a list (let's call it A) with apparently no pb in the gecos of users (I checked in web interface and DB) is a data source for another list (B). When syncing the data source, the B list got bad encoding!
And the strangest thing is that it's only one user that got its gecos badly encoded (an é), while other subscribers of the A list have é in their gecos too, and some other got î or ô and they are included just fine.
I checked, the included subscribers with no pb are only subscribed to the B list by the A list data source.

You may need to know that the user that send the mail that crashes sympa_msg use a lot of lists as data source for other lists (I don't know if that's relevant, but it's better to have too much informations than not enough).

@ldidry
Copy link
Contributor Author

ldidry commented May 16, 2018

By the way, I've got a script that tracks badly encoded gecos. Here's the relevant part:

SELECT distinct(comment_subscriber),user_subscriber,list_subscriber FROM subscriber_table WHERE comment_subscriber IS NOT NULL ORDER BY user_subscriber" sympa | grep -P "[^\x00-\x7FéèçàïëîêâûüùÉÈÊÀÇÏÎËÂÛÙÄùÜ ôÔöÖã¿ñóá°¿¿í¿¿¿¿¿æÆœŒồ“”’Рая¨õ¿ú⋅ɴᴅʀᴇ́Áč•òÿäριστός²–]

I don't think it will help you, but… better to have it.

I forgot to say that we got encoding pbs in subscriber_table and user_table but less frequently for user_table.

@ldidry
Copy link
Contributor Author

ldidry commented May 16, 2018

I confirm the list inclusion problem: I just saw the same behavior with another list inclusion in another list (completely different people).

@ldidry
Copy link
Contributor Author

ldidry commented May 16, 2018

New information: it seems that the problem with the encoding in user_table has been triggered this morning by changing the owner email from a list. Its gecos has then been badly encoded in DB.

I don't know if it's relevant, but the user first made a mistake in the new email address (gmail;com instead of gmail.com) before changing the address for a valid one.

@ikedas
Copy link
Member

ikedas commented May 16, 2018

Well… characters like or (I'm not sure for ç, but I remember that I clean the mails from all characters other than some common accent like é or è) in mails from this user make Sympa crash. By removing them by editing the mail, it works, sympa_msg doesn't crash.

I guess that byte is "\xE7", octal 347 (you can see values of bytes using od(1) etc.).

Anyways illegal bytes should not crash sympa.pl. I'll investigate code to fix it in a few days.

@ikedas
Copy link
Member

ikedas commented May 17, 2018

@ldidry, now I understand why you told about database.

Does this patch fix the problem?

@ldidry
Copy link
Contributor Author

ldidry commented May 17, 2018

It solved at least #308. I need to wait one or two days to see if it solved this issue.

After that, I'll try with DBD::Pg 3.6 or later.

@ldidry
Copy link
Contributor Author

ldidry commented May 17, 2018

Great job, thank you! 👍

@ikedas ikedas added this to the 6.2.34 milestone May 18, 2018
@ldidry
Copy link
Contributor Author

ldidry commented May 22, 2018

I tried with DBD::PG 3.7.4, it maked sympa_msg, archived and task_manager crash a few minutes after the restart. I'll try 3.6+ later.

@ikedas
Copy link
Member

ikedas commented May 23, 2018

Ok, if both 3.6.x and 3.7.4 crashes, could you please apply this additional patch and check if problem will be solved? Thanks.

@ikedas ikedas changed the title Encoding issue PostgreSQL: Issues related to utf8 flag May 28, 2018
@ldidry
Copy link
Contributor Author

ldidry commented May 28, 2018

Your additional patch seems to work very well: I applied it and installed DBD::PG 3.7.4, all is still working well.

👍

ikedas added a commit that referenced this issue May 28, 2018
Candidate fix for #305, #308: PostgreSQL: Issues related to utf8 flag
@ikedas
Copy link
Member

ikedas commented May 28, 2018

@ldidry, thanks!
IMHO Sympa has emphasized LAMP and tend to neglect other DB. Your help improved support for PostgrSQL much!

@ikedas ikedas closed this as completed May 28, 2018
@racke
Copy link
Contributor

racke commented May 28, 2018 via email

@ldidry
Copy link
Contributor Author

ldidry commented May 28, 2018

No need to thank me: I just had the pb and tried the proposed patches 😉

it didn't show this problem in the instances I administered.

We were in an older version of Sympa on an old Debian (wheezy I think), and migrated Sympa version AND migrated to another server with Jessie. Maybe something went wrong when I migrated the database.

Then we opened the service to a lot of people (we now have 238,548 users in user_table. We didn't had the problem for more than a year, but… the more users you have, the more likely you are to find a bug.

@racke
Copy link
Contributor

racke commented May 28, 2018

After all, we wouldn't have a fix if you didn't tried these patches 😁 @ldidry.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants