threads + utf8::all == boom #42
Hi Michael, I am now thinking an option sounds best, though it does have the problems you mention. Regardless of how we implement this, we must add a warning to the documentation about threading on Perl < v5.24. If we decide not to go for an option, I like your second option best, but then I would suggest adding another check (e.g., …). Your thoughts, please 😃
Yes to clear docs about this. I don't like breaking backwards compatibility, but I can't think of a scenario where … Another backwards-compatibility consideration is that utf8::all does not have a large chain of dependent modules relying on its behavior. Very few CPAN modules depend on utf8::all, and they're mostly apps that nothing else depends on. So I don't expect a large cascade of hidden consequences. How about something along the lines of...
The target audience for utf8::all is people who don't need or want to know the details; they just want to work safely with UTF-8 data. Leaving the published details of exactly which encoding it will choose vague discourages users from relying on them. It also allows utf8::all to change how it chooses the encoding, like adding a check for … I deliberately left out an option for users to choose their own encoding. That would let two pieces of code in the same process want two different encodings, and because utf8::all has global effects (@ARGV, STDOUT, STDERR, STDIN), that would be difficult to swap out safely on a lexical basis.
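To illustrate why a per-caller encoding is hard to support, here is a rough sketch of the kind of process-wide setup involved. It is an approximation written for this discussion, not utf8::all's actual implementation; `decode_args`, `setup_utf8_globals` and the package `Some::Other::Module` are hypothetical names. It decodes `@ARGV` and pushes an encoding layer onto the standard handles, and a second package then sees exactly the same handle state.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sketch, NOT utf8::all's real code: the kind of
# process-wide setup a "just make everything UTF-8" module performs.

# Decode a list of UTF-8 byte strings into Perl character strings.
sub decode_args {
    return map { decode('UTF-8', $_) } @_;
}

sub setup_utf8_globals {
    my ($layer) = @_;
    # @ARGV always lives in package main, so this affects everyone.
    @ARGV = decode_args(@ARGV);
    # STDIN/STDOUT/STDERR are likewise shared by every package.
    binmode $_, $layer for \*STDIN, \*STDOUT, \*STDERR;
}

setup_utf8_globals(':encoding(UTF-8)');

package Some::Other::Module;    # hypothetical second package

# A different package sees the very same STDOUT layers; there is no
# lexical scope in which a caller could ask for a different encoding.
my @layers = PerlIO::get_layers(*STDOUT);
print "STDOUT layers: @layers\n";
```

If a second caller asked for a different layer, it could only do so by calling `binmode` on the same shared handles again, which changes them for every other package as well.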
There is a third option here: use PerlIO::utf8_strict. It's about as performant as `:utf8`, but actually validates the data (similar to …). That means we don't have to sacrifice correctness and security for stability.
@Leont Sounds like a good option, but I can't get it to work with …
Leon suggested using PerlIO::utf8_strict as the default. doherty#42 (comment) It seems to work. I'm assuming there's still value in trying to use :encoding(UTF-8)? Maybe it's faster?
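One way to prefer PerlIO::utf8_strict while keeping `:encoding(UTF-8)` as a fallback is sketched below. This is an illustration only: `preferred_layer` is a hypothetical helper, not part of utf8::all, and it assumes the `:utf8_strict` layer name that PerlIO::utf8_strict registers once loaded. It makes no claim about which layer is faster.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of utf8::all): prefer the validating
# :utf8_strict layer from the CPAN module PerlIO::utf8_strict, and fall
# back to :encoding(UTF-8) when that module is not installed.
sub preferred_layer {
    # require dies if the module can't be found; eval turns that into false.
    my $have_strict = eval { require PerlIO::utf8_strict; 1 };
    return $have_strict ? ':utf8_strict' : ':encoding(UTF-8)';
}

my $layer = preferred_layer();
binmode STDOUT, $layer or die "binmode STDOUT, '$layer' failed: $!";
print "STDOUT now uses $layer\n";
```

Both branches validate input, so the fallback only changes which implementation does the checking, not whether it happens.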
Apparently open.pm is special-cased so that for …
As demonstrated in the code below, utf8::all and threads == segfault for Perl < 5.24.
It's a long, long, long-standing bug that `:encoding(utf-8)` + threads == segfault. It was only just fixed. See https://rt.perl.org/Public/Bug/Display.html?id=31923

This is the root cause of a long-standing issue in perl5i: evalEmpire/perl5i#271
Since fork() on Windows uses threads, this also means utf8::all + fork() == boom on Windows.
One workaround would be to switch to `:utf8`. This is one of my preferred options. It makes utf8::all just work (its goal). OTOH this introduces a backwards-compatibility problem. OT3H most people don't know the difference between the two modes and will never notice.

Another is to use `:utf8` when Perl < 5.24. This would also be one of my preferred options.

Another is to use `:utf8` when threads are on. I don't like this because A) it introduces an ordering dependency between utf8::all and threads, and B) utf8::all would then act differently depending on whether threads are on or off.

Another is to add an option to use `:utf8`. I don't like this because A) it throws the problem onto the user; they probably don't even know there is a problem and will have to track down a frustrating segfault. B) `STD*` and `@ARGV` are effectively global state, so what happens when `use utf8::all` and `use utf8::all ":utf8"` are in the same process?

LMK what you'd like and I'll code it up.
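A minimal sketch of the second option (use `:utf8` only when Perl < 5.24), assuming the layer is picked once when the module is loaded. `choose_layer` is a hypothetical name, not utf8::all's actual code; `$]` is Perl's built-in numeric version of the running interpreter (e.g. 5.022001 for 5.22.1).

```perl
use strict;
use warnings;

# Hypothetical helper (not utf8::all's real code): pick the layer for the
# standard handles from the Perl version alone.
sub choose_layer {
    my ($perl_version) = @_;
    # :encoding(UTF-8) + threads can segfault before 5.24, so fall back
    # to the less strict :utf8 layer on those perls.
    return $perl_version < 5.024 ? ':utf8' : ':encoding(UTF-8)';
}

# $] is the running perl's version as a number, e.g. 5.022001.
my $layer = choose_layer($]);
binmode $_, $layer for \*STDIN, \*STDOUT, \*STDERR;
print "perl $] -> $layer\n";
```

Because `$]` does not depend on whether `use threads` ran before or after the module, this avoids the load-order problem of the threads-based option, at the cost of skipping validation on older perls.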