diff --git a/Changes b/Changes index 3055306..1d5e2c6 100644 --- a/Changes +++ b/Changes @@ -2,59 +2,64 @@ Revision history for Perl module {{$dist->name}} {{$NEXT}} +0.018 2016-08-10 + - Now choosing a utf-8 encoding that will not break threads [gh-43; schwern] + - Handling utf-8 encoding/decoding errors [gh-35; HayoBaan] + - New maintainer: [HayoBaan] + 0.017 2015-11-13 - Tests now correctly handle the PERL_UNICODE env variable - and the -C perl command-line flag. [gh-40; HayoBaan] - - Implemented "no utf8::all" [gh-33; HayoBaan] - - Corrected a number of tests [HayoBaan] - - Added wrapper for readlink [gh-21; HayoBaan] - - Added test for readpipe, qx, and backtick operator [HayoBaan] - - Rewrote documentation [HayoBaan] + and the -C perl command-line flag. [gh-40; HayoBaan] + - Implemented "no utf8::all" [gh-33; HayoBaan] + - Corrected a number of tests [HayoBaan] + - Added wrapper for readlink [gh-21; HayoBaan] + - Added test for readpipe, qx, and backtick operator [HayoBaan] + - Rewrote documentation [HayoBaan] 0.016 2015-01-08 - - Not decoding @ARGV when perl is run with -CA [gh-32; HayoBaan] + - Not decoding @ARGV when perl is run with -CA [gh-32; HayoBaan] Thank you [saulery] for the tip! - - Fixed exclusion of Windows platform [HayoBaan] - - Excluding DOS and OS/2 platforms [HayoBaan] + - Fixed exclusion of Windows platform [HayoBaan] + - Excluding DOS and OS/2 platforms [HayoBaan] - Moved utf-8 aware implementation of File::Find and Cwd to their - own module (File::Find::utf8 and Cwd::utf8) [HayoBaan] + own module (File::Find::utf8 and Cwd::utf8) [HayoBaan] 0.015 2014-08-28 - Fixed changelog for 0.014 - Removed overly optimistic use of v5.20.0 syntax 0.014 2014-08-27 - - Skip locale tests on systems without locale support [gh-27; Hugmeir] - - Added wrapper for: [HayoBaan] - - glob [HayoBaan] - - File::Find::find, File::Find::finddepth, [HayoBaan] - - Cwd::cwd Cwd::fastcwd Cwd::getcwd Cwd::fastgetcwd [HayoBaan] - - Cwd::abs_path Cwd::realpath Cwd::fast_abs_path [HayoBaan] + - Skip locale tests on systems without locale support [gh-27; Hugmeir] + - Added wrapper for: [HayoBaan] + - glob [HayoBaan] + - File::Find::find, File::Find::finddepth, [HayoBaan] + - Cwd::cwd Cwd::fastcwd Cwd::getcwd Cwd::fastgetcwd [HayoBaan] + - Cwd::abs_path Cwd::realpath Cwd::fast_abs_path [HayoBaan] 0.013 2014-08-19 - Warn instead of bailing out of the test suite when - autodie is old [gh-26, gh-22] - - Only decoding @ARGV when called from the main package [gh-18] [HayoBaan] + autodie is old [gh-26, gh-22] + - Only decoding @ARGV when called from the main package [gh-18; HayoBaan] 0.012 2014-08-03 - - Disable wrapping readdir on Windows [gh-17] - - Don't ship files with names that aren't portable to Windows [gh-17] + - Disable wrapping readdir on Windows [gh-17] + - Don't ship files with names that aren't portable to Windows [gh-17] 0.011 2013-08-03 - - Only decode readdir entries if utf8::all is in effect [leont] - - Support direct dirhandles in readdir [leont] + - Only decode readdir entries if utf8::all is in effect [leont] + - Support direct dirhandles in readdir [leont] 0.010 2013-02-02 - Don't depend on localizable error strings 0.009 2012-10-27 - - Don't depend on filesystem ordering [leont, GH #14] + - Don't depend on filesystem ordering [leont, gh-14] 0.008 2012-10-24 - - Enable unicode_strings (see perldoc feature) [GH #2] - - Enable unicode_eval (see perldoc feature) [GH #2] - - Enable fc (see perldoc fc) [GH #2] - - Wrap CORE::readdir to provide UTF-8 filenames [GH #11] + - Enable unicode_strings (see perldoc feature) [gh-2] + - Enable unicode_eval (see perldoc feature) [gh-2] + - Enable fc (see perldoc fc) [gh-2] + - Wrap CORE::readdir to provide UTF-8 filenames [gh-11] 0.007 2012-08-01 - Use version.pm for comparing versions in the test suite @@ -62,19 +67,19 @@ Revision history for Perl module {{$dist->name}} - Don't fail the test suite if autodie is too old 0.006 2012-07-29 - - Be less strict with detecting fatal UTF-8 error in test suite [GH #12] + - Be less strict with detecting fatal UTF-8 error in test suite [gh-12] 0.005 2012-07-29 - - Use Import::Into instead of home-grown "solution" [GH #10] - - Don't permit running with autodie < 2.12, due to RT #54777 [GH #7] - - Promote utf8 warnings to fatal errors [GH #1] + - Use Import::Into instead of home-grown "solution" [gh-10] + - Don't permit running with autodie < 2.12, due to RT #54777 [gh-7] + - Promote utf8 warnings to fatal errors [gh-1] 0.004 2012-01-04 - - Fix test suite for less current versions of Perl [getty, doherty] + - Fix test suite for less current versions of Perl [getty, doherty] 0.003 2011-12-21 - Internal refactoring - - Load charnames [sartak] + - Load charnames [sartak] 0.002 2011-04-21 - Expand test suite slightly diff --git a/Makefile.PL b/Makefile.PL index d1ca83b..dc97678 100644 --- a/Makefile.PL +++ b/Makefile.PL @@ -1,4 +1,4 @@ -# This file was automatically generated by Dist::Zilla::Plugin::MakeMaker v5.036. +# This file was automatically generated by Dist::Zilla::Plugin::MakeMaker v6.006. use strict; use warnings; @@ -8,19 +8,16 @@ use ExtUtils::MakeMaker; my %WriteMakefileArgs = ( "ABSTRACT" => "turn on Unicode - all of it", - "AUTHOR" => "Michael Schwern , Mike Doherty ", - "BUILD_REQUIRES" => { - "Module::Build" => "0.28" - }, + "AUTHOR" => "Michael Schwern , Mike Doherty , Hayo Baan ", "CONFIGURE_REQUIRES" => { - "Module::Build" => "0.28" + "ExtUtils::MakeMaker" => 0 }, "DISTNAME" => "utf8-all", - "EXE_FILES" => [], "LICENSE" => "perl", "MIN_PERL_VERSION" => "5.010", "NAME" => "utf8::all", "PREREQ_PM" => { + "Carp" => 0, "Encode" => 0, "Import::Into" => 0, "Symbol" => 0, @@ -37,14 +34,18 @@ my %WriteMakefileArgs = ( "IO::Handle" => 0, "IPC::Open3" => 0, "PerlIO" => 0, + "Test::Exception" => 0, "Test::Fatal" => 0, "Test::More" => "0.96", "Test::Warn" => 0, "autodie" => 0, + "blib" => "1.01", "constant" => 0, + "threads" => 0, + "threads::shared" => 0, "version" => "0.77" }, - "VERSION" => "0.017", + "VERSION" => "0.018", "test" => { "TESTS" => "t/*.t" } @@ -52,24 +53,28 @@ my %WriteMakefileArgs = ( my %FallbackPrereqs = ( + "Carp" => 0, "Encode" => 0, "File::Spec" => 0, "IO::Handle" => 0, "IPC::Open3" => 0, "Import::Into" => 0, - "Module::Build" => "0.28", "PerlIO" => 0, "Symbol" => 0, + "Test::Exception" => 0, "Test::Fatal" => 0, "Test::More" => "0.96", "Test::Warn" => 0, "autodie" => 0, + "blib" => "1.01", "charnames" => 0, "constant" => 0, "feature" => 0, "open" => 0, "parent" => 0, "strict" => 0, + "threads" => 0, + "threads::shared" => 0, "utf8" => 0, "version" => "0.77", "warnings" => 0 diff --git a/README b/README index 4c22992..45a11ed 100644 --- a/README +++ b/README @@ -1,107 +1,145 @@ -=head1 SYNOPSIS +NAME - use utf8::all; # Turn on UTF-8, all of it. + utf8::all - turn on Unicode - all of it - open my $in, '<', 'contains-utf8'; # UTF-8 already turned on here - print length 'føø bār'; # 7 UTF-8 characters - my $utf8_arg = shift @ARGV; # @ARGV is UTF-8 too (only for main) +VERSION -=head1 DESCRIPTION + version 0.018 -The C pragma tells the Perl parser to allow UTF-8 in the -program text in the current lexical scope. This also means that you -can now use literal Unicode characters as part of strings, variable -names, and regular expressions. +SYNOPSIS -C goes further: + use utf8::all; # Turn on UTF-8, all of it. + + open my $in, '<', 'contains-utf8'; # UTF-8 already turned on here + print length 'føø bār'; # 7 UTF-8 characters + my $utf8_arg = shift @ARGV; # @ARGV is UTF-8 too (only for main) -=over 4 +DESCRIPTION -=item * + The use utf8 pragma tells the Perl parser to allow UTF-8 in the program + text in the current lexical scope. This also means that you can now use + literal Unicode characters as part of strings, variable names, and + regular expressions. -L|charnames> are imported so C<\N{...}> sequences can be -used to compile Unicode characters based on names. + utf8::all goes further: -=item * + * charnames are imported so \N{...} sequences can be used to compile + Unicode characters based on names. -On Perl C or higher, the C is -enabled. + * On Perl v5.11.0 or higher, the use feature 'unicode_strings' is + enabled. -=item * + * use feature fc and use feature unicode_eval are enabled on Perl + 5.16.0 and higher. -C and C are enabled on Perl -C<5.16.0> and higher. + * Filehandles are opened with UTF-8 encoding turned on by default + (including STDIN, STDOUT, STDERR). Meaning that they automatically + convert UTF-8 octets to characters and vice versa. If you don't want + UTF-8 for a particular filehandle, you'll have to set binmode + $filehandle. -=item * + * @ARGV gets converted from UTF-8 octets to Unicode characters (when + utf8::all is used from the main package). This is similar to the + behaviour of the -CA perl command-line switch (see perlrun). -Filehandles are opened with UTF-8 encoding turned on by default -(including STDIN, STDOUT, STDERR). Meaning that they automatically -convert UTF-8 octets to characters and vice versa. If you I -want UTF-8 for a particular filehandle, you'll have to set C. + * readdir, readlink, readpipe (including the qx// and backtick + operators), and glob (including the <> operator) now all work with + and return Unicode characters instead of (UTF-8) octets. -=item * + Lexical Scope -C<@ARGV> gets converted from UTF-8 octets to Unicode characters (when -C is used from the main package). This is similar to the -behaviour of the C<-CA> perl command-line switch (see L). + The pragma is lexically-scoped, so you can do the following if you had + some reason to: -=item * + { + use utf8::all; + open my $out, '>', 'outfile'; + my $utf8_str = 'føø bār'; + print length $utf8_str, "\n"; # 7 + print $out $utf8_str; # out as utf8 + } + open my $in, '<', 'outfile'; # in as raw + my $text = do { local $/; <$in>}; + print length $text, "\n"; # 10, not 7! -C, C, C (including the C and -backtick operators), and L|perlfunc/glob> (including the C<< <> ->> operator) now all work with and return Unicode characters instead -of (UTF-8) octets. + Instead of lexical scoping, you can also use no utf8::all to turn off + the effects. -=back + Note that the effect on @ARGV and the STDIN, STDOUT, and STDERR file + handles is always global! -=head2 Lexical scope + UTF-8 Errors -The pragma is lexically-scoped, so you can do the following if you had -some reason to: + By default, utf8::all will handle invalid code points (i.e., utf-8 that + does not map to a valid unicode "character"), as a fatal error. - { - use utf8::all; - open my $out, '>', 'outfile'; - my $utf8_str = 'føø bār'; - print length $utf8_str, "\n"; # 7 - print $out $utf8_str; # out as utf8 - } - open my $in, '<', 'outfile'; # in as raw - my $text = do { local $/; <$in>}; - print length $text, "\n"; # 10, not 7! + Note: On Perl < v5.24.0 a bug in handling the I/O encoding layers in + combination with threads (and, on Windows, forks) causes a segmentation + fault. To prevent this, utf8::all will use the non-strict :utf8 instead + of :encoding(UTF-8) for I/O in case of a thread enabled Perl < 5.24.0. + If threads are not enabled for your Perl version, or if you are using a + version >= v5.24.0, utf8::all will use the strict (and recommended) + :encoding(UTF-8) I/O layer. -Instead of lexical scoping, you can also use C to turn -off the effects. + Note: For glob, readdir, and readlink, one can decide how decoding + errors are handled by setting the attribute "$utf8::all::UTF8_CHECK". -Note that the effect on C<@ARGV> and the C, C, and -C file handles is always global! +ATTRIBUTES -=head1 COMPATIBILITY + $utf8::all::UTF8_CHECK -The filesystems of Dos, Windows, and OS/2 do not (fully) support -UTF-8. The C and C functions and C operators -will therefore not be replaced on these systems. + By default utf8::all marks decoding errors as fatal (default value for + this setting is Encode::FB_CROAK). If you want, you can change this by + setting $utf8::all::UTF8_CHECK. The value Encode::FB_WARN reports the + encoding errors as warnings, and Encode::FB_DEFAULT will completely + ignore them. Please see Encode for details. Note: Encode::LEAVE_SRC is + always enforced. -=head1 SEE ALSO + Important: Only controls the handling of decoding errors in glob, + readdir, readlink. -=over 4 +INTERACTION WITH AUTODIE -=item * + If you use autodie, which is a great idea, you need to use at least + version 2.12, released on June 26, 2012 + . Otherwise, + autodie obliterates the IO layers set by the open pragma. See RT #54777 + and GH #7 + . -L for fully utf-8 aware File::Find functions. +BUGS -=item * + Please report any bugs or feature requests on the bugtracker website + . -L for fully utf-8 aware Cwd functions. + When submitting a bug or request, please include a test-file or a patch + to an existing test-file that illustrates the bug or desired feature. -=back +COMPATIBILITY -=head1 INTERACTION WITH AUTODIE + The filesystems of Dos, Windows, and OS/2 do not (fully) support UTF-8. + The readlink and readdir functions and glob operators will therefore + not be replaced on these systems. -If you use L, which is a great idea, you need to use at least version -B<2.12>, released on L. -Otherwise, autodie obliterates the IO layers set by the L pragma. See -L and -L. +SEE ALSO + + * File::Find::utf8 for fully utf-8 aware File::Find functions. + + * Cwd::utf8 for fully utf-8 aware Cwd functions. + +AUTHORS + + * Michael Schwern + + * Mike Doherty + + * Hayo Baan + +COPYRIGHT AND LICENSE + + This software is copyright (c) 2009 by Michael Schwern + ; he originated it. + + This is free software; you can redistribute it and/or modify it under + the same terms as the Perl 5 programming language system itself. diff --git a/README.mkdn b/README.mkdn index 3b43b58..efa4ca9 100644 --- a/README.mkdn +++ b/README.mkdn @@ -4,7 +4,7 @@ utf8::all - turn on Unicode - all of it # VERSION -version 0.017 +version 0.018 # SYNOPSIS @@ -41,7 +41,7 @@ behaviour of the `-CA` perl command-line switch (see [perlrun](https://metacpan. backtick operators), and [`glob`](https://metacpan.org/pod/perlfunc#glob) (including the `<>` operator) now all work with and return Unicode characters instead of (UTF-8) octets. -## Lexical scope +## Lexical Scope The pragma is lexically-scoped, so you can do the following if you had some reason to: @@ -63,36 +63,56 @@ off the effects. Note that the effect on `@ARGV` and the `STDIN`, `STDOUT`, and `STDERR` file handles is always global! -# SEE ALSO +## UTF-8 Errors -- [File::Find::utf8](https://metacpan.org/pod/File::Find::utf8) for fully utf-8 aware File::Find functions. -- [Cwd::utf8](https://metacpan.org/pod/Cwd::utf8) for fully utf-8 aware Cwd functions. +By default, `utf8::all` will handle invalid code points (i.e., +utf-8 that does not map to a valid unicode "character"), as a fatal +error. -# INTERACTION WITH AUTODIE +Note: On Perl < v5.24.0 a bug in handling the I/O encoding layers in +combination with threads (and, on Windows, forks) causes a +segmentation fault. To prevent this, `utf8::all` will use the +non-strict `:utf8` instead of `:encoding(UTF-8)` for I/O in case of +a thread enabled Perl < 5.24.0. If threads are not enabled for your +Perl version, or if you are using a version >= v5.24.0, `utf8::all` +will use the strict (and recommended) `:encoding(UTF-8)` I/O layer. + +Note: For `glob`, `readdir`, and `readlink`, one can decide how +decoding errors are handled by setting the attribute +["$utf8::all::UTF8\_CHECK"](#utf8-all-utf8_check). + +# ATTRIBUTES -If you use [autodie](https://metacpan.org/pod/autodie), which is a great idea, you need to use at least version -**2.12**, released on [June 26, 2012](https://metacpan.org/source/PJF/autodie-2.12/Changes#L3). -Otherwise, autodie obliterates the IO layers set by the [open](https://metacpan.org/pod/open) pragma. See -[RT #54777](https://rt.cpan.org/Ticket/Display.html?id=54777) and -[GH #7](https://github.com/doherty/utf8-all/issues/7). +## $utf8::all::UTF8\_CHECK -# AVAILABILITY +By default `utf8::all` marks decoding errors as fatal (default value +for this setting is `Encode::FB_CROAK`). If you want, you can change this by +setting `$utf8::all::UTF8_CHECK`. The value `Encode::FB_WARN` reports +the encoding errors as warnings, and `Encode::FB_DEFAULT` will completely +ignore them. Please see [Encode](https://metacpan.org/pod/Encode) for details. Note: `Encode::LEAVE_SRC` is +_always_ enforced. -The project homepage is [http://metacpan.org/release/utf8-all/](http://metacpan.org/release/utf8-all/). +Important: Only controls the handling of decoding errors in `glob`, +`readdir`, `readlink`. -The latest version of this module is available from the Comprehensive Perl -Archive Network (CPAN). Visit [http://www.perl.com/CPAN/](http://www.perl.com/CPAN/) to find a CPAN -site near you, or see [https://metacpan.org/module/utf8::all/](https://metacpan.org/module/utf8::all/). +# INTERACTION WITH AUTODIE -# SOURCE +If you use [autodie](https://metacpan.org/pod/autodie), which is a great idea, you need to use at least +version **2.12**, released on [June 26, +2012](https://metacpan.org/source/PJF/autodie-2.12/Changes#L3). +Otherwise, autodie obliterates the IO layers set by the [open](https://metacpan.org/pod/open) +pragma. See [RT +\#54777](https://rt.cpan.org/Ticket/Display.html?id=54777) and [GH +\#7](https://github.com/doherty/utf8-all/issues/7). -The development version is on github at [http://github.com/doherty/utf8-all](http://github.com/doherty/utf8-all) -and may be cloned from [git://github.com/doherty/utf8-all.git](git://github.com/doherty/utf8-all.git) +# BUGS -# BUGS AND LIMITATIONS +Please report any bugs or feature requests on the bugtracker +[website](https://github.com/doherty/utf8-all/issues). -You can make new bug reports, and view existing ones, through the -web interface at [https://github.com/doherty/utf8-all/issues](https://github.com/doherty/utf8-all/issues). +When submitting a bug or request, please include a test-file or a +patch to an existing test-file that illustrates the bug or desired +feature. # COMPATIBILITY @@ -100,14 +120,20 @@ The filesystems of Dos, Windows, and OS/2 do not (fully) support UTF-8. The `readlink` and `readdir` functions and `glob` operators will therefore not be replaced on these systems. +# SEE ALSO + +- [File::Find::utf8](https://metacpan.org/pod/File::Find::utf8) for fully utf-8 aware File::Find functions. +- [Cwd::utf8](https://metacpan.org/pod/Cwd::utf8) for fully utf-8 aware Cwd functions. + # AUTHORS - Michael Schwern - Mike Doherty +- Hayo Baan # COPYRIGHT AND LICENSE -This software is copyright (c) 2009 by Michael Schwern . +This software is copyright (c) 2009 by Michael Schwern ; he originated it. This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.