Translate all - characters to \- in roff
Pod::Man now translates all "-" characters in the input into *roff "\-"
escapes (normally rendered as an ASCII hyphen-minus, U+002D) rather
than using fragile heuristics to decide which characters represent true
hyphens and which represent ASCII hyphen-minus.  The previous
heuristics misrendered command names such as apt-get, causing search
and cut-and-paste issues.

This change may cause line-break issues with long hyphenated phrases.
In cases where the intent is a true hyphen, consider using UTF-8 as
the POD character set (declared with =encoding) and using true Unicode
hyphens instead of the ASCII "-" character.
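
As a rough illustration of the rule described above, here is a minimal Perl sketch. It is not the Pod::Man implementation: `escape_hyphens` is a hypothetical helper that applies only the "every ASCII hyphen-minus becomes `\-`" rule to plain text, and the sample strings are made up. It shows that a Unicode hyphen (U+2010) is not affected by the rule, which is why declaring =encoding and using U+2010 is the suggested route to a true hyphen.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; not part of Pod::Man.
# Applies the rule from the commit message: every ASCII "-" (U+002D)
# becomes the *roff "\-" escape, with no heuristics.
sub escape_hyphens {
    my ($text) = @_;
    $text =~ s/-/\\-/g;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';

# ASCII hyphen-minus in a command name and option: all are escaped.
print escape_hyphens('Run apt-get with --yes.'), "\n";

# A true Unicode hyphen (U+2010) is a different character and is left alone.
print escape_hyphens("A well\x{2010}known phrase."), "\n";
```

The first line prints `Run apt\-get with \-\-yes.`; the second string passes through unchanged. How the U+2010 character is ultimately written out depends on the Pod::Man output encoding, which this sketch does not model.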
rra committed Oct 23, 2023
1 parent e16d890 commit 16bd347
Showing 16 changed files with 69 additions and 78 deletions.
12 changes: 11 additions & 1 deletion Changes
@@ -1,6 +1,16 @@
Revision history for podlators

5.02 - Not Released
6.00 - Not Released

- Pod::Man now translates all "-" characters in the input into *roff "\-"
escapes (normally rendered as an ASCII hyphen-minus, U+002D) rather
than using fragile heuristics to decide which characters represent true
hyphens and which represent ASCII hyphen-minus. The previous
heuristics misrendered command names such as apt-get, causing search
and cut-and-paste issues. This change may cause line-break issues with
long hyphenated phrases. In cases where the intent is a true hyphen,
consider using UTF-8 as the POD character set (declared with =encoding)
and using true Unicode hyphens instead of the ASCII "-" character.

- pod2text --help now exits with status 0, not 1, matching normal UNIX
command behavior and the behavior of pod2man. (GitHub #19)
44 changes: 20 additions & 24 deletions lib/Pod/Man.pm
@@ -578,26 +578,6 @@ sub guesswork {
my $self = shift;
local $_ = shift;

# By the time we reach this point, all hyphens will be escaped by adding a
# backslash. We want to undo that escaping if they're part of regular
# words and there's only a single dash, since that's a real hyphen that
# *roff gets to consider a possible break point. Make sure that a dash
# after the first character of a word stays non-breaking, however.
#
# Note that this is not user-controllable; we pretty much have to do this
# transformation or *roff will mangle the output in unacceptable ways.
s{
( (?:\G|^|\s|$NBSP) [\(\"]* [a-zA-Z] ) ( \\- )?
( (?: [a-zA-Z\']+ \\-)+ )
( [a-zA-Z\']+ ) (?= [\)\".?!,;:]* (?:\s|$NBSP|\Z|\\\ ) )
\b
} {
my ($prefix, $hyphen, $main, $suffix) = ($1, $2, $3, $4);
$hyphen ||= '';
$main =~ s/\\-/-/g;
$prefix . $hyphen . $main . $suffix;
}egx;

# Embolden functions in the form func(), including functions that are in
# all capitals, but don't embolden if there's anything inside the parens.
# The function must start with an alphabetic character or underscore and
@@ -2349,9 +2329,25 @@ document will have inconsistent spacing.
=head2 Hyphens
The handling of hyphens versus dashes is somewhat fragile, and one may get a
the wrong one under some circumstances. This will normally only matter for
line breaking and possibly for troff output.
The *roff language distinguishes between two types of hyphens: C<->, which is
a true typesetting hyphen (roughly equivalent to the Unicode U+2010 code
point), and C<\->, which is the ASCII hyphen-minus (U+002D) that is used for
UNIX command options and most filenames. Hyphens, where appropriate, produce
better typesetting, but incorrectly using them for command names and options
can cause problems with searching and cut-and-paste.
POD does not draw this distinction. Before podlators 6.00, Pod::Man attempted
to translate C<-> in the input into either a hyphen or a hyphen-minus,
depending on context. However, this distinction proved impossible to do
correctly with heuristics. Pod::Man therefore translates all C<-> characters
in the input to C<\-> in the output, ensuring that command names and options
are correct at the cost of somewhat inferior typesetting and line breaking
issues with long hyphenated phrases.
To use true hyphens in the Pod::Man output, declare an input character set of
UTF-8 (or some other Unicode encoding) and use Unicode hyphens. Pod::Man and
*roff should handle those correctly with the default output format and most
modern *roff implementations.
=head1 AUTHOR
@@ -2364,7 +2360,7 @@ recognition and all bugs are mine.
=head1 COPYRIGHT AND LICENSE
Copyright 1999-2010, 2012-2020, 2022 Russ Allbery <rra@cpan.org>
Copyright 1999-2010, 2012-2020, 2022-2023 Russ Allbery <rra@cpan.org>
Substantial contributions by Sean Burke <sburke@cpan.org>.
16 changes: 8 additions & 8 deletions t/data/basic.man
@@ -174,7 +174,7 @@ This paragraph should be doubly indented.
.Sp
This paragraph should only be singly indented.
.IP \(bu 4
This is an item in the middle of a block-quote, which should be allowed.
This is an item in the middle of a block\-quote, which should be allowed.
.IP \(bu 4
We're also testing tagless item commands.
.RE
@@ -204,7 +204,7 @@ Another test taken from Pod::Parser.
This is a test to see if I can do not only \f(CW$self\fR and \f(CWmethod()\fR, but
also \f(CW\*(C`$self\->method()\*(C'\fR and \f(CW\*(C`$self\->{FIELDNAME}\*(C'\fR and
\&\f(CW\*(C`$Foo <=> $Bar\*(C'\fR without resorting to escape sequences. If
I want to refer to the right-shift operator I can do something
I want to refer to the right\-shift operator I can do something
like \f(CW\*(C`$x >> 3\*(C'\fR or even \f(CW\*(C`$y >> 5\*(C'\fR.
.PP
Now for the grand finale of \f(CW\*(C`$self\->method()\->{FIELDNAME} = {FOO=>BAR}\*(C'\fR.
@@ -240,18 +240,18 @@ An ampersand.
.IP ' 3
An apostrophe.
.IP < 3
A less-than sign.
A less\-than sign.
.IP > 3
A greater-than sign.
A greater\-than sign.
.IP """" 3
A double quotation mark.
.IP / 3
A forward slash.
.PP
Try to get this bit of text over towards the edge so |that\ all\ of\ this\ text\ inside\ S<>\ won't| be wrapped. Also test the
|same\ thing\ with\ non-breaking\ spaces.|
|same\ thing\ with\ non\-breaking\ spaces.|
.PP
There is a soft hy\%phen in hyphen at hy-phen.
There is a soft hy\%phen in hyphen at hy\-phen.
.PP
This is a test of an index entry.
.IX Xref "index entry"
@@ -322,7 +322,7 @@ Copyright 2001, 2004, 2016, 2018 Russ Allbery <rra@cpan.org>
.PP
Copying and distribution of this file, with or without modification, are
permitted in any medium without royalty provided the copyright notice and
this notice are preserved. This file is offered as-is, without any
this notice are preserved. This file is offered as\-is, without any
warranty.
.PP
SPDX-License-Identifier: FSFAP
SPDX\-License\-Identifier: FSFAP
6 changes: 3 additions & 3 deletions t/data/man/encoding.groff
@@ -72,7 +72,7 @@ Combining accent: nai\[u0308]ve
.PP
SMP plane character: \[u1F600]
.PP
Non-breaking space: foo\ bar, foo\ bar
Non\-breaking space: foo\ bar, foo\ bar
.PP
Soft hyphen: fac\%tory
.SH LICENSE
@@ -81,7 +81,7 @@ Copyright 2022 Russ Allbery <rra@cpan.org>
.PP
Copying and distribution of this file, with or without modification, are
permitted in any medium without royalty provided the copyright notice and
this notice are preserved. This file is offered as-is, without any
this notice are preserved. This file is offered as\-is, without any
warranty.
.PP
SPDX-License-Identifier: FSFAP
SPDX\-License\-Identifier: FSFAP
6 changes: 3 additions & 3 deletions t/data/man/encoding.roff
@@ -134,7 +134,7 @@ Combining accent: naiXve
.PP
SMP plane character: X
.PP
Non-breaking space: foo\ bar, foo\ bar
Non\-breaking space: foo\ bar, foo\ bar
.PP
Soft hyphen: fac\%tory
.SH LICENSE
@@ -143,7 +143,7 @@ Copyright 2022 Russ Allbery <rra@cpan.org>
.PP
Copying and distribution of this file, with or without modification, are
permitted in any medium without royalty provided the copyright notice and
this notice are preserved. This file is offered as-is, without any
this notice are preserved. This file is offered as\-is, without any
warranty.
.PP
SPDX-License-Identifier: FSFAP
SPDX\-License\-Identifier: FSFAP
6 changes: 3 additions & 3 deletions t/data/man/encoding.utf8
@@ -73,7 +73,7 @@ Combining accent: naïve
.PP
SMP plane character: 😀
.PP
Non-breaking space: foo\ bar, foo\ bar
Non\-breaking space: foo\ bar, foo\ bar
.PP
Soft hyphen: fac\%tory
.SH LICENSE
@@ -82,7 +82,7 @@ Copyright 2022 Russ Allbery <rra@cpan.org>
.PP
Copying and distribution of this file, with or without modification, are
permitted in any medium without royalty provided the copyright notice and
this notice are preserved. This file is offered as-is, without any
this notice are preserved. This file is offered as\-is, without any
warranty.
.PP
SPDX-License-Identifier: FSFAP
SPDX\-License\-Identifier: FSFAP
2 changes: 1 addition & 1 deletion t/data/snippets/man/fixed-font-in-item
@@ -16,7 +16,7 @@ didn't properly stop italic.
=back

[output]
.SH "Fixed-width Fonts in =item"
.SH "Fixed\-width Fonts in =item"
.IX Header "Fixed-width Fonts in =item"
In podlators 4.06 and earlier, italic was terminated with \ef(CW, which
didn't properly stop italic.
4 changes: 2 additions & 2 deletions t/data/snippets/man/guesswork
@@ -4,7 +4,7 @@ Non-quoting guesswork applied by default
[input]
=head1 GUESSWORK

The hyphens-in-compound-words shouldn't be escaped, but e-mail should be.
Both hyphens-in-compound-words and e-mail should be escaped.

Function: foo(), bar::baz(), _private::_stuff()

@@ -15,7 +15,7 @@ Variables: $foo, @bar::baz, %Pod::Blah
[output]
.SH GUESSWORK
.IX Header "GUESSWORK"
The hyphens-in-compound-words shouldn't be escaped, but e\-mail should be.
Both hyphens\-in\-compound\-words and e\-mail should be escaped.
.PP
Function: \fBfoo()\fR, \fBbar::baz()\fR, \fB_private::_stuff()\fR
.PP
4 changes: 2 additions & 2 deletions t/data/snippets/man/guesswork-all
@@ -7,7 +7,7 @@ guesswork all
[input]
=head1 GUESSWORK

The hyphens-in-compound-words shouldn't be escaped, but e-mail should be.
Both hyphens-in-compound-words and e-mail should be escaped.

Function: foo(), bar::baz(), _private::_stuff()

@@ -18,7 +18,7 @@ Variables: $foo, @bar::baz, %Pod::Blah
[output]
.SH GUESSWORK
.IX Header "GUESSWORK"
The hyphens-in-compound-words shouldn't be escaped, but e\-mail should be.
Both hyphens\-in\-compound\-words and e\-mail should be escaped.
.PP
Function: \fBfoo()\fR, \fBbar::baz()\fR, \fB_private::_stuff()\fR
.PP
4 changes: 2 additions & 2 deletions t/data/snippets/man/guesswork-none
@@ -7,7 +7,7 @@ guesswork none
[input]
=head1 GUESSWORK

The hyphens-in-compound-words shouldn't be escaped, but e-mail should be.
Both hyphens-in-compound-words and e-mail should be escaped.

Function: foo(), bar::baz(), _private::_stuff()

@@ -18,7 +18,7 @@ Variables: $foo, @bar::baz, %Pod::Blah
[output]
.SH GUESSWORK
.IX Header "GUESSWORK"
The hyphens-in-compound-words shouldn't be escaped, but e\-mail should be.
Both hyphens\-in\-compound\-words and e\-mail should be escaped.
.PP
Function: foo(), bar::baz(), _private::_stuff()
.PP
4 changes: 2 additions & 2 deletions t/data/snippets/man/guesswork-partial
@@ -7,7 +7,7 @@ guesswork functions,variables
[input]
=head1 GUESSWORK

The hyphens-in-compound-words shouldn't be escaped, but e-mail should be.
Both hyphens-in-compound-words and e-mail should be escaped.

Function: foo(), bar::baz(), _private::_stuff()

@@ -18,7 +18,7 @@ Variables: $foo, @bar::baz, %Pod::Blah
[output]
.SH GUESSWORK
.IX Header "GUESSWORK"
The hyphens-in-compound-words shouldn't be escaped, but e\-mail should be.
Both hyphens\-in\-compound\-words and e\-mail should be escaped.
.PP
Function: \fBfoo()\fR, \fBbar::baz()\fR, \fB_private::_stuff()\fR
.PP
14 changes: 0 additions & 14 deletions t/data/snippets/man/hyphen-in-s

This file was deleted.

2 changes: 1 addition & 1 deletion t/data/snippets/man/nonbreaking-space-l
@@ -18,7 +18,7 @@ S<L<RFC4034|https://tools.ietf.org/html/rfc4034>>
.SH URLS
.IX Header "URLS"
S<> wrapping L<> should make the space between the anchor and URL
non-breaking and thus keep them together.
non\-breaking and thus keep them together.
.PP
perl Net::DNS Net::DNS::RR Net::DNS::SEC
RFC2535\ <https://tools.ietf.org/html/rfc2535>
2 changes: 1 addition & 1 deletion t/data/snippets/man/utf8-nonbreaking
@@ -14,4 +14,4 @@ This is S<non-breaking output>.
[output]
.SH "S<> output with UTF\-8"
.IX Header "S<> output with UTF-8"
This is non-breaking\ output.
This is non\-breaking\ output.
2 changes: 1 addition & 1 deletion t/docs/spdx-license.t
@@ -112,7 +112,7 @@ sub check_file {
if ($line =~ m{ \b See \s+ LICENSE \s+ for \s+ licensing }xms) {
$saw_legacy_notice = 1;
}
if ($line =~ m{ \b SPDX-License-Identifier: \s+ \S+ }xms) {
if ($line =~ m{ \b SPDX\\?-License\\?-Identifier: \s+ \S+ }xms) {
$saw_spdx = 1;
last;
}
19 changes: 9 additions & 10 deletions t/man/snippets.t
@@ -2,8 +2,8 @@
#
# Test Pod::Man behavior with various snippets.
#
# Copyright 2002, 2004, 2006, 2008-2009, 2012-2013, 2015-2016, 2018-2020, 2022
# Russ Allbery <rra@cpan.org>
# Copyright 2002, 2004, 2006, 2008-2009, 2012-2013, 2015-2016, 2018-2020,
# 2022-2023 Russ Allbery <rra@cpan.org>
#
# This program is free software; you may redistribute it and/or modify it
# under the same terms as Perl itself.
@@ -16,7 +16,7 @@ use warnings;

use lib 't/lib';

use Test::More tests => 113;
use Test::More tests => 111;
use Test::Podlators qw(test_snippet);

# Load the module.
@@ -30,13 +30,12 @@ my @snippets = qw(
c-in-name dollar-magic error-die error-none error-normal error-pod
error-stderr error-stderr-opt eth fixed-font fixed-font-in-item for-blocks
guesswork guesswork-all guesswork-no-quoting guesswork-none
guesswork-partial guesswork-quoting hyphen-in-s item-fonts language
link-quoting link-to-url long-quote lquote-and-quote lquote-rquote
markup-in-name multiline-x naive naive-groff name-guesswork name-quotes
name-quotes-none nested-lists newlines-in-c non-ascii nonbreaking-space-l
not-bullet not-numbers nourls periods quote-escaping rquote-none
soft-hyphens trailing-space true-false x-whitespace x-whitespace-entry
zero-width-space
guesswork-partial guesswork-quoting item-fonts language link-quoting
link-to-url long-quote lquote-and-quote lquote-rquote markup-in-name
multiline-x naive naive-groff name-guesswork name-quotes name-quotes-none
nested-lists newlines-in-c non-ascii nonbreaking-space-l not-bullet
not-numbers nourls periods quote-escaping rquote-none soft-hyphens
trailing-space true-false x-whitespace x-whitespace-entry zero-width-space
);

# Run all the tests.