Sage 4.3.1 reference manual: PDF version failed to build due to non-ASCII characters in docstring #8036

Closed
sagetrac-mvngu mannequin opened this issue Jan 22, 2010 · 23 comments

Comments

@sagetrac-mvngu
Mannequin

sagetrac-mvngu mannequin commented Jan 22, 2010

Even after applying #8021, the PDF version of the reference manual for Sage 4.3.1 failed to build. This is due to non-ASCII characters in the docstring of the method prove_BSD() of the class EllipticCurve_rational_field in

sage/schemes/elliptic_curves/ell_rational_field.py

Here's a snippet of the error message:

! Package inputenc Error: Unicode char \u8:ǎ not set up for use with LaTeX.

See the inputenc package documentation for explanation.
Type  H <return>  for immediate help.
 ...                                              
                                                  
l.364560 C. Tarniţǎ
                     . Computational verification of the Birch and
?

Component: documentation

Keywords: non-ASCII characters

Author: Mitesh Patel

Reviewer: John Palmieri

Merged: sage-4.3.2.rc0

Issue created by migration from https://trac.sagemath.org/ticket/8036

@sagetrac-mvngu sagetrac-mvngu mannequin added this to the sage-4.3.2 milestone Jan 22, 2010
@sagetrac-mvngu sagetrac-mvngu mannequin self-assigned this Jan 22, 2010
@sagetrac-mvngu
Mannequin Author

sagetrac-mvngu mannequin commented Jan 22, 2010

based on Sage 4.3.1

@sagetrac-mvngu
Mannequin Author

sagetrac-mvngu mannequin commented Jan 22, 2010

Author: Minh Van Nguyen

@sagetrac-mvngu
Mannequin Author

sagetrac-mvngu mannequin commented Jan 22, 2010

comment:1

Attachment: trac_8036-non-ascii.patch.gz

@tornaria
Contributor

comment:2

LaTeX is perfectly fine with utf8 if one uses the inputenc package:

\usepackage[utf8x]{inputenc}

In other words, it is the LaTeX preamble that needs fixing.

@tornaria
Contributor

Attachment: utf8.tex.gz

LaTeX file showing the use of utf8

@jhpalmieri
Member

comment:3

Sphinx uses \usepackage[utf8]{inputenc}, so if we want to change this to [utf8x], we need to patch Sphinx. I have no experience with [utf8] or [utf8x], but the documentation for inputenc frowns on utf8x, to some extent. Another option is to add characters one by one, as needed, using

\DeclareUnicodeCharacter{blah}{blah}

(See the documentation for inputenc.) If we knew the details, we could add lines like this to the latex_preamble in SAGE_ROOT/devel/sage/doc/common/conf.py. I don't know the details.
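A minimal sketch of what such lines might look like, assuming they are generated in Python and then added to the LaTeX preamble after inputenc is loaded. The helper name `declare_unicode_lines` and the mapping are hypothetical; only U+01CE, the character from the error message, is covered, and `\v{a}` is the usual LaTeX caron accent.

```python
# Hypothetical helper: build \DeclareUnicodeCharacter lines for characters
# that inputenc's [utf8] option does not set up by default.
def declare_unicode_lines(mapping):
    """Return one \\DeclareUnicodeCharacter line per (character, macro) pair."""
    lines = []
    for char, macro in sorted(mapping.items()):
        # inputenc expects the code point as hexadecimal digits
        lines.append("\\DeclareUnicodeCharacter{%04X}{%s}" % (ord(char), macro))
    return "\n".join(lines)


# Assumed mapping: U+01CE ("a" with caron) -> \v{a}
replacements = {"\u01ce": "\\v{a}"}
preamble_extra = declare_unicode_lines(replacements)
print(preamble_extra)
```

The resulting string would still have to be appended to whatever preamble setting the Sphinx configuration in use provides, and every further unsupported character needs its own entry in the mapping.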

A third option is to get rid of all accents, as mvngu's patch does.

A fourth option is to use the attached patch trac_8036-tex-replacements.patch, which does some preprocessing, changing the offending character to something LaTeX can handle.

I'll mark this as "needs review", in case option 4 is appealing.
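The general idea behind option 4 can be sketched as follows. This is not the code in the attached patch; the table `TEX_REPLACEMENTS` and the function `tex_safe` are hypothetical names, and the table holds only the one character from the error message.

```python
# Hypothetical table; not the contents of trac_8036-tex-replacements.patch.
TEX_REPLACEMENTS = {
    "\u01ce": "\\v{a}",  # "a" with caron
}


def tex_safe(text, replacements=TEX_REPLACEMENTS):
    """Replace each listed character with its LaTeX macro; leave the rest alone."""
    for char, macro in replacements.items():
        text = text.replace(char, macro)
    return text


print(tex_safe("C. Tarni\u0163\u01ce"))
```

Characters that are not in the table pass through unchanged, so any character that [utf8] cannot handle still has to be added by hand.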

@jhpalmieri
Member

Changed author from Minh Van Nguyen to Minh Van Nguyen, John Palmieri

@jhpalmieri
Member

Attachment: trac_8036-tex-replacements.patch.gz

apply only this patch

@jhpalmieri
Member

comment:4

Note: When I preview my attachment, the "offending character" looks like a capital "C" with a cedilla, but don't be deceived: the actual character (when I download the patch and look at it in emacs, for example) is an "a" with a "vee" accent (a caron) on top, the last character in "Tarnita".

@jhpalmieri
Member

comment:5

Replying to @jhpalmieri:

I have no experience with [utf8] or [utf8x], but the documentation for inputenc frowns on utf8x, to some extent.

In case you're interested in this, the documentation says

For other languages that do not fit well into LaTeX font selection scheme, ... the outlined inputenc approach will not work. If that is the case one can try using Dominique Unruh’s option utf8x for inputenc which has a somewhat different approach and encodes many more UTF-8 characters than the standard utf8 option. However, we recommend to do so only if you really need such alphabets as there are problems with this extended approach which were precisely the reason that we decided to limit the support to what is properly supported within the boundaries of LaTeX’s font selection.

I don't know what the "problems with this extended approach" are.

@tornaria
Contributor

comment:6

Replying to @jhpalmieri:

Replying to @jhpalmieri:

I have no experience with [utf8] or [utf8x], but the documentation for inputenc frowns on utf8x, to some extent.

In case you're interested in this, the documentation says

For other languages that do not fit well into LaTeX font selection scheme, ... the outlined inputenc approach will not work. If that is the case one can try using Dominique Unruh’s option utf8x for inputenc which has a somewhat different approach and encodes many more UTF-8 characters than the standard utf8 option. However, we recommend to do so only if you really need such alphabets as there are problems with this extended approach which were precisely the reason that we decided to limit the support to what is properly supported within the boundaries of LaTeX’s font selection.

I don't know what the "problems with this extended approach" are.

I use [utf8x] on a daily basis, without issues. As you quoted above, it is well known that [utf8] supports a reduced set of characters. utf8x does not support arbitrary Unicode characters either, but I think it supports a proper superset of those supported by utf8.

The option [utf8x] is part of the LaTeX package "ucs".

Your proposal (according to the posted patch) would be to special-case any characters not supported by the [utf8] option? The patch only handles that particular letter.

@jhpalmieri
Member

comment:7

Replying to @tornaria:

Your proposal (according to the posted patch) would be to special-case any characters not supported by [utf8] option? The patch only handles that particular letter.

It's either that or patch Sphinx, which is not hard, but I'm reluctant to patch external packages if there are other alternatives. I don't know how often we are likely to come across characters not supported by [utf8], so I don't know which option is better.

@rbeezer
Mannequin

rbeezer mannequin commented Jan 23, 2010

comment:8

There are three non-ASCII characters in this file, which prevent me from building the HTML version of the documentation. The patches here already seem to address the TeX processing that builds the PDF.

The patch simply identifies the three characters and replaces them with straight ASCII equivalents. It might be useful for folks trying to build the docs to test their own fixes/changes elsewhere. I'm not trying to weigh in on the long-run solution to this problem.
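For anyone who wants to locate such characters themselves, here is a rough sketch using only the Python standard library. The function names `find_non_ascii` and `strip_accents` are hypothetical and are not taken from the attached patch; the sample string reuses the name from the error message above.

```python
import unicodedata


def find_non_ascii(text):
    """Return (line number, column, character) for every non-ASCII character."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, char in enumerate(line, start=1):
            if ord(char) > 127:
                hits.append((lineno, col, char))
    return hits


def strip_accents(text):
    """Decompose accented letters and drop whatever is not ASCII afterwards."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


sample = "first line\nC. Tarni\u0163\u01ce. Computational verification"
print(find_non_ascii(sample))
print(strip_accents("Tarni\u0163\u01ce"))
```

Note that `strip_accents` silently drops any character that has no ASCII base letter in its decomposition, so its output should be checked by eye before it replaces text in a docstring.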

@qed777
Mannequin

qed777 mannequin commented Jan 31, 2010

comment:9

Attachment: trac_8036-three-non-ascii.patch.gz

#7999 should take care of the HTML reference manual.

@qed777
Mannequin

qed777 mannequin commented Jan 31, 2010

comment:10

For now, what if we set:

latex_elements['inputenc'] = '\\usepackage[utf8x]{inputenc}'

in doc/common/conf.py?

@qed777
Mannequin

qed777 mannequin commented Jan 31, 2010

Attachment: trac_8036-docbuild_utf8x.patch.gz

Set utf8x in Sphinx option. Solo patch.

@qed777
Mannequin

qed777 mannequin commented Jan 31, 2010

comment:11

Replying to @qed777:

For now, what if we set:

I've attached a patch that does this. It appears to solve the problem in this ticket's description.

But it fails to handle the Unicode tests we've added to SageNB at #7249.

@jhpalmieri
Member

comment:12

I like trac_8036-docbuild_utf8x.patch. I didn't know about the latex_elements customization; very nice.

To the release manager: apply only trac_8036-docbuild_utf8x.patch.

@sagetrac-mvngu
Mannequin Author

sagetrac-mvngu mannequin commented Feb 1, 2010

Merged: sage-4.3.2.rc0

@sagetrac-mvngu
Mannequin Author

sagetrac-mvngu mannequin commented Feb 1, 2010

Reviewer: John Palmieri

@sagetrac-mvngu
Mannequin Author

sagetrac-mvngu mannequin commented Feb 1, 2010

Changed author from Minh Van Nguyen, John Palmieri to Mitesh Patel

@sagetrac-mvngu
Mannequin Author

sagetrac-mvngu mannequin commented Feb 1, 2010

comment:13

Merged trac_8036-docbuild_utf8x.patch.

@sagetrac-mvngu sagetrac-mvngu mannequin removed the s: positive review label Feb 1, 2010
@sagetrac-mvngu sagetrac-mvngu mannequin closed this as completed Feb 1, 2010
@sagetrac-mvngu
Mannequin Author

sagetrac-mvngu mannequin commented Feb 2, 2010

comment:14

The attachment trac_8036-docbuild_utf8x.patch breaks the build of the French tutorial. See #8146 for a follow-up to this issue.
