erroneous use of charwidth in rpad? #10825

jiahao · 2015-04-15T04:58:08Z

julia> a = rpad("\u2003", 5)
"    "

julia> length(a) #4 in 0.4-dev, 5 in 0.3.7
4

julia> map(x->convert(Uint32, x), collect(a)) #0.4-dev has one fewer space
4-element Array{UInt32,1}:
 0x00002003
 0x00000020
 0x00000020
 0x00000020

help?> rpad
search: rpad repeated macroexpand isdirpath tryparse normpath repmat repeat replace workspace realpath redisplay AbstractSparseArray

Base.rpad(string, n, p)

   Make a string at least "n" characters long by padding on the
   right with copies of "p".

help?> charwidth
search: charwidth

Base.charwidth(c)

   Gives the number of columns needed to print a character.

StefanKarpinski · 2015-04-15T14:00:28Z

I guess the issue here is whether rpad is meant for producing a specific number of characters or a specific number of columns. Both things are useful.

stevengj · 2015-04-15T14:22:10Z

@jiahao, what was the issue with width-2 characters in 4af443f? Conversely, what would you want to do with zero-width combining characters — should rpad("x̂", 2) return "x̂ " (the current behavior) or "x̂" (if it were producing a specific number of codepoints)?

Or a third option: maybe rpad should produce a specific number of graphemes? This would treat width-1 and width-2 graphemes identically, but would still give "x̂ " from rpad("x̂", 2).
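To make the three candidate meanings of "length" concrete, here is a minimal Python sketch (Python because the thread later uses it to show code-point counting). The helper names `codepoints`, `columns`, and `approx_graphemes` are hypothetical. The width rule is a rough approximation built on the standard `unicodedata` module, not a reimplementation of Julia's `charwidth`, and skipping combining marks is only a crude stand-in for real grapheme segmentation.

```python
import unicodedata


def codepoints(s: str) -> int:
    """Number of Unicode code points, which is what Python's len() counts."""
    return len(s)


def columns(s: str) -> int:
    """Rough display width (assumption): combining marks take 0 columns,
    East Asian wide/fullwidth characters take 2, everything else takes 1."""
    width = 0
    for ch in s:
        if unicodedata.combining(ch):
            continue  # combining marks stack on the previous character
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def approx_graphemes(s: str) -> int:
    """Crude grapheme count (assumption): code points that are not combining marks."""
    return sum(1 for ch in s if not unicodedata.combining(ch))


for text in ("x\u0302", "\u65e5\u672c"):
    print(repr(text), codepoints(text), columns(text), approx_graphemes(text))
```

Under these assumptions, "x" followed by U+0302 counts as 2 code points but 1 column and 1 grapheme, while a two-character wide string counts as 2 code points, 4 columns, and 2 graphemes. Those are exactly the cases where the three padding policies disagree.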

jiahao · 2015-04-15T16:05:45Z

I was using rpad to generate a ReST simple table, where the columns had to be the exact same length. Sphinx considered each doublewidth character as fitting in 1 column and complained that the text was no longer correctly aligned. I think it does boil down to what the second argument means - I don't think "character" is well defined anymore as it is currently used in the docs.

stevengj · 2015-04-19T12:29:30Z

@jiahao, what does Sphinx consider to be a "character"? Any codepoint? Nonzero-width codepoints? Graphemes?

jiahao · 2015-04-19T17:30:45Z

I think Sphinx uses Python's definition of 1 char = 1 code point.

>>> len("e\u0302") #Python 3.4.3
2

>>> len(u"e\u0302") #Python 2.7.6
2

stevengj · 2015-04-19T19:27:13Z

In this case, you need a rewrite of rpad based on length(s) for your use-case. But I'm skeptical that this should be the default.

jiahao · 2015-04-20T02:52:22Z

Yes, unfortunately it looks like uses of rpad have to be changed based on whether the destination string is intended to be displayed to the user or to be fed into an external program. Which is too bad, because the inconsistent handling of spacing by downstream programs is guaranteed to cause more headaches.

stevengj · 2015-04-20T04:12:32Z

(Not to mention the inconsistent handling of strings by displays. c.f. #3721)

elextr · 2015-04-20T04:40:19Z

It really needs two rpads, one for code points and one for graphemes; as @StefanKarpinski said, both are useful.

Or a keyword parameter rpad("xxx", 5, to=:code_point). That only leaves which is the default as a discussion of the hue of the bipedal transport garaging :)

JeffBezanson · 2015-04-20T05:26:11Z

Padding based on columns feels like the most sensible behavior to me; the help text should be updated.

I wonder if lpad and rpad should be refactored. They have duplicated code, and it would get worse if we added options. These operations could be written something like s * padding(s, pad, columns=10) or padding(s, pad, codepoints=5) * s etc.
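The refactoring idea above can be sketched as follows, in Python rather than Julia. This is an illustration only: the `padding` function, its keyword names, and the `display_width` helper are hypothetical, the width rule is a rough approximation based on `unicodedata`, and the pad is assumed to be a single one-column character.

```python
import unicodedata


def display_width(s: str) -> int:
    """Rough column count (assumption): combining marks 0, wide/fullwidth 2, else 1."""
    return sum(
        0 if unicodedata.combining(ch)
        else 2 if unicodedata.east_asian_width(ch) in ("W", "F")
        else 1
        for ch in s
    )


def padding(s: str, pad: str = " ", *, columns=None, codepoints=None) -> str:
    """Return only the padding needed to bring s up to the requested size.

    Exactly one of columns= or codepoints= must be given.
    pad is assumed to be a single one-column character.
    """
    if (columns is None) == (codepoints is None):
        raise ValueError("specify exactly one of columns= or codepoints=")
    if columns is not None:
        missing = columns - display_width(s)
    else:
        missing = codepoints - len(s)
    return pad * max(missing, 0)


s = "x\u0302"
right_padded_by_columns = s + padding(s, columns=2)        # rpad-like
right_padded_by_codepoints = s + padding(s, codepoints=2)  # already 2 code points
left_padded = padding(s, columns=4) + s                    # lpad-like
```

Because the helper returns only the padding, the left and right variants share one implementation, and the choice between columns and code points becomes an explicit argument at the call site.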

be5invis · 2016-12-02T13:25:23Z

Deciding the real space needed for layout (in a console or anywhere else) is extremely hard. For example, some CJK fonts may make α full-width.

JeffBezanson · 2017-01-05T18:20:37Z

The help text for this has already been updated to say that padding is based on columns. There are other kinds of padding you might want, but the column behavior is useful too.

tkelman · 2017-01-05T19:52:21Z

x-ref f65befe

jiahao added a commit that referenced this issue Apr 15, 2015

Hack doc/tabcomplete.jl to work around #10825

4af443f

jiahao added a commit that referenced this issue Apr 15, 2015

Unicode input doc: fix spacing issue caused by #10825

705222a

stevengj added the unicode Related to unicode characters and encodings label Apr 15, 2015

StefanKarpinski added this to the 0.6.0 milestone Sep 14, 2016

JeffBezanson added the docs This change adds or pertains to documentation label Jan 5, 2017

JeffBezanson closed this as completed Jan 5, 2017

StefanKarpinski mentioned this issue Dec 10, 2017

lpad, rpad use textwidth/char count incoherently #25016

Closed
