Added initial stubs for unicode case folding support #5822

Kimundi · 2013-04-10T16:01:28Z

No description provided.

graydon · 2013-04-10T19:08:02Z

It's not clear to me that we want to pull case folding (or "more unicode logic in general") into native rust code, vs. absorbing a dependency on an external unicode library. Every time this comes up I mention http://www.gnu.org/software/libunistring/ as a possible contender, but I balk on trying to figure out the implications of depending on LGPL code. I should probably talk to a licensing-and-law expert to clarify.

Kimundi · 2013-04-10T19:37:26Z

Yeah, would be nice if a good implementation could get integrated.

kud1ing · 2013-04-11T08:53:09Z

GHC have replaced LGPLed libgmp: http://hackage.haskell.org/trac/ghc/wiki/ReplacingGMPNotes

"Even though [GHC] is essentially a "free software" license (BSD3), according to paragraph 2 of the LGPL, GHC must be distributed under the terms of the LGPL!"

Currently it's also an issue for LGPLed Qt on iOS: http://qt-project.org/wiki/Licensing-talk-about-mobile-platforms

kud1ing · 2013-04-11T09:03:18Z

I don't know whether D's Unicode module is correct or complete, but they do not seem to depend on anything third-party: https://github.com/D-Programming-Language/phobos/blob/master/std/uni.d
The code is probably generated.

Maybe we should go on extending unicode.rs with more generated code?
I consider out-of-the-box Unicode support important, even for a systems programming language.

graydon · 2013-04-13T21:01:09Z

I agree. I have some fixes to the unicode script in a wip branch of my own (mostly moving from switch statements to static tables now that we support those) but it doesn't cover the "slightly more than basic" stuff like normalization forms, collation and case folding. Also locales and message formatting.
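As a rough illustration of the "static tables" approach mentioned above, here is a minimal sketch in present-day Rust (not 2013 syntax). The table `TO_LOWER_TABLE` and the function `table_to_lower` are hypothetical names, and the three entries are hand-picked for the example; a real table would be generated from the Unicode data files.

```rust
// Illustrative only: three hand-picked simple lowercase mappings, sorted by
// code point so that binary search works. Not the generated unicode.rs table.
static TO_LOWER_TABLE: &[(char, char)] = &[
    ('\u{00C0}', '\u{00E0}'), // À -> à
    ('\u{00C9}', '\u{00E9}'), // É -> é
    ('\u{03A9}', '\u{03C9}'), // Ω -> ω
];

/// Looks up `c` in the sorted table. Characters without an entry
/// are returned unchanged.
fn table_to_lower(c: char) -> char {
    match TO_LOWER_TABLE.binary_search_by_key(&c, |&(upper, _)| upper) {
        Ok(i) => TO_LOWER_TABLE[i].1,
        Err(_) => c,
    }
}
```

A sorted static slice keeps the data out of the control flow, so the lookup code stays the same size no matter how many mappings the generator emits.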

We discussed a bit more at the end of the workweek and concluded ... in an unsatisfying place: that libICU is the only non-LGPL thing we can use as a project dependency, but that it's too big and clunky to make a dependency of libcore, and should be an external package managed by rustpkg when it's in shape (possibly one we consider part of the canonical / standard package set that mozilla supports).

This is roughly where we started from -- not able to commit to wanting "full" unicode (libICU is over 20mb of binary footprint) so thinking in terms of libcore containing a "partial" unicode library, like the one it currently has (possibly plus some more functions). Where to draw the line remains ... confusing to me, to say the least. There are a lot of parts to unicode.

Where would you draw the line?

Kimundi · 2013-04-13T21:17:29Z

Those are hard questions, and I don't know how to solve them, but I know that I'd like to know what happens to this pull request ;) Should I remove the parts that give possible unicode support, making it ascii only, or?...

thestinger · 2013-04-13T22:37:46Z

@Kimundi: I don't think they should be implemented on char if they don't have Unicode support, because it represents a ucs4 code point.

The libc functions like toupper aren't actually ascii, and they have half-working locale support depending on the platform. There are actual ascii versions included in glib because the implementation-defined libc ones just aren't reliable.

Kimundi · 2013-04-15T20:39:50Z

Would it be okay to remove the locale stuff, and add '_ascii' suffixes to the functions to signify that they only do their work on [A-Za-z]? I'd imagine that they'd mostly just be used for ascii values anyway, and this way the functions don't promise more than they can do.

thestinger · 2013-04-16T00:50:08Z

They're not going to do the expected thing all the time though, for example if the locale is Turkish.
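To make the Turkish case concrete, here is a small sketch in present-day Rust (not 2013 syntax). Today's `char::to_lowercase` is locale-independent and maps 'I' to 'i', whereas Turkish orthography expects the dotless 'ı' (U+0131). The helper `to_lower_turkish` is hypothetical, not a std API, and only covers the dotted/dotless I pair.

```rust
/// Hypothetical helper (not part of std): lowercases one char using the
/// Turkish rules for the dotted/dotless I pair, and the default,
/// locale-independent Unicode mapping for everything else.
fn to_lower_turkish(c: char) -> String {
    match c {
        'I' => "\u{0131}".to_string(), // I -> ı (dotless i)
        '\u{0130}' => "i".to_string(), // İ -> i
        _ => c.to_lowercase().collect(),
    }
}
```

The same input character therefore has two defensible answers depending on language, which is why a function with no locale parameter cannot do "the expected thing" for every caller.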

Kimundi · 2013-04-16T16:14:17Z

@thestinger: I'm talking about explicit 'map [A-Z] <-> [a-z]' functions that don't use libc for that.
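A minimal sketch of what such explicit mappings could look like, written in present-day Rust rather than 2013 syntax. The function names follow the ones later proposed for std::ascii, but the bodies here are assumptions for illustration, not the code that was eventually merged. Current std offers `char::to_ascii_uppercase`, `char::to_ascii_lowercase` and `str::eq_ignore_ascii_case` for the same purpose.

```rust
/// Maps 'a'..='z' to 'A'..='Z' and leaves every other char untouched.
/// No libc call and no locale lookup is involved.
fn to_ascii_upper(c: char) -> char {
    match c {
        'a'..='z' => (c as u8 - b'a' + b'A') as char,
        _ => c,
    }
}

/// Maps 'A'..='Z' to 'a'..='z' and leaves every other char untouched.
fn to_ascii_lower(c: char) -> char {
    match c {
        'A'..='Z' => (c as u8 - b'A' + b'a') as char,
        _ => c,
    }
}

/// Compares two strings, ignoring case differences in [A-Za-z] only.
fn eq_ignore_ascii_case(a: &str, b: &str) -> bool {
    a.chars().map(to_ascii_lower).eq(b.chars().map(to_ascii_lower))
}
```

Because non-ASCII characters pass through unchanged, the result is the same on every platform and under every locale, which is the property the '_ascii' suffix is meant to advertise.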

Kimundi · 2013-04-17T20:01:17Z

Closing this for now, unicode issues are still undecided, and ascii values should be done differently.

Downgrade option_if_let_else to nursery

I believe that this lint's loose understanding of ownership (rust-lang#5822, rust-lang#6737) makes it unsuitable to be enabled by default in its current state, even as a pedantic lint.

Additionally the lint has known problems with type inference (rust-lang#6137), though I may be willing to consider this a non-blocker in isolation if it weren't for the ownership false positives. A fourth false positive involving const fn: rust-lang#7567.

But on top of these, for me the biggest issue is I basically fully agree with rust-lang/rust-clippy#6137 (comment). In my experience this lint universally makes code worse even when the resulting code does compile.

---

changelog: remove [`option_if_let_else`] from default set of enabled lints

Fix `option_if_let_else`

fixes: rust-lang#5822
fixes: rust-lang#6737
fixes: rust-lang#7567

The inference from rust-lang#6137 still exists so I'm not sure if this should be moved from the nursery. Before doing that though I'd almost want to see this split into two lints. One suggesting `map_or` and the other suggesting `map_or_else`.

`map_or_else` tends to have longer expressions for both branches so it doesn't end up much shorter than a match expression in practice. It also seems most people find it harder to read. `map_or` at least has the terseness benefit of being on one line most of the time, especially when the `None` branch is just a literal or path expression.

changelog: `break` and `continue` statements local to the would-be closure are allowed in `option_if_let_else`
changelog: don't lint in const contexts in `option_if_let_else`
changelog: don't lint when yield expressions are used in `option_if_let_else`
changelog: don't lint when the captures made by the would-be closure conflict with the other branch in `option_if_let_else`
changelog: don't lint when a field of a local is used when the type could potentially be moved from in `option_if_let_else`
changelog: in some cases, don't lint when scrutinee expression conflicts with the captures of the would-be closure in `option_if_let_else`
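For readers unfamiliar with the lint, the following sketch shows the kind of rewrite under discussion. The function names are hypothetical and chosen only for this example; `Option::map_or` and `Option::map_or_else` are the real std methods.

```rust
/// The `if let ... else` shape that the lint flags.
fn len_or_zero_if_let(name: Option<&str>) -> usize {
    if let Some(n) = name {
        n.len()
    } else {
        0
    }
}

/// The `map_or` form: the `None` branch is a plain literal,
/// so the whole expression fits on one line.
fn len_or_zero_map_or(name: Option<&str>) -> usize {
    name.map_or(0, |n| n.len())
}

/// The `map_or_else` form: both branches are closures, which is
/// where the expression tends to grow longer.
fn describe(name: Option<&str>) -> String {
    name.map_or_else(|| String::from("anonymous"), |n| format!("user {}", n))
}
```

Whether the closure forms read better than the original branches is the judgment call the two commit messages disagree on.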

Added initial stubs for unicode case folding support

e5e6345

kud1ing mentioned this pull request Apr 11, 2013

"char": make "to_lower"/"to_upper" work for non-ASCII characters #1347

Closed

Kimundi closed this Apr 17, 2013

Kimundi mentioned this pull request Aug 2, 2013

Add to_ascii_upper, to_ascii_lower and eq_ignore_ascii_case in std::ascii #8231

Merged

flip1995 pushed a commit to flip1995/rust that referenced this pull request Aug 28, 2020

Fix the wrong use of snippet_with_applicability

edc05da

This includes a workaround of the issue rust-lang#5822, the cause of this little mistake.
