implement uc and lc in C for UTF-8 strings #99

Closed
StefanKarpinski opened this issue Jul 8, 2011 · 6 comments
Labels
performance Must go faster

Comments

@StefanKarpinski
Member

A single function may be able to do this depending on the details of UTF-8 encoding.

@ghost ghost assigned StefanKarpinski Jul 8, 2011
@StefanKarpinski
Member Author

The best resource I've found for this is: http://developer.gnome.org/glib/2.29/glib-Unicode-Manipulation.html

@JeffBezanson
Member

Are these towupper and towlower?

@StefanKarpinski
Member Author

For characters, yes, and that's how they're implemented. For ASCII strings, this is pretty easy. For UTF-8 strings, it's more complicated: in principle, you have to decode each character, call towupper/towlower on it, and append the result to a new string. That's a really slow way to do it, though, and I suspect UTF-8 might well be designed so that it can be done much faster. Unfortunately, the glib reference is disheartening, since it says:

> The exact manner that this is done depends on the current locale, and may result in the number of characters in the string increasing. (For instance, the German ess-zet will be changed to SS.)

For now, I think I'm just going to keep the TransformedString approach, which is inefficient but works.
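For reference, the slow decode/map/re-encode approach described above might look roughly like the C sketch below. It is not the TransformedString code or any Julia API; the names `utf8_map_case`, `utf8_decode`, `utf8_encode`, and the illustrative mapper `upper_sharp_s` are hypothetical. Passing a per-character mapping such as `towupper` or `towlower` lets one function serve both uc and lc. It assumes `wint_t` can hold any code point (true on glibc, not on Windows) and applies only simple one-to-one mappings, so it cannot produce glib's ß to SS expansion. Results also depend on the current C locale; in the default "C" locale only ASCII letters are guaranteed to change.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <wctype.h>

/* Pointer to a per-character case mapping such as towupper or towlower. */
typedef wint_t (*case_map_fn)(wint_t);

/* Decode one UTF-8 sequence from s (n bytes available). Returns its length
   (1-4) and stores the code point in *cp, or returns 0 if the bytes are not
   a well-formed sequence. Overlong forms and surrogates are not rejected:
   this is a sketch, not a validator. */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len;
    if (s[0] < 0x80)                { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; len = 4; }
    else return 0;
    if (len > n) return 0;
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;  /* not a continuation byte */
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return len;
}

/* Encode code point cp into out; returns the number of bytes written. */
static size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Slow path: decode each character, map it, and append the re-encoded
   result to a fresh buffer. Malformed bytes are copied through unchanged.
   Returns a malloc'd NUL-terminated string (caller frees), or NULL. */
char *utf8_map_case(const char *in, case_map_fn map)
{
    assert(in != NULL && map != NULL);
    size_t n = strlen(in);
    /* A mapped character can need more bytes than the original
       (U+00DF is 2 bytes, U+1E9E is 3), so size for the worst case. */
    unsigned char *out = malloc(4 * n + 1);
    if (out == NULL) return NULL;
    const unsigned char *s = (const unsigned char *)in;
    size_t i = 0, j = 0;
    while (i < n) {
        uint32_t cp;
        size_t len = utf8_decode(s + i, n - i, &cp);
        if (len == 0) { out[j++] = s[i++]; continue; }
        uint32_t mapped = (uint32_t)map((wint_t)cp);
        if (mapped > 0x10FFFF) mapped = cp;  /* ignore out-of-range results */
        j += utf8_encode(mapped, out + j);
        i += len;
    }
    out[j] = '\0';
    return (char *)out;
}

/* Hypothetical mapper for illustration: like towupper, but maps
   U+00DF (ß) to U+1E9E (ẞ), the behavior argued for in this thread. */
static wint_t upper_sharp_s(wint_t c)
{
    return c == 0xDF ? (wint_t)0x1E9E : towupper(c);
}
```

For example, `utf8_map_case("hello", towupper)` returns a freshly allocated "HELLO" that the caller must `free`. Note that every character goes through a decode, a function call, and an encode, which is exactly the per-character cost the comment above is worried about.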

@StefanKarpinski
Member Author

The German ß is an excellent example, although not for the reasons glib gives. It shouldn't be capitalized as two letters, but rather as a different Unicode character, U+1E9E (ẞ), which takes one more byte in UTF-8:

```
julia> length("\u00DF")
2

julia> length("\u1E9E")
3
```

Because the uppercase form can take more bytes than the original, it's really hard to write a fast, single-pass, in-place uppercasing function. We can still get most of the benefit with a fast path for ASCII strings, though.
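The byte counts in the REPL output above follow directly from UTF-8's length rules, sketched below in C (the helper names `utf8_encoded_len` and `mapping_grows` are hypothetical). An in-place, single-pass conversion can only overwrite bytes as it goes if no mapped character needs a longer encoding than the one it replaces; ß to ẞ breaks that, growing from 2 bytes to 3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bytes UTF-8 needs for code point cp (cp <= 0x10FFFF assumed). */
static size_t utf8_encoded_len(uint32_t cp)
{
    if (cp < 0x80) return 1;     /* U+0000..U+007F: ASCII */
    if (cp < 0x800) return 2;    /* U+0080..U+07FF, e.g. U+00DF */
    if (cp < 0x10000) return 3;  /* U+0800..U+FFFF, e.g. U+1E9E */
    return 4;                    /* U+10000..U+10FFFF */
}

/* Nonzero if replacing code point `from` with `to` would lengthen the
   UTF-8 encoding, i.e. an in-place rewrite would overrun the original. */
static int mapping_grows(uint32_t from, uint32_t to)
{
    assert(from <= 0x10FFFF && to <= 0x10FFFF);
    return utf8_encoded_len(to) > utf8_encoded_len(from);
}
```

A single pass could still use a check like `mapping_grows` to detect the problem, but it would then have to fall back to copying, which is why a separate fast path for ASCII is the simpler win.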

@StefanKarpinski
Member Author

Commit 3f51323 implements fast, copying ucfirst, lcfirst, uc, and lc for ASCIIString objects. UTF8String objects still use the slow TransformedString approach, but for now, that's fine.
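The commit itself isn't reproduced here, but the idea behind a fast, copying ASCII path can be sketched in C as follows. The function names (`ascii_uc`, `ascii_lc`, `ascii_ucfirst`, `ascii_lcfirst`, `is_ascii`) are hypothetical stand-ins, not Julia's API. Since ASCII letters differ from their other case only in bit 0x20 ('a' is 0x61, 'A' is 0x41), a byte loop suffices, with no decoding or locale lookup.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Case-convert a single ASCII byte; non-letters pass through unchanged. */
static char ascii_upper(char c) { return (c >= 'a' && c <= 'z') ? (char)(c - 0x20) : c; }
static char ascii_lower(char c) { return (c >= 'A' && c <= 'Z') ? (char)(c + 0x20) : c; }

/* Copy `in` into a fresh buffer, applying `f` to the first `count` bytes
   (every byte if count >= strlen(in)). Caller frees; NULL on OOM. */
static char *ascii_map_prefix(const char *in, char (*f)(char), size_t count)
{
    assert(in != NULL);
    size_t n = strlen(in);
    char *out = malloc(n + 1);
    if (out == NULL) return NULL;
    memcpy(out, in, n + 1);  /* copy, including the terminating NUL */
    for (size_t i = 0; i < n && i < count; i++)
        out[i] = f(out[i]);
    return out;
}

char *ascii_uc(const char *s)      { return ascii_map_prefix(s, ascii_upper, SIZE_MAX); }
char *ascii_lc(const char *s)      { return ascii_map_prefix(s, ascii_lower, SIZE_MAX); }
char *ascii_ucfirst(const char *s) { return ascii_map_prefix(s, ascii_upper, 1); }
char *ascii_lcfirst(const char *s) { return ascii_map_prefix(s, ascii_lower, 1); }

/* Nonzero if every byte is below 0x80, i.e. the string is plain ASCII. */
static int is_ascii(const char *s)
{
    for (; *s != '\0'; s++)
        if ((unsigned char)*s >= 0x80)
            return 0;
    return 1;
}
```

Because UTF-8 is ASCII-compatible, a UTF-8 string whose bytes are all below 0x80 could take this fast path as well, and a scan like `is_ascii` is one way to decide that before falling back to the slow path.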

@StefanKarpinski
Member Author

Closed by 7010db8.

StefanKarpinski pushed a commit that referenced this issue Feb 8, 2018
Add non-float tryparse compat
KristofferC added a commit that referenced this issue Feb 9, 2018
cmcaine pushed a commit to cmcaine/julia that referenced this issue Sep 24, 2020
fredrikekre added a commit that referenced this issue Feb 26, 2021
```
$ git log --pretty=oneline --abbrev=commit 2b4bed9..6bb8306
6bb83068bd796c4890baaeb39628ff79a4979374 Stop the grace timer iff adding first handle (fix #99) (#102)
af6864d8872247faf2a402d6b2baca5cb74ab96e fix ssh_key_pass bug (fix #91) (#100)
```
KristofferC pushed a commit that referenced this issue Feb 26, 2021
```
$ git log --pretty=oneline --abbrev=commit 2b4bed9..6bb8306
6bb83068bd796c4890baaeb39628ff79a4979374 Stop the grace timer iff adding first handle (fix #99) (#102)
af6864d8872247faf2a402d6b2baca5cb74ab96e fix ssh_key_pass bug (fix #91) (#100)
```
KristofferC pushed a commit that referenced this issue Mar 2, 2021
```
$ git log --pretty=oneline --abbrev=commit 2b4bed9..6bb8306
6bb83068bd796c4890baaeb39628ff79a4979374 Stop the grace timer iff adding first handle (fix #99) (#102)
af6864d8872247faf2a402d6b2baca5cb74ab96e fix ssh_key_pass bug (fix #91) (#100)
```

(cherry picked from commit fb500b0)
ElOceanografo pushed a commit to ElOceanografo/julia that referenced this issue May 4, 2021
…uliaLang#39833)

```
$ git log --pretty=oneline --abbrev=commit 2b4bed9..6bb8306
6bb83068bd796c4890baaeb39628ff79a4979374 Stop the grace timer iff adding first handle (fix JuliaLang#99) (JuliaLang#102)
af6864d8872247faf2a402d6b2baca5cb74ab96e fix ssh_key_pass bug (fix JuliaLang#91) (JuliaLang#100)
```
antoine-levitt pushed a commit to antoine-levitt/julia that referenced this issue May 9, 2021
…uliaLang#39833)

```
$ git log --pretty=oneline --abbrev=commit 2b4bed9..6bb8306
6bb83068bd796c4890baaeb39628ff79a4979374 Stop the grace timer iff adding first handle (fix JuliaLang#99) (JuliaLang#102)
af6864d8872247faf2a402d6b2baca5cb74ab96e fix ssh_key_pass bug (fix JuliaLang#91) (JuliaLang#100)
```
staticfloat pushed a commit that referenced this issue Dec 23, 2022
```
$ git log --pretty=oneline --abbrev=commit 2b4bed9..6bb8306
6bb83068bd796c4890baaeb39628ff79a4979374 Stop the grace timer iff adding first handle (fix #99) (#102)
af6864d8872247faf2a402d6b2baca5cb74ab96e fix ssh_key_pass bug (fix #91) (#100)
```

(cherry picked from commit fb500b0)
vchuravy pushed a commit to JuliaPackaging/LazyArtifacts.jl that referenced this issue Oct 2, 2023
inkydragon pushed a commit that referenced this issue Dec 15, 2024
Stdlib: SHA
URL: https://github.com/JuliaCrypto/SHA.jl.git
Stdlib branch: master
Julia branch: master
Old commit: aaf2df6
New commit: 8fa221d
Julia version: 1.12.0-DEV
SHA version: 0.7.0(Does not match)
Bump invoked by: @inkydragon
Powered by:
[BumpStdlibs.jl](https://github.com/JuliaLang/BumpStdlibs.jl)

Diff:
JuliaCrypto/SHA.jl@aaf2df6...8fa221d

```
$ git log --oneline aaf2df6..8fa221d
8fa221d ci: update doctest config (#120)
346b359 ci: Update ci config (#115)
aba9014 Fix type mismatch for `shake/digest!` and setup x86 ci (#117)
0b76d04 Merge pull request #114 from JuliaCrypto/dependabot/github_actions/codecov/codecov-action-5
5094d9d Update .github/workflows/CI.yml
45596b1 Bump codecov/codecov-action from 4 to 5
230ab51 test: remove outdate tests (#113)
7f25aa8 rm: Duplicated const alias (#111)
aa72f73 [SHA3] Fix padding special-case (#108)
3a01401 Delete Manifest.toml (#109)
da351bb Remvoe all getproperty funcs (#99)
4eee84f Bump codecov/codecov-action from 3 to 4 (#104)
15f7dbc Bump codecov/codecov-action from 1 to 3 (#102)
860e6b9 Bump actions/checkout from 2 to 4 (#103)
8e5f0ea Add dependabot to auto update github actions (#100)
4ab324c Merge pull request #98 from fork4jl/sha512-t
a658829 SHA-512: add ref to NIST standard
11a4c73 Apply suggestions from code review
969f867 Merge pull request #97 from fingolfin/mh/Vector
b1401fb SHA-512: add NIST test
4d7091b SHA-512: add to docs
09fef9a SHA-512: test SHA-512/224, SHA-512/256
7201b74 SHA-512: impl SHA-512/224, SHA-512/256
4ab85ad Array -> Vector
8ef91b6 fixed bug in padding for shake, addes testcases for full code coverage (#95)
88e1c83 Remove non-existent property (#75)
068f85d shake128,shake256: fixed typo in export declarations (#93)
176baaa SHA3 xof shake128 and shake256  (#92)
e1af7dd Hardcode doc edit backlink
```

Co-authored-by: Dilum Aluthge <dilum@aluthge.com>
DilumAluthge added a commit that referenced this issue Dec 17, 2024
Stdlib: Distributed
URL: https://github.com/JuliaLang/Distributed.jl
Stdlib branch: master
Julia branch: master
Old commit: 6c7cdb5
New commit: c613685
Julia version: 1.12.0-DEV
Distributed version: 1.11.0(Does not match)
Bump invoked by: @DilumAluthge
Powered by:
[BumpStdlibs.jl](https://github.com/JuliaLang/BumpStdlibs.jl)

Diff:
JuliaLang/Distributed.jl@6c7cdb5...c613685

```
$ git log --oneline 6c7cdb5..c613685
c613685 Merge pull request #116 from JuliaLang/ci-caching
20e2ce7 Use julia-actions/cache in CI
9c5d73a Merge pull request #112 from JuliaLang/dependabot/github_actions/codecov/codecov-action-5
ed12496 Merge pull request #107 from JamesWrigley/remotechannel-empty
010828a Update .github/workflows/ci.yml
11451a8 Bump codecov/codecov-action from 4 to 5
8b5983b Merge branch 'master' into remotechannel-empty
729ba6a Fix docstring of `@everywhere` (#110)
af89e6c Adding better docs to exeflags kwarg (#108)
8537424 Implement Base.isempty(::RemoteChannel)
6a0383b Add a wait(::[Abstract]WorkerPool) (#106)
1cd2677 Bump codecov/codecov-action from 1 to 4 (#96)
cde4078 Bump actions/cache from 1 to 4 (#98)
6c8245a Bump julia-actions/setup-julia from 1 to 2 (#97)
1ffaac8 Bump actions/checkout from 2 to 4 (#99)
8e3f849 Fix RemoteChannel iterator interface (#100)
f4aaf1b Fix markdown errors in README.md (#95)
2017da9 Merge pull request #103 from JuliaLang/sf/sigquit_instead
07389dd Use `SIGQUIT` instead of `SIGTERM`
```

Co-authored-by: Dilum Aluthge <dilum@aluthge.com>

2 participants