Number to/from string API #6220
Comments
For comparison, glibc's scanf handles floating point and integers in entirely separate code. Its printf has general number printing and a dedicated floating-point path (which actually uses stack-allocated GMP bignums). Notably, the general number printing is one huge (500-line) macro that gets called twice.
Also, I have been working on a generic parser for printf-like format strings. The current grammar is this:
So things are specified roughly like:
A possibility for code sharing would be having a This would allow float parsing to use the int parsing code to read subsections. (This function might be actually useful for any from_str value.)
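One hypothetical shape for such a shared helper is sketched below, assuming it returns the parsed value together with the unparsed remainder of the input. The names `parse_uint_prefix` and `split_decimal` are illustrative only and are not taken from the thread or from any Rust library; the float side stops at splitting the text into parts and does not attempt correct rounding.

```rust
/// Hypothetical helper: parse the leading digits of `input` in `radix`
/// (which must be in 2..=36, as `char::to_digit` requires) and return the
/// value plus the unparsed remainder. Returns `None` if there are no
/// leading digits or the value overflows `u64`.
fn parse_uint_prefix(input: &str, radix: u32) -> Option<(u64, &str)> {
    let mut value: u64 = 0;
    let mut consumed = 0;
    for (idx, ch) in input.char_indices() {
        match ch.to_digit(radix) {
            Some(digit) => {
                value = value
                    .checked_mul(radix as u64)?
                    .checked_add(digit as u64)?;
                consumed = idx + ch.len_utf8();
            }
            None => break,
        }
    }
    if consumed == 0 {
        None
    } else {
        Some((value, &input[consumed..]))
    }
}

/// Hypothetical float front end that reuses the integer helper for the
/// integer part and the exponent. Returns (integer part, fraction digits,
/// exponent) for inputs such as "12.5e-3".
fn split_decimal(input: &str) -> Option<(u64, &str, i32)> {
    let (integer, rest) = parse_uint_prefix(input, 10)?;
    let (fraction, rest) = match rest.strip_prefix('.') {
        Some(after_dot) => {
            let end = after_dot
                .find(|c: char| !c.is_ascii_digit())
                .unwrap_or(after_dot.len());
            after_dot.split_at(end)
        }
        None => ("", rest),
    };
    let exponent = match rest.strip_prefix('e') {
        Some(after_e) => {
            let (negative, digits) = match after_e.strip_prefix('-') {
                Some(d) => (true, d),
                None => (false, after_e),
            };
            // The same integer routine reads the exponent subsection.
            let (magnitude, tail) = parse_uint_prefix(digits, 10)?;
            if !tail.is_empty() {
                return None;
            }
            let magnitude = i32::try_from(magnitude).ok()?;
            if negative { -magnitude } else { magnitude }
        }
        None if rest.is_empty() => 0,
        None => return None,
    };
    Some((integer, fraction, exponent))
}
```

Because the helper hands back the remainder instead of demanding that the whole string be a number, an integer-only caller never needs to touch the fraction or exponent logic.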
I started on writing specialised replacements for ints, uints and floats. Currently the new (u)int versions use a stack-allocated vector and a callback function. For the float versions, I started looking into papers and the links here for an optimal algorithm. I will also look into making the number munching thing work.
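A minimal sketch of the stack-buffer-plus-callback idea, written in current Rust rather than the Rust of the time. The function name `with_uint_digits` and its signature are assumptions made for illustration, not the commenter's actual code.

```rust
/// Hypothetical sketch: render `value` in `radix` (2..=36) into a buffer on
/// the stack and pass the resulting digits to the `emit` callback, so no
/// heap allocation is needed unless the callback itself allocates.
fn with_uint_digits<R>(mut value: u64, radix: u32, emit: impl FnOnce(&str) -> R) -> R {
    assert!((2..=36).contains(&radix), "radix must be in 2..=36");
    // 64 bytes is enough for the longest case: a u64 written in base 2.
    let mut buf = [0u8; 64];
    let mut start = buf.len();
    loop {
        let digit = (value % radix as u64) as u32;
        start -= 1;
        // `from_digit` yields '0'..='9' and 'a'..='z', all single-byte ASCII.
        buf[start] = char::from_digit(digit, radix).unwrap() as u8;
        value /= radix as u64;
        if value == 0 {
            break;
        }
    }
    let digits = std::str::from_utf8(&buf[start..]).expect("ASCII digits are valid UTF-8");
    emit(digits)
}
```

A caller that wants an owned string can still get one with `with_uint_digits(255, 16, |s| s.to_owned())`, while a caller that writes straight to an output stream avoids the allocation entirely.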
Two papers about printing floats I saw on Reddit recently:
This works towards a complete rewrite and ultimate removal of the `std::num::strconv` module (see rust-lang#6220), and the removal of the `ToStrRadix` trait in favour of using the `std::fmt` functionality directly. This should make for a cleaner API, encourage less allocation, and make the implementation far more comprehensible.

The `Formatter::pad_integral` method has also been refactored to make it easier to understand.

The formatting tests for integers have been moved out of `run-pass/ifmt.rs` in order to provide more immediate feedback when building using `make check-stage2-std NO_REBUILD=1`.

The benchmarks have been standardised between `std::num::strconv` and `std::num::fmt` to make it easier to compare the performance of the different implementations.

Arbitrary radixes are now easier to use in format strings. For example:

~~~
assert_eq!(format!("{:04}", radix(3, 2)), ~"0011");
~~~
This PR is the beginning of a complete rewrite and ultimate removal of the `std::num::strconv` module (see #6220), and the removal of the `ToStrRadix` trait in favour of using the `std::fmt` functionality directly. This should make for a cleaner API, encourage less allocation, and make the implementation more comprehensible.

The `Formatter::{pad_integral, with_padding}` methods have also been refactored to make things easier to understand.

The formatting tests for integers have been moved out of `run-pass/ifmt.rs` in order to provide more immediate feedback when building using `make check-stage2-std NO_REBUILD=1`.

Arbitrary radixes are now easier to use in format strings. For example:

~~~rust
assert_eq!(format!("{:04}", radix(3, 2)), ~"0011");
~~~

The benchmarks have been standardised between `std::num::strconv` and `std::num::fmt` to make it easier to compare the performance of the different implementations.

~~~
 type | radix | std::num::strconv      | std::num::fmt
======|=======|========================|======================
 int  | bin   | 1748 ns/iter (+/- 150) | 321 ns/iter (+/- 25)
 int  | oct   |  706 ns/iter (+/- 53)  | 179 ns/iter (+/- 22)
 int  | dec   |  640 ns/iter (+/- 59)  | 207 ns/iter (+/- 10)
 int  | hex   |  637 ns/iter (+/- 77)  | 205 ns/iter (+/- 19)
 int  | 36    |  446 ns/iter (+/- 30)  | 309 ns/iter (+/- 20)
------|-------|------------------------|----------------------
 uint | bin   | 1724 ns/iter (+/- 159) | 322 ns/iter (+/- 13)
 uint | oct   |  663 ns/iter (+/- 25)  | 175 ns/iter (+/- 7)
 uint | dec   |  613 ns/iter (+/- 30)  | 186 ns/iter (+/- 6)
 uint | hex   |  519 ns/iter (+/- 44)  | 207 ns/iter (+/- 20)
 uint | 36    |  418 ns/iter (+/- 16)  | 308 ns/iter (+/- 32)
~~~
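The `radix(3, 2)` example relies on an API from that era of Rust. As a rough illustration of the same idea in current Rust, the sketch below defines a hypothetical `Radix` wrapper whose `Display` impl renders digits into a stack buffer and delegates width and zero-padding to the stable `Formatter::pad_integral` method. The `Radix` type and `radix` function here are stand-ins written for this example, not the PR's implementation.

```rust
use std::fmt;

/// Hypothetical stand-in for the wrapper returned by `radix(value, base)`.
struct Radix {
    value: u64,
    base: u32,
}

/// Hypothetical constructor; `base` must be in 2..=36.
fn radix(value: u64, base: u32) -> Radix {
    assert!((2..=36).contains(&base), "base must be in 2..=36");
    Radix { value, base }
}

impl fmt::Display for Radix {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Render digits right-to-left into a stack buffer (64 bytes covers
        // a u64 in base 2), so the impl itself does not allocate.
        let mut buf = [0u8; 64];
        let mut start = buf.len();
        let mut rest = self.value;
        loop {
            start -= 1;
            let digit = (rest % self.base as u64) as u32;
            buf[start] = char::from_digit(digit, self.base).unwrap() as u8;
            rest /= self.base as u64;
            if rest == 0 {
                break;
            }
        }
        let digits = std::str::from_utf8(&buf[start..]).map_err(|_| fmt::Error)?;
        // `pad_integral` applies the width, fill and sign flags from the
        // format string; the value is non-negative and has no prefix.
        f.pad_integral(true, "", digits)
    }
}
```

With this wrapper, `format!("{:04}", radix(3, 2))` produces `"0011"`, matching the assertion in the PR description. For the common bases, current Rust also offers the built-in `{:b}`, `{:o}` and `{:x}` specifiers.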
I filed a vaguely related RFC for adding scanf-like functionality to Rust - rust-lang/rfcs#67
cc me
Here's a library that's doing something like this: https://github.com/mahkoh/scan
@huonw Is there anything actionable here right now, or should this perhaps move to the RFCs repo?
This is a direct port of my prior work on the float formatting. The detailed description is available [here](https://github.com/lifthrasiir/rust-strconv#flt2dec). In brief,

* This adds a new hidden module `core::num::flt2dec` for testing from `libcoretest`. Why is it in `core::num` instead of `core::fmt`? Because I envision that the table used by `flt2dec` is directly applicable to `dec2flt` (cf. #24557) as well, which exceeds the realm of "formatting".
* This contains both Dragon4 algorithm (exact, complete but slow) and Grisu3 algorithm (exact, fast but incomplete).
* The code is accompanied with a large amount of self-tests and some exhaustive tests. In particular, `libcoretest` gets a new dependency on `librand`. For the external interface it relies on the existing test suite.
* It is known that, in the best case, the entire formatting code has about 30 KBs of binary overhead (judged from strconv experiments). Not too bad but there might be a potential room for improvements.

This is rather large code. I did my best to comment and annotate the code, but you have been warned.

For the maximal availability the original code was licensed in CC0, but I've also dual-licensed it in MIT/Apache as well so there should be no licensing concern.

This is [breaking-change] as it changes the float output slightly (and it also affects the casing of `inf` and `nan`). I hope this is not a big deal though :)

Fixes #7030, #18038 and #24556. Also related to #6220 and #20870.

## Known Issues

- [x] I've yet to finish `make check-stage1`. It does pass main test suites including `run-pass` but there might be some unknown edges on the doctests.
- [ ] Figure out how this PR affects rustc.
- [ ] Determine which internal routine is mapped to the formatting specifier. Depending on the decision, some internal routine can be safely removed (for instance, currently `to_shortest_str` is unused).
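The user-visible effect of exact float formatting can be checked from the outside in current Rust, without touching `flt2dec` internals. The sketch below uses a hypothetical helper, `round_trips`, to test that a value printed with `{}` parses back to the identical bits. It demonstrates the observable property only and does not reproduce Dragon4 or Grisu3.

```rust
/// Hypothetical check: format `x` with `Display`, parse the text back, and
/// compare bit patterns. Exact shortest formatting should make this hold
/// for every finite `f64`.
fn round_trips(x: f64) -> bool {
    let text = format!("{}", x);
    match text.parse::<f64>() {
        Ok(back) => back.to_bits() == x.to_bits(),
        Err(_) => false,
    }
}
```

For instance, `format!("{}", 0.1f64 + 0.2)` yields `0.30000000000000004`, the shortest decimal that distinguishes the sum from `0.3`, and `format!("{}", f64::INFINITY)` yields `inf`.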
* remove FIXME(rust-lang#13101) since `assert_receiver_is_total_eq` stays.
* remove FIXME(rust-lang#19649) now that stability markers render.
* remove FIXME(rust-lang#13642) now the benchmarks were moved.
* remove FIXME(rust-lang#6220) now that floating points can be formatted.
* remove FIXME(rust-lang#18248) and write tests for `Rc<str>` and `Rc<[u8]>`
* remove reference to irrelevant issues in FIXME(rust-lang#1697, rust-lang#2178...)
* update FIXME(rust-lang#5516) to point to getopts issue 7
* update FIXME(rust-lang#7771) to point to RFC 628
* update FIXME(rust-lang#19839) to point to issue 26925
cargo dev ra-setup: don't inject deps multiple times if we have already done so

Fixes rust-lang#6220

changelog: none
This is an umbrella issue following on from/working with #4819. The number <-> string conversion needs work and thought.
For example, the current to_str version allocates, and the current from_str doesn't work with the compiler optimisations very well: none of the (for example) exponent parsing code is removed when the function is specialised for `int`s.

Another important thing to consider is how non-primitive numbers (e.g. bigint/rational) are to be supported/helped (including not at all, other than the normal to/from_str traits).