Number to/from string API #6220

Closed
huonw opened this issue May 3, 2013 · 10 comments

Comments

@huonw
Member

huonw commented May 3, 2013

This is an umbrella issue following on from/working with #4819. The number <-> string conversion needs work and thought.

For example, the current to_str version allocates, and the current from_str doesn't work with the compiler optimisations very well: none of the (for example) exponent parsing code is removed when the function is specialised for ints.

Another important thing to consider is how non-primitive numbers (e.g. bigint/rational) are to be supported/helped (including not at all, other than the normal to/from_str traits)

@huonw
Member Author

huonw commented May 3, 2013

For comparison, glibc's scanf: floating point, and integers (these are entirely separate).

And printf: general number printing and specifically floating point (it actually uses (stack-allocated) GMP bignums). Notably the general number printing is one huge (500 line) macro that gets called twice.

@Aatch
Contributor

Aatch commented May 3, 2013

Also, I have been working on a generic parser for printf-like format strings. The current grammar is this:

 Placeholder := Indicator Position? Flag* Width? Precision? Specifier
 Position := '{' '-'? [0-9]+ '}'
 Width := [1-9] | '[' ('-'? [0-9]+ | '*') ']'
 Precision := '.' ([1-9] | '[' ('-'? [0-9]+ | '*' ) ']')
 Flag := <From supplied set>
 Specifier := <From supplied set>

So things are specified roughly like: %{1} [2].9f, where the value is in argument 1 and the width is retrieved from argument 2.
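
A minimal sketch of a parser for this grammar follows, written in current Rust rather than the 2013 dialect. It is not Aatch's implementation: the `Placeholder`, `Count`, and `Cursor` types and the `parse_placeholder` function are hypothetical names, `%` is assumed to be the indicator, and the flag and specifier sets are passed in by the caller because the grammar leaves them as "supplied". The meaning of `*` is not spelled out in the comment, so the sketch only records that it was seen.

```rust
/// A width or precision as written in the placeholder.
#[derive(Debug, PartialEq)]
enum Count {
    Literal(u8), // a bare digit 1-9, e.g. the `9` in `.9`
    Arg(i64),    // `[n]`: take the value from argument n
    Star,        // `[*]`: recorded as-is; its meaning is not defined here
}

#[derive(Debug, PartialEq)]
struct Placeholder {
    position: Option<i64>,
    flags: Vec<char>,
    width: Option<Count>,
    precision: Option<Count>,
    specifier: char,
}

/// Byte cursor over the input (hypothetical helper).
struct Cursor<'a> {
    s: &'a [u8],
    i: usize,
}

impl Cursor<'_> {
    fn peek(&self) -> Option<u8> {
        self.s.get(self.i).copied()
    }
    /// Consume `b` if it is the next byte.
    fn eat(&mut self, b: u8) -> bool {
        if self.peek() == Some(b) {
            self.i += 1;
            true
        } else {
            false
        }
    }
}

/// '-'? [0-9]+
fn signed_int(c: &mut Cursor<'_>) -> Option<i64> {
    let neg = c.eat(b'-');
    let start = c.i;
    let mut v: i64 = 0;
    while let Some(d) = c.peek().filter(u8::is_ascii_digit) {
        v = v.checked_mul(10)?.checked_add(i64::from(d - b'0'))?;
        c.i += 1;
    }
    if c.i == start {
        return None; // no digits at all
    }
    Some(if neg { -v } else { v })
}

/// [1-9] | '[' ('-'? [0-9]+ | '*') ']'
fn count(c: &mut Cursor<'_>) -> Option<Count> {
    if c.eat(b'[') {
        let v = if c.eat(b'*') { Count::Star } else { Count::Arg(signed_int(c)?) };
        return if c.eat(b']') { Some(v) } else { None };
    }
    match c.peek() {
        Some(d @ b'1'..=b'9') => {
            c.i += 1;
            Some(Count::Literal(d - b'0'))
        }
        _ => None,
    }
}

/// Parse exactly one placeholder; `flags` and `specifiers` are the supplied sets.
fn parse_placeholder(input: &str, flags: &str, specifiers: &str) -> Option<Placeholder> {
    let mut c = Cursor { s: input.as_bytes(), i: 0 };
    if !c.eat(b'%') {
        return None;
    }
    let position = if c.eat(b'{') {
        let n = signed_int(&mut c)?;
        if !c.eat(b'}') {
            return None;
        }
        Some(n)
    } else {
        None
    };
    let mut found_flags = Vec::new();
    while let Some(b) = c.peek().filter(|b| flags.as_bytes().contains(b)) {
        found_flags.push(b as char);
        c.i += 1;
    }
    let width = match c.peek() {
        Some(b'[') | Some(b'1'..=b'9') => Some(count(&mut c)?),
        _ => None,
    };
    let precision = if c.eat(b'.') { Some(count(&mut c)?) } else { None };
    let specifier = c.peek().filter(|b| specifiers.as_bytes().contains(b))? as char;
    c.i += 1;
    if c.i != c.s.len() {
        return None; // trailing input after the specifier
    }
    Some(Placeholder { position, flags: found_flags, width, precision, specifier })
}
```

With the assumed flag set `" -+0#"` and specifier set `"dfsx"`, `parse_placeholder("%{1} [2].9f", " -+0#", "dfsx")` reads the space as a flag, `[2]` as an argument-supplied width, and `.9` as a literal precision.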

@huonw
Member Author

huonw commented May 4, 2013

A possibility for code sharing would be having a munch_number<Num: Integer>(&str) -> (Option<Num>, uint) function that parses a number at the start of the string, so from_str would do this then check that the whole string has been parsed.

This would allow float parsing to use the int parsing code to read subsections.

(This function might be actually useful for any from_str value.)
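
A minimal sketch of the idea in current Rust, under some simplifying assumptions: the function is fixed to `u64` and decimal digits instead of being generic over `Num: Integer`, and `parse_whole` and `split_decimal` are hypothetical names used only to show the `from_str`-style check and the reuse for float-like input.

```rust
/// Parse a run of decimal digits at the start of `s`.
/// Returns the value (None if there are no digits or the value overflows)
/// and the number of bytes consumed.
fn munch_number(s: &str) -> (Option<u64>, usize) {
    let mut value: Option<u64> = Some(0);
    let mut consumed = 0;
    for b in s.bytes() {
        if !b.is_ascii_digit() {
            break;
        }
        let digit = u64::from(b - b'0');
        // Keep consuming digits after an overflow so the caller still
        // learns where the number ends.
        value = value
            .and_then(|v| v.checked_mul(10))
            .and_then(|v| v.checked_add(digit));
        consumed += 1;
    }
    if consumed == 0 {
        (None, 0)
    } else {
        (value, consumed)
    }
}

/// `from_str`-style wrapper: succeed only if the whole string was parsed.
fn parse_whole(s: &str) -> Option<u64> {
    match munch_number(s) {
        (Some(n), used) if used == s.len() => Some(n),
        _ => None,
    }
}

/// Reuse the integer muncher for both halves of a plain `int.frac` string.
/// Returns (integer part, fraction digits as an integer, fraction length).
fn split_decimal(s: &str) -> Option<(u64, u64, usize)> {
    let (int_part, used) = munch_number(s);
    let int_part = int_part?;
    let rest = s[used..].strip_prefix('.')?;
    let (frac, frac_len) = munch_number(rest);
    let frac = frac?;
    if frac_len != rest.len() {
        return None; // trailing input after the fraction
    }
    Some((int_part, frac, frac_len))
}
```

Here `munch_number("42abc")` reports the value 42 with 2 bytes consumed, while `parse_whole("42abc")` rejects the same input because the string was not fully consumed.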

@Kimundi
Member

Kimundi commented May 14, 2013

I started on writing specialised replacements for ints, uints and floats. Currently the new (u)int versions use a stack allocated vector and a callback function. For the float versions, I started looking into papers and the links here for an optimal algorithm. I will also look into making the number munching thing work.
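
The stack-buffer-plus-callback approach might look roughly like the sketch below in current Rust. This is not Kimundi's code: `with_uint_digits` is a hypothetical name, and the sketch handles only `u64` with lowercase digits.

```rust
/// Format `n` in `radix` (2..=36) into a stack buffer and pass the digits
/// to the callback `f`, without any heap allocation in this function.
fn with_uint_digits<R>(mut n: u64, radix: u32, f: impl FnOnce(&str) -> R) -> R {
    assert!((2..=36).contains(&radix), "radix must be in 2..=36");
    // 64 bytes covers a u64 in base 2, which is the longest case.
    let mut buf = [0u8; 64];
    let mut pos = buf.len();
    loop {
        let digit = (n % u64::from(radix)) as u32;
        pos -= 1;
        // `from_digit` cannot fail here because digit < radix <= 36.
        buf[pos] = std::char::from_digit(digit, radix).unwrap() as u8;
        n /= u64::from(radix);
        if n == 0 {
            break;
        }
    }
    // Only ASCII digits and letters were written, so this is valid UTF-8.
    let digits = std::str::from_utf8(&buf[pos..]).unwrap();
    f(digits)
}
```

The callback decides what happens to the digits: `with_uint_digits(255, 16, |s| s.len())` inspects them in place, whereas a caller that wants an owned `String` can allocate inside the closure.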

@huonw
Member Author

huonw commented Jul 21, 2013

brendanzab added a commit to brendanzab/rust that referenced this issue Feb 21, 2014
This works towards a complete rewrite and ultimate removal of the `std::num::strconv` module (see rust-lang#6220), and the removal of the `ToStrRadix` trait in favour of using the `std::fmt` functionality directly. This should make for a cleaner API, encourage less allocation, and make the implementation far more comprehensible.

The `Formatter::pad_integral` method has also been refactored to make it easier to understand.

The formatting tests for integers have been moved out of `run-pass/ifmt.rs` in order to provide more immediate feedback when building using `make check-stage2-std NO_REBUILD=1`.

The benchmarks have been standardised between std::num::strconv and std::num::fmt to make it easier to compare the performance of the different implementations.

Arbitrary radixes are now easier to use in format strings. For example:

~~~
assert_eq!(format!("{:04}", radix(3, 2)), ~"0011");
~~~
bors added a commit that referenced this issue Feb 22, 2014
This PR is the beginning of a complete rewrite and ultimate removal of the `std::num::strconv` module (see #6220), and the removal of the `ToStrRadix` trait in favour of using the `std::fmt` functionality directly. This should make for a cleaner API, encourage less allocation, and make the implementation more comprehensible.

The `Formatter::{pad_integral, with_padding}` methods have also been refactored to make things easier to understand.

The formatting tests for integers have been moved out of `run-pass/ifmt.rs` in order to provide more immediate feedback when building using `make check-stage2-std NO_REBUILD=1`.

Arbitrary radixes are now easier to use in format strings. For example:

~~~rust
assert_eq!(format!("{:04}", radix(3, 2)), ~"0011");
~~~
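
For readers on current Rust: the example above uses the 2014 dialect (`~"..."` owned strings and a `radix` helper), which will not compile today. A rough modern counterpart for the common radixes, assuming only the standard `std::fmt` format specifiers `b`, `o`, and `x`, might look like this (the function names are illustrative only):

```rust
/// Zero-padded binary, mirroring `format!("{:04}", radix(3, 2))` above.
fn bin4(n: u32) -> String {
    format!("{:04b}", n)
}

/// Octal without padding.
fn oct(n: u32) -> String {
    format!("{:o}", n)
}

/// Lowercase hexadecimal without padding.
fn hex(n: u32) -> String {
    format!("{:x}", n)
}
```

Under these specifiers `bin4(3)` yields `"0011"`, the same text as the original assertion.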

The benchmarks have been standardised between `std::num::strconv` and `std::num::fmt` to make it easier to compare the performance of the different implementations.

~~~
 type | radix | std::num::strconv      | std::num::fmt
======|=======|========================|======================
 int  | bin   | 1748 ns/iter (+/- 150) | 321 ns/iter (+/- 25)
 int  | oct   |  706 ns/iter (+/- 53)  | 179 ns/iter (+/- 22)
 int  | dec   |  640 ns/iter (+/- 59)  | 207 ns/iter (+/- 10)
 int  | hex   |  637 ns/iter (+/- 77)  | 205 ns/iter (+/- 19)
 int  | 36    |  446 ns/iter (+/- 30)  | 309 ns/iter (+/- 20)
------|-------|------------------------|----------------------
 uint | bin   | 1724 ns/iter (+/- 159) | 322 ns/iter (+/- 13)
 uint | oct   |  663 ns/iter (+/- 25)  | 175 ns/iter (+/- 7)
 uint | dec   |  613 ns/iter (+/- 30)  | 186 ns/iter (+/- 6)
 uint | hex   |  519 ns/iter (+/- 44)  | 207 ns/iter (+/- 20)
 uint | 36    |  418 ns/iter (+/- 16)  | 308 ns/iter (+/- 32)
~~~
@nrc
Member

nrc commented May 4, 2014

I filed a vaguely related RFC for adding scanf-like functionality to Rust - rust-lang/rfcs#67

@aturon
Member

aturon commented Aug 12, 2014

cc me

@gsingh93
Contributor

Here's a library that's doing something like this: https://github.com/mahkoh/scan

@aturon
Member

aturon commented Feb 16, 2015

@huonw Is there anything actionable here right now, or should this perhaps move to the RFCs repo?

bors added a commit that referenced this issue May 9, 2015
This is a direct port of my prior work on the float formatting. The detailed description is available [here](https://github.com/lifthrasiir/rust-strconv#flt2dec). In brief,

* This adds a new hidden module `core::num::flt2dec` for testing from `libcoretest`. Why is it in `core::num` instead of `core::fmt`? Because I envision that the table used by `flt2dec` is directly applicable to `dec2flt` (cf. #24557) as well, which exceeds the realm of "formatting".
* This contains both the Dragon4 algorithm (exact, complete but slow) and the Grisu3 algorithm (exact, fast but incomplete).
* The code is accompanied by a large number of self-tests and some exhaustive tests. In particular, `libcoretest` gets a new dependency on `librand`. For the external interface it relies on the existing test suite.
* It is known that, in the best case, the entire formatting code has about 30 KBs of binary overhead (judged from strconv experiments). Not too bad but there might be a potential room for improvements.
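
One observable consequence of exact shortest-digit formatting can be checked from user code. The sketch below assumes current Rust, where `Display` for `f64` prints the shortest decimal string that parses back to the same value; `round_trips` is a hypothetical helper and does not touch the internal `flt2dec` module.

```rust
/// Format `x` with `Display`, parse the text back, and compare bit patterns.
fn round_trips(x: f64) -> bool {
    let text = format!("{}", x);
    match text.parse::<f64>() {
        Ok(y) => y.to_bits() == x.to_bits(),
        Err(_) => false,
    }
}

/// The shortest representation of a value, as produced by `Display`.
fn shortest(x: f64) -> String {
    format!("{}", x)
}
```

For instance, `shortest(0.1)` is `"0.1"` rather than a long expansion of the nearest binary fraction, while `shortest(0.1 + 0.2)` is `"0.30000000000000004"` because that many digits are needed to identify the value.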

This is rather large code. I did my best to comment and annotate the code, but you have been warned.

For maximal availability the original code was licensed under CC0, but I've also dual-licensed it under MIT/Apache, so there should be no licensing concern.

This is [breaking-change] as it changes the float output slightly (and it also affects the casing of `inf` and `nan`). I hope this is not a big deal though :)

Fixes #7030, #18038 and #24556. Also related to #6220 and #20870.

## Known Issues

- [x] I've yet to finish `make check-stage1`. It does pass main test suites including `run-pass` but there might be some unknown edges on the doctests.
- [ ] Figure out how this PR affects rustc.
- [ ] Determine which internal routine is mapped to the formatting specifier. Depending on the decision, some internal routine can be safely removed (for instance, currently `to_shortest_str` is unused).
@huonw
Member Author

huonw commented Jan 5, 2016

I think this is basically solved/irrelevant, e.g. #24612 & #30175 make us handle floats right, and the new formatting infrastructure allows avoiding allocations (i.e. we have more than just the old to_str). I think other aspects are better handled elsewhere (e.g. RFC repo, or an external crate).
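
As an illustration of the allocation-free path, here is a sketch in current Rust that formats numbers into a fixed-size stack buffer through the `std::fmt::Write` trait. `StackBuf` and `format_into` are hypothetical names, not standard library items, and the 32-byte capacity is an arbitrary choice.

```rust
use std::fmt::{self, Write};

/// Fixed-capacity text buffer that lives on the stack.
struct StackBuf {
    bytes: [u8; 32],
    len: usize,
}

impl StackBuf {
    fn new() -> Self {
        StackBuf { bytes: [0; 32], len: 0 }
    }

    fn as_str(&self) -> &str {
        // Only whole `&str` fragments are copied in, so the contents
        // are always valid UTF-8.
        std::str::from_utf8(&self.bytes[..self.len]).unwrap()
    }
}

impl Write for StackBuf {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        let end = self.len + s.len();
        if end > self.bytes.len() {
            return Err(fmt::Error); // out of space
        }
        self.bytes[self.len..end].copy_from_slice(s.as_bytes());
        self.len = end;
        Ok(())
    }
}

/// Write an integer and a float (two decimal places) into `buf`.
fn format_into(buf: &mut StackBuf, n: i64, x: f64) -> fmt::Result {
    write!(buf, "{} {:.2}", n, x)
}
```

Calling `format_into(&mut buf, -42, 3.14159)` on a fresh buffer leaves `"-42 3.14"` in it; once the buffer is full, further writes return an error instead of growing the storage.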

@huonw huonw closed this as completed Jan 5, 2016
nivkner added a commit to nivkner/rust that referenced this issue Sep 30, 2017
remove FIXME(rust-lang#13101) since `assert_receiver_is_total_eq` stays.
remove FIXME(rust-lang#19649) now that stability markers render.
remove FIXME(rust-lang#13642) now the benchmarks were moved.
remove FIXME(rust-lang#6220) now that floating points can be formatted.
remove FIXME(rust-lang#18248) and write tests for `Rc<str>` and `Rc<[u8]>`
remove reference to irrelevant issues in FIXME(rust-lang#1697, rust-lang#2178...)
update FIXME(rust-lang#5516) to point to getopts issue 7
update FIXME(rust-lang#7771) to point to RFC 628
update FIXME(rust-lang#19839) to point to issue 26925
flip1995 pushed a commit to flip1995/rust that referenced this issue Nov 3, 2020
cargo dev ra-setup: don't inject deps multiple times if we have already done so

Fixes rust-lang#6220

changelog: none