User defined integer literals #216

Closed
metagn opened this issue Apr 30, 2020 · 7 comments

@metagn
Contributor

metagn commented Apr 30, 2020

This is a lexer-level new language feature and very low priority (like Nim 2.0, not-this-year kind of low), but it comes too naturally not to write down.

Abstract

Nim currently accepts the syntax 123'i32 and 123i32 as a syntax-level version of int32(123), mapping to the AST kind nkInt32Lit (I'm going to use ints as examples in this RFC, but all of it applies to floats as well). This RFC builds on the existing syntax and generalizes it: similar to raw string literals, 123'name would call a routine name(123), where 123 is of kind nkIntLit or, piggybacking off another RFC/feature request that I can't find, some kind of nkRawIntLit (vs nkRawFloatLit)/nkRawNumberLit which contains the integer as a string. 123name might also work, but unless the lexer is smart, only if name doesn't start with e or i/d/f. Also, name can't be one of i i8 i16 i32 i64 u u8 u16 u32 u64 f d f32 f64.

Motivation

Arbitrary/high precision integer/decimal/float libraries will benefit from a general number literal that uses a string (though I think this was part of the other RFC that I'm imagining). Currently what they have to do is something like xl"245" (though none I've seen take advantage of this) vs 245'xl. This also makes sure at lexing time that the string isn't a malformed number.

Also works well for DSLs like 12kg/12'kg, but this is already doable with 12.kg, with the cost of being less readable/harder to parse (. could be a decimal point).

Description

This is essentially the user defined literals from C++, except instead of starting the identifier with an underscore it starts it with a single quote, in consistency with Nim's previous 1'i32 syntax. Nim also already has user defined raw string literals, but as prefixes instead of postfixes, consistent with Python instead of C++.

This getting implemented would make it possible for number literals to have side effects, though this is already possible for raw string calls like foo"abcd".

Another issue is case sensitivity: Currently 1'I32 is the same as 1'i32, but routines in Nim are case sensitive for the first letter. This means the existing number tags (i32 etc) have to get special case insensitive treatment but normal identifiers don't, causing an inconsistency.

Yet another issue is bases; octal, binary, hexadecimal literals. These would be stored verbatim in string form, and any parser that gets called by number literals will have to account for them.
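The point about bases can be sketched as follows. This is only an illustration, assuming the form that later shipped in Nim 1.6 (the handler is named with a leading ' and receives the literal's text as a string) and assuming the text reaches the handler verbatim, prefix included, as described above. Count and the 'cnt suffix are hypothetical names made up for this example.

```nim
import std/strutils

type Count = distinct int  # hypothetical wrapper type, for illustration only

proc `'cnt`(lit: string): Count =
  ## Receives the literal's source text and picks a parser from its prefix.
  let s = lit.toLowerAscii
  if s.startsWith("0x"):
    Count(parseHexInt(s))   # hexadecimal, e.g. 0xFF
  elif s.startsWith("0b"):
    Count(parseBinInt(s))   # binary, e.g. 0b1010
  elif s.startsWith("0o"):
    Count(parseOctInt(s))   # octal, e.g. 0o17
  else:
    Count(parseInt(s))      # plain decimal

echo int(255'cnt)
echo int(0xFF'cnt)
echo int(0b1010'cnt)
echo int(0o17'cnt)
```

A handler that only calls parseInt would reject the prefixed forms at run time, which is the burden on library authors that the paragraph above refers to.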

Examples

Before

template big(x: string): BigInt =
  parseBigInt(x)

template big(x: static[string]): static[BigInt] =
  static(parseBigInt(x))

let x = big"19308249384083490283"
echo x + 1

echo (1.days + 2.hours + 30.minutes) - (10.seconds)

After

template big(x: untyped{nkRawIntLit}): static[BigInt] =
  static(parseBigInt(astToStr(x)))

let x = 19308249384083490283'big # ' here so github highlighting doesnt break
echo x + 1

echo 1day + 2h + 30m - 10s
# 1d wont work here since it currently means 1'f64

Backward incompatibility

A lot. A new nkRawIntLit would have to be at the end of the node kind enum to keep binary compatibility, but this would keep it outside of all the other number literal/int literal/literal ranges. Ditto for ranges of what has strVal. This could be fixed by making nkIntLit and similarly nkFloatLit store strings instead of parsed numbers, but this is still binary incompatible.

Syntax highlighting tools would suffer, though it wouldn't be module or line breaking bad like #161, it would be a simple token-local mishap. A lot of highlighting tools don't even recognize 1'i32 though.

As mentioned previously, this would mean number literals can have side effects.

Separately from this post, I think the 1.4d syntax should be deprecated: it ruins possibilities for a whole letter, and Nim doesn't call them "doubles" except to be compatible with C. Might need another RFC for that though, and I don't want to bother.

@Clyybber

Clyybber commented Apr 30, 2020

@Araq and I discussed this too and came to the same idea as presented in this RFC.
The motivation for me was to have - defined on a raw literal type, so that -128'u compiles, and the lexer can still parse it as -(128'u).

@metagn
Contributor Author

metagn commented Apr 30, 2020

That does work though with current Nim.

template `-`(x: SomeUnsignedInt): typeof(x) = x

echo -3'u8

@Clyybber

@hlaaftana I'm talking about the fact that 128 is not a uint8 but -128 is.

@timotheecour
Member

Oh, I wasn't aware of this when I wrote #228.
Let's see what the best synthesis is.

@JohnAD

JohnAD commented Jan 24, 2021

It is my intent to have a PR for this placed before the end of February 2021. Perhaps sooner, as I already have it working; I just need to write more test cases to confirm it handles more border conditions.

As to the case sensitivity, I plan on maintaining that. If you want both an xl suffix and an Xl suffix to do the same thing, the type library author will need to write both versions (both xl and Xl).

My motive is to let the IEEE 754 decimal library (#308) I'm writing use this feature. Specifically, "m" will be the suffix (a convention used in C# and a few other languages). One of my goals is to have this fully convert/resolve at compile time.

var a = 1234.56E7m
# equivalent to:
var otherA = "1234.56E7".m

The f32, f64, u8, etc suffixes will still be built-in to the lexer and will take priority over any proc/func/template.

I will also make updates to the documentation.
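A minimal sketch of the "m" suffix idea, not the decimal library's actual code. It assumes the form that later shipped in Nim 1.6, where the ' before the suffix is required and the handler receives the literal's text as a string. Decimal here is a hypothetical stand-in that only stores that text.

```nim
type Decimal = object  # hypothetical stand-in for a real decimal type
  source: string       # keeps the literal text unparsed

proc `'m`(lit: string): Decimal =
  ## A real library would parse `lit` into its decimal representation here.
  Decimal(source: lit)

# Binding the result to a const forces the handler to run at compile time.
const price = 1234.56E7'm
echo price.source
```

Because the handler gets a string rather than a parsed float, no precision is lost before the library sees the digits.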

@metagn
Contributor Author

metagn commented Feb 13, 2021

Will be solved by nim-lang/Nim#17020, though it looks like we should probably do without optional ' (having to name routines starting with ' for this is better too), at least for any number literal that requires a letter like binary, hex or exponents.

@Araq
Member

Araq commented Mar 24, 2021

nim-lang/Nim#17489 will soon be merged. This RFC has been implemented.
