Yet another cool checksum address encoding #55

Closed
vbuterin opened this issue Jan 14, 2016 · 77 comments

@vbuterin
Contributor

vbuterin commented Jan 14, 2016

EDITOR UPDATE (2017-08-24): This EIP is now located at https://eips.ethereum.org/EIPS/eip-55. Please go there for the correct specification. The text below may be incorrect or outdated, and is not maintained.

Code:

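from ethereum import utils  # pyethereum (Python 2 era)

def checksum_encode(addr): # Takes a 20-byte binary address as input
    o = ''
    # Hash the raw address bytes and read the 32-byte digest as a 256-bit integer
    v = utils.big_endian_to_int(utils.sha3(addr))
    for i, c in enumerate(addr.encode('hex')):  # Python 2: raw bytes -> lowercase hex
        if c in '0123456789':
            o += c  # digits have no case, so they carry no check bit
        else:
            # uppercase iff bit i of the hash, counted from the most significant end, is 1
            o += c.upper() if (v & (2**(255 - i))) else c.lower()
    return '0x'+o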

In English: convert the address to hex, but if the i-th digit is a letter (i.e. one of abcdef), print it in uppercase if the i-th bit of the hash of the address (in binary form) is 1; otherwise print it in lowercase.
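The rule above can be sketched in Python 3 without committing to a particular hash library. In this minimal sketch, `apply_case_bits` is a hypothetical helper that takes the 32-byte digest as a parameter (per the original post, the hash of the 20 raw address bytes, computed elsewhere). Shifting right by `255 - i` and masking with 1 reads the same bit as the `v & (2**(255 - i))` test in the code above: bit i counted from the most significant end of the 256-bit hash.

```python
def apply_case_bits(addr: bytes, digest: bytes) -> str:
    """Original-proposal casing (hypothetical helper): letter i is uppercased
    iff bit i of the digest, counted from the most significant bit, is 1.

    `digest` is assumed to be the 32-byte hash of the 20 raw address bytes;
    computing it is left to the caller.
    """
    if len(addr) != 20 or len(digest) != 32:
        raise ValueError("expected a 20-byte address and a 32-byte digest")
    v = int.from_bytes(digest, "big")  # same role as utils.big_endian_to_int
    out = []
    for i, c in enumerate(addr.hex()):  # 40 lowercase hex characters
        bit = (v >> (255 - i)) & 1      # same bit that v & 2**(255 - i) tests
        out.append(c.upper() if bit else c)  # str.upper() leaves digits unchanged
    return "0x" + "".join(out)


cow = bytes.fromhex("cd2a3d9f938e13cd947ec05abc7fe734df8dd826")
# Synthetic digest for illustration only: bit pattern 11110000 repeated
print(apply_case_bits(cow, bytes([0xF0] * 32)))  # starts with 0xCD2A3d9f938E13cd
```

Only the first 40 of the 256 hash bits are ever consulted, and positions holding a digit ignore their bit, so only letters act as check bits.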

Benefits:

  • Backwards compatible with many hex parsers that accept mixed case, allowing it to be easily introduced over time
  • Keeps the length at 40 characters
  • The average address will have 60 check bits, and fewer than 1 in 1 million addresses will have fewer than 32 check bits; this is stronger than nearly all other check schemes. Note that the very tiny chance that a given address will have very few check bits is dwarfed by the chance, in any scheme, that a bad address will randomly pass a check.

UPDATE: I was actually wrong in my math above. I forgot that the check bits are per-hex-character, not per-bit (facepalm): only letters carry a check bit. On average there will be 15 check bits per address, and the net probability that a mistyped address, treated as randomly generated, will accidentally pass the check is 0.0247%. This is a ~50x improvement over ICAP, but not as good as a 4-byte check code.
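The corrected figures can be reproduced under a simple model, which is an assumption rather than something stated in the post: each of the 40 hex characters is independently a letter with probability 6/16, and a mistyped address behaves like a random one, so each letter's case matches by chance with probability 1/2. The expected number of check bits is then 40 · 6/16 = 15, and averaging 2^-k over the binomial letter count k gives (10/16 + 3/16)^40 = (13/16)^40.

```python
from math import comb

N_CHARS = 40       # hex characters in an address
P_LETTER = 6 / 16  # chance that a uniformly random hex character is a-f


def expected_check_bits() -> float:
    """Each letter contributes one check bit (its case)."""
    return N_CHARS * P_LETTER


def pass_probability() -> float:
    """Chance that a random (mistyped) address passes: the average of 2**-k over
    the binomial number k of letters, since each letter's case matches with p = 1/2."""
    return sum(
        comb(N_CHARS, k) * P_LETTER**k * (1 - P_LETTER) ** (N_CHARS - k) * 0.5**k
        for k in range(N_CHARS + 1)
    )


print(expected_check_bits())        # 15.0
print(f"{pass_probability():.4%}")  # 0.0247%
print(f"{(13 / 16) ** 40:.4%}")     # closed form, same value
```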

Examples:

  • 0xCd2a3d9f938e13Cd947eC05ABC7fe734df8DD826 (the "cow" address)
  • 0x9Ca0e998dF92c5351cEcbBb6Dba82Ac2266f7e0C
  • 0xcB16D0E54450Cdd2368476E762B09D147972b637
@chfast
Member

chfast commented Jan 14, 2016

This is a very nice idea.

I wonder about the 0x prefix. Is it mandatory or preferred?

@vbuterin
Contributor Author

Hmm, I'm fine either way, though I definitely see the rationale for standardizing one way or the other.

@Souptacular
Contributor

I saw comments on the TurboEthereum guide that suggested that we were moving away from raw hex keys into ICAP keys:

ICAP: XE472EVKU3CGMJF2YQ0J9RO1Y90BC0LDFZ
Raw hex: 0092e965928626f8880629cec353d3fd7ca5974f

"Notice the last two lines there. One is the ICAP address, the other is the raw hexadecimal address. The latter is an older representation of address that you'll sometimes see and is being phased out in favour of the shorter ICAP address which also includes a checksum to avoid problems with mistyping. All normal (aka direct) ICAP addresses begin with XE so you should be able to recognise them easily."

My concern is that if there was a previous decision to start moving to ICAP, I'm not sure if this will add confusion. However, if this helps give raw hex addresses a checksum I guess that can only be beneficial, even if everyone wants to move to ICAP eventually.

@tgerring

My preference is that a checksum-enabled Ethereum address is immediately recognizable as such.

The proposed solution is not immediately recognizable as being distinct from a standard Ethereum address and could be confused for a strangely-cased version of a non-checksummed address. Although it offers superior backwards compatibility, I believe it will only cause additional confusion for the end user.

Since the change in format serves to make the address less error prone through checksums, I posit they should also be immediately recognizable through a fixed prefix or otherwise obvious identifier. One reason why I prefer ICAP over this proposed solution is that it signals to the user clearly that this is an Ethereum address and cannot be confused with a transaction/block hash.

@alexvandesande

Just saw this proposal now.

I disagree with @tgerring that it will cause confusion: to a layman, it will be indistinguishable from a normal address. This approach is very easy to implement on the client side and doesn't require much. I would say this could be adopted as a great intermediary before ICAP, and it would also be a good alternative if ICAPs don't catch on.

@alexvandesande

I did a rudimentary implementation in JavaScript in the web3 object:

var isAddress = function (address) {
    if (!/^(0x)?[0-9a-f]{40}$/i.test(address)) {
        // check if it has the basic requirements of an address
        return false;
    } else if (/^(0x)?[0-9a-f]{40}$/.test(address) || /^(0x)?[0-9A-F]{40}$/.test(address)) {
        // If it's all lowercase or all uppercase, return true
        return true;
    } else {
        // Otherwise check each case
        address = address.replace('0x','');

        // creates the case map using the binary form of the hash of the address
        var caseMap = parseInt(web3.sha3('0x'+address.toLowerCase()),16).toString(2).substring(0, 40);

        for (var i = 0; i < 40; i++ ) { 
            // the nth letter should be uppercase if the nth digit of casemap is 1
            if ((caseMap[i] == '1' && address[i].toUpperCase() != address[i])|| (caseMap[i] == '0' && address[i].toLowerCase() != address[i])) {
                return false;
            }
        }
        return true;
    }
};


/**
 * Makes a checksum address
 *
 * @method toChecksumAddress
 * @param {String} address the given HEX address
 * @return {String}
*/
var toChecksumAddress = function (address) {

    var checksumAddress = '0x';
    address = address.toLowerCase().replace('0x','');

    // creates the case map using the binary form of the hash of the address
    var caseMap = parseInt(web3.sha3('0x'+address),16).toString(2).substring(0, 40);

    for (var i = 0; i < address.length; i++ ) {  
        if (caseMap[i] == '1') {
          checksumAddress += address[i].toUpperCase();
        } else {
            checksumAddress += address[i];
        }
    }

    console.log('create: ', address, caseMap, checksumAddress)
    return checksumAddress;
};

It works internally and it's almost invisible to the user. I don't really see a good reason not to implement it.
My results don't match yours, @vbuterin; it might be interesting to figure out why. Here are my results omitting the '0x' before hashing the address:

  • 0xCD2a3d9F938e13cd947Ec05AbC7fE734dF8dd826
  • 0x9CA0E998df92C5351CeCBBb6DBa82Ac2266F7E0c
  • 0xCb16d0e54450cDd2368476E762b09D147972B637

And here the results including 0x on the hash of the address:

  • 0xCd2A3D9f938e13CD947ec05abC7Fe734dF8dd826
  • 0x9Ca0e998DF92c5351CECbBb6DbA82ac2266F7e0C
  • 0xCB16D0e54450cDD2368476e762b09d147972B637
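When porting the scheme, the case map has to keep every bit in its position. Converting the hash to a number and then to a binary string, as `parseInt(...).toString(2)` does in the code above, drops leading zero bits (and in JavaScript a 256-bit value also loses precision when stored in a double), so the map can shift. This may be one source of mismatches, alongside the hex-versus-raw-bytes input question raised later in the thread. A small Python illustration with a hand-picked digest; both function names are hypothetical:

```python
def case_map(digest_hex: str, n_bits: int = 40) -> str:
    """First n_bits of a 256-bit digest as a '0'/'1' string, left-padded with zeros."""
    return format(int(digest_hex, 16), "0256b")[:n_bits]


def naive_case_map(digest_hex: str, n_bits: int = 40) -> str:
    """Number-to-binary conversion without padding: leading zero bits are lost."""
    return bin(int(digest_hex, 16))[2:][:n_bits]


digest = "0f" + "ff" * 31  # hand-picked digest whose first four bits are 0

print(case_map(digest)[:8])        # 00001111
print(naive_case_map(digest)[:8])  # 11111111 (positions shifted by four)
```

Python integers are arbitrary-precision, so only the leading-zero problem shows up here; the double-precision issue is specific to JavaScript numbers.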

@frozeman frozeman added the ERC label Feb 19, 2016
@alexvandesande

Just pushed an experimental branch to web3.js and the wallet.

I would love feedback from anyone on those.

@vbuterin
Contributor Author

web3.sha3('0x'+address)

You're hashing the hex and not the binary.

@alexvandesande

Good catch, I switched to the sha3 of the binary but the results still won't match. I'm a bit confused about what you meant by # Takes a 20-byte binary address as input. Ethereum addresses are 160 bits.

For example:

  • address: 0xcd2a3d9f938e13cd947ec05abc7fe734df8dd826 (why cow? What joke did I miss?)
  • To binary: 1100110100101010001111011001111110010011100011100001001111001101100101000111111011000000010110101011110001111111111001110011010011011111100011011101100000100110
  • First 40 binary digits of the sha3 of the binary: 1110011101011010000000000110010011011010
  • That means it should start with three uppercase letters, followed by 2 lowercases, followed by 3 upper etc: 0xCD2a3D9F938E13Cd947ec05abC7fe734DF8DD826 not 0xCd2a3d9f938e13Cd947eC05ABC7fe734df8DD826 in your example.

I suppose I am misunderstanding what you are using as input.

PS: you can probably simplify your example by not checking for letters: you can do uppercase conversions on numbers, and although there are such things as lowercase digits, they are represented the same.

@vbuterin
Contributor Author

>>> from ethereum import utils
>>> base_addr = utils.privtoaddr(utils.sha3('cow'))
>>> base_addr
'\xcd*=\x9f\x93\x8e\x13\xcd\x94~\xc0Z\xbc\x7f\xe74\xdf\x8d\xd8&'
>>> utils.sha3(base_addr)
'\xa2\x86)\xe4\x18A\xcc^p(\x99"z\x10\xd8\xfd}\xeb\xed\x9c\xe8\x7fG\xa9]\xcc;\xed\xd9\xa8\xa4\xef'

By "binary" I meant "just the raw bytes, not any kind of encoded representation". There's also the special chars ¹²³⁴⁵⁶⁷⁸⁹⁰ I suppose, but that's not backwards-compatible anymore.
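The two readings of "address" as hash input are easy to tell apart in Python 3: the raw form is 20 bytes, while the hex text is 40 ASCII bytes, so any hash function will generally give different results for them. (The specification linked in the editor's note at the top ultimately settled on hashing the lowercase hex text.) The hash itself is omitted from this short illustration:

```python
hex_addr = "cd2a3d9f938e13cd947ec05abc7fe734df8dd826"  # the "cow" address

raw_bytes = bytes.fromhex(hex_addr)   # 20 raw bytes, as in the REPL session above
hex_text = hex_addr.encode("ascii")   # 40 bytes of ASCII hex text

print(len(raw_bytes), len(hex_text))  # 20 40
print(raw_bytes[:3])                  # b'\xcd*='
```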

@pipermerriam
Member

I initially like this quite a bit. All of the cons that I see are extreme edge cases and I think that it's pretty trivial for library authors to handle gracefully. I like the backwards compatibility, the compatibility with existing hex parsing utilities.

@alexvandesande

base_addr '\xcd*=\x9f\x93\x8e\x13\xcd\x94~\xc0Z\xbc\x7f\xe74\xdf\x8d\xd8&'

I'm not sure if the web3.js converts to bytes. Also, pure JavaScript only supports binary conversion up to a hard limit; for anything larger I had to use the BigNumber library. Wouldn't it be simpler just to use sha3(address)?

@vbuterin
Contributor Author

Wouldn't it be simpler just to use sha3(address)?

Mathematically speaking it would be a bit ugly imo.

I'm not sure if the web3.js converts to bytes

Yeah, I had this problem; for one of my example gambling dapps where I was using a hash-commit-reveal protocol I took an existing sha3 impl; you could do the same: https://github.com/ethereum/dapp-bin/blob/master/serpent_gamble/scripts/sha3.min.js

@simenfd

simenfd commented Feb 19, 2016

I see some problems with ICAP's variable length and low checksum bit size:

"XE7338O073KYGTWWZN0F2WZ0R8PX5ZPPZS": This is a 30-character address, IBAN-compatible, based on the "Direct approach" from https://github.com/ethereum/wiki/wiki/ICAP:-Inter-exchange-Client-Address-Protocol

Now, if you enter such an address and accidentally add another character somewhere, you have created a "Basic" address (not IBAN-compatible, but allowed and valid in the Ethereum ICAP implementation). The problem is that, naively, without knowing all the properties of the checksum algorithm, there is a 1% chance this will pass validation, and consequently you are sending money into a black hole.
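ICAP's XE codes inherit the IBAN check: two check digits verified with ISO 7064 MOD 97-10. With only 97 possible remainders, an arbitrarily corrupted string of the right shape passes with a probability of roughly 1/97, which is where a figure of about 1% comes from. A minimal Python sketch of that check; `iban_checksum_ok` is a hypothetical name, and the test string is the standard IBAN example rather than an Ethereum address:

```python
def iban_checksum_ok(code: str) -> bool:
    """ISO 7064 MOD 97-10 check as used by IBAN (and therefore by ICAP's XE codes)."""
    code = code.replace(" ", "").upper()
    rearranged = code[4:] + code[:4]  # move country code and check digits to the end
    # Letters become two-digit numbers (A=10, ..., Z=35); digits stay as they are
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


print(iban_checksum_ok("GB82 WEST 1234 5698 7654 32"))  # True
print(iban_checksum_ok("GB82 WEST 1234 5698 7654 33"))  # False (one digit changed)
```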

On the topic of checksums in hex addresses:
0xCd2a3d9f938e13Cd947eC05ABC7fe734df8DD826

I agree that there should be some easy identification mechanism to separate it from an unchecked address. Alternatives might include:
XxCd2a3d9f938e13Cd947eC05ABC7fe734df8DD826
ExCd2a3d9f938e13Cd947eC05ABC7fe734df8DD826
#Cd2a3d9f938e13Cd947eC05ABC7fe734df8DD826

This makes it not completely backwards compatible, but incredibly easy to edit to satisfy a legacy system without any checksums.

@alexvandesande

@simenfd there should be some easy identification mechanism to separate it from an unchecked address.

I disagree. I think the whole point of this scheme is that it's completely backwards compatible. There's no point in separating them. In my implementation, if the address is all caps or all lowercase, it is assumed to be an unchecksummed address. In a 40-char address there will be on average 15 letters, and the chance of all of them being the same case is 1:16384, so I guess it's strong enough.
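The 1:16384 figure assumes exactly 15 letters: "all lowercase" and "all uppercase" both count, so the chance is 2 · 2^-15 = 2^-14. Under the additional assumption, not made in the comment, that each of the 40 hex characters is independently a letter with probability 6/16, the sketch below also averages over the number of letters. Addresses with few letters dominate that average, so it comes out at roughly 1 in 2,000 rather than 1 in 16,384.

```python
from math import comb

N, P = 40, 6 / 16  # address length in hex characters; chance a character is a letter

# Estimate quoted above: exactly 15 letters, either all lowercase or all uppercase
fixed = 2 * 0.5**15

# Averaged over the number of letters k (with k = 0 the address is single-case anyway)
averaged = sum(
    comb(N, k) * P**k * (1 - P) ** (N - k) * (1.0 if k == 0 else 2 * 0.5**k)
    for k in range(N + 1)
)

print(round(1 / fixed))                 # 16384
print(f"about 1 in {1 / averaged:.0f}")  # roughly 1 in 2,000
```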

@pipermerriam
Member

the chances of all of them being the same case is 1:16384

That was exactly my line of thinking as well. It's safe enough to assume that all caps or all lower addresses are not checksummed.

@coder5876

The backwards compatibility is nice but IMO presents a clear danger: If the user believes that the address has a checksum she might be willing to input an address by hand. If she then happens to use an old version of transaction handling that just parses the hex ignoring the case then her funds are lost in the case of a typo.

For this reason my feeling is that I prefer a scheme that would make a normal hex parser throw an error, rather than a user thinking she's protected by a checksum when in fact she is not.

@alexvandesande

@christianlundkvist that's a good point, which can be solved with UI: show red when it fails, show yellow when it's not checksummed.
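The red/yellow suggestion maps onto a three-way check. This sketch mirrors the logic of the isAddress function earlier in the thread (single-case input is treated as unchecksummed) but returns a status instead of a boolean. The encoder is passed in as a parameter because the hash computation is out of scope here; `AddressStatus`, `classify`, and the `to_checksum` callable are hypothetical names.

```python
import re
from enum import Enum
from typing import Callable

_HEX_ADDRESS = re.compile(r"(0x)?[0-9a-fA-F]{40}")


class AddressStatus(Enum):
    INVALID = "invalid"                  # e.g. shown in red
    NOT_CHECKSUMMED = "not checksummed"  # e.g. shown in yellow
    CHECKSUMMED = "checksummed"          # mixed case that matches the encoder


def classify(address: str, to_checksum: Callable[[str], str]) -> AddressStatus:
    """`to_checksum` maps 40 lowercase hex digits to a '0x'-prefixed checksummed
    address; it stands in for a real encoder (assumption)."""
    if not _HEX_ADDRESS.fullmatch(address):
        return AddressStatus.INVALID
    body = address[2:] if address.startswith("0x") else address
    if body in (body.lower(), body.upper()):
        # Single-case input carries no checksum information
        return AddressStatus.NOT_CHECKSUMMED
    if to_checksum(body.lower())[2:] == body:
        return AddressStatus.CHECKSUMMED
    return AddressStatus.INVALID
```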

@simenfd

simenfd commented Feb 19, 2016

@christianlundkvist Exactly my point: false security might be more dangerous than no security. E.g. when I enter a bitcoin address by hand (yeah, quite rarely), I am quite confident that the system will catch an error with the 32-bit checksum that is universally implemented there; I wish I could have the same confidence in Ethereum.

For fun, I tried to make some ICAP addresses using the functions in the go-ethereum implementation. The first line shows the original address and its ICAP encoding; the remaining lines are ICAP mutations that validate but are, of course, different addresses.
0x11c5496aee77c1ba1f0854206a26dda82a81d6d8 == XE1222Q908LN1QBBU6XUQSO1OHWJIOS46OO

XE1222Q908LN1QBBU6XUQSO1OHWJIOS4603
XE1222Q908LN1QBBU6XUQSO1OHWJIOS4700
XE1222Q908LN1QBBU6XUQSO1OHWJIOS48IO
XE1222Q908LN1QBBU6XUQSO1OHWJIOS49FO
XE1222Q908LN1QBBU6XUQSO1OHWJIOS5AO5

@coder5876

@alexvandesande: My main point was that backwards compatibility allows you to use the address in a dapp that was created before this EIP. So the UI in this case wouldn't know anything about checksummed addresses and wouldn't give the user any specific warning. If the user receives an address like 0xCd2a3d9f938e13Cd947eC05ABC7fe734df8DD826 they would think "sweet, it's checksummed!" and type it by hand into an app which hasn't been updated, and lose Ether when they make a typo.

@pipermerriam
Member

I'd like to challenge the idea that we should pay much attention to the "type it in by hand" use cases. If the ecosystem matures then we'll have good tooling around QR-code based transmission of addresses or something else that's even better UX.

If the user receives an address like 0xCd2a3d9f938e13Cd947eC05ABC7fe734df8DD826 they would think "sweet, it's checksummed!" and type it by hand into an app which hasn't been updated, and lose Ether

The only way to avoid this situation is to have checksummed addresses be backwards incompatible. I'm of the opinion that backwards incompatibility is worse than cases where someone burns ether using an app that doesn't implement checksumming using an address that "looks" like it's checksummed. I think this situation is likely to be rare and to largely apply to using old software from before the checksum days, or poorly written software.

@coder5876

@pipermerriam:

I'd like to challenge the idea that we should pay much attention to the "type it in by hand" use cases.

In that case do you think we should not worry about checksumming at all? Are there other scenarios where checksums are used?

The only way to avoid this situation is to have checksummed addresses be backwards incompatible.

I feel like this would be preferred.

I think this situation is likely to be rare and to largely apply to using old software from before the checksum days, or poorly written software.

My view is that the moment the checksum is introduced a majority of software becomes old software, and people are notoriously slow at updating too...

@pipermerriam
Member

In that case do you think we should not worry about checksumming at all? Are there other scenarios where checksums are used?

My point was that I believe the type-by-hand use case is a small corner case where the user is potentially already doing something questionable. We can still apply checksums to these, but I am of the opinion that we don't need to cater to this use case.

As for the other stuff, I don't have very strong opinions on the matter. Backwards compatibility seems nice but I see the validity in the idea that a breaking change is also a way to achieve a level of security in the area since it removes ambiguity.

@alexvandesande

I don't believe we can expect any users to realize the difference between a checksummed address and a normal one (most people don't realize this even for bank accounts when the last digit is separated, like 12345-7); this is not the point of the checksum.

The point of backwards compatibility is that transactions between checksum-enabled wallets are safer. If you make a typo in a wallet without checksum support you'll lose your ether, just like you do now, and it's that particular wallet developer's job to make that client more secure.

Also, I don't think copying by hand is the main situation here; if we were trying to optimize for that, we should be talking about pseudo-word seeds and name registries. Checksums are just an extra safeguard against accidental typos and letters that were dropped when copying, and an extra assurance to the user that the address is still intact, just like the icon is.

I don't really see any disadvantage in adding these, as they were very simple to implement in web3.js.

I still haven't matched the initial implementation, though, probably because basic primitives in Python are very different from what JavaScript offers. Since a lot of implementations will be JavaScript, I still think it makes more sense to use the sha of the hex, since that's the form in which the address reaches the library.

@coder5876

I don't really feel very strongly either way TBH and the design of this particular checksum scheme is actually super cool. 😊
Thinking about my own interactions it's the need to always tell people to NEVER EVER type in an address by hand that gets annoying. But you are right @alexvandesande that as long as I update my own tools to use checksums I don't have to give people this advice anymore when advising them on using the tools that I build. 😊

@ethernomad

Any reason we don't use good old base 58?

@alexvandesande

Jonathan: This would break backwards compatibility. We already have a proposed standard without backwards compatibility that adopts more characters; it's called IBAN.

@taoteh1221

Just chiming in as a web2 dev who is mostly an observer (of your work and of end users' discussions): if you look at the Ethereum subreddit these days, there are a ton of new adopters with no tech experience at all trying to find out how to use Ethereum. In short, I believe anything, including typing addresses by hand, should be expected. I remember seeing pinned tweets in 2014 with images (not text) of dogecoin addresses for charities etc. A lot of adopters may barely know their way around a computer at all, and I think if you manage to retain them you are a raging success and have what is needed for mass adoption.

@alexvandesande

Agree. And adding a case-sensitive checksum increases security for those cases, while being invisible to implementations that don't support it.

@vaib999

vaib999 commented Jun 27, 2017

I am curious: is there a Java implementation of this?

@cdetrio
Member

cdetrio commented Jun 27, 2017

@almindor

You'll find the correct specification and example implementations at the file here: https://github.com/ethereum/EIPs/blob/master/EIPS/eip-55.md. The file also includes an adoption table to help track the adoption of EIP-55 checksums in the ecosystem.

We're going to close this issue now. If any corrections need to be made (or to update the adoption table), please open a PR on the file.

@prusnak

prusnak commented Jul 11, 2017

You should edit the example code and test vectors in the first post. It is wrong and someone who does not read the whole conversation will use the incorrect implementation.

@cdetrio
Member

cdetrio commented Aug 24, 2017

This EIP is now located at https://github.com/ethereum/EIPs/blob/master/EIPS/eip-55.md. Please go there for the correct specification. The text in this issue may be incorrect or outdated, and is not maintained.

@axic
Member

axic commented Nov 16, 2017

@cdetrio can you push the "official test suite" into the EIP?

I believe it is this one: #55 (comment)

@adyliu

adyliu commented Aug 3, 2018

Java checker of ethereum address
https://gist.github.com/adyliu/6c5ff4d41aa0177da55f4b8b1703f54a

@voron

voron commented Aug 6, 2018

Current python3 eth-utils implementation

python3 -c "from eth_utils import address; import sys; print(address.to_checksum_address(sys.argv[1]));" 0x5aaeb6053f3e94c9b9a09f33669435e7ef1beaed

Output is

0x5aAeb6053F3E94C9b9A09f33669435E7Ef1BeAed
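The same output can be reproduced with the Python standard library alone. This is a minimal sketch, not a reference implementation. It follows the algorithm as finalized in EIP-55, which differs from the opening post: it hashes the lowercase hex text rather than the raw bytes, and uppercases a letter when the hash's hex digit at the same position is 8 or higher (one hash nibble per character). Python's `hashlib.sha3_256` implements NIST SHA-3, whose padding differs from the original Keccak-256 that Ethereum uses, so the sketch includes a small pure-Python Keccak-256. It is slow and not hardened; a maintained library such as eth-utils is the safer choice in practice.

```python
# Keccak-f[1600] round constants (iota step)
_RC = [
    0x0000000000000001, 0x0000000000008082, 0x800000000000808A, 0x8000000080008000,
    0x000000000000808B, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008A, 0x0000000000000088, 0x0000000080008009, 0x000000008000000A,
    0x000000008000808B, 0x800000000000008B, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800A, 0x800000008000000A,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
]
# Rotation offsets (rho step), indexed by lane x + 5 * y
_ROT = [0, 1, 62, 28, 27, 36, 44, 6, 55, 20, 3, 10, 43, 25, 39,
        41, 45, 15, 21, 8, 18, 2, 61, 56, 14]
_MASK = (1 << 64) - 1


def _rotl(v: int, n: int) -> int:
    """Rotate a 64-bit lane left by n bits."""
    return ((v << n) | (v >> (64 - n))) & _MASK


def _keccak_f(a: list) -> list:
    """Keccak-f[1600] permutation on 25 lanes, lane (x, y) stored at a[x + 5 * y]."""
    for rc in _RC:
        # theta: XOR each lane with the parities of two neighbouring columns
        c = [a[x] ^ a[x + 5] ^ a[x + 10] ^ a[x + 15] ^ a[x + 20] for x in range(5)]
        d = [c[(x - 1) % 5] ^ _rotl(c[(x + 1) % 5], 1) for x in range(5)]
        a = [a[i] ^ d[i % 5] for i in range(25)]
        # rho and pi: rotate each lane, then move (x, y) to (y, 2x + 3y)
        b = [0] * 25
        for x in range(5):
            for y in range(5):
                b[y + 5 * ((2 * x + 3 * y) % 5)] = _rotl(a[x + 5 * y], _ROT[x + 5 * y])
        # chi: non-linear mixing along each row
        a = [b[i] ^ (~b[(i + 1) % 5 + 5 * (i // 5)] & b[(i + 2) % 5 + 5 * (i // 5)])
             for i in range(25)]
        # iota: XOR the round constant into lane (0, 0)
        a[0] ^= rc
    return a


def keccak256(data: bytes) -> bytes:
    """Original Keccak-256 (pad byte 0x01) as used by Ethereum, not NIST SHA3-256 (0x06)."""
    rate = 136  # bytes absorbed per permutation for a 256-bit output
    padded = bytearray(data) + b"\x01"
    padded += b"\x00" * (-len(padded) % rate)
    padded[-1] |= 0x80
    state = [0] * 25
    for off in range(0, len(padded), rate):
        for i in range(rate // 8):
            state[i] ^= int.from_bytes(padded[off + 8 * i:off + 8 * i + 8], "little")
        state = _keccak_f(state)
    return b"".join(lane.to_bytes(8, "little") for lane in state[:4])


def to_checksum_address(address: str) -> str:
    """EIP-55 casing: hash the lowercase hex text; uppercase letters where the nibble is >= 8."""
    hex_addr = address.lower()
    if hex_addr.startswith("0x"):
        hex_addr = hex_addr[2:]
    if len(hex_addr) != 40 or any(ch not in "0123456789abcdef" for ch in hex_addr):
        raise ValueError("expected 40 hex digits, optionally prefixed with 0x")
    digest = keccak256(hex_addr.encode("ascii")).hex()
    return "0x" + "".join(
        ch.upper() if int(digest[i], 16) >= 8 else ch for i, ch in enumerate(hex_addr)
    )


print(to_checksum_address("0x5aaeb6053f3e94c9b9a09f33669435e7ef1beaed"))
# 0x5aAeb6053F3E94C9b9A09f33669435E7Ef1BeAed
```

Because the input is lowercased first, applying the function to an already checksummed address returns it unchanged, which gives a simple validity test: a mixed-case address is correct exactly when it equals its own re-encoding.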

@Th1983

Th1983 commented Aug 12, 2022

Thanks

RaphaelHardFork pushed a commit to RaphaelHardFork/EIPs that referenced this issue Jan 30, 2024
just-a-node pushed a commit to connext/EIPs that referenced this issue Feb 17, 2024
godking29 pushed a commit to godking29/ethereumjs-util that referenced this issue Mar 17, 2024