Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define stable API for StateManager #268

Closed
axic opened this issue Feb 3, 2018 · 20 comments
Closed

Define stable API for StateManager #268

axic opened this issue Feb 3, 2018 · 20 comments

Comments

@axic
Copy link
Member

axic commented Feb 3, 2018

Follow up of #33. Please read that thread properly first.

@kumavis since you had some suggestions, would you have a plan for a nice API?

@holgerd77
Copy link
Member

Note: this should probably include making access to the cache (stateManager.cache) private and let this be handled by the stateManager.

@mattdean-digicatapult
Copy link
Contributor

In an ideal world I think the trie and blockchain properties should also be considered private for the same reasons as cache. Any direct accesses should be handled by stateManager

@holgerd77
Copy link
Member

Let's push forward towards an ideal world. 😁

Yes, that really makes sense. The StateManager <-> Rest-of-the-VM relationship is really something where there still needs some work to be done. I generally find this whole API/usage confusing at times.

@holgerd77
Copy link
Member

Not sure if it's just/mainly the lack of code documentation though.

@mattdean-digicatapult
Copy link
Contributor

mattdean-digicatapult commented Jul 3, 2018

A little while ago I did some investigation to work out what we might change to make this interface more compact, easier to implement and cleaner in intention. I'll apologise in advance for the size of this post but there's quite a lot to go through. I've got a functioning implementation of most of these changes at https://github.com/mattdean-digicatapult/ethereumjs-vm/tree/stateManagerCleanup-hacks but this is obviously too large to PR. I can rebase this into separate changes as desired, but I thought this could start the discussion. Here we go...

Firstly, I believe stateManager should:

  • encapsulate all interactions of the evm with the storage trie. This should include all reads and all writes
  • not encapsulate interactions with the additional blockchain state; notably the block headers and the information contained within them
  • not expose unnecessary details about how it is implemented. Caching functionality should be completely hidden for example. The underlying data store of the storage trie should also be hidden
  • expose all methods asynchronously, taking a callback function (ideally also returning a Promise to give us some future flexibility)

Next, are some things that I've noticed are in the current implementation that we might fix to clean things up. Most of these can be done in isolation of one-another.

  1. There are currently two different sets of commit/revert methods in stateManager. This seems to have been done to abstract away the solution to the consensus issues encountered following EIP-161. This unfortunately exposes the need for all future stateManagers to implement this logic. It would perhaps be more sensible to localise this in the evm as has been done in cpp-ethereum. This would remove two methods from stateManager but would add localised complexity to runCall().

  2. There are currently three different definitions of "empty-ness" defined for an account. The first is defined in ethereumjs-account, the second in the method stateManager.exists() and the third by the method stateManager.accountIsEmpty(). They may have had valid distinctions previously, but I believe they are now not necessary. I would suggest consolidating them to a single implementation, probably keeping stateManager.accountIsEmpty() but it's implementation simply delegating to an updated definition in ethereumjs-account.

  3. We should remove the method stateManager.warmCache() and all external usages of stateManager.cache.flush(). This could be done by performing additional checkpoints on all external vm methods. When the checkpoint stack height collapses to zero we can flush. Alternatively we could define extra methods on stateManager to enter and exit a block. These would still have to be called when external vm methods are called but would not introduce superfluous checkpoints in the trie. We could also then get rid of the method cleanupTouchedAccounts (not done yet) as this could be done in the exit call reliably. It might however be less clear what is going on in that case.

  4. I'm still uncertain as to the relationship between stateManager and ethereumjs-account. Balance changes, for example, can be performed by both by using a stateManager method and by mutating the account and then using stateManager.putAccount(). The same goes for the account storage root or the code hash but these would be invalid without actually storing the associated value. One option is to say that code and storage should be changed with methods in stateManager whilst the nonce and balance should be updated using putAccount. Another option is to replicate all account reads/writes in stateManager but this feels like we're rewriting a lot of ethereumjs-account. My preference instead is to get rid of stateManager.getAccountBalance() and stateManager.putAccountBalance() updating relevant usages. We should also add a check stateManager.putAccount() to check that the codeHash and state root haven't been externally manipulated (not done yet).

  5. I don't think getBlockHash should be on stateManager. It seems odd that only this block property is present on stateManager. This would also mean the ctor doesn't need to take a blockchain which simplifies the interface.

  6. dumpStorage should not be part of the external interface as it is superfluous to internal evm usages. It can obviously be kept as part of the internal implementation.

  7. I'm uncertain how I feel about having the methods, hasGenesisState, generateCanonicalGenesis and generateGenesis on stateManager. The functionality of these could be, I think, performed externally without these methods. In their current state they also will lead to all other implementations implementing exactly the same logic. For now they could remain in stateManager but not be considered part of the interface.

If we did all of this the documented interface would become:

  • copy
  • getAccount
  • putAccount
  • putContractCode
  • getContractCode
  • getContractStorage
  • putContractStorage
  • clearContractStorage
  • checkpoint
  • commit
  • revert
  • getStateRoot
  • accountIsEmpty
  • cleanupTouchedAccounts

All this is only my thoughts from playing around with the code. Hopefully some discussion can now help us work to some consensus as to what this interface should look like.

@holgerd77
Copy link
Member

holgerd77 commented Jul 3, 2018

Hi Matthew, cool that you do such a serious take on this, looking forward to read this!

@holgerd77
Copy link
Member

Hi Matthew, sorry this got a bit delayed, the day I promised to do the reading I had two 2-line PRs and I didn't expect that to take 3 hours before I started on this. 😄

Will keep on with this though.

@holgerd77
Copy link
Member

@mattdean-digicatapult Hi Matthew, ok, I am slowly approaching this. Generally I think @cdetrio should join this discussion, since he has already done some StateManager refactoring work and you two might want to coordinate to bring things together.

Some first statements on your StateManger scope description:

  • encapsulate all interactions of the evm with the storage trie. This should include all reads and all writes

Yes.

  • not encapsulate interactions with the additional blockchain state; notably the block headers and the information contained within them

Yes, that also makes very much sense.

  • not expose unnecessary details about how it is implemented. Caching functionality should be completely hidden for example. The underlying data store of the storage trie should also be hidden

Yes, I'm no complete StateManager expert, but that was also one of my first thoughts when I had a first look. Ideally caching solution probably also should be swappable but that's maybe something for a second round.

  • expose all methods asynchronously, taking a callback function (ideally also returning a Promise to give us some future flexibility)

Not completely sure or having a strong opinion on this one.

@holgerd77
Copy link
Member

@mattdean-digicatapult Maybe already do a draft PR with a [DO NOT MERGE] note to the repo, then it might become easier to discuss your implementation suggestions.

@holgerd77
Copy link
Member

@mattdean-digicatapult I've allowed myself to add numberings to the points you have in your original post for easier reference, hope that was ok.

@holgerd77
Copy link
Member

holgerd77 commented Jul 12, 2018

Pre-note: will give answers to all of the listed points within this comment and save in between. Please don't read in e-mail but head over to GitHub to have the full list.

  1. Commit/Revert: Still a bit undecided on this but the reasoning seems valid and I would probably go with this.

  2. Emptiness: Yes, I would agree that this is the responsibility for ethereumjs-account.

  3. warm/flush: Not understanding deeply enough on how this is working internally, have to look at this closer at some point.

  4. ethereumjs-account: Yes, I think this generally needs some reflection. Wonder if trie actually has anything to do in ethereumjs-account at all. In the most clean version of this I would tend to say "no".

  5. blockchain: Yes yes yes.

  6. dumpStorage: Yes.

  7. genesis: Yes, I think suggestion with keeping / not interface makes sense.

I would also feel 80%+ comfortable with the proposed documented interface.

@holgerd77
Copy link
Member

Hi @mattdean-digicatapult, ok, did a first review of your proposed changes and had some (very first) look into your commits. I think this is generally very much going into the right direction, and from my side I would strongly encourage you to keep on with this work. Will nevertheless be good to have a further opinions on this, hopefully @cdetrio can join the discussion.

This is also generally a good time for actually implementing these changes, since atm no other restructuring changes are on the way, so there can be a solid focus on this without a lot of risk with interference with other major refactoring works.

@mattdean-digicatapult
Copy link
Contributor

Hi @holgerd77, thanks for having a look. I'll do a [DO NOT MERGE] PR to make reading this easier. There are a few things missing (like I've completely commented out the cache unit tests 😜 ) but it's a start. Thanks for numbering the points it makes it much easier to talk about.

Broadly speaking it seems like we're on the same page so that's good.

Regarding point 3, I'm really keen to hide the caching functionality so getting rid of the warm/flush cache functions is important. The exact implementation is what's in question I think.

When I was suggesting all methods should be asynchronous, the real benefit is that it allows other stateManager implementations to be asynchronous in their methods. If the state is stored in an external ethereum node it is an absolute pain if any of these methods have to be forced to be synchronous. The idea to also return a Promise is really a punt on my part as I believe the new Promise api combined with async/await is a huge readability benefit for js code. I don't know how others feel, but my opinion is that the internal code of this library would be much easier to reason about if we migrated to promises. We can't do this cleanly unless we inforce that externalized interface methods that are asynchronous also return promises, hence the suggestion.

@holgerd77
Copy link
Member

Hi @mattdean-digicatapult, thanks for the quick response. The topic of promises came up several times lately and I think the transition to a promise-based interface has become common sense in the team to be desirable (please contradict me everyone who is reading this and is of different opinion).

@holgerd77
Copy link
Member

(we are generally behind throughout whole EthereumJS org with a modern JS code base transition).

@holgerd77
Copy link
Member

Hi @mattdean-digicatapult, I've lost a bit track, is there a last PR to do for the StateManager changes? Just scrolled through your initial post, there is this this question on the genesis methods somewhat open, and generally there needs to be some more API documentation on this, anything else?

I am thinking about how to release for some time now, generally I would like to do a somewhat bigger VM release within the next 1-2 weeks or so once we have the last Constantinople EIP (EXTCODEHASH) in place.

I initially thought to also have the StateManager changes ready for this and do a major v3.0.0 release. On a second thought it's maybe as valid to "just" do a v2.5.0 release, clearly point out the StateManager changes and mark this as "beta" for a future stable API. Then this is more of a two-step process and we give others the chance to give feedback and can a bit later do a release declaring a stable/finalized StateManager API.

Does this generally make sense? I would have some tendency to remove StateManager from the VM and have this as an own package, do you have a strong stance on this (this would then be the time for a v3.0.0 release 😄)?

Would it eventually make sense to integrate the Account library in this repo? Just an initial thought I had some days ago, not thought about this more deeply.

Cheers, hope you recovered (if necessary 😋) well from Devcon!
Holger

@mattdean-digicatapult
Copy link
Contributor

@holgerd77, so there is a bit more to do. I've been keeping #309 up to date as I've been doing all this so everything there needs to be merged still. There's also API documentation as well as you mention. I see no reason why all of that cannot be completed in the 1-2 weeks timeline that you've set out. Whether there is an intermediate release or not is up to you guys but it might be a little more conservative and a little safer to do so. I really don't mind either way.

Now, whether it's 1 or more PRs somewhat depends on how you feel about what's left. As far as I'm concerned there is:

  1. Remove getAccountBalanceand setAccountBalance methods from stateManager
  2. Remove dependency on stateManager.copy() from the VM. (this is new see below)
  3. Documentation (not yet done)

Regarding point 2, I recently also had to make a change to do with SSTORE calculations for Constantinople. The implementation in master leads to the copy method of stateManager becoming part of the exposed interface. This would be fine but the copy method is highly unintuitive to me and would be difficult to document. The best I've come up with is as follows:

copy: Creates a new instance of the current stateManager whose state represents the last state that was fully committed, i.e. the state as though all current changes in the checkpoint stack were reverted.

Incidently, my change here has also fixed one of the remaining Constantinople failing tests. What do you think; can I get away with doing the remaining changes as a single PR? Then I'll do documentation as the very last PR?

@holgerd77
Copy link
Member

Ah, didn't know #309 is kept up to date. Just had a look at the scope of changes, I think it should be fine to do this in a single PR, also your reasoning for the different changes sound valid, so please go for it! 😄

@mattdean-digicatapult
Copy link
Contributor

Awesome, I'm going to add unit tests for the new storageReader class and then I'll push it to a new branch in this org so we can see coverage information

@holgerd77
Copy link
Member

Will close in favor of #503.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants