Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Json ABI reflects presence of self parameter. #527

Closed
wants to merge 1 commit into from

Conversation

g-r-a-n-t
Copy link
Member

@g-r-a-n-t g-r-a-n-t commented Aug 31, 2021

What was wrong?

We currently allow for functions that are pure with regard to contract storage, but the limited scope of these functions is not reflected in the Json ABI.

How was it fixed?

To-Do

  • OPTIONAL: Update Spec if applicable
  • Add entry to the release notes (may forgo for trivial changes)
  • Clean up commit history

@g-r-a-n-t
Copy link
Member Author

What do you guys think about placing all builtin attributes under self? @cburgdorf @sbillig

e.g. self.tx.origin, self.msg.value, self.chain.id, ...

@cburgdorf
Copy link
Collaborator

I wouldn't want to take up so many identifiers on self but we could mitigate that by grouping them under something like context (e.g. self.context.chain.id, self.context.msg.value). That said, I'm not sure if that's the right approach. If the motivation is just to have it easier to tell if a function is pure then at the same time we are making it much harder to spot if a function interacts with storage because now functions require self to be added even if they don't actually interact with storage. So on a second though, I think I'm not a fan. I think it's a bigger benefit to be able to tell easily if a function interacts with storage.

@cburgdorf
Copy link
Collaborator

Btw, we should make sure to always set the stateMutabilityproperty for every function to be either pure, view, nonpayable or paybable because some tooling amuses it to always be present. E.g. the popular hardhat framework assumes that to be present and won't work without it.

@g-r-a-n-t
Copy link
Member Author

I actually like the self.context (or self.ctx) idea . It's kind of like accessing a super class.

As noted in our other discussion about const values, I don't think self should be tied strictly to storage. I think it more so represents an instance of the contract, which is a fairly broad definition. These contextual attributes seem, to me, to be part of the contracts instance, as the instance of the contract lives on some chain and is provoked by some tx with some msg data in a block. As you mentioned, we do not want to pollute the namespace, so we can place all of them under a single attribute like context or ctx.

@cburgdorf
Copy link
Collaborator

cburgdorf commented Sep 7, 2021

I can understand that reasoning but I'm trying to look through the perspective of code auditing. It's helpful to be able to tell in what kind of access category a function falls just by looking at its signature. I'm wondering if something like the following could work nicely:

  • If a function already takes self (because it accesses contract storage) then one can use self.ctx to get access to these contextual properties.
  • If it is only interested in the contextual properties it takes ctx instead.

Examples:

Takes self

pub fn read_storage_and_msg_value(self) -> u256:
    return self.my_balance + msg.ctx.value

Takes ctx

pub fn only_reads_msg_value(ctx) -> u256:
    return ctx.value

Takes self but shouldn't (produces compile error or warning)

pub fn should_not_take_self(self) -> u256:
    return self.ctx.value    # <-- doesn't need to take `self`, could take only `ctx` instead

I haven't thought too much about this, just thinking out loud here.

@sbillig
Copy link
Collaborator

sbillig commented Sep 7, 2021

What do you guys think about placing all builtin attributes under self?

Yeah, I think I'm on board with this.

Regarding the ctx parameter idea, eg pub fn foo(ctx, val: u256):

I've been thinking about this, and I'm not sure how to make it not feel weird. How is this function foo called? foo(10)? Where does the magical ctx arg come from? Can you call foo from a pure function that doesn't take ctx? This looks similar to a function taking self, but it lacks the reasoning that the self parameter has. You can only call a function that takes self via self. Eg self.bar(10), which is effectively sugar for bar(self, 10). Can we come up with something similar for a ctx arg?

I think we should consider what categories we'd like to divide functions into, and how/if we intend for the user to indicate the category explicitly, and how the compiler will detect/enforce it.

Is it useful to distinguish between functions that read from storage and those that read from other state? Solidity doesn't, for example. In either case, the function is impure.

Starting with solidity's categories (pure/view/mutating; ignoring payable vs non-payable for now).

From the solidity docs:

the following are considered reading from the state:

  • Reading from state variables.
  • Accessing address(this).balance or .balance.
  • Accessing any of the members of block, tx, msg (with the exception of msg.sig and msg.data).
  • Calling any function not marked pure.
  • Using inline assembly that contains certain opcodes.

With Grant's proposal, a function taking self is a clear indicator that the function can read from the state (assuming we also make retrieving address balance part of self/self.ctx). It's impossible to write a "view" function that doesn't take self.

The following statements are considered modifying the state:

  • Writing to state variables.
  • Emitting events.
  • Creating other contracts.
  • Using selfdestruct.
  • Sending Ether via calls.
  • Calling any function not marked view or pure.
  • Using low-level calls.
  • Using inline assembly that contains certain opcodes.

A function taking mut self indicates that it can write to storage, but not taking mut self doesn't mean you can't emit events, create other contracts, or send ether. Right now a seemingly pure function might do any of these things; the only way to tell is to read the code and all code of all the functions it calls, etc. Should we gate these powers behind mut self? For example:

pub fn emit_transfer_event(mut self, from: address, to: address, amount: u64):
  self.ctx.emit(Transfer(from, to, amount))

If not, how do we differentiate between functions that have the power to do these things and those that don't?

I have a vague feeling that adding the concept of "capabilities" might be a reasonable answer here, but I'm not yet sure what that would look like. The general idea is that a function can't perform an action unless its given the capability to do so by the caller (who must have that capability, of course). In practice this might look like the ctx arg idea; a function can take ctx if it needs the ability to read from state, but that ctx needs to be passed in explicitly. Perhaps a function that takes self or mut self have access to these capabilities via the self arg, and can pass them to functions they call, which can then pass them onward, etc. For example:

pub fn transfer(mut self, from: address, to: address, amount: u64):
  if check_msg_val(self.ctx):
    emit_transfer_event(self.ctx.log, from, to, amount)

fn check_msg_val(ctx):
  return ctx.msg.val > 1

fn emit_transfer_event(log, from, to, amount):
  log.emit(Transfer(from, to, amount))

🤷

@sbillig
Copy link
Collaborator

sbillig commented Sep 7, 2021

Been thinking about the ctx arg and capabilities idea some more. RFC:

contract Foo:
  # Externally, this is a view fn that takes one arg. `ctx` is magically provided.
  # If called internally, `ctx` must be passed in.
  pub fn check_msg_value(ctx, min: u256) -> bool:
    return ctx.msg.value > min

  pub fn foo(ctx) -> bool:
    return check_msg_value(ctx)

  # We could even allow a subset of ctx.
  # Externally, this is a view that takes one arg.
  # Internally, it takes two args.
  pub fn check_time(block, t: u256) -> bool:
    return block.timestamp > t

contract Bar:
  foo: Foo
  min_value: u256
  deadline: u256
  
  # maybe `self` shouldn't provide access to `ctx`;
  # one has to be explicit about what state is accessed by a fn
  pub fn is_ready(self, ctx) -> bool:
    # if we're calling a fe contract from fe, we could require that ctx etc be passed in explicitly
    # even though it won't be included in the calldata
    return self.foo.check_msg_value(ctx, self.min_value) && self.foo.check_time(ctx.block, self.deadline)

Following this line of thought, one could "tighten up" the definition of external interfaces, so they're safer to call from fe:

interface Erc20:
  fn balanceOf(self, owner: address) -> u256
  fn transfer(mut self, msg, log, to: address, value: u256) -> bool

contract TokenManager:
  tokens: Array<impl Erc20, 32>

  pub fn transfer(self, msg, log, to: address, value: u256) -> bool:
    let mut ok: bool = true
    for token in self.tokens:
      ok &&= token.transfer(msg, log, to, value)
    return ok

  pub fn get_token_value(self, msg, index: u256) -> u256:
    let val = self.tokens[index].balanceOf(msg.sender)

    # underhanded attempt at stealing tokens:
    self.tokens[index].transfer(0xfece5head, value) # error: Foo::transfer requires `msg` and `log` capabilities

@cburgdorf
Copy link
Collaborator

cburgdorf commented Sep 8, 2021

Thank you for giving so much thought to this idea. I hadn't thought that far ahead, but I think the suggestion has merit.

not taking mut self doesn't mean you can't emit events, create other contracts, or send ether. Right now a seemingly pure function might do any of these things

Mmh, yeah, if we are trying to make it obvious to derive the category in which a function falls from its signature then that's a problem. A function that we tag as pure in the ABI should not be able to log events or create other contracts. That's a compelling argument in favor of exploring ways to tighten the rules!

I'm a bit worried where taking all the individual capabilities as arguments will lead though. For instance, I had always thought that things like create / create2 would eventually end up behind an import such as from evm import create2 and that by doing so we would keep the language more future proof if for instance eventually the EVM gets replaced with an eWASM[1] VM and maybe create / create2 would be replaced with different mechanisms then.

But I guess in your proposal create/ create2 would also become capabilities that one would demand to be passed in as an argument (or be magically provided if the function was called from externally).

I'm wondering if instead of using these magical arguments we could use generics + traits instead. Notice that self would be a special generic type param that automatically applies to the self argument.

pub fn wants_access_to_msg<self: Msg>(self) -> u256:
  return self.msg.value # Only has access to `self.msg` because of the Msg trait
pub fn wants_access_to_transfer<self: Transfer>(mut self):
  # Only has access to `self.transfer` because of the Transfer trait.
  # Also the Transfer trait may require us to take `mut self` because maybe
  # changing balances should fall into the same category as mutating storage?
  self.transfer(0xdeadbeef, 10)
# These traits can be combined
pub fn wants_access_to_transfer_and_msg<self: Transfer, Msg>(mut self):

  self.transfer(0xdeadbeef, self.msg.value)

I think the nice thing here is that we reduce the "magic" a bit and these traits can later work together with the import system such as in the following example:

from evm import Create2

# Again the Create2 trait may demand that `self` is taking with `mut` because we may think creating a contract (which might create storage) falls into the same category as mutating contract storage.
pub fn wants_access_to_create2<self: Create2>(mut self):
  self.create2(...)

For my own clarity I try to define categories that are compatible with the existing rules of pure and view but go beyond them to provide more safety:

Category Characteristics Fe Syntax ABI
Pure Can only operate on input arguments and not produce any information besides its return value. Can not take self and therefore has no access to things that would make it impure foo(val: u256) pure
Read Contract Reading information from the contract instance (broad definition includes reading constants from contract code) foo(self) view
Context Reading Reading contextual information from the blockchain (msg, block etc) foo<self: Msg, Block>(self) view
Storage Writing Writing to contract storage (own or that of other contracts) foo(mut self) payable or nonpayable
Context Modifying Emitting logs, transferring ether, creating contracts foo<self: Log, Create2>(mut self) payable or nonpayable

As we can see this scheme makes the capabilities of the function very clear from the function signature but doesn't require us to add any magic parameters.

Also, another cool thing about this scheme is that it allows us to better explore ideas like #519. E.g. in the future we might have a new ProtectedMsg trait that can only be read from once.

from evm.future.whatever import ProtectedMsg

contract Foo:

  pub fn wants_msg<self: ProtectedMsg>(self):
     self.msg.value # ok
     self.msg.value # compile error: can not be read again

As we can see ☝️ we are still using self.msg but since we imported a different trait we now get a slightly different behavior.

One obvious problem with this scheme is that names could become ambigious e.g.

contract Foo:

  pub fn send_money<self: Transfer>(mut self):
     # Transfer refers to `transfer` from the `Transfer` trait but what about that locally named method `transfer`?
     self.transfer(0xdeadbeef, 10)

  pub fn transfer(to: address, value: u256):
     pass

I think that this could be worked around e.g. by strictly prohibiting locally defined functions/attributes with conflicting names. Or if that's a bit too harsh I'm sure other ways could be explored to resolve that issue.

[1] The eWASM project lost a bit of steam but I wouldn't bet on it being entirely dead yet

@sbillig
Copy link
Collaborator

sbillig commented Sep 8, 2021

A couple thoughts:

  • One thing I like about these things being arguments, is that they have to be explicitly passed through the call graph, so it's easy to tell where capabilities are being used.
pub fn send_money<self: Transfer>(mut self, to: address):
  self.transfer(to, 1)
  # presumably this doesn't work, because log_transfer_event requires Log,
  # but our current `self` lacks that capability.
  self.log_transfer_event(to, 1)

pub fn log_transfer_event<self: Log>(to:address, amount: u256):
  self.log(Transfer(to, amount))

In this example, I guess send_money would have to be defined as send_money<self: Transfer + Log>, but log isn't used in send_money, so one would have to check all the functions that send_money calls to see where/if Log is used. This seems like a variation on solidity's function labels; if a function is labeled as view or payable, you can't necessarily see which capabilities are used how without reading through the call graph. You could replace "view" with "pure" and let the compiler tell you why that's not allowed; same applies here: you could delete Log and let the compiler tell you why its needed, but that doesn't seem ideal to me.

  • It seems undesirable to always need self to get access to ctx; you can't be clear that a function can only read ctx, but can't depend on contract state. Similarly for log being part of mut self; we shouldn't have to expose contract state mutation ability to a function that just logs an event. I'm not sure how much this matters in practice.

@sbillig
Copy link
Collaborator

sbillig commented Sep 8, 2021

It seems undesirable to always need self to get access to ctx; you can't be clear that a function can only read ctx, but can't depend on contract state.

If ctx is part of self, and ctx has a named type, one could just define a function that takes ctx (which wouldn't be callable externally):

pub fn is_expired(self) -> bool:
  return check_expiry(self.expiry, self.ctx)

fn check_expiry(t: u256, ctx: Context) -> bool:
  return ctx.block.timestamp > t

Perhaps all of the (non-contract) chain state readers and mutators could be under a single object, and you can only mutate via a mut reference (which is only obtainable via mut self):

pub fn new_token(mut self, name: String<32>) -> address:
  if check_value(self.ctx):
    let address = create_contract(mut self.ctx, name)
    log_event(mut self.ctx, name, address)
  else:
    revert()

fn check_value(ctx: Context) -> bool:
  return ctx.msg.value > 1
fn create_contract(ctx: mut Context, name: String<32>) -> address:
  return ctx.create2(...)
fn log_event(ctx: mut Context, name: String<32>, addr: address):
  ctx.log(NewToken(addr, name))

The child objects of Context could also have type names, and a function could be defined that just takes msg. In the future, the Context object could give access to a ProtectedMsg type, or some other kind of single-use msg value, etc.

A function taking any of these state r/w types is a pretty clear indicator of what the function does, and what category it falls in.

@sbillig
Copy link
Collaborator

sbillig commented Sep 8, 2021

I guess we could say that functions that take these state types can be called externally, and the values are magically provided (in the same way that self is magically provided in that case).

@cburgdorf
Copy link
Collaborator

log isn't used in send_money, so one would have to check all the functions that send_money calls to see where/if Log is used. This seems like a variation on solidity's function labels; if a function is labeled as view or payable, you can't necessarily see which capabilities are used how without reading through the call graph.

Yep, that's a valid concern 👍

It seems undesirable to always need self to get access to ctx; you can't be clear that a function can only read ctx, but can't depend on contract state.

Yes, that's right so another thing that initially thought of would be to use the generics to scope the context object as in:

fn check_value<C: Message >(ctx: C) -> bool:
  # Givess access to `msg` but doesn't give access to `block`
  return ctx.msg.value > 1

One thing I like about these things being arguments, is that they have to be explicitly passed through the call graph, so it's easy to tell where capabilities are being used.

Yes, I think that's true. Well, I guess I'm on board with the idea if we keep the magic injection to just Context and then put things like create2 on that context object as you did in your example.

I also like the idea of demanding mut Context for things that modify the blockchain such as ctx.log(..), ctx.create2(..) 👍

I guess we could say that functions that take these state types can be called externally, and the values are magically provided

Yes, I think so but I think we should reserve that magic injection to the context object alone. I feel that the difference between Context and mut Context might already draw the right line between the things that matter.

@sbillig
Copy link
Collaborator

sbillig commented Nov 23, 2022

Closing as defunct. See #783 for the json changes.

@sbillig sbillig closed this Nov 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants