Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory and storage copying #155

Merged
merged 1 commit into from
Dec 17, 2020
Merged

Conversation

g-r-a-n-t
Copy link
Member

@g-r-a-n-t g-r-a-n-t commented Dec 9, 2020

What was wrong?

closes #152

  • we didn't have operations for copying storage -> storage or memory -> memory
  • the existing naming scheme for these operations was a bit confusing (e.g. scopy for storage -> memory copying)
  • copy operations were going byte-by-byte, which is extremely inefficient

How was it fixed?

  • implemented said copying operations and used them in the relevant places (assignments and expression moves)
  • updated the existing operations with a cleaner naming scheme
  • copying now happens word-by-word

To-Do

  • OPTIONAL: Update Spec if applicable
  • Clean up commit history

@g-r-a-n-t g-r-a-n-t force-pushed the data-operations branch 2 times, most recently from 001f803 to 1d52991 Compare December 10, 2020 00:48

return match attr.node {
builtins::CLONE => value_attributes.into_cloned(),
builtins::TO_MEM => value_attributes.into_cloned_from_sto(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, .clone() would be used to do a memory to memory copy and to_mem() to make a storage to memory copy? We might have to fine tune the naming.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is a good alternative?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, .clone() would be used to do a memory to memory copy and to_mem() to make a storage to memory copy?

And yeah, this is the case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One alternative to consider would be to treat the difference as an implementation detail and just use clone for both. If I understand correctly, there isn't really a case where one has to choose between clone() or to_mem() because each does only apply in one specific scenario where the other one doesn't apply. But we can argue that eventually both methods allocate a new segment of memory. It is just the origin that is different. So maybe we could just branch into two different implementations internally but only expose clone()?

self.my_other_u256 = my_other_u256

pub def set_to_my_other_vals():
self.my_string = self.my_other_string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused about what is happening here. Are we copying because strings are currently copied by default? Or are we setting self.my_string to another storage reference?

My first reaction was: Ah, we are probably copying because strings are set to implicitly copy but then further down I see this code:

pub def emit_my_event():
        self.emit_my_event_internal(
            self.my_string.to_mem(),
            self.my_u256.to_mem()
        )

...which makes me think they aren't implicitly copied and need .to_mem() to explicitly copy them. Can you clarify what happens here?

To be honest, the more I think about it, the more I'm leaning towards not implicitly copying for any variable that is in storage no matter its type. This may feel a little awkward for things like numerics where people are just so used to having them copied implicitly but at the same time it makes the semantics very very clear.

Because strings in Fe are currently not much different from arrays. Both have a fixed max length but can occupy any size between 0 and the max length. So you can have a string10000 and it would be hard to explain why we implicitly copy it but demand to_nem for an u256[10000].

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...which makes me think they aren't implicitly copied and need .to_mem() to explicitly copy them. Can you clarify what happens here?

This is a memory to storage copy: self.my_string = self.my_other_string.to_mem()

This is a storage to storage copy: self.my_string = self.my_other_string

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we copying because strings are currently copied by default?

Strings, arrays, and any other reference types will be treated the same in terms of assignments.

Copy link
Member Author

@g-r-a-n-t g-r-a-n-t Dec 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, the more I think about it, the more I'm leaning towards not implicitly copying for any variable that is in storage no matter its type. This may feel a little awkward for things like numerics where people are just so used to having them copied implicitly but at the same time it makes the semantics very very clear.

This is to say that something like

my_num: u256 = self.my_nums[42]

would not automatically load the value stored at the pointer onto the stack? So we would instead need another attribute function to load this onto the stack?

my_num: u256 = self.my_nums[42].to_stack()

I think we should stick with the conventional behavior of loading onto the stack when assigning a value type.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should stick with the conventional behavior of loading onto the stack when assigning a value type.

Ok, but isn't it confusing then that I have to call to_mem() on self.my_u256 here.

pub def emit_my_event():
        self.emit_my_event_internal(
            self.my_string.to_mem(),
            self.my_u256.to_mem()
        )

Why wouldn't that cause an implicit copy directly?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, this is my mental model of how I assume things to work now:

Mem Storage
Mem some = val.clone() some = self.val.to_mem()
Storage self.some = val.clone() self.some = self.val

For numeric

Mem Storage
Mem some = val some = self.val
Storage self.some = val self.some = self.val

Copy link
Contributor

@satyamakgec satyamakgec Dec 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the concern @cburgdorf in terms of semantics but for the user-friendliness, I will vote for implicit conversion to memory as this is something every coder/developer used to, Almost every other language does that. We may need to communicate our message more strongly to our developer community for some exceptions like a string in this case.

For the event case, I do agree with you @cburgdorf that it should be implicit conversion as with standard semantics. I haven't checked the code thoroughly to find gotchas in it but I am assuming @g-r-a-n-t is very well aware of this behavior and if we want we can log it in our backlog.

Copy link
Collaborator

@cburgdorf cburgdorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is starting to look very nice. I left a few comments. There are still some things that confuse me. I'd also like to see some added tests for the compile errors we would expect from code that doesn't use to_mem() when it should. I left a comment with an example code that compiles without to_mem() which I think might be wrong. Would be great to walk through this PR in a call.

@g-r-a-n-t g-r-a-n-t force-pushed the data-operations branch 2 times, most recently from a18a55e to 0033117 Compare December 17, 2020 01:16
@codecov-io
Copy link

codecov-io commented Dec 17, 2020

Codecov Report

Merging #155 (9090fb7) into master (220185f) will decrease coverage by 0.81%.
The diff coverage is 38.75%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #155      +/-   ##
==========================================
- Coverage   81.29%   80.48%   -0.82%     
==========================================
  Files          44       44              
  Lines        3144     3202      +58     
==========================================
+ Hits         2556     2577      +21     
- Misses        588      625      +37     
Impacted Files Coverage Δ
compiler/src/yul/mappers/assignments.rs 97.82% <ø> (ø)
compiler/src/yul/runtime/functions.rs 0.00% <0.00%> (ø)
semantics/src/errors.rs 5.00% <0.00%> (-0.27%) ⬇️
semantics/src/namespace/types.rs 56.85% <ø> (ø)
semantics/src/traversal/assignments.rs 96.87% <ø> (ø)
semantics/src/traversal/functions.rs 45.28% <0.00%> (ø)
compiler/src/yul/abi/functions.rs 53.84% <33.33%> (-0.53%) ⬇️
semantics/src/traversal/expressions.rs 71.84% <36.92%> (-4.14%) ⬇️
compiler/src/yul/operations.rs 80.95% <40.00%> (-8.53%) ⬇️
semantics/src/lib.rs 77.89% <57.69%> (-4.61%) ⬇️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 220185f...9090fb7. Read the comment docs.

@g-r-a-n-t g-r-a-n-t merged commit 8e3189b into ethereum:master Dec 17, 2020
self.my_nums[3] = 1
self.my_nums[4] = 255
my_nums_mem = self.my_nums.to_mem()
return my_nums_mem
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the same thing have been achieved with?

pub def assign_my_nums_and_return() -> u256[5]:
        my_nums_mem: u256[5]
        my_nums_mem[0] = 42
        my_nums_mem[1] = 26
        my_nums_mem[2] = 0
        my_nums_mem[3] = 1
        my_nums_mem[4] = 255
        self.my_nums = my_nums_mem.clone()
        return my_nums_mem

So, all the individual assignments would be made in memory and then a copy of the whole memory array would be put in storage in one go. This is not about changing this, just for my understanding.

pub def emit_my_event():
self.emit_my_event_internal(
self.my_string.to_mem(),
self.my_u256.to_mem()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, now spotted that further down in sized_vals_in_sto.fe we have this:

 emit MyEvent(
            self.num,
            self.nums.to_mem(),
            self.str.to_mem()
        ) 

Here we don't call .to_mem() on self.num which is what I would expect. So, I'm assuming that maybe this is a bug?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Memory and storage copying
4 participants