Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: return data from BPF programs #19318

Closed
wants to merge 9 commits into from

Conversation

seanyoung
Copy link
Contributor

See docs/src/proposals/return-data.md in PR for details.


## Proposed Solution

The callee can set the return data using a new system call `sol_returndata(u8 *buf, u64 length)`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

Copy link
Member

@mvines mvines left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@armaniferrante / @jstarry - would be great if you have some time to review this proposal, it should be quick!

docs/src/proposals/return-data.md Outdated Show resolved Hide resolved
docs/src/proposals/return-data.md Outdated Show resolved Hide resolved
docs/src/proposals/return-data.md Outdated Show resolved Hide resolved
seanyoung and others added 3 commits August 19, 2021 16:16
Co-authored-by: Michael Vines <mvines@gmail.com>
Co-authored-by: Michael Vines <mvines@gmail.com>
Co-authored-by: Michael Vines <mvines@gmail.com>
docs/src/proposals/return-data.md Show resolved Hide resolved
docs/src/proposals/return-data.md Outdated Show resolved Hide resolved
docs/src/proposals/return-data.md Outdated Show resolved Hide resolved
uint64_t *return_data_length,
);
```
On entry, `return_data_length` should point to the size of the buffer at `return_data`. If the callee
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the size of the return data impact compute unit consumption?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an interesting question but I don't know what the answer to that is. @jackcmay @mvines wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than overloading invoke, why not break this out into a separate syscall that retrieves the return data?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overloading invoke saves the program making two syscalls (invoke & get_return_data) rather than just invoke.

Maybe the separate syscall makes more sense. Happy to move on that and amend the proposal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's the trade-off. I'm leaning toward separating the two so that the two API's don't get entwined. Especially if we change invoke, we might need to create two new invokes. But the advantage of a single syscall isn't negligible so I could be swayed either way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And yes, we have a common way of charging for data passed in and out of the program, these bytes should incur a similar cost

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the proposal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that now the return data is re-used from the invoke/callee return data and the current program return data. This means a) Single buffer required, rather than two b) Callee return data is passed up the stack, which is very useful proxy programs

seanyoung and others added 2 commits August 20, 2021 08:55
Co-authored-by: Justin Starry <justin.m.starry@gmail.com>
Signed-off-by: Sean Young <sean@mess.org>
Copy link

@ronielex ronielex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know about this promotion or being completely?

Signed-off-by: Sean Young <sean@mess.org>
Copy link

@ronielex ronielex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of idea means if someone to missed
The proposal didn't succeed it's means there is a lot of pending of what did done for.

mvines
mvines previously approved these changes Aug 27, 2021
Copy link
Member

@mvines mvines left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest iteration looks great to me, nice and simple

@seanyoung seanyoung mentioned this pull request Sep 1, 2021
4 tasks
seanyoung added a commit to seanyoung/solang that referenced this pull request Sep 1, 2021
Requires:
 - solana-labs/solana#19548
 - solana-labs/solana#19318

Signed-off-by: Sean Young <sean@mess.org>
Comment on lines +103 to +105
When an instruction calls `sol_invoke()`, the return data of the callee is copied into the return data
of the current instruction. This means that any return data is automatically passed up the call stack,
to the callee of the current instruction (or the RPC call).
Copy link
Contributor

@joncinque joncinque Sep 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm coming in a bit late here, but here's an idea, take it or leave it. Would it make sense to also include the program id that set the return data?

As you say here, if you have program A which calls program B which calls program C, ie. A -> B -> C, program C can set a return value, B can use that return value but not set a return value itself, then A can also read C's return value. As a safety check, A may want to know who set the return value, B or C.

If program A relies on C's return value, for example to calculate how many tokens to transfer, then a malicious program B could update itself to change that value, and steal funds. It's a bit far-fetched, but at least we'd give devs the tools to program defensively if they can check which program set the value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a really good idea

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seanyoung - wdyt about providing the program id that set the return data in sol_get_return_data()? This'll allow the caller to add a defensive check like what Jon mentioned if they want.
Something like sol_get_return_data(u8 mut *buf, u64 length, Pubkey mut *from) -> u64.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally agree this is a good idea. I've changed the syscall as suggested. Thanks @joncinque !

- Compute costs are done for the syscalls, not for passing
- return data is cleared by sol_invoke
@mergify mergify bot dismissed mvines’s stale review September 8, 2021 17:13

Pull request has been modified.

Copy link

@ronielex ronielex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope it works for us

of the current instruction. This means that any return data is automatically passed up the call stack,
to the callee of the current instruction (or the RPC call).

Note that `sol_invoke()` clears the returns data, so that any return data from
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate this, A -> B -> C, does this mean that if B sets return data it would be cleared when invokeing C? So if B sets return data, calls C, C does not set return data, then A will not receive B's return value?

This behavior should be well documented next to the sol_set_return_data system call so folks have a clear understanding of what to expect

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've expanded on this.

@seanyoung
Copy link
Contributor Author

Merged via #19548

@seanyoung seanyoung closed this Sep 10, 2021
seanyoung added a commit to seanyoung/solang that referenced this pull request Sep 10, 2021
Requires:
 - solana-labs/solana#19548
 - solana-labs/solana#19318

Signed-off-by: Sean Young <sean@mess.org>
@Lichtso
Copy link
Contributor

Lichtso commented Sep 13, 2021

Unfortunately, I saw it just now that it was merged, and I have some concerns:

  • I don't think a syscall is the right way to implement this. We should avoid syscalls if possible, because of:
    • their context switch overhead
    • they are hard to get rid off again
    • simply not necessary here as a program can only return once, if I understand the purpose correctly
  • Global (transaction wide) state in InvokeContext is what I am trying to get rid off as it requires explicit copies everywhere.

Instead, as it needs to be implemented in the new ABI (see #19191) anyway, I think that reserving a shared memory area there would be a better approach.

@seanyoung
Copy link
Contributor Author

@Lichtso I should have added you as a reviewer, sorry about that. I think your criticisms are all valid.

Your proposed ABI requires re-building BPF programs anyway, how about if we removed the syscall in ABI v2.

seanyoung added a commit to seanyoung/solang that referenced this pull request Sep 14, 2021
Requires:
 - solana-labs/solana#19548
 - solana-labs/solana#19318

Signed-off-by: Sean Young <sean@mess.org>
seanyoung added a commit to seanyoung/solang that referenced this pull request Sep 14, 2021
Requires:
 - solana-labs/solana#19548
 - solana-labs/solana#19318

Signed-off-by: Sean Young <sean@mess.org>
seanyoung added a commit to seanyoung/solang that referenced this pull request Sep 18, 2021
Requires:
 - solana-labs/solana#19548
 - solana-labs/solana#19318

Signed-off-by: Sean Young <sean@mess.org>
seanyoung added a commit to seanyoung/solang that referenced this pull request Sep 18, 2021
Requires:
 - solana-labs/solana#19548
 - solana-labs/solana#19318

Signed-off-by: Sean Young <sean@mess.org>
seanyoung added a commit to seanyoung/solang that referenced this pull request Sep 18, 2021
Requires:
 - solana-labs/solana#19548
 - solana-labs/solana#19318

Signed-off-by: Sean Young <sean@mess.org>
@seanyoung seanyoung deleted the return-data branch October 12, 2021 12:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants