ABI v2 Proposal #19191

Lichtso · 2021-08-11T20:02:33Z

Roadmap

Design

See the proposal document in this PR.

Preparation

#15410, #17898, solana-labs/rbpf#196 and #19469, #19762, #20034, #20165, #20290, #20308, #20448, #20540, #20598, #20785, #20881, #20954, #21108, #21180, #21185, #21205, #21395, #21404, #21545, #21563, #21574, #21671, #21882, #21927, #22102, #22107, #22111, #22165

Runtime

Implementation of ABI v2 internals and inter-operation with ABI v1 by mounting the old ABI on top of the new one.
#21706, #22226, #22274

Native programs

Replace the use of InvokeContext and KeyedAccount of ABI v1 with InstructionContext and BorrowedAccount of ABI v2.

BPF programs

Add a new loader with a different entrypoint and syscalls: CPI, return data, realloc.
See dev branch.

Testing

Migrate and adjust the entire test suite around message processing.

Benchmarking

Measure the limits of ABIv2.

docs/src/proposals/abi-v2.md

dmakarov

Looks good to me.

jon-chuang · 2021-08-13T03:38:13Z

To piggyback on ABI v2, we've also identified the need for AccountInfo.owner to be Rc<RefCell<&'a mut Pubkey>> instead of an immutable &'a Pubkey, in order to avoid unsafe when reassigning the owner.

Not sure if this should be squeezed into this proposal as "miscellaneous changes".
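A minimal sketch of what that change would allow, assuming a trimmed-down account view. `Pubkey` here is a local stand-in and `AccountView` is a hypothetical type, not the real `AccountInfo`; only the shape of the `owner` field follows the comment above.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Local stand-in for the SDK's `Pubkey`, so the sketch compiles on its own.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Pubkey(pub [u8; 32]);

/// Hypothetical, trimmed-down account view; only the `owner` field matters here.
pub struct AccountView<'a> {
    /// Shared, mutable handle to the owner, as suggested in the comment.
    pub owner: Rc<RefCell<&'a mut Pubkey>>,
}

impl<'a> AccountView<'a> {
    /// Reassigns the owner through the `RefCell`, with no `unsafe` block.
    pub fn assign(&self, new_owner: &Pubkey) {
        **self.owner.borrow_mut() = *new_owner;
    }

    /// Returns a copy of the current owner.
    pub fn owner(&self) -> Pubkey {
        **self.owner.borrow()
    }
}
```

With an immutable `&'a Pubkey`, the same reassignment would need a pointer cast inside `unsafe`; the `RefCell` moves the aliasing check to runtime instead.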


ryoqun · 2021-08-13T15:35:11Z

Dumb question: why do all accounts always need to be passed in whole, including account.data? Why can't we provide a set of APIs (syscalls?) that return handles to the n-th account's fields (address, owner, is_signed, data, etc.) as separate functions?

I mean, as our DeFi composition increases, I think there will be more situations where programs just pass accounts down to CPI-ed instructions. For example, Raydium instructions just pass along Serum DEX's (large) event queue account and its BPF program account. In these cases only the account address is needed, if I'm right? Could there be wasted cycles even with this proposal? Or could a behind-the-scenes, on-demand MemoryRegion (page fault in the traditional MMU sense) alleviate that?

Also, my (long-neglected) traced dynamic account loading proposal (#17796) would fit better with my dumb API approach.

Lichtso · 2021-08-13T19:33:26Z

@ryoqun I admit that the proposal document does not make the most important point obvious and needs a better formulation: the proposed solution would serialize the account metadata just once per transaction (instead of twice for every instruction, on call and on return) and then map it by reference together with the account payload data / content.

> dumb question: why all accounts needs to be passed wholly including account.data, always? why can't we provide set of api (syscalls?) to returns handles to n-th account's (address, owner, is_signed, data, etc) as separate functions?

Syscalls are harder to optimize and more expensive than memory accesses (even with address translation). Thus, we try to shift accessors from syscalls to memory if they don't need to trigger anything in the runtime immediately.

> I mean, as our defi composition increases, i think there will be more situations where programs are just passing accounts down to CPI-ed instructions. for example, raydium instructions are just passing serum dex's (large) event queue account and its program bpf account. In these cases, just account address are needed, if i'm right?
> there could be wasted cycles even with this proposal?

This way no more memory copies are involved, and passing everything (mind: a pointer to everything, not a copy of it) is cheaper than selecting what is needed, as selection requires interaction between programs and the runtime via syscalls.

> Or, behind-the-scene ondemand (page fault in traditional mmu sense) MemoryRegion could alleviate that?

We don't even need to do the mapping on demand as we have constant time (O(1)) address translation in RBPF now.
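A rough sketch of how constant-time translation can work, assuming the upper bits of a virtual address select a region and the lower bits are the offset inside it. The names `Region`, `MemoryMap`, `translate` and the 32-bit split are illustrative assumptions, not the RBPF API.

```rust
/// Assumed layout: the upper 32 bits of a virtual address select a region,
/// the lower 32 bits are the byte offset inside that region.
const REGION_SHIFT: u32 = 32;

/// Hypothetical mapped region backed by some host buffer.
pub struct Region {
    pub host_offset: usize, // start of the region inside the host buffer
    pub len: u64,           // region length in bytes
    pub writable: bool,     // whether stores are allowed
}

/// Hypothetical region table, indexed by the upper address bits.
pub struct MemoryMap {
    pub regions: Vec<Region>,
}

impl MemoryMap {
    /// Translates in O(1): one table index, one permission check, one bounds check.
    pub fn translate(&self, vm_addr: u64, access_len: u64, write: bool) -> Option<usize> {
        let index = (vm_addr >> REGION_SHIFT) as usize;
        let offset = vm_addr & ((1u64 << REGION_SHIFT) - 1);
        let region = self.regions.get(index)?;
        if write && !region.writable {
            return None;
        }
        let end = offset.checked_add(access_len)?;
        if end > region.len {
            return None;
        }
        Some(region.host_offset + offset as usize)
    }
}
```

Because the region is found by indexing rather than searching, the cost does not grow with the number of mapped accounts, which is why mapping everything up front stays cheap.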

> also my (long-neglected) traced dynamic account loading proposal (#17796 ) would be more fit with the my dumb api approach.

Not sure why these two are in conflict. Is it because dynamically adding / removing accounts is harder if their metadata is serialized once at the beginning of a transaction?

ryoqun · 2021-08-14T18:37:40Z

> I admit that the the proposal document does not make the most important point totally obvious and needs a better formulation: The proposed solution would serialize the accounts meta data just once per transaction (instead of twice for every instruction call and return) and then map it by reference together with the account payload data / content.

Aha! That clears it up. :) So this is also friendly to Justin's transaction v2 (moar accounts!) and GPU (avoid system memory access from the GPU via syscalls), I guess?

> Because dynamically adding / removing accounts is harder if their meta-data is serialized once at the beginning of a transaction?

Yeah. But come to think of it, a column-major serialization scheme may be easy to extend to dynamically loaded accounts by appending them at the end of the area for each field, right?

Also, as for the traced dynamic account proposal (not about ABI v2): mutating the state of &[AccountInfo] (which could be on the stack), like .len(), as execution progresses could be daunting, since there must be no observable difference between simulation and on-chain execution... maybe a special marker for dynamically loaded accounts (tx format change, ugh)...?

jackcmay · 2021-08-18T22:35:09Z

docs/src/proposals/abi-v2.md

+- Instruction context:
+  - `ParentStackFrame = 8`: `Map`
+  - `InstructionData = 9`: `[u8]`
+  - `NumberOfAccountsInInstruction = 10`: `u32`


These could be u16

Sure, but saving two bytes here won't really make a dent.
Compressing all boolean arrays into bit vectors might help a lot more.

I was referring to all the u32s in these contexts. Bit vectors for the booleans would save space, though unpacking bit arrays in the program might not be worth it.
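To make the trade-off concrete, here is a small sketch of packing a `[bool]` attribute into a bit vector. The helper names `pack_flags` and `get_flag` and the bit order are assumptions for illustration, not part of the proposal.

```rust
/// Packs one flag per bit, least significant bit first within each byte.
/// Storage drops from one byte per flag to one bit per flag.
pub fn pack_flags(flags: &[bool]) -> Vec<u8> {
    let mut packed = vec![0u8; (flags.len() + 7) / 8];
    for (i, &flag) in flags.iter().enumerate() {
        if flag {
            packed[i / 8] |= 1 << (i % 8);
        }
    }
    packed
}

/// Reads flag `index`; the divide, shift and mask are the extra
/// per-access work a program would pay compared to a plain `[bool]`.
pub fn get_flag(packed: &[u8], index: usize) -> bool {
    ((packed[index / 8] >> (index % 8)) & 1) == 1
}
```

The saving is a factor of eight on the flag arrays, while every read in the program turns from a single byte load into a load plus shift and mask.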

jackcmay · 2021-08-18T22:36:22Z

docs/src/proposals/abi-v2.md

+  - `NumberOfAccountsInTransaction = 1`: `u32`
+  - `AccountKey = 2`: `[Pubkey; NumberOfAccountsInTransaction]`
+  - `AccountIsExecutable = 3`: `[bool; NumberOfAccountsInTransaction]`
+  - `AccountOwner = 4`: `[Pubkey; NumberOfAccountsInTransaction]`


Are these duplicates of the rw fields below?

Yes, and I am not happy with that yet. The problem arises because some values of this attribute are read-only while others are writable.

I see, the duplicate entries are sure to cause confusion. Executable, owner, lamports, etc. are all writable or not together; can they be in the same region, with a marker of whether the region (account) is rw or not?

jackcmay · 2021-08-18T22:37:53Z

docs/src/proposals/abi-v2.md

+  - `ProgramAccountIndex = 12`: `u32`
+  - `AccountIsSigner = 13`: `[bool; NumberOfAccountsInInstruction]`
+  - `AccountIsWritable = 14`: `[bool; NumberOfAccountsInInstruction]`
+  - `WritableAttributes = 15`: `Map`


Why isn't this the same as AccountIsSigner?

WritableAttributes is a pointer to the read-write metadata region, if that was the question.

Ah, maybe a pointer to the "possibly writable" attributes and AccountIsWritable indicates if they are writable or not?

docs/src/proposals/sbf-program-abi-v2.md

jackcmay · 2021-08-19T11:48:50Z

docs/src/proposals/sbf-program-abi-v2.md

+  - `InvocationStackHeight = 6`: `u16`
+  - `InvocationStack = 7`: `[Map; InvocationStackHeight]`
+- Instruction context:
+  - `InstructionData = 8`: `[u8]`


Where is the length of the instruction data communicated?

Currently only by the length of the value slice (the difference between adjacent value_offsets).
However, that might include padding, so yes, we should add an explicit InstructionDataLength attribute.
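The padding problem can be illustrated with a short sketch. It assumes a table where `value_offsets[i]` is the start of attribute `i` and the next entry marks its end; `value_slice` and `align_up` are hypothetical helpers, and the 8-byte alignment is an assumption.

```rust
/// Returns the bytes between two adjacent offsets.
/// Assumes `value_offsets[attribute + 1]` marks the end of the value.
pub fn value_slice<'a>(
    buffer: &'a [u8],
    value_offsets: &[usize],
    attribute: usize,
) -> Option<&'a [u8]> {
    let start = *value_offsets.get(attribute)?;
    let end = *value_offsets.get(attribute + 1)?;
    buffer.get(start..end)
}

/// Rounds `offset` up to the next multiple of `align` (a power of two).
pub fn align_up(offset: usize, align: usize) -> usize {
    (offset + align - 1) & !(align - 1)
}
```

If 5 bytes of instruction data are followed by a value aligned to 8 bytes, the slice derived from the offsets is 8 bytes long, so the true length cannot be recovered without an explicit length attribute.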

stale · 2021-08-28T16:50:46Z

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale · 2021-09-11T00:30:05Z

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

brooksprumo · 2021-11-24T15:52:46Z

I know this PR has been around for a while, so pardon the late comment. In a previous role I used Flatbuffers to do serialization from an embedded IoT device to send to the cloud for processing. Without fully understanding our use-case here, I wonder if Flatbuffers may work. Then that's one fewer thing we need to create and maintain ourselves.

Flatbuffers is designed to be memory and runtime efficient, for mobile/games where those are at a premium. Data is in little-endian. It can support ABI changes over time. You can access the flatbuffer directly without needing to parse/unpack first.

It looks like the Rust version has a few features missing compared to the C++ version, but figured I'd pass this along in case it was helpful!

Lichtso · 2021-11-24T16:35:54Z

Thanks, I just scanned over it and will read it in detail once I get back to this PR.

One thing which almost all modern serialization formats (not older formats such as Property list) are missing, and which we require for memory mapping of accounts, is an approach to indirection such as references and pointers. It seems tables in Flatbuffers might go in that direction, but I have to read further.

The other problem is that essentially all serialization formats are row-major encodings, and a column-major encoding might be better for defining memory access permissions. But I am not entirely convinced either way yet.
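The difference between the two encodings can be sketched as follows. `Pubkey` is a local stand-in, `AccountRow` and `AccountColumns` are hypothetical types, and the choice of fields is an assumption for illustration.

```rust
/// Local stand-in for the SDK's `Pubkey`, so the sketch compiles on its own.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Pubkey(pub [u8; 32]);

/// Row-major: one record per account, so fields with different
/// access permissions sit next to each other in memory.
pub struct AccountRow {
    pub key: Pubkey,
    pub owner: Pubkey,
    pub lamports: u64,
    pub executable: bool,
}

/// Column-major: one contiguous array per attribute, so each array
/// could be mapped as its own region with its own permission.
#[derive(Default)]
pub struct AccountColumns {
    pub keys: Vec<Pubkey>,
    pub owners: Vec<Pubkey>,
    pub lamports: Vec<u64>,
    pub executable: Vec<bool>,
}

impl AccountColumns {
    /// Transposes row-major records into per-attribute columns.
    pub fn from_rows(rows: &[AccountRow]) -> Self {
        let mut columns = Self::default();
        for row in rows {
            columns.push(row);
        }
        columns
    }

    /// Adding an account appends one entry at the end of every column.
    pub fn push(&mut self, row: &AccountRow) {
        self.keys.push(row.key);
        self.owners.push(row.owner);
        self.lamports.push(row.lamports);
        self.executable.push(row.executable);
    }
}
```

In the row-major form a read-only field and a writable field of the same account share a record, whereas in the column-major form a whole column can be given a single permission.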

jstarry · 2021-12-15T16:39:46Z

docs/src/proposals/sbf-program-abi-v2.md

+In the old ABI return data was implemented by a setter and a getter syscall for copying data forth and back between userspace and runtime. But now, we can use shared memory in the transaction context instead. So, the two syscalls will be deprecated as there is no need to trigger anything in the runtime.
+
+#### Account Reallocation / Resizing
+TODO


@Lichtso do you have any initial thoughts about how new account allocations will work? Could it potentially be dynamic, such that CPIs could allocate many MBs of account data if needed? cc @brooksprumo

Maybe, if the TX specifies the expected allocations.

stale · 2022-01-09T03:04:33Z

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

Adds explanation for direct mapping and placeholder for syscall sections.

Removes the integration section as that is part of the roadmap.

ryoqun · 2022-02-16T13:23:34Z

docs/src/proposals/sbf-program-abi-v2.md

+  - Account Pubkeys
+  - Owner Pubkeys
+  - Is-executable flags
+  - Rent epochs


I think we can stop passing this field (rent_epoch), as we're effectively discontinuing rent-paying account support. This would let ABI v2 squeeze cycles and reduce attack surface.

Agreed, however currently it is still necessary, as ABIv1 is now running on top of ABIv2, so it needs to be passed through for backward compatibility. But we might be able to hide the attribute / property field in ABIv2 BPF programs.

Lichtso requested review from dmakarov and jackcmay August 11, 2021 20:02

Lichtso added the enhancement (New feature or request) and research (Open question / topic of discussion) labels Aug 11, 2021

dmakarov reviewed Aug 12, 2021
