-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ABI v2 Proposal #19191
ABI v2 Proposal #19191
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
To piggyback on ABI v2, we've also identified the need for Not sure if this should be squeezed in to this proposal as "miscellaneous changes" |
dumb question: why all accounts needs to be passed wholly including account.data, always? why can't we provide set of api (syscalls?) to returns handles to n-th account's (address, owner, is_signed, data, etc) as separate functions? I mean, as our defi composition increases, i think there will be more situations where programs are just passing accounts down to CPI-ed instructions. for example, raydium instructions are just passing serum dex's (large) event queue account and its program bpf account. In these cases, just account address are needed, if i'm right? there could be wasted cycles even with this proposal? Or, behind-the-scene ondemand (page fault in traditional mmu sense) MemoryRegion could alleviate that? also my (long-neglected) traced dynamic account loading proposal (#17796 ) would be more fit with the my dumb api approach. |
@ryoqun I admit that the the proposal document does not make the most important point totally obvious and needs a better formulation: The proposed solution would serialize the accounts meta data just once per transaction (instead of twice for every instruction call and return) and then map it by reference together with the account payload data / content.
Syscalls are harder to optimize and more expensive than memory accesses (even with address translation). Thus, we try to shift accessors from syscalls to memory if they don't need to trigger anything in the runtime immediately.
This way there would be no more memory copies involved and passing everything (mind the pointer to everything, not a copy thereof) is cheaper than selecting what is needed (as that requires interaction between programs and the runtime via syscalls).
We don't even need to do the mapping on demand as we have constant time (O(1)) address translation in RBPF now.
Not sure why these two are in conflict. Because dynamically adding / removing accounts is harder if their meta-data is serialized once at the beginning of a transaction? |
aha! now cleared my mind. :) so, this is also friendly with justin's transaction v2 (moar accounts!) and gpu (avoid system memory access from gpu by syscalls), i guess?
yeah. but come to think of, column-major serialization scheme maybe easy to extend to dynamically load accounts at end of areas for each field, right? Also, as for the traced dynamic account proposal (not about abi v2), mutating state of |
docs/src/proposals/abi-v2.md
Outdated
- Instruction context: | ||
- `ParentStackFrame = 8`: `Map` | ||
- `InstructionData = 9`: `[u8]` | ||
- `NumberOfAccountsInInstruction = 10`: `u32` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These could be u16
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure but saving two bytes here won't really make a dent.
Compressing all boolean arrays into bit vectors might help a lot more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was referring to all the u32s in these contexts. But vector boolean would save space, though unpacking bit-arrays in the program might not be worth it.
docs/src/proposals/abi-v2.md
Outdated
- `NumberOfAccountsInTransaction = 1`: `u32` | ||
- `AccountKey = 2`: `[Pubkey; NumberOfAccountsInTransaction]` | ||
- `AccountIsExecutable = 3`: `[bool; NumberOfAccountsInTransaction]` | ||
- `AccountOwner = 4`: `[Pubkey; NumberOfAccountsInTransaction]` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these duplicates of the rw fields below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and I am not happy with that yet. The problem arises because some values of this attribute are read-only while others are writable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, the duplicate entries are sure to cause confusion. Executable, owner, lamports, etc... are all together writeable or not, can they be in the same region with a marker of whether the region (account) is rw or not?
docs/src/proposals/abi-v2.md
Outdated
- `ProgramAccountIndex = 12`: `u32` | ||
- `AccountIsSigner = 13`: `[bool; NumberOfAccountsInInstruction]` | ||
- `AccountIsWritable = 14`: `[bool; NumberOfAccountsInInstruction]` | ||
- `WritableAttributes = 15`: `Map` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't this the same as AccountIsSigner
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WritableAttributes
is a pointer to the read-write meta data region.
If that was the question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, maybe a pointer to the "possibly writable" attributes and AccountIsWritable
indicates if they are writable or not?
- `InvocationStackHeight = 6`: `u16` | ||
- `InvocationStack = 7`: `[Map; InvocationStackHeight]` | ||
- Instruction context: | ||
- `InstructionData = 8`: `[u8]` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is the length of the instruction data communicated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently only by the value slice (difference between adjacent value_offset
s).
However that might include padding, so yes, we should add an explicit InstructionDataLength
attribute.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
e7fb4d8
to
235ad30
Compare
235ad30
to
275fea6
Compare
I know this PR has been around for a while, so pardon the late comment. In a previous role I used Flatbuffers to do serialization from an embedded IoT device to send to the cloud for processing. Without fully understanding our use-case here, I wonder if Flatbuffers may work. Then that's one fewer thing we need to create and maintain ourselves. Flatbuffers is designed to be memory and runtime efficient, designed for mobile/games where those are a premium. Data in in little endian. It can support ABI changes over time. You can access the flatbuffer directly without needing to parse/unpack first. It looks like the Rust version has a few features missing compared to the C++ version, but figured I'd pass thing along in case it was helpful! |
Thanks, I just scanned over it and will read it in detail once I get back to this PR. One thing which almost all modern serialization formats (not older formats such as Property list) are missing, and which we require for memory mapping of accounts, is an approach to indirection such as references and pointers. It seems tables in Flatbuffers might go in that direction, but I have to read further. The other problem is that essential all serialization formats are row major encodings, and a column major encoding might be better for defining memory access permissions. But, I am not entirely convinced either way yet. |
In the old ABI return data was implemented by a setter and a getter syscall for copying data forth and back between userspace and runtime. But now, we can use shared memory in the transaction context instead. So, the two syscalls will be deprecated as there is no need to trigger anything in the runtime. | ||
|
||
#### Account Reallocation / Resizing | ||
TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Lichtso do you have any initial thoughts about how new account allocations will work? Could it be potentially dynamic such that CPI's could allocate many MB's of account data if needed? cc @brooksprumo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe if the TX specifies the expected allocations in the TX
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
Adds explanation for direct mapping and placeholder for syscall sections.
Removes the integration section as that is part of the roadmap.
de8bd9c
to
feb3a0f
Compare
- Account Pubkeys | ||
- Owner Pubkeys | ||
- Is-executable flags | ||
- Rent epochs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we can stop passing this field (rent_epoch
) as we're effectively discontinuing rent-paying account support. This enables abi vi2 to squeeze cycles and reduce attack surface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, however currently it is still necessary as ABIv1 is now running on-top of ABIv2 so it needs to be passed through for backward compatibility. But, we might be able to hide the attribute / property field in ABIv2 BPF programs.
Fixes: #14523
Roadmap
Design
See document of this PR.
Preparation
#15410, #17898, solana-labs/rbpf#196 and #19469, #19762, #20034, #20165, #20290, #20308, #20448, #20540, #20598, #20785, #20881, #20954, #21108, #21180, #21185, #21205, #21395, #21404, #21545, #21563, #21574, #21671, #21882, #21927, #22102, #22107, #22111, #22165
Runtime
Implementation of ABI v2 internals and inter-operation with ABI v1 by mounting the old ABI on top of the new one.
#21706, #22226, #22274
Native programs
Replace the use of
InvokeContext
andKeyedAccount
of ABI v1 byInstructionContext
andBorrowedAccount
of ABI v2.BPF programs
Add a new loader with a different entrypoint and syscalls: CPI, return data, realloc.
See dev branch.
Testing
Migrate and adjust the entire test suite around message processing.
Benchmarking
Measure the limits of ABIv2.