Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new event api #2408

Closed
armaniferrante opened this issue Feb 27, 2023 · 8 comments · Fixed by #2438
Closed

new event api #2408

armaniferrante opened this issue Feb 27, 2023 · 8 comments · Fixed by #2438

Comments

@armaniferrante
Copy link
Member

armaniferrante commented Feb 27, 2023

Problem

Currently, anchor events are emitted through logs. This is potentially problematic since logs have a maximum size per transaction. And so one might emit an event and that event can be not present in the transaction.

Solution

Instead, perhaps a better (hack) would be to add to the anchor codegeneration a special instruction for events that does nothing. Then, when emit!(event) is called, we recursively CPI into the program with that instruction and the serialized event as data. Indexers can then use this to reliably build a table.

@jarry-xiao
Copy link

I would highly recommend adding a PDA log authority for this if it does get implemented. This makes it so there's no way to spoof log transactions through CPI from another program.

The PDA makes it so the only way you can invoke the log event is through self CPI.

In theory it should always be possible to properly index regardless of the log authority, but you run into far fewer edge cases. Unpacking the inner instruction data of the transaction tree is really hard because stack depth is not found in the returned object, and with this change you can be confident that any inner instruction with the log instruction is valid.

Not an easy change by any means, but it would be a win for overall devex if it was built into Anchor:

Here's an example of how this works in a native contract (with a log authority): https://github.com/Ellipsis-Labs/phoenix-v1/blob/master/src/lib.rs#L111

The invoke: https://github.com/Ellipsis-Labs/phoenix-v1/blob/master/src/program/event_recorder.rs#L133

It's tricky to optimize because you really would like to densely pack all of the emits to minimize the number of self CPIs to make. The normal emit! covers the 80/20 for contracts that don't need literally 100% of their data preserved, but you can probably implement a generalized non-optimized safe emit.

@armaniferrante
Copy link
Member Author

Note: one tradeoff with the PDA log authority is that it adds another 32 bytes to the transaction, which is a tough price to pay.

@Henry-E
Copy link

Henry-E commented Feb 28, 2023

Jarry made some good points. There's so much wrong about using CPI to log data, from the fact it's a hack in the first place to the limitations it places on how often you can log data since you probably just want a single CPI. The current emit work fine for most people.

Adding support to anchor would just mean having a simpler helper function for sending the CPI and serializing the data needed, plus taking care of the parsing in the clients. It makes sense to have this although I can't imagine it would be in much demand except by the most intense projects.

Is Solana never going to fix the core issue here with logs dying randomly? The CPI transaction data approach is such a hack but it also shows that data can survive undamaged from the transaction. I can't believe they're not working on a solution.

@jarry-xiao
Copy link

jarry-xiao commented Feb 28, 2023

I think it's a beautiful hack. It's one of my proudest discoveries

I think the log issue is a lot tougher than it might seem on the surface because logging is handled through syscalls. Also the buffer limit is set on a validator basis. There should be a log limit per program or an exclusive buffer for program logs.

You can also use set_return_data, but I think that's limited whereas you can CPI multiple times

@ngundotra
Copy link
Contributor

ngundotra commented Mar 13, 2023

IIUC benefit of self-cpi is that you reduce tx size by 32 bytes by removing a separate noop program key. Therefore, It doesn't make sense to self-cpi if you're going to require an additional PDA.

Otherwise might as well just deploy a no-op program that hardcodes its list of caller programs.

@jarry-xiao
Copy link

IIUC benefit of self-cpi is that you reduce tx size by 32 bytes by removing a separate noop program key. Therefore, It doesn't make sense to self-cpi if you're going to require an additional PDA.

I think there's a tradeoff, and IMO having the PDA is far safer and worth the extra 32 bytes. Adding the extra key makes handling the log parsing far less error prone. Given the fact that the tx object doesn't have stack depth, it's not very difficult to effectively trick the indexer into storing bad data by calling the Log instruction directly.

Example:

  • Sample program A CPI's into target program T's log instruction and logs either some nonsense or false data
  • Downstream indexer fails to detect the malicious pattern (it's probably possible to catch, just more difficult)
  • DB corruption

If the PDA signs, the filter of tx success && instruction == T::log is sufficient to know that the data is safe to record

@ngundotra
Copy link
Contributor

ngundotra commented Mar 17, 2023

There's clearly some contention on what the "best" way to log data should be.

I've tagged a relevant rough draft PR of adding a no-op instruction to Anchor programs that does not require a PDA to sign (sorry Jarry).

Any issues with indexing (ie CPI stack depth not being returned by RPC results) should be addressed via a SIMD or at least through forum discussion on the sRFC forums.

Let me know what y'all think :)

@jarry-xiao
Copy link

jarry-xiao commented Mar 17, 2023

I've tagged a relevant rough draft PR of adding a no-op instruction to Anchor programs that does not require a PDA to sign (sorry Jarry).

The PDA adds more complexity, but IMO the important thing is that the indexing is done accurately, and it makes it bulletproof.

Even with stack depth, having an openly exposed log instruction won't make indexing easy. There are lots of adversarial cases that still need to be checked, I think it's still very possible to mess it up.

I think the condition that the Log instruction from program A must immediately be preceded by program A AND the stack depth has increased by 1 is sufficient for indexing without a PDA signer. As long as the logic to do this is present in the parsing code, the implementation is probably acceptable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants