Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: events: Add Lotus APIs to consume smart contract and built-in actor events #11618

Merged
merged 22 commits into from
Mar 5, 2024

Conversation

rvagg
Copy link
Member

@rvagg rvagg commented Feb 13, 2024

Related Issues

Proposed Changes

This work implements a new API in Lotus for clients to consume both smart contract and the upcoming built-in Actor events in FIP-0083

Replaces #11540 but with a linear git history and it's actually mergeable. I've squashed it down to a few major commits, retaining (mostly) the provenance. The result is identical to #11540 except for two very minor example doc changes ("ddata" instead of "data" for some cbor, it was done for one of the 3 and I didn't notice until now).

Additional Info

This would be a good one to rebase-merge since it's already linear and sits at the top of feat/nv22.

Checklist

Before you mark the PR ready for review, please make sure that:

  • Commits have a clear commit message.
  • PR title is in the form of of <PR type>: <area>: <change being made>
    • example: fix: mempool: Introduce a cache for valid signatures
    • PR type: fix, feat, build, chore, ci, docs, perf, refactor, revert, style, test
    • area, e.g. api, chain, state, market, mempool, multisig, networking, paych, proving, sealing, wallet, deps
  • If the PR affects users (e.g., new feature, bug fix, system requirements change), update the CHANGELOG.md and add details to the UNRELEASED section.
  • New features have usage guidelines and / or documentation updates in
  • Tests exist for new functionality or change in behavior
  • CI is green

@rvagg rvagg requested a review from rjan90 February 13, 2024 05:31
@rvagg rvagg requested a review from a team as a code owner February 13, 2024 05:31
@rvagg rvagg requested review from Stebalien and arajasek and removed request for a team February 13, 2024 05:31
@rvagg rvagg force-pushed the rvagg/feat/builtin-actor-events branch from f5a9aa0 to 8303b77 Compare February 13, 2024 05:35
Copy link
Contributor

@rjan90 rjan90 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@rvagg rvagg force-pushed the rvagg/feat/builtin-actor-events branch from 8303b77 to 5ee59f6 Compare February 14, 2024 21:09
#
# type: bool
# env var: LOTUS_FEVM_ENABLEACTOREVENTSAPI
#EnableActorEventsAPI = false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to not put this under Fevm?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussion started but not resolved in #11540 (review)

We currently have a Fevm section in config, not Fvm. Do you think this warrants starting a new config section for one item?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe an Events top level config option would be appropriate but that logically should move all of the current Fevm.Events options into it as well since this feature builds on top of it. But that probably needs a config section deprecation and duplicate handling during a deprecation period. I can do that if it makes sense but it seems heavy-handed.

Maybe the alternative is to put this in a new Events section alone for now and do a move of the other options at a later date as a cleanup.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with a top-level Events config option, although I'd keep the ETH-specific options where they are (although... maybe they all generalize?).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the alternative is to put this in a new Events section alone for now and do a move of the other options at a later date as a cleanup.

Yeah, this sounds like a good idea.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Events top level config makes sense. make sure we have decoupled eth api enablement vs. native event subscription enablement, while keeping the coupling between eth api enablement & FEVM event enablement

chain/types/actor_event.go Show resolved Hide resolved
node/builder_chain.go Outdated Show resolved Hide resolved
@rvagg
Copy link
Member Author

rvagg commented Feb 21, 2024

Notes from zoom chat just now with @Stebalien:

  • Epoch -> Height

  • Height should take an int

    • "latest" and "earliest" go and are replaced by:
    • empty/nil = latest (or next tipset), from and to, for both get and subscribe
    • 0 meaning actual 0
    • Warn that in future this may error if you specify a tipset earlier than which we have indexed events for (see below)
  • ~Prefill for subscription stays and means the same thing, just note that FromHeight in the subscription won't go any earlier unless Prefill is specified, otherwise it'll be rounded up to latest. (What about FromHeight in the future? That should be OK too but need to confirm.) See also notes below. We're removing the possibility to "earliest you have" so there's not a clear use-case for optionality here.

  • Config: top level "Events" with just that one item; future likely migrate most of the Fevm events stuff there.

  • TipSetKey instead of TipSetCid

  • Later TODO: add a "got events for tipset" for every tipset processed into the events index db, needs a minor migration - version + new table with just "tipset" processed. So we know whether we can provide events (which will be [] for earlier tipsets anyway but we don't know when we started processing events; also could we hardwire fevm start date to the earliest here? Figure that out later.).

  • Document Addr->AddrID for ethereum events change to come in the future, still be Address but will be an ID address in future.

Height abi.ChainEpoch `json:"height"`

// CID of the tipset that contained the message that produced this event.
TipSetCid cid.Cid `json:"tipsetCid"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed sync: Let's use tipsetKey, if possible. If that's not possible/efficient, oh well. But let's try to use a key for consistency.

Comment on lines 42 to 46
FromEpoch string `json:"fromEpoch,omitempty"`

// Interpreted as an epoch (in hex) or one of "latest" for last mined block, "earliest" for first,
// Optional, default: "latest".
ToEpoch string `json:"toEpoch,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Epoch -> Height for consistency.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to decide what to do if this and/or From will cause an error if they are later than the current heaviest. If we're going to do that with earliest eventually then it may be something to consider. It just means you can't be sloppy with your calls other than nil.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to do that with earliest eventually then it may be something to consider.

I'd say yes? You can just specify "nil" if you don't care.


// Interpreted as an epoch (in hex) or one of "latest" for last mined block, "earliest" for first,
// Optional, default: "latest".
FromEpoch string `json:"fromEpoch,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed:

  • Use integers.
  • Omit to mean "latest" in both cases. Or, for subscription, to mean "the next tipset".

// before any new real-time events that match the given filter are written.
// If `Prefill` is true and `FromEpoch` is set to latest, the pre-fill operation will become a no-op.
// if `Prefill` is false and `FromEpoch` is set to earliest, historical events will still be sent to the client.
Prefill bool `json:"prefill"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: remove. Prefill is implied if FromEpoch (now FromHeight) is non-nil.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Stebalien I have a minor mismatch from our discussion with Prefill, but I think maybe I'm remembering it wrong since we did a couple of loops. We came to the conclusion that the use-case for "earliest you have" isn't clear so we're doing away with that (eventually to error if you specify a height we don't actually have). Without Prefill on the subscribe then I think we can collapse SubActorEventFilter and do away with it entirely so the same ActorEventFilter is provided to both APIs. It does mean that you must use FromEpoch as a signal to decide whether to fish around in historical events or not. The only differences between the two APIs is (1) the chan and (2) the ability to reach into the future for both Height parameters for Subscribe whereas they are both maxxed at the current heaviest for Get.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can collapse SubActorEventFilter and do away with it entirely so the same ActorEventFilter is provided to both APIs.

Yes.

It does mean that you must use FromEpoch as a signal to decide whether to fish around in historical events or not

Yes.

The only differences between the two APIs is

That and, for "get" FromEpoch and ToEpoch default to "latest". For "subscribe", they default to "next" (i.e., the epoch after the current epoch. Subtle difference that nobody's going to care about but still technically a difference.

"github.com/filecoin-project/go-state-types/abi"
)

type ActorEventBlock struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not now: in the future, I'd like to consider automatically translating cbor to json when requested by the user.

Entries []EventEntry `json:"entries"`

// Filecoin address of the actor that emitted this event.
EmitterAddr address.Address `json:"emitter"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: just call it Emitter?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed changing this to ID, but decided to punt on that for now.

in := make(chan interface{}, 256)
fm.SetSubChannel(in)

for {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: for ctx.Err() == nil. Select doesn't have a priority.

return nil, err
}

out := make(chan *types.ActorEvent, 25)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. Given that we're not batching by block, this buffer is tiny (likely way too tiny) to simply disconnect the reader if we get stuck.

We likely need some leniency. E.g., the user gets to block for a few milliseconds at a time before we kill the reader. Or, even, the user gets some form of "budget" where they get to block for, say, 10% of the time? I.e., if they get to block for 3 seconds every 30s, they'll never really fall behind.

Or maybe just a larger buffer? I'm just worried about this API may be completely unusable in practice (given any amount of RPC latency).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A somewhat complex algorithm that should "work" (after prefill) is:

  1. Read from the event channel into a slice.
  2. Record the height of the event from the newest tipset observed. Don't make reduce this height if we hit a revert.
  3. If the height increases and the slice isn't empty, disconnect the reader.

Except when the node is "catching up" (probably not something we care about anyways), this will always give the reader a full block time to receive events but won't let them fall behind.

However, the downside is memory. Ideally the event filter manager would return batches of events (the same slice for every subscriber).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of implementation, one usually does this as follows:

var buffer []*types.ActorEvent
for ctx.Err() == nil {
    var ev *types.ActorEvent
    var ok bool
    if len(buffer) > 0 {
        select {
        case ev, ok = <-in:
        case out<-buffer[0]:
            buffer[0] = nil
            buffer = buffer[1:]
            continue
        case <-ctx.Done():
            return
        }
    } else {
        select {
        case ev, ok = <-in:
        case <-ctx.Done():
            return
        }
    }
    if !ok {
        return
    }
    // check if we need to disconnect the client because they've fallen behind.
    // ...
    
    // Otherwise, add the event to the buffer
    buffer = append(buffer, ev)
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But... this is an implementation thing. We can fix this in a followup patch (no API changes).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For pre-filling, we'll likely just need some kind of minimum rate (e.g., one event per millisecond, maybe averaged out over some period of time).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Stebalien a case I've hit that I'm unsure about in terms of what patterns to follow with existing APIs - what about the case where we can clearly identify a halting condition, do we close the channel? If I specify a ToHeight that's 100 epochs in the future (or even do a subscribe that has the range entirely historical), does the API terminate itself? It seems like yes is the right answer to this, at least that's what I'd expect.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I specify a ToHeight that's 100 epochs in the future

You mean on subscribe? Ideally you'd terminate when hitting the target height. Although it would nice to be able to signal success/error... i think that's a limitation of our JSON-RPC API more than anything.

@@ -1262,7 +1263,60 @@ func (e *EthEvent) EthGetFilterLogs(ctx context.Context, id ethtypes.EthFilterID
return nil, xerrors.Errorf("wrong filter type")
}

func (e *EthEvent) installEthFilterSpec(ctx context.Context, filterSpec *ethtypes.EthFilterSpec) (*filter.EventFilter, error) {
func parseBlockRange(heaviest abi.ChainEpoch, fromBlock, toBlock *string, maxRange abi.ChainEpoch) (minHeight abi.ChainEpoch, maxHeight abi.ChainEpoch, err error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, all of this can be reverted. I think?

ActorEventAPI
}

func (a *ActorEventHandler) GetActorEvents(ctx context.Context, filter *types.ActorEventFilter) ([]*types.ActorEvent, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have a limit here, although we may want to install that limit in the gateway? Do we even expose this API on the gateway?

I.e., I'm thinking:

  1. A max number of epochs.
  2. A max number of events to collect before aborting.

But, given those limits... we may want to change the API? E.g.:

  1. Return an object specifying the range of epochs we returned events from.
  2. Guarantee that we'll either return all events from an epoch, or none.

(I'm starting to become more convinced that we should have batched by epoch...).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really, we likely need:

  1. A Limit field in the ActorEventFilter (max number of events).
  2. Something in the gateway that caps the Limit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already have the latter, cfg.Fevm.Events.MaxFilterHeightRange exists already as part of fevm events and we're piggy backing that, defaults to 2880. Because it's old, it's going to be difficult to move it to the new top-level Events object but it would be nice for them to sit together ... maybe we need to consider a config "migration", or at least having a config unifying thing run at startup to support old + new for these events values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's probably fine? But then we need to make sure to apply that in the gateway proxy for the events API.

Or are we applying that deep in the system? If so... ugh, that's not how we do things... but oh well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, thankfully not, it's at the parse-inputs phase, which is one main purpose of the parseBlockRange, which itself was an extraction of existing code for fevm event filter code.

What I'm doing now is keeping that function intact for the fevm case so we leave that functionality as is, but implementing a simpler one for parseHeightRange for this API that applies the same limit but also gets us from inputs to the internal filter representation. That turns out to be a bit tricky to unify because the "earliest" and "latest" stuff comes from the existing fevm events filter API. So I'm just trying to make sure tests cover the edges of the conversions.

@rvagg
Copy link
Member Author

rvagg commented Mar 1, 2024

Rebased and retargeted to release/v1.26.0 branch. Got a bunch of updates that I worked on in #11649 if you find it easier to review the incremental bits there.

@jennijuju
Copy link
Member

Plz follow the pr title format (in pr template) and update the title accordingly! (Our release note script takes PR title

@masih masih force-pushed the rvagg/feat/builtin-actor-events branch from 15448eb to 8151c75 Compare March 1, 2024 18:07
@masih masih changed the title Built-in actor events (w/ re-crafted commit history) feat: events: Add Lotus APIs to consume smart contract and built-in actor events Mar 1, 2024
@@ -6,14 +6,14 @@ import (
"time"

"github.com/ipfs/go-cid"
"github.com/raulk/clock"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gah.... upstream is now unmaintained and we have two forks of this library, where unmaintained upstream has nice features like clock.WithTimeout(ctx, timeout).

Upstream: https://github.com/benbjohnson/clock

Now I'm going to go and update raul's fork...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I would have used benbjohnson/clock, but that got dropped last year; raulk/clock is used throughout lotus so I just went with it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know, upstream works (even if it's archived). IMO, we should just use it. #11677

rvagg and others added 16 commits March 5, 2024 12:44
Use BlockDelay as the window for receiving events on the SubscribeActorEvents
channel. We expect the user to have received the initial batch of historical
events (if any) in one block's time. For real-time events we expect them to
not fall behind by roughly one block's time.
Reduce verbosity and remove duplicate test logic from actor event types
JSON marshalling tests.
Add missing `s` to `actor_events` test file to follow golang convention
used across the repo.
Refactor `map` usage for actor event table tests to ensure deterministic
test execution order, making debugging potential issues easier. If
non-determinism is a target, leverage Go's built-in parallel testing
capabilities.
Use a fresh context to remove the temporary filter installed solely to
get the actor events. This should reduce chances of failure in a case
where the original context may be expired/cancelled.

Refactor removal into a `defer` statement for a more readable, concise
return statement.
Improve determinism in actor event tests by using a fixed RNG seed. This
makes up a more reproducible test suit.
Use the functionalities already provided by `testify` to assert eventual
conditions, and remove the use of `time.Sleep`.

Remove duplicate code in utility functions that are already defined.

Refactor assertion helper functions to use consistent terminology:
"require" implies fatal error, whereas "assert" implies error where the
test may proceed executing.
@rvagg rvagg force-pushed the rvagg/feat/builtin-actor-events branch from 8151c75 to a7da65d Compare March 5, 2024 01:44

// for the case where we have a MaxHeight set, we don't get a signal from the filter when we
// reach that height, so we need to check it ourselves, do it now but also in the loop
if params.MaxHeight > 0 && a.chain.GetHeaviestTipSet().Height() > params.MaxHeight {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is racy. Can we just get the filter manager to return this information?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we can just check when we receive the first event.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This point is the same for the nextBacklogHeightUpdate comment below: in the case of a restrictive filter, we may not get an event to use for our check for a long time, or ever, so we can't rely on looking at new events to check for height. e.g. the itest in here won't terminate if we don't have these checks because it does a subscribe with a height termination condition and no further activity. This is a reasonable case when you consider how specific a filter can be.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. You're right. Can we check before we start the filter? My concern here and above is that we:

  1. Install the filter.
  2. Then we get the tipset height.

The tipset could advance in the meantime and we could end up missing some events.

I mean, we could miss events anyways due to reorgs, but that's less likely. So we're probably good (after filing an issue) to land this as long as we check the tipset height first.

node/impl/full/actor_events.go Show resolved Hide resolved
node/impl/full/actor_events.go Outdated Show resolved Hide resolved
node/impl/full/actor_events.go Show resolved Hide resolved
Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with:

  1. File an issue to fix the timing stuff.
  2. Get the initial "heaviest tipset" before we subscribe to reduce the race a bit.


// for the case where we have a MaxHeight set, we don't get a signal from the filter when we
// reach that height, so we need to check it ourselves, do it now but also in the loop
if params.MaxHeight > 0 && a.chain.GetHeaviestTipSet().Height() > params.MaxHeight {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. You're right. Can we check before we start the filter? My concern here and above is that we:

  1. Install the filter.
  2. Then we get the tipset height.

The tipset could advance in the meantime and we could end up missing some events.

I mean, we could miss events anyways due to reorgs, but that's less likely. So we're probably good (after filing an issue) to land this as long as we check the tipset height first.

node/impl/full/actor_events.go Show resolved Hide resolved
api/api_full.go Outdated Show resolved Hide resolved
api/api_full.go Outdated Show resolved Hide resolved
@rvagg rvagg merged commit 4dcd658 into release/v1.26.0 Mar 5, 2024
89 of 90 checks passed
@rvagg rvagg deleted the rvagg/feat/builtin-actor-events branch March 5, 2024 05:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1 P1: Must be resolved
Projects
Status: ☑️ Done (Archive)
Development

Successfully merging this pull request may close these issues.

6 participants