Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[STAB-19] Reduce the size of the critical section #16

Merged
merged 1 commit into from
May 1, 2023

Conversation

lcwik
Copy link

@lcwik lcwik commented Apr 28, 2023

This change makes a copy of runTx and does the following:

  • moves transaction descoding and tx validation before lock acquisition
  • moves creation of the response after releasing the lock
  • removes support for the post handler (unused by dYdX) and extracts out the only code that is executed in runMsgs allowing us to avoid the creation of the runMsgCtx and its associated MultiStore
  • removes consumeBlockGas since it is only executed during deliverTx

Note that this change allowed for checkTx to scale up to ~600 place orders/second before running out of CPU on an m5d.large:
image

@linear
Copy link

linear bot commented Apr 28, 2023

STAB-19

@lcwik lcwik force-pushed the lukestab19 branch 2 times, most recently from 658fe28 to e30fcbc Compare May 1, 2023 20:16
@lcwik lcwik marked this pull request as ready for review May 1, 2023 20:31
anteEvents = events.ToABCIEvents()
}

if mode == runTxModeCheck {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Given that this function only runs in CheckTx or RecheckTx, assert this condition at the beginning of the function and drop this if branch

Copy link
Author

@lcwik lcwik May 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need the call since recheckTx shouldn't insert the transaction into the mempool again but I will add the check to the top to ensure that this is used only for the appropriate modes.

This change makes a copy of `runTx` and does the following:
* moves transaction descoding and tx validation before lock acquisition
* moves creation of the response after releasing the lock
* removes support for the post handler (unused by dYdX) and extracts out the only code that is executed in `runMsgs` allowing us to avoid the creation of the `runMsgCtx` and its associated `MultiStore`
* removes `consumeBlockGas` since it is only executed during `deliverTx`
@lcwik lcwik merged commit 6477dcf into dydx-fork-v0.47.1 May 1, 2023
@lcwik lcwik deleted the lukestab19 branch May 1, 2023 21:58
// Note, gas execution info is always returned. A reference to a Result is
// returned if the tx does not run out of gas and if all the messages are valid
// and execute successfully. An error is returned otherwise.
func (app *BaseApp) runCheckTxConcurrently(mode runTxMode, txBytes []byte) (gInfo sdk.GasInfo, result *sdk.Result, anteEvents []abci.Event, priority int64, err error) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I think there are duplicate code between this new method and the runTx method.

Is there room to refactor out common components? Or is the goal here to keep the runTx unmodified to keep the rebasing easier in the future? Or something else?

How do we plan to keep this method in sync w future updates to runTx?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I think there are duplicate code between this new method and the runTx method.

Is there room to refactor out common components? Or is the goal here to keep the runTx unmodified to keep the rebasing easier in the future? Or something else?

The goal was to keep the rebasing simple and also to cut the function down to what is needed for dYdX v4.

How do we plan to keep this method in sync w future updates to runTx?

Our e2e tests that execute through the ABCI++ flow should cover most of our needs but an audit of the differences in runTx versions during upgrades would help until we have more e2e tests.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.

audit of the differences in runTx

Do you think it's worth listing a checklist in our readme?

We also prob want to add an item to audit our app/ante impl, which we have been tracking as one offs using tickets like this, but I think it's better to create a checklist somewhere more visible

// NOTE: GasWanted should be returned by the AnteHandler. GasUsed is
// determined by the GasMeter. We need access to the context to get the gas
// meter, so we initialize upfront.
var gasWanted uint64
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, gas is only considered when we are in deliverTx mode. Is this the correct understanding?

If so, do we need to do any gas related work here (i.e. setting gasWanted and recovery func below)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, gas is only considered when we are in deliverTx mode. Is this the correct understanding?

We check that the transaction has enough gas during CheckTx and consume the gas during DeliverTx. We fail the CheckTx if it wouldn't have enough gas.

If so, do we need to do any gas related work here (i.e. setting gasWanted and recovery func below)?

Yes, we have to return any gas computations from CheckTx back to the caller.

// bytes, and the decoded transaction itself. All state transitions occur through
// a cached Context depending on the mode provided.
//
// Note, gas execution info is always returned. A reference to a Result is
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gas execution info is always returned ... tx does not run out of gas

Is this true? i think gas doesn't matter unless we run DeliverTx mode and I think we only run this w checkTx or recheckTx. This comment doesn't seem super relevant

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

panic("runCheckTxConcurrently can only be invoked for CheckTx and RecheckTx.")
}

// Strip out the post handler
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we add more comments why this is not supported?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can do.

if mode == runTxModeCheck {
err = app.mempool.Insert(ctx, tx)
if err != nil {
result = nil
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we also want to set anteEvents or priority to the default error value? if not, why?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return gInfo, result, anteEvents, priority, sdkerrors.Wrap(err, "failed to marshal tx data")
}

msgLogs := make(sdk.ABCIMessageLogs, 0, len(msgs))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this var always empty? if so, should we just set Log to an empty string?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was to keep compatibility with the ABCIMessageLogs.String function incase if it was empty it would choose to return something other then an empty string (e.g. empty json list formatted as string).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we perhaps add a comment?

result = &sdk.Result{
Data: data,
Log: strings.TrimSpace(msgLogs.String()),
Events: sdk.EmptyEvents().ToABCIEvents(),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we add comments why we set the events to empty?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

MsgResponses: msgResponses,
}

// We don't support the post handler specifically to avoid creating a branched MultiStore and since dYdX
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should we move this logic to line 625?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally had the comment here because this is where the PostHandler logic was in runTx

@@ -609,6 +609,141 @@ func (app *BaseApp) cacheTxContext(ctx sdk.Context, txBytes []byte) (sdk.Context
return ctx.WithMultiStore(msCache), msCache
}

// runCheckTxConcurrently processes a transaction with either the checkTx or recheckTx modes, encoded transaction
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any tests for this function? Can we create a tech-debt ticket to track if you are working on higher priority item?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I specifically added the concurrent matches and cancels tests to cover most of the concurrent scenarios here, also updated the simulator to use this function over runTx, and we have some coverage from the CosmosSDKs tests themselves. I feel as though this would be sufficient but if you think there are certain scenarios you would want covered in addition to the above I would open up a ticket around this.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I agree that this feels sufficient for now. Thanks for clarifying!

lcwik added a commit that referenced this pull request Aug 22, 2023
lcwik added a commit that referenced this pull request Aug 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants