Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Access] Connection pool evictions cause connection failures #4534

Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
36 commits
Select commit Hold shift + click to select a range
54670af
Implemented atomic request counter
Guitarheroua Jun 29, 2023
fa9acbc
Implemented graceful close and test.
Guitarheroua Jun 30, 2023
f906baa
Added comments
Guitarheroua Jun 30, 2023
49f751e
Merge branch 'master' into guitarheroua/2833-conn-pool-evictions-caus…
Guitarheroua Jun 30, 2023
3886060
Merge branch 'master' into guitarheroua/2833-conn-pool-evictions-caus…
Guitarheroua Jul 5, 2023
d80f544
Update engine/access/rpc/backend/connection_factory_test.go
Guitarheroua Jul 7, 2023
721dd84
Merge branch 'master' into guitarheroua/2833-conn-pool-evictions-caus…
Guitarheroua Jul 7, 2023
cfdf79a
Merge branch 'master' into guitarheroua/2833-conn-pool-evictions-caus…
Guitarheroua Jul 10, 2023
7b7658e
Fixed remarks
Guitarheroua Jul 11, 2023
9089a1e
Merge branch 'master' into guitarheroua/2833-conn-pool-evictions-caus…
Guitarheroua Jul 17, 2023
02be088
Fixed remarks
Guitarheroua Jul 17, 2023
e2547b6
Added evicting cache client test
Guitarheroua Jul 18, 2023
fb08d0d
Merge branch 'master' into guitarheroua/2833-conn-pool-evictions-caus…
Guitarheroua Jul 18, 2023
7e9e885
Merge branch 'guitarheroua/2833-conn-pool-evictions-cause-conn-failur…
Guitarheroua Jul 18, 2023
8aa1593
Merge branch 'master' into guitarheroua/2833-conn-pool-evictions-caus…
Guitarheroua Jul 18, 2023
163cee6
Merge branch 'guitarheroua/2833-conn-pool-evictions-cause-conn-failur…
Guitarheroua Jul 18, 2023
fbd8446
Added test TestExecutionEvictingCacheClients
Guitarheroua Jul 18, 2023
3412d68
Merge branch 'master' into guitarheroua/2833-conn-pool-evictions-caus…
Guitarheroua Jul 19, 2023
43bea66
Return add to retrive connection.
Guitarheroua Jul 19, 2023
13c49c8
Merge branch 'guitarheroua/2833-conn-pool-evictions-cause-conn-failur…
Guitarheroua Jul 19, 2023
811f83b
spike to improve locking in grpc connection factory
peterargue Jul 19, 2023
e09ba32
remove cache from factory
peterargue Jul 19, 2023
da22beb
Merge branch 'master' into guitarheroua/2833-conn-pool-evictions-caus…
Guitarheroua Jul 20, 2023
6a857fe
Merge branch 'petera/example-conn-factory-refactor' of github.com:onf…
Guitarheroua Jul 20, 2023
ae9c109
Refactor connection factory
Guitarheroua Jul 20, 2023
85b17f9
Added comments to new components
Guitarheroua Jul 20, 2023
48cae42
Merge branch 'guitarheroua/2833-conn-pool-evictions-cause-conn-failur…
Guitarheroua Jul 20, 2023
1122b8c
removed unnecessary changes
Guitarheroua Jul 20, 2023
fb4e43f
removed unnecessary changes
Guitarheroua Jul 20, 2023
226f053
Merge branch 'master' into guitarheroua/2833-conn-pool-evictions-caus…
Guitarheroua Jul 20, 2023
5240b3b
Merge branch 'master' into guitarheroua/2833-conn-pool-evictions-caus…
Guitarheroua Jul 21, 2023
695ff8c
Fixed review remarks
Guitarheroua Jul 21, 2023
8644def
Merge branch 'master' of github.com:Guitarheroua/flow-go into guitarh…
Guitarheroua Jul 21, 2023
eb6b627
Fixed conflicts
Guitarheroua Jul 21, 2023
46d10fb
Removed work around
Guitarheroua Jul 24, 2023
f7c7116
Merge branch 'master' into guitarheroua/2833-conn-pool-evictions-caus…
Guitarheroua Jul 25, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -193,7 +193,7 @@ generate-mocks: install-mock-generators
mockery --name 'API' --dir="./access" --case=underscore --output="./access/mock" --outpkg="mock"
mockery --name 'API' --dir="./engine/protocol" --case=underscore --output="./engine/protocol/mock" --outpkg="mock"
mockery --name '.*' --dir="./engine/access/state_stream" --case=underscore --output="./engine/access/state_stream/mock" --outpkg="mock"
mockery --name 'ConnectionFactory' --dir="./engine/access/rpc/backend" --case=underscore --output="./engine/access/rpc/backend/mock" --outpkg="mock"
mockery --name 'ConnectionFactory' --dir="./engine/access/rpc/connection" --case=underscore --output="./engine/access/rpc/connection/mock" --outpkg="mock"
mockery --name '.*' --dir=model/fingerprint --case=underscore --output="./model/fingerprint/mock" --outpkg="mock"
mockery --name 'ExecForkActor' --structname 'ExecForkActorMock' --dir=module/mempool/consensus/mock/ --case=underscore --output="./module/mempool/consensus/mock/" --outpkg="mock"
mockery --name '.*' --dir=engine/verification/fetcher/ --case=underscore --output="./engine/verification/fetcher/mock" --outpkg="mockfetcher"
Expand Down
19 changes: 14 additions & 5 deletions cmd/access/node_builder/access_node_builder.go
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,7 @@ import (
"github.com/onflow/flow-go/engine/access/rest/routes"
"github.com/onflow/flow-go/engine/access/rpc"
"github.com/onflow/flow-go/engine/access/rpc/backend"
rpcConnection "github.com/onflow/flow-go/engine/access/rpc/connection"
"github.com/onflow/flow-go/engine/access/state_stream"
followereng "github.com/onflow/flow-go/engine/common/follower"
"github.com/onflow/flow-go/engine/common/requester"
Expand Down Expand Up @@ -917,7 +918,7 @@ func (builder *FlowAccessNodeBuilder) Build() (cmd.Node, error) {
builder.rpcConf.CollectionAddr,
grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(int(builder.rpcConf.MaxMsgSize))),
grpc.WithTransportCredentials(insecure.NewCredentials()),
backend.WithClientUnaryInterceptor(builder.rpcConf.BackendConfig.CollectionClientTimeout))
rpcConnection.WithClientTimeoutOption(builder.rpcConf.BackendConfig.CollectionClientTimeout))
if err != nil {
return err
}
Expand Down Expand Up @@ -1047,16 +1048,24 @@ func (builder *FlowAccessNodeBuilder) Build() (cmd.Node, error) {
return nil, fmt.Errorf("could not initialize backend cache: %w", err)
}

connFactory := &backend.ConnectionFactoryImpl{
var connBackendCache *rpcConnection.Cache
if backendCache != nil {
connBackendCache = rpcConnection.NewCache(backendCache, int(cacheSize))
}

connFactory := &rpcConnection.ConnectionFactoryImpl{
CollectionGRPCPort: builder.collectionGRPCPort,
ExecutionGRPCPort: builder.executionGRPCPort,
CollectionNodeGRPCTimeout: backendConfig.CollectionClientTimeout,
ExecutionNodeGRPCTimeout: backendConfig.ExecutionClientTimeout,
ConnectionsCache: backendCache,
CacheSize: cacheSize,
MaxMsgSize: config.MaxMsgSize,
AccessMetrics: accessMetrics,
Log: node.Logger,
Manager: rpcConnection.NewManager(
connBackendCache,
node.Logger,
accessMetrics,
config.MaxMsgSize,
),
}

backend := backend.New(
Expand Down
17 changes: 13 additions & 4 deletions cmd/observer/node_builder/observer_builder.go
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ import (
"github.com/onflow/flow-go/engine/access/rest/routes"
"github.com/onflow/flow-go/engine/access/rpc"
"github.com/onflow/flow-go/engine/access/rpc/backend"
rpcConnection "github.com/onflow/flow-go/engine/access/rpc/connection"
"github.com/onflow/flow-go/engine/common/follower"
synceng "github.com/onflow/flow-go/engine/common/synchronization"
"github.com/onflow/flow-go/engine/protocol"
Expand Down Expand Up @@ -904,16 +905,24 @@ func (builder *ObserverServiceBuilder) enqueueRPCServer() {
return nil, fmt.Errorf("could not initialize backend cache: %w", err)
}

connFactory := &backend.ConnectionFactoryImpl{
var connBackendCache *rpcConnection.Cache
if backendCache != nil {
connBackendCache = rpcConnection.NewCache(backendCache, int(cacheSize))
}

connFactory := &rpcConnection.ConnectionFactoryImpl{
CollectionGRPCPort: 0,
ExecutionGRPCPort: 0,
CollectionNodeGRPCTimeout: backendConfig.CollectionClientTimeout,
ExecutionNodeGRPCTimeout: backendConfig.ExecutionClientTimeout,
ConnectionsCache: backendCache,
CacheSize: cacheSize,
MaxMsgSize: config.MaxMsgSize,
AccessMetrics: accessMetrics,
Log: node.Logger,
Manager: rpcConnection.NewManager(
connBackendCache,
node.Logger,
accessMetrics,
config.MaxMsgSize,
),
}

accessBackend := backend.New(
Expand Down
10 changes: 5 additions & 5 deletions cmd/util/cmd/execution-state-extract/export_report.json
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
{
"EpochCounter": 0,
"PreviousStateCommitment": "1c9f9d343cb8d4610e0b2c1eb74d6ea2f2f8aef2d666281dc22870e3efaa607b",
"CurrentStateCommitment": "1c9f9d343cb8d4610e0b2c1eb74d6ea2f2f8aef2d666281dc22870e3efaa607b",
"ReportSucceeded": true
}
"EpochCounter": 0,
"PreviousStateCommitment": "1c9f9d343cb8d4610e0b2c1eb74d6ea2f2f8aef2d666281dc22870e3efaa607b",
"CurrentStateCommitment": "1c9f9d343cb8d4610e0b2c1eb74d6ea2f2f8aef2d666281dc22870e3efaa607b",
"ReportSucceeded": true
}
14 changes: 4 additions & 10 deletions engine/access/rpc/backend/backend.go
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ import (

"github.com/onflow/flow-go/access"
"github.com/onflow/flow-go/cmd/build"
"github.com/onflow/flow-go/engine/access/rpc/connection"
"github.com/onflow/flow-go/engine/common/rpc"
"github.com/onflow/flow-go/engine/common/rpc/convert"
"github.com/onflow/flow-go/model/flow"
Expand Down Expand Up @@ -78,7 +79,7 @@ type Backend struct {
chainID flow.ChainID
collections storage.Collections
executionReceipts storage.ExecutionReceipts
connFactory ConnectionFactory
connFactory connection.ConnectionFactory
}

// Config defines the configurable options for creating Backend
Expand All @@ -104,7 +105,7 @@ func New(
executionResults storage.ExecutionResults,
chainID flow.ChainID,
accessMetrics module.AccessMetrics,
connFactory ConnectionFactory,
connFactory connection.ConnectionFactory,
retryEnabled bool,
maxHeightRange uint,
preferredExecutionNodeIDs []string,
Expand Down Expand Up @@ -224,16 +225,9 @@ func NewCache(
var cache *lru.Cache
cacheSize := connectionPoolSize
if cacheSize > 0 {
// TODO: remove this fallback after fixing issues with evictions
// It was observed that evictions cause connection errors for in flight requests. This works around
// the issue by forcing hte pool size to be greater than the number of ENs + LNs
if cacheSize < DefaultConnectionPoolSize {
log.Warn().Msg("connection pool size below threshold, setting pool size to default value ")
cacheSize = DefaultConnectionPoolSize
}
var err error
cache, err = lru.NewWithEvict(int(cacheSize), func(_, evictedValue interface{}) {
store := evictedValue.(*CachedClient)
store := evictedValue.(*connection.CachedClient)
store.Close()
log.Debug().Str("grpc_conn_evicted", store.Address).Msg("closing grpc connection evicted from pool")
if accessMetrics != nil {
Expand Down
6 changes: 4 additions & 2 deletions engine/access/rpc/backend/backend_accounts.go
Original file line number Diff line number Diff line change
Expand Up @@ -5,11 +5,13 @@ import (
"time"

"github.com/hashicorp/go-multierror"
execproto "github.com/onflow/flow/protobuf/go/flow/execution"
"github.com/rs/zerolog"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/status"

execproto "github.com/onflow/flow/protobuf/go/flow/execution"

"github.com/onflow/flow-go/engine/access/rpc/connection"
"github.com/onflow/flow-go/engine/common/rpc"
"github.com/onflow/flow-go/engine/common/rpc/convert"
"github.com/onflow/flow-go/model/flow"
Expand All @@ -21,7 +23,7 @@ type backendAccounts struct {
state protocol.State
headers storage.Headers
executionReceipts storage.ExecutionReceipts
connFactory ConnectionFactory
connFactory connection.ConnectionFactory
log zerolog.Logger
}

Expand Down
3 changes: 2 additions & 1 deletion engine/access/rpc/backend/backend_events.go
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ import (
"google.golang.org/grpc/codes"
"google.golang.org/grpc/status"

"github.com/onflow/flow-go/engine/access/rpc/connection"
"github.com/onflow/flow-go/engine/common/rpc"
"github.com/onflow/flow-go/engine/common/rpc/convert"
"github.com/onflow/flow-go/model/flow"
Expand All @@ -24,7 +25,7 @@ type backendEvents struct {
headers storage.Headers
executionReceipts storage.ExecutionReceipts
state protocol.State
connFactory ConnectionFactory
connFactory connection.ConnectionFactory
log zerolog.Logger
maxHeightRange uint
}
Expand Down
3 changes: 2 additions & 1 deletion engine/access/rpc/backend/backend_scripts.go
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ import (
"google.golang.org/grpc/codes"
"google.golang.org/grpc/status"

"github.com/onflow/flow-go/engine/access/rpc/connection"
"github.com/onflow/flow-go/engine/common/rpc"
"github.com/onflow/flow-go/model/flow"
"github.com/onflow/flow-go/module"
Expand All @@ -29,7 +30,7 @@ type backendScripts struct {
headers storage.Headers
executionReceipts storage.ExecutionReceipts
state protocol.State
connFactory ConnectionFactory
connFactory connection.ConnectionFactory
log zerolog.Logger
metrics module.BackendScriptsMetrics
loggedScripts *lru.Cache
Expand Down
4 changes: 3 additions & 1 deletion engine/access/rpc/backend/backend_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,8 @@ import (
"strconv"
"testing"

"github.com/onflow/flow-go/engine/access/rpc/connection"

"github.com/dgraph-io/badger/v2"
accessproto "github.com/onflow/flow/protobuf/go/flow/access"
entitiesproto "github.com/onflow/flow/protobuf/go/flow/entities"
Expand Down Expand Up @@ -2399,7 +2401,7 @@ func (suite *Suite) setupReceipts(block *flow.Block) ([]*flow.ExecutionReceipt,
return receipts, ids
}

func (suite *Suite) setupConnectionFactory() ConnectionFactory {
func (suite *Suite) setupConnectionFactory() connection.ConnectionFactory {
// create a mock connection factory
connFactory := new(backendmock.ConnectionFactory)
connFactory.On("GetExecutionAPIClient", mock.Anything).Return(suite.execClient, &mockCloser{}, nil)
Expand Down
3 changes: 2 additions & 1 deletion engine/access/rpc/backend/backend_transactions.go
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ import (
"google.golang.org/grpc/status"

"github.com/onflow/flow-go/access"
"github.com/onflow/flow-go/engine/access/rpc/connection"
"github.com/onflow/flow-go/engine/common/rpc"
"github.com/onflow/flow-go/engine/common/rpc/convert"
"github.com/onflow/flow-go/fvm/blueprints"
Expand All @@ -37,7 +38,7 @@ type backendTransactions struct {
transactionMetrics module.TransactionMetrics
transactionValidator *access.TransactionValidator
retry *Retry
connFactory ConnectionFactory
connFactory connection.ConnectionFactory

previousAccessNodes []accessproto.AccessAPIClient
log zerolog.Logger
Expand Down
Loading