Fix replication service denial after MaxObjectSize increase #2911

Draft · wants to merge 2 commits into master

Conversation

cthulhu-rider
Contributor

Overall, the solution is very simple: listen for config changes and update the server limit. But...

gRPC

The lib does not provide an option to update the max recv msg size on a running server: https://pkg.go.dev/google.golang.org/grpc#MaxRecvMsgSize is static. It can be supported very easily, so I patched my fork. If we decide this proposal is good, I suggest adding an organization fork (and later proposing it to the upstream).

There is an alternative approach that I don't like at all:

  1. gracefully stop the running server
  2. create a new one with the updated limit and run it (sketched below)
    although MaxObjectSize is expected to change very rarely, this method looks very ugly compared to the simplicity of the native option
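
For illustration, a minimal sketch of that restart-based alternative; the `register` callback that re-attaches the services is a placeholder, not the node's real wiring:

```go
package main

import (
	"net"

	"google.golang.org/grpc"
)

// restartWithLimit sketches the restart-based alternative: gracefully stop
// the old server and start a fresh one, because grpc.MaxRecvMsgSize can
// only be set when the server is constructed.
func restartWithLimit(addr string, old *grpc.Server, newLimit int, register func(*grpc.Server)) (*grpc.Server, error) {
	old.GracefulStop() // finish in-flight RPCs and release the old listener

	lis, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, err
	}

	srv := grpc.NewServer(grpc.MaxRecvMsgSize(newLimit))
	register(srv) // re-register ObjectService and other services

	go func() { _ = srv.Serve(lis) }()

	return srv, nil
}
```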

Notifications

We don't have them yet (nspcc-dev/neofs-contract#427). Polling is used as a temporary solution.


this brings us closer to fix the test within which the bug was originally detected. It should be noted that there is not 100% stable solution for it: there will always be a gap b/w contract update and node reaction. The only option i see for now is to ignore overflow errors for about 1-2 minutes and retry
cc @evgeniiz321
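
A rough idea of that test-side retry, assuming hypothetical `replicate` and `isOverflow` helpers supplied by the test:

```go
package main

import (
	"context"
	"time"
)

// replicateWithRetry tolerates message-size overflow errors for a short
// window after MaxObjectSize is bumped, since the node notices the contract
// change with a delay (polling is done once per minute).
func replicateWithRetry(ctx context.Context, replicate func() error, isOverflow func(error) bool) error {
	deadline := time.Now().Add(2 * time.Minute)
	for {
		err := replicate()
		if err == nil || !isOverflow(err) || time.Now().After(deadline) {
			return err
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(5 * time.Second): // wait a bit before retrying
		}
	}
}
```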

Since storage node serves `ObjectService.Replicate` RPC, the gRPC server
must be able to accept the biggest allowed object. Previously, the node set
the max size of received gRPC messages to the sum of the payload
(`MaxObjectSize` network setting) and header (16K at the moment) size
limits. This was not entirely correct because the replication request
message also contains non-object fields (only a signature for now) and
protobuf service bytes. Thus, when the size of an object approached the
maximum allowed, it was possible to overflow the calculated limit and be
refused service.

This adds 1 KB to the calculated limit. This is a heuristic value that is
larger than the current protocol version's extra request data, while
leaving room for small extensions in advance. The exact value is not
critical given how small these volumes are.

Signed-off-by: Leonard Lyubich <leonard@morphbits.io>
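
For illustration, the limit described above boils down to roughly this; the constant names are invented for the sketch, not taken from the codebase:

```go
const (
	maxHeaderSize = 16 << 10 // object header size limit (16K at the moment)
	messageExtra  = 1 << 10  // slack for the signature, protobuf framing and future fields
)

// recvMsgLimit returns the max gRPC message size the server must accept so
// that a replication request carrying the biggest allowed object still fits.
func recvMsgLimit(maxObjectSize uint64) int {
	return int(maxObjectSize) + maxHeaderSize + messageExtra
}
```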
Since storage node serves `ObjectService.Replicate` RPC, the gRPC server
must be able to accept the biggest allowed object. Previously, the node
calculated the global message limit for the gRPC server once at startup.
With this behavior, when the `MaxObjectSize` network setting was increased,
the node kept refusing to accept objects larger than the previous limit for
writing. This manifested itself as a denial of replication service.

From now on, the storage node updates the max received gRPC message size
(if needed) on each refresh of the `MaxObjectSize` setting cache and via
polling of the Netmap contract done once per minute.

Refs #2910.

Signed-off-by: Leonard Lyubich <leonard@morphbits.io>
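
A sketch of the polling part, with `fetchMaxObjectSize` and `applyLimit` standing in for the real cache refresh and server update hooks (both hypothetical here):

```go
package main

import (
	"context"
	"time"
)

// pollMaxObjectSize re-reads the MaxObjectSize network setting once per
// minute and pushes a new receive limit to the gRPC server when it changes.
func pollMaxObjectSize(ctx context.Context, fetchMaxObjectSize func() (uint64, error), applyLimit func(limit int)) {
	var current uint64

	t := time.NewTicker(time.Minute)
	defer t.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
		}

		sz, err := fetchMaxObjectSize()
		if err != nil || sz == current {
			continue // keep the previous limit on error or if nothing changed
		}
		current = sz
		applyLimit(int(sz) + (16 << 10) + (1 << 10)) // payload + header + 1 KB extra, as above
	}
}
```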

codecov bot commented Aug 8, 2024

Codecov Report

Attention: Patch coverage is 31.81818% with 45 lines in your changes missing coverage. Please review.

Project coverage is 23.78%. Comparing base (73e8414) to head (19f39df).

| Files | Patch % | Lines |
| --- | --- | --- |
| cmd/neofs-node/netmap.go | 0.00% | 16 Missing ⚠️ |
| cmd/neofs-node/cache.go | 0.00% | 15 Missing ⚠️ |
| cmd/neofs-node/grpc.go | 65.62% | 11 Missing ⚠️ |
| cmd/neofs-node/object.go | 0.00% | 3 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2911      +/-   ##
==========================================
+ Coverage   23.75%   23.78%   +0.02%     
==========================================
  Files         774      774              
  Lines       44866    44911      +45     
==========================================
+ Hits        10660    10681      +21     
- Misses      33356    33380      +24     
  Partials      850      850              


@roman-khimov
Member

> If we decide this proposal is good

It's OK with me (it's a new feature there, it solves a problem, and it's not very invasive). But I wouldn't like to use any forks of the upstream library, so a server restart is perfectly fine with me as a temporary solution (until your patch is accepted). This is a very rare event.

We can consider some handling on the test side as well. Node restart will solve this for sure.
