
Dynamic fees implementation that reports bridge queues state to the sending chain (backup option) #2233

Closed
wants to merge 10 commits

Conversation


@svyatonik svyatonik commented Jun 27, 2023

Related to #2434.
The future of this PR depends on #2213.

The first commit of this PR allows our XCM dispatcher to report its status back to the bridged messaging pallet. After that, the source bridge hub will know:

  1. number of queued messages in the bridge queue;
  2. number of queued messages in the outbound XCMP queue of the bridged bridge hub.

I have a "hackish" idea on how to report those numbers back to the sending chain (e.g. to the asset hub in the case of an asset hub <> asset hub bridge). If we implement that, the sending asset hub will know:

  1. number of queued messages in its outbound XCMP queue to the sibling bridge hub;
  2. number of queued messages in the inbound XCMP queue of the sibling bridge hub;
  3. number of queued outbound bridge messages;
  4. number of queued messages in the outbound XCMP queue of the bridged bridge hub.

It can use this information to adjust the message fee dynamically. But there's one queue missing from the ^^^ list: the inbound queue of the target chain (in our example, the target asset hub). So even if all intermediate queues are empty, the target chain may still fail to process inbound messages in a timely fashion. We may just live with that - if it fails to process messages, they'll keep piling up in the other queues (the target bridge hub outbound XCMP queue), so we'll be able to detect that and increase the fee. There's also the channel state, which may switch to Suspended when the target chain is failing to process inbound messages => we may treat that as a signal that the number of messages "there" is equal to the `channel.suspend_threshold` (or something like that).
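The suspended-channel heuristic above could be sketched as follows. This is a minimal standalone sketch with hypothetical names (`ChannelState`, `estimate_inbound_at_destination`); in the runtime, the channel state and `suspend_threshold` would come from the XCMP queue pallet's channel configuration:

```rust
/// Simplified view of the channel between the target bridge hub and the
/// bridge destination (hypothetical type for illustration).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum ChannelState {
    /// Channel is operating normally.
    Ok,
    /// Destination stopped processing messages and the channel got suspended.
    Suspended,
}

/// Estimate the number of unprocessed messages in the destination inbound queue.
///
/// The target bridge hub has no exact count, but a suspended channel implies at
/// least `suspend_threshold` unprocessed messages at the destination, so we use
/// that value as a lower-bound estimate.
pub fn estimate_inbound_at_destination(state: ChannelState, suspend_threshold: u64) -> u64 {
    match state {
        ChannelState::Ok => 0,
        ChannelState::Suspended => suspend_threshold,
    }
}
```

Once the channel is suspended, all further messages pile up in our own outbound queue, where we do have an exact count, so the estimate only needs to cover the suspended window.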

There are many open questions and remaining tasks here, and they may/will be implemented outside of this PR. Below is just a draft list of those I can imagine right now:

  • keep experimenting with XCMP/UMP queues to understand whether we can get the total number of unprocessed messages in the queue (meaning both outbound and inbound queues);
  • do we need to account for message size in the state of the xcm-over-bridge queue? I.e. we would charge the sender more for larger messages. But what if there is just one big message in the queue that is "equal" to a hundred smaller messages? Shall we increase the fee (and report queue sizes) accordingly?;
  • in the current implementation, we'll be updating `dispatcher_state` only when messages are delivered, meaning that the number of unprocessed messages will always be equal to or larger than the number of delivered messages. So e.g. if we have configured the message fee to increase when the number of messages in the queue is larger than 100, and someone has sent 200 messages at once, AND they have been delivered in a single delivery transaction, AND no one else is sending messages over the bridge, then the fee will keep increasing at every block - no one will ever send the updated state back to the source chain. So we either need to (carefully) incentivize relayers to do that, or ensure the configuration is safe given that possibility. Should be solved with a proper threshold that accounts for the maximal number of unconfirmed messages;
  • do we need a separate "system" lane for reporting queue sizes? This has both its pros and cons. I.e. we may have some schedule for updates, meaning fewer transactions. But OTOH it isn't a scalable solution - imo it would be better to have everything lane-specific reported in this lane's transactions only. Still, may be changed in the future. No;
  • there are larger-than-usual delays in xcm-over-bridge message delivery - they include finalization delays (for two relay chains and two parachains) and delivery and confirmation transaction delays. We probably need to be more delicate with the fee than using a simple exponential formula. E.g. have some expectation that at block `current + N`, the number of messages in the queue must be no larger than `M`, and only if it is larger, start applying the exponential factor to the fee;
  • to report the xcm-over-bridge state to the sending chain, we'll need a separate pallet there. It'll be responsible for sending bridge-state requests (still considering whether we need them or not), receiving responses and updating the fee accordingly. It's going to be the pallet from XCM bridge hub pallet (will be deployed at bridge hubs) #2213.
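The "expectation before exponent" idea from the delays bullet above could be sketched like this. All names and the integer-percent math are illustrative assumptions, not the pallet's actual formula:

```rust
/// Sketch of a delay-tolerant dynamic fee.
///
/// The fee stays at `base_fee` while the queue is within the expected size
/// `expected_len`, and only grows exponentially (by `factor_percent`% per
/// excess message) once the queue exceeds that expectation. This tolerates the
/// inherent bridge latency instead of penalizing every small backlog.
pub fn adjusted_fee(base_fee: u128, queue_len: u64, expected_len: u64, factor_percent: u128) -> u128 {
    if queue_len <= expected_len {
        return base_fee;
    }
    // Naive loop for clarity; a real implementation would use a fixed-point
    // multiplier raised to the excess power instead.
    let excess = queue_len - expected_len;
    let mut fee = base_fee;
    for _ in 0..excess {
        fee = fee.saturating_add(fee.saturating_mul(factor_percent) / 100);
    }
    fee
}
```

E.g. with a 5% factor and an expectation of 20 queued messages, a queue of 20 leaves the fee untouched, while every message beyond that compounds the fee by 5%.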

@svyatonik svyatonik added P-Runtime PR-audit-needed A PR has to be audited before going live. labels Jun 27, 2023
@svyatonik svyatonik self-assigned this Jun 27, 2023
@svyatonik svyatonik changed the title Xcm-over-bridge queue state (aka dynamic fees) (WIP) Dynamic fees implementation that reports bridge queues state to the sending chain (backup option) Jul 13, 2023

svyatonik commented Jul 13, 2023

I'm going to remove the code that is required for this implementation from #2213. Just leaving it here:

		/// Report reachable bridge queues state.
		///
		/// The origin must pass the `AllowedOpenBridgeOrigin` filter. The pallet prepares
		/// the bridge queues state structure and sends it back to the caller.
		#[pallet::call_index(4)]
		#[pallet::weight(Weight::zero())] // TODO: https://github.com/paritytech/parity-bridges-common/issues/1760 - weights
		pub fn report_bridge_queues_state(
			_origin: OriginFor<T>,
			_lane_id: LaneId,
			_encoded_call_prefix: Vec<u8>,
			_encoded_call_suffix: Vec<u8>,
		) -> DispatchResult {
			// TODO: implement me in https://github.com/paritytech/parity-bridges-common/pull/2233
			// Something like:
			//
			// ```nocompile
			// let bridge_origin_relative_location = T::AllowedOpenBridgeOrigin::ensure_origin(origin)?;
			// ...
			// let state = BridgeQueuesState { .. };
			// let encoded_call = encoded_call_prefix ++ state.encode() ++ encoded_call_suffix;
			// T::ToBridgeOriginSender::send(
			//     bridge_origin_relative_location,
			//     vec![Xcm::Transact { call: encoded_call }],
			// );
			// ```
			unimplemented!()
		}

/// The state of all (reachable) queues as they are seen from the bridge hub.
#[derive(
	Clone,
	Copy,
	Decode,
	Default,
	Encode,
	Eq,
	Ord,
	PartialOrd,
	PartialEq,
	RuntimeDebug,
	TypeInfo,
	MaxEncodedLen,
	Serialize,
	Deserialize,
)]
pub struct BridgeQueuesState {
	/// Number of messages queued at bridge hub outbound (`pallet-bridge-messages`) queue.
	pub outbound_here: MessageNonce,
	/// Number of messages queued at the outbound queue of the bridged bridge hub. This
	/// queue connects bridged bridge hub and remote bridge destination. In most cases
	/// it will be the XCMP (HRMP) or `DMP` queue.
	pub outbound_at_bridged: MessageNonce,
	/// Number of messages queued at the destination inbound queue. This queue connects
	/// bridged bridge hub and remote bridge destination. In most cases it will be the XCMP
	/// (HRMP) or `DMP` queue.
	///
	/// Bridged (target) bridge hub doesn't have an access to the exact value of
	/// this metric. But it may get an estimation, depending on the channel
	/// state. The channel between target bridge hub and destination is suspended
	/// when there are more than `N` unprocessed messages at the destination inbound
	/// queue. So if we see the suspended channel state at the target bridge hub,
	/// we: (1) assume that there's at least `N` queued messages at the inbound
	/// destination queue and (2) all further messages are now piling up at our
	/// outbound queue (`outbound_at_bridged`), so we have exact count.
	pub inbound_at_destination: MessageNonce,
}

impl BridgeQueuesState {
	/// Return the total number of messages that we assume are currently in the bridge queues.
	pub fn total_enqueued_messages(&self) -> MessageNonce {
		self.outbound_here
			.saturating_add(self.outbound_at_bridged)
			.saturating_add(self.inbound_at_destination)
	}
}
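As a quick sanity check outside the runtime, a standalone mirror of the struct above (runtime derives trimmed; `MessageNonce` is `u64` in `bp-messages`) shows why `total_enqueued_messages` uses saturating arithmetic:

```rust
// Standalone mirror of `BridgeQueuesState`, for illustration only.
type MessageNonce = u64;

#[derive(Clone, Copy, Default, Debug)]
pub struct QueuesState {
    pub outbound_here: MessageNonce,
    pub outbound_at_bridged: MessageNonce,
    pub inbound_at_destination: MessageNonce,
}

impl QueuesState {
    /// Total number of messages assumed to be in the bridge queues,
    /// saturating at `u64::MAX` instead of overflowing (the individual
    /// counts are themselves estimates, so an exact sum isn't needed).
    pub fn total_enqueued_messages(&self) -> MessageNonce {
        self.outbound_here
            .saturating_add(self.outbound_at_bridged)
            .saturating_add(self.inbound_at_destination)
    }
}
```

The sum is what a fee formula on the sending chain would consume: one scalar "backlog" value rather than three separate queue lengths.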

svyatonik added a commit that referenced this pull request Jul 17, 2023
* XCM over bridge pallet (won't ever be merged to this repo): initial commit

* try both bridge-id and lane-id

* flush

* flush

* flush

* flush

* first prototype

* started working on xcm-over-bridge tests

* proper open_bridge_works test

* more open_bridge tests

* request_bridge_closure tests

* tests for close_bridge

* report_misbehavior tests

* renaming xcm-over-bridge as xcm-bridge-hub: we'll have two similar pallets for dynamic lanes/fees support. One will be deployed at the bridge hub (xcm-bridge-hub) and another one (xcm-bridge-hub-router) at sibling/parent chains to send messages to remote network over the bridge hub

* added couple of TODOs

* moved BridgeQueuesState here + TODO for implementing report_bridge_queues_state + fmt

* fixing TODOs

* fixing TODOs

* moved bridge locations to primitives

* added a couple of TODOs

* ref issue from TODO

* clippy and spelling

* fix tests

* another TODOs

* typo

* LaneState -> force_close_bridge

* start_closing_the_bridge -> start_closing_bridge

* removed Closing bridge state, so now we only have the Opened -> optional temporary Closed state -> None

* spelling

* started prototyping suspend+resume on misbehavior

* continue prototyping

* more tests for open_bridge

* more tests for close_bridge

* more tests for report_misbehavior

* started tests for resume_misbehaving_bridges

* implemented tests for resume_misbehaving_bridges

* remove UnsuspendedChannelWithMisbehavingOwner because now, when all bridges are resumed at once + we assume that the inbound XCM channel is suspended immediately it is no longer actual

* added TODO

* add comment re misbehavior reporter if he's associated with the bridge owner

* ignored clippy

* fmt

* use versioned XCM structs in public interfaces and storage + Box XCM locations where possible

* spelling

* removed TODO in favor of #2257

* LocalChannelManager -> LocalXcmChannelManager

* removed code specific to #2233, because it isn't the only option now

* remove misbehavior mentions

* also temporarily (?) remove BridgesByLocalOrigin because the storage format will likely change to be able to resume bridges from the on_initialize/on_idle

* AllowedOpenBridgeOrigin -> OpenBridgeOrigin

* - TooManyBridgesForLocalOrigin

* use bridge_destination_relative_location in args

* better impl of strip_low_level_junctions

* get rid of xcm_into_latest

* clippy

* added #![warn(missing_docs)] to new crates
@svyatonik svyatonik mentioned this pull request Aug 1, 2023
@svyatonik

#2294 (comment)

@svyatonik svyatonik closed this Aug 1, 2023
bkontur pushed a commit that referenced this pull request May 7, 2024