Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Protocol] Retry failed requests #95

Merged
merged 14 commits into from
Sep 25, 2023

Conversation

Douile
Copy link
Collaborator

@Douile Douile commented Sep 16, 2023

I added some logic to retry requests when failed, this seems to make

$ cargo run --example generic -- tf2 91.216.250.10

fail (less) often for me. (there are still sometimes malformed packet errors but I don't see WouldBlock errors anymore).

The problem with the specific query seems to be that the players and rules requests sometimes don't receive a response. Inspecting the packet captures the requests look valid so my assumption is that these responses are abnormally large for UDP packets and hence are occasionally dropped on the way to me.

For valve query protocol there might be a better way to solve this as I believe that it supports "paginated" multi-packet responses. But I think that having some retry logic could be useful as internally retrying the individual part of the request that fails might be faster than the caller retrying the entire query.

NOTE: Setting GatherOptions::{gather_player, gather_rules} to false avoids my original issue and causes query to succeed every time.

This can be used to specify how many times to re-send requests that
fail. The default value is "1" so the if the first request fails, 1 more
attempt will be made.
@Douile Douile changed the title Retry failed requests [Protocol] Retry failed requests Sep 16, 2023
@CosminPerRam CosminPerRam added enhancement New feature or request protocol This is something regarding a protocol labels Sep 16, 2023
@CosminPerRam
Copy link
Member

For valve query protocol there might be a better way to solve this as I believe that it supports "paginated" multi-packet responses.

Yep, and it's currently implemented, but I don't remember exactly when games do hand out these packets in multiple parts and when not (e.g. exceeding a maximum size, but some games have bigger packet sizes than others).

But I think that having some retry logic could be useful as internally retrying the individual part of the request that fails might be faster than the caller retrying the entire query.

Agreed, this could be done for every protocol that does multiple queries (I think only the Valve protocol for now).

@Douile
Copy link
Collaborator Author

Douile commented Sep 17, 2023

Yep, and it's currently implemented

Ah yes I forgot its a completley server side implementation dependent thing.

Agreed, this could be done for every protocol that does multiple queries

The way I implemented here (retries field in TimeoutSettings), I think it might be confusing to users if protocols without multiple queries don't follow the same retry logic. Would probably need to document which protocols its implemented for as the multiple query part is invisible to a library consumer or just implement everywhere.

@CosminPerRam
Copy link
Member

I think it might be confusing to users if protocols without multiple queries don't follow the same retry logic.

Then we could just treat this as a retry count, which not only applies to valve queries (where are multiple ones) but on every query (GS1 does just one, if it fails and we have 2 retries, try it again, and on those which do have multiple ones (valve), just retry the subqueries).

Copy link
Collaborator Author

@Douile Douile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is ready for review, the only thing I might add would be unit tests but that can be a separate PR.


/// Get the number of retries if there are timeout settings else fall back
/// to the default
pub fn get_retries_or_default(timeout_settings: &Option<TimeoutSettings>) -> usize {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This maybe shouldn't be in impl TimeoutSettings but you cannot implement functions on external types (Optional) so I put it here for organization. (Also probably doesn't need to be pub). Let me know what you think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think thats fine for now.

let mut server_vars = query_vars(address, timeout_settings)?;
let retry_count = TimeoutSettings::get_retries_or_default(&timeout_settings);
let mut server_vars = retry_on_timeout(retry_count, move || {
query_vars(address, timeout_settings.clone())
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The clone here is undesirable but needed because there is no struct to take ownership of it here.

let data = get_data::<Client>(address, timeout_settings)?;
let retry_count = TimeoutSettings::get_retries_or_default(&timeout_settings);
let data = retry_on_timeout(retry_count, || {
get_data::<Client>(address, timeout_settings.clone())
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Undesirable clone (see before)

@Douile Douile marked this pull request as ready for review September 18, 2023 21:17
@CosminPerRam
Copy link
Member

CosminPerRam commented Sep 19, 2023

Retries also need to be done for games with proprietary queries.

pub enum ProprietaryProtocol {
TheShip,
FFOW,
JC2MP,
}

... and i think this hints that retrying queries shouldn't be a top-level code in queries (duplicated stuff), but refactoring this should (?) be done in another pr.

@Douile
Copy link
Collaborator Author

Douile commented Sep 19, 2023

and i think this hints that retrying queries shouldn't be a top-level code in queries (duplicated stuff), but refactoring this should (?) be done in another pr.

I agree, I just couldn't see an easy way to do it.

  • Could add to generic query, but then that requires duplicating to every game impl as well
  • Can't add to socket as retrying a recv doesn't work because you don't resend the request

So I think we're kind of stuck in this layer, I guess syntactically might look nicer if added retry to the get_info like function, then query just calls through without needing retry code. But that still would involve a lot of duplicated code just in different places.

@Douile
Copy link
Collaborator Author

Douile commented Sep 19, 2023

My mistake, I see what you mean now: FFOW and JC2MP call the internal query functions which do not implement retries themselves. So implementing the retries there is much preferred.

Retries are now implemented as wrappers on the single function that
would need to be retried on timeout.

In order to avoid cloning of TimeoutSettings, Socket::apply_timeouts()
was changed to accept a borrowed TimeoutSettings. And extra helpers were
added to the TimeoutSettings impl to reduce repetition.
@Douile
Copy link
Collaborator Author

Douile commented Sep 24, 2023

I have changed so all of the retry_on_timeouts now occur in wrappers around the method in each protocol that makes the query. This means that the proprietary games that call these functions directly don't need to implement retries themselves.

I will add some unit tests for retry_on_timeout and then assuming everyone else is happy this is ready to merge.

Sending packets could also timeout and until error_generic_member_access
is stable we have no way of determining the type of the underlying
`std::error::Error`.
@CosminPerRam CosminPerRam merged commit c3281be into gamedig:main Sep 25, 2023
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request protocol This is something regarding a protocol
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants