Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Substrate sometimes sends back an empty response to light client requests #12727

Closed
1 of 2 tasks
tomaka opened this issue Nov 17, 2022 · 4 comments · Fixed by #12732
Closed
1 of 2 tasks

Substrate sometimes sends back an empty response to light client requests #12727

tomaka opened this issue Nov 17, 2022 · 4 comments · Fixed by #12732

Comments

@tomaka
Copy link
Contributor

tomaka commented Nov 17, 2022

Is there an existing issue?

  • I have searched the existing issues

Experiencing problems? Have you tried our Stack Exchange first?

  • This is not a support question.

Description of bug

cc paritytech/smoldot#2975

I don't know under which condition (I don't know if it's a new version or if the node is overwhelmed or something), but Substrate sometimes sends back an empty response to light client requests. By "empty response" I mean that it sends back a 0 byte indicating that the response is empty, then closes the substream as if the response had finished being sent back.

It's really not supposed to do that. In case where the node is overwhelmed, Substrate is supposed to close the substream without sending any response at all without even any length (as otherwise sending back an empty response would be problem in case the response is legitimately empty).

Steps to reproduce

I don't know, unfortunately

@tomaka
Copy link
Contributor Author

tomaka commented Nov 17, 2022

I have no idea how this could happen.

Here are the relevant pieces of code:

  • let response =
    match self.client.read_proof(block, &mut request.keys.iter().map(AsRef::as_ref)) {
    Ok(proof) => {
    let r = schema::v1::light::RemoteReadResponse { proof: proof.encode() };
    Some(schema::v1::light::response::Response::RemoteReadResponse(r))
    },
    Err(error) => {
    trace!(
    "remote read request from {} ({} at {:?}) failed with: {}",
    peer,
    fmt_keys(request.keys.first(), request.keys.last()),
    request.block,
    error,
    );
    None
    },
    };
    Ok(schema::v1::light::Response { response })
  • match self.handle_request(peer, payload) {
    Ok(response_data) => {
    let response = OutgoingResponse {
    result: Ok(response_data),
    reputation_changes: Vec::new(),
    sent_feedback: None,
    };
    match pending_response.send(response) {
    Ok(()) => trace!(
    target: LOG_TARGET,
    "Handled light client request from {}.",
    peer,
    ),
    Err(_) => debug!(
    target: LOG_TARGET,
    "Failed to handle light client request from {}: {}",
    peer,
    HandleRequestError::SendResponse,
    ),
    };
    },
    Err(e) => {
    debug!(
    target: LOG_TARGET,
    "Failed to handle light client request from {}: {}", peer, e,
    );
    let reputation_changes = match e {
    HandleRequestError::BadRequest(_) => {
    vec![ReputationChange::new(-(1 << 12), "bad request")]
    },
    _ => Vec::new(),
    };
    let response = OutgoingResponse {
    result: Err(()),
    reputation_changes,
    sent_feedback: None,
    };
    if pending_response.send(response).is_err() {
    debug!(
    target: LOG_TARGET,
    "Failed to handle light client request from {}: {}",
    peer,
    HandleRequestError::SendResponse,
    );
    };
    },
    }
  • let RequestProcessingOutcome {
    peer,
    request_id,
    protocol: protocol_name,
    inner_channel,
    response: OutgoingResponse { result, reputation_changes, sent_feedback },
    } = match outcome {
    Some(outcome) => outcome,
    // The response builder was too busy or handling the request failed. This is
    // later on reported as a `InboundFailure::Omission`.
    None => continue,
    };
    if let Ok(payload) = result {
    if let Some((protocol, _)) = self.protocols.get_mut(&*protocol_name) {
    if protocol.send_response(inner_channel, Ok(payload)).is_err() {
    // Note: Failure is handled further below when receiving
    // `InboundFailure` event from `RequestResponse` behaviour.
    log::debug!(
    target: "sub-libp2p",
    "Failed to send response for {:?} on protocol {:?} due to a \
    timeout or due to the connection to the peer being closed. \
    Dropping response",
    request_id, protocol_name,
    );
    } else if let Some(sent_feedback) = sent_feedback {
    self.send_feedback
    .insert((protocol_name, request_id).into(), sent_feedback);
    }
    }
    }
  • https://github.com/libp2p/rust-libp2p/blob/6d49bf4a532dc2399bdf7e3bca30529d762abcc4/protocols/request-response/src/handler/protocol.rs#L114-L124

@tomaka
Copy link
Contributor Author

tomaka commented Nov 17, 2022

Could be caused by #12372, as it was released 4 days before I've noticed the errors happening. But looking at the changes I don't see how

@tomaka
Copy link
Contributor Author

tomaka commented Nov 17, 2022

What I suspect happens is:
After #12372, the node is supposed to return a response with no proof, and a problem in the protobuf library causes it to omit the field even though the protobuf format is never supposed to omit fields of type "message".
In other words schema::v1::light::Response { response: None } erroneously encodes to [].

This still raises a question, however: why would Substrate fail to answer these requests? The requests that lead to the problem are at the end of the warp syncing, in other words targeting the currently finalized block, and thus the block isn't supposed to be pruned.

@tomaka
Copy link
Contributor Author

tomaka commented Nov 18, 2022

This still raises a question, however: why would Substrate fail to answer these requests? The requests that lead to the problem are at the end of the warp syncing, in other words targeting the currently finalized block, and thus the block isn't supposed to be pruned.

The answer to that is very simple: smoldot sends storage queries during the warp syncing as well, for reasons that are a bit complicated to explain and off-topic here.
I was confused and thought that these requests were made against the current finalized block, but they're actually not and instead of targeting old blocks.

@paritytech-processbot paritytech-processbot bot moved this to Blocked ⛔️ in Networking Nov 18, 2022
@altonen altonen moved this from Blocked ⛔️ to Done ✅ in Networking Dec 14, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

1 participant