Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RUST-870 Deserialize server response directly from raw BSON bytes #389

Merged
merged 3 commits into from
Jul 15, 2021

Conversation

patrickfreed
Copy link
Contributor

RUST-870

This PR updates the driver to leverage the new bson::from_slice serde functionality in the BSON library to deserialize all server response directly from BSON rather than going through Document first, resulting in significant performance improvements.

@@ -353,22 +360,72 @@ impl Client {
let start_time = Instant::now();
let cmd_name = cmd.name.clone();

let response_result = match connection.send_command(cmd, request_id).await {
let command_result = match connection.send_command(cmd, request_id).await {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This match constitutes the main change in the PR. Since we want to avoid multiple deserialization steps, we need to deserialize the command response and the body of the response all in one step. Previously we would deserialize a Document, do some processing on it, and then pass that into handle_response for further deserialization of the body. Since we need to do it in one step, we need to include the "metadata" (e.g. ok, $clusterTime) information along with the "body" in the same type. To achieve this, each operation needed to specify their "body" type via an additional associated type (Operation::ResponseType).

To allow for RunCommand to fit into this scheme (which wants to preserve all the "metadata" and not separate it from the "body"), we needed a trait that the ResponseTypes implemented which could yield the ok and clusterTime values. This trait can easily be extended to extract any "metadata" we need in the future too.

}))
}
// for ok: 1 just return the original deserialization error.
_ => Err(deserialize_error),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this case is if the user specifies a T that doesn't match the data, for example.

if !error_response.is_success() =>
{
Err(command_error_response.into())
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this case (and its equivalent in the other branch) really shouldn't be encountered except for when caused by bugs in the server / driver.

// if we failed to deserialize the whole response, try deserializing
// a generic command response without the operation's body.
match response.body::<CommandResponse<Option<CommandErrorBody>>>() {
Ok(error_response) => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

most command errors will go through this path


/// A command response backed by a `Document` rather than raw bytes.
/// Use this for simple command responses where deserialization performance is not a high priority.
pub(crate) struct DocumentCommandResponse {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is essentially equivalent to the old CommandResponse.

@@ -561,7 +561,7 @@ impl<T> Collection<T> {

impl<T> Collection<T>
where
T: DeserializeOwned + Unpin,
T: DeserializeOwned + Unpin + Send + Sync,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually a breaking change, but it's required now that the Ts are stored on cursors and need to be held across .await points. Send and Sync are implemented on basically every type, so this shouldn't be a problem. See here for more info.

@@ -111,9 +121,7 @@ impl PartialEq for IsMasterCommandResponse {

impl IsMasterCommandResponse {
pub(crate) fn server_type(&self) -> ServerType {
if self.ok != Some(1.0) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we were always validating the response before using this method anyways, so ok was always 1.0.

@@ -1,12 +1,12 @@
use crate::{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

most of the operation changes are updating handle_response to work on pre-deserialized response rather than a Document-based one and updating the tests to work with that. It was a lot of refactoring, so I've updated those tests to use a shared handle_response_test function to allow for easier refactorings in the future, if necessary.

/// A response to a command with a body shaped deserialized to a `T`.
#[derive(Deserialize, Debug)]
#[serde(rename_all = "camelCase")]
pub(crate) struct CommandResponse<T> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the type that all operations besides RunCommand use for their response type (parameterized by the generic body T).

@@ -229,28 +323,3 @@ pub(crate) enum Retryability {
Read,
None,
}

#[cfg(test)]
mod test {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to operation/test.rs

@patrickfreed patrickfreed marked this pull request as ready for review July 9, 2021 19:18
@@ -38,7 +38,7 @@ bson-uuid-0_8 = ["bson/uuid-0_8"]
async-trait = "0.1.42"
base64 = "0.13.0"
bitflags = "1.1.0"
bson = "2.0.0-beta.2"
bson = { git = "https://github.com/mongodb/bson-rust" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to be sure to point this back at a published release of bson before the final 2.0.0 release.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, filed RUST-901 to remind us to do that.

@@ -393,12 +450,16 @@ impl Client {
}
Ok(response) => {
self.emit_command_event(|handler| {
let response_doc: Document = response
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: this can just be the body of the else, it doesn't need its own binding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -393,12 +450,16 @@ impl Client {
}
Ok(response) => {
self.emit_command_event(|handler| {
let response_doc: Document = response
.raw
.body()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, command events need the body as a Document, so if they're enabled that deserialization is needed despite being able to go directly from the response bytes to the response struct. If that's true, it might be worth calling out in documentation that command events come at a performance penalty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep that's right, and that's a good idea, added a line about this to the command_event_handler option in ClientOptions.

@@ -55,15 +55,15 @@ where
exhausted: bool,
client: Client,
info: CursorInformation,
buffer: VecDeque<Document>,
buffer: VecDeque<T>,
_phantom: std::marker::PhantomData<T>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still needed now that buffer uses T?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, good catch, removed.

@patrickfreed patrickfreed force-pushed the RUST-870/raw-deserialize branch from a753821 to 050a55a Compare July 14, 2021 18:42
@patrickfreed
Copy link
Contributor Author

Re-requesting review (sorry!) for the snapshot time / atClusterTime changes needed after rebasing off of master, should only need to take a look at the last two commits.

Copy link
Contributor

@abr-egn abr-egn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thanks for handling the merge!

@patrickfreed patrickfreed merged commit 0fa2a90 into mongodb:master Jul 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants