Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RUST-1553 Add support for document sequences (OP_MSG payload type 1) #1009

Merged
merged 3 commits into from
Jan 19, 2024

Conversation

isabelatkinson
Copy link
Contributor

@isabelatkinson isabelatkinson commented Jan 18, 2024

Adds support for building commands with document sequences (OP_MSG payload type 1) and updates our Insert operation to specify the list of documents provided as a document sequence. See the OP_MSG spec for more details.


let cmd_name = cmd.name.clone();
let target_db = cmd.target_db.clone();

let serialized = op.serialize_command(cmd)?;
#[allow(unused_mut)]
let mut message = Message::from_command(cmd, Some(request_id))?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I eliminated the RawCommand layer between Command and Message here. RawCommand contained a vec of bytes created by Operation::serialize_command with the assumption that those bytes should be sent as a single payload type 0 section when converting to Message: this no longer makes sense when commands could require multiple sections.

command: Command<T>,
request_id: Option<i32>,
) -> Result<Self> {
let mut document_payload = bson::to_raw_document_buf(&command.body)?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will re-serialize command bodies that were specified as RawDocumentBufs as RawDocumentBufs which isn't great, but all of those payloads should be fairly small since we're now using document sequences for big commands (i.e. inserts and bulk writes). There wasn't a great way to special-case raw docs here without overhead/code duplication, but we can certainly go in that direction if this turns out to be a perf hit (which I think is very unlikely).

Eventually I would like to remove T from Command entirely and use RawDocumentBuf as the type for body. In most places we're just using Document or RawDocumentBuf for T anyway, and it should be fairly simple to replace the structs we use to model some commands with documents. I will file a ticket for this if it sounds good to you.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strongly agree, I think that would be a great simplification here. And also agreed that the double serialization is unlikely to be an issue in practice.

Comment on lines +24 to +27
// OP_MSG payload type 0
pub(crate) document_payload: RawDocumentBuf,
// OP_MSG payload type 1
pub(crate) document_sequences: Vec<DocumentSequence>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This design models the OP_MSG structure more closely (i.e. exactly one payload type 0 section, any number of payload type 1 sections). I did keep the MessageSection enum for reading responses since sections can be specified in any order. AFAIK the server never actually returns anything as payload type 1 but it doesn't hurt to keep around that logic for future use.

@@ -151,7 +151,7 @@ pub struct InsertManyOptions {
pub ordered: Option<bool>,

/// The write concern for the operation.
#[serde(skip_deserializing)]
#[serde(skip_deserializing, skip_serializing_if = "write_concern_is_empty")]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a tangential change that came up when I was refactoring the Insert op implementation. We currently have a macro to remove empty write concerns from options, but this strategy feels a little cleaner to me. I'll file a ticket to port everything over to this if you agree or I can revert if you prefer the macro approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

// enabled.
const MAX_ENCRYPTED_WRITE_SIZE: u64 = 2_097_152;
// The amount of overhead bytes to account for when building a document sequence.
const COMMAND_OVERHEAD_SIZE: u64 = 16_000;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some context here:

  • A single BSON object (i.e. document) can be maxBsonObjectSize bytes, which is returned from the server (currently 16 MB). We previously used this number to cap the size of the documents array for inserts without worrying about the additional info in the command sent to the server. This is because the server allows an extra 16 KB in a document for overhead (see here).
  • An OP_MSG can be maxMessageSizeBytes bytes total (also returned from the server, currently 48 MB without any overhead allowance). When building a document sequence (which is not constrained by maxBsonObjectSize) we can use this as an upper bound for how many bytes we can fit into the payload, but we still need to account for some overhead space for the command and OP_MSG metadata. I borrowed the 16 KB number from the server's overhead allocation and use maxMessageSizeBytes - COMMAND_OVERHEAD_SIZE to cap the size of a document sequence payload.

@isabelatkinson isabelatkinson marked this pull request as ready for review January 18, 2024 18:35
Copy link
Contributor

@abr-egn abr-egn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! One clarification question only, no need for re-review.

command: Command<T>,
request_id: Option<i32>,
) -> Result<Self> {
let mut document_payload = bson::to_raw_document_buf(&command.body)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strongly agree, I think that would be a great simplification here. And also agreed that the double serialization is unlikely to be an issue in practice.

#[serde(skip)]
pub(crate) name: String,

#[serde(skip)]
pub(crate) exhaust_allowed: bool,

#[serde(flatten)]
#[serde(skip)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT the new behavior (skip and extend_raw_document_buf in Message::from_command) should be equivalent to flatten?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah you're right, reverted

@@ -151,7 +151,7 @@ pub struct InsertManyOptions {
pub ordered: Option<bool>,

/// The write concern for the operation.
#[serde(skip_deserializing)]
#[serde(skip_deserializing, skip_serializing_if = "write_concern_is_empty")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

@isabelatkinson isabelatkinson changed the title RUST-1553 Add support for document sequences (OP_MSG payload type 1) RUST-1553 Add support for document sequences (OP_MSG payload type 1) Jan 19, 2024
@isabelatkinson isabelatkinson merged commit f144620 into mongodb:main Jan 19, 2024
14 of 16 checks passed
@isabelatkinson isabelatkinson deleted the document-sequences branch April 3, 2024 16:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants