Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

provide lazy iterator implementation #449

Merged
merged 17 commits into from
Jan 18, 2024

Conversation

tychoish
Copy link
Contributor

@tychoish tychoish commented Jan 1, 2024

Introduces a RawElement type which will can be used to lazily resolve bson values during iteration.

This does not change the functionality of any existing interfaces for compatibility concerns, (though the iter() functions do return different types) and the IntoIter implementations changed, but semantically it's all the same.

Happy to make changes if ya'll are interested. in this :)

@abr-egn abr-egn self-assigned this Jan 2, 2024
@abr-egn abr-egn self-requested a review January 2, 2024 18:37
@abr-egn
Copy link
Contributor

abr-egn commented Jan 2, 2024

Hi! This is definitely interesting, I'll be looking over the code in detail in the next few days. Just a clarification, when you said

This does change the functionality of any existing interfaces for compatibility concerns

that was a typo for

This does not change the functionality of any existing interfaces for compatibility concerns

correct? We're strict about maintaining backwards compatibility.

@tychoish
Copy link
Contributor Author

tychoish commented Jan 2, 2024

Yes, sorry that's correct.

The only publicly visible change is the return type of the RawDocument's iter() method is now a wrapper around the new implementation (which has the previous type's name): but the returned iterator functions identically.

I'm not opposed to changing the name of the new (this) iterator and putting back a shim implementation with this name and the legacy functionality, though I don't know what things should be named (my inclination would be to call this RawIter, but I defer to you all, of course.

We're strict about maintaining backwards compatibility.

Oh, I know ;)

src/raw/iter.rs Outdated
self.size
}

pub fn key(&self) -> String {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any particular reason to return the string as copy here rather than the ref?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, good call

@@ -275,7 +275,10 @@ impl<'a> Iterator for RawArrayIter<'a> {

fn next(&mut self) -> Option<Result<RawBsonRef<'a>>> {
match self.inner.next() {
Some(Ok((_, v))) => Some(Ok(v)),
Some(Ok(elem)) => match elem.value() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this just be Some(Ok(elem)) => Some(elem.value()) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup!

@@ -577,10 +594,10 @@ impl TryFrom<&RawDocument> for crate::Document {
}

impl<'a> IntoIterator for &'a RawDocument {
type IntoIter = Iter<'a>;
type IntoIter = Box<dyn Iterator<Item = Self::Item> + 'a>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this is a breaking change. I think this can be solved by what you suggested in the conversation - keep the existing functionality named Iter and call the new one RawIter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alas, easy enough done

@abr-egn
Copy link
Contributor

abr-egn commented Jan 5, 2024

It looks like this fails the fuzz test; this program reproduces the failure:

use bson::RawDocument;

fn main() {
    let doc = RawDocument::from_bytes(&[25, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 5, 0, 254, 255, 255, 127, 0, 5, 0, 255, 255, 255, 9, 0]).unwrap();
    for _ in doc {
        println!("_");
    }
}

@tychoish
Copy link
Contributor Author

tychoish commented Jan 5, 2024

Nice! Thanks for the review. I'll look at this this weekend or early next week.

Copy link
Contributor Author

@tychoish tychoish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the review; I made the changes and fixed the fuzz test.

src/raw/iter.rs Outdated
self.size
}

pub fn key(&self) -> String {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, good call

@@ -275,7 +275,10 @@ impl<'a> Iterator for RawArrayIter<'a> {

fn next(&mut self) -> Option<Result<RawBsonRef<'a>>> {
match self.inner.next() {
Some(Ok((_, v))) => Some(Ok(v)),
Some(Ok(elem)) => match elem.value() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup!

@@ -577,10 +594,10 @@ impl TryFrom<&RawDocument> for crate::Document {
}

impl<'a> IntoIterator for &'a RawDocument {
type IntoIter = Iter<'a>;
type IntoIter = Box<dyn Iterator<Item = Self::Item> + 'a>;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alas, easy enough done

@abr-egn
Copy link
Contributor

abr-egn commented Jan 8, 2024

This fails the fuzzer with different input ([23, 0, 0, 0, 5, 0, 255, 255, 255, 255, 255, 255, 255, 3, 0, 0, 0, 247, 4, 247, 245, 0, 0]) and the corpus test (tests::spec::corpus::run) also fails with what is probably a related overflow error.

@tychoish
Copy link
Contributor Author

tychoish commented Jan 8, 2024

This fails the fuzzer with different input ([23, 0, 0, 0, 5, 0, 255, 255, 255, 255, 255, 255, 255, 3, 0, 0, 0, 247, 4, 247, 245, 0, 0]) and the corpus test (tests::spec::corpus::run) also fails with what is probably a related overflow error.

Ok, this time I think I have it.

@abr-egn
Copy link
Contributor

abr-egn commented Jan 9, 2024

Still failing, I'm afraid ([17, 0, 0, 0, 5, 5, 0, 255, 255, 255, 127, 255, 255, 255, 11, 13, 0]). You can run the fuzzer test yourself (assuming you've installed cargo-fuzz and a nightly toolchain) with cargo +nightly fuzz run iterate -- -rss_limit_mb=4096 -max_total_time=60; probably a good idea to do that before pushing.

@tychoish
Copy link
Contributor Author

I fixed it again, ran the fuzz test on my laptop for a while, nothing popped, and everything else passed (though my laptop is linux, it's not un-weird so I'm not sure what makes sense to validate.)

Copy link
Contributor

@abr-egn abr-egn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Tagging in Isabel for review.

Copy link
Contributor

@isabelatkinson isabelatkinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thank you for the contribution!

@tychoish
Copy link
Contributor Author

Thanks @isabelatkinson @abr-egn! Do you schedule merges for things like this? Is there a regular release cadence for these libraries?

Cheers,
sam

@abr-egn abr-egn merged commit 9294ee5 into mongodb:main Jan 18, 2024
11 checks passed
@abr-egn
Copy link
Contributor

abr-egn commented Jan 18, 2024

We don't have a specific release schedule, but I anticipate releasing 2.9.0 with this included in a week or so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants