-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: bson builder should handle schema flexibly #2334
Conversation
Is this saying that the current implementation in main would skip bson documents that have unexpected schemas? And the fix here is we can now sanely handle those documents? What happens to the extra columns/fields? Are those just dropped? Also would we be able to throw together a quick test for this? |
Co-authored-by: Sean Smith <scsmithr@gmail.com>
I believe that the current mongodb implementation doesn't have this problem because we project out the fields that we don't expect to see, so that we're never in a case where a document would have a field we don't expect (caveat: nested documents/arrays are an edge case that I'd want to think about.) The failure mode, is that the query will fail, because we'll be building a result set, see a field we don't expect to see and return an error, which could happen with bson-files. For MongoDB if the schema inference does not pick up a field, the query will remove that field from the result set, so we are already dropping data silently (not great, but the solution is to not infer schema, (addressable with #2333),) so this just brings the implementations on par with each other.
So the bug, (which to be fair, isn't released), requires the schema inference to be "wrong," (or incomplete) relative to our expectations. Building a test case that has that would be pretty fragile (all of the new code is hit by the existing test, so I'm not worried about this being worse than baseline.) |
can we add some tests for this. |
In any case, I wrote the duplicate fields always picks the first one test as a rust unittest. |
The
RecordStructBuilder
for bson previously assumed (as is the casewith our MongoDB implementation) that it would never see bson
documents with fields that didn't appear in the schema/projection and
would error otherwise.
While it does mean that the schema inference (which for MongoDB is
done somewhat probabalistically) controls the projection, it does mean
that you will never see documents that have fields that aren't in the
schema.
For handling the pure-BSON document streams, it does mean that it
would be very easy to see a document that has fields that aren't in
the schema. This increases with #2333, but could easily happen
otherwise.