Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Handling Of Empty Collections #13

Merged
merged 2 commits into from
Feb 12, 2021
Merged

Conversation

JamieB-gu
Copy link
Contributor

@JamieB-gu JamieB-gu commented Feb 11, 2021

What does this change?

Paired with @alexduf on this work.

TL;DR It fixes a bug where the parser would incorrectly read the end of a given collection input, which caused problems attempting to parse empty collections.

Background

Thrift is a binary protocol. Decoders operate by working their way through a buffer of binary data and attempting to parse that into language-specific data types.

The bug we discovered exists in the code generated to read structs. The code generated by this project goes through a common process to read each field in the struct.

  1. It reads the preamble for each field to figure out what type and id (number) it is
  2. It checks that the type matches the expected type for the field with that id
  3. Depending on whether the type matches:
    1. If yes, it reads the amount of data from the buffer corresponding to the expected type
    2. If no, it skips over that field

The bug existed when step 3 was applied for collections. When parsing a collection you need to:

  1. Parse the start of the collection
  2. Parse the entries within the collection
  3. Parse the end of the collection

We discovered the issue for collections where the content didn't match the type that was expected, either because the collection was empty (in which case Thrift doesn't provide a type for the content) or because the type simply didn't match (e.g. list<string> rather than list<i32>). In these situations the parsing code would perform step 1, discover that the content types didn't match, and proceed to skip an amount of data equivalent to the collection size. There are two problems with this approach: a) it doesn't parse the end of the collection, and b) it may skip the wrong amount of data because the start of the collection has already been parsed and because it doesn't know how much data the collection contains.

The fix for the empty collection issue is to always parse the end of a collection, and not attempt to skip it, once we have parsed the start. That is what this PR addresses. We're going to do a follow-up to address the type mismatch problem.

Note: We've also added TypeScript as a dev dependency. Previously the tests assumed a globally installed copy of tsc, which isn't always available.

Changes

  • Added package.json and package-lock.json to make TypeScript a dev dependency
  • Updated tests to use locally installed version of tsc
  • Ignored node_modules
  • Update read_field templates to always read collections/maps to end
  • Added test cases for empty maps and empty lists

How to test

Use it against some data with empty collections. You can also run the tests that we've added.

How can we measure success?

It's able to parse empty collections in future.

- Added package.json and package-lock.json
- Updated tests to use locally installed version of tsc
- Ignored node_modules
- Update read_field templates to always read collections/maps to end
- Added test cases for empty maps and empty lists
Copy link
Member

@davidfurey davidfurey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you haven't added a test for "where the types within a collection don't match those expected".

I suspect that the bug still exists for this situation. I think you need to either:

  • roll back the startReadingStatement and then skip the whole field OR
  • read each item in the collection (potentially throwing away the values) and finish with endReadingStatement

@alexduf
Copy link

alexduf commented Feb 12, 2021

Hey David,
I've tried adding a test case for schemas not aligning and I couldn't quite prove that it fails. It does seem to ignore all further fields which isn't great.

But I think I want to address that in a different PR. It's linked to ensuring required fields are provided and throwing an exception if they aren't. This currently isn't implemented.

So I think it's more likely we'll have empty collections in prod than misaligned types, do you see an issue with us merging this change first to unblock the mobile team, then we'll address the other issues in a separate PR?

@davidfurey
Copy link
Member

Yes, that sounds fine to me. Can you edit the PR description to make it clear that it is only fixing the empty collection case? It makes lots of references to fixing the "type simply didn't match" issue which is misleading. Maybe also open an issue to track the required-fields and schema-mismatch work.

BTW- this is the first time I've really looked at this project. It is very impressive!

@JamieB-gu
Copy link
Contributor Author

Can you edit the PR description to make it clear that it is only fixing the empty collection case

Updated.

@JamieB-gu JamieB-gu merged commit 012f40a into main Feb 12, 2021
@JamieB-gu JamieB-gu deleted the fix-handling-of-empty-collections branch February 12, 2021 11:45
@alexduf
Copy link

alexduf commented Feb 12, 2021

Thanks David! I've created an issue here: #15

Feel free to weigh in if you have an opinion

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants