Fix Handling Of Empty Collections #13

JamieB-gu · 2021-02-11T16:01:51Z

What does this change?

Paired with @alexduf on this work.

TL;DR It fixes a bug where the parser would incorrectly read the end of a given collection input, which caused problems attempting to parse empty collections.

Background

Thrift is a binary protocol. Decoders operate by working their way through a buffer of binary data and attempting to parse that into language-specific data types.

The bug we discovered exists in the code generated to read structs. The code generated by this project goes through a common process to read each field in the struct.

It reads the preamble for each field to figure out what type and id (number) it is
It checks that the type matches the expected type for the field with that id
Depending on whether the type matches:
1. If yes, it reads the amount of data from the buffer corresponding to the expected type
2. If no, it skips over that field

The bug existed when step 3 was applied for collections. When parsing a collection you need to:

Parse the start of the collection
Parse the entries within the collection
Parse the end of the collection

We discovered the issue for collections where the content didn't match the type that was expected, either because the collection was empty (in which case Thrift doesn't provide a type for the content) or because the type simply didn't match (e.g. list<string> rather than list<i32>). In these situations the parsing code would perform step 1, discover that the content types didn't match, and proceed to skip an amount of data equivalent to the collection size. There are two problems with this approach: a) it doesn't parse the end of the collection, and b) it may skip the wrong amount of data because the start of the collection has already been parsed and because it doesn't know how much data the collection contains.

The fix for the empty collection issue is to always parse the end of a collection, and not attempt to skip it, once we have parsed the start. That is what this PR addresses. We're going to do a follow-up to address the type mismatch problem.

Note: We've also added TypeScript as a dev dependency. Previously the tests assumed a globally installed copy of tsc, which isn't always available.

Changes

Added package.json and package-lock.json to make TypeScript a dev dependency
Updated tests to use locally installed version of tsc
Ignored node_modules
Update read_field templates to always read collections/maps to end
Added test cases for empty maps and empty lists

How to test

Use it against some data with empty collections. You can also run the tests that we've added.

How can we measure success?

It's able to parse empty collections in future.

- Added package.json and package-lock.json - Updated tests to use locally installed version of tsc - Ignored node_modules

- Update read_field templates to always read collections/maps to end - Added test cases for empty maps and empty lists

davidfurey

It looks like you haven't added a test for "where the types within a collection don't match those expected".

I suspect that the bug still exists for this situation. I think you need to either:

roll back the startReadingStatement and then skip the whole field OR
read each item in the collection (potentially throwing away the values) and finish with endReadingStatement

alexduf · 2021-02-12T11:28:10Z

Hey David,
I've tried adding a test case for schemas not aligning and I couldn't quite prove that it fails. It does seem to ignore all further fields which isn't great.

But I think I want to address that in a different PR. It's linked to ensuring required fields are provided and throwing an exception if they aren't. This currently isn't implemented.

So I think it's more likely we'll have empty collections in prod than misaligned types, do you see an issue with us merging this change first to unblock the mobile team, then we'll address the other issues in a separate PR?

davidfurey · 2021-02-12T11:33:01Z

Yes, that sounds fine to me. Can you edit the PR description to make it clear that it is only fixing the empty collection case? It makes lots of references to fixing the "type simply didn't match" issue which is misleading. Maybe also open an issue to track the required-fields and schema-mismatch work.

BTW- this is the first time I've really looked at this project. It is very impressive!

JamieB-gu · 2021-02-12T11:35:02Z

Can you edit the PR description to make it clear that it is only fixing the empty collection case

Updated.

alexduf · 2021-02-12T12:03:36Z

Thanks David! I've created an issue here: #15

Feel free to weigh in if you have an opinion

JamieB-gu added 2 commits February 11, 2021 14:46

Added TypeScript as a dev dependency

20c92a9

- Added package.json and package-lock.json - Updated tests to use locally installed version of tsc - Ignored node_modules

Don't skip invalid maps and collections, read them to end

4993745

- Update read_field templates to always read collections/maps to end - Added test cases for empty maps and empty lists

JamieB-gu requested review from davidfurey, alexduf, mohammad-haque, frankie297 and groakland February 11, 2021 16:01

davidfurey requested changes Feb 11, 2021

View reviewed changes

davidfurey approved these changes Feb 12, 2021

View reviewed changes

JamieB-gu merged commit 012f40a into main Feb 12, 2021

JamieB-gu deleted the fix-handling-of-empty-collections branch February 12, 2021 11:45

alexduf mentioned this pull request Feb 12, 2021

Upgrade the thrift to ts compiler guardian/content-api-models#194

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix Handling Of Empty Collections #13

Fix Handling Of Empty Collections #13

JamieB-gu commented Feb 11, 2021 •

edited

Loading

davidfurey left a comment

alexduf commented Feb 12, 2021

davidfurey commented Feb 12, 2021

JamieB-gu commented Feb 12, 2021

alexduf commented Feb 12, 2021

Fix Handling Of Empty Collections #13

Fix Handling Of Empty Collections #13

Conversation

JamieB-gu commented Feb 11, 2021 • edited Loading

What does this change?

Background

Changes

How to test

How can we measure success?

davidfurey left a comment

Choose a reason for hiding this comment

alexduf commented Feb 12, 2021

davidfurey commented Feb 12, 2021

JamieB-gu commented Feb 12, 2021

alexduf commented Feb 12, 2021

JamieB-gu commented Feb 11, 2021 •

edited

Loading