Adds incompleteness detection tests #819

zslayton · 2024-08-21T19:20:51Z

The text reader distinguishes between invalid input and incomplete input. If it encounters incomplete data, it adds more data to the buffer and tries again. If it encounters invalid data, it surfaces the error to the user.

This PR adds a test utility called DataStraw, an io::Read implementation that yields only a single byte of input per call to read(). Feeding the reader one byte at a time means every subslice of the input is tested for correct incompleteness detection.
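For readers who want a picture of the idea, here is a minimal sketch of a one-byte-at-a-time io::Read wrapper. It is not the code from this PR: the struct layout, the generic parameter, and the new() constructor are assumptions, and only the zero-byte check and the single-byte copy mirror the snippet shown later in the tests/ion_tests/mod.rs thread.

```rust
use std::io::{self, Read};

/// Hypothetical reconstruction of a "straw": wraps any reader and hands out
/// at most one byte per call to `read`, regardless of the caller's buffer size.
struct DataStraw<R: Read> {
    input: R,
}

impl<R: Read> DataStraw<R> {
    /// Assumed constructor; the real utility may be built differently.
    fn new(input: R) -> Self {
        Self { input }
    }
}

impl<R: Read> Read for DataStraw<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // A zero-length destination cannot accept any bytes.
        if buf.is_empty() {
            return Ok(0);
        }
        // Pull exactly one byte from the underlying source.
        let mut single_byte_buffer = [0u8; 1];
        let bytes_read = self.input.read(&mut single_byte_buffer)?;
        if bytes_read == 0 {
            // The underlying source is exhausted.
            return Ok(0);
        }
        buf[0] = single_byte_buffer[0];
        Ok(1)
    }
}
```

Wrapping a byte slice, for example DataStraw::new(&b"null.timestamp"[..]), forces a consumer to see the input grow by one byte per read, so it has to cope with every partial view of the data along the way.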

As you can imagine, this surfaced a number of bugs that needed to be fixed. I've fixed everything the tests surfaced in the Ion 1.1 text parser. In doing so I naturally also fixed the majority of bugs in the Ion 1.0 parser because they share lots of parsing logic. However, I have not set up the same tests for Ion 1.0 yet. I will do that in a follow-on PR.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

zslayton

🗺️ PR Tour 🧭

zslayton · 2024-08-21T19:22:14Z

src/lazy/encoder/text/v1_0/value_writer.rs

-        }
-    }
-}
-


🗺️ I've disabled this check until we address #812.

zslayton · 2024-08-21T19:24:47Z

src/lazy/expanded/struct.rs

-            None => return IonResult::decoding_error(format!("macros in field name position must produce a single struct; '{:?}' produced nothing", invocation)),
+            None => {
+                // The macro produced an empty stream; return to reading from input.
+                return Ok(());
+            }


🗺️ The ion-tests have a few examples where (:void) in struct field name position is elided. My implementation had required output of exactly-one struct. I've loosened it to zero-or-one. I imagine I'll loosen it further to zero-or-more later on.

zslayton · 2024-08-21T19:36:44Z

src/lazy/text/buffer.rs

+            terminated(
+                Self::match_annotated_long_string_in_list.map(Some),
+                Self::match_delimiter_after_list_value,
+            )
+            .map(|maybe_matched| maybe_matched.map(RawValueExpr::ValueLiteral)),


🗺️ The 1.1 list iterator now checks for strings first to guarantee that its incompleteness detection will be used.

zslayton · 2024-08-21T19:46:42Z

src/lazy/text/buffer.rs

+            value(IonType::Null, tag("null")),
+            value(IonType::Bool, tag("bool")),
+            value(IonType::Int, tag("int")),
+            value(IonType::Float, tag("float")),
+            value(IonType::Decimal, tag("decimal")),
+            value(IonType::Timestamp, tag("timestamp")),
+            value(IonType::Symbol, tag("symbol")),
+            value(IonType::String, tag("string")),
+            value(IonType::Clob, tag("clob")),
+            value(IonType::Blob, tag("blob")),
+            value(IonType::List, tag("list")),
+            value(IonType::SExp, tag("sexp")),
+            value(IonType::Struct, tag("struct")),


🗺️ If the input is exhausted before the match is complete, complete_tag will consider it a non-match and let some other parser branch try to handle it. If reaching this parser means that there's no other possible legal interpretation of the input, we don't want that.

If the input is null.timesta, it needs to be marked as Incomplete so that no other parser tries to read it and the StreamingRawReader adds more data to the buffer and tries again. That's what tag does: exhausted input results in an Incomplete.
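The difference between the two matchers can be illustrated with a small standalone sketch. The Match enum and both functions below are hypothetical stand-ins written for this illustration; they are not the parser's real tag and complete_tag, and they only model the behavior described in this comment.

```rust
/// Hypothetical three-way outcome of trying to match a keyword.
#[derive(Debug, PartialEq)]
enum Match<'a> {
    /// The keyword was found; `rest` is the unconsumed input.
    Matched { rest: &'a str },
    /// The input definitely does not start with the keyword.
    NoMatch,
    /// The input ran out while it still agreed with the keyword.
    Incomplete,
}

/// Models the streaming behavior: exhausted input is reported as Incomplete.
fn streaming_tag<'a>(expected: &str, input: &'a str) -> Match<'a> {
    if let Some(rest) = input.strip_prefix(expected) {
        Match::Matched { rest }
    } else if expected.starts_with(input) {
        // Everything seen so far is a prefix of the keyword; more data could
        // still complete the match.
        Match::Incomplete
    } else {
        Match::NoMatch
    }
}

/// Models the "complete" behavior: exhausted input is just a non-match.
fn complete_tag<'a>(expected: &str, input: &'a str) -> Match<'a> {
    match input.strip_prefix(expected) {
        Some(rest) => Match::Matched { rest },
        None => Match::NoMatch,
    }
}
```

With the input timesta and the keyword timestamp, streaming_tag reports Incomplete while complete_tag reports NoMatch. In the second case a later parser branch gets a chance to misread the truncated text.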

zslayton · 2024-08-21T19:48:38Z

src/lazy/text/buffer.rs

+            value(MatchedFloat::PositiveInfinity, tag("+inf")),
+            value(MatchedFloat::NegativeInfinity, tag("-inf")),


🗺️ If the reader finds an unquoted + or -, it must be the beginning of an infinity. If input gets exhausted during this match, raise an Incomplete.

zslayton · 2024-08-21T19:55:40Z

src/lazy/text/buffer.rs

-            pair(complete_tag("T"), Self::peek_stop_character),
+            pair(tag("T"), Self::peek_stop_character),


🗺️ All of the changes in timestamp are similar to this; if the parser has reached this point but runs out of data before the match, it's an Incomplete.

zslayton · 2024-08-21T19:59:55Z

src/lazy/text/parse_result.rs

-                let tail_backwards = text
-                    .chars()
-                    .rev()
+                let mut head_chars = text.chars();


🗺️ This part of the PR makes sure that the error's Debug formatting only includes a ... in the head or tail output when the buffer has more data than what's shown.

It's the little things.
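As a rough illustration of that rule, the sketch below adds an ellipsis only when characters were actually left out. The function names preview_head and preview_tail and the max_chars parameter are assumptions made for this example; the real formatting code in parse_result.rs may be structured differently.

```rust
/// Hypothetical helper: the first `max_chars` characters of `text`,
/// followed by "..." only if some characters were left out.
fn preview_head(text: &str, max_chars: usize) -> String {
    let mut head_chars = text.chars();
    let mut head: String = head_chars.by_ref().take(max_chars).collect();
    // If the iterator still has items, the preview is truncated.
    if head_chars.next().is_some() {
        head.push_str("...");
    }
    head
}

/// Hypothetical helper: the last `max_chars` characters of `text`,
/// preceded by "..." only if some characters were left out.
fn preview_tail(text: &str, max_chars: usize) -> String {
    let mut tail_chars = text.chars().rev();
    let tail_backwards: Vec<char> = tail_chars.by_ref().take(max_chars).collect();
    let mut tail = String::new();
    if tail_chars.next().is_some() {
        tail.push_str("...");
    }
    // Restore the original character order.
    tail.extend(tail_backwards.into_iter().rev());
    tail
}
```

Both helpers check whether the iterator has anything left after taking the preview, rather than comparing lengths, so they count characters and not bytes.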

tgregg · 2024-08-21T21:19:49Z

-

Was this empty file added unintentionally?

It was not intentional. Removed, thanks.

tgregg · 2024-08-21T21:30:59Z

tests/ion_tests/mod.rs

+        if bytes_read == 0 {
+            return Ok(0);
+        }
+        buf[0] = single_byte_buffer[0];


Is byte-by-byte enough for full coverage, or do you also need to test what happens when the stream returns 0 bytes at any point (i.e. having the DataStraw alternate between returning one byte and zero bytes)? Or is the reader not continuable, but just needs to be able to handle arbitrary-sized chunks being provided from the input?

Reading one byte at a time should be enough for full coverage. The reader is always operating on a fixed input buffer slice that holds (at least) the current top level value. If the reader hit an Incomplete and the input yielded zero bytes, the next call to Reader::next() would be trying again on the same buffer slice.

jobarr-amzn · 2024-08-22T04:45:07Z

src/lazy/text/buffer.rs

-                        signature_params.len()
-                    ),
-                )
+                    "macro {id} signature has {} parameter(s), e-expression had an extra argument",


Ha! I just made this change on my own branch. Would be handy to show what the extra argument is, too. In my case I had the following macro:

(macro CN ($name $sum) (C_1 $name $sum (ONE)))

with an invocation like this:

(:CN "OutstandingRequests" 205e0 )

And arg_expr_cache as:

[ EExpArg { parameter: Parameter { name: "$name", encoding: Tagged, cardinality: ExactlyOne, rest_syntax_policy: NotAllowed }, expr: ValueLiteral(text Ion v1.1 {"OutstandingRequests"}) }, EExpArg { parameter: Parameter { name: "$sum", encoding: Tagged, cardinality: ExactlyOne, rest_syntax_policy: NotAllowed }, expr: ValueLiteral(text Ion v1.1 {2}) } ]

Tracking that down made me think I ought to review this PR :D

Yeah, I need to take a pass through the reader and improve all of the error messages. They're too spartan at the moment.

zslayton assigned tgregg and popematt Aug 21, 2024

zslayton force-pushed the detect-incomplete branch 4 times, most recently from 93ea648 to a45832c Compare August 21, 2024 19:53

zslayton marked this pull request as ready for review August 21, 2024 20:01

Adds incompleteness detection tests

e4fc43e

zslayton force-pushed the detect-incomplete branch from a45832c to e4fc43e Compare August 21, 2024 20:02

tgregg approved these changes Aug 21, 2024

removed empty file

62c14b0

jobarr-amzn reviewed Aug 22, 2024

canonicalize skip list file names to avoid (e.g.) Windows backslash i…

10edd72

…ssues

zslayton merged commit 62f0b73 into main Aug 22, 2024
29 of 32 checks passed

zslayton deleted the detect-incomplete branch August 22, 2024 13:20

zslayton mentioned this pull request Aug 23, 2024

Incomplete text detection tests for v1.0 #823

Merged
