Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds incompleteness detection tests #819

Merged
merged 3 commits into from
Aug 22, 2024
Merged

Adds incompleteness detection tests #819

merged 3 commits into from
Aug 22, 2024

Conversation

zslayton
Copy link
Contributor

The text reader distinguishes between invalid input and incomplete input. If it encounters incomplete data, it tries to add more data to the buffer and try again. If it encounters invalid data, it surfaces the error to the user.

This PR adds a test utility called DataStraw--an io::Read implementation that only yields a single byte of input per call to read(). By doing this, all subslices of input are tested for correct incompleteness detection.

As you can imagine, this surfaced a number of bugs that needed to be fixed. I've fixed everything the tests surfaced in the Ion 1.1 text parser. In doing so I naturally also fixed the majority of bugs in the Ion 1.0 parser because they share lots of parsing logic. However, I have not set up the same tests for Ion 1.0 yet. I will do that in a follow-on PR.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@zslayton zslayton force-pushed the detect-incomplete branch 4 times, most recently from 93ea648 to a45832c Compare August 21, 2024 19:53
Copy link
Contributor Author

@zslayton zslayton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ PR Tour 🧭

}
}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ I've disabled this check until we address #812.

Comment on lines -537 to +540
None => return IonResult::decoding_error(format!("macros in field name position must produce a single struct; '{:?}' produced nothing", invocation)),
None => {
// The macro produced an empty stream; return to reading from input.
return Ok(());
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ The ion-tests have a few examples where (:void) in struct field name position is elided. My implementation had required output of exactly-one struct. I've loosened it to zero-or-one. I imagine I'll loosen it further to zero-or-more later on.

Comment on lines +958 to +965
terminated(
Self::match_annotated_long_string_in_list.map(Some),
Self::match_delimiter_after_list_value,
)
.map(|maybe_matched| maybe_matched.map(RawValueExpr::ValueLiteral)),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ The 1.1 list iterator now checks for strings first to guarantee that its incompleteness detection will be used.

Comment on lines +1440 to +1452
value(IonType::Null, tag("null")),
value(IonType::Bool, tag("bool")),
value(IonType::Int, tag("int")),
value(IonType::Float, tag("float")),
value(IonType::Decimal, tag("decimal")),
value(IonType::Timestamp, tag("timestamp")),
value(IonType::Symbol, tag("symbol")),
value(IonType::String, tag("string")),
value(IonType::Clob, tag("clob")),
value(IonType::Blob, tag("blob")),
value(IonType::List, tag("list")),
value(IonType::SExp, tag("sexp")),
value(IonType::Struct, tag("struct")),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ If the input is exhausted before the match is complete, complete_tag will consider it a non-match and let some other parser branch try to handle it. If reaching this parser means that there's no other possible legal interpretation of the input, we don't want that.

If the input is null.timesta, it needs to be marked as Incomplete so no other parser tries to read it and the StreamingRawReader will add more data to the buffer and try again. That's what tag does -- exhausted input results in an Incomplete.

Comment on lines +1624 to +1625
value(MatchedFloat::PositiveInfinity, tag("+inf")),
value(MatchedFloat::NegativeInfinity, tag("-inf")),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ If the reader finds an unquoted + or -, it must be the beginning of an infinity. If input gets exhausted during this match, raise an Incomplete.

Comment on lines -2041 to +2238
pair(complete_tag("T"), Self::peek_stop_character),
pair(tag("T"), Self::peek_stop_character),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ All of the changes in timestamp are similar to this; if the parser has reached this point but runs out of data before the match, it's an Incomplete.

let tail_backwards = text
.chars()
.rev()
let mut head_chars = text.chars();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ This part of the PR is making sure that the error's Debug formatting only includes a ... in the head or tail output when the buffer has more data than what's shown.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the little things.

@zslayton zslayton marked this pull request as ready for review August 21, 2024 20:01
- Outdated
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this empty file added unintentionally?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was not. Removed, thanks.

if bytes_read == 0 {
return Ok(0);
}
buf[0] = single_byte_buffer[0];
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is byte-by-byte enough for full coverage, or do you also need to test what happens when the stream returns 0 bytes at any point (i.e. having the DataStraw alternate between returning one byte and zero bytes)? Or is the reader not continuable, but just needs to be able to handle arbitrary-sized chunks being provided from the input?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading one byte at a time should be enough for full coverage. The reader is always operating on a fixed input buffer slice that holds (at least) the current top level value. If the reader hit an Incomplete and the input yielded zero bytes, the next call to Reader::next() would be trying again on the same buffer slice.

signature_params.len()
),
)
"macro {id} signature has {} parameter(s), e-expression had an extra argument",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha! I just made this change on my own branch. Would be handy to show what the extra argument is, too. In my case I had the following macro:

    (macro CN ($name $sum) (C_1 $name $sum (ONE)))

with an invocation like this:

(:CN "OutstandingRequests"  205e0 )

And arg_expr_cache as:

[
  EExpArg { parameter: Parameter { name: "$name", encoding: Tagged, cardinality: ExactlyOne, rest_syntax_policy: NotAllowed }, expr: ValueLiteral(text Ion v1.1 {"OutstandingRequests"}) },
  EExpArg { parameter: Parameter { name: "$sum", encoding: Tagged, cardinality: ExactlyOne, rest_syntax_policy: NotAllowed }, expr: ValueLiteral(text Ion v1.1 {2}) }
]

Tracking that down made me think I ought to review this PR :D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I need to take a pass through the reader and improve all of the error messages. They're too spartan at the moment.

@zslayton zslayton merged commit 62f0b73 into main Aug 22, 2024
29 of 32 checks passed
@zslayton zslayton deleted the detect-incomplete branch August 22, 2024 13:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants