-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds incompleteness detection tests #819
Conversation
93ea648
to
a45832c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ PR Tour 🧭
} | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ I've disabled this check until we address #812.
None => return IonResult::decoding_error(format!("macros in field name position must produce a single struct; '{:?}' produced nothing", invocation)), | ||
None => { | ||
// The macro produced an empty stream; return to reading from input. | ||
return Ok(()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ The ion-tests
have a few examples where (:void)
in struct field name position is elided. My implementation had required output of exactly-one struct. I've loosened it to zero-or-one. I imagine I'll loosen it further to zero-or-more later on.
terminated( | ||
Self::match_annotated_long_string_in_list.map(Some), | ||
Self::match_delimiter_after_list_value, | ||
) | ||
.map(|maybe_matched| maybe_matched.map(RawValueExpr::ValueLiteral)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ The 1.1 list iterator now checks for strings first to guarantee that its incompleteness detection will be used.
value(IonType::Null, tag("null")), | ||
value(IonType::Bool, tag("bool")), | ||
value(IonType::Int, tag("int")), | ||
value(IonType::Float, tag("float")), | ||
value(IonType::Decimal, tag("decimal")), | ||
value(IonType::Timestamp, tag("timestamp")), | ||
value(IonType::Symbol, tag("symbol")), | ||
value(IonType::String, tag("string")), | ||
value(IonType::Clob, tag("clob")), | ||
value(IonType::Blob, tag("blob")), | ||
value(IonType::List, tag("list")), | ||
value(IonType::SExp, tag("sexp")), | ||
value(IonType::Struct, tag("struct")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ If the input is exhausted before the match is complete, complete_tag
will consider it a non-match and let some other parser branch try to handle it. If reaching this parser means that there's no other possible legal interpretation of the input, we don't want that.
If the input is null.timesta
, it needs to be marked as Incomplete
so no other parser tries to read it and the StreamingRawReader
will add more data to the buffer and try again. That's what tag
does -- exhausted input results in an Incomplete
.
value(MatchedFloat::PositiveInfinity, tag("+inf")), | ||
value(MatchedFloat::NegativeInfinity, tag("-inf")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ If the reader finds an unquoted +
or -
, it must be the beginning of an infinity. If input gets exhausted during this match, raise an Incomplete
.
pair(complete_tag("T"), Self::peek_stop_character), | ||
pair(tag("T"), Self::peek_stop_character), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ All of the changes in timestamp
are similar to this; if the parser has reached this point but runs out of data before the match, it's an Incomplete
.
let tail_backwards = text | ||
.chars() | ||
.rev() | ||
let mut head_chars = text.chars(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ This part of the PR is making sure that the error's Debug
formatting only includes a ...
in the head
or tail
output when the buffer has more data than what's shown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the little things.
a45832c
to
e4fc43e
Compare
-
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this empty file added unintentionally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was not. Removed, thanks.
if bytes_read == 0 { | ||
return Ok(0); | ||
} | ||
buf[0] = single_byte_buffer[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is byte-by-byte enough for full coverage, or do you also need to test what happens when the stream returns 0 bytes at any point (i.e. having the DataStraw alternate between returning one byte and zero bytes)? Or is the reader not continuable, but just needs to be able to handle arbitrary-sized chunks being provided from the input?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading one byte at a time should be enough for full coverage. The reader is always operating on a fixed input buffer slice that holds (at least) the current top level value. If the reader hit an Incomplete
and the input yielded zero bytes, the next call to Reader::next()
would be trying again on the same buffer slice.
signature_params.len() | ||
), | ||
) | ||
"macro {id} signature has {} parameter(s), e-expression had an extra argument", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha! I just made this change on my own branch. Would be handy to show what the extra argument is, too. In my case I had the following macro:
(macro CN ($name $sum) (C_1 $name $sum (ONE)))
with an invocation like this:
(:CN "OutstandingRequests" 205e0 )
And arg_expr_cache
as:
[
EExpArg { parameter: Parameter { name: "$name", encoding: Tagged, cardinality: ExactlyOne, rest_syntax_policy: NotAllowed }, expr: ValueLiteral(text Ion v1.1 {"OutstandingRequests"}) },
EExpArg { parameter: Parameter { name: "$sum", encoding: Tagged, cardinality: ExactlyOne, rest_syntax_policy: NotAllowed }, expr: ValueLiteral(text Ion v1.1 {2}) }
]
Tracking that down made me think I ought to review this PR :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I need to take a pass through the reader and improve all of the error messages. They're too spartan at the moment.
The text reader distinguishes between invalid input and incomplete input. If it encounters incomplete data, it tries to add more data to the buffer and try again. If it encounters invalid data, it surfaces the error to the user.
This PR adds a test utility called
DataStraw
--anio::Read
implementation that only yields a single byte of input per call toread()
. By doing this, all subslices of input are tested for correct incompleteness detection.As you can imagine, this surfaced a number of bugs that needed to be fixed. I've fixed everything the tests surfaced in the Ion 1.1 text parser. In doing so I naturally also fixed the majority of bugs in the Ion 1.0 parser because they share lots of parsing logic. However, I have not set up the same tests for Ion 1.0 yet. I will do that in a follow-on PR.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.