Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update conformance testing DSL to support unknown symbols #833

Merged
merged 4 commits into from
Oct 16, 2024

Conversation

nirosys
Copy link
Contributor

@nirosys nirosys commented Sep 9, 2024

Issue #, if available: n/a

Description of changes:
This PR adds missing symbol notation for Toplevel and Produces clauses.

This turned out to be a lot simpler than I had originally thought. My first (few) pass(es) through the spec left my understanding to be that the notation could define symbol mappings, which would have to be created during the evaluation of the fragments, and supported when evaluating expectations. Unfortunately I didn't realize my misunderstanding until after having implemented creating the symbol mappings.

The functionality now allows symbol IDs to be specified as quoted symbols, with the #$ notation for identifying such symbol IDs.

Details

When the Toplevel fragment is triggered to encode it's contents into a new document fragment, it uses ProxyElement to wrap each of the ion elements that were defined within its body. ProxyElement implements WriteAsIon in order to hijack any symbols being written and detect any #$ prefixed symbols so that it can parse out the symbol table name, and offset. Once parsed, it will write the symbol ID to the document fragment.

When a Produces clause is evaluated, it will walk through its body of ion elements, while reading the concatenated ion document. As each element is evaluated for equality, any symbols found in the Produces' body will also be tested for #$ prefix, and offsets/symtabs extracted.

In addition to Toplevel and Produces support, the Denotes clause was updated to use a reader rather than the Element API, so that it can access the symbol IDs to complete its data model tests.

Additionally

I also used this PR to follow up on #828, and rolled the typed null handling into a read_null function and reset the ion_type back to its original for consistency. I'll follow up with a PR to return the underlying type from ion_type for all value implementations.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@nirosys nirosys marked this pull request as ready for review September 9, 2024 10:42
Comment on lines 197 to 192
return Ok(ValueRef::Null(self.ion_type()));
return self.read_null()?.resolve(context);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The resolve method requires a giant match over all of the Ion types (see below). Since we already know it's a (possibly typed) null, I suggest return ValueRef::Null(self.read_null())?;

image

@@ -218,7 +213,7 @@ impl<'top> LazyRawValue<'top, BinaryEncoding_1_1> for &'top LazyRawBinaryValue_1
}

if value.is_null() {
return Ok(ValueRef::Null(value.ion_type()));
return value.read_null()?.resolve(context);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thought here RE: ValueRef::Null(...).

@@ -354,6 +349,16 @@ impl<'top> LazyRawBinaryValue_1_1<'top> {
<&'top Self as LazyRawValue<'top, BinaryEncoding_1_1>>::read(&self)
}

fn read_null(&self) -> IonResult<RawValueRef<'top, BinaryEncoding_1_1>> {
let tpe = if self.encoded_value.header.type_code() == OpcodeType::TypedNull {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let tpe = if self.encoded_value.header.type_code() == OpcodeType::TypedNull {
let ion_type = if self.encoded_value.header.type_code() == OpcodeType::TypedNull {

?

}
}
}
Continuation::Produces(produces) => produces.evaluate(ctx),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this was helpful.

@@ -230,6 +209,66 @@ pub(crate) struct EachBranch {
fragment: Fragment,
}

#[derive(Clone, Debug, Default)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could add docs and a reference link to these like you have on the other types?

@@ -306,3 +306,26 @@ pub(crate) fn parse_text_exp<'a, I: IntoIterator<Item=&'a Element>>(elems: I) ->
let val_string = bytes.iter().map(|v| unsafe { String::from_utf8_unchecked(v.to_vec()) }).collect();
Ok(val_string)
}

/// Parses absent symbol notation from a symbol within a Toplevel fragment, or produces
/// Continuation. The notation is '#$<id>' for an absent symbol id, or '#$<name>#<id>' for a symbol
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought #$<id> was the syntax for a symbol ID to be evaluated within the context being created instead of the current one. Is that what you mean by "absent"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, that's what I mean, that it's absent from the test document's context.

Comment on lines 144 to 147
// newtype to handle writing an Element, after we check to make sure it's not a symbol that has our
// special absent symbol sauce.
#[derive(Debug)]
pub(crate) struct ProxyElement<'a>(pub &'a Element);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a high-level comment to this type/file that explains the general motivation behind and process for creating/reading/writing ProxyElement instances?

true
}
}
ModelValue::Struct(_) => panic!("should not be using element eq for struct"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is it supposed to be using?

Copy link
Contributor Author

@nirosys nirosys Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I transitioned over to using a regular reader instead of the element API in order to access symbol IDs, so the handling of structs, lists, and s-expressions moved over to the LazyValue PartialEq. I cleaned this one up a bit and dropped a comment. Next time I touch this I'll probably get rid of the PartialEqs entirely in favor of something that can return a Result, and get rid of this separation.

}
}

impl<T: ion_rs::Decoder> PartialEq<ion_rs::LazyValue<'_, T>> for &ModelValue {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you do a comments pass?

@@ -11,6 +12,21 @@ use std::str::FromStr;
mod implementation {
use super::*;

#[test]
fn test_absent_symbol() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also need tests for the #table#$id syntax?

@nirosys
Copy link
Contributor Author

nirosys commented Sep 12, 2024

As I was reviewing everything I realized I had somehow left out the actual symbol table resolutions. I had stashed some stuff when rolling back some changes and I guess it got lost in the mix. So the latest commit adds in the building of the ion-tests catalog, and validates symbol text between the referenced symbol table, and the current context.

I also tried to document everything mentioned, and added in a couple more tests for 'absent symbols'.

Copy link
Contributor

@zslayton zslayton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Please open an issue for TODO items you mentioned in the comments.

Comment on lines +15 to +16
symbol_tables: Vec<SharedSymbolTable>, // To build Catalogs when needed, Catalog doesn't
// require Clone.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the comment. Doesn't require Clone for...? Is it something we should change or track?

Should this be a MapCatalog instead so you can clone() it?


/// Returns the symbol at the provided offset `offset` within the symbol table named `symtab`,
/// if either the symbol table, or the offset, is not valid then None is returned.
pub fn get_symbol_from_table<S: AsRef<str>>(&self, symtab: S, offset: SymbolId) -> Option<Symbol> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub fn get_symbol_from_table<S: AsRef<str>>(&self, symtab: S, offset: SymbolId) -> Option<Symbol> {
pub fn get_symbol_from_table<S: AsRef<str>>(&self, symtab_name: S, offset: SymbolId) -> Option<Symbol> {

Comment on lines +219 to +220
/// Creates a reader using the provided context, and compares the read values from the input
/// document with the elements specified in the associated Produces clause for equality.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very helpful, thanks.

Comment on lines +13 to +16
//! read by the ion reader they're treated as normal quoted symbols. When the Toplevel fragment is
//! encoded however, each Element read from the Toplevel clause is wrapped within a `ProxyElement`,
//! which by implementing `WriteAsIon` is able to recursively look at all values being written to
//! the test's input and capture any symbols that match the unknown symbol notation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A (perhaps simpler?) way to achieve this would have been to make a Reader wrapper that provided its own implementation of the ElementReader trait.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look into this more and sync up with you when I start working on #839, maybe I can sneak changes in there if it simplifies things.

@nirosys nirosys merged commit ec13464 into amazon-ion:main Oct 16, 2024
26 of 27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants