Adds binary 1.1 read support for e-expressions, macro expansion #789

zslayton · 2024-06-10T16:57:45Z

Introduces a RawBinaryEExpression_1_1 type (mirroring the existing RawTextEExpression_1_1 type) and adds methods to the 1.1 ImmutableBuffer to parse them. This requires access to the signature of the expression being parsed, which in turn requires access to the macro table. Previously, the buffer types (ImmutableBuffer, TextBufferView) each held a reference to the allocator in case they needed scratch space for caching child values or sanitizing/decoding input text. I have replaced the allocator field with a context field--the EncodingContextRef type allows access to both the allocator and the macro table.

Finally, this PR also wires the new binary e-expressions up to the macro evaluator for expansion. It adds a handful of unit tests demonstrating this.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

zslayton

🗺️ PR tour 🧭

zslayton · 2024-06-10T17:40:16Z

src/lazy/any_encoding.rs

-                Some(Ok(RawValueExpr::MacroInvocation(invocation))) => {
-                    Some(Ok(RawValueExpr::MacroInvocation(LazyRawAnyEExpression {
+                Some(Ok(RawValueExpr::EExp(invocation))) => {
+                    Some(Ok(RawValueExpr::EExp(LazyRawAnyEExpression {


🗺️ I renamed RawValueExpr::MacroInvocation to RawValueExpr::EExp because at the raw level we're always talking about syntactic elements.

zslayton · 2024-06-10T17:43:06Z

src/lazy/any_encoding.rs

-            Text_1_0(r) => Ok(r.next(allocator)?.into()),
+            Text_1_0(r) => Ok(r.next(context)?.into()),
            Binary_1_0(r) => Ok(r.next()?.into()),
-            Text_1_1(r) => Ok(r.next(allocator)?.into()),
-            Binary_1_1(r) => Ok(r.next()?.into()),
+            Text_1_1(r) => Ok(r.next(context)?.into()),
+            Binary_1_1(r) => Ok(r.next(context)?.into()),


🗺️ Most of the buffer types used to hold a reference to the bump allocator in case they needed to decode text escapes or cache child expressions. Now that the reader needs to parse binary e-expressions, the parser needs access to the macro table to look up the macro signature. The encoding context has a reference to both the allocator and the macro table, so now the buffers get a reference to the encoding context.
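As a rough illustration of that arrangement (not the ion-rust definitions), a buffer can carry one small, copyable handle that reaches both resources. `SketchAllocator`, `SketchMacroTable`, `SketchContext`, `SketchContextRef`, and `SketchBuffer` are hypothetical names invented for this sketch; only the idea that the context exposes both the allocator and the macro table comes from the PR.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the bump allocator (performs no allocation here).
#[derive(Debug, Default)]
pub struct SketchAllocator;

/// Hypothetical macro table: maps a macro address to its parameter count.
#[derive(Debug, Default)]
pub struct SketchMacroTable {
    param_counts: HashMap<usize, usize>,
}

impl SketchMacroTable {
    pub fn define(&mut self, address: usize, param_count: usize) {
        self.param_counts.insert(address, param_count);
    }

    pub fn param_count(&self, address: usize) -> Option<usize> {
        self.param_counts.get(&address).copied()
    }
}

/// Owns both resources that a parser may need (assumed shape).
#[derive(Debug, Default)]
pub struct SketchContext {
    pub allocator: SketchAllocator,
    pub macro_table: SketchMacroTable,
}

/// A cheap, copyable handle to the context.
#[derive(Debug, Clone, Copy)]
pub struct SketchContextRef<'top> {
    context: &'top SketchContext,
}

impl<'top> SketchContextRef<'top> {
    pub fn new(context: &'top SketchContext) -> Self {
        Self { context }
    }

    pub fn allocator(&self) -> &'top SketchAllocator {
        &self.context.allocator
    }

    pub fn macro_table(&self) -> &'top SketchMacroTable {
        &self.context.macro_table
    }
}

/// A buffer that holds the context handle instead of only an allocator reference.
#[derive(Debug, Clone, Copy)]
pub struct SketchBuffer<'top> {
    pub data: &'top [u8],
    pub context: SketchContextRef<'top>,
}
```

With this shape, code that previously needed only scratch space can call `buffer.context.allocator()`, while e-expression parsing can call `buffer.context.macro_table()` without adding a second field to the buffer.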

zslayton · 2024-06-10T17:48:16Z

src/lazy/binary/encoded_value.rs

-    pub annotations_header_length: u8,
+    pub annotations_header_length: u16,
    // The number of bytes used to encode the series of symbol IDs inside the annotations wrapper.
    pub annotations_sequence_length: u16,


🗺️ There was a disagreement between how Ion 1.0 and Ion 1.1 were using these fields.

Ion 1.1 annotations encodings have two parts: a header and the sequence itself. The 1.1 implementation treated annotations_header_length and annotations_sequence_length as descriptions of non-overlapping pieces of the encoding.

Ion 1.0 annotations encodings have several parts: a header, a wrapper length, a sequence length, and the sequence itself. The 1.0 implementation treated annotations_header_length as the complete length of all of these pieces combined and annotations_sequence_length as the number of bytes at the end of the header that made up the sequence itself.

For the moment, I've adjusted 1.1's behavior to align with 1.0's. This required me to increase the size of the header field since it's storing the total length. I actually think 1.1's interpretation was better, but switching to that will require changing lots of small accessor methods so I've left it for a future PR.
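The difference between the two bookkeeping styles can be sketched in a few lines. `SketchAnnotationsLayout` and its methods are hypothetical and exist only for this illustration; the actual accessors in encoded_value.rs are not shown in this excerpt. The sketch assumes only what the comment above states: one style stores two disjoint lengths, the other stores a total plus the trailing sequence length.

```rust
/// Hypothetical description of an annotations encoding, in bytes.
#[derive(Debug, Clone, Copy)]
pub struct SketchAnnotationsLayout {
    /// Bytes that precede the sequence (opcode, length fields, and so on).
    pub prefix_length: usize,
    /// Bytes occupied by the annotations sequence itself.
    pub sequence_length: usize,
}

impl SketchAnnotationsLayout {
    /// Disjoint bookkeeping: (header, sequence) describe non-overlapping pieces.
    pub fn disjoint_fields(&self) -> (usize, usize) {
        (self.prefix_length, self.sequence_length)
    }

    /// Overlapping bookkeeping: the "header" is the total length, and the
    /// sequence is the trailing `sequence_length` bytes of that total.
    pub fn overlapping_fields(&self) -> (usize, usize) {
        (
            self.prefix_length + self.sequence_length,
            self.sequence_length,
        )
    }
}

/// Returns true if a header length could be stored in a `u8` field.
pub fn fits_in_u8(header_length: usize) -> bool {
    header_length <= usize::from(u8::MAX)
}
```

For a layout with a 2-byte prefix and a 300-byte sequence, the disjoint header length is 2 and fits in a `u8`, while the overlapping header length is 302 and does not, which is the reason the field had to grow once it stored the total.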

zslayton · 2024-06-10T17:49:30Z

src/lazy/binary/raw/reader.rs

@@ -125,7 +125,7 @@ impl<'data> LazyRawReader<'data, BinaryEncoding_1_0> for LazyRawBinaryReader_1_0

    fn next<'top>(
        &'top mut self,
-        _allocator: &'top BumpAllocator,
+        _context: EncodingContextRef<'top>,


🗺️ The binary 1.0 reader is the only one that doesn't use anything from the encoding context (the allocator or the macro table) during parsing.

zslayton · 2024-06-10T17:52:25Z

src/lazy/binary/raw/v1_1/immutable_buffer.rs

+    }
+
+    #[test]
+    fn read_eexp_without_args() -> IonResult<()> {


🗺️ These tests confirm that the parser is capable of reading e-expressions based on the number of parameters in their signature. They do not perform any evaluation/expansion.
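To show what "reading based on the number of parameters" means, here is a deliberately simplified toy format, which is NOT the Ion 1.1 binary encoding: one address byte followed by exactly one byte per parameter. `ToyEExp`, `read_toy_eexp`, and the `param_counts` map are hypothetical names for this sketch. The point is only that the parser cannot know where the e-expression ends without first looking up the signature.

```rust
use std::collections::HashMap;

/// A parsed (but unevaluated) toy e-expression.
#[derive(Debug, PartialEq)]
pub struct ToyEExp<'a> {
    pub address: u8,
    pub args: &'a [u8],
}

/// Reads one toy e-expression from the front of `input`.
/// Returns the expression and the bytes that follow it.
pub fn read_toy_eexp<'a>(
    input: &'a [u8],
    param_counts: &HashMap<u8, usize>,
) -> Result<(ToyEExp<'a>, &'a [u8]), String> {
    // The first byte names the macro being invoked.
    let (&address, rest) = input
        .split_first()
        .ok_or_else(|| String::from("unexpected end of input"))?;

    // The signature lookup tells the parser how many arguments to expect.
    let param_count = *param_counts
        .get(&address)
        .ok_or_else(|| format!("no macro at address {}", address))?;

    if rest.len() < param_count {
        return Err(format!(
            "macro {} expects {} argument byte(s), found {}",
            address,
            param_count,
            rest.len()
        ));
    }

    // No expansion happens here; the arguments are only sliced out.
    let (args, remaining) = rest.split_at(param_count);
    Ok((ToyEExp { address, args }, remaining))
}
```

A macro with zero parameters consumes only its address byte, and an address that is missing from the table is a parse error rather than something that can be skipped.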

zslayton · 2024-06-10T18:23:03Z

src/lazy/reader.rs

+    #[test]
+    fn expand_binary_template_macro() -> IonResult<()> {
+        let macro_source = "(macro seventeen () 17)";
+        let encode_macro_fn = |address| vec![0xE0, 0x01, 0x01, 0xEA, address as u8];
+        expand_macro_test(macro_source, encode_macro_fn, |mut reader| {
+            assert_eq!(reader.expect_next()?.read()?.expect_i64()?, 17);
+            Ok(())
+        })
+    }


🗺️ The tests in this file are doing actual expansion of binary encoded e-expressions.

zslayton · 2024-06-10T18:25:38Z

src/lazy/text/buffer.rs

-                        Err(nom::Err::Failure(IonParseError::Invalid(error)))
-                    }
+        let (span, child_exprs) = match TextListSpanFinder_1_1::new(
+            self.context.allocator(),


🗺️ The change from self.allocator to self.context.allocator() caused rustfmt to reflow the entire expression.

zslayton · 2024-06-10T18:25:54Z

src/lazy/text/buffer.rs

-                        Err(nom::Err::Failure(IonParseError::Invalid(error)))
-                    }
+        let (span, fields) = match TextStructSpanFinder_1_1::new(
+            self.context.allocator(),


🗺️ Same here; rustfmt reflow.

zslayton · 2024-06-10T18:26:08Z

src/lazy/text/buffer.rs

-                        .with_description(format!("{}", e));
-                    Err(nom::Err::Failure(IonParseError::Invalid(error)))
+        let (span, child_expr_cache) =
+            match TextSExpSpanFinder_1_1::new(self.context.allocator(), sexp_iter)


🗺️ Reflow here too.

zslayton · 2024-06-10T18:27:12Z

src/lazy/text/raw/v1_1/reader.rs

@@ -34,6 +34,84 @@ pub struct LazyRawTextReader_1_1<'data> {
    local_offset: usize,
 }

+impl<'data> LazyRawReader<'data, TextEncoding_1_1> for LazyRawTextReader_1_1<'data> {


🗺️ I moved this impl block so it would be right below the definition of LazyRawTextReader_1_1. There are no logic changes.

nirosys · 2024-06-13T23:50:08Z

src/lazy/binary/raw/v1_1/immutable_buffer.rs

@@ -21,7 +29,7 @@ use std::ops::Range;
 /// and a copy of the `ImmutableBuffer` that starts _after_ the bytes that were parsed.
 ///
 /// Methods that `peek` at the input stream do not return a copy of the buffer.
-#[derive(PartialEq, Clone, Copy)]


Do you think there is any value in providing a PartialEq implementation that ignores the Context?

I think at one time I was using these buffers in unit tests and wanted assert_eq! to work. I don't think we're using them like that anymore, though. We can/should give it a new PartialEq impl like that when we need one.
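If such an impl is ever needed, it might look roughly like the following. This is a sketch under assumptions, not a proposal for the actual `ImmutableBuffer`: `SketchContext`, `ComparableBuffer`, and the `data`/`offset` fields are hypothetical stand-ins chosen for illustration.

```rust
/// Hypothetical stand-in for the encoding context.
#[derive(Debug)]
pub struct SketchContext {
    pub label: &'static str,
}

/// Hypothetical buffer that carries a context reference.
#[derive(Debug, Clone, Copy)]
pub struct ComparableBuffer<'a> {
    pub data: &'a [u8],
    pub offset: usize,
    pub context: &'a SketchContext,
}

/// Equality that deliberately ignores `context`: two buffers are equal when
/// they view the same bytes at the same offset, whichever context they hold.
impl PartialEq for ComparableBuffer<'_> {
    fn eq(&self, other: &Self) -> bool {
        self.data == other.data && self.offset == other.offset
    }
}
```

Because the impl is written by hand rather than derived, adding fields to the context later would not change how buffers compare in `assert_eq!`.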

zslayton added 13 commits June 7, 2024 10:46

Adds binary 1.1 roundtrip unit tests, has_annotations no longer invokes `annotations`

c78ae2f

Implements reading binary FlexSym annotations

344826c

Fixes timestamp decoding of offsets

8b08c4d

cargo fmt

b8b9e1d

1.1 binary writer now uses 1 for UTC and 0 for unknown

bfa0f03

fixed typo

b71b6e3

clippy suggestion

9470b5b

rename length_code to low_nibble

2cef814

adds low_nibble accessor

d564826

Stubs out binary e-expr, adds BumpAllocator ref to binary buffer

7b4f4a7

plumb EncodingContextRef into buffer types

e1f2ba8

EncodingContext now owns its fields

d19abeb

wires up binary macro evaluation

0f8c894

zslayton marked this pull request as draft June 10, 2024 16:58

zslayton added 2 commits June 10, 2024 14:23

cleanup

afc58cf

cargo fmt

b196461

zslayton commented Jun 10, 2024

zslayton marked this pull request as ready for review June 10, 2024 18:27

zslayton requested review from popematt and nirosys June 10, 2024 18:27

Base automatically changed from binary-1_1-roundtrip to main June 10, 2024 18:29

Merge branch 'main' into binary-1_1-eexp

bcc1b62

nirosys approved these changes Jun 14, 2024

zslayton merged commit caebdbd into main Jun 14, 2024
32 checks passed
