Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for encoding directives, arg groups, length-prefixed e-expressions #796

Merged
merged 14 commits into from
Jul 31, 2024

Conversation

zslayton
Copy link
Contributor

@zslayton zslayton commented Jul 9, 2024

  • Adds support for switching encodings midstream
  • Adds support for parameter cardinality modifiers
  • Adds support for reading argument groups
  • Adds support for length-prefixed e-expressions
  • Optimizes and simplifies macro evaluation
  • Simplifies struct expansion logic
  • Adds static analysis optimizations to template evaluation
  • Adds support for lazily evaluating top-level e-expressions
  • Adds more Ion 1.1 encodings to the read_many_structs benchmark
  • Shrinks many data types

* Adds support for switching encodings midstream
* Adds support for parameter cardinality modifiers
* Adds support for reading argument groups
* Adds support for length-prefixed e-expressions
* Optimizes and simplifies macro evaluation
* Simplifies struct expansion logic
* Adds static analysis optimizations to template evaluation
* Adds support for lazily evaluating top-level e-expressions
* Adds more Ion 1.1 encodings to the `read_many_structs` benchmark
* Shrinks many data types

Switching encodings midstream, initial support for encoding directives

ParameterCardinality, transcribe raw binary 1.1 to text, version switching

refactor transcription, groundwork for variadic params

Adds non-required cardinalities

Read support for binary arg grouping bitmap

binary reader support for arg groups

minor optimizations

allows benchmarks to manually register templates

reworked e-expr parsing

Binary Ion v1.1 uses &'top for value, eexp types

minor optimizations 2

clippy suggestions

per-type read paths for string, int

better

Reworks EncodedValue offset calculation to save on size

Adds support for length-prefixed e-exprs

simplified struct iteration, rewrote eexp parsing

further streamlining of read_sequence_value_expr

Optimization for singleton template macros

Reorganizes read_many_structs benchmark

OneShotMacroEvaluator, EExpArgIterator trait, fixed ion-tests integration

upgradeable MacroEvaluator (OneShot -> Stacked)

Using new macro evaluator at root level

Struct expansion uses new macro evaluator

Optimizations for macro expansion

* Adds `read_resolved` method to the LazyExpandedValue level so
  all Ion-type specific logic can be handled once instead of twice.
* `MacroExpansion::initialize` now delegates the entire implementation
  the invocation's source kinds instead of dispatching on every method call.
* Universal singleton macro expansion optimization support.

Revised TemplateCompiler storage

Shrank macro evaluator sizes

simplified macro evaluation logic

+"optimal path" eexp benchmark, more method inlining

s/quote/literal/, clean up read_many_structs benchmark

remove erroneous profile json

Adds try_next! macro for Option<Result<_>> early returns

cleanup

clippy suggestions

fix doc links

more clippy suggestions
@zslayton zslayton requested a review from popematt July 9, 2024 21:20
@zslayton
Copy link
Contributor Author

zslayton commented Jul 9, 2024

Miri is failing due to #794. I'd like to investigate that after this PR is merged.

Copy link
Contributor Author

@zslayton zslayton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ PR Tour 🧭

@@ -21,7 +21,7 @@ edition = "2021"
# We need at least 1.65 for GATs[1] and 1.67 for `ilog`[2]
# [1] https://blog.rust-lang.org/2022/11/03/Rust-1.65.0.html
# [2] https://blog.rust-lang.org/2023/01/26/Rust-1.67.0.html#stabilized-apis
rust-version = "1.67"
rust-version = "1.77"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Bumping this again while we still can--1.77 added some nice const generics functionality for breaking apart byte arrays.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while we still can

There's a cfg_version RFC that will help with this.

Also, if we haven't already done so, we should define a MSRV policy for this crate.

@@ -3,27 +3,262 @@ use criterion::{criterion_group, criterion_main};
#[cfg(not(feature = "experimental"))]
mod benchmark {
use criterion::Criterion;

pub fn criterion_benchmark(_c: &mut Criterion) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ I've added a variety of binary 1.1 encodings to the read_many_structs benchmark.

let text_1_1_data = r#"(:event 1670446800245 418 "scheduler-thread-6" "example-client-1" "aws-us-east-5f-18b4fa" (: "region 4" "2022-12-07T20:59:59.744000Z"))"#.repeat(num_values);
let mut binary_1_1_data = vec![0xE0u8, 0x01, 0x01, 0xEA]; // IVM
#[rustfmt::skip]
let mut binary_1_1_data_body: Vec<u8> = vec![
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ The binary writer doesn't yet support argument groups, so I've hand-encoded the binary for now.

src/element/sequence.rs Show resolved Hide resolved
@@ -126,13 +134,23 @@ impl<'top> HasRange for LazyRawAnyVersionMarker<'top> {
}

impl<'top> RawVersionMarker<'top> for LazyRawAnyVersionMarker<'top> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ In order to support switching encodings mid-stream, I reworked the RawVersionMarker trait and its impls. I'll call it out again further on. The version() method now returns an IonVersion enum, while major_minor() returns the (major, minor) version tuple.

use crate::{Decoder, HasRange, HasSpan, IonResult, LazyExpandedValue, Span};

#[derive(Copy, Clone, Debug)]
pub struct EExpArg<'top, D: Decoder> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ These encoding-agnostic types should also be in their own file/module.

"Ion version {major}.{minor} is not supported"
));
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ This is now supported; the logic lives in the LazyAnyRawReader.

}
};
Ok(value_ref)
self.expanded_value.read_resolved()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ As described earlier, putting the resolution logic closer to the read logic allowed the compiler to make much better optimizations.

Comment on lines +215 to +231
lazy::expanded::compiler::TemplateCompiler,
lazy::expanded::template::TemplateMacro,
lazy::expanded::template::TemplateBodyExpr,
lazy::expanded::template::TemplateBodyExprKind,
lazy::expanded::macro_table::Macro,
lazy::expanded::macro_evaluator::MacroEvaluator,
lazy::expanded::macro_evaluator::MacroExpansionKind,
lazy::expanded::macro_table::MacroKind,
lazy::expanded::macro_table::MacroTable,
lazy::expanded::EncodingContext,
lazy::any_encoding::IonVersion,
lazy::binary::raw::reader::LazyRawBinaryReader_1_0,
lazy::binary::raw::v1_1::reader::LazyRawBinaryReader_1_1,
lazy::expanded::macro_evaluator::RawEExpression,
lazy::expanded::macro_evaluator::ValueExpr,
lazy::expanded::macro_evaluator::MacroExpr,
lazy::expanded::macro_evaluator::MacroExprKind,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Additional types exposed behind the experimental-reader-writer feature.

.map(|p| p.replace('/', &PATH_SEPARATOR.to_string()))
.map(|p| p.replace('/', PATH_SEPARATOR))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ (wise) clippy suggestion.

@@ -21,7 +21,7 @@ edition = "2021"
# We need at least 1.65 for GATs[1] and 1.67 for `ilog`[2]
# [1] https://blog.rust-lang.org/2022/11/03/Rust-1.65.0.html
# [2] https://blog.rust-lang.org/2023/01/26/Rust-1.67.0.html#stabilized-apis
rust-version = "1.67"
rust-version = "1.77"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while we still can

There's a cfg_version RFC that will help with this.

Also, if we haven't already done so, we should define a MSRV policy for this crate.

benches/read_many_structs.rs Outdated Show resolved Hide resolved
src/lazy/any_encoding.rs Show resolved Hide resolved
}

#[derive(Debug, Default, Copy, Clone, PartialEq)]
pub enum IonVersion {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a common/utility type that should be exported from the crate root, and maybe even be defined in the root module so that it's easily discoverable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is re-exported at the root, but I agree it should live in its own module and/or at the root. I'd like to leave that for another PR though.

src/lazy/any_encoding.rs Outdated Show resolved Hide resolved
src/lazy/expanded/template.rs Outdated Show resolved Hide resolved
src/lazy/expanded/template.rs Outdated Show resolved Hide resolved
src/lazy/expanded/compiler.rs Outdated Show resolved Hide resolved
src/lazy/expanded/macro_evaluator.rs Outdated Show resolved Hide resolved
src/lazy/expanded/macro_evaluator.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@popematt popematt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving so that we can review PR feedback changes in smaller PRs.

@zslayton zslayton merged commit b245172 into main Jul 31, 2024
29 of 32 checks passed
@zslayton zslayton deleted the encoding-directives branch July 31, 2024 00:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants