
vrl-parser is a dependency of vrl-core and takes 130 seconds to build in a 150 second build #9547

Closed
blt opened this issue Oct 9, 2021 · 10 comments
Labels: domain: performance (Anything related to Vector's performance), type: enhancement (A value-adding code change that enhances its existing functionality)

Comments

@blt
Contributor

blt commented Oct 9, 2021

While working on #9531 I've found several opportunities to either remove dependencies or shuffle them around in the build order to improve build speed. See that ticket for details. When building Vector with cargo +nightly build -Z timings --no-default-features from 89e2b08 I find that the total build time on my system is 150 seconds and 130 seconds of that is vrl-parser. I haven't investigated why this crate is slow to build -- though lalrpop/lalrpop#65 and lalrpop/lalrpop#260 probably give an indication -- but it dominates build time.

Top-level vector requires vector-core/vrl, which in turn depends on vrl-core. The dependency tree of vrl has a diamond in it:

core -> {parser, compiler, diagnostic}
compiler -> {parser, diagnostic}
parser -> {diagnostic}
diagnostic -> {}
stdlib -> {core}
cli -> {core}

Reordering VRL internals to remove this diamond, or feature-flagging out pieces that are only needed when calling code actually parses VRL, would substantially improve our build speed. We might also chase down whatever makes the parser itself so slow to build, but that's probably a larger project considering the open tickets linked above.
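To make the flag-out idea concrete, here is a minimal sketch of what gating the parser behind an opt-in Cargo feature could look like. The `parsing` feature name, module names, and Cargo.toml snippet are hypothetical, not the actual crate layout:

// Hypothetical vrl-core/src/lib.rs sketch, assuming an opt-in `parsing`
// feature. Crates that only need the runtime/diagnostic pieces would
// depend on vrl-core without the feature and never compile the
// lalrpop-generated parser.
//
// Cargo.toml sketch:
//   [dependencies]
//   vrl-parser = { path = "../parser", optional = true }
//
//   [features]
//   parsing = ["vrl-parser"]

/// Runtime-only functionality builds whether or not the feature is set.
pub mod runtime {
    pub fn resolve() {
        // ...evaluate an already-compiled program...
    }
}

/// Parsing support, and with it the expensive vrl-parser dependency, is
/// compiled only for callers that turn VRL source text into a program.
#[cfg(feature = "parsing")]
pub mod parsing {
    pub fn parse(_source: &str) {
        // ...delegate to the optional vrl-parser dependency here...
    }
}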

@blt added the type: enhancement and domain: performance labels on Oct 9, 2021
@blt
Contributor Author

blt commented Oct 9, 2021

/cc @StephenWakely @JeanMertz

@blt
Contributor Author

blt commented Oct 11, 2021

Out of an abundance of curiosity I applied this diff to the project:

diff --git a/lib/vrl/parser/build.rs b/lib/vrl/parser/build.rs
index ff4095594..a94f9ab0f 100644
--- a/lib/vrl/parser/build.rs
+++ b/lib/vrl/parser/build.rs
@@ -1,9 +1,11 @@
 extern crate lalrpop;

 fn main() {
-    println!("cargo:rerun-if-changed=src/parser.lalrpop");
     lalrpop::Configuration::new()
-        .always_use_colors()
+        .generate_in_source_tree()
+        .emit_rerun_directives(true)
+        .emit_comments(true)
+        .emit_whitespace(true)
         .process_current_dir()
         .unwrap();
 }
diff --git a/lib/vrl/parser/src/lib.rs b/lib/vrl/parser/src/lib.rs
index 74b8defc1..a61b74e05 100644
--- a/lib/vrl/parser/src/lib.rs
+++ b/lib/vrl/parser/src/lib.rs
@@ -1,9 +1,10 @@
 use lalrpop_util::lalrpop_mod;
-lalrpop_mod!(
-    #[allow(clippy::all)]
-    #[allow(unused)]
-    parser
-);
+// lalrpop_mod!(
+//     #[allow(clippy::all)]
+//     #[allow(unused)]
+//     parser
+// );
+mod parser;

 pub mod ast;
 mod lex;

and find that the generated parser.rs is 250k lines long. That's roughly the same size as vector itself, so that, uh, explains the long compile time for this parser.

@StephenWakely
Contributor

This makes me curious as to what those lines are doing.

@blt
Contributor Author

blt commented Oct 11, 2021

A good deal of them seem to be lookup tables, massive match statements, and lookup structures of some other sort.
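For a sense of why that shape of code is slow to compile, here is a toy illustration (not actual lalrpop output) of the kind of thing a table-driven parser generator emits: large static lookup tables plus exhaustive matches over (state, token) pairs, all of which rustc still has to parse, type-check, and codegen:

// Toy illustration only; real lalrpop output is far larger and differs in
// detail. The point is the shape: big static action tables and huge match
// statements, multiplied across every parser state and terminal.

// One row per parser state, one column per terminal; a real grammar has
// thousands of rows like this.
const ACTION: &[i8] = &[
    2, 0, -1, 3, 0, 0, 4, -2, 0, 1, //
    0, 5, 0, 0, -3, 6, 0, 0, 7, 0, //
];

// In generated code this kind of match runs to tens of thousands of arms.
fn action(state: usize, token: usize) -> i8 {
    match (state, token) {
        (0, 0) => 2,
        (0, 2) => -1,
        (1, 4) => -3,
        _ => ACTION[state * 10 + token],
    }
}

fn main() {
    println!("action(1, 4) = {}", action(1, 4));
}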

@JeanMertz
Contributor

One thing that has crept in over time is that we added more pub rules to the parser grammar. We can remove those by having a single top-level pub rule, which should cut down on the generated code significantly.
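For background on why each extra pub rule is costly: every pub nonterminal makes lalrpop emit its own entry point along with its own slice of generated machinery, which is why collapsing to a single top-level rule should shrink the output. A heavily simplified sketch of that shape, with illustrative names and bodies (only the XParser::new()/parse() convention is lalrpop's):

// Heavily simplified sketch of what lalrpop generates per `pub` rule; the
// real output also contains the action/goto tables, reduce functions, and
// token plumbing for each entry point, which is where the bulk of the
// generated lines comes from.

pub struct Program; // stand-in for the real AST type
pub struct Query; // stand-in for the real AST type

// `pub Program: Program = { ... }` in the grammar produces roughly:
pub struct ProgramParser;

impl ProgramParser {
    pub fn new() -> Self {
        ProgramParser
    }

    pub fn parse(&self, _input: &str) -> Result<Program, String> {
        // ...this entry point's own state tables and match arms...
        unimplemented!()
    }
}

// A second `pub` rule means a second parser struct *and* a second pile of
// generated machinery:
pub struct QueryParser;

impl QueryParser {
    pub fn new() -> Self {
        QueryParser
    }

    pub fn parse(&self, _input: &str) -> Result<Query, String> {
        unimplemented!()
    }
}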

I'll assign myself to this issue so that I can work on it sometime in the next couple of weeks.

@JeanMertz JeanMertz self-assigned this Oct 12, 2021
@JeanMertz
Contributor

Here's an example of how we can solve this (or at least how we can shrink the generated code; I'm unsure how significant the difference will be, but it is listed as a possible solution to code-generation bloat):

https://github.com/lalrpop/lalrpop/pull/414/files#diff-9c6190d13e89889ea210c578c8a819c49ec247b652175aa4f2c432907f6a141cR13-R19

pub Top: Top = {
    "StartGrammar" <Grammar> => Top::Grammar(<>),
    "StartPattern" <Pattern> => Top::Pattern(<>),
    "StartMatchMapping" <MatchMapping> => Top::MatchMapping(<>),
    "StartTypeRef" <TypeRef> => Top::TypeRef(<>),
    "StartGrammarWhereClauses" <GrammarWhereClauses> => Top::GrammarWhereClauses(<>),
};

@JeanMertz
Contributor

JeanMertz commented Oct 12, 2021

Also note that I already applied this technique to our testing infrastructure:

// This nonterminal exists to aid in unit-testing. It exposes individual rules
// through the "t ..." declaration, to allow testing individual rules without
// having to generate parser functions for each rule, which kills build-times
// and exposes entries into the parser that we don't want or need.
pub Test: Test = {
    // root
    "?r" <Expr> => Test::Expr(<>),
    // expressions
    "?e" <Literal> => Test::Literal(<>),
    "?e" <Container> => Test::Container(<>),
    // arithmetic (math)
    "?m" <ArithmeticExpr> => Test::Arithmetic(<>),
    // atoms
    "?a" <String> => Test::String(<>),
    "?a" <Integer> => Test::Integer(<>),
    "?a" <Float> => Test::Float(<>),
    "?a" <Boolean> => Test::Boolean(<>),
    "?a" <Null> => Test::Null(()),
    "?a" <Regex> => Test::Regex(<>),
    // collections
    "?c" <Block> => Test::Block(<>),
    "?c" <Array> => Test::Array(<>),
    "?c" <Object> => Test::Object(<>),
    // other
    "?as" <Assignment> => Test::Assignment(<>),
    "?fn" <FunctionCall> => Test::FunctionCall(<>),
    "?q" <Query> => Test::Query(<>),
};

We basically have to do this for all of our pub rules (Program, Test, Query, Field, Literal): make one top-level rule and have it fan out to multiple non-pub rules based on a chosen prefix grammar.
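To keep typed entry points at the call sites while only generating one parser, the fan-out could be wrapped behind small helper functions. A rough sketch with every name hypothetical (the real crate has its own lexer, AST types, and error handling):

// Hypothetical call-site wrappers around a single generated entry point
// (say, one `pub Root` rule). lalrpop then only generates one parser, and
// the prefix marker picks which non-pub rule the input hits. None of
// these names are the actual VRL API.

pub struct Program; // stand-ins for the real AST types
pub struct Query;

pub enum Root {
    Program(Program),
    Query(Query),
    // ...one variant per former `pub` rule...
}

fn parse_root(_marked_source: &str) -> Result<Root, String> {
    // ...would hand the marked source to the generated `RootParser` here...
    unimplemented!()
}

/// Parse a full program by prefixing a (hypothetical) "?p" marker and
/// unwrapping the matching variant.
pub fn parse_program(source: &str) -> Result<Program, String> {
    match parse_root(&format!("?p {}", source))? {
        Root::Program(p) => Ok(p),
        _ => unreachable!("the ?p marker selects the Program variant"),
    }
}

/// The same pattern works for path queries, reusing the "?q" marker from
/// the Test rule above.
pub fn parse_query(source: &str) -> Result<Query, String> {
    match parse_root(&format!("?q {}", source))? {
        Root::Query(q) => Ok(q),
        _ => unreachable!("the ?q marker selects the Query variant"),
    }
}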

@JeanMertz
Contributor

Also, just to make sure we're on the same page: this should only happen on the first build. The parser isn't re-generated on incremental builds as long as the grammar isn't updated, although I realize that isn't much of a solution when you want to do a fresh build (or when you have no choice, such as in some CI set-ups).

@blt
Contributor Author

blt commented Oct 12, 2021 via email

blt added a commit that referenced this issue Oct 27, 2021
This commit introduces a matrixed workflow to build images for soak
tests. It is not dynamic but does function. Build times are on the
order of 10 minutes per soak and we might want to consider whether
caching between soaks can be improved, though they all have different
feature flags in play. Improving #9547 would also help quite a bit here.

Closes #9531

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>
@blt blt mentioned this issue Oct 27, 2021
@fuchsnj
Member

fuchsnj commented Jun 21, 2023

These are both VRL dependencies, which are no longer in this repo. The compilation time for VRL has also improved since then, so I'm going to close this.

@fuchsnj fuchsnj closed this as completed Jun 21, 2023