
vrl-parser is a dependency of vrl-core and takes 130 seconds to build in a 150 second build #9547

Closed
blt opened this issue Oct 9, 2021 · 10 comments
Labels: domain: performance (Anything related to Vector's performance), type: enhancement (A value-adding code change that enhances its existing functionality)

Comments

@blt
Contributor

blt commented Oct 9, 2021

While working on #9531 I've found several opportunities to either remove dependencies or shuffle them around in the build order to improve build speed. See that ticket for details. When building Vector with cargo +nightly build -Z timings --no-default-features from 89e2b08 I find that the total build time on my system is 150 seconds and 130 seconds of that is vrl-parser. I haven't investigated why this crate is slow to build -- though lalrpop/lalrpop#65 and lalrpop/lalrpop#260 probably give an indication -- but it dominates build time.

Top-level vector requires vector-core/vrl, which in turn depends on vrl-core. The dependency tree of vrl has a diamond in it:

core -> {parser, compiler, diagnostic}
compiler -> {parser, diagnostic}
parser -> {diagnostic}
diagnostic -> {}
stdlib -> {core}
cli -> {core}

Reordering VRL internals to remove this diamond, or feature-flagging out pieces that are only needed when calling code actually parses VRL, would substantially improve our build speed. We might also chase down whatever makes the parser itself so slow to build, but that's probably a larger project considering the open tickets linked above.
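To make the flag-out idea concrete, here is a minimal sketch of what gating the parser behind an opt-in Cargo feature could look like. The `parsing` feature name, module names, and Cargo.toml snippet are hypothetical, not the actual crate layout:

// Hypothetical vrl-core/src/lib.rs sketch, assuming an opt-in `parsing`
// feature. Crates that only need the runtime/diagnostic pieces would
// depend on vrl-core without the feature and never compile the
// lalrpop-generated parser.
//
// Cargo.toml sketch:
//   [dependencies]
//   vrl-parser = { path = "../parser", optional = true }
//
//   [features]
//   parsing = ["vrl-parser"]

/// Runtime-only functionality builds whether or not the feature is set.
pub mod runtime {
    pub fn resolve() {
        // ...evaluate an already-compiled program...
    }
}

/// Parsing support, and with it the expensive vrl-parser dependency, is
/// compiled only for callers that turn VRL source text into a program.
#[cfg(feature = "parsing")]
pub mod parsing {
    pub fn parse(_source: &str) {
        // ...delegate to the optional vrl-parser dependency here...
    }
}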

@blt added the type: enhancement and domain: performance labels on Oct 9, 2021
@blt
Contributor Author

blt commented Oct 9, 2021

/cc @StephenWakely @JeanMertz

@blt
Contributor Author

blt commented Oct 11, 2021

Out of an abundance of curiosity I applied this diff to the project:

diff --git a/lib/vrl/parser/build.rs b/lib/vrl/parser/build.rs
index ff4095594..a94f9ab0f 100644
--- a/lib/vrl/parser/build.rs
+++ b/lib/vrl/parser/build.rs
@@ -1,9 +1,11 @@
 extern crate lalrpop;

 fn main() {
-    println!("cargo:rerun-if-changed=src/parser.lalrpop");
     lalrpop::Configuration::new()
-        .always_use_colors()
+        .generate_in_source_tree()
+        .emit_rerun_directives(true)
+        .emit_comments(true)
+        .emit_whitespace(true)
         .process_current_dir()
         .unwrap();
 }
diff --git a/lib/vrl/parser/src/lib.rs b/lib/vrl/parser/src/lib.rs
index 74b8defc1..a61b74e05 100644
--- a/lib/vrl/parser/src/lib.rs
+++ b/lib/vrl/parser/src/lib.rs
@@ -1,9 +1,10 @@
 use lalrpop_util::lalrpop_mod;
-lalrpop_mod!(
-    #[allow(clippy::all)]
-    #[allow(unused)]
-    parser
-);
+// lalrpop_mod!(
+//     #[allow(clippy::all)]
+//     #[allow(unused)]
+//     parser
+// );
+mod parser;

 pub mod ast;
 mod lex;

and find that the generated parser.rs is 250k lines long. That's roughly the same size as vector itself, so that, uh, explains the long compile time for this parser.

@StephenWakely
Contributor

This makes me curious as to what those lines are doing.

@blt
Contributor Author

blt commented Oct 11, 2021

A good deal of them seem to be lookup tables, massive match statements, and lookup structures of some other sort.
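For a sense of why that shape of code is slow to compile, here is a toy illustration (not actual lalrpop output) of the kind of thing a table-driven parser generator emits: large static lookup tables plus exhaustive matches over (state, token) pairs, all of which rustc still has to parse, type-check, and codegen:

// Toy illustration only; real lalrpop output is far larger and differs in
// detail. The point is the shape: big static action tables and huge match
// statements, multiplied across every parser state and terminal.

// One row per parser state, one column per terminal; a real grammar has
// thousands of rows like this.
const ACTION: &[i8] = &[
    2, 0, -1, 3, 0, 0, 4, -2, 0, 1, //
    0, 5, 0, 0, -3, 6, 0, 0, 7, 0, //
];

// In generated code this kind of match runs to tens of thousands of arms.
fn action(state: usize, token: usize) -> i8 {
    match (state, token) {
        (0, 0) => 2,
        (0, 2) => -1,
        (1, 4) => -3,
        _ => ACTION[state * 10 + token],
    }
}

fn main() {
    println!("action(1, 4) = {}", action(1, 4));
}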

@JeanMertz
Contributor

One thing that has crept in over time is that we added more pub rules to the parser grammar. We can remove those by having a single top-level pub rule, which should cut down on the generated code significantly.
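For background on why each extra pub rule is costly: every pub nonterminal makes lalrpop emit its own entry point along with its own slice of generated machinery, which is why collapsing to a single top-level rule should shrink the output. A heavily simplified sketch of that shape, with illustrative names and bodies (only the XParser::new()/parse() convention is lalrpop's):

// Heavily simplified sketch of what lalrpop generates per `pub` rule; the
// real output also contains the action/goto tables, reduce functions, and
// token plumbing for each entry point, which is where the bulk of the
// generated lines comes from.

pub struct Program; // stand-in for the real AST type
pub struct Query; // stand-in for the real AST type

// `pub Program: Program = { ... }` in the grammar produces roughly:
pub struct ProgramParser;

impl ProgramParser {
    pub fn new() -> Self {
        ProgramParser
    }

    pub fn parse(&self, _input: &str) -> Result<Program, String> {
        // ...this entry point's own state tables and match arms...
        unimplemented!()
    }
}

// A second `pub` rule means a second parser struct *and* a second pile of
// generated machinery:
pub struct QueryParser;

impl QueryParser {
    pub fn new() -> Self {
        QueryParser
    }

    pub fn parse(&self, _input: &str) -> Result<Query, String> {
        unimplemented!()
    }
}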

I'll assign myself to this issue so that I can work on it sometime in the next couple of weeks.

@JeanMertz JeanMertz self-assigned this Oct 12, 2021
@JeanMertz
Contributor

Here's an example of how we can solve this (or at least how we can shrink the generated code; I'm unsure how significant the difference will be, but it is listed as a possible solution to code-generation bloat):

https://github.com/lalrpop/lalrpop/pull/414/files#diff-9c6190d13e89889ea210c578c8a819c49ec247b652175aa4f2c432907f6a141cR13-R19

pub Top: Top = {
    "StartGrammar" <Grammar> => Top::Grammar(<>),
    "StartPattern" <Pattern> => Top::Pattern(<>),
    "StartMatchMapping" <MatchMapping> => Top::MatchMapping(<>),
    "StartTypeRef" <TypeRef> => Top::TypeRef(<>),
    "StartGrammarWhereClauses" <GrammarWhereClauses> => Top::GrammarWhereClauses(<>),
};

@JeanMertz
Contributor

JeanMertz commented Oct 12, 2021

Also note that I already applied this technique to our testing infrastructure:

// This nonterminal exists to aid in unit-testing. It exposes individual rules
// through the "t ..." declaration, to allow testing individual rules without
// having to generate parser functions for each rule, which kills build-times
// and exposes entries into the parser that we don't want or need.
pub Test: Test = {
    // root
    "?r" <Expr> => Test::Expr(<>),
    // expressions
    "?e" <Literal> => Test::Literal(<>),
    "?e" <Container> => Test::Container(<>),
    // arithmetic (math)
    "?m" <ArithmeticExpr> => Test::Arithmetic(<>),
    // atoms
    "?a" <String> => Test::String(<>),
    "?a" <Integer> => Test::Integer(<>),
    "?a" <Float> => Test::Float(<>),
    "?a" <Boolean> => Test::Boolean(<>),
    "?a" <Null> => Test::Null(()),
    "?a" <Regex> => Test::Regex(<>),
    // collections
    "?c" <Block> => Test::Block(<>),
    "?c" <Array> => Test::Array(<>),
    "?c" <Object> => Test::Object(<>),
    // other
    "?as" <Assignment> => Test::Assignment(<>),
    "?fn" <FunctionCall> => Test::FunctionCall(<>),
    "?q" <Query> => Test::Query(<>),
};

We basically have to do this for all of our pub rules (Program, Test, Query, Field, Literal): make one top-level rule and have it fan out to multiple non-pub rules based on a chosen prefix grammar.
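To keep typed entry points at the call sites while only generating one parser, the fan-out could be wrapped behind small helper functions. A rough sketch with every name hypothetical (the real crate has its own lexer, AST types, and error handling):

// Hypothetical call-site wrappers around a single generated entry point
// (say, one `pub Root` rule). lalrpop then only generates one parser, and
// the prefix marker picks which non-pub rule the input hits. None of
// these names are the actual VRL API.

pub struct Program; // stand-ins for the real AST types
pub struct Query;

pub enum Root {
    Program(Program),
    Query(Query),
    // ...one variant per former `pub` rule...
}

fn parse_root(_marked_source: &str) -> Result<Root, String> {
    // ...would hand the marked source to the generated `RootParser` here...
    unimplemented!()
}

/// Parse a full program by prefixing a (hypothetical) "?p" marker and
/// unwrapping the matching variant.
pub fn parse_program(source: &str) -> Result<Program, String> {
    match parse_root(&format!("?p {}", source))? {
        Root::Program(p) => Ok(p),
        _ => unreachable!("the ?p marker selects the Program variant"),
    }
}

/// The same pattern works for path queries, reusing the "?q" marker from
/// the Test rule above.
pub fn parse_query(source: &str) -> Result<Query, String> {
    match parse_root(&format!("?q {}", source))? {
        Root::Query(q) => Ok(q),
        _ => unreachable!("the ?q marker selects the Query variant"),
    }
}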

@JeanMertz
Contributor

Also, just to make sure we're on the same page: this should only happen on the first build. The parser isn't re-generated on incremental builds as long as the grammar isn't updated, although I realize that isn't much of a solution when you want to do a fresh build (or when you have no choice, such as in some CI set-ups).

@blt
Contributor Author

blt commented Oct 12, 2021 via email

blt added a commit that referenced this issue Oct 27, 2021
This commit introduces a matrixed workflow to build images for soak
tests. It is not dynamic but does function. Build times are on the
order of 10 minutes per soak and we might want to consider whether
caching between soaks can be improved, though they all have different
feature flags in play. Improving #9547 would also help quite a bit here.

Closes #9531

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>
@blt blt mentioned this issue Oct 27, 2021
@fuchsnj
Member

fuchsnj commented Jun 21, 2023

These are both VRL dependencies, which are no longer in this repo. The compilation time for VRL has also improved since then, so I'm going to close this.

@fuchsnj fuchsnj closed this as completed Jun 21, 2023