Painless Compiler Extensibility #53702

Open
stu-elastic opened this issue Mar 18, 2020 · 2 comments
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache >refactoring Team:Core/Infra Meta label for core/infra team

Comments

@stu-elastic
Contributor

Reworking the internal structure of the Painless compiler will let it produce faster code and integrate more cleanly with other query languages.

Changing the structure allows us to:

  • Substantially increase runtime performance
  • Integrate other frontends

After this work, we will be able to:

  • Increase Painless runtime performance to be comparable to expressions (~20% improvement)
  • Allow SQL to use the Painless compiler without awkward workarounds

The monolithic implementation of the existing compiler complicates both of these goals.

We will change the implementation of the compiler to a modern structure:

Frontend -> Intermediate Representation -> Compiler Phases -> Backend

This will allow us to incrementally add performance enhancements and provide an obvious integration point for other languages (they provide the frontend & initial IR).
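
As a rough illustration of that shape, here is a minimal Java sketch. Every name in it (`PipelineSketch`, `Frontend`, `Phase`, `Backend`) is hypothetical and not taken from the Painless code base, and the IR is reduced to a list of strings purely to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only: these types do not exist in Painless.
final class PipelineSketch {

    /** A frontend turns source text into the initial IR (here, a list of instruction names). */
    interface Frontend {
        List<String> toIr(String source);
    }

    /** A compiler phase takes IR and returns rewritten IR. */
    interface Phase {
        List<String> run(List<String> ir);
    }

    /** A backend turns the final IR into output (here, a single string). */
    interface Backend {
        String emit(List<String> ir);
    }

    /** Runs frontend, then each phase in order, then the backend. */
    static String compile(String source, Frontend frontend, List<Phase> phases, Backend backend) {
        List<String> ir = frontend.toIr(source);
        for (Phase phase : phases) {
            ir = phase.run(ir);
        }
        return backend.emit(ir);
    }

    /** Toy frontend: one IR instruction per whitespace-separated token. */
    static final Frontend WHITESPACE_FRONTEND =
        source -> Arrays.asList(source.trim().split("\\s+"));

    /** Toy phase: removes "nop" instructions. */
    static final Phase DROP_NOPS = ir -> {
        List<String> out = new ArrayList<>();
        for (String op : ir) {
            if (!op.equals("nop")) {
                out.add(op);
            }
        }
        return out;
    };

    /** Toy backend: joins the instructions with semicolons. */
    static final Backend JOIN_BACKEND = ir -> String.join(";", ir);
}
```

In this sketch, another language would only need to supply its own `Frontend`; the phases and backend are shared. New optimizations are added by appending a `Phase` to the list.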

Background:

Painless uses a single tree with several embedded phases. Each of the following can happen in the same phase or across multiple phases:

  • semantic checking
  • performance improvements
  • bytecode generation
  • collecting appropriate information for determining availability and cacheability of certain input parameters

This design has reached its limits and has several problems:

  • The nodes contained a significant amount of mutable state that changed between phases.
  • The tree itself was mutable due to removal of nodes for constant folding and additional nodes for injection of class scope functions and fields and casting.
  • Tree mutability left certain portions of the tree in possibly different states during a single phase traversal.
  • Together, these issues made it very difficult to add performance improvements or to extend the compiler for use in other areas such as SQL.
  • Small changes bleed throughout the implementation, and the very large state space complicates maintainability.

In progress:

The tree is currently split into a "user" tree and an "IR" tree.

User Tree:
  • Representative of direct input from the generation source, the script author.
  • Nearly immutable at this point, with some work left to make it fully immutable.
  • Must be checked for semantic validity.
  • Used to generate an IR tree directly. In future work, we will explore adding an extensibility point to produce other types of serialization such as JSON for additional debugging features.
IR Tree:
  • A mutable intermediate representation used by compiler phases to optimize runtime performance.
  • Immutability may make sense, but it must be investigated to avoid GC issues.
  • Is semantically valid, which allows for easier modification.
  • Generates bytecode to create a Java class.
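
The split between the two trees might look roughly like the following Java sketch. The class names (`UserAdd`, `IrAdd`) are invented for illustration and are not the actual Painless node classes. The sketch assumes the user node performs its own semantic check while producing the IR node.

```java
// Hypothetical sketch only: UserAdd and IrAdd are not real Painless node classes.
final class UserToIrSketch {

    /** Immutable user-tree node: an addition of two integer literals as the script author wrote them. */
    static final class UserAdd {
        final String left;
        final String right;

        UserAdd(String left, String right) {
            this.left = left;
            this.right = right;
        }

        /** Checks semantic validity and, if valid, builds a new IR node. The user node is not modified. */
        IrAdd toIr() {
            return new IrAdd(checkInt(left), checkInt(right));
        }

        private static int checkInt(String literal) {
            try {
                return Integer.parseInt(literal);
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("invalid int literal [" + literal + "]", e);
            }
        }
    }

    /** Mutable IR node; it only ever holds values that already passed the semantic check. */
    static final class IrAdd {
        int left;
        int right;

        IrAdd(int left, int right) {
            this.left = left;
            this.right = right;
        }
    }
}
```

Because the IR node is only created after the check succeeds, later phases in this sketch can rewrite it without re-validating the script.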

Outcomes:

  • An immutable user tree which is fully representative of the original script.
  • An API that allows this tree to be transformed into any type of serialization.
  • New external performance phases for the IR tree, such as script context-specific optimizations.
    • If a doc is read-only, we can propagate it as a constant and avoid unnecessary map lookups.
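
One possible reading of the read-only doc example is sketched below in Java. It assumes a hypothetical external phase that, when the script context declares the doc read-only, keeps the first lookup of each field and replaces repeated lookups with reads from a cached slot. All names (`DocLookup`, `CachedRead`, `run`) are invented for illustration, and the real optimization may work quite differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these types exist in Painless.
final class ReadOnlyDocPhaseSketch {

    abstract static class IrNode {}

    /** IR node standing for a map lookup such as doc['field']. */
    static final class DocLookup extends IrNode {
        final String field;

        DocLookup(String field) {
            this.field = field;
        }
    }

    /** IR node standing for a read of a previously looked-up value. */
    static final class CachedRead extends IrNode {
        final int slot;

        CachedRead(int slot) {
            this.slot = slot;
        }
    }

    /** Keeps the first lookup of each field and replaces repeats, but only for read-only docs. */
    static List<IrNode> run(List<IrNode> ir, boolean docIsReadOnly) {
        if (!docIsReadOnly) {
            return ir; // values may change between lookups, so nothing is rewritten
        }
        Map<String, Integer> slots = new HashMap<>();
        List<IrNode> out = new ArrayList<>();
        for (IrNode node : ir) {
            if (node instanceof DocLookup) {
                String field = ((DocLookup) node).field;
                Integer slot = slots.get(field);
                if (slot == null) {
                    slots.put(field, slots.size()); // first lookup: keep it and assign a slot
                    out.add(node);
                } else {
                    out.add(new CachedRead(slot)); // repeat lookup: reuse the cached value
                }
            } else {
                out.add(node);
            }
        }
        return out;
    }
}
```

The point of the sketch is that the phase lives outside the core compiler and is driven by knowledge only the script context has.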

Related PRs:
#51278
#51452
#51690
#51776
#51954
#52612
#52783
#52915
#53075
#53348
#53685

Related Issues:
#49870
#49869

@stu-elastic stu-elastic added :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache >refactoring labels Mar 18, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra (:Core/Infra/Scripting)

jdconrad added a commit that referenced this issue Apr 6, 2020
…4443)

This removes the "statement" field from the user tree expression nodes 
output. Instead, we delegate responsibility to check whether or not 
something is a statement to each individual user tree expression node 
based on whether or not the result is read from using input from the 
parent. This removes unnecessary state from the output, is more reliable 
as each node can determine its own correct behavior, and results in better 
"not a statement" error messages to give the user an idea of what is 
considered not a statement.

Relates #53702
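
The idea in this commit message can be sketched in Java as follows. This is an assumption-laden illustration, not the actual change: the class names and the `analyze(boolean read)` signature are hypothetical. The parent passes down whether the result is read, and each node decides for itself whether it is valid as a statement.

```java
// Hypothetical sketch only: these are not the real Painless user tree classes.
final class StatementCheckSketch {

    abstract static class Expression {
        /** @param read true if the parent consumes this expression's result */
        abstract void analyze(boolean read);
    }

    /** An assignment has a side effect, so it is valid whether or not its result is read. */
    static final class Assignment extends Expression {
        @Override
        void analyze(boolean read) {
            // nothing to reject
        }
    }

    /** An addition has no side effect, so an unread result means it is not a statement. */
    static final class Addition extends Expression {
        @Override
        void analyze(boolean read) {
            if (!read) {
                throw new IllegalArgumentException("not a statement: result of [+] operation not used");
            }
        }
    }

    /** An expression used as a statement is analyzed with read = false. */
    static void analyzeExpressionStatement(Expression expression) {
        expression.analyze(false);
    }
}
```

Since each node raises its own error, the message can name the specific construct that was rejected.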
jdconrad added a commit that referenced this issue Apr 21, 2020
The ILambda user tree node is no longer necessary with the addition of the ir 
tree as it was only supporting writing the ASM, but had no bearing on semantic 
checking. This moves the code necessary to build the def lambda recipes into 
the ir tree since we know that they are correct for compile time and the 
information necessary to do this is already known by the correct ir nodes.

This change also splits up the ir reference nodes into 
DefReferenceInterfaceNode, TypedInterfaceReferenceNode, and 
TypedCaptureReferenceNode, which cover all possible cases of known 
information at compile time for method references. Each of these nodes is built 
by the user tree nodes as necessary.

Relates #53702
Closes #54015
jdconrad added a commit that referenced this issue May 4, 2020
PainlessCast currently exists as mutable state on the AExpression node, but this 
is no longer necessary as each cast is only used directly in the semantic pass 
after its creation. This change moves it to be local state during the semantic 
pass as opposed to mutable state on the nodes.

Relates #53702
@rjernst rjernst added the Team:Core/Infra Meta label for core/infra team label May 4, 2020
@rjernst rjernst added the needs:triage Requires assignment of a team area label label Dec 3, 2020
@stu-elastic stu-elastic removed the needs:triage Requires assignment of a team area label label Dec 9, 2020
@stu-elastic
Contributor Author

Still in progress.

4 participants