Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#992: added design doc for script options #477

Open
wants to merge 18 commits into
base: master
Choose a base branch
from

Conversation

tomuben
Copy link
Collaborator

@tomuben tomuben commented Nov 15, 2024

@tomuben tomuben force-pushed the doc/992_add_design_doc_for_script_options branch from 3522853 to 0a2684d Compare November 15, 2024 14:59
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you create these? If it is drawio, please use the extension drawio.png otherwise it is not clear how to change them and the tooling won't recognize them.


#### Legacy Parser

The legacy parser (V1) parser searches for one specific script option. The parser starts from beginning of the script code. If found, the parser immediately removes the script option from the script code and returns the option value. It validates the
Copy link
Collaborator

@tkilias tkilias Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a narrow interpretation of arc42, the removal would be part of the runtime view and not the building blocks.


Tags: V2

### General Parser Integration
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for the DB

@tomuben tomuben force-pushed the doc/992_add_design_doc_for_script_options branch from b6dca79 to 207e27f Compare November 22, 2024 16:15
@@ -0,0 +1,27 @@
#!/bin/bash
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should at some Point replace this with a nox session, please create an issue

## Constraints

- The parser implementation must be in C++.
- The chosen parser implementation is [ctpg](https://github.com/peter-winter/ctpg), which supports the definition of Lexer and Parser Rules in C++ code.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a constraint, but a decision that follows from the constraints


![Components](diagrams/OveralScriptOptionalsBuildingBlocks.drawio.png)

At the very high level there can be distinguished between the generic "Script Options Parser" module which parses a UDF script code and returns the found script options, and the "Script Options Parser Handler" which converts the Java UDF specific script options. In both modules there are specific implementation for the legacy parser and the new CTPG based parser.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add reason, why we need to implementation


### Script Options Parser

The parser component can be used to parse any script code (Java, Python, R) for any script options. It provides simplistic interfaces, which are different between the two versions, which accept the script code as input and return the found script option(s).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add reason, why the interfaces are different. Here, they work inherently different


As the parser needs to find script options in any given script code, the generated parser must accept any strings which are not script options and ignore those. In order to achieve this, the lexer rules need to be as simple as possible, in order to avoid collisions.

It is important to emphasize that in contrast to the legacy parser, the caller is responsible for removing the script options from the script code.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a good reason?


![LegacyParserHandler](diagrams/LegacyParserHandler.drawio.png)

The `ScriptOptionsLinesParserLegacy` class uses the Parser to search for Java specific script options and forwards the found options to class `ConverterLegacy`, which uses a common implementation for the conversion of the options.
Copy link
Collaborator

@tkilias tkilias Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The `ScriptOptionsLinesParserLegacy` class uses the Parser to search for Java specific script options and forwards the found options to class `ConverterLegacy`, which uses a common implementation for the conversion of the options.
The `ScriptOptionsLinesParserLegacy` class uses the Parser to search for Java specific script options and forwards the found options to the class `ConverterLegacy`, which uses a common implementation for the conversion of the options.

common implementation is misleading, because it suggests both handler use it


![CTPGParserHandler](diagrams/CTPGParserHandler.drawio.png)

The `ScriptOptionsLinesParserCTPG` class uses the new CTPG basedParser to search for **all** Java specific script options at once. Then it forwards the found options to class `ConverterV2`, which uses a common implementation for the conversion of the options. `ConverterV2` also implements the functions to convert Jvm otions and JAR options.
Copy link
Collaborator

@tkilias tkilias Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The `ScriptOptionsLinesParserCTPG` class uses the new CTPG basedParser to search for **all** Java specific script options at once. Then it forwards the found options to class `ConverterV2`, which uses a common implementation for the conversion of the options. `ConverterV2` also implements the functions to convert Jvm otions and JAR options.
The `ScriptOptionsLinesParserCTPG` class uses the new CTPG based Parser to search for **all** Java specific script options at once. Then it forwards the found options to class `ConverterV2`, which uses a common implementation for the conversion of the options. `ConverterV2` also implements the functions to convert Jvm otions and JAR options.

common again

`ScriptOptionsLinesParserCTPG` uses an instance of `ScriptImporter` to import foreign scripts. Because the new parser collects all script options at once, but backwards compatibility with existing UDF scripts must be ensured, there is an additional level of complexity in the import script algorithm. The algorithm is described in the following pseudocode snippet:
```
function import(script_code, options_map)
import_option = options_map.find("import")
Copy link
Collaborator

@tkilias tkilias Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
import_option = options_map.find("import")
import_options = options_map.find("import")

Should this be plural?

static void doSomething() {}
}
```

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add the expected results


### Parser Implementation V1

The legacy parser (V1) parser searches for one specific script option. The parser starts from beginning of the script code. If found, the parser immediately removes the script option from the script code and returns the option value. It validates the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The legacy parser (V1) parser searches for one specific script option. The parser starts from beginning of the script code. If found, the parser immediately removes the script option from the script code and returns the option value. It validates the
The legacy parser (V1) parser searches for one specific script option. The parser starts from the beginning of the script code. If found, the parser immediately removes the script option from the script code and returns the option value. It validates the

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest of the sentence is missing

### Lexer and Parser Rules Option
`dsn~lexer-parser-rules~1`

Lexer and Parser rules to recognize `%optionKey`, `optionValue`, with whitespace characters as separator. The Parser rules will define the grammar to correctly identify Script Options, manage multiple options with the same key, and handle duplicates.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention simple lexer rules to deal with whitespace and escaping correct

### Ignore lines without script options
`dsn~ignore-lines-without-script-options~1`

In order to avoid lower performance compared to the old implementation, the parser must run only on lines which contain a `%` character.
Copy link
Collaborator

@tkilias tkilias Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
In order to avoid lower performance compared to the old implementation, the parser must run only on lines which contain a `%` character.
In order to avoid lower performance compared to the old implementation, the parser must run only on lines which contain a `%` character after only whitespaces.


Tags: V2

### Lexer and Parser Rules Whitespace Sequences
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to continue here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants