How to pass deps, products, other objects to pytask-{r|julia|stata} #203
-
Problempytask-r, pytask-julia, pytask-stata, ..., all have in common that they wrap an executable which is used to run a script in another language than Python which contains the actual logic of a task. The task function in Python is only used to register the task with pytask. The function body is empty since it is replaced with an internal implementation - a wrapper around Currently, options to the executable and arguments to the script are passed via the specific decorator of the method. See the readme of pytask-julia for an example. (It is also the only package which differentiates between options for the executable and arguments to the script.) Passing information like paths to dependency or products to the script via the command line has one huge problem: Command line arguments are accessed in the script via positional indexing although label-based indexing proved to be one of pytask's strengths. Another requirement might be to be able to debug the scripts in another language. Unfortunately, most languages do not have a post-mortem debugger like Python which would enable interfaces around Answers do not have to provide a full solution, but can also tackle sub-problems. Feel free to also provide feedback, dissatisfaction with the current approach and your use-case. |
Beta Was this translation helpful? Give feedback.
Replies: 3 comments 7 replies
-
Exporting inputs to the script to JSON, yaml, etc.This idea was raised by @hmgaudecker and has two elements. The decoratorThe decorator of the specific packages, In addition to that, the decorator accepts two arguments. The first is a function for converting the file arguments to some format like yaml. The second provides the appropriate file ending. Thus, a decorator has the following signature. def julia(
*,
script: str | Path,
options: Iterable[str],
converter: Callable[Any, str] | None = None,
file_suffix: str | None = None
):
... There will be builtin converters for yaml, json, toml which can be easily selected by passing a string to converter. Custom converters need to pass a function to converter and pass a file_suffix. A converter needs to have the following signature. import json
def json_converter(info: dict[str, Any]) -> str:
return json.dumps(info) The auxiliary fileArguments to the script are exported to some auxiliary file format like JSON or yaml. The path to the auxiliary file is passed as the first argument to the script and can be parsed inside the script keeping the nested structure of dependencies and products. The auxiliary file is stored in some folder inside the build folder. It must be ensured that the filename is unique and persists over multiple pytask runs. When pytask is executed, the file is generated. Users who want to run a failing task can see that path to the file using Uniqueness Uniqueness can be ensured by using the short name of a task and converting all forward slashes, squared brackets, dots and colons to underscores.
|
Beta Was this translation helpful? Give feedback.
-
Thanks for summarising (and extending!!!) what I thought of. A couple of reactions if we go down that route:
|
Beta Was this translation helpful? Give feedback.
-
Switching perspectives: Do we need to get paths in or out?First of all, I like the solution presented above. Just want to present this alternative to have another option to discuss. This solution does not talk about options to the executable, just arguments to the script. The above discussion assumes that pytask knows the dependencies and targets from the decorator and the problem is to pass that information to a script. I think for non Python users it would be more intuitive to do as much as possible in their language and keep the task files minimal. Thus it could be preferable to define the paths of dependencies and targets inside stata, matlab, ... as long as there is a way for pytask to infer them. A general syntax parser that would infer the paths is of course out of the scope. But what about a simple pragma? If this was used to run a python script instead of a task function, it could be look like this: # @pytask-dependency
in_path = "a/b/c.csv"
# do stuff
# @pytask-target
out_path = "d/e/f.png" |
Beta Was this translation helpful? Give feedback.
Exporting inputs to the script to JSON, yaml, etc.
This idea was raised by @hmgaudecker and has two elements.
The decorator
The decorator of the specific packages,
@pytask.mark.r
or@pytask.mark.julia
only accepts options to the executable.In addition to that, the decorator accepts two arguments. The first is a function for converting the file arguments to some format like yaml. The second provides the appropriate file ending.
Thus, a decorator has the following signature.
There will be builtin converters for yaml, json, toml…