A new interface for task products #405

tobiasraabe (Maintainer) · 2023-08-07T11:38:50Z

This discussion is a spin-off of the discussions started in #361.

Here, I want to present new ways to define task products and discuss their usefulness.

The goal is not to select just one approach over all the others. All approaches have their strengths and weaknesses. It is more about finding a combination of approaches that offers ease of use to beginners, a nice interface and all necessary flexibility and correctness for power users.

`@pytask.mark.produces` and magic argument `produces`

The decorator @pytask.mark.produces seems unnecessary. When using the @pytask.mark.task decorator, it is already possible to pass values to produces, either through the kwargs keyword or through a default value for the argument. From v0.4.0 on, this will also work without the task decorator.

from pathlib import Path
import pytask

@pytask.mark.task(kwargs={"produces": Path("out.txt")})   # Option 1: pass the value via kwargs.
def task_example(produces: Path = Path("out.txt")):       # Option 2: use a default value.
    ...

Thus, having produces as a magical keyword argument gives us the same functionality as the decorator, but one does not need to use the decorator.

+ People know it.
+ No decorator anymore.
- Users are forced to put all their products under the argument name produces.
- For multiple products, the type has to be a container, and with different kinds of products (as shown later) the annotation becomes even less clear.
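
To make the last point concrete, here is a minimal sketch of a task with two products under `produces`. It assumes that a dictionary of paths is accepted as the value of `produces`; the keys `first` and `second` and the file names are made up for illustration.

```python
from pathlib import Path
from typing import Dict

# Assumption: a dictionary of paths is accepted as the value of ``produces``.
PRODUCTS = {"first": Path("first.txt"), "second": Path("second.txt")}


def task_example(produces: Dict[str, Path] = PRODUCTS) -> None:
    # Every product has to be looked up inside the single ``produces`` container.
    produces["first"].write_text("Hello, ")
    produces["second"].write_text("World!")
```

The annotation `Dict[str, Path]` already says little about which file is which, and it would get vaguer once the container mixes paths with other kinds of products.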

The `Product` annotation

The Product annotation allows users to declare any argument of a task as a product using an annotation.

from pathlib import Path

from pytask import Product
from typing_extensions import Annotated

def task_example(path: Annotated[Path, Product] = Path("out.txt")):
    ...

+ Products do not need to be called produces anymore.
+ Multiple products can be spread across multiple argument names, making much better use of the namespace.
- People have to learn about annotations.
- People need to be able to annotate the task function, which is not possible with third-party functions (more on this later).
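
As a rough illustration of how products spread across several argument names could be discovered, the sketch below inspects the annotations of a function. It is not pytask's implementation: `Product` is a local stand-in for `pytask.Product`, `find_products` is a hypothetical helper, and `typing.Annotated` (Python 3.9+) replaces the `typing_extensions` import so the example runs without extra packages.

```python
import inspect
from pathlib import Path
from typing import Annotated, get_type_hints


class Product:
    """Stand-in for ``pytask.Product`` so the sketch runs without pytask."""


def find_products(func):
    """Return the default values of all arguments annotated with ``Product``."""
    hints = get_type_hints(func, include_extras=True)
    defaults = {
        name: param.default
        for name, param in inspect.signature(func).parameters.items()
    }
    return {
        name: defaults[name]
        for name, hint in hints.items()
        if name != "return" and Product in getattr(hint, "__metadata__", ())
    }


def task_example(
    data: Path = Path("in.txt"),
    table: Annotated[Path, Product] = Path("table.tex"),
    figure: Annotated[Path, Product] = Path("figure.png"),
):
    ...


products = find_products(task_example)
```

Here `products` contains only `table` and `figure`; `data` stays a plain argument because its annotation carries no `Product` marker.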

Allowing tasks to return

So far, task functions could not return values, which seems unintuitive at first, but many users have made their peace with it.

Mainly, return annotations allow users to delegate all of the I/O to pytask and remove it from the task function. This requires a little more knowledge of pytask's internals, which is why it is probably an interface for intermediate to advanced users.

pytask works with protocols for nodes. Anything that follows the protocol for Node is a valid dependency or product of a task.

pytask/src/_pytask/node_protocols.py

Lines 10 to 35 in 6017b82

@runtime_checkable
class MetaNode(Protocol):
    """Protocol for an intersection between nodes and tasks."""

    name: str | None
    """The name of node that must be unique."""

    @abstractmethod
    def state(self) -> Any:
        ...


@runtime_checkable
class Node(MetaNode, Protocol):
    """Protocol for nodes."""

    value: Any

    def load(self) -> Any:
        ...

    def save(self, value: Any) -> Any:
        ...

    def from_annot(self, value: Any) -> Any:
        ...
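
To show what "following the protocol" can look like, here is a minimal sketch of a custom node. The protocols are copied locally so the example runs without pytask. `PickleNode` is a hypothetical class, and using the file's modification time as its state is an assumption made for illustration.

```python
from __future__ import annotations

import pickle
import tempfile
from abc import abstractmethod
from pathlib import Path
from typing import Any, Optional, Protocol, runtime_checkable


# Local copies of the protocols quoted above, so the sketch runs without pytask.
@runtime_checkable
class MetaNode(Protocol):
    name: Optional[str]

    @abstractmethod
    def state(self) -> Any: ...


@runtime_checkable
class Node(MetaNode, Protocol):
    value: Any

    def load(self) -> Any: ...

    def save(self, value: Any) -> Any: ...

    def from_annot(self, value: Any) -> Any: ...


class PickleNode:
    """Hypothetical node that stores any Python object as a pickle file."""

    def __init__(self, name: str, path: Path) -> None:
        self.name = name
        self.value = path

    def state(self) -> Optional[str]:
        # Assumption: the modification time identifies the state of the file.
        return str(self.value.stat().st_mtime) if self.value.exists() else None

    def load(self) -> Any:
        return pickle.loads(self.value.read_bytes())

    def save(self, value: Any) -> None:
        self.value.write_bytes(pickle.dumps(value))

    def from_annot(self, value: Any) -> None:
        self.value = Path(value)


node = PickleNode("data", Path(tempfile.mkdtemp()) / "data.pkl")
```

`PickleNode` does not inherit from `Node`. Because the protocol is decorated with `runtime_checkable`, `isinstance(node, Node)` only checks that the required attributes and methods are present.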

Here are two proposals for an interface that allows returns.

Returns via annotations

Similar to function argument annotations, we can use return annotations to specify how the function result should be stored. Here, we specify a path in the annotation. Internally, the path will be converted to a PathNode that can store strings and bytes.

path = Path(__file__).parent.joinpath("file.txt")

def task_example() -> Annotated[str, path]:
    return "Hello, World!"

It is also possible to return any PyTree in the function and match it to a PyTree with the same structure in the annotations.

path1 = Path(__file__).parent.joinpath("file1.txt")
path2 = Path(__file__).parent.joinpath("file2.txt")

def task_example() -> Annotated[str, (path1, path2)]:
    return "Hello, ", "World!"

+ Returns are not function arguments anymore.
- Return annotations are only possible if the user defines the function.
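
The mechanism can be emulated in a few lines of plain Python. The sketch below is not how pytask does it internally: `run_and_store` is a hypothetical helper that reads the path(s) from the return annotation, calls the task, and writes each returned string to the matching path. Only flat tuples are handled, whereas a real PyTree could be nested.

```python
import tempfile
from pathlib import Path
from typing import Annotated, get_type_hints


def run_and_store(task):
    """Call ``task`` and write its result to the path(s) in the return annotation."""
    target = get_type_hints(task, include_extras=True)["return"].__metadata__[0]
    result = task()
    # Treat a tuple of paths as a flat PyTree and pair it with a tuple of results.
    if isinstance(target, tuple):
        pairs = list(zip(target, result))
    else:
        pairs = [(target, result)]
    for path, value in pairs:
        path.write_text(value)
    return [path for path, _ in pairs]


tmp = Path(tempfile.mkdtemp())
path1 = tmp / "file1.txt"
path2 = tmp / "file2.txt"


def task_example() -> Annotated[str, (path1, path2)]:
    return "Hello, ", "World!"


written = run_and_store(task_example)
```

The task function itself contains no I/O. Everything about where and how the result is stored lives in the annotation and the surrounding machinery.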

Returns via `@pytask.mark.task`

Similar to kwargs, @pytask.mark.task should receive another argument, for example produces, which takes the same PyTree that you would otherwise define in the return annotation.

path = Path(__file__).parent.joinpath("file.txt")

@pytask.mark.task(produces=path)
def task_example() -> str:
    return "Hello, World!"

+ This approach also works with third-party functions in contrast to return annotations.
+ More suitable for a programmatic API where tasks could be lambdas or external functions.
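
A small emulation shows why this helps with functions you did not write. The `task` decorator below is a hypothetical stand-in for the proposed `@pytask.mark.task(produces=...)`, not pytask's API. It wraps any callable and writes its return value to the declared path.

```python
import json
import tempfile
from pathlib import Path


def task(produces):
    """Hypothetical stand-in for ``@pytask.mark.task(produces=...)``."""

    def decorator(func):
        def wrapper(*args, **kwargs):
            # Write whatever the wrapped function returns to the declared path.
            produces.write_text(func(*args, **kwargs))
            return produces

        return wrapper

    return decorator


path = Path(tempfile.mkdtemp()) / "config.json"

# ``json.dumps`` stands for an external function whose signature we cannot annotate.
task_dump = task(produces=path)(json.dumps)
task_dump({"answer": 42})
```

Since the product is declared outside the function signature, the same call pattern would work for a lambda or any other callable.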
